Volume 9, no. 4, pages 86-95

Modification of Random Forest Based Approach for Streaming Data with Concept Drift

A.V. Zhukov, D.N. Sidorov
This paper presents a classification method for data streams with concept drift. Concept drift methods are promising for the analysis of complex systems and of other processes with a stochastic nature, such as wind power. We propose a decision tree ensemble classification method for concept drift based on the Random Forest algorithm. Inspired by the Accuracy Weighted Ensemble (AWE) method, it aggregates the ensemble by weighted majority voting. The weight of each base learner is computed for every evaluated sample from the base learner's accuracy and the intrinsic proximity measure of Random Forest. The algorithm uses ensemble pruning as a forgetting strategy. We report the results of an empirical comparison of our method with other state-of-the-art concept drift classifiers.
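The abstract does not state the weighting formula, so the following is only a minimal sketch of the three ingredients it names: a per-sample learner weight, weighted majority voting, and pruning as forgetting. It assumes, purely for illustration, that a tree's weight for a query sample is its accuracy on the recent labelled samples that fall into the same leaf as the query (one common reading of Random Forest proximity), with a fallback to the tree's overall recent accuracy. The function names `local_weight`, `weighted_majority_vote` and `prune` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def local_weight(query_leaf, recent_leaves, recent_correct):
    """Hypothetical per-sample weight of one tree (an assumption, not the paper's formula).

    recent_leaves[i] is the leaf this tree assigns to recent labelled sample i;
    recent_correct[i] tells whether the tree classified that sample correctly.
    Samples sharing the query's leaf are treated as proximate to the query.
    """
    near = [ok for leaf, ok in zip(recent_leaves, recent_correct) if leaf == query_leaf]
    # With no proximate samples, fall back to the tree's overall recent accuracy.
    pool = near if near else list(recent_correct)
    return sum(pool) / len(pool) if pool else 0.0


def weighted_majority_vote(predictions, weights):
    """Return the class label with the largest total weight over all learners."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def prune(scores, max_size):
    """Forgetting by pruning: indices of the max_size best-scoring learners."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:max_size])
```

In this sketch a single accurate learner can outvote several weak ones, for example `weighted_majority_vote(["a", "b", "b"], [0.9, 0.3, 0.4])` selects `"a"`, which is the behaviour that separates weighted voting from a plain majority vote.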
Keywords
decision tree; concept drift; ensemble learning; classification; random forest.