Evaluation of the Capabilities of Integration of Rough Sets and Genetic Algorithms for Data Mining and Rule Extraction in Domestic Water Usage in Tehran City

Document Type: Research Paper

Authors

1 M.Sc. of GIS & Remote Sensing, Tarbiat Modares University, Tehran

2 Associate Prof., Department of GIS Engineering, Faculty of Geodesy & Geomatic Engineering, K.N.Toosi University of Tech, Tehran

3 Assistant Prof., Department of GIS & RS, Tarbiat Modares University, Tehran

Abstract

Managing and planning the urban water supply of a metropolis is very important. Urban growth, the transformation of cities into metropolises, and the increasing number of complex factors that affect urban water use make it difficult to manage water consumption, supply, and distribution. Rule extraction therefore plays an important role in exploring patterns in data and reducing complexity. Rough set theory, developed by Pawlak in the 1980s, is a powerful and flexible method for dealing with uncertain and ambiguous data, and it is used in this research to extract the dominant rules from the data set. The method used in this paper combines two data mining methods, rough sets and genetic algorithms, for rule extraction and classification of water usage data, with Tehran city as the study area. Socio-economic, environmental, and temporal variables, together with water consumption and management zones, were used as explanatory variables for predicting water use. The database was divided into two parts: 60% for rule extraction and 40% as a test set. The independent test set was used to evaluate the accuracy of the extracted rules. The results show that combining genetic algorithms with rough sets leads to the extraction of more reliable rules. The classification accuracy of the rules extracted by rough sets alone was 77 percent. Optimizing the rules by combining the genetic algorithm with rough sets raised the classification accuracy to 88 percent in the 6th generation at an average speed of convergence. With the same speed of convergence, the accuracy increased to 92 percent in the 10th generation. According to the extracted rules, the most important factors affecting annual water consumption are, in order, resident population, water price, population density, family size, spatial location (latitude), education level, and per capita green space.
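
As an illustration of the rough set idea that underlies the rule extraction described in the abstract, the following minimal Python sketch computes the lower and upper approximations of a decision class on a toy decision table. The attribute names (density, price), their values, the class labels, and the helper functions are all hypothetical; they are not taken from the study's data or software, and the genetic optimization step is not shown.

```python
from collections import defaultdict

# Toy decision table. Every attribute name and value is hypothetical.
# Each row is (condition attributes, decision class).
TABLE = [
    ({"density": "high", "price": "low"}, "high_use"),
    ({"density": "high", "price": "low"}, "high_use"),
    ({"density": "high", "price": "high"}, "low_use"),
    ({"density": "low", "price": "high"}, "low_use"),
    ({"density": "low", "price": "low"}, "high_use"),
    ({"density": "low", "price": "low"}, "low_use"),
]


def indiscernibility_classes(table, attributes):
    """Group row indices that share the same values on `attributes`."""
    classes = defaultdict(set)
    for index, (conditions, _decision) in enumerate(table):
        key = tuple(conditions[name] for name in attributes)
        classes[key].add(index)
    return list(classes.values())


def approximations(table, attributes, decision):
    """Return the (lower, upper) approximations of one decision class."""
    target = {i for i, (_cond, d) in enumerate(table) if d == decision}
    lower, upper = set(), set()
    for block in indiscernibility_classes(table, attributes):
        if block <= target:   # block lies entirely inside the class
            lower |= block
        if block & target:    # block overlaps the class
            upper |= block
    return lower, upper


lower, upper = approximations(TABLE, ["density", "price"], "high_use")
# Ratio of certain members to possible members of the class.
accuracy = len(lower) / len(upper)
print(sorted(lower), sorted(upper), accuracy)
```

Rows in the lower approximation support certain rules (here, high density with low price always implies high use), while rows that appear only in the upper approximation support possible rules at best. In this toy table the two rows with low density and low price carry conflicting decisions, which is the kind of ambiguity rough sets are designed to represent.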

References


  1. Babran, S., 1387 SH, The Water Situation Crisis in the World and Iran, Rahbord, 16(48), PP. 193-212 (in Persian).
  2. Hosseinzadeh, L., 1386 SH, Classification of Target Customers in the Insurance Industry Using Data Mining, M.Sc. Thesis in Information Technology Management, Tarbiat Modares University (in Persian).
  3. Salehi, M. & Bavi, A., 1389 SH, Genetic Algorithms and Optimization of Composite Structures, Tehran, Abed Publications (in Persian).
  4. Taleghani, M., 1385 SH, Complete Atlas of Tehran 85, Gitashenasi Geographical and Cartographic Institute (in Persian).
  5. Cabena, P., Hadjinian, P., Stadler, R., Verhees, J. & Zanasi, A., 1998, Discovering Data Mining: From Concept to Implementation, Prentice Hall, Upper Saddle River, NJ.
  6. Dempster, A.P., 1967, Upper and Lower Probabilities Induced by a Multivalued Mapping, Annals of Mathematical Statistics, 38, PP. 325-339.
  7. Dubois, D. & Prade, H., 1990, Rough Fuzzy Sets and Fuzzy Rough Sets, International Journal of General Systems, 17, PP. 191-209.
  8. Dubois, D. & Prade, H., 1992, Putting Rough Sets and Fuzzy Sets Together, in R. Slowinski, ed. Intelligent Decision Support - Handbook of Applications and Advances of the Rough Sets Theory, Dordrecht: Kluwer Academic Publishers.
  9. Gorsevski, P.V., Jankowski, P., 2008, Discerning Landslide Susceptibility Using Rough Sets, Computers, Environment and Urban Systems, 32, PP. 53-65.
  10. Hsue Cheng, C., 2010, A Hybrid Model Based on Rough Sets Theory and Genetic Algorithms for Stock Price Forecasting, Elsevier Information Sciences, 180, P. 20.
  11. Larose, D., 2005, Discovering Knowledge in Data: An Introduction to Data Mining, Ebook.
  12. McKee, T. & Lensberg, T., 2002, Genetic Programming and Rough Sets: A Hybrid Approach to Bankruptcy Classification, Elsevier/Information Technology, 138, P. 16.
  13. Mortazavi, M., 2009, Comparison of Multi-Objective Genetic Algorithm with Ant Colony Optimization: A Case Study for Canberra Water Supply System, 33rd IAHR Congress: Water Engineering for a Sustainable Environment.
  14. Pawlak, Z., 1982, Rough Sets, International Journal of Computer and Information Sciences, 11(5), PP. 341-356.
  15. Pheng Khoo, L., 2002, Feature Extraction Using Rough Set Theory and Genetic Algorithms an Application for the Simplification of Product Quality Evaluation, Elsevier/Computers & Industrial Engineering, 43, P. 16.
  16. Pheng Khoo, L. & Yin Zhai, L., 2001, A Prototype Genetic Algorithm Enhanced Rough Set Based Rule Induction System, Elsevier/Computers & Industrial Engineering, 46, P. 12.
  17. Santana, L., 2010, DEMORS: A Hybrid Multi-Objective Optimization Algorithm Using Differential Evolution and Rough Set Theory for Constrained Problems, Elsevier/ Computers & Operations Research, 37, P. 11.
  18. Sheng, 2008, Modular Feature Selection Using Relative Importance Factors, Department of Electrical and Computer Engineering National University of Singapore, 13.
  19. Slowinski, R. & Stefanowski, J., 1989, Rough Classification in Incomplete Information Systems, Mathematical & Computer Modeling, 12.
  20. Thangavel, 2009, Dimensionality Reduction Based on Rough Set Theory: A Review, Elsevier/Applied Soft Computing, 9, P. 12.
  21. Triantaphyllou, E. & Felici, G., 2006, Data Mining & Knowledge Discovery Approaches Based on Rule Induction Techniques, Springer.
  22. Yau Liang, W. & Che Huang, C., 2009, The Generic Genetic Algorithm Incorporates with Rough Set Theory – An Application of the Web Services Composition, Elsevier/Expert Systems with Applications, 36, P. 8.
  23. Wygralak, W., 1989, Rough Sets and Fuzzy Sets - Some Remarks on Interrelations, Fuzzy Sets and Systems, 29, PP. 241-243.
  24. Yin Zhai, L., Pheng Khoo, L. & Cheong Fok, S., 2006, Knowledge Acquisition and Uncertainty in Fault Diagnosis: A Rough Sets Perspective, Data Mining and Knowledge Discovery Approaches Based on Rule Induction Techniques, Springer, Heidelberg, PP. 359-394.