Introducing the MICE Method for Imputing Missing Meteorological Data and Comparing It with Regression; Case Study: 130 Years of Monthly Temperature in Mashhad, Jask and Bushehr

Document Type : Original Article/Regular article

Authors

1 Ph.D. in Agricultural Meteorology, Ferdowsi University of Mashhad, Faculty of Agriculture, Department of Water Engineering, Iran

2 MSc in Hydrology, Islamic Azad University, Mashhad Branch, Iran

Abstract

Accurate, complete and reliable data are the first requirement of climate studies, yet missing values are common in meteorological records and hamper climate analysis. Filling these gaps (imputation) is therefore a prerequisite for analysis. Several imputation methods exist, and their suitability depends on the data type and the climatic characteristics of each region. Precipitation and temperature are the most important variables in meteorology and climatology, and the length of the record plays a pivotal role in analyzing them accurately. Monthly temperatures for three Iranian cities, Mashhad, Bushehr and Jask, have been published in World Weather Records since about 1890. These records contain gaps, which are most pronounced around World War II (1941-1949). The purpose of this study is to estimate the missing values more accurately by introducing the MICE (Multiple Imputation by Chained Equations) method, and to provide a complete 130-year series of monthly temperatures. Stations in neighboring countries were selected as independent (predictor) stations in the models. First, the missing monthly temperatures of the three stations were estimated by fitting regression models (RMSE of 0.71 to 0.94 °C). Classical regression requires checking the underlying assumptions and running model diagnostics. The missing values were then estimated with the MICE method (RMSE of 0.39 to 0.82 °C). The results, obtained with the mice package in RStudio, show that MICE outperforms regression. The method is designed specifically for missing data, does not share the difficulties of classical regression, and offers many capabilities. It is therefore recommended for estimating missing meteorological data.
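The chained-equations idea behind MICE can be illustrated with a minimal sketch. It is written in Python rather than with the R mice package used in the study, and it is a deliberately simplified, deterministic variant: it cycles plain linear regressions between two series and produces a single filled dataset, whereas MICE proper draws imputations from a predictive distribution and generates several imputed datasets. The function names (`fit_line`, `chained_impute`, `rmse`) and the synthetic temperatures are assumptions made for illustration, not the study's data or code.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def chained_impute(target, neighbor, n_iter=10):
    """Fill None gaps in two series by cycling regressions between them."""
    series = [list(target), list(neighbor)]
    missing = [[i for i, v in enumerate(s) if v is None] for s in series]
    # Step 1: start every gap at the mean of the observed values.
    for s, gaps in zip(series, missing):
        observed = [v for v in s if v is not None]
        fill = sum(observed) / len(observed)
        for i in gaps:
            s[i] = fill
    # Step 2: repeatedly re-estimate each series' gaps from the other series,
    # fitting only on the rows where the series being filled was observed.
    for _ in range(n_iter):
        for k in (0, 1):
            y, x = series[k], series[1 - k]
            gap_set = set(missing[k])
            rows = [i for i in range(len(y)) if i not in gap_set]
            intercept, slope = fit_line([x[i] for i in rows], [y[i] for i in rows])
            for i in missing[k]:
                y[i] = intercept + slope * x[i]
    return series[0], series[1]

def rmse(estimates, truths):
    """Root mean square error between estimates and known true values."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths))

# Synthetic monthly temperatures (deg C), invented for illustration only.
true_neighbor = [15 + 10 * math.sin(2 * math.pi * t / 12) for t in range(24)]
true_target = [2 + 0.9 * b for b in true_neighbor]
target_gaps, neighbor_gaps = [3, 10, 17], [6, 20]
target = [None if t in target_gaps else v for t, v in enumerate(true_target)]
neighbor = [None if t in neighbor_gaps else v for t, v in enumerate(true_neighbor)]

filled_target, filled_neighbor = chained_impute(target, neighbor)

# Compare against the simplest baseline: filling gaps with the observed mean.
truth = [true_target[t] for t in target_gaps]
rmse_chained = rmse([filled_target[t] for t in target_gaps], truth)
observed_mean = sum(v for v in target if v is not None) / (len(target) - len(target_gaps))
rmse_mean = rmse([observed_mean] * len(target_gaps), truth)
print(f"chained RMSE: {rmse_chained:.3f}, mean-fill RMSE: {rmse_mean:.3f}")
```

Because the gaps are created artificially, the true values are known and the RMSE can be computed directly, which mirrors how imputation methods are compared in the abstract. Real station data would call for more predictor stations and for the stochastic, multiple-imputation machinery that this sketch omits.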

Keywords


References

Arghami N.R., Sanjari N. and Bozorgnia A. 1380 SH. An Introduction to Sample Surveys. 4th ed., Ferdowsi University of Mashhad Press. (In Persian)
Khalili A. and Bazrafshan J. 1387 SH. Assessment of drought persistence risk using annual precipitation data of the past century at old Iranian stations. Iranian Journal of Geophysics, 2(2): 13-23. (In Persian)
Rezaee Pazhand H. and Bozorgnia A. 1381 SH. Nonlinear Regression Analysis and Its Applications. Ferdowsi University of Mashhad Press. (In Persian)
Farzandi M., Rezaee Pazhand H. and Sanaeinejad H. 1393 SH. Infilling and extension of 127 years of monthly temperature in Mashhad. Journal of Climatology Research, 5(17): 111-123. (In Persian)

Deng Y., Chang C., Ido M.S. and Long Q. 2016. Multiple imputation for general missing data patterns in the presence of high-dimensional data. Scientific Reports, 6: 21689.
Edmond F.S., Victor A.K. and Khalid M. 1973. Floods and droughts, Water Resources Publications. Proceedings of the Second International Symposium in Hydrology, 679 pages.
Ghahraman B. and Ahmadi F. 2007. Application of geostatistics in time series: Mashhad annual rainfall. Iran-Watershed Management Science & Engineering, 1(1): 7-15.
Iqbal M., Wen J., Wang Sh., Tian Hu. and Adnan M. 2018. Variations of precipitation characteristics during the period 1960-2014 in the Source Region of the Yellow River, China. Journal of Arid Land, 10(3): 388-401.  
Jacob D., Reed D.W. and Robson A.J. 1999. Choosing a pooling group. Flood Estimation Handbook. Vol. 3. Institute of Hydrology, Wallingford, UK.
Little R.J.A. and Rubin D.B. 2002. Statistical analysis with missing data. John Wiley & Sons.
Azur M.J., Stuart E.A., Frangakis C. and Leaf P.J. 2011. Multiple imputation by chained equations: what is it and how does it work? International Journal of Methods in Psychiatric Research, 20(1): 40-49.
Porto de Carvalho J.R., Boffinho Almeida Monteiro J.E., Nakai A.M. and Assad E.D. 2017. Model for multiple imputation to estimate daily rainfall data and filling of faults. Revista Brasileira de Meteorologia, 32(4): 575-583.
Ranhao S., Baiping Z. and Jing T. 2008. A multivariate regression model for predicting precipitation in the Daqing Mountains. Mountain Research and Development, 28(3): 318-325.
Scheffer J. 2002. Dealing with missing data. Research Letters in the Information and Mathematical Sciences, 3:153-160. 
Smithsonian Institution. 1927. World weather records, 1750-1920. Smithsonian Miscellaneous Collections, 1199 pp.
Smithsonian Institution. 1934. World weather records, 1921-1930. Smithsonian Miscellaneous Collections, 639 pp.
Smithsonian Institution. 1947. World weather records, 1931-1940. Smithsonian Miscellaneous Collections, 666 pp.
Van Buuren S. 2018. Flexible Imputation of Missing Data. 2nd ed. Chapman & Hall/CRC Interdisciplinary Statistics.
Van Buuren S. and Groothuis-Oudshoorn K. 2011. mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3): 1-67.
Yozgatligil C., Aslan S., Iyigun C. and Batmaz I. 2013. Comparison of missing value imputation methods in time series: the case of Turkish meteorological data. Theoretical and Applied Climatology, 112: 143-167.