CLASSIFICATION OF FLOOD DISASTER SITUATIONS USING SUPPORT VECTOR MACHINE BASED ON TWITTER DATA
Abstract
Floods occur frequently in Indonesia because of its many islands and tropical climate. The number of flood-related tweets on Twitter grows daily and can reach thousands within a few days. This study aims to build a model that classifies flood disaster situations from Twitter data. It uses the Support Vector Machine (SVM) algorithm, a data mining method for classification. The stages of the study include data collection and manual labeling, which divides the data into three situation types: mild, moderate, and severe. Terms are weighted using TF-IDF, and a training process then produces the model. Evaluating the model with a confusion matrix and K-fold cross-validation yields an accuracy of 90.61% and an F1-score of 90.64%. In the classification of flood-related tweets, 67.40% fall into the mild category, 19.79% into the moderate category, and 12.81% into the severe category.
Keywords: Classification, Data Mining, Flood, SVM, Twitter
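The weighting and evaluation steps named in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the TF-IDF variant (term count divided by document length, times ln(N/df)), the macro-averaged F1, the function names, and the toy tweets and labels are all assumptions, since the abstract does not specify them. The SVM training step itself is omitted.

```python
import math
from collections import Counter


def tfidf(docs):
    """Weight each tokenized document with TF-IDF.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix whose rows are true labels and columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy_and_macro_f1(matrix):
    """Derive accuracy and macro-averaged F1 (an assumed averaging) from the matrix."""
    size = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(size))
    f1_scores = []
    for i in range(size):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(size)) - tp
        fn = sum(matrix[i]) - tp
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return correct / total, sum(f1_scores) / size


# Hypothetical toy tweets, already tokenized; not taken from the study's data.
docs = [["banjir", "ringan"], ["banjir", "parah", "evakuasi"], ["banjir", "surut"]]
weights = tfidf(docs)

# Hypothetical true and predicted labels for six tweets.
labels = ["mild", "moderate", "severe"]
y_true = ["mild", "mild", "moderate", "severe", "severe", "moderate"]
y_pred = ["mild", "moderate", "moderate", "severe", "severe", "moderate"]
matrix = confusion_matrix(y_true, y_pred, labels)
accuracy, macro_f1 = accuracy_and_macro_f1(matrix)
```

In this toy data the term "banjir" appears in every document, so its weight is zero, which shows why TF-IDF downplays words that do not help separate the classes. In a K-fold setting, the metrics would be computed once per fold and then averaged.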
DOI: http://dx.doi.org/10.36723/juri.v15i1.333
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.