Detection of SARA-Based Hate Speech in Social Media Comments Using a Two-Stage Classification Approach with IndoBERT and Support Vector Machine

Authors

  • Ovy Marsya Zieera, Universitas Negeri Surabaya
  • Monica Cinthya, Universitas Negeri Surabaya

DOI:

https://doi.org/10.70134/identik.v3i4.1698

Keywords:

Hate Speech, SARA, IndoBERT, Support Vector Machine, LLM Ensemble Voting

Abstract

The rapid growth of social media in Indonesia has increased public interaction on platforms such as YouTube, Instagram, and TikTok, but it has also driven the proliferation of hate speech, particularly content targeting ethnicity, religion, race, and inter-group relations (SARA: suku, agama, ras, dan antargolongan). This study proposes a Two-Stage Classification approach to address this challenge. In the first stage, the IndoBERT model (indobenchmark/indobert-base-p1) is fine-tuned to classify comments as Hate Speech or Non-Hate Speech. In the second stage, a Support Vector Machine (SVM) with TF-IDF feature extraction and a custom SARA lexicon further classifies the hate speech comments into SARA-based hate speech (HS_SARA) and general hate speech (HS_Umum). The dataset consists of 36,000 comments scraped from YouTube, Instagram, and TikTok on viral SARA-related topics. Labels were produced by LLM Ensemble Voting across three AI models and then validated by three human annotators. IndoBERT in Stage 1 achieved an accuracy of 82.56% on the test set. In Stage 2, the SVM achieved an accuracy of 95.07%, precision of 95.31%, recall of 95.07%, and F1-score of 95.07%, and cross-validation confirmed stability with a mean accuracy of 96.74% (std = 0.19%). These findings indicate that the Two-Stage Classification approach improves the specificity of hate speech detection by handling the two tasks sequentially.
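
The sequential routing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function `two_stage_classify`, the label string `Non_HS`, and the toy keyword sets are hypothetical stand-ins. In the study, Stage 1 is a fine-tuned IndoBERT model and Stage 2 is an SVM over TF-IDF and lexicon features; here both are replaced by simple keyword checks so that the control flow runs without any trained model.

```python
from typing import Callable

# HS_SARA and HS_Umum are the label names used in the abstract.
# "Non_HS" is an assumed label string for the Non-Hate Speech class.
NON_HS = "Non_HS"
HS_SARA = "HS_SARA"
HS_UMUM = "HS_Umum"


def two_stage_classify(
    comment: str,
    is_hate_speech: Callable[[str], bool],
    is_sara: Callable[[str], bool],
) -> str:
    """Run Stage 1, and run Stage 2 only on comments flagged as hate speech."""
    if not is_hate_speech(comment):
        return NON_HS  # Stage 2 is never consulted for these comments
    return HS_SARA if is_sara(comment) else HS_UMUM


# Toy keyword sets (hypothetical, for illustration only).
TOY_ABUSE_TERMS = {"bodoh", "tolol"}
TOY_SARA_TERMS = {"suku", "agama", "ras"}


def toy_stage1(comment: str) -> bool:
    """Stand-in for the fine-tuned IndoBERT binary classifier."""
    return bool(set(comment.lower().split()) & TOY_ABUSE_TERMS)


def toy_stage2(comment: str) -> bool:
    """Stand-in for the SVM with TF-IDF and SARA lexicon features."""
    return bool(set(comment.lower().split()) & TOY_SARA_TERMS)


if __name__ == "__main__":
    for text in ["video yang bagus", "kamu bodoh", "suku itu bodoh"]:
        print(text, "->", two_stage_classify(text, toy_stage1, toy_stage2))
```

Because Stage 2 only sees comments that Stage 1 flags, a comment that mentions a SARA term without abusive language is still labeled as non-hate speech in this sketch. The same property means Stage 1 errors propagate to the final label.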

Published

2026-06-30

How to Cite

Deteksi Hate Speech Unsur Sara Pada Komentar Media Sosial Menggunakan Pendekatan Two-Stage Classification Dengan Algoritma Indobert Dan Support Vector Machine. (2026). Jurnal Ilmu Ekonomi, Pendidikan Dan Teknik, 3(4), 162-168. https://doi.org/10.70134/identik.v3i4.1698
