Detection of SARA-Based Hate Speech in Social Media Comments Using a Two-Stage Classification Approach with IndoBERT and Support Vector Machine

Ovy Marsya Zieera; Monica Cinthya

doi:10.70134/identik.v3i4.1698

Authors

Ovy Marsya Zieera, Universitas Negeri Surabaya
Monica Cinthya, Universitas Negeri Surabaya

Keywords:

Hate Speech, SARA, IndoBERT, Support Vector Machine, LLM Ensemble Voting

Abstract

The rapid growth of social media in Indonesia has increased public interaction on platforms such as YouTube, Instagram, and TikTok. It has also driven the spread of hate speech, particularly content targeting ethnicity, religion, race, and inter-group relations (Suku, Agama, Ras, dan Antargolongan; SARA). This study proposes a Two-Stage Classification approach to detect such content. In the first stage, the IndoBERT model (indobenchmark/indobert-base-p1) is fine-tuned to classify comments as Hate Speech or Non-Hate Speech. In the second stage, a Support Vector Machine (SVM) with TF-IDF feature extraction and a custom SARA lexicon further classifies the hate speech comments into SARA-based hate speech (HS_SARA) and general hate speech (HS_Umum). The dataset consists of 36,000 comments scraped from YouTube, Instagram, and TikTok on viral SARA-related topics. Labels were produced by LLM Ensemble Voting across three AI models and then validated by three human annotators. In Stage 1, IndoBERT achieved an accuracy of 82.56% on the test set. In Stage 2, the SVM achieved an accuracy of 95.07%, precision of 95.31%, recall of 95.07%, and F1-score of 95.07%; cross-validation indicated stable performance, with a mean accuracy of 96.74% (std = 0.19%). These findings indicate that separating the task into two sequential stages improves the specificity of hate speech detection.
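
The control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the three LLM votes are combined or how disagreements are resolved, so a strict majority rule is assumed here, and the label name `Non_HS`, the toy keyword rules, and the tiny lexicon are hypothetical stand-ins for the fine-tuned IndoBERT model and the TF-IDF + SVM classifier.

```python
from collections import Counter
from typing import Callable, Optional, Sequence

NON_HS = "Non_HS"    # hypothetical label name for non-hate speech
HS_SARA = "HS_SARA"  # label names as given in the abstract
HS_UMUM = "HS_Umum"


def majority_vote(labels: Sequence[str]) -> Optional[str]:
    """Return the label chosen by a strict majority of voters, else None.

    Assumption: a comment without a strict majority is left unlabeled
    (None) so that it can be passed to human annotators.
    """
    if not labels:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


def two_stage_classify(
    comment: str,
    is_hate_speech: Callable[[str], bool],
    is_sara: Callable[[str], bool],
) -> str:
    """Stage 1 filters hate speech; Stage 2 runs only on Stage 1 positives."""
    if not is_hate_speech(comment):
        return NON_HS
    return HS_SARA if is_sara(comment) else HS_UMUM


# Toy stand-ins (hypothetical) for the two trained models.
def toy_stage1(comment: str) -> bool:
    # Placeholder for the fine-tuned IndoBERT binary classifier.
    return "benci" in comment.lower()


TOY_SARA_LEXICON = {"agama", "suku", "ras"}  # illustrative, not the paper's lexicon


def toy_stage2(comment: str) -> bool:
    # Placeholder for the TF-IDF + SARA-lexicon SVM classifier.
    return any(tok in TOY_SARA_LEXICON for tok in comment.lower().split())


if __name__ == "__main__":
    print(majority_vote(["HS", "HS", "Non_HS"]))
    print(two_stage_classify("aku benci agama itu", toy_stage1, toy_stage2))
```

Passing the two classifiers in as callables keeps the routing logic separate from the models, so real predictors could replace the toy functions without changing `two_stage_classify`. One consequence of this cascade design is that any hate speech comment missed in Stage 1 never reaches Stage 2.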

