ConfliBERT: Conflict Based Language Models and Military Exercise Data
Spencer Perkins, Vito D'Orazio
Social & Behavioral Sciences - Poster presentation
Dr. Vito D'Orazio
Manual analysis and classification of extensive text data is time-consuming and inefficient, and can significantly slow the progress of research. Large Language Models (LLMs) can aid in the classification of large corpora. LLMs are machine learning models specializing in text analysis and can perform a range of text classification tasks, including Named Entity Recognition, Question Answering, and Binary Classification. This project focuses on ConfliBERT, an LLM based on the BERT architecture and trained on political conflict data, which should therefore excel at text analysis within its domain: in this case, Binary Classification of Multinational Military Exercise (MME) data. MMEs are large-scale military training exercises conducted jointly by multiple countries and can offer insight into the relationships between nations. My research focuses on an MME database covering exercises from 1980 to 2010, which I aim to expand using a downloaded collection of 1.5 million potential MME sources from 2011 to 2021. Sorting through this volume of data manually would be inefficient, which is where ConfliBERT is applied. I test the effectiveness of ConfliBERT at classifying MME data against that of the base BERT model and a tuned Support Vector Machine (SVM), to demonstrate ConfliBERT's advantage at this domain-specific task. Results show that ConfliBERT outperforms both BERT and SVM, demonstrating its usefulness as a Binary Classification tool for political conflict-related data.
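The sketch below illustrates the kind of binary classification described above, using the Hugging Face Transformers API with a BERT-style sequence classifier. The checkpoint name, example sentences, and label meanings are illustrative assumptions, not details taken from the project itself.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Hypothetical checkpoint ID; the abstract does not specify how the
# fine-tuned ConfliBERT classifier is stored or distributed.
MODEL_ID = "path/to/conflibert-mme-classifier"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID, num_labels=2)
model.eval()

# Candidate source texts; label 1 = MME-related, 0 = not MME-related (assumed convention).
texts = [
    "Troops from two allied countries began joint training drills on Monday.",
    "The finance ministry released its quarterly budget report.",
]

inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predictions = logits.argmax(dim=-1).tolist()
print(predictions)  # e.g. [1, 0], depending on the fine-tuned weights

The same loop would apply to BERT or to a bag-of-words SVM baseline; only the model and feature representation change, which is what makes the head-to-head comparison in the abstract straightforward to run.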