aTA (artificial TA): A Closed-Domain Long-Form Question-Answering Chatbot
Daniil Arushanov, Cesar Ramirez, Artsiom Skarakhod, Kaitlyn Nugent
Farnoush Banaei-Kashani
Recently, with the release of its sparse-attention Routing Transformer (RT), Google achieved state-of-the-art results on the natural language processing (NLP) task of open-domain long-form question answering (LFQA) [1]. While this chatbot demonstrates strong performance in open-domain LFQA (i.e., in automatically generating long and coherent answers to user questions in any open knowledge domain), its generic, open-domain nature means it often fails to generate accurate and meaningful answers. In this project, we extend this work by introducing a generic data curation workflow, along with curated datasets, that allows for domain-specific training of the aforementioned chatbot for closed-domain LFQA. Our goal is to restrict the problem to individual domains to improve answer-generation quality, while simultaneously providing an automated framework to train chatbots that are well versed across 165 domains. In particular, as a case study, we use our proposed workflow to generate a chatbot specialized to answer questions on the topic of “data science”.
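The domain-restriction step of such a data curation workflow can be pictured as a filter over an open-domain QA corpus. The sketch below is a minimal illustration, assuming a hypothetical keyword-based criterion and toy records; it is not the actual curation pipeline or keyword list used in this project.

```python
# Hypothetical sketch of the domain-filtering step in a data curation
# workflow for closed-domain LFQA. The keyword set and QA records below
# are illustrative assumptions, not the project's actual criteria.

DOMAIN_KEYWORDS = {"regression", "dataset", "overfitting", "classifier"}

def is_in_domain(question: str, keywords=DOMAIN_KEYWORDS) -> bool:
    """Keep a QA pair if its question mentions any domain keyword."""
    tokens = {t.strip("?.,!").lower() for t in question.split()}
    return bool(tokens & keywords)

def curate(qa_pairs):
    """Filter an open-domain QA corpus down to a single target domain."""
    return [qa for qa in qa_pairs if is_in_domain(qa["question"])]

corpus = [
    {"question": "How do I avoid overfitting a classifier?", "answer": "..."},
    {"question": "Who painted the Mona Lisa?", "answer": "..."},
]
print(len(curate(corpus)))  # → 1 (only the data-science question survives)
```

In practice a learned topic classifier would replace the keyword test, but the workflow shape — filter an open corpus per domain, then fine-tune the chatbot on each filtered subset — is the same.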
References
[1] K. Krishna, A. Roy, and M. Iyyer, “Hurdles to progress in long-form question answering,” in Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 4940–4957, May 2021.