arXiv:2503.00673

Towards Refining Developer Questions using LLM-Based Named Entity Recognition for Developer Chatroom Conversations

Published on Mar 2, 2025
Authors: Hao Li et al.

Abstract

AI-generated summary: A labeling approach using a Large Language Model for Named Entity Recognition, Intent Detection, and Resolution Classification in software engineering chatrooms improves question clarity and resolvability.

In software engineering chatrooms, communication is often hindered by imprecise questions that cannot be answered. Recognizing key entities can be essential for improving question clarity and facilitating more effective exchanges. However, existing research using natural language processing techniques often overlooks these software-specific nuances. In this paper, we introduce Software-specific Named Entity Recognition, Intent Detection, and Resolution Classification (SENIR), a labeling approach that leverages a Large Language Model to annotate entities, intents, and resolution status in developer chatroom conversations. To offer quantitative guidance for improving question clarity and resolvability, we build a resolution prediction model that leverages SENIR's entity and intent labels along with additional predictive features. We evaluate SENIR on the DISCO dataset using a subset of annotated chatroom dialogues. SENIR achieves an 86% F-score for entity recognition, a 71% F-score for intent detection, and an 89% F-score for resolution status classification. Furthermore, our resolution prediction model, tested with various sampling strategies (random undersampling and oversampling with SMOTE) and evaluation methods (5-fold cross-validation, 10-fold cross-validation, and bootstrapping), achieves AUC values ranging from 0.7 to 0.8. Key factors influencing resolution include positive sentiment and entities such as Programming Language and User Variable across multiple intents, while diagnostic entities are more relevant in error-related questions. Moreover, resolution rates vary significantly by intent: questions about API Usage and API Change are resolved more often, whereas Discrepancy and Review questions are resolved less often. A Chi-Square analysis confirms the statistical significance of these differences.
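
The resolution-prediction evaluation described in the abstract (random undersampling or SMOTE oversampling, 5- and 10-fold cross-validation, AUC) maps onto a standard scikit-learn / imbalanced-learn workflow. The sketch below is illustrative only, not the authors' implementation: the placeholder features, synthetic labels, and RandomForest classifier are assumptions, and the bootstrapping variant is omitted.

```python
# Minimal sketch (not the paper's code) of the resolution-prediction evaluation:
# resampling (SMOTE oversampling or random undersampling) combined with
# stratified k-fold cross-validation and AUC scoring. The feature matrix,
# labels, and classifier are placeholder assumptions; real features would come
# from SENIR's entity and intent labels plus extras such as sentiment.
import numpy as np
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(42)
X = rng.random((1000, 12))                # placeholder: entity counts, intent one-hots, sentiment
y = (rng.random(1000) < 0.3).astype(int)  # placeholder: 1 = question resolved (imbalanced classes)

def mean_auc(sampler, n_splits):
    """Mean AUC under stratified k-fold CV; resampling is applied only to training folds."""
    model = Pipeline([
        ("resample", sampler),
        ("clf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ])
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    return cross_val_score(model, X, y, cv=cv, scoring="roc_auc").mean()

for name, sampler in [("SMOTE oversampling", SMOTE(random_state=0)),
                      ("random undersampling", RandomUnderSampler(random_state=0))]:
    for k in (5, 10):
        print(f"{name}, {k}-fold CV: mean AUC = {mean_auc(sampler, k):.2f}")
```

Using the imbalanced-learn Pipeline keeps resampling inside each training fold, so the held-out fold used for the AUC estimate is never synthetically altered.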
