---
license: mit
language:
- en
metrics:
- accuracy
- precision
- recall
- f1
pipeline_tag: text-classification
library_name: sklearn
tags:
- finance
- sentiment-analysis
- embeddings
- gradient-boosting
- classical-ml
- market-analysis
- nlp
- weekly-sentiment
---
NLP Stock Sentiment Analysis - Embedding-Based Models for Market Signals
This model card documents the resources and workflow described in the paper:
"NLP Stock Sentiment Analysis: A Comparative Embedding-Based Approach"
(Preprint submitted to arXiv, 2025)
Author: Joyjit Roy
Zenodo DOI: https://doi.org/10.5281/zenodo.17510735
GitHub: https://github.com/joyjitroy/Machine_Learning/tree/main/NLP_Stock_Sentiment_Analysis
This project provides a complete NLP workflow for classifying the sentiment of headline-level financial news and aggregating weekly sentiment indicators. It serves as a reproducible reference for embedding-based sentiment analysis using classical machine learning models.
Project Overview
The goal is to transform unstructured financial news into structured sentiment scores that can support market commentary and trend evaluation. Three embedding-based approaches were explored:
1. Word2Vec (300D)
- Trained locally on the news corpus
- Mean pooled per headline
2. GloVe (100D)
- Pretrained vectors
- Mean pooled per headline
3. SentenceTransformer Embeddings (384D)
- Model: `all-MiniLM-L6-v2`
- Direct sentence embeddings without fine-tuning
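The mean pooling used for the Word2Vec and GloVe variants can be sketched as follows. This is a minimal illustration, not the project's code: the `mean_pool` helper, the whitespace tokenizer, the zero-vector fallback, and the 3-dimensional toy vectors are all assumptions made for the example.

```python
from typing import Dict, List


def mean_pool(headline: str, vectors: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of all in-vocabulary tokens of a headline.

    Returns a zero vector when no token is in the vocabulary (an assumed
    fallback; the source does not say how such headlines are handled).
    """
    tokens = headline.lower().split()  # naive whitespace tokenizer (assumption)
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) iterates over dimensions, so each column is averaged.
    return [sum(column) / len(known) for column in zip(*known)]


# Toy 3-dimensional vectors, invented purely for illustration.
toy_vectors = {
    "shares": [1.0, 0.0, 2.0],
    "rise": [3.0, 2.0, 0.0],
}

pooled = mean_pool("Shares rise sharply", toy_vectors, dim=3)
print(pooled)  # [2.0, 1.0, 1.0]
```

Here "sharply" is out of vocabulary and is skipped, so the result is the average of the two known word vectors. With real Word2Vec or GloVe vectors the same logic would yield one 300D or 100D vector per headline.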
For each embedding approach, a Gradient Boosting classifier is trained and evaluated on the same dataset. Weekly sentiment summaries are generated to demonstrate applied use cases for financial analysis.
Note: These are scikit-learn models and are not deployed as Hugging Face inference endpoints.
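The train-and-evaluate step described above might look roughly like the sketch below. It uses random placeholder features and labels in place of real headline embeddings, and the split ratio, `n_estimators`, and `random_state` values are assumptions rather than the settings used in the paper.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder inputs: random 100-dimensional "embeddings" and random labels.
# In practice X would hold one pooled or sentence embedding per headline.
X = rng.normal(size=(120, 100))
y = rng.choice([-1, 0, 1], size=120)

# Stratified hold-out split (the 80/20 ratio is an assumption).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Hyperparameters here are illustrative, not the tuned values from the paper.
clf = GradientBoostingClassifier(n_estimators=50, random_state=0)
clf.fit(X_train, y_train)

val_pred = clf.predict(X_val)
print("validation accuracy:", clf.score(X_val, y_val))
```

Because the features are random noise, the printed accuracy is meaningless; the sketch only shows the shape of the workflow, which stays the same for each of the three embedding types.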
Dataset Summary
The dataset contains 349 financial news headlines, each paired with OHLCV market indicators and a sentiment label: 1 = positive, 0 = neutral, -1 = negative.
The dataset is available via Zenodo:
https://doi.org/10.5281/zenodo.17510735
Data Dictionary
| Column | Description |
|---|---|
| Date | Date the news item was released |
| News | Headline or snippet text |
| Open | Opening price (USD) |
| High | Highest price (USD) of the day |
| Low | Lowest price (USD) of the day |
| Close | Adjusted closing price (USD) |
| Volume | Total shares traded |
| Label | Sentiment (1=positive, 0=neutral, -1=negative) |
Model Performance (Validation)
Across all embedding variants, the tuned GloVe + Gradient Boosting model achieved the strongest validation results:
- Accuracy: 0.714
- Precision: 0.758
- Recall: 0.714
- F1 Score: 0.694
- Error Rate: 0.286
All models reach perfect training accuracy, which on a dataset this small points to overfitting rather than generalization; validation metrics are therefore used for comparison. Full results appear in the associated paper (Table 3).
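For readers who want to see how metrics of this kind are computed for three classes, here is a small pure-Python sketch. It assumes support-weighted averaging across classes; the source does not state the averaging mode, although the reported recall equals the reported accuracy, which is what weighted averaging always produces. The `weighted_metrics` helper and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def weighted_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy plus support-weighted precision, recall, and F1 (assumed averaging)."""
    n = len(y_true)
    support = Counter(y_true)      # true examples per class
    predicted = Counter(y_pred)    # predictions per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, count in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / count
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = count / n         # each class weighted by its share of true labels
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / n
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "error_rate": 1.0 - accuracy,
    }


# Toy labels, invented for illustration (1 = positive, 0 = neutral, -1 = negative).
y_true = [1, 1, 0, 0, -1, -1, 1, 0]
y_pred = [1, 0, 0, 0, -1, 1, 1, 0]

scores = weighted_metrics(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

The error rate is simply one minus accuracy, which matches the relationship between the 0.714 and 0.286 figures listed above.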
Weekly Sentiment Summaries
To show how sentiment predictions can support market interpretation:
- Daily predictions are aggregated by week
- Sentiment ratios (positive / neutral / negative) are computed
- Weekly summaries are generated using Mistral-7B-Instruct
- These summaries provide narrative insight into market mood
This illustrates a practical workflow where classical NLP models feed into downstream financial analysis.
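The aggregation steps listed above can be sketched in plain Python. The grouping by ISO calendar week, the `weekly_sentiment_ratios` helper, and the toy rows are assumptions for illustration; the project may define weeks differently.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, List, Tuple

LABEL_NAMES = {1: "positive", 0: "neutral", -1: "negative"}


def weekly_sentiment_ratios(
    rows: List[Tuple[date, int]],
) -> Dict[Tuple[int, int], Dict[str, float]]:
    """Group (date, predicted label) pairs by ISO week and compute class ratios."""
    by_week = defaultdict(Counter)
    for day, label in rows:
        iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
        by_week[(iso[0], iso[1])][LABEL_NAMES[label]] += 1

    ratios = {}
    for week, counts in sorted(by_week.items()):
        total = sum(counts.values())
        ratios[week] = {name: counts[name] / total for name in LABEL_NAMES.values()}
    return ratios


# Toy predictions, invented for illustration only.
rows = [
    (date(2024, 1, 1), 1),
    (date(2024, 1, 2), -1),
    (date(2024, 1, 3), 1),
    (date(2024, 1, 4), 0),
    (date(2024, 1, 8), -1),
    (date(2024, 1, 9), -1),
]

ratios = weekly_sentiment_ratios(rows)
for week, shares in ratios.items():
    print(week, shares)
```

Ratios like these, together with the week's headlines, are the kind of structured input that could then be passed to an instruction-tuned model for the narrative summary.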
Intended Use
This repository is intended for:
- Research on embedding-based sentiment classification
- Educational exploration of Word2Vec, GloVe, and Sentence Transformer embeddings
- Demonstrating weekly sentiment aggregation
- Benchmarking classical ML approaches for small financial text datasets
Limitations
- Small dataset (349 samples)
- Potential overfitting of classical models
- Not designed for automated trading or real-time systems
- Weekly summaries rely on LLM outputs and may include stylistic bias
- No direct price prediction or financial forecasting
This model is best used for experimentation and learning.
Citation
If you use this work, please cite:
Roy, Joyjit. "NLP Stock Sentiment Analysis: A Comparative Embedding-Based Approach." arXiv preprint, 2025.
Dataset: Joyjit Roy. "NLP Stock Sentiment Analysis - AI for Market Signals." Zenodo. https://doi.org/10.5281/zenodo.17510735
