tags:
- KPI Extraction
---

# LongShort-Dolly-2-7B

### Model Description

LongShort-Dolly-2-7B is a large language model fine-tuned to extract financial KPIs from earnings call documents. It is based on the Dolly-2-7B architecture.

- Model creator: [Brief AI](https://huggingface.co/briefai)
- Original model: [Dolly-2-7B](https://huggingface.co/databricks/dolly-v2-7b)
### Dataset Description

- Data Source: Factiva
- Data Description: 28K+ earnings call documents
- Data Scope: 1K+ public companies
- Fine-Tuning Data: Collection of 60K+ samples
## Prompt template: LongShort-Dolly-2-7B

```
[INST]
Extract all the finance-based performance indicators and evaluation metrics.
...
[/INST]
```
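To make the template concrete, here is a minimal Python sketch that wraps an earnings-call excerpt in the `[INST] ... [/INST]` format shown above. The `build_prompt` helper and the sample excerpt are illustrative only; copy the exact template text from the model repository when prompting the model.

```python
# Hypothetical helper: wraps a document excerpt in the [INST] ... [/INST]
# prompt format shown above. The exact template text should be taken from
# the model repository; this is only an illustration.
def build_prompt(document: str) -> str:
    instruction = (
        "Extract all the finance-based performance indicators "
        "and evaluation metrics."
    )
    return f"[INST]\n{instruction}\n{document}\n[/INST]"

excerpt = "Q3 revenue was $4.2B, up 12% year over year, with an EBITDA margin of 31%."
print(build_prompt(excerpt))
```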
## Basics

*This section provides information about the model type, version, license, funders, release date, developers, and contact information.*
*It is useful for anyone who wants to reference the model.*
**Developed by:** [Brief AI Team](https://huggingface.co/briefai)

**Model Type:** Transformer-based Large Language Model

**Version:** 1.0.0

**Languages:** English

**License:** Apache 2.0

**Release Date Estimate:** Wednesday, 29 November 2023

**Send Questions to:** [email protected]

**Cite as:** Brief AI LongShort Language Model

**Funded by:** UChicago Data Science Institute

**Mentored by:** Nick Kadochnikov
## Technical Specifications

*This section includes details about the model objective and architecture, and the compute infrastructure.*
*It is useful for people interested in model development.*

Please see [the LongShort training README](https://github.com/brief-ai-uchicago/LongShort-Dataset) for full details on replicating training.

### Model Architecture and Objective

* Modified from Dolly-2-7B

**Objective:** Financial KPI extraction from earnings call documents.

### Hardware and Software - Compute Infrastructure

* 4 NVIDIA L4 GPUs & 48 vCPUs
* Environment: PyTorch (pytorch-2.0 w/ CUDA-11.8; see [Github link](https://github.com/pytorch/pytorch))
* CPU: GCP G2 Standard 48 (Platform: Intel Cascade Lake) (Accelerator Optimized)
* CPU memory: 192GB RAM
* GPU memory: 30GB per GPU
## Training

*This section provides information about the training.*
*It is useful for people who want to learn more about the model inputs and training footprint.*

The following bitsandbytes quantization config was used during training (an equivalent `BitsAndBytesConfig` sketch follows the list):

* quant_method: bitsandbytes
* load_in_8bit: False
* load_in_4bit: True
* llm_int8_threshold: 6.0
* llm_int8_skip_modules: None
* llm_int8_enable_fp32_cpu_offload: False
* llm_int8_has_fp16_weight: False
* bnb_4bit_quant_type: nf4
* bnb_4bit_use_double_quant: True
* bnb_4bit_compute_dtype: float16
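For reference, the same settings can be expressed with the `BitsAndBytesConfig` class from `transformers`. This is a sketch reconstructed from the values listed above, not the original training script:

```python
import torch
from transformers import BitsAndBytesConfig

# 4-bit NF4 quantization config matching the values listed above.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    load_in_8bit=False,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
    llm_int8_threshold=6.0,
    llm_int8_skip_modules=None,
    llm_int8_enable_fp32_cpu_offload=False,
    llm_int8_has_fp16_weight=False,
)
```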
Framework versions:

* PEFT 0.4.0
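Since PEFT is listed as a framework, fine-tuning presumably followed the usual QLoRA recipe of loading the 4-bit base model and attaching LoRA adapters. The sketch below illustrates that pattern; the LoRA hyperparameters (`r`, `lora_alpha`, `lora_dropout`, `target_modules`) are placeholders, as the card does not list the values actually used.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base Dolly model in 4-bit (same NF4 settings as above).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "databricks/dolly-v2-7b",
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)

# Placeholder LoRA settings; the values used for LongShort-Dolly-2-7B are not published here.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # attention projection in Dolly's GPT-NeoX blocks
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```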
### Training Data

*This section provides a high-level overview of the training data. It is relevant for anyone who wants to know the basics of what the model is learning.*

Details for the dataset can be found in the [LongShort Dataset](https://github.com/brief-ai-uchicago/LongShort-Dataset) repository.

Training data includes:

- 5,000 earnings call documents
## How to use

This model can be used and deployed with the Hugging Face ecosystem; it requires `transformers` and `accelerate` to be installed. The model can be downloaded from:

[LongShort-Dolly-2-7B](https://huggingface.co/briefai/LongShort-Dolly-2-7B)
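A minimal loading-and-prompting sketch with `transformers` is shown below. It assumes the Hub repository contains the full fine-tuned weights and that the `[INST] ... [/INST]` template above is used; adjust the generation settings for your documents.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "briefai/LongShort-Dolly-2-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires `accelerate`
)

# Illustrative excerpt; in practice, pass an earnings call document.
prompt = (
    "[INST]\n"
    "Extract all the finance-based performance indicators and evaluation metrics.\n"
    "Q3 revenue was $4.2B, up 12% year over year, with an EBITDA margin of 31%.\n"
    "[/INST]"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
```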
## Intended Use

This model was created to enable public research on large language models (LLMs). LLMs are intended to be used for language generation or as a pre-trained base model that can be further fine-tuned for specific tasks. The use cases below are not exhaustive.

### Direct Use

- Text generation
- Exploring characteristics of language generated by a language model
- Examples: cloze tests, counterfactuals, generations with reframings

### Downstream Use

- Tasks that leverage language models, such as information extraction, question answering, and summarization

#### Out-of-scope Uses

Using the model in [high-stakes](#high-stakes) settings is out of scope for this model. The model is not designed for [critical decisions](#critical-decisions) nor uses with any material consequences on an individual's livelihood or wellbeing. The model outputs content that appears factual but may not be correct.

Out-of-scope uses include:

- Usage for evaluating or scoring individuals, such as for employment, education, or credit
- Applying the model for critical automatic decisions, generating factual content, creating reliable summaries, or generating predictions that must be correct
#### Misuse

Intentionally using the model for harm, violating [human rights](#human-rights), or engaging in other kinds of malicious activities is a misuse of this model. This includes:

- Spam generation
- Disinformation and influence operations
- Disparagement and defamation
- Harassment and abuse
- [Deception](#deception)
- Unconsented impersonation and imitation
- Unconsented surveillance
- Generating content without attribution to the model, as specified in the [RAIL License, Use Restrictions](https://huggingface.co/spaces/bigscience/license)
## Intended Users

### Direct Users

- General Public
- Researchers
- Students
- Educators
- Engineers/developers
- Non-commercial entities
- Financial Industry
# Risks and Limitations

*This section identifies foreseeable harms and misunderstandings.*

The model may:

- Overrepresent some viewpoints and underrepresent others
- Contain stereotypes
- Contain [personal information](#personal-data-and-information)
- Generate:
  - Hateful, abusive, or violent language
  - Discriminatory or prejudicial language
  - Content that may not be appropriate for all settings, including sexual content
- Make errors, including producing incorrect information as if it were factual
- Generate irrelevant or repetitive outputs
- Induce users into attributing human traits to it, such as sentience or consciousness
# Evaluation

*This section describes the evaluation protocols and provides the results.*

Result: LongShort-Dolly-2-7B gives 45.4% accuracy on a validation set of 10% of the original training dataset.

**Train-time Evaluation:**

Final checkpoint after 700 epochs:

- Training Loss: 1.645
# Recommendations

*This section provides information on warnings and potential mitigations.*

- Indirect users should be made aware when the content they are working with was created by the LLM.
- Users should be aware of the [Risks and Limitations](#risks-and-limitations), and include an appropriate age disclaimer or blocking interface as necessary.
- Users of the model should provide mechanisms for those affected to provide feedback, such as an email address for comments.

# Model Card Authors

Vishal Parameshwaran, Garima Sohi, Jose Gerala, Sanchit Narayan Kumar