Upload README.md with huggingface_hub
README.md
CHANGED
@@ -14,13 +14,15 @@ pipeline_tag: feature-extraction

 # NILC Portuguese Word Embeddings — Wang2Vec CBOW 300d

-
-
+This repository contains the **Wang2Vec CBOW 300d** model in **safetensors** format.
+
+## About
+
+NILC-Embeddings is a repository for storing and sharing **word embeddings** for the Portuguese language. The goal is to provide ready-to-use vector resources for **Natural Language Processing (NLP)** and **Machine Learning** tasks.
+
+The embeddings were trained on a large Portuguese corpus (Brazilian + European), composed of 17 corpora (~1.39B tokens). Training was carried out with the following algorithms: **Word2Vec**, **FastText**, **Wang2Vec**, and **GloVe**.

-The embeddings were trained on a large Portuguese corpus (Brazilian + European), composed of 17 corpora (~1.39B tokens).
-Training was carried out with the following algorithms: **Word2Vec** [1], **FastText** [2], **Wang2Vec** [3], and **GloVe** [4].

-This repository contains the **Wang2Vec CBOW 300d** model in **safetensors** format.

 ---

@@ -95,12 +97,18 @@ Hartmann, N. et al. (2017), STIL 2017.

 ### BibTeX
 ```bibtex
-@inproceedings{
-title={
-author={Hartmann, Nathan
-
-
-}
+@inproceedings{hartmann-etal-2017-portuguese,
+title = {{P}ortuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks},
+author = {Hartmann, Nathan and Fonseca, Erick and Shulby, Christopher and Treviso, Marcos and Silva, J{\'e}ssica and Alu{\'i}sio, Sandra},
+year = 2017,
+month = oct,
+booktitle = {Proceedings of the 11th {B}razilian Symposium in Information and Human Language Technology},
+publisher = {Sociedade Brasileira de Computa{\c{c}}{\~a}o},
+address = {Uberl{\^a}ndia, Brazil},
+pages = {122--131},
+url = {https://aclanthology.org/W17-6615/},
+editor = {Paetzold, Gustavo Henrique and Pinheiro, Vl{\'a}dia}
+}
 ```
 
 ---