Christoph Holthaus committed
Commit · 464e9a9
1 Parent(s): 10006aa
INFO: Latest results and new focus
README.md CHANGED
@@ -13,6 +13,10 @@ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-
 
 This is a test ...
 
+LAST REVELATION: IT WORKS, but on Hugging Face it is PAINSTAKINGLY SLOW; probably the reason it isn't done yet. It runs at roughly 0.5 tok/s on the smallest quant of Mistral 7B.
+Idea: fix it with the Intel-specific https://github.com/intel/intel-extension-for-transformers and check whether it changes anything. If the integration is not trivial, first check whether there is a way to determine the CPU type inside the container (see the CPU-detection sketch below), or just make the integration trivial.
+
+
 TASKS:
 - rewrite generation from scratch, or reuse the one from the Mistral Space if possible; alternatively use https://github.com/abetlen/llama-cpp-python#chat-completion or https://huggingface.co/spaces/deepseek-ai/deepseek-coder-7b-instruct/blob/main/app.py (see the chat-completion sketch below)
 - write IN LARGE LETTERS that this is not the original model but a quantized one that can run on free CPU inference (see the banner sketch below)
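For the CPU-type question in the Idea above: a minimal sketch, assuming a Linux container (as on Hugging Face Spaces), that reads /proc/cpuinfo using only the standard library. The helper name cpu_summary is hypothetical.

```python
# Sketch: determine CPU vendor and SIMD flags inside the Space's container,
# to decide whether intel-extension-for-transformers is worth wiring in.
import platform

def cpu_summary():
    info = {"machine": platform.machine(), "vendor": None, "flags": set()}
    try:
        # /proc/cpuinfo is Linux-only, which is fine for Spaces containers.
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("vendor_id"):
                    info["vendor"] = line.split(":", 1)[1].strip()
                elif line.startswith("flags"):
                    info["flags"] = set(line.split(":", 1)[1].split())
    except FileNotFoundError:
        pass  # non-Linux host; leave vendor/flags empty
    return info

info = cpu_summary()
print("vendor:", info["vendor"])            # e.g. GenuineIntel
print("avx2:", "avx2" in info["flags"])     # baseline SIMD
print("avx512f:", "avx512f" in info["flags"])
```

If vendor_id reports GenuineIntel and AVX-512 flags are present, the Intel extension is plausibly worth trying; otherwise it likely changes nothing.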
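For the first task: a minimal sketch of llama-cpp-python's chat-completion API, as described in the linked README section. The GGUF path, thread count, and generation parameters are placeholders, not a tested configuration.

```python
# Sketch: chat completion with llama-cpp-python. The model_path below is a
# hypothetical local file for whatever quantized Mistral 7B the Space uses.
from llama_cpp import Llama

llm = Llama(
    model_path="./mistral-7b-instruct.Q2_K.gguf",  # placeholder GGUF file
    n_ctx=2048,    # context window
    n_threads=2,   # match the free-tier vCPU count
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Say hello in one short sentence."},
    ],
    max_tokens=64,
    temperature=0.7,
)
print(response["choices"][0]["message"]["content"])
```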
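For the disclaimer task: one way to put the notice IN LARGE LETTERS, assuming the Space uses a Gradio UI (an assumption; the source does not say which framework the app uses). The wording is a placeholder.

```python
# Sketch: a large disclaimer banner at the top of a hypothetical Gradio app.
import gradio as gr

DISCLAIMER = (
    "# ⚠️ THIS IS NOT THE ORIGINAL MODEL\n"
    "This Space serves a **quantized** build of Mistral 7B so it can run on "
    "free CPU inference. Expect lower quality and very slow generation."
)

with gr.Blocks() as demo:
    gr.Markdown(DISCLAIMER)  # rendered as an H1-sized banner
    # ... chat UI would go here ...

demo.launch()
```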