Update README.md
README.md (CHANGED)

@@ -43,7 +43,7 @@ InfLLM-V2-Long-Sparse-Base supports both dense attention inference and sparse attention inference
 - Dense attention inference: vLLM, SGLang, Huggingface Transformers
 - Sparse attention inference: Huggingface Transformers, CPM.cu

-**To facilitate researches in sparse attention, we provide [InfLLM-V2 Kernels](https://github.com/OpenBMB/infllmv2_cuda_impl) and [CPM.cu
+**To facilitate research in sparse attention, we provide [InfLLM-V2 Kernels](https://github.com/OpenBMB/infllmv2_cuda_impl) and [CPM.cu](https://github.com/OpenBMB/CPM.cu.git).**

 ### Inference with Transformers
 InfLLM-V2-Long-Sparse-Base requires `transformers>=4.56`.
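The hunk ends at the version requirement without showing usage. As a minimal, hedged sketch of dense-attention inference through Transformers: the hub id `openbmb/InfLLM-V2-Long-Sparse-Base` and the `trust_remote_code=True` flag are assumptions for illustration, not details confirmed by this diff.

```python
# Minimal sketch: dense-attention inference via Hugging Face Transformers.
# ASSUMPTIONS (not confirmed by the README diff): the checkpoint id below and
# the need for trust_remote_code=True are illustrative placeholders.
from packaging.version import Version

import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

# The README requires transformers>=4.56.
assert Version(transformers.__version__) >= Version("4.56"), \
    "InfLLM-V2-Long-Sparse-Base requires transformers>=4.56"

model_id = "openbmb/InfLLM-V2-Long-Sparse-Base"  # hypothetical hub id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place weights on available GPU(s)/CPU
    trust_remote_code=True,
)

inputs = tokenizer(
    "InfLLM-V2 speeds up long-context inference by",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Sparse-attention inference goes through the InfLLM-V2 Kernels and CPM.cu projects linked above; their APIs are not reproduced here.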