Update README.md
README.md (CHANGED)

@@ -43,7 +43,7 @@ InfLLM-V2-Long-Sparse-Base supports both dense attention inference and sparse attention inference
 - Dense attention inference: vLLM, SGLang, Huggingface Transformers
 - Sparse attention inference: Huggingface Transformers, CPM.cu

-**To facilitate researches in sparse attention, we provide [InfLLM-V2 Kernels](https://github.com/OpenBMB/infllmv2_cuda_impl) and [CPM.cu
+**To facilitate research in sparse attention, we provide [InfLLM-V2 Kernels](https://github.com/OpenBMB/infllmv2_cuda_impl) and [CPM.cu](https://github.com/OpenBMB/CPM.cu.git).**

 ### Inference with Transformers
 InfLLM-V2-Long-Sparse-Base requires `transformers>=4.56`.
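The hunk ends at the version requirement without showing usage. As a minimal, hedged sketch of dense-attention inference through Transformers: the hub id `openbmb/InfLLM-V2-Long-Sparse-Base` and the `trust_remote_code=True` flag are assumptions for illustration, not details confirmed by this diff.

```python
# Minimal sketch: dense-attention inference via Hugging Face Transformers.
# ASSUMPTIONS (not confirmed by the README diff): the checkpoint id below and
# the need for trust_remote_code=True are illustrative placeholders.
from packaging.version import Version

import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

# The README requires transformers>=4.56.
assert Version(transformers.__version__) >= Version("4.56"), \
    "InfLLM-V2-Long-Sparse-Base requires transformers>=4.56"

model_id = "openbmb/InfLLM-V2-Long-Sparse-Base"  # hypothetical hub id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place weights on available GPU(s)/CPU
    trust_remote_code=True,
)

inputs = tokenizer(
    "InfLLM-V2 speeds up long-context inference by",
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Sparse-attention inference goes through the InfLLM-V2 Kernels and CPM.cu projects linked above; their APIs are not reproduced here.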