Flash Attention Required When Running Model
Hey, I recently found this model and think it is really cool.
However, it seems that I need Flash Attention to run it.
Could you guys make a version that does not require Flash Attention (maybe using SDPA)?
Or at least supply some way of running the model without Flash Attention installed?
Refer to this: https://github.com/rednote-hilab/dots.ocr/issues/1#issuecomment-3148962536
You can run on CPU or GPU using SDPA with HF inference. (vLLM may run out of memory.)
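For anyone landing here, this is a rough sketch of what the SDPA route can look like with plain Hugging Face Transformers. It assumes the model's code honors the standard `attn_implementation` argument of `from_pretrained`; the linked comment may describe extra steps for this particular model. `pick_attn_implementation`, `load_model`, and the `model_id` argument are made-up names for illustration, not part of the model's repo.

```python
import importlib.util


def pick_attn_implementation(device, flash_available=None):
    """Choose an attention backend name for Transformers.

    Falls back to "sdpa" on CPU or when the flash_attn package is missing.
    """
    if flash_available is None:
        # Only checks that the package can be found; does not import it.
        flash_available = importlib.util.find_spec("flash_attn") is not None
    if device != "cpu" and flash_available:
        return "flash_attention_2"
    return "sdpa"


def load_model(model_id, device="cpu"):
    """Hypothetical loader: model_id is whatever repo or local path you use."""
    # Imported lazily so the helper above works without torch/transformers.
    import torch
    from transformers import AutoModelForCausalLM

    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        attn_implementation=pick_attn_implementation(device),
        # Assumption: bf16 on GPU, fp32 on CPU; adjust for your hardware.
        torch_dtype=torch.float32 if device == "cpu" else torch.bfloat16,
        trust_remote_code=True,
    )
    return model.to(device)
```

Calling `load_model("path/to/model", device="cpu")` would then load with SDPA, since the helper never picks Flash Attention on CPU. If the model's remote code imports `flash_attn` unconditionally, this alone will not be enough.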
I had a lot of headaches trying to install Flash Attention, so I used the prebuilt wheels from https://github.com/mjun0812/flash-attention-prebuild-wheels