rednote-hilab/dots.ocr · Flash Attention Required When Running Model


#13

by Chillarmo - opened Aug 8


Chillarmo

Aug 8


edited Aug 8

Hey, I recently found this model and think it is really cool.

However, it seems that I need Flash Attention to run it.

Could you guys make a version that does not require Flash Attention (maybe using SDPA)?

Or at least supply some way of running the model without Flash Attention installed?

redmoe-ai-v1

rednote-hilab org Aug 9


Refer to this: https://github.com/rednote-hilab/dots.ocr/issues/1#issuecomment-3148962536
You can run the model on CPU or GPU with SDPA using HF inference. (vLLM may run out of memory.)
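Following the linked comment, the switch happens where the model is loaded: Transformers takes an `attn_implementation` argument, and `"sdpa"` uses PyTorch's built-in scaled dot-product attention instead of `"flash_attention_2"`. Below is a minimal sketch, assuming the dots.ocr remote code honors this argument as the comment suggests. The helper names, the dtype choice and the local `model_path` are hypothetical, not the repository's own inference script.

```python
import importlib.util


def pick_attn_implementation(device: str) -> str:
    """Return "flash_attention_2" only when it can be used, otherwise "sdpa"."""
    # FlashAttention-2 needs the flash_attn package and a CUDA device.
    has_flash_attn = importlib.util.find_spec("flash_attn") is not None
    if device.startswith("cuda") and has_flash_attn:
        return "flash_attention_2"
    # SDPA ships with PyTorch, so no extra package is required.
    return "sdpa"


def load_dots_ocr(model_path: str, device: str = "cpu"):
    """Hypothetical loader; model_path is a local copy of the dots.ocr weights."""
    # Imported lazily so this module can be imported without torch/transformers.
    import torch
    from transformers import AutoModelForCausalLM, AutoProcessor

    # Assumption: bfloat16 on GPU, float32 on CPU (bf16 speed on CPU varies).
    dtype = torch.bfloat16 if device.startswith("cuda") else torch.float32
    model = AutoModelForCausalLM.from_pretrained(
        model_path,
        attn_implementation=pick_attn_implementation(device),
        torch_dtype=dtype,
        trust_remote_code=True,  # the model ships custom modeling code
    ).to(device)
    processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)
    return model, processor
```

For example, `load_dots_ocr("./weights/DotsOCR", device="cpu")` (a hypothetical local path) would load the model with SDPA even when `flash_attn` is not installed.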

sayed0am

28 days ago

I had a lot of headaches trying to install Flash Attention, so I used the prebuilt wheels from https://github.com/mjun0812/flash-attention-prebuild-wheels
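Prebuilt flash-attn wheels are compiled against specific Python, PyTorch and CUDA versions (and, on Linux, the C++11 ABI setting), so a wheel only installs and imports cleanly when these match the local environment. A small sketch, assuming PyTorch is already installed, that collects those values for comparison against the wheel filenames in the linked repository; the function name is hypothetical.

```python
import platform
import sys


def wheel_match_info() -> dict:
    """Collect the versions a prebuilt flash-attn wheel has to match."""
    info = {
        # CPython tag as used in wheel filenames, e.g. "cp311".
        "python": f"cp{sys.version_info.major}{sys.version_info.minor}",
        "platform": f"{platform.system()}-{platform.machine()}",
        "torch": None,
        "cuda": None,
        "cxx11_abi": None,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is missing; install it first, then pick a matching wheel.
        return info
    info["torch"] = torch.__version__
    info["cuda"] = torch.version.cuda  # None on CPU-only PyTorch builds
    info["cxx11_abi"] = torch.compiled_with_cxx11_abi()
    return info
```

Running `print(wheel_match_info())` before downloading a wheel shows which Python tag, PyTorch version and CUDA version to look for.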
