Nice version (thank you!) with some hiccups! :)

#1
by dehnhaide - opened

Here you go. From my testing the performance is quite good, and it is easier to run for VRAM-poor folks compared to GLM: https://huggingface.co/remichu/MiniMax-M2.1-exl3
Many thanks remichu, but I've run into a problem. While trying to load the model in TabbyAPI, I get:
"NotImplementedError: Tensor-parallel is not currently implemented for MiniMaxM2ForCausalLM"

I do use "tensor_parallel: true" in my setup (1x4090 + 4x3090). Is there something I'm missing? Thanks for the help!
Oh, I've managed to get it working with "tensor_parallel: false"... however the output contains a lot of Chinese characters, much more than with the GGUF version. Could that be something about the quantization?
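For reference, the workaround above corresponds to a TabbyAPI config entry along these lines. This is a minimal sketch, not a complete config; the `model_name` value and the GPU split keys are assumptions based on typical TabbyAPI setups and may differ by version:

```yaml
# config.yml (TabbyAPI) — minimal sketch, other required fields omitted
model:
  model_name: MiniMax-M2.1-exl3   # assumed local model directory name
  # EXL3 does not implement tensor parallelism for MiniMaxM2ForCausalLM,
  # so this must stay false; the model is instead layer-split across GPUs.
  tensor_parallel: false
  gpu_split_auto: true            # assumed key: let the loader split layers across the 5 GPUs
```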


Tensor parallel is not supported for this model by EXL3. Regarding the Chinese characters, can you share a sample prompt and settings so I can try to reproduce?
