# MLX version of Llama 3.1 8B UltraLong 1M Instruct
This model was converted to MLX format from [nvidia/Llama-3.1-8B-UltraLong-1M-Instruct](https://huggingface.co/nvidia/Llama-3.1-8B-UltraLong-1M-Instruct) using mlx-lm version 0.22.5.

Maximum context window: 1M tokens.

For more details, please refer to the UltraLong paper on arXiv.
Install mlx-lm and run the model from the command line (note that `mlx_lm.generate` takes `--temp`, not `--temperature`):

```shell
pip install -U mlx-lm

python -m mlx_lm.generate \
    --model TheCluster/Llama-3.1-8B-UltraLong-1M-Instruct-mlx-4bit \
    --max-tokens 65536 \
    --temp 0.5 \
    --prompt "Your big prompt"
```
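The same model can be used from Python via the mlx-lm API. A minimal sketch, assuming the model is available locally or downloadable from the Hub; the message content and `max_tokens` value are illustrative:

```python
from mlx_lm import load, generate

# Load the 4-bit quantized model and its tokenizer from the Hugging Face Hub
model, tokenizer = load("TheCluster/Llama-3.1-8B-UltraLong-1M-Instruct-mlx-4bit")

# Build a chat-formatted prompt (the message content here is just an example)
messages = [{"role": "user", "content": "Your big prompt"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Generate a response; max_tokens is illustrative and can go much higher
# given the 1M-token context window
text = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
print(text)
```

The chat template is applied explicitly here because the CLI does this automatically, while the Python `generate` call expects an already-formatted prompt.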