# MLX version of Llama 3.1 8B UltraLong 1M Instruct
This model was converted to MLX format from [nvidia/Llama-3.1-8B-UltraLong-1M-Instruct](https://huggingface.co/nvidia/Llama-3.1-8B-UltraLong-1M-Instruct) using mlx-lm version 0.22.5.

Maximum context window: 1M tokens.

For more details, please refer to the UltraLong paper on arXiv.
Install mlx-lm and run the model from the command line (note that `mlx_lm.generate` takes `--temp`, not `--temperature`):

```shell
pip install -U mlx-lm

python -m mlx_lm.generate \
    --model TheCluster/Llama-3.1-8B-UltraLong-1M-Instruct-mlx-4bit \
    --max-tokens 65536 \
    --temp 0.5 \
    --prompt "Your big prompt"
```
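The same model can be used from Python via the mlx-lm API. A minimal sketch, assuming the model is available locally or downloadable from the Hub; the message content and `max_tokens` value are illustrative:

```python
from mlx_lm import load, generate

# Load the 4-bit quantized model and its tokenizer from the Hugging Face Hub
model, tokenizer = load("TheCluster/Llama-3.1-8B-UltraLong-1M-Instruct-mlx-4bit")

# Build a chat-formatted prompt (the message content here is just an example)
messages = [{"role": "user", "content": "Your big prompt"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Generate a response; max_tokens is illustrative and can go much higher
# given the 1M-token context window
text = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)
print(text)
```

The chat template is applied explicitly here because the CLI does this automatically, while the Python `generate` call expects an already-formatted prompt.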