How did you quantize it?

#1
by win10 - opened

How did you quantize it? I'm trying to use LLM Compressor to quantize a 72B model to NVFP4, but it won't run. My setup is 128 GB RAM + an RTX PRO 6000 (96 GB VRAM).

I shared my quantization script over here:
https://huggingface.co/Firworks/MiroThinker-v1.0-30B-nvfp4/discussions/1#69269c6d40ce1d3b1a6ca1cc

It should get you close, and GPT-5.1 can tweak it if a particular model needs some customization.
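In case the linked script is unavailable: the general shape of a one-shot NVFP4 run with llm-compressor looks roughly like the sketch below. This is only an outline of the library's documented workflow, not the linked script itself; the model ID, calibration dataset, sample count, and output path are all placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "Qwen/Qwen2.5-72B-Instruct"  # placeholder: substitute your model

# The guard matters on Windows: multiprocessing workers re-import this
# file, and without it the model would be loaded once per worker.
if __name__ == "__main__":
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

    # NVFP4 quantizes weights and activations to FP4 with per-group
    # scales; calibration data is used to fit the activation scales.
    recipe = QuantizationModifier(
        targets="Linear", scheme="NVFP4", ignore=["lm_head"]
    )

    oneshot(
        model=model,
        dataset="open_platypus",        # placeholder calibration set
        recipe=recipe,
        max_seq_length=2048,
        num_calibration_samples=512,
    )

    save_dir = MODEL_ID.split("/")[-1] + "-NVFP4"  # placeholder path
    model.save_pretrained(save_dir)
    tokenizer.save_pretrained(save_dir)
```

A 72B model in bf16 is ~144 GB of weights, so with 96 GB VRAM expect the load/calibration to spill into system RAM and the pagefile.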

I tried to get it to work, but it always produces this error:
Traceback (most recent call last):
  File "&lt;string&gt;", line 1, in &lt;module&gt;
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 131, in _main
    prepare(preparation_data)
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 246, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 297, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "&lt;frozen runpy&gt;", line 291, in run_path
  File "&lt;frozen runpy&gt;", line 98, in _run_module_code
  File "&lt;frozen runpy&gt;", line 88, in _run_code
  File "F:\nvfp4_example.py", line 19, in &lt;module&gt;
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\auto\auto_factory.py", line 604, in from_pretrained
    return model_class.from_pretrained(
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\modeling_utils.py", line 277, in _wrapper
    return func(*args, **kwargs)
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\modeling_utils.py", line 5048, in from_pretrained
    ) = cls._load_pretrained_model(
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\modeling_utils.py", line 5468, in _load_pretrained_model
    _error_msgs, disk_offload_index = load_shard_file(args)
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\modeling_utils.py", line 831, in load_shard_file
    state_dict = load_state_dict(
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\modeling_utils.py", line 484, in load_state_dict
    with safe_open(checkpoint_file, framework="pt") as f:
OSError: The paging file is too small for this operation to complete. (os error 1455)
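For what it's worth, this traceback looks like the classic Windows multiprocessing-spawn failure: the frames run through `spawn.py` → `_fixup_main_from_path` → `runpy.run_path`, which means a spawned worker re-imported `nvfp4_example.py` from scratch and re-ran the top-level `from_pretrained` at line 19. Each worker then tries to load its own copy of the model until the paging file is exhausted (OS error 1455). The usual fix is to put all top-level work behind an `if __name__ == "__main__":` guard. A minimal, generic sketch of the pattern (the `square` function is illustrative, not from the actual script):

```python
import multiprocessing as mp

def square(x):
    # Runs in a child process.
    return x * x

if __name__ == "__main__":
    # On Windows, worker processes are started with "spawn", which
    # re-imports this file from scratch. Without the guard above, any
    # top-level code (e.g. AutoModelForCausalLM.from_pretrained) would
    # run again inside every worker, loading extra copies of the model.
    with mp.Pool(processes=2) as pool:
        print(pool.map(square, range(4)))  # prints [0, 1, 4, 9]
```

Separately, error 1455 can also mean the pagefile is genuinely too small for the spill from RAM during load; letting Windows manage the pagefile size (or enlarging it on a drive with free space) may still be needed for a 72B model.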

win10 changed discussion status to closed
