How did you quantize it?

#1
by win10 - opened

How did you quantize it? I'm trying to use LLM Compressor to quantize a 72B model to NVFP4, but it won't run. My setup is 128 GB RAM + an RTX PRO 6000 (96 GB VRAM).

I shared my quantization script over here:
https://huggingface.co/Firworks/MiroThinker-v1.0-30B-nvfp4/discussions/1#69269c6d40ce1d3b1a6ca1cc

It should get you close, and GPT-5.1 can tweak it if a particular model needs some customization.
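In case the linked script is unavailable: the general shape of a one-shot NVFP4 run with llm-compressor looks roughly like the sketch below. This is only an outline of the library's documented workflow, not the linked script itself; the model ID, calibration dataset, sample count, and output path are all placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "Qwen/Qwen2.5-72B-Instruct"  # placeholder: substitute your model

# The guard matters on Windows: multiprocessing workers re-import this
# file, and without it the model would be loaded once per worker.
if __name__ == "__main__":
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

    # NVFP4 quantizes weights and activations to FP4 with per-group
    # scales; calibration data is used to fit the activation scales.
    recipe = QuantizationModifier(
        targets="Linear", scheme="NVFP4", ignore=["lm_head"]
    )

    oneshot(
        model=model,
        dataset="open_platypus",        # placeholder calibration set
        recipe=recipe,
        max_seq_length=2048,
        num_calibration_samples=512,
    )

    save_dir = MODEL_ID.split("/")[-1] + "-NVFP4"  # placeholder path
    model.save_pretrained(save_dir)
    tokenizer.save_pretrained(save_dir)
```

A 72B model in bf16 is ~144 GB of weights, so with 96 GB VRAM expect the load/calibration to spill into system RAM and the pagefile.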

I tried to get it to work, but it always produces this error:
Traceback (most recent call last):
  File "&lt;string&gt;", line 1, in &lt;module&gt;
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 131, in _main
    prepare(preparation_data)
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 246, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\multiprocessing\spawn.py", line 297, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "&lt;frozen runpy&gt;", line 291, in run_path
  File "&lt;frozen runpy&gt;", line 98, in _run_module_code
  File "&lt;frozen runpy&gt;", line 88, in _run_code
  File "F:\nvfp4_example.py", line 19, in &lt;module&gt;
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto")
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\models\auto\auto_factory.py", line 604, in from_pretrained
    return model_class.from_pretrained(
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\modeling_utils.py", line 277, in _wrapper
    return func(*args, **kwargs)
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\modeling_utils.py", line 5048, in from_pretrained
    ) = cls._load_pretrained_model(
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\modeling_utils.py", line 5468, in _load_pretrained_model
    _error_msgs, disk_offload_index = load_shard_file(args)
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\modeling_utils.py", line 831, in load_shard_file
    state_dict = load_state_dict(
  File "C:\Users\jmes1\AppData\Local\Programs\Python\Python311\Lib\site-packages\transformers\modeling_utils.py", line 484, in load_state_dict
    with safe_open(checkpoint_file, framework="pt") as f:
OSError: The paging file is too small for this operation to complete. (os error 1455)
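For what it's worth, this traceback looks like the classic Windows multiprocessing-spawn failure: the frames run through `spawn.py` → `_fixup_main_from_path` → `runpy.run_path`, which means a spawned worker re-imported `nvfp4_example.py` from scratch and re-ran the top-level `from_pretrained` at line 19. Each worker then tries to load its own copy of the model until the paging file is exhausted (OS error 1455). The usual fix is to put all top-level work behind an `if __name__ == "__main__":` guard. A minimal, generic sketch of the pattern (the `square` function is illustrative, not from the actual script):

```python
import multiprocessing as mp

def square(x):
    # Runs in a child process.
    return x * x

if __name__ == "__main__":
    # On Windows, worker processes are started with "spawn", which
    # re-imports this file from scratch. Without the guard above, any
    # top-level code (e.g. AutoModelForCausalLM.from_pretrained) would
    # run again inside every worker, loading extra copies of the model.
    with mp.Pool(processes=2) as pool:
        print(pool.map(square, range(4)))  # prints [0, 1, 4, 9]
```

Separately, error 1455 can also mean the pagefile is genuinely too small for the spill from RAM during load; letting Windows manage the pagefile size (or enlarging it on a drive with free space) may still be needed for a 72B model.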

win10 changed discussion status to closed
