Not compatible with the newest LM Studio, e.g. llama.cpp engine 1.52

#20
by kalle07 - opened

Not compatible with the newest LM Studio and its llama.cpp engine 1.52.

Downloaded today:
bartowski/swiss-ai_Apertus-8B-Instruct-2509-GGUF
A bit of a shame; why did you have to create a new kind of architecture, "apertus"?

2025-10-05 11:23:17 [DEBUG]
 [LM Studio] GPU Configuration:
  Strategy: evenly
  Priority: []
  Disabled GPUs: []
  Limit weight offload to dedicated GPU Memory: ON
  Offload KV Cache to GPU: ON
2025-10-05 11:23:17 [DEBUG]
 [LM Studio] Live GPU memory info (source 'LMS Core'):
  GPU 0: NVIDIA GeForce RTX 4060 Ti (Used: 774.70 MB, Total: 17.18 GB, Free: 16.40 GB)
2025-10-05 11:23:17 [DEBUG]
 [LM Studio] Model load size estimate with raw num offload layers 'max' and context length '8096':
  Model: 8.82 GB
  Context: 1.67 GB
  Total: 10.49 GB
2025-10-05 11:23:17 [DEBUG]
 [LM Studio] Resolved GPU config options:
  Num Offload Layers: max
  Num CPU Expert Layers: 0
  Main GPU: 0
  Tensor Split: [0]
  Disabled GPUs: []
2025-10-05 11:23:17 [DEBUG]
 ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
2025-10-05 11:23:17 [DEBUG]
 CUDA : ARCHS = 750,800,890,900,1000,1200 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
2025-10-05 11:23:17 [DEBUG]
 llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4060 Ti) (0000:01:00.0) - 15225 MiB free
2025-10-05 11:23:17 [DEBUG]
 llama_model_loader: loaded meta data with 48 key-value pairs and 324 tensors from F:\...\swiss-ai_Apertus-8B-Instruct-2509-Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = apertus
2025-10-05 11:23:17 [DEBUG]
 llama_model_loader: - kv   1:                              xielu.alpha_n arr[f32,32]      = [40.750000, 31.625000, 22.875000, 16....
llama_model_loader: - kv   2:                              xielu.alpha_p arr[f32,32]      = [166.000000, 174.000000, 128.000000, ...
llama_model_loader: - kv   3:                                 xielu.beta arr[f32,32]      = [0.500000, 0.500000, 0.500000, 0.5000...
llama_model_loader: - kv   4:                                  xielu.eps arr[f32,32]      = [-0.000001, -0.000001, -0.000001, -0....
llama_model_loader: - kv   5:                               general.type str              = model
llama_model_loader: - kv   6:                               general.name str              = Apertus 8B Instruct 2509
llama_model_loader: - kv   7:                            general.version str              = 2509
llama_model_loader: - kv   8:                           general.finetune str              = Instruct
llama_model_loader: - kv   9:                           general.basename str              = Apertus
llama_model_loader: - kv  10:                         general.size_label str              = 8B
llama_model_loader: - kv  11:                            general.license str              = apache-2.0
llama_model_loader: - kv  12:                   general.base_model.count u32              = 1
llama_model_loader: - kv  13:                  general.base_model.0.name str              = Apertus 8B 2509
llama_model_loader: - kv  14:               general.base_model.0.version str              = 2509
llama_model_loader: - kv  15:          general.base_model.0.organization str              = Swiss Ai
llama_model_loader: - kv  16:              general.base_model.0.repo_url str              = https://huggingface.co/swiss-ai/Apert...
llama_model_loader: - kv  17:                               general.tags arr[str,5]       = ["multilingual", "compliant", "swiss-...
llama_model_loader: - kv  18:                        apertus.block_count u32              = 32
llama_model_loader: - kv  19:                     apertus.context_length u32              = 65536
llama_model_loader: - kv  20:                   apertus.embedding_length u32              = 4096
llama_model_loader: - kv  21:                apertus.feed_forward_length u32              = 21504
llama_model_loader: - kv  22:               apertus.attention.head_count u32              = 32
llama_model_loader: - kv  23:            apertus.attention.head_count_kv u32              = 8
llama_model_loader: - kv  24:                     apertus.rope.freq_base f32              = 12000000.000000
llama_model_loader: - kv  25:   apertus.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  26:                         apertus.vocab_size u32              = 131072
llama_model_loader: - kv  27:               apertus.rope.dimension_count u32              = 128
llama_model_loader: - kv  28:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  29:                         tokenizer.ggml.pre str              = tekken
2025-10-05 11:23:17 [DEBUG]
 llama_model_loader: - kv  30:                      tokenizer.ggml.tokens arr[str,131072]  = ["<unk>", "<s>", "</s>", "<pad>", "[/...
2025-10-05 11:23:17 [DEBUG]
 llama_model_loader: - kv  31:                  tokenizer.ggml.token_type arr[i32,131072]  = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
2025-10-05 11:23:17 [DEBUG]
 llama_model_loader: - kv  32:                      tokenizer.ggml.merges arr[str,269443]  = ["Ġ Ġ", "Ġ t", "e r", "i n", "Ġ �...
llama_model_loader: - kv  33:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  34:                tokenizer.ggml.eos_token_id u32              = 68
llama_model_loader: - kv  35:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  36:            tokenizer.ggml.padding_token_id u32              = 3
llama_model_loader: - kv  37:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  38:               tokenizer.ggml.add_sep_token bool             = false
llama_model_loader: - kv  39:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  40:                    tokenizer.chat_template str              = {%- macro render_typescript_type(para...
llama_model_loader: - kv  41:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  42:               general.quantization_version u32              = 2
llama_model_loader: - kv  43:                          general.file_type u32              = 7
llama_model_loader: - kv  44:                      quantize.imatrix.file str              = /models_out/Apertus-8B-Instruct-2509-...
llama_model_loader: - kv  45:                   quantize.imatrix.dataset str              = /training_dir/calibration_datav5.txt
llama_model_loader: - kv  46:             quantize.imatrix.entries_count u32              = 192
llama_model_loader: - kv  47:              quantize.imatrix.chunks_count u32              = 822
llama_model_loader: - type  f32:  130 tensors
llama_model_loader: - type q8_0:  194 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q8_0
print_info: file size   = 7.97 GiB (8.50 BPW)
2025-10-05 11:23:17 [DEBUG]
 llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'apertus'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'F:\...\swiss-ai_Apertus-8B-Instruct-2509-Q8_0.gguf', try reducing --n-gpu-layers if you're running out of VRAM
2025-10-05 11:23:17 [DEBUG]
 lmstudio-llama-cpp: failed to load model. Error: error loading model: error loading model architecture: unknown model architecture: 'apertus'
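The failing check is easy to see in the GGUF file itself: the architecture name is stored as a plain string under the metadata key `general.architecture` (visible as kv 0 in the dump above), and a loader that doesn't know the name rejects the model. As a minimal sketch (not llama.cpp's actual parser; the function name `read_gguf_architecture` is just illustrative, and it only walks the leading string-typed KV pairs), you can read that key directly:

```python
import struct

GGUF_MAGIC = b"GGUF"
GGUF_TYPE_STRING = 8  # string value type per the GGUF spec

def read_gguf_architecture(path):
    """Return the general.architecture string from a GGUF file's header.

    Minimal sketch: assumes GGUF v3 header layout (magic, u32 version,
    u64 tensor count, u64 KV count) and stops at the first non-string
    value instead of skipping it like a full parser would."""
    with open(path, "rb") as f:
        if f.read(4) != GGUF_MAGIC:
            raise ValueError("not a GGUF file")
        version, = struct.unpack("<I", f.read(4))
        n_tensors, n_kv = struct.unpack("<QQ", f.read(16))
        for _ in range(n_kv):
            key_len, = struct.unpack("<Q", f.read(8))
            key = f.read(key_len).decode("utf-8")
            vtype, = struct.unpack("<I", f.read(4))
            if vtype != GGUF_TYPE_STRING:
                break  # sketch handles only string-typed leading KVs
            val_len, = struct.unpack("<Q", f.read(8))
            val = f.read(val_len).decode("utf-8")
            if key == "general.architecture":
                return val
    return None
```

Run against the quant above, this would return "apertus", which is exactly the name the bundled llama.cpp build doesn't recognize.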
Swiss AI Initiative org

You need a newer version of llama.cpp; it has to be release b6686 or newer. LM Studio might first have to update its llama.cpp engine (v1.52.0 is still an old one, based on llama.cpp release b6651).

+1. Bartowski's and other GGUF editions work fine for me in LM Studio v0.3.30. See the screenshot and a bit more info in my blog post.
