Not compatible with newest LM Studio / llama.cpp 1.52
#20 · opened by kalle07
Not compatible with the newest LM Studio and llama.cpp 1.52.

Downloaded today:
bartowski/swiss-ai_Apertus-8B-Instruct-2509-GGUF

A bit of a shame; why did you have to create a whole new architecture, "apertus"?
2025-10-05 11:23:17 [DEBUG]
 [LM Studio] GPU Configuration:
  Strategy: evenly
  Priority: []
  Disabled GPUs: []
  Limit weight offload to dedicated GPU Memory: ON
  Offload KV Cache to GPU: ON
2025-10-05 11:23:17 [DEBUG]
 [LM Studio] Live GPU memory info (source 'LMS Core'):
  GPU 0: NVIDIA GeForce RTX 4060 Ti (Used: 774.70 MB, Total: 17.18 GB, Free: 16.40 GB)
2025-10-05 11:23:17 [DEBUG]
 [LM Studio] Model load size estimate with raw num offload layers 'max' and context length '8096':
  Model: 8.82 GB
  Context: 1.67 GB
  Total: 10.49 GB
2025-10-05 11:23:17 [DEBUG]
 [LM Studio] Resolved GPU config options:
  Num Offload Layers: max
  Num CPU Expert Layers: 0
  Main GPU: 0
  Tensor Split: [0]
  Disabled GPUs: []
2025-10-05 11:23:17 [DEBUG]
 ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
  Device 0: NVIDIA GeForce RTX 4060 Ti, compute capability 8.9, VMM: yes
2025-10-05 11:23:17 [DEBUG]
 CUDA : ARCHS = 750,800,890,900,1000,1200 | USE_GRAPHS = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
2025-10-05 11:23:17 [DEBUG]
 llama_model_load_from_file_impl: using device CUDA0 (NVIDIA GeForce RTX 4060 Ti) (0000:01:00.0) - 15225 MiB free
2025-10-05 11:23:17 [DEBUG]
 llama_model_loader: loaded meta data with 48 key-value pairs and 324 tensors from F:\...\swiss-ai_Apertus-8B-Instruct-2509-Q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = apertus
2025-10-05 11:23:17 [DEBUG]
 llama_model_loader: - kv   1:                              xielu.alpha_n arr[f32,32]      = [40.750000, 31.625000, 22.875000, 16....
llama_model_loader: - kv   2:                              xielu.alpha_p arr[f32,32]      = [166.000000, 174.000000, 128.000000, ...
llama_model_loader: - kv   3:                                 xielu.beta arr[f32,32]      = [0.500000, 0.500000, 0.500000, 0.5000...
llama_model_loader: - kv   4:                                  xielu.eps arr[f32,32]      = [-0.000001, -0.000001, -0.000001, -0....
llama_model_loader: - kv   5:                               general.type str              = model
llama_model_loader: - kv   6:                               general.name str              = Apertus 8B Instruct 2509
llama_model_loader: - kv   7:                            general.version str              = 2509
llama_model_loader: - kv   8:                           general.finetune str              = Instruct
llama_model_loader: - kv   9:                           general.basename str              = Apertus
llama_model_loader: - kv  10:                         general.size_label str              = 8B
llama_model_loader: - kv  11:                            general.license str              = apache-2.0
llama_model_loader: - kv  12:                   general.base_model.count u32              = 1
llama_model_loader: - kv  13:                  general.base_model.0.name str              = Apertus 8B 2509
llama_model_loader: - kv  14:               general.base_model.0.version str              = 2509
llama_model_loader: - kv  15:          general.base_model.0.organization str              = Swiss Ai
llama_model_loader: - kv  16:              general.base_model.0.repo_url str              = https://huggingface.co/swiss-ai/Apert...
llama_model_loader: - kv  17:                               general.tags arr[str,5]       = ["multilingual", "compliant", "swiss-...
llama_model_loader: - kv  18:                        apertus.block_count u32              = 32
llama_model_loader: - kv  19:                     apertus.context_length u32              = 65536
llama_model_loader: - kv  20:                   apertus.embedding_length u32              = 4096
llama_model_loader: - kv  21:                apertus.feed_forward_length u32              = 21504
llama_model_loader: - kv  22:               apertus.attention.head_count u32              = 32
llama_model_loader: - kv  23:            apertus.attention.head_count_kv u32              = 8
llama_model_loader: - kv  24:                     apertus.rope.freq_base f32              = 12000000.000000
llama_model_loader: - kv  25:   apertus.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  26:                         apertus.vocab_size u32              = 131072
llama_model_loader: - kv  27:               apertus.rope.dimension_count u32              = 128
llama_model_loader: - kv  28:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  29:                         tokenizer.ggml.pre str              = tekken
2025-10-05 11:23:17 [DEBUG]
 llama_model_loader: - kv  30:                      tokenizer.ggml.tokens arr[str,131072]  = ["<unk>", "<s>", "</s>", "<pad>", "[/...
2025-10-05 11:23:17 [DEBUG]
 llama_model_loader: - kv  31:                  tokenizer.ggml.token_type arr[i32,131072]  = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
2025-10-05 11:23:17 [DEBUG]
 llama_model_loader: - kv  32:                      tokenizer.ggml.merges arr[str,269443]  = ["Ġ Ġ", "Ġ t", "e r", "i n", "Ġ �...
llama_model_loader: - kv  33:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  34:                tokenizer.ggml.eos_token_id u32              = 68
llama_model_loader: - kv  35:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  36:            tokenizer.ggml.padding_token_id u32              = 3
llama_model_loader: - kv  37:               tokenizer.ggml.add_bos_token bool             = true
llama_model_loader: - kv  38:               tokenizer.ggml.add_sep_token bool             = false
llama_model_loader: - kv  39:               tokenizer.ggml.add_eos_token bool             = false
llama_model_loader: - kv  40:                    tokenizer.chat_template str              = {%- macro render_typescript_type(para...
llama_model_loader: - kv  41:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  42:               general.quantization_version u32              = 2
llama_model_loader: - kv  43:                          general.file_type u32              = 7
llama_model_loader: - kv  44:                      quantize.imatrix.file str              = /models_out/Apertus-8B-Instruct-2509-...
llama_model_loader: - kv  45:                   quantize.imatrix.dataset str              = /training_dir/calibration_datav5.txt
llama_model_loader: - kv  46:             quantize.imatrix.entries_count u32              = 192
llama_model_loader: - kv  47:              quantize.imatrix.chunks_count u32              = 822
llama_model_loader: - type  f32:  130 tensors
llama_model_loader: - type q8_0:  194 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q8_0
print_info: file size   = 7.97 GiB (8.50 BPW)
2025-10-05 11:23:17 [DEBUG]
 llama_model_load: error loading model: error loading model architecture: unknown model architecture: 'apertus'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model 'F:\...\swiss-ai_Apertus-8B-Instruct-2509-Q8_0.gguf', try reducing --n-gpu-layers if you're running out of VRAM
2025-10-05 11:23:17 [DEBUG]
 lmstudio-llama-cpp: failed to load model. Error: error loading model: error loading model architecture: unknown model architecture: 'apertus'
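The failing key is visible in the dump above: `general.architecture str = apertus`, which the bundled llama.cpp build doesn't recognize yet. If you want to check a GGUF's architecture before trying to load it, here's a rough sketch of reading that key straight from the file header. It's based on the GGUF v3 layout (magic, version, tensor count, KV count, then key/value pairs) and only implements enough value-type handling to skip past other metadata; the function name is mine, not from any library.

```python
import struct

# Minimal GGUF metadata reader: pull general.architecture out of the
# key/value header of a .gguf file. Sketch only; handles the scalar,
# string, and array value types from the GGUF v3 spec.

# Byte sizes for fixed-width value types (uint8..bool, uint64..float64).
_SCALAR_SIZES = {0: 1, 1: 1, 2: 2, 3: 2, 4: 4, 5: 4, 6: 4, 7: 1,
                 10: 8, 11: 8, 12: 8}
_STRING, _ARRAY = 8, 9

def _read_string(f):
    # GGUF strings: uint64 length, then UTF-8 bytes.
    (n,) = struct.unpack("<Q", f.read(8))
    return f.read(n).decode("utf-8")

def _skip_value(f, vtype):
    # Advance past one value of the given type without keeping it.
    if vtype in _SCALAR_SIZES:
        f.read(_SCALAR_SIZES[vtype])
    elif vtype == _STRING:
        _read_string(f)
    elif vtype == _ARRAY:
        # Arrays: uint32 element type, uint64 count, then the elements.
        (etype,) = struct.unpack("<I", f.read(4))
        (count,) = struct.unpack("<Q", f.read(8))
        for _ in range(count):
            _skip_value(f, etype)
    else:
        raise ValueError(f"unknown GGUF value type {vtype}")

def gguf_architecture(path):
    """Return the general.architecture string, or None if absent."""
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        struct.unpack("<I", f.read(4))   # version (unused here)
        f.read(8)                        # tensor count (unused here)
        (n_kv,) = struct.unpack("<Q", f.read(8))
        for _ in range(n_kv):
            key = _read_string(f)
            (vtype,) = struct.unpack("<I", f.read(4))
            if key == "general.architecture" and vtype == _STRING:
                return _read_string(f)
            _skip_value(f, vtype)
    return None
```

If this prints `apertus`, the model won't load until LM Studio ships a llama.cpp runtime that includes Apertus support; no amount of lowering `--n-gpu-layers` will help, despite what the error hint suggests.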
+1. Bartowski's and other GGUF editions work fine for me in LM Studio v0.3.30. See the screenshot and a bit more info in my blog post.

