AROS ABIv1 SMP.
Last updated 12 hours ago
terminills (Member)
Posted 17 days ago
Here's a screenshot of AROS ABIv1 SMP running on a 128-core server (currently in QEMU). I've been sending patches over to kalamatee to review, and in time I expect AROS to be bootable on high-end servers. Does it need to be? Absolutely not, but it will help with stability in the long term.
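For reference, the 128-core guest is presumably just QEMU's -smp switch; something along these lines sets up a similar VM (the ISO name and memory size here are guesses, not the exact command used):

qemu-system-x86_64 -enable-kvm -M q35 -smp 128 -m 8192 -cdrom aros-pc-x86_64.iso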
4 users reacted to this post
aha, retrofaza, deadwood, Argo
Jeff1138 (Member)
Posted 17 days ago
Hi,
Interesting, over half the cores are at 100%, can you tell us what is running?
terminills (Member)
Posted 17 days ago
@Jeff1138 - Hi,
Interesting, over half the cores are at 100%, can you tell us what is running?
I'm working on this.
000000f4e35723e3 0x000000000107ca80 | 000 | [LlamaCpp] Fatal signal handlers installed
000000f5381cbe3d 0x000000000107ca80 | 000 | [LlamaCpp][SMP] detected_cores=128 ggml_max_threads=512
000000f538f0e258 0x000000000107ca80 | 000 | [LlamaCpp][SMP] mode=strict-affinity (set LLAMACPP_AROS_STRICT_AFFINITY=0 for scheduler-managed fallback)
000000f53a2e3981 0x000000000107ca80 | 000 | [LlamaCpp][SMP] n_threads=4 strict=1 cpumask=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,...
000000f53cbbbb5b 0x000000000107ca80 | 000 | [LlamaCpp][SMP] n_threads_batch=4 strict_batch=1 cpumask_batch=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,...
000000f53f4922f0 0x000000000107ca80 | 000 | [LlamaCpp] CLI params parsed, applying AROS SMP defaults done
000000f54a517ac6 0x000000000107ca80 | 000 | [LlamaCpp] Serial log bridge installed
000000f54b0dc582 0x000000000107ca80 | 000 | [LlamaCpp] common_init complete, serial bridge active
000000f54beb6e6e 0x000000000107ca80 | 000 | [LlamaCpp] Compute HIDD headers missing; building CPU-only path
000000f54cf47a17 0x000000000107ca80 | 000 | [LlamaCpp] entering common_init_from_params
000000f54da10df9 0x000000000107ca80 | 000 | [LlamaCpp] llama_params_fit_impl: getting device memory data for initial parameters:
000000f635d59d1a 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: loaded meta data with 20 key-value pairs and 57 tensors from LlamaCpp-Models/stories15M-q8_0.gguf (version GGUF V3 (latest))
000000f637c5b54c 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
000000f63b80c082 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 0: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
000000f63fcd76ee 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 1: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
000000f643048cfd 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 2: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
000000f6449d4efb 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 3: tokenizer.ggml.model str = llama
000000f645d07479 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 4: general.architecture str = llama
000000f647361ac6 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 5: general.name str = llama
000000f6486e8bee 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 6: tokenizer.ggml.unknown_token_id u32 = 0
000000f64993bcf6 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 7: tokenizer.ggml.bos_token_id u32 = 1
000000f64b0cb1be 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 8: tokenizer.ggml.eos_token_id u32 = 2
000000f64c7da503 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 9: tokenizer.ggml.seperator_token_id u32 = 4294967295
000000f64dd4f0e6 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 10: tokenizer.ggml.padding_token_id u32 = 4294967295
000000f64f33f0f4 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 11: llama.context_length u32 = 128
000000f65057a057 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 12: llama.embedding_length u32 = 288
000000f65193c6b7 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 13: llama.feed_forward_length u32 = 768
000000f652d6e5c2 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 14: llama.attention.head_count u32 = 6
000000f654269a7b 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 15: llama.block_count u32 = 6
000000f6557552c3 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 16: llama.rope.dimension_count u32 = 48
000000f656a01228 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 17: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
000000f658143f67 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 18: general.quantization_version u32 = 2
000000f6596e90ec 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 19: general.file_type u32 = 7
000000f65a955e42 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - type f32: 13 tensors
000000f65b808ad9 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - type q8_0: 44 tensors
000000f65c6d4db0 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: mmap is not supported on this platform
000000f65d6743a1 0x000000000107ca80 | 000 | [LlamaCpp] print_info: file format = GGUF V3 (latest)
000000f65e3b321f 0x000000000107ca80 | 000 | [LlamaCpp] print_info: file type = Q8_0
000000f65efa2508 0x000000000107ca80 | 000 | [LlamaCpp] print_info: file size = 24.74 MiB (8.50 BPW)
000000f66256d31d 0x000000000107ca80 | 000 | [LlamaCpp] init_tokenizer: initializing tokenizer for type 1
000000f6632397ea 0x000000000107ca80 | 000 | [LlamaCpp] load: bad special token: 'tokenizer.ggml.seperator_token_id' = 4294967295, using default id -1
000000f6642fee72 0x000000000107ca80 | 000 | [LlamaCpp] load: bad special token: 'tokenizer.ggml.padding_token_id' = 4294967295, using default id -1
000000f6678ad253 0x000000000107ca80 | 000 | [LlamaCpp] load: 0 unused tokens
000000f668ba4730 0x000000000107ca80 | 000 | [LlamaCpp] load: control token: 1 '<s>' is not marked as EOG
000000f669df0ddf 0x000000000107ca80 | 000 | [LlamaCpp] load: printing all EOG tokens:
000000f66a8ab509 0x000000000107ca80 | 000 | [LlamaCpp] load: - 2 ('</s>'
000000f66b49da3d 0x000000000107ca80 | 000 | [LlamaCpp] load: special tokens cache size = 3
000000f66e2ed58c 0x000000000107ca80 | 000 | [LlamaCpp] load: token to piece cache size = 0.1684 MB
000000f66f07da62 0x000000000107ca80 | 000 | [LlamaCpp] print_info: arch = llama
000000f66fe0bb2f 0x000000000107ca80 | 000 | [LlamaCpp] print_info: vocab_only = 0
000000f670b9edb9 0x000000000107ca80 | 000 | [LlamaCpp] print_info: no_alloc = 1
000000f6719f57b4 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_ctx_train = 128
000000f67264cd50 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_embd = 288
000000f673398cfc 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_embd_inp = 288
000000f673fc7e74 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_layer = 6
000000f674cae4d5 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_head = 6
000000f67560bc67 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_head_kv = 6
000000f675f5c988 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_rot = 48
000000f6768d3d9a 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_swa = 0
000000f677360fab 0x000000000107ca80 | 000 | [LlamaCpp] print_info: is_swa_any = 0
000000f6780ba968 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_embd_head_k = 48
000000f678ef88db 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_embd_head_v = 48
000000f679bcc481 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_gqa = 1
000000f67a8bab1b 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_embd_k_gqa = 288
000000f67b61b8f5 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_embd_v_gqa = 288
000000f67c36046b 0x000000000107ca80 | 000 | [LlamaCpp] print_info: f_norm_eps = 0.0e+00
000000f67cf4fd17 0x000000000107ca80 | 000 | [LlamaCpp] print_info: f_norm_rms_eps = 1.0e-05
000000f67ddc1e3b 0x000000000107ca80 | 000 | [LlamaCpp] print_info: f_clamp_kqv = 0.0e+00
000000f67ed12516 0x000000000107ca80 | 000 | [LlamaCpp] print_info: f_max_alibi_bias = 0.0e+00
000000f67fc12552 0x000000000107ca80 | 000 | [LlamaCpp] print_info: f_logit_scale = 0.0e+00
000000f680945a76 0x000000000107ca80 | 000 | [LlamaCpp] print_info: f_attn_scale = 0.0e+00
000000f6816c703c 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_ff = 768
000000f68236ea9c 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_expert = 0
000000f6830c03bc 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_expert_used = 0
000000f683b794d4 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_expert_groups = 0
000000f6846c1d3c 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_group_used = 0
000000f685292184 0x000000000107ca80 | 000 | [LlamaCpp] print_info: causal attn = 1
000000f685d147ce 0x000000000107ca80 | 000 | [LlamaCpp] print_info: pooling type = 0
000000f68689c0e7 0x000000000107ca80 | 000 | [LlamaCpp] print_info: rope type = 0
000000f6874abadb 0x000000000107ca80 | 000 | [LlamaCpp] print_info: rope scaling = linear
000000f6881766c1 0x000000000107ca80 | 000 | [LlamaCpp] print_info: freq_base_train = 10000.0
000000f689024b2d 0x000000000107ca80 | 000 | [LlamaCpp] print_info: freq_scale_train = 1
000000f689d59938 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_ctx_orig_yarn = 128
000000f68aa73fba 0x000000000107ca80 | 000 | [LlamaCpp] print_info: rope_yarn_log_mul = 0.0000
000000f68b7cdaa3 0x000000000107ca80 | 000 | [LlamaCpp] print_info: rope_finetuned = unknown
000000f68cc0fffc 0x000000000107ca80 | 000 | [LlamaCpp] print_info: model type = ?B
000000f68d914671 0x000000000107ca80 | 000 | [LlamaCpp] print_info: model params = 24.41 M
000000f68e86c40c 0x000000000107ca80 | 000 | [LlamaCpp] print_info: general.name = llama
000000f68f62e63f 0x000000000107ca80 | 000 | [LlamaCpp] print_info: vocab type = SPM
000000f690448acc 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_vocab = 32000
000000f691438649 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_merges = 0
000000f6920a2dda 0x000000000107ca80 | 000 | [LlamaCpp] print_info: BOS token = 1 '<s>'
000000f692e37e46 0x000000000107ca80 | 000 | [LlamaCpp] print_info: EOS token = 2 '</s>'
000000f693b5a678 0x000000000107ca80 | 000 | [LlamaCpp] print_info: UNK token = 0 '<unk>'
000000f6947cc640 0x000000000107ca80 | 000 | [LlamaCpp] print_info: LF token = 13 '<0x0A>'
000000f69552d81b 0x000000000107ca80 | 000 | [LlamaCpp] print_info: EOG token = 2 '</s>'
000000f696423b9a 0x000000000107ca80 | 000 | [LlamaCpp] print_info: max token length = 48
000000f69718c39f 0x000000000107ca80 | 000 | [LlamaCpp] load_tensors: loading model tensors, this can take a while... (mmap = false, direct_io = false)
000000f6987063c9 0x000000000107ca80 | 000 | [LlamaCpp] load_tensors: layer 0 assigned to device CPU, is_swa = 0
000000f69973f89e 0x000000000107ca80 | 000 | [LlamaCpp] load_tensors: layer 1 assigned to device CPU, is_swa = 0
000000f69a8eddcf 0x000000000107ca80 | 000 | [LlamaCpp] load_tensors: layer 2 assigned to device CPU, is_swa = 0
000000f69b73ba7b 0x000000000107ca80 | 000 | [LlamaCpp] load_tensors: layer 3 assigned to device CPU, is_swa = 0
000000f69c5c5da8 0x000000000107ca80 | 000 | [LlamaCpp] load_tensors: layer 4 assigned to device CPU, is_swa = 0
000000f69d580004 0x000000000107ca80 | 000 | [LlamaCpp] load_tensors: layer 5 assigned to device CPU, is_swa = 0
000000f69e4a45cb 0x000000000107ca80 | 000 | [LlamaCpp] load_tensors: layer 6 assigned to device CPU, is_swa = 0
000000f69f388051 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor token_embd.weight
000000f6a0353080 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor output_norm.weight
000000f6a1126f5e 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor output.weight
000000f6a1e86ec4 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.0.attn_norm.weight
000000f6a2d49d12 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.0.attn_q.weight
000000f6a3c9a86b 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.0.attn_k.weight
000000f6a49a102d 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.0.attn_v.weight
000000f6a57502aa 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.0.attn_output.weight
000000f6a6510665 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.0.ffn_norm.weight
000000f6a74eec26 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.0.ffn_gate.weight
000000f6a82f95a0 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.0.ffn_down.weight
000000f6a92c71a8 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.0.ffn_up.weight
000000f6aa1496f4 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.1.attn_norm.weight
000000f6aafe590b 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.1.attn_q.weight
000000f6abe25903 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.1.attn_k.weight
000000f6acc7ecfc 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.1.attn_v.weight
000000f6ada357c9 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.1.attn_output.weight
000000f6ae8f03d1 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.1.ffn_norm.weight
000000f6af8246b8 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.1.ffn_gate.weight
000000f6b0542dae 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.1.ffn_down.weight
000000f6b1923cec 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.1.ffn_up.weight
000000f6b26cdaf0 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.2.attn_norm.weight
000000f6b366d320 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.2.attn_q.weight
000000f6b44b2f48 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.2.attn_k.weight
000000f6b530f3fc 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.2.attn_v.weight
000000f6b5f580fa 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.2.attn_output.weight
000000f6b6dfe401 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.2.ffn_norm.weight
000000f6b7d9d02e 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.2.ffn_gate.weight
000000f6b8b2bdf8 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.2.ffn_down.weight
000000f6b99f4155 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.2.ffn_up.weight
000000f6ba7caf90 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.3.attn_norm.weight
000000f6bb717386 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.3.attn_q.weight
000000f6bc7a7b47 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.3.attn_k.weight
000000f6bd77d68e 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.3.attn_v.weight
000000f6be75baf1 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.3.attn_output.weight
000000f6bf419063 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.3.ffn_norm.weight
000000f6c024cc8e 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.3.ffn_gate.weight
000000f6c0fe0b1b 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.3.ffn_down.weight
000000f6c1eb63f2 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.3.ffn_up.weight
000000f6c2d0ef53 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.4.attn_norm.weight
000000f6c3b9b819 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.4.attn_q.weight
000000f6c4afdaf0 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.4.attn_k.weight
000000f6c5a25541 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.4.attn_v.weight
000000f6c68b1d26 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.4.attn_output.weight
000000f6c77bcd2a 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.4.ffn_norm.weight
000000f6c86abb47 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.4.ffn_gate.weight
000000f6c954306a 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.4.ffn_down.weight
000000f6ca52ec86 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.4.ffn_up.weight
000000f6cb490247 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.5.attn_norm.weight
000000f6cc36bf9b 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.5.attn_q.weight
000000f6cd249fb3 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.5.attn_k.weight
000000f6ce14c957 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.5.attn_v.weight
000000f6cf5eb7a5 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.5.attn_output.weight
000000f6d05a650a 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.5.ffn_norm.weight
000000f6d14ebfec 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.5.ffn_gate.weight
000000f6d226c275 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.5.ffn_down.weight
000000f6d313a2b1 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.5.ffn_up.weight
000000f6d3e898c2 0x000000000107ca80 | 000 | [LlamaCpp] load_tensors: CPU model buffer size = 0.00 MiB
000000f6dec648f7 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: constructing llama_context
000000f6e399fb58 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: n_seq_max = 1
000000f6e460a71c 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: n_ctx = 256
000000f6e525161f 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: n_ctx_seq = 256
000000f6e5ef83cd 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: n_batch = 256
000000f6e6c140de 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: n_ubatch = 256
000000f6e79eeadd 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: causal_attn = 1
000000f6e85fcaf0 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: flash_attn = auto
000000f6e936b01f 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: kv_unified = false
000000f6ea0522b5 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: freq_base = 10000.0
000000f6eac9d4e8 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: freq_scale = 1
000000f6eb90aaf5 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: n_ctx_seq (256) > n_ctx_train (128) -- possible training context overflow
000000f6eccf8e04 0x000000000107ca80 | 000 | [LlamaCpp] set_abort_callback: call
000000f6ed977310 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: CPU output buffer size = 0.12 MiB
000000f6ee94c461 0x000000000107ca80 | 000 | [LlamaCpp] llama_kv_cache: layer 0: dev = CPU
000000f6ef736608 0x000000000107ca80 | 000 | [LlamaCpp] llama_kv_cache: layer 1: dev = CPU
000000f6f06d3453 0x000000000107ca80 | 000 | [LlamaCpp] llama_kv_cache: layer 2: dev = CPU
000000f6f155ae95 0x000000000107ca80 | 000 | [LlamaCpp] llama_kv_cache: layer 3: dev = CPU
000000f6f2233afe 0x000000000107ca80 | 000 | [LlamaCpp] llama_kv_cache: layer 4: dev = CPU
000000f6f2fa7e38 0x000000000107ca80 | 000 | [LlamaCpp] llama_kv_cache: layer 5: dev = CPU
000000f6f3e52230 0x000000000107ca80 | 000 | [LlamaCpp] llama_kv_cache: CPU KV buffer size = 0.00 MiB
000000f6f4f77bc8 0x000000000107ca80 | 000 | [LlamaCpp] llama_kv_cache: size = 1.69 MiB ( 256 cells, 6 layers, 1/1 seqs), K (f16): 0.84 MiB, V (f16): 0.84 MiB
000000f6f8d5a7c4 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: enumerating backends
000000f6f98e05e9 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: backend_ptrs.size() = 1
000000f6fa8291ea 0x000000000107ca80 | 000 | [LlamaCpp] sched_reserve: reserving ...
000000f6fb5f4842 0x000000000107ca80 | 000 | [LlamaCpp] sched_reserve: max_nodes = 1024
000000f702e677bb 0x000000000107ca80 | 000 | [LlamaCpp] sched_reserve: reserving full memory module
000000f703dd153a 0x000000000107ca80 | 000 | [LlamaCpp] sched_reserve: worst-case: n_tokens = 256, n_seqs = 1, n_outputs = 1
000000f705198a37 0x000000000107ca80 | 000 | [LlamaCpp] graph_reserve: reserving a graph for ubatch with n_tokens = 1, n_seqs = 1, n_outputs = 1
000000f70cfc10cb 0x000000000107ca80 | 000 | [LlamaCpp] sched_reserve: Flash Attention was auto, set to enabled
000000f70de23ec5 0x000000000107ca80 | 000 | [LlamaCpp] graph_reserve: reserving a graph for ubatch with n_tokens = 256, n_seqs = 1, n_outputs = 256
000000f7179fcfcc 0x000000000107ca80 | 000 | [LlamaCpp] graph_reserve: reserving a graph for ubatch with n_tokens = 1, n_seqs = 1, n_outputs = 1
000000f7228107b7 0x000000000107ca80 | 000 | [LlamaCpp] graph_reserve: reserving a graph for ubatch with n_tokens = 256, n_seqs = 1, n_outputs = 256
000000f72eb05e66 0x000000000107ca80 | 000 | [LlamaCpp] sched_reserve: CPU compute buffer size = 33.04 MiB
000000f72fbf643c 0x000000000107ca80 | 000 | [LlamaCpp] sched_reserve: graph nodes = 193
000000f730780915 0x000000000107ca80 | 000 | [LlamaCpp] sched_reserve: graph splits = 1
000000f731216267 0x000000000107ca80 | 000 | [LlamaCpp] sched_reserve: reserve took 360.70 ms, sched copies = 1
000000f731f50581 0x000000000107ca80 | 000 | [LlamaCpp] llama_memory_breakdown_print: | memory breakdown [MiB] | total free self model context compute unaccounted |
000000f7332044d4 0x000000000107ca80 | 000 | [LlamaCpp] llama_memory_breakdown_print: | - Host | 59 = 24 + 1 + 33 |
000000f736a5654e 0x000000000107ca80 | 000 | [LlamaCpp] llama_params_fit_impl: no devices with dedicated memory found
000000f73793cf31 0x000000000107ca80 | 000 | [LlamaCpp] llama_params_fit: successfully fit params to free device memory
000000f73949f1ca 0x000000000107ca80 | 000 | [LlamaCpp] llama_params_fit: fitting params to free memory took 1.87 seconds
000000f81ad8889f 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: loaded meta data with 20 key-value pairs and 57 tensors from LlamaCpp-Models/stories15M-q8_0.gguf (version GGUF V3 (latest))
000000f81c7a9747 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
000000f8203dfea0 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 0: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
000000f8245122ab 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 1: tokenizer.ggml.scores arr[f32,32000] = [0.000000, 0.000000, 0.000000, 0.0000...
000000f826756399 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 2: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
000000f827fbf2b4 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 3: tokenizer.ggml.model str = llama
000000f82947d83b 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 4: general.architecture str = llama
000000f82a6a1e63 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 5: general.name str = llama
000000f82c16da26 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 6: tokenizer.ggml.unknown_token_id u32 = 0
000000f82d4c425a 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 7: tokenizer.ggml.bos_token_id u32 = 1
000000f82e8374db 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 8: tokenizer.ggml.eos_token_id u32 = 2
000000f82fc14c88 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 9: tokenizer.ggml.seperator_token_id u32 = 4294967295
000000f830ff2a43 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 10: tokenizer.ggml.padding_token_id u32 = 4294967295
000000f83241f4ee 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 11: llama.context_length u32 = 128
000000f833860ea8 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 12: llama.embedding_length u32 = 288
000000f834cbdd7e 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 13: llama.feed_forward_length u32 = 768
000000f835e1e0fa 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 14: llama.attention.head_count u32 = 6
000000f837207b16 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 15: llama.block_count u32 = 6
000000f838524e65 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 16: llama.rope.dimension_count u32 = 48
000000f83989c209 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 17: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
000000f83aeefd99 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 18: general.quantization_version u32 = 2
000000f83c0beacf 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - kv 19: general.file_type u32 = 7
000000f83d316c36 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - type f32: 13 tensors
000000f83de3aa7c 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: - type q8_0: 44 tensors
000000f83eb850ab 0x000000000107ca80 | 000 | [LlamaCpp] llama_model_loader: mmap is not supported on this platform
000000f83fa2b71d 0x000000000107ca80 | 000 | [LlamaCpp] print_info: file format = GGUF V3 (latest)
000000f8407d475c 0x000000000107ca80 | 000 | [LlamaCpp] print_info: file type = Q8_0
000000f84119e710 0x000000000107ca80 | 000 | [LlamaCpp] print_info: file size = 24.74 MiB (8.50 BPW)
000000f8448067e4 0x000000000107ca80 | 000 | [LlamaCpp] init_tokenizer: initializing tokenizer for type 1
000000f8454fe064 0x000000000107ca80 | 000 | [LlamaCpp] load: bad special token: 'tokenizer.ggml.seperator_token_id' = 4294967295, using default id -1
000000f8468df772 0x000000000107ca80 | 000 | [LlamaCpp] load: bad special token: 'tokenizer.ggml.padding_token_id' = 4294967295, using default id -1
000000f849da3a80 0x000000000107ca80 | 000 | [LlamaCpp] load: 0 unused tokens
000000f84b1f0b23 0x000000000107ca80 | 000 | [LlamaCpp] load: control token: 1 '<s>' is not marked as EOG
000000f84c5c9672 0x000000000107ca80 | 000 | [LlamaCpp] load: printing all EOG tokens:
000000f84d1481f1 0x000000000107ca80 | 000 | [LlamaCpp] load: - 2 ('</s>'
000000f84dc72eaa 0x000000000107ca80 | 000 | [LlamaCpp] load: special tokens cache size = 3
000000f850718e11 0x000000000107ca80 | 000 | [LlamaCpp] load: token to piece cache size = 0.1684 MB
000000f85131b7ea 0x000000000107ca80 | 000 | [LlamaCpp] print_info: arch = llama
000000f851fd29d9 0x000000000107ca80 | 000 | [LlamaCpp] print_info: vocab_only = 0
000000f852bf6080 0x000000000107ca80 | 000 | [LlamaCpp] print_info: no_alloc = 0
000000f85391f770 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_ctx_train = 128
000000f854a3463c 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_embd = 288
000000f855883aee 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_embd_inp = 288
000000f85639e58c 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_layer = 6
000000f856f62206 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_head = 6
000000f857d061cd 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_head_kv = 6
000000f858ae618a 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_rot = 48
000000f859700d53 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_swa = 0
000000f85a2ee034 0x000000000107ca80 | 000 | [LlamaCpp] print_info: is_swa_any = 0
000000f85b09d059 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_embd_head_k = 48
000000f85bc680f0 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_embd_head_v = 48
000000f85c79d8c1 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_gqa = 1
000000f85d28bdcd 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_embd_k_gqa = 288
000000f85de62edf 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_embd_v_gqa = 288
000000f85e9c569c 0x000000000107ca80 | 000 | [LlamaCpp] print_info: f_norm_eps = 0.0e+00
000000f85f77dd0c 0x000000000107ca80 | 000 | [LlamaCpp] print_info: f_norm_rms_eps = 1.0e-05
000000f8604b4287 0x000000000107ca80 | 000 | [LlamaCpp] print_info: f_clamp_kqv = 0.0e+00
000000f861185d5d 0x000000000107ca80 | 000 | [LlamaCpp] print_info: f_max_alibi_bias = 0.0e+00
000000f861d791b4 0x000000000107ca80 | 000 | [LlamaCpp] print_info: f_logit_scale = 0.0e+00
000000f862769b47 0x000000000107ca80 | 000 | [LlamaCpp] print_info: f_attn_scale = 0.0e+00
000000f86329f205 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_ff = 768
000000f863eb42b1 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_expert = 0
000000f864a3658f 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_expert_used = 0
000000f865531525 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_expert_groups = 0
000000f86629e4f1 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_group_used = 0
000000f867064b80 0x000000000107ca80 | 000 | [LlamaCpp] print_info: causal attn = 1
000000f867caf340 0x000000000107ca80 | 000 | [LlamaCpp] print_info: pooling type = 0
000000f8688ba8bf 0x000000000107ca80 | 000 | [LlamaCpp] print_info: rope type = 0
000000f8693dd74c 0x000000000107ca80 | 000 | [LlamaCpp] print_info: rope scaling = linear
000000f86a15207b 0x000000000107ca80 | 000 | [LlamaCpp] print_info: freq_base_train = 10000.0
000000f86adfcf1a 0x000000000107ca80 | 000 | [LlamaCpp] print_info: freq_scale_train = 1
000000f86b98af53 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_ctx_orig_yarn = 128
000000f86c55e6e0 0x000000000107ca80 | 000 | [LlamaCpp] print_info: rope_yarn_log_mul = 0.0000
000000f86d1a94ae 0x000000000107ca80 | 000 | [LlamaCpp] print_info: rope_finetuned = unknown
000000f86de77e97 0x000000000107ca80 | 000 | [LlamaCpp] print_info: model type = ?B
000000f86eb51c7b 0x000000000107ca80 | 000 | [LlamaCpp] print_info: model params = 24.41 M
000000f86fa7d0b5 0x000000000107ca80 | 000 | [LlamaCpp] print_info: general.name = llama
000000f870857235 0x000000000107ca80 | 000 | [LlamaCpp] print_info: vocab type = SPM
000000f871494fcf 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_vocab = 32000
000000f87239a3bc 0x000000000107ca80 | 000 | [LlamaCpp] print_info: n_merges = 0
000000f873012626 0x000000000107ca80 | 000 | [LlamaCpp] print_info: BOS token = 1 '<s>'
000000f873e3880e 0x000000000107ca80 | 000 | [LlamaCpp] print_info: EOS token = 2 '</s>'
000000f874dab4d0 0x000000000107ca80 | 000 | [LlamaCpp] print_info: UNK token = 0 '<unk>'
000000f8763b2a35 0x000000000107ca80 | 000 | [LlamaCpp] print_info: LF token = 13 '<0x0A>'
000000f8770eb2bf 0x000000000107ca80 | 000 | [LlamaCpp] print_info: EOG token = 2 '</s>'
000000f877d4deae 0x000000000107ca80 | 000 | [LlamaCpp] print_info: max token length = 48
000000f87888d9c7 0x000000000107ca80 | 000 | [LlamaCpp] load_tensors: loading model tensors, this can take a while... (mmap = false, direct_io = false)
000000f879afad12 0x000000000107ca80 | 000 | [LlamaCpp] load_tensors: layer 0 assigned to device CPU, is_swa = 0
000000f87ac8596a 0x000000000107ca80 | 000 | [LlamaCpp] load_tensors: layer 1 assigned to device CPU, is_swa = 0
000000f87bb242dc 0x000000000107ca80 | 000 | [LlamaCpp] load_tensors: layer 2 assigned to device CPU, is_swa = 0
000000f87cae6c2e 0x000000000107ca80 | 000 | [LlamaCpp] load_tensors: layer 3 assigned to device CPU, is_swa = 0
000000f87da8e0c8 0x000000000107ca80 | 000 | [LlamaCpp] load_tensors: layer 4 assigned to device CPU, is_swa = 0
000000f87e9c9849 0x000000000107ca80 | 000 | [LlamaCpp] load_tensors: layer 5 assigned to device CPU, is_swa = 0
000000f87fa72051 0x000000000107ca80 | 000 | [LlamaCpp] load_tensors: layer 6 assigned to device CPU, is_swa = 0
000000f880a0d1e6 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor token_embd.weight
000000f8815a3f23 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor output_norm.weight
000000f882429bce 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor output.weight
000000f88337a452 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.0.attn_norm.weight
000000f8842518cc 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.0.attn_q.weight
000000f884ff21fc 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.0.attn_k.weight
000000f885f80137 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.0.attn_v.weight
000000f886e1da8c 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.0.attn_output.weight
000000f887cfdead 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.0.ffn_norm.weight
000000f888b0aa3c 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.0.ffn_gate.weight
000000f889ad4585 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.0.ffn_down.weight
000000f88aa7ddab 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.0.ffn_up.weight
000000f88b9bb1b0 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.1.attn_norm.weight
000000f88c75e2b8 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.1.attn_q.weight
000000f88d443627 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.1.attn_k.weight
000000f88e1c866c 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.1.attn_v.weight
000000f88ef770b5 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.1.attn_output.weight
000000f88fe7b809 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.1.ffn_norm.weight
000000f890cb46a0 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.1.ffn_gate.weight
000000f891de159a 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.1.ffn_down.weight
000000f892bd92fb 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.1.ffn_up.weight
000000f8939fd05d 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.2.attn_norm.weight
000000f894988eaf 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.2.attn_q.weight
000000f8959555ea 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.2.attn_k.weight
000000f896962320 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.2.attn_v.weight
000000f8979172de 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.2.attn_output.weight
000000f89863e12e 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.2.ffn_norm.weight
000000f8998cfe9b 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.2.ffn_gate.weight
000000f89a6c1c2f 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.2.ffn_down.weight
000000f89ba61235 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.2.ffn_up.weight
000000f89c8731c0 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.3.attn_norm.weight
000000f89d665242 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.3.attn_q.weight
000000f89e4f9d4e 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.3.attn_k.weight
000000f89f4383e1 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.3.attn_v.weight
000000f8a02245db 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.3.attn_output.weight
000000f8a115bc9d 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.3.ffn_norm.weight
000000f8a1f947e2 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.3.ffn_gate.weight
000000f8a2df593f 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.3.ffn_down.weight
000000f8a3cc7ed1 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.3.ffn_up.weight
000000f8a49f1a3f 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.4.attn_norm.weight
000000f8a58b459f 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.4.attn_q.weight
000000f8a684f013 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.4.attn_k.weight
000000f8a7750ac6 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.4.attn_v.weight
000000f8a85b717d 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.4.attn_output.weight
000000f8a9482482 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.4.ffn_norm.weight
000000f8aa1beb41 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.4.ffn_gate.weight
000000f8ab0d65e8 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.4.ffn_down.weight
000000f8abf7cb60 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.4.ffn_up.weight
000000f8acce496f 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.5.attn_norm.weight
000000f8adb1324d 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.5.attn_q.weight
000000f8ae668206 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.5.attn_k.weight
000000f8af417212 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.5.attn_v.weight
000000f8b024c931 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.5.attn_output.weight
000000f8b1092158 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.5.ffn_norm.weight
000000f8b1ea57ef 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.5.ffn_gate.weight
000000f8b2c59ae4 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.5.ffn_down.weight
000000f8b3bb9db3 0x000000000107ca80 | 000 | [LlamaCpp] create_tensor: loading tensor blk.5.ffn_up.weight
000000f8b4b1c8d7 0x000000000107ca80 | 000 | [LlamaCpp] load_tensors: CPU model buffer size = 24.74 MiB
000000f8b5b79397 0x000000000107ca80 | 000 | [LlamaCpp] load_all_data: no device found for buffer type CPU for async uploads
000000fffe4d3aec 0x000000000107ca80 | 000 | [LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp] .[LlamaCpp]
0000010b8c137818 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: constructing llama_context
0000010b8d4675be 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: n_seq_max = 1
0000010b8e0c34ca 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: n_ctx = 256
0000010b8ec9abd1 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: n_ctx_seq = 256
0000010b8f7e0879 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: n_batch = 256
0000010b90666317 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: n_ubatch = 256
0000010b912273b8 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: causal_attn = 1
0000010b91c865d9 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: flash_attn = auto
0000010b928890f7 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: kv_unified = false
0000010b9356a870 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: freq_base = 10000.0
0000010b9439e65d 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: freq_scale = 1
0000010b9505c70a 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: n_ctx_seq (256) > n_ctx_train (128) -- possible training context overflow
0000010b9656f679 0x000000000107ca80 | 000 | [LlamaCpp] set_abort_callback: call
0000010b9713cb32 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: CPU output buffer size = 0.12 MiB
0000010b981f6766 0x000000000107ca80 | 000 | [LlamaCpp] llama_kv_cache: layer 0: dev = CPU
0000010b98d71992 0x000000000107ca80 | 000 | [LlamaCpp] llama_kv_cache: layer 1: dev = CPU
0000010b99883907 0x000000000107ca80 | 000 | [LlamaCpp] llama_kv_cache: layer 2: dev = CPU
0000010b9a36942f 0x000000000107ca80 | 000 | [LlamaCpp] llama_kv_cache: layer 3: dev = CPU
0000010b9ae771d2 0x000000000107ca80 | 000 | [LlamaCpp] llama_kv_cache: layer 4: dev = CPU
0000010b9b94019a 0x000000000107ca80 | 000 | [LlamaCpp] llama_kv_cache: layer 5: dev = CPU
0000010b9c475312 0x000000000107ca80 | 000 | [LlamaCpp] llama_kv_cache: CPU KV buffer size = 1.69 MiB
0000010b9d197aae 0x000000000107ca80 | 000 | [LlamaCpp] llama_kv_cache: size = 1.69 MiB ( 256 cells, 6 layers, 1/1 seqs), K (f16): 0.84 MiB, V (f16): 0.84 MiB
0000010b9efe3710 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: enumerating backends
0000010b9fc7cc6f 0x000000000107ca80 | 000 | [LlamaCpp] llama_context: backend_ptrs.size() = 1
0000010ba09b24bb 0x000000000107ca80 | 000 | [LlamaCpp] sched_reserve: reserving ...
0000010ba1a4dca8 0x000000000107ca80 | 000 | [LlamaCpp] sched_reserve: max_nodes = 1024
0000010ba77134bf 0x000000000107ca80 | 000 | [LlamaCpp] sched_reserve: reserving full memory module
0000010ba854d409 0x000000000107ca80 | 000 | [LlamaCpp] sched_reserve: worst-case: n_tokens = 256, n_seqs = 1, n_outputs = 1
0000010ba95f4424 0x000000000107ca80 | 000 | [LlamaCpp] graph_reserve: reserving a graph for ubatch with n_tokens = 1, n_seqs = 1, n_outputs = 1
0000010bb29e8e82 0x000000000107ca80 | 000 | [LlamaCpp] sched_reserve: Flash Attention was auto, set to enabled
0000010bb3c14142 0x000000000107ca80 | 000 | [LlamaCpp] graph_reserve: reserving a graph for ubatch with n_tokens = 256, n_seqs = 1, n_outputs = 256
0000010bbd916990 0x000000000107ca80 | 000 | [LlamaCpp] graph_reserve: reserving a graph for ubatch with n_tokens = 1, n_seqs = 1, n_outputs = 1
0000010bc46d0567 0x000000000107ca80 | 000 | [LlamaCpp] graph_reserve: reserving a graph for ubatch with n_tokens = 256, n_seqs = 1, n_outputs = 256
0000010bcfd42485 0x000000000107ca80 | 000 | [LlamaCpp] sched_reserve: CPU compute buffer size = 33.04 MiB
0000010bd0d10317 0x000000000107ca80 | 000 | [LlamaCpp] sched_reserve: graph nodes = 193
0000010bd1986f3d 0x000000000107ca80 | 000 | [LlamaCpp] sched_reserve: graph splits = 1
0000010bd2491b2b 0x000000000107ca80 | 000 | [LlamaCpp] sched_reserve: reserve took 328.21 ms, sched copies = 1
0000010bd32d11aa 0x000000000107ca80 | 000 | [LlamaCpp] set_adapters_lora: adapters = 0000000000000000
0000010bd41caff3 0x000000000107ca80 | 000 | [LlamaCpp] adapters_lora_are_same: adapters = 0000000000000000
0000010bd5962490 0x000000000107ca80 | 000 | [LlamaCpp] set_warmup: value = 1
0000010bf63f093c 0x000000000107ca80 | 000 | [LlamaCpp] [ggml][AROS-SMP] threadpool-create n_threads=4 strict=1 poll=50
0000010bf7651ba8 0x000000000107ca80 | 000 | [LlamaCpp] [ggml][AROS-SMP] threadpool-map ith=0 mask_first=3 mask_valid=1
0000010bf8ab37d6 0x000000000107ca80 | 000 | [LlamaCpp] [ggml][AROS-SMP] threadpool-map ith=1 mask_first=0 mask_valid=1
0000010bfa5221c1 0x000000000107ca80 | 000 | [LlamaCpp] [ggml][AROS-SMP] threadpool-map ith=2 mask_first=1 mask_valid=1
0000010bfb6e37ed 0x000000000107ca80 | 000 | [LlamaCpp] [ggml][AROS-SMP] threadpool-map ith=3 mask_first=2 mask_valid=1
0000010bfd9e460c 0x00000000010294e0 | 000 | [LlamaCpp] [ggml][AROS-SMP] worker-affinity-pre ith=1 target_cpu=0 mask_valid=1 enabled=1
0000010bfececa7e 0x00000000010294e0 | 000 | [LlamaCpp] [ggml][AROS-SMP] worker-affinity-post ith=1
0000010bffb82cc8 0x00000000010294e0 | 000 | [LlamaCpp] [ggml][AROS-SMP] worker-start ith=1 target_cpu=0 mask_valid=1
0000010c2739cb80 0x0000000001080120 | 000 | [LlamaCpp] [ggml][AROS-SMP] worker-affinity-pre ith=2 target_cpu=1 mask_valid=1 enabled=1
0000010c286eabc4 0x0000000001080120 | 000 | [LlamaCpp] [ggml][AROS-SMP] worker-affinity-post ith=2
0000010c2948bcdd 0x0000000001080120 | 000 | [LlamaCpp] [ggml][AROS-SMP] worker-start ith=2 target_cpu=1 mask_valid=1
0000010c50c44467 0x0000000001080430 | 000 | [LlamaCpp] [ggml][AROS-SMP] worker-affinity-pre ith=3 target_cpu=2 mask_valid=1 enabled=1
0000010c52112a15 0x0000000001080430 | 000 | [LlamaCpp] [ggml][AROS-SMP] worker-affinity-post ith=3
0000010c5302d496 0x0000000001080430 | 000 | [LlamaCpp] [ggml][AROS-SMP] worker-start ith=3 target_cpu=2 mask_valid=1
0000010c7e0ed6da 0x000000000107ca80 | 000 | [LlamaCpp] [ggml][AROS-SMP] kickoff #0 n_threads=4 pause=0
0000010c7fdca9ce 0x000000000107ca80 | 000 | [LlamaCpp] [ggml][AROS-SMP] first-compute ith=0 target_cpu=3 cplan_threads=4
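For context on the [ggml][AROS-SMP] lines near the end: upstream llama.cpp/ggml exposes a threadpool with a per-CPU mask, a strict-affinity flag and a polling level, which is what the n_threads=4 strict=1 poll=50 values above correspond to. A rough sketch of driving that API by hand is below; the names are the upstream ones, whether the AROS port wires it up exactly like this is an assumption, and in the build above it is toggled with LLAMACPP_AROS_STRICT_AFFINITY rather than in code.

/* Sketch only: 4 workers pinned to CPUs 0..3, attached to a llama_context.
 * Header locations vary a little between ggml versions. */
#include "llama.h"
#include "ggml.h"      /* struct ggml_threadpool_params */
#include "ggml-cpu.h"  /* ggml_threadpool_new() in newer trees */

static struct ggml_threadpool *make_pinned_pool(struct llama_context *ctx)
{
    struct ggml_threadpool_params tpp = ggml_threadpool_params_default(4);

    for (int i = 0; i < 4; i++)
        tpp.cpumask[i] = true;   /* allow CPUs 0..3 */
    tpp.strict_cpu = true;       /* "strict=1": pin each worker to its CPU */
    tpp.poll       = 50;         /* "poll=50": default polling level while waiting for work */

    struct ggml_threadpool *tp = ggml_threadpool_new(&tpp);
    if (tp)
        llama_attach_threadpool(ctx, tp, NULL); /* NULL = no separate batch pool */
    return tp;
}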
2 users reacted to this post
Jeff1138, Argo
terminills (Member)
Posted 16 days ago
It's wrong, but that's because it's running a 1.5B model ... but this is a llama.cpp front end running a 1.7GB AI model locally on AROS 64-bit SMP.
5 users reacted to this post
retrofaza, deadwood, olivier2222, Argo, x-vision
Wow
CoolCat5000 (Junior Member)
Posted 13 days ago
Very cool to hear about it.
I'm trying to get Emu68 running standalone, and it already has a multi-core mindset (it's really hard to make good use of most of the cores with parallel programming), so the idea of multiple workloads makes a lot of sense to me.
I don't know what we can expect for the future, but having an AI model running makes me think of the Radxa Orion O6, which has an NPU.
Anyway, at the moment I'm trying to get Emu68 running standalone, but who knows, maybe one day we can have AROS + Emu68 + whatever ...
Good SMP support, with nice affinity, priorities, pinning and so on, is maybe not AROS-land's strongest point, and for me it's something for the far, far future. But it's nonetheless cool to see SMP, and even cooler that the sample is an AI model 😎
Best regards,
PS: if, big if, I succeed with Emu68, I will also try to add a software MIDI board (tiny soundfont), and I'm quite enthusiastic about what we can do with all those cores that people sell and that mostly sit unused.
2 users reacted to this post
Argo, terminills
terminills (Member)
Posted 11 days ago
So while I was at it, I decided to LoRA-train a 1.5B model ... this is after the first round; there are 5 more to go.
/home/terminills/Desktop/llama.cpp/build/bin/llama-cli -m outputs/qwen15b-aros-dapt-q8.gguf -p "Tell me how to create a window in Zune?" -n 400 --no-display-prompt
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 ROCm devices:
Device 0: AMD Radeon PRO V620, gfx1030 (0x1030), VMM: no, Wave Size: 32
Device 1: AMD Radeon PRO V620, gfx1030 (0x1030), VMM: no, Wave Size: 32
build: 6662 (071e9e45) with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device ROCm0 (AMD Radeon PRO V620) (0000:03:00.0) - 30668 MiB free
llama_model_load_from_file_impl: using device ROCm1 (AMD Radeon PRO V620) (0000:83:00.0) - 30668 MiB free
llama_model_loader: loaded meta data with 22 key-value pairs and 338 tensors from outputs/qwen15b-aros-dapt-q8.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen2
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Qwen15B Aros Dapt Merged
llama_model_loader: - kv 3: general.size_label str = 1.5B
llama_model_loader: - kv 4: qwen2.block_count u32 = 28
llama_model_loader: - kv 5: qwen2.context_length u32 = 32768
llama_model_loader: - kv 6: qwen2.embedding_length u32 = 1536
llama_model_loader: - kv 7: qwen2.feed_forward_length u32 = 8960
llama_model_loader: - kv 8: qwen2.attention.head_count u32 = 12
llama_model_loader: - kv 9: qwen2.attention.head_count_kv u32 = 2
llama_model_loader: - kv 10: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 11: general.file_type u32 = 7
llama_model_loader: - kv 12: general.quantization_version u32 = 2
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,151936] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 151645
llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv 20: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv 21: tokenizer.chat_template str = {%- if tools %}n {{- '<|im_start|>...
llama_model_loader: - type f32: 141 tensors
llama_model_loader: - type q8_0: 197 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q8_0
print_info: file size = 1.53 GiB (8.50 BPW)
load: printing all EOG tokens:
load: - 151643 ('<|endoftext|>')
load: - 151645 ('<|im_end|>')
load: - 151662 ('<|fim_pad|>')
load: - 151663 ('<|repo_name|>')
load: - 151664 ('<|file_sep|>')
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch = qwen2
print_info: vocab_only = 0
print_info: n_ctx_train = 32768
print_info: n_embd = 1536
print_info: n_layer = 28
print_info: n_head = 12
print_info: n_head_kv = 2
print_info: n_rot = 128
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 6
print_info: n_embd_k_gqa = 256
print_info: n_embd_v_gqa = 256
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 8960
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = -1
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 32768
print_info: rope_finetuned = unknown
print_info: model type = 1.5B
print_info: model params = 1.54 B
print_info: general.name = Qwen15B Aros Dapt Merged
print_info: vocab type = BPE
print_info: n_vocab = 151936
print_info: n_merges = 151387
print_info: BOS token = 151643 '<|endoftext|>'
print_info: EOS token = 151645 '<|im_end|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151643 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 28 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 29/29 layers to GPU
load_tensors: ROCm0 model buffer size = 711.51 MiB
load_tensors: ROCm1 model buffer size = 853.12 MiB
load_tensors: CPU_Mapped model buffer size = 236.47 MiB
...........................................................................
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 4096
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch = 2048
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = auto
llama_context: kv_unified = false
llama_context: freq_base = 10000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (4096) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
llama_context: ROCm_Host output buffer size = 0.58 MiB
llama_kv_cache: ROCm0 KV buffer size = 60.00 MiB
llama_kv_cache: ROCm1 KV buffer size = 52.00 MiB
llama_kv_cache: size = 112.00 MiB ( 4096 cells, 28 layers, 1/1 seqs), K (f16): 56.00 MiB, V (f16): 56.00 MiB
llama_context: pipeline parallelism enabled (n_copies=4)
llama_context: Flash Attention was auto, set to enabled
llama_context: ROCm0 compute buffer size = 106.54 MiB
llama_context: ROCm1 compute buffer size = 339.80 MiB
llama_context: ROCm_Host compute buffer size = 35.05 MiB
llama_context: graph nodes = 959
llama_context: graph splits = 3
common_init_from_params: added <|endoftext|> logit bias = -inf
common_init_from_params: added <|im_end|> logit bias = -inf
common_init_from_params: added <|fim_pad|> logit bias = -inf
common_init_from_params: added <|repo_name|> logit bias = -inf
common_init_from_params: added <|file_sep|> logit bias = -inf
common_init_from_params: setting dry_penalty_last_n to ctx_size = 4096
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 64
main: chat template is available, enabling conversation mode (disable it with -no-cnv)
*** User-specified prompt will pre-start conversation, did you mean to set --system-prompt (-sys) instead?
main: chat template example:
<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi there<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant
system_info: n_threads = 64 (n_threads_batch = 64) / 128 | ROCm : NO_VMM = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
main: interactive mode on.
sampler seed: 2247139198
sampler params:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 4096
top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
generate: n_ctx = 4096, n_batch = 2048, n_predict = 400, n_keep = 0
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to the AI.
- To return control without starting a new line, end your input with '/'.
- If you want to submit another line, end your input with '\'.
- Not using system message. To change it, set a different value via -sys PROMPT
To create a window in Zune, you can use the Zune master class
#include <libraries/mui.h>
#include <proto/muimaster.h>
/* Create a simple window */
static Object *create_window(void)
{
struct MUI_CustomClass *windowClass;
/* Create a window class */
windowClass = MUI_CreateCustomClass
(
NULL, MUIC_Window, NULL, sizeof(struct WindowData), &windowFuncs
);
/* Create a window object */
Object *window = windowClass->mcc_Class->mcc_SuperClass->msd_Object;
return window;
}
/* Show the window */
static void show_window(Object *window)
{
/* If the window isn't already open,
* we need to open it.
*/
if (window && !XGET(window, MUIA_Window_Open))))
{
DoMethod(window, MUIM_Notify, MUIA_Window_CloseRequest, TRUE, window);
set(window, MUIA_Window_Open, TRUE);
return;
}
/* If the window is already open, we need to hide it.
*/
if (window)
{
set(window, MUIA_Window_Open, FALSE);
}
/* Show the window */
set(window, MUIA_Window_Open, TRUE);
/* Return the window object */
return window;
}
int main(int argc, char *argv[])
{
Object *window = NULL;
if (argc > 1)
{
// Get the first command line argument and parse it
char *commandLineArg = argv[1];
ULONG windowFlags;
// Parse the argument and set the flags
if (strnicmp(commandLineArg, "OPENWINDOW", 17) == 0
/home/terminills/Desktop/llama.cpp/build/bin/llama-cli -m outputs/qwen15b-aros-dapt-q8.gguf -p "Tell me how to create a window in Zune?" -n 400 --no-display-prompt
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 ROCm devices:
Device 0: AMD Radeon PRO V620, gfx1030 (0x1030), VMM: no, Wave Size: 32
Device 1: AMD Radeon PRO V620, gfx1030 (0x1030), VMM: no, Wave Size: 32
build: 6662 (071e9e45) with cc (Ubuntu 11.4.0-1ubuntu1~22.04.2) 11.4.0 for x86_64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_load_from_file_impl: using device ROCm0 (AMD Radeon PRO V620) (0000:03:00.0) - 30668 MiB free
llama_model_load_from_file_impl: using device ROCm1 (AMD Radeon PRO V620) (0000:83:00.0) - 30668 MiB free
llama_model_loader: loaded meta data with 22 key-value pairs and 338 tensors from outputs/qwen15b-aros-dapt-q8.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = qwen2
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Qwen15B Aros Dapt Merged
llama_model_loader: - kv 3: general.size_label str = 1.5B
llama_model_loader: - kv 4: qwen2.block_count u32 = 28
llama_model_loader: - kv 5: qwen2.context_length u32 = 32768
llama_model_loader: - kv 6: qwen2.embedding_length u32 = 1536
llama_model_loader: - kv 7: qwen2.feed_forward_length u32 = 8960
llama_model_loader: - kv 8: qwen2.attention.head_count u32 = 12
llama_model_loader: - kv 9: qwen2.attention.head_count_kv u32 = 2
llama_model_loader: - kv 10: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
llama_model_loader: - kv 11: general.file_type u32 = 7
llama_model_loader: - kv 12: general.quantization_version u32 = 2
llama_model_loader: - kv 13: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 14: tokenizer.ggml.pre str = qwen2
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,151936] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 17: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 151645
llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 151643
llama_model_loader: - kv 20: tokenizer.ggml.bos_token_id u32 = 151643
llama_model_loader: - kv 21: tokenizer.chat_template str = {%- if tools %}n {{- '<|im_start|>...
llama_model_loader: - type f32: 141 tensors
llama_model_loader: - type q8_0: 197 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type = Q8_0
print_info: file size = 1.53 GiB (8.50 BPW)
load: printing all EOG tokens:
load: - 151643 ('<|endoftext|>')
load: - 151645 ('<|im_end|>')
load: - 151662 ('<|fim_pad|>')
load: - 151663 ('<|repo_name|>')
load: - 151664 ('<|file_sep|>')
load: special tokens cache size = 22
load: token to piece cache size = 0.9310 MB
print_info: arch = qwen2
print_info: vocab_only = 0
print_info: n_ctx_train = 32768
print_info: n_embd = 1536
print_info: n_layer = 28
print_info: n_head = 12
print_info: n_head_kv = 2
print_info: n_rot = 128
print_info: n_swa = 0
print_info: is_swa_any = 0
print_info: n_embd_head_k = 128
print_info: n_embd_head_v = 128
print_info: n_gqa = 6
print_info: n_embd_k_gqa = 256
print_info: n_embd_v_gqa = 256
print_info: f_norm_eps = 0.0e+00
print_info: f_norm_rms_eps = 1.0e-06
print_info: f_clamp_kqv = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale = 0.0e+00
print_info: f_attn_scale = 0.0e+00
print_info: n_ff = 8960
print_info: n_expert = 0
print_info: n_expert_used = 0
print_info: causal attn = 1
print_info: pooling type = -1
print_info: rope type = 2
print_info: rope scaling = linear
print_info: freq_base_train = 10000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn = 32768
print_info: rope_finetuned = unknown
print_info: model type = 1.5B
print_info: model params = 1.54 B
print_info: general.name = Qwen15B Aros Dapt Merged
print_info: vocab type = BPE
print_info: n_vocab = 151936
print_info: n_merges = 151387
print_info: BOS token = 151643 '<|endoftext|>'
print_info: EOS token = 151645 '<|im_end|>'
print_info: EOT token = 151645 '<|im_end|>'
print_info: PAD token = 151643 '<|endoftext|>'
print_info: LF token = 198 'Ċ'
print_info: FIM PRE token = 151659 '<|fim_prefix|>'
print_info: FIM SUF token = 151661 '<|fim_suffix|>'
print_info: FIM MID token = 151660 '<|fim_middle|>'
print_info: FIM PAD token = 151662 '<|fim_pad|>'
print_info: FIM REP token = 151663 '<|repo_name|>'
print_info: FIM SEP token = 151664 '<|file_sep|>'
print_info: EOG token = 151643 '<|endoftext|>'
print_info: EOG token = 151645 '<|im_end|>'
print_info: EOG token = 151662 '<|fim_pad|>'
print_info: EOG token = 151663 '<|repo_name|>'
print_info: EOG token = 151664 '<|file_sep|>'
print_info: max token length = 256
load_tensors: loading model tensors, this can take a while... (mmap = true)
load_tensors: offloading 28 repeating layers to GPU
load_tensors: offloading output layer to GPU
load_tensors: offloaded 29/29 layers to GPU
load_tensors: ROCm0 model buffer size = 711.51 MiB
load_tensors: ROCm1 model buffer size = 853.12 MiB
load_tensors: CPU_Mapped model buffer size = 236.47 MiB
...........................................................................
llama_context: constructing llama_context
llama_context: n_seq_max = 1
llama_context: n_ctx = 4096
llama_context: n_ctx_per_seq = 4096
llama_context: n_batch = 2048
llama_context: n_ubatch = 512
llama_context: causal_attn = 1
llama_context: flash_attn = auto
llama_context: kv_unified = false
llama_context: freq_base = 10000.0
llama_context: freq_scale = 1
llama_context: n_ctx_per_seq (4096) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
llama_context: ROCm_Host output buffer size = 0.58 MiB
llama_kv_cache: ROCm0 KV buffer size = 60.00 MiB
llama_kv_cache: ROCm1 KV buffer size = 52.00 MiB
llama_kv_cache: size = 112.00 MiB ( 4096 cells, 28 layers, 1/1 seqs), K (f16): 56.00 MiB, V (f16): 56.00 MiB
llama_context: pipeline parallelism enabled (n_copies=4)
llama_context: Flash Attention was auto, set to enabled
llama_context: ROCm0 compute buffer size = 106.54 MiB
llama_context: ROCm1 compute buffer size = 339.80 MiB
llama_context: ROCm_Host compute buffer size = 35.05 MiB
llama_context: graph nodes = 959
llama_context: graph splits = 3
common_init_from_params: added <|endoftext|> logit bias = -inf
common_init_from_params: added <|im_end|> logit bias = -inf
common_init_from_params: added <|fim_pad|> logit bias = -inf
common_init_from_params: added <|repo_name|> logit bias = -inf
common_init_from_params: added <|file_sep|> logit bias = -inf
common_init_from_params: setting dry_penalty_last_n to ctx_size = 4096
common_init_from_params: warming up the model with an empty run - please wait ... (--no-warmup to disable)
main: llama threadpool init, n_threads = 64
main: chat template is available, enabling conversation mode (disable it with -no-cnv)
*** User-specified prompt will pre-start conversation, did you mean to set --system-prompt (-sys) instead?
main: chat template example:
<|im_start|>system
You are a helpful assistant<|im_end|>
<|im_start|>user
Hello<|im_end|>
<|im_start|>assistant
Hi there<|im_end|>
<|im_start|>user
How are you?<|im_end|>
<|im_start|>assistant
system_info: n_threads = 64 (n_threads_batch = 64) / 128 | ROCm : NO_VMM = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
main: interactive mode on.
sampler seed: 2247139198
sampler params:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 4096
top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist
generate: n_ctx = 4096, n_batch = 2048, n_predict = 400, n_keep = 0
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to the AI.
- To return control without starting a new line, end your input with '/'.
- If you want to submit another line, end your input with ''.
- Not using system message. To change it, set a different value via -sys PROMPT
To create a window in Zune, you can use the Zune master class 'Window' and its subclasses.
Here's a basic example of how you might create a window using Zune.
#include <libraries/mui.h>
#include <proto/muimaster.h>
/* Create a simple window */
static Object *create_window(void)
{
struct MUI_CustomClass *windowClass;
/* Create a window class */
windowClass = MUI_CreateCustomClass
(
NULL, MUIC_Window, NULL, sizeof(struct WindowData), &windowFuncs
);
/* Create a window object */
Object *window = windowClass->mcc_Class->mcc_SuperClass->msd_Object;
return window;
}
/* Show the window */
static void show_window(Object *window)
{
/* If the window isn't already open,
* we need to open it.
*/
if (window && !XGET(window, MUIA_Window_Open))))
{
DoMethod(window, MUIM_Notify, MUIA_Window_CloseRequest, TRUE, window);
set(window, MUIA_Window_Open, TRUE);
return;
}
/* If the window is already open, we need to hide it.
*/
if (window)
{
set(window, MUIA_Window_Open, FALSE);
}
/* Show the window */
set(window, MUIA_Window_Open, TRUE);
/* Return the window object */
return window;
}
int main(int argc, char *argv[])
{
Object *window = NULL;
if (argc > 1)
{
// Get the first command line argument and parse it
char *commandLineArg = argv[1];
ULONG windowFlags;
// Parse the argument and set the flags
if (strnicmp(commandLineArg, "OPENWINDOW", 17) == 0
Edited by amigamia on 28-04-2026 22:20, 10 days ago
3 users reacted to this post
deadwood, retrofaza, miker1264
CoolCat5000Junior Member
Posted 11 days agoHi,
Are you doing this on a home computer or using RunPod?
I am completely token addicted; right now I am waiting for my token quota to reset for Claude, and having a good model without restriction$ would be really awesome.
(I found several issues in the emulator, copper and memory, that I would love to fix, and that probably should give me the happy hand screen, but I will have to wait 😒)
About training on Amiga-land stuff, is this interactive sample good enough? Better than using something like context7?
In the end it's all context engineering; fine-tuning versus RAG is something I don't know how to measure the difference between. There are tons of tools out there for a better workflow that I have to test, but I wasn't thinking of fine-tuning anything, though if I did fine-tune I would prefer much broader training (an automated scale of datasets).
Regards,
Edited by CoolCat5000 on 28-04-2026 09:19, 11 days ago
terminillsMember
Posted 8 days ago@CoolCat5000 - Hi,
Are you doing this on a home computer or using RunPod?
I am completely token addicted; right now I am waiting for my token quota to reset for Claude, and having a good model without restriction$ would be really awesome.
(I found several issues in the emulator, copper and memory, that I would love to fix, and that probably should give me the happy hand screen, but I will have to wait 😒)
About training on Amiga-land stuff, is this interactive sample good enough? Better than using something like context7?
In the end it's all context engineering; fine-tuning versus RAG is something I don't know how to measure the difference between. There are tons of tools out there for a better workflow that I have to test, but I wasn't thinking of fine-tuning anything, though if I did fine-tune I would prefer much broader training (an automated scale of datasets).
Regards,
I do it at home, but I have multiple server-class GPUs... The main point of doing a 1.5B model right now is to make it semi-usable on AROS SMP via CPU. It takes roughly 20 hours to LoRA-train a 1.5B model. Also, LoRA training doesn't remove Linux or Windows code samples; it just lowers their weights while adding more. So is a 1.5B model enough? It may or may not be, but I plan on training up to a 70B model.
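As a back-of-the-envelope check of the CPU-only claim, the figures below are taken straight from the llama.cpp log above; the program itself is just illustrative arithmetic, not anything from the build:
#include <stdio.h>

/* Rough working-set estimate for CPU inference of the Q8_0 model,
 * using the values llama.cpp printed above: 1.54 B parameters at
 * 8.50 bits per weight, plus the f16 KV cache for a 4096-token context. */
int main(void)
{
    double params   = 1.54e9;            /* "model params = 1.54 B"             */
    double bpw      = 8.50;              /* "file size = 1.53 GiB (8.50 BPW)"   */
    double weights  = params * bpw / 8.0 / (1024.0 * 1024.0 * 1024.0);
    double kv_cache = 112.0 / 1024.0;    /* "llama_kv_cache: size = 112.00 MiB" */

    printf("weights  ~ %.2f GiB\n", weights);            /* ~1.53 GiB */
    printf("kv cache ~ %.2f GiB\n", kv_cache);           /* ~0.11 GiB */
    printf("total    ~ %.2f GiB\n", weights + kv_cache);
    return 0;
}
The whole working set stays under 2 GiB, which is why a 1.5B Q8_0 model is a realistic target for CPU-only inference on AROS SMP, while a 70B model at the same quantization would need on the order of 70 GiB.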
CoolCat5000Junior Member
Posted 7 days agoI don't know what others' opinion about narrator.device would be, but I think the idea of having TTS on Amiga could be updated with a newer library or even a neural network
https://github.co...smol-audio
2 users reacted to this post
aha, Argo
terminillsMember
Posted 1 day agoSlight breakthrough ... Woke up the ASIC on my Radeon cards.
For those who don't know what that means: it's the difference between a dumb framebuffer and 2D/3D acceleration on modern Radeon cards.
[PSP] FW upload: raw_host=000000004262e000 raw_iova=0x4262e000 aligned_host=0000000042700000 aligned_iova=0x42700000 delta=0xd2000 size=1048576 alloc=2097152
[PSP] Destroying existing ring
[PSP] Ring created: IOVA=0x42618000 size=0x10000
[PSP] Loading TOC (1536 bytes) at IOVA=0x42700000
[PSP] LOAD_TOC OK: returned TMR size=0xa00000
[PSP] TMR VRAM reserve: mc=0x877d000000 system_pa=0x4777d000000 bar_phys=0x4777d000000 flags=0x00000002 cpu=unmapped offset=0x77d000000 size=0xa00000 total=30704MB visible=32768MB fb_base=0x00000000 fb_top=0x00000000 fb_offset=0x00000000
[PSP] TMR setup OK: mc=0x877d000000 system_pa=0x4777d000000 size=0xa00000
[PSP] Parsing MEC firmware binary (268160 bytes)...
[PSP] FW header: ver=1.0 ucode_size=267904 ucode_offset=0x100 total=268160
[PSP] Loading MEC ucode (267904 bytes)...
[PSP] Loading firmware type=4 size=267904 at IOVA=0x42700000
[PSP] MEC firmware loaded successfully!
[PSP] Parsing RLC firmware binary (132960 bytes)...
[PSP] FW header: ver=2.4 ucode_size=25088 ucode_offset=0x100 total=132960
[PSP] Loading RLC ucode (25088 bytes)...
[PSP] Loading firmware type=8 size=25088 at IOVA=0x42700000
[PSP] RLC firmware loaded successfully!
[PSP] Parsing PFP firmware binary (263424 bytes)...
[PSP] FW header: ver=1.0 ucode_size=263168 ucode_offset=0x100 total=263424
[PSP] Loading PFP ucode (263168 bytes)...
[PSP] Loading firmware type=2 size=263168 at IOVA=0x42700000
[PSP] PFP firmware loaded successfully!
[PSP] Parsing ME firmware binary (263424 bytes)...
[PSP] FW header: ver=1.0 ucode_size=263168 ucode_offset=0x100 total=263424
[PSP] Loading ME ucode (263168 bytes)...
[PSP] Loading firmware type=1 size=263168 at IOVA=0x42700000
[PSP] ME firmware loaded successfully!
[PSP] Parsing CE firmware binary (263296 bytes)...
[PSP] FW header: ver=1.0 ucode_size=263040 ucode_offset=0x100 total=263296
[PSP] Loading CE ucode (263040 bytes)...
[PSP] Loading firmware type=3 size=263040 at IOVA=0x42700000
[PSP] CE firmware loaded successfully!
[PSP] Sending GFX_CMD_ID_AUTOLOAD_RLC (0x21) — final RLC autoload handshake
[PSP] Command 0x21 completed with error status=0xffff000d
[PSP] AUTOLOAD_RLC FAILED — RLC will not release CP, ring submit will hang
[PSP] === Post-firmware GPU Init ===
[PSP] RLC_CNTL = 0x00000001 (wrote 0x1)
[PSP] CP_MEC_DOORBELL_RANGE: lower=0x00000000 upper=0x000000f8
[PSP] SH_MEM_CONFIG=0x0000d000 SH_MEM_BASES=0x00020001
[PSP] === Post-firmware Init Complete ===
[PSP] === PSP Firmware Loading COMPLETE ===
[PSP-HIDD] PSP loader returned TRUE
[GFX] PSP load OK — bringup_path=PSP_SMU, verifying GFX wake...
[GFX] post-PSP probe: GRBM_STATUS=0x00003028 GRBM_STATUS2=0x00000006 CP_STAT=0x00000000
[GFX] post-PSP probe: grbm_reachable=1 cp_running=0
[GFX] CP wake: CP_ME_CNTL@0x86D8 before=0x15000000 writing 0x00000000 (clear CE/PFP/ME halt bits 24/26/28)
[GFX] CP wake: AFTER halt-clear: GRBM_STATUS=0xA0003028 GRBM_STATUS2=0x00000006 CP_STAT=0x80808000
[GFX] *** accel_enabled=1 — CP woke after halt-clear (PSP+MEC+RLC+CP up) ***
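To make the sequence above easier to follow, here is a minimal sketch of the bringup order the log shows. The function names are hypothetical stand-ins, not the actual AROS driver API; the command ID, halt bits, and status values are the ones printed in the log.
#include <stdio.h>
#include <stdbool.h>

/* Hypothetical stubs that only trace the order of operations in the log;
 * the real driver talks to the PSP ring and GRBM registers instead. */
static bool psp_load_toc(void)         { puts("LOAD_TOC OK, TMR size 0xa00000");   return true;  }
static bool psp_setup_tmr(void)        { puts("TMR reserved in VRAM");             return true;  }
static void psp_load_fw(const char *n) { printf("load %s ucode via PSP\n", n);                   }
static bool psp_autoload_rlc(void)     { puts("GFX_CMD_ID_AUTOLOAD_RLC (0x21)");   return false; /* log: status=0xffff000d */ }
static void cp_clear_halt_bits(void)   { puts("CP_ME_CNTL: clear CE/PFP/ME halt bits 24/26/28"); }

int main(void)
{
    const char *fw[] = { "MEC", "RLC", "PFP", "ME", "CE" };

    if (!psp_load_toc() || !psp_setup_tmr())
        return 1;

    /* Each engine's microcode is handed to the PSP in turn. */
    for (unsigned i = 0; i < sizeof(fw) / sizeof(fw[0]); i++)
        psp_load_fw(fw[i]);

    /* The autoload handshake failed in the log above (status 0xffff000d)... */
    if (!psp_autoload_rlc())
        puts("AUTOLOAD_RLC failed, falling back to a manual CP wake");

    /* ...so the command processor was woken directly by clearing its halt
     * bits, after which GRBM/CP_STAT read non-zero and accel_enabled=1.  */
    cp_clear_halt_bits();
    return 0;
}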
Edited by terminills on 07-05-2026 18:40, 1 day ago
5 users reacted to this post
deadwood, retrofaza, aha, Argo, x-vision
terminillsMember
Posted 12 hours agoAnd the compute ring is officially taking commands.
[RadeonsiVulkan] CreateInstance via vulkan.hidd dispatch (app='vk-compute-test'
[RadeonsiVulkan] RadeonsiInitRADVDriver()
[RadeonsiVulkan] RADV: PCI device 00007f5061c871c8 driver 00007f5061c652f8
[RadeonsiVulkan] RADV: === GPU Info ===
[RadeonsiVulkan] RADV: Vendor: 0x1002 Device: 0x73a1
[RadeonsiVulkan] RADV: Chip: Navi 21 (RDNA2) Generation: RDNA 2.0 (GFX10.3)
[RadeonsiVulkan] RADV: VRAM base: 0000047000000000 size: 32768 MB
[RadeonsiVulkan] RADV: MMIO base: 00007f5141e4a000 size: 512 KB
[RadeonsiVulkan] RADV: Headless: YES SubClass: 0x80
[RadeonsiVulkan] RADV: Initialized: YES Access: MMIO
[RadeonsiVulkan] RADV: ================
[RadeonsiVulkan] RADV: Instance created: 00007f50620e9180
[VFIOPCI] Found device 03:00.0, reading config reg 010
[VFIO] Reading 4 bytes from config space offset 0x10 (absolute 0x70000000010)
[VFIO] Successfully read 4 bytes from config space
[VFIO] *** BAR CONFIG: offset=0x10 size=4 => 0x0000000c
[VFIOPCI] ReadConfigLong 03:00.0 reg=010 => 0000000c
[VFIOPCI] Found device 03:00.0, reading config reg 014
[VFIO] Reading 4 bytes from config space offset 0x14 (absolute 0x70000000014)
[VFIO] Successfully read 4 bytes from config space
[VFIO] *** BAR CONFIG: offset=0x14 size=4 => 0x00000470
[VFIOPCI] ReadConfigLong 03:00.0 reg=014 => 00000470
[VFIOPCI] WriteConfigLong 03:00.0 reg=0x010 val=0xffffffff
[VFIO] Writing 4 bytes to config space offset 0x10 (absolute 0x70000000010)
[VFIO] *** BAR CONFIG WRITE: offset=0x10 size=4 <= 0xffffffff
[VFIO] Successfully wrote 4 bytes to config space
[VFIOPCI] WriteConfigLong 03:00.0 reg=0x014 val=0xffffffff
[VFIO] Writing 4 bytes to config space offset 0x14 (absolute 0x70000000014)
[VFIO] *** BAR CONFIG WRITE: offset=0x14 size=4 <= 0xffffffff
[VFIO] Successfully wrote 4 bytes to config space
[VFIOPCI] Found device 03:00.0, reading config reg 010
[VFIO] Reading 4 bytes from config space offset 0x10 (absolute 0x70000000010)
[VFIO] Successfully read 4 bytes from config space
[VFIO] *** BAR CONFIG: offset=0x10 size=4 => 0x0000000c
[VFIOPCI] ReadConfigLong 03:00.0 reg=010 => 0000000c
[VFIOPCI] Found device 03:00.0, reading config reg 014
[VFIO] Reading 4 bytes from config space offset 0x14 (absolute 0x70000000014)
[VFIO] Successfully read 4 bytes from config space
[VFIO] *** BAR CONFIG: offset=0x14 size=4 => 0xfffffff8
[VFIOPCI] ReadConfigLong 03:00.0 reg=014 => fffffff8
[VFIOPCI] WriteConfigLong 03:00.0 reg=0x010 val=0x0000000c
[VFIO] Writing 4 bytes to config space offset 0x10 (absolute 0x70000000010)
[VFIO] *** BAR CONFIG WRITE: offset=0x10 size=4 <= 0x0000000c
[VFIO] Successfully wrote 4 bytes to config space
[VFIOPCI] WriteConfigLong 03:00.0 reg=0x014 val=0x00000470
[VFIO] Writing 4 bytes to config space offset 0x14 (absolute 0x70000000014)
[VFIO] *** BAR CONFIG WRITE: offset=0x14 size=4 <= 0x00000470
[VFIO] Successfully wrote 4 bytes to config space
[RADV-WS] BAR0 (VRAM): addr=0x47000000000 size=32768MB
[VFIOPCI] Found device 03:00.0, reading config reg 018
[VFIO] Reading 4 bytes from config space offset 0x18 (absolute 0x70000000018)
[VFIO] Successfully read 4 bytes from config space
[VFIO] *** BAR CONFIG: offset=0x18 size=4 => 0x0180000c
[VFIOPCI] ReadConfigLong 03:00.0 reg=018 => 0180000c
[VFIOPCI] Found device 03:00.0, reading config reg 01c
[VFIO] Reading 4 bytes from config space offset 0x1c (absolute 0x7000000001c)
[VFIO] Successfully read 4 bytes from config space
[VFIO] *** BAR CONFIG: offset=0x1c size=4 => 0x00000478
[VFIOPCI] ReadConfigLong 03:00.0 reg=01c => 00000478
[VFIOPCI] WriteConfigLong 03:00.0 reg=0x018 val=0xffffffff
[VFIO] Writing 4 bytes to config space offset 0x18 (absolute 0x70000000018)
[VFIO] *** BAR CONFIG WRITE: offset=0x18 size=4 <= 0xffffffff
[VFIO] Successfully wrote 4 bytes to config space
[VFIOPCI] WriteConfigLong 03:00.0 reg=0x01c val=0xffffffff
[VFIO] Writing 4 bytes to config space offset 0x1c (absolute 0x7000000001c)
[VFIO] *** BAR CONFIG WRITE: offset=0x1c size=4 <= 0xffffffff
[VFIO] Successfully wrote 4 bytes to config space
[VFIOPCI] Found device 03:00.0, reading config reg 018
[VFIO] Reading 4 bytes from config space offset 0x18 (absolute 0x70000000018)
[VFIO] Successfully read 4 bytes from config space
[VFIO] *** BAR CONFIG: offset=0x18 size=4 => 0xffe0000c
[VFIOPCI] ReadConfigLong 03:00.0 reg=018 => ffe0000c
[VFIOPCI] Found device 03:00.0, reading config reg 01c
[VFIO] Reading 4 bytes from config space offset 0x1c (absolute 0x7000000001c)
[VFIO] Successfully read 4 bytes from config space
[VFIO] *** BAR CONFIG: offset=0x1c size=4 => 0xffffffff
[VFIOPCI] ReadConfigLong 03:00.0 reg=01c => ffffffff
[VFIOPCI] WriteConfigLong 03:00.0 reg=0x018 val=0x0180000c
[VFIO] Writing 4 bytes to config space offset 0x18 (absolute 0x70000000018)
[VFIO] *** BAR CONFIG WRITE: offset=0x18 size=4 <= 0x0180000c
[VFIO] Successfully wrote 4 bytes to config space
[VFIOPCI] WriteConfigLong 03:00.0 reg=0x01c val=0x00000478
[VFIO] Writing 4 bytes to config space offset 0x1c (absolute 0x7000000001c)
[VFIO] *** BAR CONFIG WRITE: offset=0x1c size=4 <= 0x00000478
[VFIO] Successfully wrote 4 bytes to config space
[RADV-WS] BAR2 (Doorbell): addr=0x47801800000 size=2048KB
[MMIO] MapPhysical: phys=00000000f7c00000, length=524288, flags=00000000
[MMIO] MapPhysical: Found registered mapping, phys=00000000f7c00000 -> virt=00007f5141e4a000 (offset 0x0 in slot 2)
[RADV-WS] MMIO mapped: phys=00000000f7c00000 virt=00007f5141e4a000 size=512KB
[RADV-WS] NBIO doorbell BEFORE: APER_EN=0x00000000 SELFRING_CNTL=0x00000100 FENCE=0x00000000
[RADV-WS] NBIO doorbell AFTER: APER_EN=0x00000001 SELFRING_CNTL=0x00000003 FENCE=0x00000000
[RADV-WS] NBIO SELFRING base: lo=0x01800000 hi=0x00000478 (db_pa=0x47801800000)
[RADV-WS] PCI backend: VFIOPCI [VFIO] [SR-IOV/VF]
[RADV-WS] Created winsys for AMD 1002:73a1 at 03:00.0 [HEADLESS] [VFIO] [SR-IOV/VF]
[RADV-WS] VRAM: 32768MB Doorbell: 2048KB MMIO: 512KB
[RADV-WS] === Compute Ring Init (GFX10.3) ===
[RADV-WS] Using GFX10 register table
[PSP] FB_LOCATION_BASE=0x047000 -> MC 0x47000000000 outside PSP 40-bit window; using GFX10 default 0x8000000000
[VFIOPCI] MapPCI addr=0000047002000000 len=77824
[EXEC] RamLib: OpenLibrary("mmio.library", 1)
[EXEC] RamLib: OpenLibrary("mmio.library", 1) = 00007f5061e15440
[MMIO] RegisterMapping: Invalid parameters
[VFIOPCI] MapPCI: 0000047002000000 -> 00007f4863b3b000 (BAR0, dev 03:00.0)
[RADV-WS] VRAM queue buffers mapped:
[RADV-WS] BAR phys=0000047002000000 cpu=00007f4863b3b000 mapper=PCI offset=0x2000000 size=0x13000 mc_base=0x8000000000
[RADV-WS] Ring: cpu=00007f4863b3b000 mc=0x8002000000 size=0x10000
[RADV-WS] EOP: cpu=00007f4863b4b000 mc=0x8002010000 size=0x1000
[RADV-WS] WPTR: cpu=00007f4863b4c000 mc=0x8002011000 size=0x1000
[RADV-WS] RPTR: mc=0x8002011100
[RADV-WS] MQD: cpu=00007f4863b4d000 mc=0x8002012000 size=0x1000
[MMIO] MapPhysical: phys=0000047801800000, length=2097152, flags=00000000
[MMIO] MapPhysical: Found registered mapping, phys=0000047801800000 -> virt=00007f486193b000 (offset 0x0 in slot 1)
[RADV-WS] Doorbells mapped: base=00007f486193b000 KIQ offset=0x0 -> 00007f486193b000 compute offset=0x18 -> 00007f486193b018
[RADV-WS] CP_MEC_DOORBELL_RANGE: lower=0x00000000 upper=0x00000450 (KIQ=0x0 compute=0x18)
[RADV-WS] RLC_CP_SCHEDULERS scheduler0=KIQ selector=0xc8 before=0x585048c8 after=0x585048c8 readback=0x585048c8
[RADV-WS] Compute init path: psp_already_loaded=0 (skip MEC restart + GFXHUB reinit if 1)
[RADV-WS] CP_MEC_CNTL = 0x00000000 (offset 0x86d4) MEC_ME1_HALT=0 MEC_ME2_HALT=0
[RADV-WS] CP_MEC_CNTL = 0x00000000 (offset 0x86d4)
[RADV-WS] MEC_ME1_HALT = 0, MEC_ME2_HALT = 0
[RADV-WS] GRBM_STATUS2 = 0x00000000 (CPC_BUSY=0 CPF_BUSY=0 RLC_BUSY=0)
[RADV-WS] CP_CPC_STATUS = 0x00000000 (MEC1_BUSY=0 MEC2_BUSY=0)
[RADV-WS] CP_CPC_BUSY_STAT = 0x00000000
[RADV-WS] CP_CPC_STALLED = 0x00000000
[RADV-WS] Halting MEC for firmware restart...
[RADV-WS] CP_MEC_CNTL after halt = 0x50000000
[RADV-WS] Unhalting MEC (firmware restarts from address 0)...
[RADV-WS] CP_MEC_CNTL after unhalt = 0x00000000
[RADV-WS] CP_CPC_STATUS (post-restart) = 0x00000000 (MEC1_BUSY=0)
[RADV-WS] Current HQD_ACTIVE = 0x00000001
[RADV-WS] Queue already active — dequeuing first
[RADV-WS] Graceful dequeue timed out — forcing hard disable
[RADV-WS] HQD_ACTIVE after hard disable = 0x00000000
[RADV-WS] === Initializing GFXHUB VM ===
[RADV-WS] GCVM_CONTEXT0_CNTL (cold): before=0x007ffe81 after=0x007ffe81
[RADV-WS] FB_LOCATION_BASE = 0x00047000 (wrote 0x00047000) -> VRAM 0x47000000000
[RADV-WS] FB_LOCATION_TOP = 0x000477ff (wrote 0x000477ff) -> VRAM end 0x477ff000000
[RADV-WS] SYS_APERTURE_LOW = 0x00000000 (wrote 0x00000000) -> 0x0
[RADV-WS] SYS_APERTURE_HIGH = 0x00004000 (wrote 0x00004000) -> 0x100000000
[RADV-WS] L1_TLB_CNTL = 0x00001859 (wrote 0x00001859)
[RADV-WS] CONTEXT0_CNTL = 0x007ffe81 (wrote 0x00000001)
[RADV-WS] L2_CNTL = 0x00080801 (wrote 0x00080801)
[RADV-WS] ================================
[RADV-WS] HDP flush (normal-mqd): req 0x00000000->0x00000004 done 0x00000000->0x00000004 mask=0x00000004 polls=0 memsize=0x000077f0 OK
[RADV-WS] === HQD Register Dump (before activate) ===
[RADV-WS] MQD_BASE = 0x00000080_02012000 (mqd iova=0x8002012000)
[RADV-WS] PQ_BASE = 0x00000000_80020000 (ring iova=0x8002000000)
[RADV-WS] PQ_CONTROL = 0xd000860d
[RADV-WS] DOORBELL_CTL = 0x40000018
[RADV-WS] WPTR_POLL = 0x00000080_02011000 (wptr iova=0x8002011000)
[RADV-WS] RPTR_REPORT = 0x00000080_02011100 (rptr iova=0x8002011100)
[RADV-WS] EOP_BASE = 0x00000000_80020100
[RADV-WS] EOP_CONTROL = 0x00000005
[RADV-WS] VMID = 0x00000000
[RADV-WS] PQ_RPTR = 0x00000000
[RADV-WS] PQ_WPTR = 0x00000000_00000000
[RADV-WS] PERSISTENT = 0x00005300
[RADV-WS] QUANTUM = 0x80000a11
[RADV-WS] MQD_CONTROL = 0x00000000
[RADV-WS] IB_CONTROL = 0x00300000
[RADV-WS] ================================================
[RADV-WS] Activating compute queue directly (KIQ path unavailable/failed)...
[RADV-WS] CP_PQ_STATUS doorbell enable: before=0x00000003 after=0x00000003
[RADV-WS] HQD_ACTIVE after enable = 0x00000001
[RADV-WS] === GFXHUB VM Diagnostic ===
[RADV-WS] GRBM_STATUS = 0xa0003028
[RADV-WS] GCMC_VM_FB_LOCATION_BASE = 0x00047000 (VRAM start = 0x47000000000)
[RADV-WS] GCMC_VM_FB_LOCATION_TOP = 0x000477ff (VRAM end = 0x477ff000000)
[RADV-WS] GCMC_VM_SYS_APERTURE_LOW = 0x00000000 (low = 0x0)
[RADV-WS] GCMC_VM_SYS_APERTURE_HIGH = 0x00004000 (high = 0x100000000)
[RADV-WS] GCVM_CONTEXT0_CNTL = 0x007ffe81
[RADV-WS] GCVM_L2_PROT_FAULT_STATUS = 0x00000992
[RADV-WS] Ring IOVA = 0x8002000000 WPTR IOVA = 0x8002011000
[RADV-WS] ================================
[RADV-WS] === Compute Ring Init SUCCESS ===
[RADV-WS] === GRBM Context Selection Sanity Test (HQD_VMID) ===
[RADV-WS] NOTE: GRBM_GFX_CNTL is WRITE-ONLY (readback=0 is normal)
[RADV-WS] q0 wrote VMID=0x5, read 0x5 OK
[RADV-WS] q1 wrote VMID=0xA, read 0xa0a FAIL
[RADV-WS] q0 readback after q1 write = 0x5 OK (per-queue HQD context isolated)
[RADV-WS] saved q0 VMID = 0x0 (restored)
[RADV-WS] ================================
[RADV-WS] === NOP Proof-of-Life Test ===
[RADV-WS] Ring buffer head (pre-submit), wptr=0:
[RADV-WS] ring[0] = 0x00000000
[RADV-WS] ring[1] = 0x00000000
[RADV-WS] ring[2] = 0x00000000
[RADV-WS] ring[3] = 0x00000000
[RADV-WS] ring[4] = 0x00000000
[RADV-WS] ring[5] = 0x00000000
[RADV-WS] ring[6] = 0x00000000
[RADV-WS] ring[7] = 0x00000000
[RADV-WS] HDP flush (pre-doorbell): req 0x00000004->0x00000004 done 0x00000004->0x00000004 mask=0x00000004 polls=0 memsize=0x000077f0 OK
[RADV-WS] NOP submitted: wptr=2
[RADV-WS] Doorbell kicked (64-bit) at 00007f486193b018
[RADV-WS] WPTR also written via MMIO: HQD_WPTR=0x00000000_00000002
[RADV-WS] DOORBELL_CTL pre-kick =0x40000018 post-kick=0xc0000018
[RADV-WS] DOORBELL_CTL bits: EN=1 HIT=1 SCHD_HIT=0 SOURCE=0
[RADV-WS] DOORBELL_HIT (bit31) 0 -> 1 OK (MEC saw the kick)
[RADV-WS] RPTR report = 0 (expected 2)
[RADV-WS] Ring buffer head (post-submit), wptr=2:
[RADV-WS] ring[0] = 0xc0001000
[RADV-WS] ring[1] = 0xdeadbeef
[RADV-WS] ring[2] = 0x00000000
[RADV-WS] ring[3] = 0x00000000
[RADV-WS] ring[4] = 0x00000000
[RADV-WS] ring[5] = 0x00000000
[RADV-WS] ring[6] = 0x00000000
[RADV-WS] ring[7] = 0x00000000
[RADV-WS] NOP NOT consumed: RPTR=0 expected=2 after 500000 polls
[RADV-WS] RPTR memory dump: [0x0]=00000000 [0x4]=00000000 [0x8]=00000000 [0xC]=00000000
[RADV-WS] EOP memory dump: [0x0]=00000000 [0x4]=00000000 [0x8]=00000000 [0xC]=00000000
[RADV-WS] HQD_PQ_RPTR (MMIO) = 0x00000000
[RADV-WS] HQD_ACTIVE (MMIO) = 0x00000001
[RADV-WS] GCVM_L2_PROT_FAULT_STATUS = 0x00000992
[RADV-WS] ================================
[RADV-WS] Compute ring init: SUCCESS
[RADV-PD] step1 sync_types ok ptr=00007f50646b2e10
[RADV-WS] GPU: PCI 03:00.0 device=0x73a1 family=0x8f rev=0x28 GFX10.3
[RADV-WS] GB_ADDR_CONFIG = 0x00000444 (reg=0x98f8)
[RADV-WS] Probed topology: 1 SE, 2 SA/SE, 1 RB, 9 CU, 1 TCC
[RADV-WS] Identified: AMD NAVI21 (family=80 gfx_level=13 vram=32768M
[RADV-PD] step2 query_info ok family=80 gfx_level=13 max_align=0 vram_kb=33554432
[RADV-PD] step3 addrlib_create -> 00007f50620f8540
[RADV-PD] step4 radv_is_gpu_supported ok family=80
[RADV-PD] step6 init_cache_key ok
[RADV-PD] step7 get_nir_options ok
[RADV-PD] step8 cache_uuid ok
[RADV-PD] step9 mesa_bytes_to_hex ok
[RADV-PD] step10 disk_cache skipped on AROS
[RADV-PD] step11 -> get_physical_device_properties
[RADV-PD] step12 properties ok
[RADV-PD] step13 decoder ok
[RADV-PD] step14 encoder ok
[RADV-PD] step15 queue_table ok
[RADV-PD] step16 perfcounters ok
[RADV-PD] step17 init_wsi -> 0
[RADV-PD] step99 try_create reached SUCCESS tail
[RadeonsiVulkan] RADV: Found 1 physical device(s)
[RadeonsiVulkan] RADV: Physical device: 00007f50620eae60
[RadeonsiVulkan] CreateDevice via vulkan.hidd dispatch (physDev=00007f50620e9180 pCreateInfo=0000000000000000)
[RADV-WS] ctx_create: 00007f50620f90b0 priority=1
[RADV-WS] cs_create: 00007f50620f90e0 ip=0
[RADV-WS] buffer_create: 00007f5062115e90 size=4096 VA=0x100000000 domain=2
[RadeonsiVulkan] CreateDevice: device 00007f50620f92c0
[RadeonsiVulkan] GetDeviceQueue2(dev=00007f50620f92c0 family=0 idx=0)
[RadeonsiVulkan] GetDeviceQueue2 -> 00007f506210fbe0
[RadeonsiVulkan] CreateShaderModule(dev=00007f50620f92c0 codeSize=140)
[RadeonsiVulkan] CreateShaderModule -> 00007f50621115d0 (result=0)
[RadeonsiVulkan] CreatePipelineLayout(dev=00007f50620f92c0 setLayouts=0 pushConst=0)
[RadeonsiVulkan] CreatePipelineLayout -> 00007f5062111890 (result=0)
[RadeonsiVulkan] CreateComputePipelines(dev=00007f50620f92c0 count=1)
[RADV-WS] buffer_create: 00007f50621125f0 size=262144 VA=0x100001000 domain=4
[RadeonsiVulkan] CreateComputePipelines -> result=0 pipeline[0]=00007f5062111e50
[RadeonsiVulkan] CreateCommandPool(dev=00007f50620f92c0 family=0 flags=0x2)
[RadeonsiVulkan] CreateCommandPool -> 00007f5062114d10 (result=0)
[RadeonsiVulkan] AllocateCommandBuffers(dev=00007f50620f92c0 pool=00007f5062114d10 level=0 count=1)
[RADV-WS] cs_create: 00007f5062114a60 ip=0
[RadeonsiVulkan] AllocateCommandBuffers -> result=0 cmd[0]=00007f5062116f90
[RadeonsiVulkan] BeginCommandBuffer(cmd=00007f5062116f90 flags=0x1)
[RADV-WS] buffer_create: 00007f5062115cb0 size=16384 VA=0x100041000 domain=2
[RadeonsiVulkan] BeginCommandBuffer -> 0
[RadeonsiVulkan] CmdBindPipeline(cmd=00007f5062116f90 bindPoint=1 pipeline=00007f5062111e50)
[RadeonsiVulkan] CmdDispatch(cmd=00007f5062116f90 groups=1,1,1)
[RadeonsiVulkan] EndCommandBuffer(cmd=00007f5062116f90)
[RadeonsiVulkan] EndCommandBuffer -> 0
[RadeonsiVulkan] QueueSubmit(queue=00007f506210fbe0 submitCount=1 fence=0000000000000000)
[RadeonsiVulkan] QueueSubmit -> 0
[RadeonsiVulkan] QueueWaitIdle(queue=00007f506210fbe0)
[RADV-WS] cs_submit: ip=0 cs_count=0
[RADV-WS] cs_submit: no command streams, returning VK_SUCCESS
[RadeonsiVulkan] QueueWaitIdle -> 0
[RadeonsiVulkan] FreeCommandBuffers(dev=00007f50620f92c0 pool=00007f5062114d10 count=1)
[RADV-WS] buffer_destroy: 00007f5062115cb0 size=16384
[RadeonsiVulkan] DestroyCommandPool(dev=00007f50620f92c0 pool=00007f5062114d10)
[RadeonsiVulkan] DestroyPipeline(dev=00007f50620f92c0 pipeline=00007f5062111e50)
[RADV-WS] buffer_destroy: 00007f50621125f0 size=262144
[RadeonsiVulkan] DestroyPipelineLayout(dev=00007f50620f92c0 layout=00007f5062111890)
[RadeonsiVulkan] DestroyShaderModule(dev=00007f50620f92c0 module=00007f50621115d0)
[RadeonsiVulkan] DestroyDevice via vulkan.hidd dispatch
[RADV-WS] buffer_destroy: 00007f5062115e90 size=4096
[RADV-WS] ctx_destroy: 00007f50620f90b0
[RadeonsiVulkan] RADV device destroyed
[RadeonsiVulkan] DestroyInstance via vulkan.hidd dispatch
[RadeonsiVulkan] RadeonsiCleanupRADVForGPU()
[RADV-WS] Destroying winsys for PCI 1002:73a1
[MMIO] UnmapPhysical: addr=00007f5141e4a000, length=524288
[RadeonsiVulkan] RADV instance destroyed
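For anyone decoding the NOP proof-of-life part of that log, here is a minimal sketch (plain C, not the actual winsys code) of what the submitted packet amounts to, using only the values visible in the ring dump above:
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* PM4 type-3 packet header: bits 31:30 = 3 (type-3), bits 29:16 = count
 * (payload dwords minus one), bits 15:8 = opcode. Opcode 0x10 is NOP,
 * which gives header 0xc0001000 for a two-dword packet, exactly what the
 * post-submit ring dump shows. */
#define PM4_TYPE3(op, count) (0xC0000000u | (((count) & 0x3FFFu) << 16) | (((op) & 0xFFu) << 8))
#define PM4_NOP 0x10u

int main(void)
{
    uint32_t ring[8] = { 0 };   /* stand-in for the compute ring in VRAM */
    uint32_t wptr = 0;

    /* Two-dword NOP: header plus one payload dword (the 0xdeadbeef marker). */
    ring[wptr++] = PM4_TYPE3(PM4_NOP, 0);
    ring[wptr++] = 0xdeadbeefu;

    printf("ring[0]=0x%08" PRIx32 " ring[1]=0x%08" PRIx32 " wptr=%" PRIu32 "\n",
           ring[0], ring[1], wptr);

    /* On the hardware the new wptr (2) is then written to the compute
     * doorbell (the 64-bit kick at doorbell offset 0x18 in the log) and
     * MEC is expected to advance RPTR to match. In the log the doorbell
     * HIT bit does set, but RPTR stays at 0: the packet is posted and
     * seen, just not consumed yet. */
    return 0;
}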
4 users reacted to this post
retrofaza, aha, Argo, x-vision
