Configuration Parsing Warning: In adapter_config.json: "peft.task_type" must be a string

Uploaded Llama-3.1-Swallow-JP-EN-Translator-v1-LoRA-8B model

Prompt format: ChatML

Recommended system prompt: You are a helpful assistant that translates Japanese to English.

Recommended sampling settings: temperature 0.5 (or lower), repetition penalty 1.04 (or higher if needed)

Merged model: mpasila/Llama-3.1-Swallow-JP-EN-Translator-v1-8B

Training used LoRA rank 128 and alpha set to 32. Context length was set to 16384. But the there's more data in 8k context length so using 8k context length will likely perform better.

Training data was this: mpasila/ParallelFiction-Ja_En-1k-16k-Gemma-3-ShareGPT-Filtered

Original dataset (before filtering/cleaning): NilanE/ParallelFiction-Ja_En-100k

Developed by: mpasila
License: Llama 3.3 and Gemma
Finetuned from model : tokyotech-llm/Llama-3.1-Swallow-8B-v0.5

This llama model was trained 2x faster with Unsloth and Huggingface's TRL library.

Downloads last month: 5

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for mpasila/Llama-3.1-Swallow-JP-EN-Translator-v1-LoRA-8B

Base model

meta-llama/Llama-3.1-8B

Finetuned

meta-llama/Llama-3.1-8B-Instruct

Finetuned

tokyotech-llm/Llama-3.1-Swallow-8B-v0.5

Adapter

(1)

this model

mpasila
/

Llama-3.1-Swallow-JP-EN-Translator-v1-LoRA-8B

Uploaded Llama-3.1-Swallow-JP-EN-Translator-v1-LoRA-8B model

Model tree for mpasila/Llama-3.1-Swallow-JP-EN-Translator-v1-LoRA-8B

Datasets used to train mpasila/Llama-3.1-Swallow-JP-EN-Translator-v1-LoRA-8B