Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs
Paper • 2402.12030
The ULD loss, based on optimal transport, enables distillation across different LLM families without requiring shared tokenizers.
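The summary above does not spell out how the loss is computed, so the following is only an illustrative sketch and not a statement of the paper's implementation. It assumes that a transport-style distance between two next-token distributions can be taken without aligning token identities, by sorting each probability vector in descending order, zero-padding the shorter one, and summing the absolute differences. The function names and the toy logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sorted_prob_distance(student_logits, teacher_logits):
    """L1 distance between descending-sorted, zero-padded probability vectors.

    Assumption (not taken from the summary): token identities are ignored,
    so the two vocabularies may differ in size and ordering.
    """
    p_s = sorted(softmax(student_logits), reverse=True)
    p_t = sorted(softmax(teacher_logits), reverse=True)
    size = max(len(p_s), len(p_t))
    # Pad the smaller vocabulary with zero-probability entries.
    p_s += [0.0] * (size - len(p_s))
    p_t += [0.0] * (size - len(p_t))
    return sum(abs(a - b) for a, b in zip(p_s, p_t))


# Hypothetical logits for one position: a 3-token student vocabulary
# and a 5-token teacher vocabulary.
student = [2.0, 0.5, -1.0]
teacher = [0.1, 1.9, -0.8, 0.4, -2.0]
distance = sorted_prob_distance(student, teacher)
print(f"distance = {distance:.4f}")
```

Because the comparison uses only sorted probabilities, the value is unchanged if either vocabulary is permuted, which is the property that lets the two models use different tokenizers in this sketch. In a training setup this term would typically be added to the usual cross-entropy loss with a weighting factor, which is also an assumption here.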