Tempo14's Collections: quantization
- BiLLM: Pushing the Limit of Post-Training Quantization for LLMs • arXiv 2402.04291 • 50 upvotes
- OneBit: Towards Extremely Low-bit Large Language Models • arXiv 2402.11295 • 24 upvotes
- A Survey on Transformer Compression • arXiv 2402.05964 • 1 upvote
- Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers • arXiv 2402.08958 • 6 upvotes
- BitDelta: Your Fine-Tune May Only Be Worth One Bit • arXiv 2402.10193 • 22 upvotes
- GPTVQ: The Blessing of Dimensionality for LLM Quantization • arXiv 2402.15319 • 22 upvotes
- EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs • arXiv 2403.02775 • 13 upvotes
- 4-bit Shampoo for Memory-Efficient Network Training • arXiv 2405.18144 • 12 upvotes
- PrefixQuant: Static Quantization Beats Dynamic through Prefixed Outliers in LLMs • arXiv 2410.05265 • 33 upvotes
- BitNet a4.8: 4-bit Activations for 1-bit LLMs • arXiv 2411.04965 • 69 upvotes
- "Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM Quantization • arXiv 2411.02355 • 51 upvotes
- NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks • arXiv 2410.20650 • 17 upvotes
- BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments • arXiv 2410.23918 • 21 upvotes