---
title: VRAM Calculator
emoji: 🧮
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
license: mit
short_description: Calculate VRAM requirements for HuggingFace models
tags:
- vram
- gpu
- inference
- deployment
---

# 🧮 VRAM & Instance Type Calculator

Estimate GPU memory requirements for any HuggingFace model and get cloud instance recommendations.

## Features

- **Automatic model analysis**: Fetches parameter count, dtype, and architecture from HF Hub
- **KV cache estimation**: Calculates memory for different context lengths
- **GPU recommendations**: Shows which GPUs can run the model (RTX 3090 → H100; see the sketch below)
- **Cloud instance mapping**: Suggests AWS/GCP instance types with pricing
- **Quantization guidance**: Suggests INT8/INT4 options for large models
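
The recommendation step boils down to a table lookup. Here is a minimal sketch, assuming a hypothetical `GPU_SPECS` table and `recommend_gpus` helper: the VRAM capacities are each card's published size, but the code itself is illustrative, not the app's actual implementation:

```python
# Hypothetical GPU table for illustration; VRAM capacities (GiB) are the
# published figures for each card.
GPU_SPECS = {
    "RTX 3090": 24,
    "RTX 4090": 24,
    "A10G": 24,
    "L40S": 48,
    "A100 80GB": 80,
    "H100 80GB": 80,
}

def recommend_gpus(required_gib: float, headroom: float = 0.9) -> list[str]:
    """Return GPUs whose VRAM, minus a 10% safety margin, covers the estimate."""
    return [name for name, vram in GPU_SPECS.items() if vram * headroom >= required_gib]

print(recommend_gpus(18.0))  # all 24 GiB cards and up qualify
```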

## How it works

1. Fetches `safetensors` metadata for parameter count and dtype
2. Downloads `config.json` for architecture details (layers, hidden size, KV heads)
3. Calculates:
   - Model weights: `params × dtype_bytes`
   - KV cache: `2 × layers × batch × seq_len × kv_heads × head_dim × dtype_bytes`
   - Overhead: adds ~15% for activations
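
The sketch below strings these steps together. `get_safetensors_metadata` and `hf_hub_download` are real `huggingface_hub` helpers; the config field names assume a Llama-style `config.json`, and the whole function is an approximation under the formulas above, not the app's exact code:

```python
import json
from huggingface_hub import get_safetensors_metadata, hf_hub_download

DTYPE_BYTES = {"F32": 4, "F16": 2, "BF16": 2, "I8": 1}  # safetensors dtype names

def estimate_vram_gib(repo_id: str, batch: int = 1, seq_len: int = 4096) -> float:
    # 1. Parameter count and dominant dtype from safetensors metadata
    #    (no weight download needed).
    meta = get_safetensors_metadata(repo_id)
    params = sum(meta.parameter_count.values())
    dtype = max(meta.parameter_count, key=meta.parameter_count.get)
    dtype_bytes = DTYPE_BYTES.get(dtype, 2)

    # 2. Architecture details from config.json (Llama-style field names assumed).
    with open(hf_hub_download(repo_id, "config.json")) as f:
        cfg = json.load(f)
    layers = cfg["num_hidden_layers"]
    heads = cfg["num_attention_heads"]
    kv_heads = cfg.get("num_key_value_heads", heads)  # GQA models have fewer
    head_dim = cfg["hidden_size"] // heads

    # 3. Weights + KV cache (the factor 2 covers one K and one V tensor per
    #    layer), then ~15% overhead for activations.
    weights = params * dtype_bytes
    kv_cache = 2 * layers * batch * seq_len * kv_heads * head_dim * dtype_bytes
    return (weights + kv_cache) * 1.15 / 2**30

# Mistral-7B in BF16: ~13.5 GiB of weights plus ~0.5 GiB of KV cache at a
# 4k context (GQA keeps the cache small), before the 15% overhead.
print(f"{estimate_vram_gib('mistralai/Mistral-7B-v0.1'):.1f} GiB")
```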

## Limitations

- Estimates are for inference only, not training
- Actual VRAM varies by serving framework (vLLM vs. TGI vs. vanilla Transformers)
- GGUF/quantized models have different memory profiles
- Does not account for tensor parallelism across multiple GPUs

## Usage

Use the hosted Space directly, or run it locally:

```bash
pip install gradio huggingface_hub
python app.py
```

## Contributing

PRs welcome! Ideas for improvement:

- Add support for GGUF models
- Include throughput estimates
- Add more cloud providers (Azure, Lambda Labs)
- Support tensor parallelism calculations