jerryzh168 committed
Commit a939a34 · verified · Parent(s): 9ccf45e

Update README.md

Files changed (1): README.md (+5 -5)
README.md CHANGED
@@ -19,7 +19,7 @@ language:



- Calibrated with 30 samples of `mmlu_philosophy`, got eval accuracy of 81.35, while gemma-3-27b-it-INT4 is 77.17, and bfloat16 baseline is 79.42
+ Calibrated with 30 samples of `mmlu_philosophy`, got eval accuracy of 80.06, while gemma-3-27b-it-INT4 is 77.17, and bfloat16 baseline is 79.42


# Inference with vLLM
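
For context on the `# Inference with vLLM` section this hunk borders, a minimal offline-generation sketch with vLLM's Python API; the prompt and sampling settings are illustrative, and any checkpoint-specific loading options the README may pass are not reproduced here:

```python
# Minimal sketch: offline inference on the AWQ-INT4 checkpoint with vLLM.
# Prompt and sampling parameters are illustrative, not taken from the README;
# assumes torchao is installed so the quantized weights deserialize.
from vllm import LLM, SamplingParams

llm = LLM(model="pytorch/gemma-3-27b-it-AWQ-INT4")
sampling = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=128)

outputs = llm.generate(["Briefly explain epistemology."], sampling)
print(outputs[0].outputs[0].text)
```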
@@ -221,7 +221,7 @@ We rely on [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-h
| Benchmark | | | |
|----------------------------------|------------------------|--------------------------------|---------------------------------|
| | google/gemma-3-27b-it | jerryzh168/gemma-3-27b-it-INT4 | pytorch/gemma-3-27b-it-AWQ-INT4 |
- | philosophy | 79.42 | 77.17 | 81.35 |
+ | philosophy | 79.42 | 77.17 | 80.06 |


Note: jerryzh168/gemma-3-27b-it-INT4 is the H100 optimized checkpoint for INT4
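
The accuracy row above is produced with lm-evaluation-harness (the README's CLI invocation appears truncated in the next hunk header). A rough Python-API equivalent, assuming the same `mmlu_philosophy` task name used for calibration:

```python
# Rough equivalent of the README's lm_eval CLI run, via the Python API.
# Task name mmlu_philosophy matches the calibration set mentioned above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=pytorch/gemma-3-27b-it-AWQ-INT4",
    tasks=["mmlu_philosophy"],
    device="cuda:0",
)
print(results["results"]["mmlu_philosophy"])
```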
@@ -255,7 +255,7 @@ lm_eval --model hf --model_args pretrained=$MODEL --tasks mmlu --device cuda:0 -
| Benchmark | | | |
|----------------------------------|------------------------|--------------------------------|---------------------------------|
| | google/gemma-3-27b-it | jerryzh168/gemma-3-27b-it-INT4 | pytorch/gemma-3-27b-it-AWQ-INT4 |
- | Peak Memory (GB) | 55.01 | 17.21 (69% reduction) | 28.82 (48% reduction) |
+ | Peak Memory (GB) | 55.01 | 17.21 (69% reduction) | 27.66 (50% reduction) |

Note: jerryzh168/gemma-3-27b-it-INT4 is the H100 optimized checkpoint for INT4
 
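The peak-memory figures come from a script whose last line shows in the next hunk header (`print(f"Peak Memory Usage: {mem:.02f} GB")`). A self-contained sketch of that measurement pattern; the exact model class and whether the README reads the reserved or allocated counter are assumptions:

```python
# Sketch of the peak-memory measurement pattern; the model class and the
# memory counter (reserved vs. allocated) used by the README are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "pytorch/gemma-3-27b-it-AWQ-INT4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda:0"
)

torch.cuda.reset_peak_memory_stats()
inputs = tokenizer("Hello, world", return_tensors="pt").to("cuda:0")
model.generate(**inputs, max_new_tokens=64)

mem = torch.cuda.max_memory_reserved() / (1024**3)  # bytes -> GiB
print(f"Peak Memory Usage: {mem:.02f} GB")
```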
@@ -317,8 +317,8 @@ print(f"Peak Memory Usage: {mem:.02f} GB")
| Benchmark (Latency) | | | |
|----------------------------------|------------------------|--------------------------------|---------------------------------|
| | google/gemma-3-27b-it | jerryzh168/gemma-3-27b-it-INT4 | pytorch/gemma-3-27b-it-AWQ-INT4 |
- | latency (batch_size=1) | 7.46s | 4.84 (1.54x speedup) | 5.55s (1.34x speedup) |
- | latency (batch_size=256) | 39.55s | 33.30 (1.19x speedup) | 34.39s (1.15x speedup) |
+ | latency (batch_size=1) | 7.46s | 4.84s (1.54x speedup) | 4.92s (1.52x speedup) |
+ | latency (batch_size=256) | 39.55s | 33.30s (1.19x speedup) | 28.37s (1.39x speedup) |


Note: jerryzh168/gemma-3-27b-it-INT4 is the H100 optimized checkpoint for INT4
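
The speedup column is the bfloat16 latency divided by the quantized latency (e.g. 7.46s / 4.92s ≈ 1.52x at batch_size=1). A minimal sketch of the timing pattern, with CUDA synchronization so the wall clock covers the full generate call; batch size and token counts are illustrative:

```python
# Minimal latency-timing sketch; batch size and token counts are illustrative.
import time
import torch

def time_generate(model, inputs, max_new_tokens=128):
    torch.cuda.synchronize()   # drain pending kernels before starting the clock
    start = time.perf_counter()
    model.generate(**inputs, max_new_tokens=max_new_tokens)
    torch.cuda.synchronize()   # wait for generation to finish on the GPU
    return time.perf_counter() - start

# speedup = baseline latency / quantized latency, e.g. 7.46 / 4.92 ~= 1.52
```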
 