Gradient Masks Repository

This repository contains gradient masks generated during training of Qwen2.5-0.5B-Instruct.

Overview

Gradient masks are boolean tensors that indicate which parameters have significant gradients during training. These masks can be used to identify important parameters for fine-tuning or to create sparse models.
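As a rough illustration of the idea (a hypothetical sketch, not this repository's generation code), a mask of this kind can be derived by thresholding gradient magnitudes against a tolerance:

```python
import torch

# Hypothetical helper: keep parameters whose current gradient magnitude
# exceeds a tolerance. The repository's masks additionally use an EMA of
# gradients (see "Mask Generation" below); this is a simplified sketch.
def build_masks(model, tolerance=1e-6):
    masks = {}
    for name, param in model.named_parameters():
        if param.grad is not None:
            masks[name] = param.grad.abs() > tolerance
    return masks
```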

Usage

from huggingface_hub import hf_hub_download
import torch

# Download masks for a specific step
mask_path = hf_hub_download(
    repo_id="israel-adewuyi/Qwen2.5-0.5B-Instruct-grad_masks",
    filename="masks/beta_<beta>_step_<step>_tolerance_<tolerance>.pt"
)
masks = torch.load(mask_path, map_location="cpu")

# Apply masks to a model by zeroing out gradients where the mask is False.
# Note: requires_grad is a plain Python bool, so a boolean tensor cannot be
# assigned to it; a gradient hook applies the mask element-wise instead.
for name, param in model.named_parameters():
    if name in masks:
        mask = masks[name].to(param.device)
        param.register_hook(lambda grad, m=mask: grad * m)
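After loading, it can also be useful to check how sparse each mask is. `mask_density` below is a hypothetical helper, not part of this repository:

```python
import torch

# Fraction of parameters retained (True) in each boolean mask.
def mask_density(masks):
    return {name: m.float().mean().item() for name, m in masks.items()}
```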

Mask Generation

  • Tolerance: [1e-05, 1e-06, 1e-07]
  • Beta (EMA decay): [90, 95, 99]
  • Epsilon: 1e-08
  • Base Model: Qwen2.5-0.5B-Instruct
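One plausible reading of these parameters (an assumption; the actual generation code is not shown in this card) is an exponential moving average of gradient magnitudes, thresholded by the tolerance, with beta stored as an integer percentage:

```python
import torch

# Hypothetical sketch of EMA-based mask generation. Assumes beta is an
# integer percentage (e.g. 95 -> decay of 0.95) as suggested by the file
# names, and that epsilon guards the threshold comparison. These names and
# the exact update rule are assumptions, not the repository's code.
def update_ema_and_mask(ema, grads, beta=95, tolerance=1e-6, epsilon=1e-8):
    decay = beta / 100.0
    masks = {}
    for name, grad in grads.items():
        prev = ema.get(name, torch.zeros_like(grad))
        ema[name] = decay * prev + (1 - decay) * grad.abs()
        masks[name] = ema[name] > (tolerance + epsilon)
    return masks
```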

Files

  • masks/beta_<beta>_step_<step>_tolerance_<tolerance>.pt: Boolean masks for each training step
  • metadata/beta_<beta>_step_<step>_tolerance_<tolerance>_info.json: Metadata about each mask set

License

This repository is licensed under the MIT License.
