ByteFlow Net - 1-Level Regex Rate-Distortion

This is a ByteFlow Net model trained with regex rate-distortion chunking.

Model Details

  • Model Name: BFlowNet_1B_1levels_regex_rate_100bt
  • Architecture: BFlowNet with adaptive layers
  • Parameters: ~1B parameters
  • Training Step: 350,000
  • Sequence Length: 8192
  • Vocabulary Size: 258
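A vocabulary of 258 is consistent with 256 raw byte values plus two reserved ids. The card does not say which special tokens those are or how ids are laid out. The sketch below assumes the two extra ids are a hypothetical BOS/EOS pair placed before the byte values. It only illustrates the arithmetic and is not the training tokenizer.

```python
# Hypothetical byte-level tokenizer consistent with a 258-entry vocabulary.
# ASSUMPTION: ids 0 and 1 are reserved special tokens (called BOS/EOS here)
# and raw bytes 0..255 are shifted up by 2. The real id layout may differ.

BOS_ID = 0  # hypothetical
EOS_ID = 1  # hypothetical
OFFSET = 2  # number of reserved ids placed before the 256 byte values
VOCAB_SIZE = OFFSET + 256  # = 258


def encode(text: str, add_bos: bool = True, add_eos: bool = True) -> list[int]:
    """Encode text as UTF-8 bytes, shifting every byte value by OFFSET."""
    ids = [b + OFFSET for b in text.encode("utf-8")]
    if add_bos:
        ids = [BOS_ID] + ids
    if add_eos:
        ids = ids + [EOS_ID]
    return ids


def decode(ids: list[int]) -> str:
    """Drop the reserved ids and decode the remaining bytes as UTF-8."""
    raw = bytes(i - OFFSET for i in ids if i >= OFFSET)
    return raw.decode("utf-8", errors="replace")
```

Because the model reads bytes, non-ASCII characters cost several positions. For example, "héllo" is 6 UTF-8 bytes and becomes 8 ids once BOS/EOS are added.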

Architecture Details

Model Configuration

  • Dimensions: [512, 2048]
  • Head Dimensions: [64, 128]
  • Layers: [6, 24]
  • Sliding Windows: [512, 4096]
  • Max Sequence Lengths: [8192, 3200]
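Each list above has two entries. This reads naturally as one chunking level on top of the byte level (our reading of `1levels` in the model name; the card does not state the index-to-level mapping). The sketch below groups the values per level and derives the implied number of attention heads, assuming heads exactly tile the model dimension.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelConfig:
    dim: int
    head_dim: int
    n_layers: int
    sliding_window: int
    max_seqlen: int

    @property
    def n_heads(self) -> int:
        # Assumes attention heads exactly tile the model dimension.
        assert self.dim % self.head_dim == 0
        return self.dim // self.head_dim


# Values copied from the Model Configuration list. ASSUMPTION: index 0 is
# the byte level and index 1 is the chunk level.
dims = [512, 2048]
head_dims = [64, 128]
layers = [6, 24]
windows = [512, 4096]
max_seqlens = [8192, 3200]

levels = [
    LevelConfig(d, h, n, w, s)
    for d, h, n, w, s in zip(dims, head_dims, layers, windows, max_seqlens)
]

for i, lvl in enumerate(levels):
    print(f"level {i}: {lvl.n_heads} heads x {lvl.head_dim} = {lvl.dim}, "
          f"{lvl.n_layers} layers, window {lvl.sliding_window}, "
          f"max len {lvl.max_seqlen}")
```

Under this reading the byte level has 8 heads and the chunk level has 16. At the chunk level the 4096 sliding window exceeds the 3200 maximum length, so attention there would effectively span the whole chunk sequence.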

Block Configuration

  • Dimension: 512
  • Number of Layers: 8
  • RoPE Theta: 500000.0
  • Norm Epsilon: 1e-05
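A RoPE theta of 500000.0 sets the base of the rotary position frequencies. The sketch below shows standard RoPE inverse frequencies theta^(-2i/d) and the pairwise rotation they drive, using a head dimension of 64 from the first level as an example. The interleaved pairing of (x[2i], x[2i+1]) is an assumption. Some implementations instead pair the first and second halves of the vector.

```python
import math

ROPE_THETA = 500_000.0


def rope_inv_freqs(head_dim: int, theta: float = ROPE_THETA) -> list[float]:
    """Standard RoPE inverse frequencies theta^(-2i/head_dim), i = 0..head_dim/2-1."""
    assert head_dim % 2 == 0
    return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]


def apply_rope(x: list[float], pos: int, theta: float = ROPE_THETA) -> list[float]:
    """Rotate consecutive pairs (x[2i], x[2i+1]) by the angle pos * inv_freq[i].

    ASSUMPTION: interleaved pairing; the training code may use half-split pairing.
    """
    out = list(x)
    for i, f in enumerate(rope_inv_freqs(len(x), theta)):
        angle = pos * f
        c, s = math.cos(angle), math.sin(angle)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out
```

A larger theta slows the lowest frequencies, so positions far apart in an 8192-byte sequence still get distinguishable rotations. Each rotation is norm-preserving, and position 0 leaves the vector unchanged.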

Training Details

Data

  • Dataset: fineweb_edu_100bt
  • Batch Size: 19
  • Tokenizer: bytes
  • Chunking Type: regex_rate_distortion
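The card names the chunking type but does not give the regex or the rate-distortion criterion used to place chunk boundaries. The sketch below illustrates only a regex pre-segmentation of raw bytes into word-like chunks, using a hypothetical pattern. The rate-distortion step that decides the final boundaries is not reproduced here.

```python
import re

# Hypothetical chunking regex over raw bytes: an optional leading space plus a
# run of ASCII letters/digits or non-ASCII (UTF-8) bytes, a run of whitespace,
# or any single other byte. DOTALL makes "." match every byte, so every byte
# lands in exactly one chunk.
CHUNK_RE = re.compile(rb" ?[A-Za-z0-9\x80-\xff]+|\s+|.", re.DOTALL)


def chunks(data: bytes) -> list[bytes]:
    """Split data into contiguous regex chunks."""
    return [m.group(0) for m in CHUNK_RE.finditer(data)]


def chunk_boundaries(data: bytes) -> list[int]:
    """Return the end offset of every chunk; the last entry equals len(data)."""
    return [m.end() for m in CHUNK_RE.finditer(data)]


print(chunks(b"Hello world, bytes!"))
print(chunk_boundaries(b"Hello world, bytes!"))
```

Here 19 bytes become 5 chunks ending at offsets 5, 11, 12, 18 and 19. One plausible reading of the second Max Sequence Lengths entry (3200) is a cap on the number of chunk-level positions per 8192-byte sequence.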

Optimization

  • Learning Rate: 0.0006
  • Weight Decay: 0.1
  • Scheduler: cosine
  • Warmup Steps: 10000
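The card gives the peak learning rate, warmup length and scheduler type, but not the total schedule length or the final learning rate. The sketch below is a common linear-warmup-then-cosine schedule. It assumes the decay spans the 350,000 training steps and ends at zero. Both assumptions are marked in the code, and the training framework's exact formula may differ.

```python
import math

PEAK_LR = 6e-4
WARMUP_STEPS = 10_000
TOTAL_STEPS = 350_000  # ASSUMPTION: schedule spans the full training run
MIN_LR_RATIO = 0.0     # ASSUMPTION: cosine decays all the way to zero


def lr_at(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR, then cosine decay to MIN_LR_RATIO * PEAK_LR."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = min(1.0, (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return PEAK_LR * (MIN_LR_RATIO + (1.0 - MIN_LR_RATIO) * cosine)


for step in (0, 5_000, 10_000, 180_000, 350_000):
    print(step, lr_at(step))
```

Under these assumptions the rate peaks at 6e-4 at step 10,000, falls to half of that at step 180,000 (the midpoint of the decay), and reaches zero at step 350,000. The weight decay of 0.1 is applied by the optimizer, which the card does not name.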

Distributed Training

  • Data Parallel Replicas: 8
  • Model Dtype: bf16
  • FSDP Type: full_shard
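Combining the data and distributed settings gives a rough throughput estimate. The card does not say whether "Batch Size: 19" is per replica or global, or whether gradient accumulation was used. The sketch below assumes a per-replica batch with no accumulation. Since the tokenizer is byte-level, each sequence position is one byte.

```python
SEQ_LEN = 8192          # bytes per sequence (Sequence Length)
BATCH_PER_REPLICA = 19  # ASSUMPTION: "Batch Size" is per data-parallel replica
REPLICAS = 8            # Data Parallel Replicas
STEPS = 350_000         # Training Step of this checkpoint

# ASSUMPTION: no gradient accumulation, so one batch per replica per step.
bytes_per_step = SEQ_LEN * BATCH_PER_REPLICA * REPLICAS
total_bytes = bytes_per_step * STEPS

print(f"{bytes_per_step:,} bytes per optimizer step")
print(f"{total_bytes / 1e9:.1f}B bytes seen by step {STEPS:,}")
```

Under these assumptions each step consumes 1,245,184 bytes, and the checkpoint has seen about 435.8B bytes. If the batch size is global instead, both figures shrink by a factor of 8.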

Usage

This model uses a custom AUNet architecture with regex rate-distortion chunking. The checkpoint stores sharded (distributed) model weights, which must be loaded with the same AUNet framework that was used for training.

# Example loading code would go here
# Note: This requires the specific AUNet framework used for training

Evaluation Tasks

The model was evaluated on the following tasks:

  • hellaswag
  • boolq
  • piqa
  • social_iqa
  • winogrande
  • openbookqa
  • arc_easy
  • arc_challenge
  • race
  • commonsense_qa
  • copa

Training Configuration

The complete training configuration is preserved in config.yaml, which is uploaded alongside the checkpoint files.

Files Description

  • *.distcp: Distributed checkpoint files containing model weights
  • params.json: Model parameters and configuration
  • train_state_*.json: Training state information including optimizer states
  • config.yaml: Complete training configuration
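When inspecting a downloaded copy of the checkpoint, it can help to sort the files into the groups listed above. The sketch below uses only the glob patterns from this list and the Python standard library. The directory path is whatever you downloaded to. Files that match none of the patterns are simply left out.

```python
import fnmatch
from pathlib import Path

# Patterns taken from the Files Description list.
FILE_GROUPS = {
    "weights (distributed checkpoint)": "*.distcp",
    "model parameters": "params.json",
    "training state": "train_state_*.json",
    "training config": "config.yaml",
}


def group_checkpoint_files(checkpoint_dir: str) -> dict[str, list[str]]:
    """Map each group label to the sorted file names in checkpoint_dir matching it."""
    names = sorted(p.name for p in Path(checkpoint_dir).iterdir() if p.is_file())
    # fnmatchcase keeps matching case-sensitive on every platform.
    return {
        label: [n for n in names if fnmatch.fnmatchcase(n, pattern)]
        for label, pattern in FILE_GROUPS.items()
    }
```

An empty "weights" group usually means the download is incomplete, since the model cannot be reconstructed without the *.distcp shards.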

Citation

If you use this model, please cite the AUNet paper and methodology.
