Marine1: Underwater Acoustic Classifier πŸŒŠπŸ‹

⚠️ IMPORTANT: Use Safetensors Format
This repository contains both .pth (pickle) and .safetensors formats.
Please use the .safetensors files to avoid pickle security vulnerabilities.
See the How to Use section below for examples.

Model Description

Marine1 is a state-of-the-art deep learning model for classifying underwater acoustic events. Built on a fine-tuned ResNet18 architecture, it achieves 98.33% accuracy in distinguishing between four categories of marine soundscapes.

This model is designed for marine biologists, oceanographers, researchers, and conservationists working with underwater audio data.

Model Details

  • Model Type: Audio Classification (CNN-based)
  • Architecture: Fine-tuned ResNet18 (transfer learning from ImageNet)
  • Input: Audio files (WAV, MP3, FLAC) - max 10 seconds
  • Output: 4-class classification with confidence scores
  • Framework: PyTorch 2.0+
  • Parameters: ~11M (see the quick check after this list)
  • Training Time: ~10 minutes (4 epochs)
  • Format: Available in both safetensors (recommended) and PyTorch formats
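
A quick, hedged way to sanity-check the parameter count; the plain torchvision ResNet18 with a 4-class head used here is an assumption, and the shipped checkpoint's exact layer changes may differ:

import torch
from torchvision.models import resnet18

# Build a ResNet18 with a 4-class head and count its parameters.
model = resnet18(num_classes=4)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # ~11.2M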

πŸ”’ Security Note

This model is available in safetensors format, which is the recommended secure format that avoids pickle vulnerabilities.

Why Safetensors?

  • βœ… No arbitrary code execution risks - Safe to load from any source
  • βœ… Fast loading times - Optimized performance
  • βœ… Memory-efficient - Better resource usage
  • βœ… Cross-platform compatibility - Works everywhere

⚠️ The .pth files are provided for backward compatibility only.
Always prefer .safetensors for production use.
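
The difference in practice, as a minimal sketch (the weights_only flag requires a recent PyTorch release):

from safetensors.torch import load_file
import torch

# Recommended: safetensors loads raw tensors and never unpickles,
# so no code can execute during loading.
state_dict = load_file("best_model_finetuned.safetensors")

# Legacy fallback: if you must load the .pth file, weights_only=True
# restricts unpickling to tensors and plain containers.
legacy_state_dict = torch.load(
    "best_model_finetuned.pth", map_location="cpu", weights_only=True
)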

Learn more: Hugging Face Safetensors documentation (https://huggingface.co/docs/safetensors)

Categories

The model classifies underwater sounds into four distinct categories:

  1. πŸ‹ Marine Animals (marine_animal)

    • Whales, dolphins, orcas, and other marine mammals
    • Fish sounds and biological vocalizations
  2. 🚒 Vessels (vessel)

    • Ships, boats, submarines
    • Maritime traffic and propeller sounds
  3. 🌊 Natural Sounds (natural_sound)

    • Ocean waves, water movement
    • Bubbles, rain, and environmental sounds
  4. πŸ”§ Other Anthropogenic (other_anthropogenic)

    • Human-made sounds (non-vessel)
    • Underwater construction, sonar, etc.
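
For reference, a minimal sketch of the label map, matching the class order used in the Quick Start below; this integer ordering is an assumption and must agree with the label encoding used at training time:

CLASS_LABELS = {
    0: "vessel",
    1: "marine_animal",
    2: "natural_sound",
    3: "other_anthropogenic",
}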

Performance

Metric               Value
Accuracy             98.33%
Balanced Accuracy    91.67%
Training Epochs      4
Training Time        ~10 minutes

Per-Class Performance

  • Marine Animals: Excellent (high precision/recall)
  • Vessels: Excellent (high precision/recall)
  • Natural Sounds: Good (limited training data)
  • Other Anthropogenic: Good (limited training data)
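
Balanced accuracy is the mean of per-class recall, which is why it sits below raw accuracy here: the under-represented classes pull it down. A toy illustration with scikit-learn (not among the dependencies listed below):

from sklearn.metrics import accuracy_score, balanced_accuracy_score

# With 9 majority and 1 minority sample, misclassifying the single
# minority sample barely moves accuracy but halves balanced accuracy.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
print(accuracy_score(y_true, y_pred))           # 0.9
print(balanced_accuracy_score(y_true, y_pred))  # 0.5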

How to Use

Installation

pip install torch torchaudio librosa numpy safetensors

Quick Start (Recommended - Safetensors)

import torch
import librosa
import numpy as np
from torchvision.models import resnet18
from safetensors.torch import load_file

# Rebuild the architecture, then load the weights (secure format).
# A plain torchvision ResNet18 with a 4-class head is assumed here;
# adjust if the checkpoint's layer names or shapes differ.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = resnet18(num_classes=4)
state_dict = load_file("best_model_finetuned.safetensors", device=str(device))
model.load_state_dict(state_dict)
model.to(device)

# Load and process audio (resampled to 16 kHz, truncated to 10 seconds)
audio_path = "underwater_sound.wav"
y, sr = librosa.load(audio_path, sr=16000, duration=10.0)

# Create log-mel spectrogram
mel_spec = librosa.feature.melspectrogram(
    y=y, sr=sr, n_mels=128, n_fft=2048,
    hop_length=512, fmax=8000
)
log_mel_spec = librosa.power_to_db(mel_spec, ref=np.max)

# Prepare input: repeat the single-channel spectrogram across 3 channels
# to match the ImageNet stem (an assumption; drop the repeat if the
# checkpoint uses a single-channel first conv layer).
input_tensor = (
    torch.FloatTensor(log_mel_spec)
    .unsqueeze(0).unsqueeze(0)
    .repeat(1, 3, 1, 1)
    .to(device)
)

# Predict
model.eval()
with torch.no_grad():
    outputs = model(input_tensor)
    probabilities = torch.nn.functional.softmax(outputs, dim=1)[0]
    predicted_class = probabilities.argmax().item()
    confidence = probabilities[predicted_class].item()

# Map to class names (order must match the training label encoding)
class_names = ["vessel", "marine_animal", "natural_sound", "other_anthropogenic"]
print(f"Prediction: {class_names[predicted_class]} ({confidence*100:.2f}%)")

Using the Inference Class (Easiest)

from huggingface_hub import hf_hub_download
from inference import Marine1Classifier  # inference.py ships with the GitHub repository

# Download model (safetensors format - secure!)
model_path = hf_hub_download(
    repo_id="shiv207/Marine1",
    filename="best_model_finetuned.safetensors"
)

# Initialize classifier
classifier = Marine1Classifier(model_path)

# Make prediction
result = classifier.predict("underwater_sound.wav")
print(f"Prediction: {result['predicted_class']}")
print(f"Confidence: {result['confidence']*100:.2f}%")

Using the Complete Pipeline

For a full-featured implementation with preprocessing and JSON output:

# Clone the repository
git clone https://github.com/shiv207/underwater-audio-classifier.git
cd underwater-audio-classifier

# Install dependencies
pip install -r requirements.txt

# Run prediction (supports both .pth and .safetensors)
python predict_minimal.py --audio your_audio.wav --model models/best_model_finetuned.safetensors

# Generate UDA-compliant JSON
python generate_json.py --audio your_audio.wav --output result.json

Streamlit Web Interface

streamlit run app.py

Features:

  • Drag-and-drop multiple audio files
  • Batch processing with progress tracking
  • Visual spectrograms and probability charts
  • Export results as JSON (individual or batch)
  • Event detection mode with temporal localization

Training Data

The model was trained on a diverse dataset of underwater acoustic recordings:

  • Marine Animals: Whale songs, dolphin clicks, orca vocalizations
  • Vessels: Various ship types (cargo, tanker, passenger, tug)
  • Natural Sounds: Ocean waves, water movement, environmental sounds
  • Other Sounds: Anthropogenic non-vessel sounds

Note: Natural sounds and other anthropogenic categories have limited training samples (13 each), which may affect performance on edge cases.

Technical Details

Audio Processing Pipeline

  1. Resampling: Audio resampled to 16kHz
  2. Duration: Truncated/padded to 10 seconds
  3. Spectrogram: 128-band mel spectrogram (n_fft=2048, hop_length=512)
  4. Normalization: Log-scale conversion with dB normalization
  5. Augmentation (training only): Speed, pitch, noise injection
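
A minimal sketch of steps 1-4 above (training-time augmentation omitted; padding short clips with trailing silence is an assumption consistent with the fixed 10-second input):

import librosa
import numpy as np

def preprocess(audio_path, sr=16000, duration=10.0):
    # Steps 1-2: resample to 16 kHz and truncate/pad to 10 seconds.
    y, _ = librosa.load(audio_path, sr=sr, duration=duration)
    target_len = int(sr * duration)
    if len(y) < target_len:
        y = np.pad(y, (0, target_len - len(y)))
    # Steps 3-4: 128-band mel spectrogram, then log/dB normalization.
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_mels=128, n_fft=2048, hop_length=512, fmax=8000
    )
    return librosa.power_to_db(mel, ref=np.max)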

Model Architecture

ResNet18 (Pre-trained on ImageNet)
β”œβ”€β”€ Conv1 (frozen)
β”œβ”€β”€ Layer1 (frozen)
β”œβ”€β”€ Layer2 (fine-tuned)
β”œβ”€β”€ Layer3 (fine-tuned)
β”œβ”€β”€ Layer4 (fine-tuned)
└── FC Layer (4 classes, trained from scratch)
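
A hedged sketch of the freezing scheme above, using torchvision's parameter-name prefixes (grouping bn1 with the frozen stem is an assumption; the diagram lists only Conv1 and Layer1):

import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

# Start from ImageNet weights and replace the head with a 4-class layer.
model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 4)

# Freeze the stem and Layer1; Layer2-4 and the new head stay trainable.
for name, param in model.named_parameters():
    if name.startswith(("conv1", "bn1", "layer1")):
        param.requires_grad = False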

Training Configuration

  • Optimizer: Adam (lr=0.0001)
  • Loss: CrossEntropyLoss
  • Batch Size: 8
  • Epochs: 4
  • Data Split: 80% train, 20% validation
  • Device: CPU/MPS (Apple Silicon compatible)
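
Putting the configuration above together as a minimal sketch; the synthetic tensors stand in for the real spectrogram dataset, and the stand-in model should in practice be the partially frozen ResNet18 from the architecture sketch above:

import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18

# Stand-in model; see the architecture sketch above for the real setup.
model = resnet18(num_classes=4)

# Synthetic stand-in data: real training used log-mel spectrograms.
# 3-channel input matches the ImageNet stem; 313 frames ~ 10 s of audio
# at the hop length above.
xs = torch.randn(32, 3, 128, 313)
ys = torch.randint(0, 4, (32,))
loader = DataLoader(TensorDataset(xs, ys), batch_size=8, shuffle=True)

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=0.0001
)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(4):
    for spectrograms, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(spectrograms), labels)
        loss.backward()
        optimizer.step()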

Limitations

  1. Limited Data: Natural sounds and other anthropogenic categories have fewer training samples
  2. Duration: Optimized for 10-second audio clips
  3. Sample Rate: Best performance at 16kHz
  4. Domain: Trained on specific underwater environments; may need fine-tuning for different acoustic conditions
  5. Background Noise: Performance may degrade with high noise levels

Intended Use

Primary Use Cases

  • Marine biology research and species monitoring
  • Vessel traffic analysis and maritime surveillance
  • Environmental impact assessment
  • Underwater acoustic event detection
  • Marine conservation and ecosystem monitoring

Out-of-Scope Use

  • Real-time streaming (the model expects preprocessed, fixed-length clips)
  • Non-underwater audio classification
  • Medical or security applications
  • High-stakes decision making without human oversight

Ethical Considerations

  • This model is intended for research and conservation purposes
  • Should not be used as the sole basis for regulatory or enforcement decisions
  • May have biases based on training data distribution
  • Performance varies across different marine environments
  • Users should validate results in their specific context

Citation

If you use this model in your research, please cite:

@misc{marine1-underwater-classifier,
  author = {Shivam Sharma},
  title = {Marine1: Underwater Acoustic Classifier},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/shiv207/Marine1}},
}

Model Variants

This repository includes multiple model formats:

Safetensors Format (πŸ”’ Recommended - Secure)

  1. best_model_finetuned.safetensors ⭐
    • Fine-tuned ResNet18
    • 98.33% accuracy
    • Secure format (no pickle vulnerabilities)
    • Best overall performance
  2. best_model_simple.safetensors
    • Custom CNN trained from scratch
    • 93% accuracy
    • Lighter-weight alternative
    • Secure format

Legacy Formats (⚠️ Not Recommended)

  1. best_model_finetuned.pth
    • PyTorch pickle format (legacy)
    • ⚠️ Subject to pickle security risks
    • Use the safetensors version instead
  2. best_model_simple.pth
    • PyTorch pickle format (legacy)
    • ⚠️ Subject to pickle security risks
    • Use the safetensors version instead

Other Formats

  1. Marine 1.mlmodel
    • CoreML format for iOS/macOS deployment
    • Optimized for Apple devices

Additional Resources

  • GitHub Repository: underwater-audio-classifier (https://github.com/shiv207/underwater-audio-classifier)
  • Demo Application: Streamlit web interface included
  • Documentation: Comprehensive README and code comments
  • JSON Export: Grand Challenge UDA format compatible

License

MIT License - Free for commercial and research use

Acknowledgments

  • Built with PyTorch and Streamlit
  • ResNet18 architecture from torchvision
  • Audio processing with librosa
  • Inspired by marine conservation efforts

Contact

For questions, issues, or collaboration opportunities, please open an issue on the GitHub repository (https://github.com/shiv207/underwater-audio-classifier).

Model Card Authors: Shivam Sharma

Last Updated: October 2024
