Marine1: Underwater Acoustic Classifier πŸŒŠπŸ‹

⚠️ IMPORTANT: Use Safetensors Format
This repository contains both .pth (pickle) and .safetensors formats.
Please use the .safetensors files to avoid pickle security vulnerabilities.
See the How to Use section below for examples.

Model Description

Marine1 is a state-of-the-art deep learning model for classifying underwater acoustic events. Built on a fine-tuned ResNet18 architecture, it achieves 98.33% accuracy in distinguishing between four categories of marine soundscapes.

This model is designed for marine biologists, oceanographers, researchers, and conservationists working with underwater audio data.

Model Details

  • Model Type: Audio Classification (CNN-based)
  • Architecture: Fine-tuned ResNet18 (transfer learning from ImageNet)
  • Input: Audio files (WAV, MP3, FLAC) - max 10 seconds
  • Output: 4-class classification with confidence scores
  • Framework: PyTorch 2.0+
  • Parameters: ~11M (see the quick check after this list)
  • Training Time: ~10 minutes (4 epochs)
  • Format: Available in both safetensors (recommended) and PyTorch formats
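
A quick, hedged way to sanity-check the parameter count; the plain torchvision ResNet18 with a 4-class head used here is an assumption, and the shipped checkpoint's exact layer changes may differ:

import torch
from torchvision.models import resnet18

# Build a ResNet18 with a 4-class head and count its parameters.
model = resnet18(num_classes=4)
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.1f}M parameters")  # ~11.2M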

πŸ”’ Security Note

This model is available in safetensors format, which is the recommended secure format that avoids pickle vulnerabilities.

Why Safetensors?

  • βœ… No arbitrary code execution risks - Safe to load from any source
  • βœ… Fast loading times - Optimized performance
  • βœ… Memory-efficient - Better resource usage
  • βœ… Cross-platform compatibility - Works everywhere

⚠️ The .pth files are provided for backward compatibility only.
Always prefer .safetensors for production use.
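
The difference in practice, as a minimal sketch (the weights_only flag requires a recent PyTorch release):

from safetensors.torch import load_file
import torch

# Recommended: safetensors loads raw tensors and never unpickles,
# so no code can execute during loading.
state_dict = load_file("best_model_finetuned.safetensors")

# Legacy fallback: if you must load the .pth file, weights_only=True
# restricts unpickling to tensors and plain containers.
legacy_state_dict = torch.load(
    "best_model_finetuned.pth", map_location="cpu", weights_only=True
)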

Learn more: Hugging Face Safetensors documentation (https://huggingface.co/docs/safetensors)

Categories

The model classifies underwater sounds into four distinct categories:

  1. πŸ‹ Marine Animals (marine_animal)

    • Whales, dolphins, orcas, and other marine mammals
    • Fish sounds and biological vocalizations
  2. 🚒 Vessels (vessel)

    • Ships, boats, submarines
    • Maritime traffic and propeller sounds
  3. 🌊 Natural Sounds (natural_sound)

    • Ocean waves, water movement
    • Bubbles, rain, and environmental sounds
  4. πŸ”§ Other Anthropogenic (other_anthropogenic)

    • Human-made sounds (non-vessel)
    • Underwater construction, sonar, etc.
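
For reference, a minimal sketch of the label map, matching the class order used in the Quick Start below; this integer ordering is an assumption and must agree with the label encoding used at training time:

CLASS_LABELS = {
    0: "vessel",
    1: "marine_animal",
    2: "natural_sound",
    3: "other_anthropogenic",
}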

Performance

Metric               Value
Accuracy             98.33%
Balanced Accuracy    91.67%
Training Epochs      4
Training Time        ~10 minutes

Per-Class Performance

  • Marine Animals: Excellent (high precision/recall)
  • Vessels: Excellent (high precision/recall)
  • Natural Sounds: Good (limited training data)
  • Other Anthropogenic: Good (limited training data)
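
Balanced accuracy is the mean of per-class recall, which is why it sits below raw accuracy here: the under-represented classes pull it down. A toy illustration with scikit-learn (not among the dependencies listed below):

from sklearn.metrics import accuracy_score, balanced_accuracy_score

# With 9 majority and 1 minority sample, misclassifying the single
# minority sample barely moves accuracy but halves balanced accuracy.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
print(accuracy_score(y_true, y_pred))           # 0.9
print(balanced_accuracy_score(y_true, y_pred))  # 0.5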

How to Use

Installation

pip install torch torchaudio librosa numpy safetensors

Quick Start (Recommended - Safetensors)

import torch
import librosa
import numpy as np
from torchvision.models import resnet18
from safetensors.torch import load_file

# Rebuild the architecture, then load the weights (secure format).
# A plain torchvision ResNet18 with a 4-class head is assumed here;
# adjust if the checkpoint's layer names or shapes differ.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = resnet18(num_classes=4)
state_dict = load_file("best_model_finetuned.safetensors", device=str(device))
model.load_state_dict(state_dict)
model.to(device)

# Load and process audio (resampled to 16 kHz, truncated to 10 seconds)
audio_path = "underwater_sound.wav"
y, sr = librosa.load(audio_path, sr=16000, duration=10.0)

# Create log-mel spectrogram
mel_spec = librosa.feature.melspectrogram(
    y=y, sr=sr, n_mels=128, n_fft=2048,
    hop_length=512, fmax=8000
)
log_mel_spec = librosa.power_to_db(mel_spec, ref=np.max)

# Prepare input: repeat the single-channel spectrogram across 3 channels
# to match the ImageNet stem (an assumption; drop the repeat if the
# checkpoint uses a single-channel first conv layer).
input_tensor = (
    torch.FloatTensor(log_mel_spec)
    .unsqueeze(0).unsqueeze(0)
    .repeat(1, 3, 1, 1)
    .to(device)
)

# Predict
model.eval()
with torch.no_grad():
    outputs = model(input_tensor)
    probabilities = torch.nn.functional.softmax(outputs, dim=1)[0]
    predicted_class = probabilities.argmax().item()
    confidence = probabilities[predicted_class].item()

# Map to class names (order must match the training label encoding)
class_names = ["vessel", "marine_animal", "natural_sound", "other_anthropogenic"]
print(f"Prediction: {class_names[predicted_class]} ({confidence*100:.2f}%)")

Using the Inference Class (Easiest)

from huggingface_hub import hf_hub_download
from inference import Marine1Classifier  # inference.py ships with the GitHub repository

# Download model (safetensors format - secure!)
model_path = hf_hub_download(
    repo_id="shiv207/Marine1",
    filename="best_model_finetuned.safetensors"
)

# Initialize classifier
classifier = Marine1Classifier(model_path)

# Make prediction
result = classifier.predict("underwater_sound.wav")
print(f"Prediction: {result['predicted_class']}")
print(f"Confidence: {result['confidence']*100:.2f}%")

Using the Complete Pipeline

For a full-featured implementation with preprocessing and JSON output:

# Clone the repository
git clone https://github.com/shiv207/underwater-audio-classifier.git
cd underwater-audio-classifier

# Install dependencies
pip install -r requirements.txt

# Run prediction (supports both .pth and .safetensors)
python predict_minimal.py --audio your_audio.wav --model models/best_model_finetuned.safetensors

# Generate UDA-compliant JSON
python generate_json.py --audio your_audio.wav --output result.json

Streamlit Web Interface

streamlit run app.py

Features:

  • Drag-and-drop multiple audio files
  • Batch processing with progress tracking
  • Visual spectrograms and probability charts
  • Export results as JSON (individual or batch)
  • Event detection mode with temporal localization

Training Data

The model was trained on a diverse dataset of underwater acoustic recordings:

  • Marine Animals: Whale songs, dolphin clicks, orca vocalizations
  • Vessels: Various ship types (cargo, tanker, passenger, tug)
  • Natural Sounds: Ocean waves, water movement, environmental sounds
  • Other Sounds: Anthropogenic non-vessel sounds

Note: Natural sounds and other anthropogenic categories have limited training samples (13 each), which may affect performance on edge cases.

Technical Details

Audio Processing Pipeline

  1. Resampling: Audio resampled to 16kHz
  2. Duration: Truncated/padded to 10 seconds
  3. Spectrogram: 128-band mel spectrogram (n_fft=2048, hop_length=512)
  4. Normalization: Log-scale conversion with dB normalization
  5. Augmentation (training only): Speed, pitch, noise injection
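
A minimal sketch of steps 1-4 above (training-time augmentation omitted; padding short clips with trailing silence is an assumption consistent with the fixed 10-second input):

import librosa
import numpy as np

def preprocess(audio_path, sr=16000, duration=10.0):
    # Steps 1-2: resample to 16 kHz and truncate/pad to 10 seconds.
    y, _ = librosa.load(audio_path, sr=sr, duration=duration)
    target_len = int(sr * duration)
    if len(y) < target_len:
        y = np.pad(y, (0, target_len - len(y)))
    # Steps 3-4: 128-band mel spectrogram, then log/dB normalization.
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_mels=128, n_fft=2048, hop_length=512, fmax=8000
    )
    return librosa.power_to_db(mel, ref=np.max)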

Model Architecture

ResNet18 (Pre-trained on ImageNet)
β”œβ”€β”€ Conv1 (frozen)
β”œβ”€β”€ Layer1 (frozen)
β”œβ”€β”€ Layer2 (fine-tuned)
β”œβ”€β”€ Layer3 (fine-tuned)
β”œβ”€β”€ Layer4 (fine-tuned)
└── FC Layer (4 classes, trained from scratch)
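
A hedged sketch of the freezing scheme above, using torchvision's parameter-name prefixes (grouping bn1 with the frozen stem is an assumption; the diagram lists only Conv1 and Layer1):

import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

# Start from ImageNet weights and replace the head with a 4-class layer.
model = resnet18(weights=ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 4)

# Freeze the stem and Layer1; Layer2-4 and the new head stay trainable.
for name, param in model.named_parameters():
    if name.startswith(("conv1", "bn1", "layer1")):
        param.requires_grad = False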

Training Configuration

  • Optimizer: Adam (lr=0.0001)
  • Loss: CrossEntropyLoss
  • Batch Size: 8
  • Epochs: 4
  • Data Split: 80% train, 20% validation
  • Device: CPU/MPS (Apple Silicon compatible)
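
Putting the configuration above together as a minimal sketch; the synthetic tensors stand in for the real spectrogram dataset, and the stand-in model should in practice be the partially frozen ResNet18 from the architecture sketch above:

import torch
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18

# Stand-in model; see the architecture sketch above for the real setup.
model = resnet18(num_classes=4)

# Synthetic stand-in data: real training used log-mel spectrograms.
# 3-channel input matches the ImageNet stem; 313 frames ~ 10 s of audio
# at the hop length above.
xs = torch.randn(32, 3, 128, 313)
ys = torch.randint(0, 4, (32,))
loader = DataLoader(TensorDataset(xs, ys), batch_size=8, shuffle=True)

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=0.0001
)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for epoch in range(4):
    for spectrograms, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(spectrograms), labels)
        loss.backward()
        optimizer.step()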

Limitations

  1. Limited Data: Natural sounds and other anthropogenic categories have fewer training samples
  2. Duration: Optimized for 10-second audio clips
  3. Sample Rate: Best performance at 16kHz
  4. Domain: Trained on specific underwater environments; may need fine-tuning for different acoustic conditions
  5. Background Noise: Performance may degrade with high noise levels

Intended Use

Primary Use Cases

  • Marine biology research and species monitoring
  • Vessel traffic analysis and maritime surveillance
  • Environmental impact assessment
  • Underwater acoustic event detection
  • Marine conservation and ecosystem monitoring

Out-of-Scope Use

  • Real-time streaming (the model expects preprocessed, fixed-length clips)
  • Non-underwater audio classification
  • Medical or security applications
  • High-stakes decision making without human oversight

Ethical Considerations

  • This model is intended for research and conservation purposes
  • Should not be used as the sole basis for regulatory or enforcement decisions
  • May have biases based on training data distribution
  • Performance varies across different marine environments
  • Users should validate results in their specific context

Citation

If you use this model in your research, please cite:

@misc{marine1-underwater-classifier,
  author = {Shivam Sharma},
  title = {Marine1: Underwater Acoustic Classifier},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/shiv207/Marine1}},
}

Model Variants

This repository includes multiple model formats:

Safetensors Format (πŸ”’ Recommended - Secure)

  1. best_model_finetuned.safetensors ⭐
    • Fine-tuned ResNet18
    • 98.33% accuracy
    • Secure format (no pickle vulnerabilities)
    • Best overall performance
  2. best_model_simple.safetensors
    • Custom CNN trained from scratch
    • 93% accuracy
    • Lighter-weight alternative
    • Secure format

Legacy Formats (⚠️ Not Recommended)

  1. best_model_finetuned.pth
    • PyTorch pickle format (legacy)
    • ⚠️ Subject to pickle security risks
    • Use the safetensors version instead
  2. best_model_simple.pth
    • PyTorch pickle format (legacy)
    • ⚠️ Subject to pickle security risks
    • Use the safetensors version instead

Other Formats

  1. Marine 1.mlmodel
    • CoreML format for iOS/macOS deployment
    • Optimized for Apple devices

Additional Resources

  • GitHub Repository: underwater-audio-classifier (https://github.com/shiv207/underwater-audio-classifier)
  • Demo Application: Streamlit web interface included
  • Documentation: Comprehensive README and code comments
  • JSON Export: Grand Challenge UDA format compatible

License

MIT License - Free for commercial and research use

Acknowledgments

  • Built with PyTorch and Streamlit
  • ResNet18 architecture from torchvision
  • Audio processing with librosa
  • Inspired by marine conservation efforts

Contact

For questions, issues, or collaboration opportunities, please open an issue on the GitHub repository (https://github.com/shiv207/underwater-audio-classifier).

Model Card Authors: Shivam Sharma

Last Updated: October 2024
