-
Contrastive Learning for Many-to-many Multilingual Neural Machine Translation
Paper • 2105.09501 • Published -
Cross-modal Contrastive Learning for Speech Translation
Paper • 2205.02444 • Published -
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
Paper • 2210.03052 • Published -
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning
Paper • 2212.10240 • Published • 1
Collections
Discover the best community collections!
Collections including paper arxiv:2402.15627
-
MLP Can Be A Good Transformer Learner
Paper • 2404.05657 • Published • 1 -
Toward a Better Understanding of Fourier Neural Operators: Analysis and Improvement from a Spectral Perspective
Paper • 2404.07200 • Published • 2 -
An inclusive review on deep learning techniques and their scope in handwriting recognition
Paper • 2404.08011 • Published • 1 -
Long-form music generation with latent diffusion
Paper • 2404.10301 • Published • 27
-
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Paper • 2402.15627 • Published • 38 -
One Wide Feedforward is All You Need
Paper • 2309.01826 • Published • 33 -
Fast Feedforward Networks
Paper • 2308.14711 • Published • 3 -
Memory Layers at Scale
Paper • 2412.09764 • Published • 5
-
Contrastive Learning for Many-to-many Multilingual Neural Machine Translation
Paper • 2105.09501 • Published -
Cross-modal Contrastive Learning for Speech Translation
Paper • 2205.02444 • Published -
ByteTransformer: A High-Performance Transformer Boosted for Variable-Length Inputs
Paper • 2210.03052 • Published -
Diffusion Glancing Transformer for Parallel Sequence to Sequence Learning
Paper • 2212.10240 • Published • 1
-
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs
Paper • 2402.15627 • Published • 38 -
One Wide Feedforward is All You Need
Paper • 2309.01826 • Published • 33 -
Fast Feedforward Networks
Paper • 2308.14711 • Published • 3 -
Memory Layers at Scale
Paper • 2412.09764 • Published • 5
-
MLP Can Be A Good Transformer Learner
Paper • 2404.05657 • Published • 1 -
Toward a Better Understanding of Fourier Neural Operators: Analysis and Improvement from a Spectral Perspective
Paper • 2404.07200 • Published • 2 -
An inclusive review on deep learning techniques and their scope in handwriting recognition
Paper • 2404.08011 • Published • 1 -
Long-form music generation with latent diffusion
Paper • 2404.10301 • Published • 27