Relaxed Recursive Transformer implementation, with uptraining via knowledge distillation on OpenWebText2. Paper: arxiv.org/abs/2410.20672
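For readers new to the technique, here is a toy sketch of the two ideas this project combines. It is written in plain Python so it runs without dependencies. A Relaxed Recursive Transformer reuses one shared block across several loops and adds a small, loop-specific low-rank (LoRA) correction, so the loops are not forced to be identical. Uptraining with distillation then trains this student against the original model's output distribution. Everything below is hypothetical and simplified, not this repo's code: the names RelaxedRecursiveBlock and distillation_kl, a single linear layer with ReLU standing in for a full Transformer block, zero-initialized B matrices (standard LoRA practice, which may differ from the initialization used by the paper or this repo), and a forward-KL loss with temperature.

```python
import math
import random


def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def relu(v):
    return [max(0.0, x) for x in v]


class RelaxedRecursiveBlock:
    """Toy stand-in: one shared weight matrix looped num_loops times,
    plus a separate low-rank correction B_i @ A_i for each loop i."""

    def __init__(self, dim, rank, num_loops, seed=0):
        rng = random.Random(seed)
        # Shared (tied) weights reused by every loop.
        self.shared = [[rng.gauss(0, 1 / math.sqrt(dim)) for _ in range(dim)]
                       for _ in range(dim)]
        # Per-loop LoRA factors: A_i is rank x dim, B_i is dim x rank.
        self.lora_a = [[[rng.gauss(0, 0.01) for _ in range(dim)] for _ in range(rank)]
                       for _ in range(num_loops)]
        # Assumption: B_i starts at zero, so the block initially matches
        # a plain (fully tied) recursive block.
        self.lora_b = [[[0.0] * rank for _ in range(dim)] for _ in range(num_loops)]
        self.num_loops = num_loops

    def forward(self, x):
        h = x
        for i in range(self.num_loops):
            base = matvec(self.shared, h)                              # shared path
            delta = matvec(self.lora_b[i], matvec(self.lora_a[i], h))  # loop-specific path
            update = relu([b + d for b, d in zip(base, delta)])
            h = [hi + ui for hi, ui in zip(h, update)]                 # residual connection
        return h


def softmax(logits, temperature=1.0):
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_kl(teacher_logits, student_logits, temperature=1.0):
    # Forward KL(teacher || student) on temperature-softened distributions.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
```

Because every B matrix starts at zero in this sketch, the relaxed block initially computes exactly what a fully tied recursive block computes. Training the per-loop A and B factors, with a distillation loss like distillation_kl against the original model's logits, is what relaxes the tying.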