Generate images from text prompts
Generate an analysis and a response based on a policy and a prompt
Play Atari games using a vision-language model
Try out DeepSeek-OCR