openbmb/MiniCPM-o-2_6


Cuiunbo committed on Jan 13 · Commit 554254e · verified · 1 parent: 9baca8a

Update README.md


README.md +7 -3


@@ -1,5 +1,5 @@
 ---
-pipeline_tag: image-text-to-text
+pipeline_tag: any-to-any
 datasets:
 - openbmb/RLAIF-V-Dataset
 library_name: transformers
@@ -13,6 +13,10 @@ tags:
 - multi-image
 - video
 - custom_code
+- audio
+- speech
+- asr
+- tts
 ---
 <h1>A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone</h1>
@@ -1217,7 +1221,7 @@ msgs = [{'role': 'user', 'content': [task_prompt]}] # you can also try to ask th
 # sys_prompt = model.get_sys_prompt(ref_audio=ref_audio, mode='voice_cloning', language='en')
 # text_prompt = f"Please read the text below."
 # user_question = {'role': 'user', 'content': [text_prompt, "content that you want to read"]} # using same voice in sys_prompt to read the text. (Voice Cloning)
-# user_question = {'role': 'user', 'content': [text_prompt, librosa.load('xxx.wav', sr=16000, mono=True)[0]]} # using same voice in sys_prompt to read 'xxx.wav'. (Voice Creation)
+# user_question = {'role': 'user', 'content': [text_prompt, librosa.load('xxx.wav', sr=16000, mono=True)[0]]} # using same voice in sys_prompt to read 'xxx.wav'. (Voice Conversion)
 # msgs = [sys_prompt, user_question]
 res = model.chat(
@@ -1386,4 +1390,4 @@ If you find our work helpful, please consider citing our papers 📝 and liking
   journal={arXiv preprint arXiv:2408.01800},
   year={2024}
 }
-```
+```
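The front-matter hunks change the metadata that the Hugging Face Hub reads from the model card: `pipeline_tag` moves from `image-text-to-text` to `any-to-any`, and the tags `audio`, `speech`, `asr` and `tts` are appended. The following is a minimal sketch for checking those fields locally. It assumes a hypothetical hand-rolled helper, `parse_front_matter`, that understands only the flat `key: value` / `- item` subset visible in this diff. It is not a general YAML parser; PyYAML or `huggingface_hub` would be the usual tools. The sample string is an abbreviated excerpt: README lines the diff does not show are omitted, so its tag list is partial.

```python
# Minimal sketch (assumption): a tiny reader for the flat YAML front matter seen in
# this README diff. It understands only "key: value", "key:" followed by "- item"
# lines, and the "---" delimiters. It is NOT a general YAML parser.


def parse_front_matter(text: str) -> dict:
    """Return front-matter keys mapped to a str (scalar) or list[str] (list)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front-matter block at the top of the file
    meta: dict = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # closing delimiter: the Markdown body starts after this
        if stripped.startswith("- "):
            # List item: attach it to the most recent key that opened a list.
            if isinstance(meta.get(current_key), list):
                meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            # An empty value (e.g. "tags:") opens a list; otherwise keep the scalar.
            meta[current_key] = value.strip() if value.strip() else []
    return meta


# Abbreviated excerpt of the updated front matter (lines hidden by the diff omitted).
SAMPLE = """---
pipeline_tag: any-to-any
datasets:
- openbmb/RLAIF-V-Dataset
library_name: transformers
tags:
- multi-image
- video
- custom_code
- audio
- speech
- asr
- tts
---
"""

meta = parse_front_matter(SAMPLE)
print(meta["pipeline_tag"])  # any-to-any
print(meta["tags"])  # ['multi-image', 'video', 'custom_code', 'audio', 'speech', 'asr', 'tts']
```

Run against the pre-commit README, the same check would report `image-text-to-text` for `pipeline_tag` and none of the four audio-related tags.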