WeijianQi1999 committed on
Commit c2dcaba · 1 Parent(s): d7b65f7

update name

auto_o4-mini_Mind2Web-Online - Leaderboard_data.csv CHANGED
@@ -10,4 +10,4 @@ ACT-1-20250814,o3-2025-04-16 and Claude-sonnet-4-20250514,Enhans,[Enhans](https:
  Eko-V2,Unknown,Fellou,[Fellou](https://fellou.ai/blog/post/eko20-launch/),95.0,76.0,70.0,78.0,2025-5-24,False,Unknown evaluation method,2025-05
  Eko-V1,Unknown,Fellou,[Fellou](https://fellou.ai/blog/post/eko20-launch/),-,-,-,31.0,2025-5-24,False,Unknown evaluation method,2025-05
  Seed1.5-VL,Seed1.5-VL,ByteDance,[ByteDance](https://arxiv.org/pdf/2505.07062),-,-,-,76.4,2025-5-11,False,Evaluated by WebJudge(GPT-4o),2025-05
- Google Computer Use (09-2025),gemini-2.5-computer-use-preview-10-2025,Google DeepMind,Google DeepMind,77.1,55.2,45.9,57.3,2025-09-29,True,,2025-09
+ Gemini 2.5 Computer Use,gemini-2.5-computer-use-preview-10-2025,Google DeepMind,Google DeepMind,77.1,55.2,45.9,57.3,2025-09-29,True,,2025-09
human_Mind2Web-Online - Leaderboard_data.csv CHANGED
@@ -7,4 +7,4 @@ Agent-E,gpt-4o-2024-08-06,Emergence AI,OSU NLP,49.4,26.6,6.8,28.0,2025-3-22
  Claude Computer Use 3.7 (w/o thinking),Claude-3-7-sonnet-20250219,Anthropic,OSU NLP,90.4,49.0,32.4,56.3,2025-4-20
  ACT-1-20250703,o3-2025-04-16 and Claude-sonnet-4-20250514,Enhans,Enhans,65.1,46.2,23.0,45.7,2025-7-16
  ACT-1-20250814,o3-2025-04-16 and Claude-sonnet-4-20250514,Enhans,Enhans,81.9,54.5,35.1,57.3,2025-8-23
- Google Computer Use (09-2025),gemini-2.5-computer-use-preview-10-2025,Google DeepMind,Google DeepMind,77.1,71.3,55.4,69.0,2025-9-29
+ Gemini 2.5 Computer Use,gemini-2.5-computer-use-preview-10-2025,Google DeepMind,Google DeepMind,77.1,71.3,55.4,69.0,2025-9-29