metadata

library_name: transformers
tags:
  - mergekit
  - merge
base_model: LeroyDyer/_Spydaz_Web_AI_Mistral_R1_Base
model-index:
  - name: _Spydaz_Web_AI_AGI_R1_Math_AdvancedStudent
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: IFEval (0-Shot)
          type: HuggingFaceH4/ifeval
          args:
            num_few_shot: 0
        metrics:
          - type: inst_level_strict_acc and prompt_level_strict_acc
            value: 59.51
            name: strict accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/_Spydaz_Web_AI_AGI_R1_Math_AdvancedStudent
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: BBH (3-Shot)
          type: BBH
          args:
            num_few_shot: 3
        metrics:
          - type: acc_norm
            value: 28.15
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/_Spydaz_Web_AI_AGI_R1_Math_AdvancedStudent
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MATH Lvl 5 (4-Shot)
          type: hendrycks/competition_math
          args:
            num_few_shot: 4
        metrics:
          - type: exact_match
            value: 5.44
            name: exact match
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/_Spydaz_Web_AI_AGI_R1_Math_AdvancedStudent
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GPQA (0-shot)
          type: Idavidrein/gpqa
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 5.59
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/_Spydaz_Web_AI_AGI_R1_Math_AdvancedStudent
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MuSR (0-shot)
          type: TAUR-Lab/MuSR
          args:
            num_few_shot: 0
        metrics:
          - type: acc_norm
            value: 25.14
            name: acc_norm
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/_Spydaz_Web_AI_AGI_R1_Math_AdvancedStudent
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU-PRO (5-shot)
          type: TIGER-Lab/MMLU-Pro
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 22.22
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=LeroyDyer/_Spydaz_Web_AI_AGI_R1_Math_AdvancedStudent
          name: Open LLM Leaderboard

rank 1667

"Success comes from defining each task in achievable steps.

Every completed step is a success that brings you closer to your goal.

Winners create more winners, while losers do the opposite.

Success is a game of winners."

Leroy Dyer (1972-Present)

The Human AI.

Deep Thinking Model - Highly Trained on Multiple Datasets

The base model has been created as a new starting point. It has been fully primed with various types of chain-of-thought and step-by-step solutions, enabling reward training to take place. The model has been trained on various languages (not intensively), enabling cross-language understanding. Here we create a valid starting point for agent-based modelling, as we find that some training actually affects existing knowledge; hence agents become a thing, or, if you prefer, distillations. These agents can be medical, technical, role-players, etc.

This model was trained on various datasets, such as the basic math ones, as well as some advanced reasoning tasks. Here we experiment with various styles of data, from financial to medical to coding. The coding data seems to have an issue with very long contexts: the servers seem to crash a lot when pushing larger contexts and rewards (suggestion: using only 1 sample per step can solve it). The model is very impressive in its diagnostic skill for medical tasks.

SpydazWeb AI (7b Mistral) (512k)

This model has been trained to perform with contexts of 512k, although it was trained mainly with a 2048 context for general usage. The long-context aspect also allows for advanced projects and summaries, as well as image and audio translations and generations.

Highly trained as well as methodology oriented, this model has been trained on the ReAct process and other structured processes. Hence structured outputs (JSON) are very highly trained, as is the orchestration of other agents and tasks. The model has been trained for tool use and function use, as well as for custom processes and tools. Some tools do not need code either, as their implication means the model may even generate a tool or artifact to perform the task.
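As a rough illustration of the ReAct-style tool use described above, the sketch below parses a JSON action out of a model reply and dispatches it to a tool. Everything in it is an assumption made for illustration: the `{"tool": ..., "args": ...}` schema, the tool names, and the `Observation:` prefix are hypothetical and are not a documented output format of this model.

```python
import json
from typing import Callable, Dict

# Hypothetical tool registry: names and behaviours are illustrative only.
TOOLS: Dict[str, Callable[..., str]] = {
    "add": lambda a, b: str(a + b),
    "upper": lambda text: text.upper(),
}


def parse_action(model_output: str) -> dict:
    """Extract the first JSON object from a model reply and check its shape."""
    start = model_output.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    # raw_decode stops at the end of the first complete JSON value,
    # so trailing text after the object is ignored.
    obj, _ = json.JSONDecoder().raw_decode(model_output[start:])
    if not isinstance(obj, dict) or "tool" not in obj:
        raise ValueError("expected a JSON object with a 'tool' key")
    return obj


def run_action(action: dict) -> str:
    """Look up the requested tool and call it with the supplied arguments."""
    name = action["tool"]
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return TOOLS[name](**action.get("args", {}))


def react_step(model_output: str) -> str:
    """One act/observe step: parse the action, run it, return the observation."""
    return "Observation: " + run_action(parse_action(model_output))


if __name__ == "__main__":
    reply = 'Thought: I should add the numbers.\nAction: {"tool": "add", "args": {"a": 2, "b": 3}}'
    print(react_step(reply))
```

In a full loop the observation string would be appended to the conversation and sent back to the model until it produces a final answer; that part is omitted here because it depends on how the model is served.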

A new genre of AI! This model is trained to give highly detailed, humanized responses. It performs tasks well and is a very good model for multipurpose use. The model has been trained to become more human in its responses, as well as in role playing and storytelling. This latest model has been trained on conversations with a desire to respond with expressive, emotive content, as well as on discussions of various topics. It has also been focused on conversations by human interactions, hence there may be NSFW content in the model. This has in no way inhibited its other tasks, which were also aligned using the new intensive and expressive prompt.

Thinking Humanly:

AI aims to model human thought, a goal of cognitive science across fields like psychology and computer science.

Thinking Rationally:

AI also seeks to formalize “laws of thought” through logic, though human thinking is often inconsistent and uncertain.

Acting Humanly:

Turing's test evaluates AI by its ability to mimic human behavior convincingly, encompassing skills like reasoning and language.

Acting Rationally:

Russell and Norvig advocate for AI that acts rationally to achieve the best outcomes, integrating reasoning and adaptability to environments.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	24.34
IFEval (0-Shot)	59.51
BBH (3-Shot)	28.15
MATH Lvl 5 (4-Shot)	5.44
GPQA (0-shot)	5.59
MuSR (0-shot)	25.14
MMLU-PRO (5-shot)	22.22
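
The "Avg." row appears to be the unweighted mean of the six benchmark scores. That is an assumption about how the figure was computed, but the short check below, using only the values from the table, reproduces the reported number.

```python
# Scores copied from the results table above.
scores = {
    "IFEval (0-Shot)": 59.51,
    "BBH (3-Shot)": 28.15,
    "MATH Lvl 5 (4-Shot)": 5.44,
    "GPQA (0-shot)": 5.59,
    "MuSR (0-shot)": 25.14,
    "MMLU-PRO (5-shot)": 22.22,
}

# Assumed aggregation: plain arithmetic mean across benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")  # prints "Avg. 24.34"
```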