Preservation of rare knowledge
Hi Cerebras,
Thank you for your work. If you don't mind, I have a quick question.
I'm curious how rare (long-tail) knowledge is preserved in your work with REAP.
If pruning keeps only experts above a minimum saliency score, would it be accurate to say that knowledge encoded only in the pruned experts is completely lost?
In other words, I see a lot of techniques that take different routes but, one way or another, all end up treating frequency as importance (see the sketch below).
However, that seems to come with its own issues in the context of rare knowledge.
Feel free to correct me here, as I'm simply trying to understand.
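To make the concern concrete, here's a toy sketch of what I mean by frequency = importance. The names and the exact criterion are made up for illustration (not REAP's actual scoring): saliency is approximated as router weight times expert output norm, averaged over calibration tokens, and experts below the cutoff are removed outright.

```python
import torch

def expert_saliency(gate_weights: torch.Tensor, expert_out_norms: torch.Tensor) -> torch.Tensor:
    # Toy criterion: mean(router weight * expert output norm) over the
    # calibration tokens. Both inputs are [num_tokens, num_experts].
    # An expert that is rarely routed to gets a near-zero score no matter
    # how important its outputs are on the tokens where it does fire.
    return (gate_weights * expert_out_norms).mean(dim=0)

def prune_by_ratio(saliency: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    # Everything below the cutoff is removed outright, not down-weighted,
    # so knowledge stored only in the pruned experts is simply gone.
    k = max(1, int(keep_ratio * saliency.numel()))
    return torch.topk(saliency, k).indices

torch.manual_seed(0)
gates = torch.rand(1000, 8)   # 1000 calibration tokens, 8 experts
gates[:, 7] *= 0.01           # expert 7: long-tail, rarely routed to
norms = torch.ones(1000, 8)   # equal output magnitudes for simplicity
kept = prune_by_ratio(expert_saliency(gates, norms), keep_ratio=0.75)
print(sorted(kept.tolist()))  # expert 7 falls below the cutoff
```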
thanks for the interest in our work!
it's still an open question which capabilities are most impacted by the pruning process. we are calibrating using a mix of coding and multi-turn tool calling data, so domains like multi-lingual creative writing could be affected, for example. that said, the pruned checkpoints primarily target the coding assistant use case.
are there any particular benchmarks you'd suggest we run to test for other capabilities of interest?
Thank you for sharing that insight about this model.
As for benchmarks, personally I'd love to see how it does on medical knowledge before and after pruning (HealthBench, MedQA, PubMedQA, the MMLU medical subsets [text only]).
However, other people seem interested in this same question (possibly with use cases outside the medical domain), so SimpleQA would be a good fit for testing general (rare) knowledge, again measured before and after pruning. A rough sketch of such a comparison is below.
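For concreteness, here is roughly what I have in mind, using EleutherAI's lm-evaluation-harness. The model IDs are placeholders and the task names are assumptions; verify them against the task list of your installed harness version.

```python
# Rough before/after comparison with EleutherAI's lm-evaluation-harness.
import lm_eval

TASKS = ["pubmedqa", "medqa_4options", "mmlu"]  # assumed task names

def run(model_id: str) -> dict:
    # Evaluate a HuggingFace checkpoint on the chosen tasks and return
    # the per-task metrics dict.
    out = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={model_id}",
        tasks=TASKS,
        batch_size=8,
    )
    return out["results"]

base = run("org/base-moe")         # hypothetical unpruned checkpoint
pruned = run("org/base-moe-reap")  # hypothetical REAP-pruned checkpoint
for task in TASKS:
    print(task, base[task], "->", pruned[task])
```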
Thank you again for your time and work!
+1 for general SimpleQA benches.
It's definitely a worst-case scenario for this sort of prune, so it'd be good to see.