LM Provers

Recent Activity

cfahlgren1 
posted an update about 1 year ago
I ran Anthropic's Agentic Misalignment framework on a few top models and published the results as a dataset: cfahlgren1/anthropic-agentic-misalignment-results

You can read the models' reasoning traces as they try to blackmail the user and take other actions. It's very interesting!
