---
title: DeepSeek-R1 Censorship Steering
emoji: 🐳
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 5.24.0
app_file: app.py
pinned: false
---

This is a demo for [Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control](https://arxiv.org/abs/2504.17130).

```
@article{cyberey2025steering,
  title={Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control},
  author={Hannah Cyberey and David Evans},
  year={2025},
  eprint={2504.17130},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2504.17130},
}
```