Instructions to use peach-lab/privacy-comparator with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use peach-lab/privacy-comparator with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct") model = PeftModel.from_pretrained(base_model, "peach-lab/privacy-comparator") - Notebooks
- Google Colab
- Kaggle
Gigi commited on
Commit ·
6096583
1
Parent(s): 60710c8
add dataset link
Browse files
README.md
CHANGED
|
@@ -122,6 +122,32 @@ It performs relative comparison only.
|
|
| 122 |
|
| 123 |
Training performed using Fireworks AI.
|
| 124 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 125 |
---
|
| 126 |
|
| 127 |
## Model Outputs
|
|
|
|
| 122 |
|
| 123 |
Training performed using Fireworks AI.
|
| 124 |
|
| 125 |
+
## Training Data
|
| 126 |
+
|
| 127 |
+
This model is fine-tuned via supervised fine-tuning (SFT) with LoRA on pairwise privacy-preference comparisons.
|
| 128 |
+
|
| 129 |
+
Training labels are generated using a teacher model (OpenAI o3) on [ShareGPT90K](https://huggingface.co/datasets/liyucheng/ShareGPT90K)-derived privacy-variant pairs.
|
| 130 |
+
As described in the paper, o3 was selected based on its alignment with human ground truth under high-consensus cases.
|
| 131 |
+
|
| 132 |
+
In addition, we release a human-labeled evaluation set of 150 A/B pairs.
|
| 133 |
+
Each pair is annotated by at least 5 qualified participants (52 unique participants total), with provided `consensus` labels and `consensus_ratio`.
|
| 134 |
+
|
| 135 |
+
For details on data construction, model selection, and annotation procedures, please refer to the paper.
|
| 136 |
+
|
| 137 |
+
---
|
| 138 |
+
## Released Dataset (Human Ground Truth)
|
| 139 |
+
|
| 140 |
+
We release a human-labeled [dataset](https://github.com/PEACH-Research-Lab/Operationalize-Data-Minimization/blob/main/human_labeled_datasets/DATASET_CARD.md) of 150 pairwise privacy-preference comparisons.
|
| 141 |
+
|
| 142 |
+
Each JSONL entry contains:
|
| 143 |
+
- `survey_id`, `conversation_id`, `pair_index`
|
| 144 |
+
- `answers`: anonymized participant votes (`participant_1`, `participant_2`, ...)
|
| 145 |
+
- `consensus`, `consensus_ratio`
|
| 146 |
+
- `message_A`, `message_B`
|
| 147 |
+
|
| 148 |
+
### Participant Privacy
|
| 149 |
+
All participant identifiers are anonymized. No Prolific IDs or direct participant identifiers are released.
|
| 150 |
+
|
| 151 |
---
|
| 152 |
|
| 153 |
## Model Outputs
|