---
license: gemma
language:
- en
base_model:
- google/gemma-3-4b-it
datasets:
- SicariusSicariiStuff/UBW_Tapestries
widget:
  - text: "X-Ray_Alpha"
    output:
      url: https://huggingface.co/SicariusSicariiStuff/X-Ray_Alpha/resolve/main/Images/X-Ray_Alpha.png
---

<div align="center">
  <b style="font-size: 40px;">X-Ray_Alpha</b>


</div>


<img src="https://huggingface.co/SicariusSicariiStuff/X-Ray_Alpha/resolve/main/Images/X-Ray_Alpha.png" alt="X-Ray_Alpha" style="width: 30%; min-width: 450px; display: block; margin: auto;">


---

<style>
  .hf-links, .hf-tldr{
    display:flex;justify-content:center;align-items:center;flex-wrap:wrap;
    gap:14px;margin:16px 0;
  }
  .hf-links a, .hf-tldr a{
    display:flex;flex-direction:column;align-items:center;justify-content:center;
    text-align:center;text-decoration:none;font-weight:700;line-height:1.15;
    padding:10px 16px;border-radius:14px;border:2px solid currentColor;
    transition:transform .15s ease,box-shadow .15s ease,background-color .15s ease,color .15s ease;
  }

  .hf-tldr a{
    font-size:48px;color:purple;min-width:100%;
  }
  .hf-tldr a:hover{
    transform:translateY(-2px);
    background:rgba(128,0,128,.1);
    box-shadow:0 8px 22px rgba(128,0,128,.45);
    color:#fff;
  }


  .hf-links a{
    font-size:20px;min-width:240px;max-width:280px;
  }
  .hf-links a .top{font-size:16px;opacity:.9;}
  .hf-links a .bottom{font-size:20px;}

  .hf-links a.green{color:#64FF00;}

  .hf-links a:hover{
    transform:translateY(-1px);
    background:rgba(255,255,255,0.04);
    box-shadow:0 6px 18px rgba(0,0,0,.15), inset 0 0 0 9999px rgba(255,255,255,.02);
  }
  .hf-links a.green:hover{
    background:rgba(100,255,0,.14);
    box-shadow:0 8px 20px rgba(100,255,0,.35);
    color:#093;
  }

  /* mobile stacking */
  @media (max-width:520px){
    .hf-links a{min-width:100%;max-width:100%;}
    .hf-tldr a{font-size:36px;}
  }
</style>

<div class="hf-tldr">
  <a href="https://huggingface.co/SicariusSicariiStuff/X-Ray_Alpha#tldr">
    Click here for TL;DR
  </a>
</div>

---

<div class="hf-links">
  <a class="green" href="https://ko-fi.com/sicarius">
    <span class="top">Click here</span>
    <span class="bottom">to buy me a coffee</span>
  </a>
</div>

---

This is a pre-alpha proof-of-concept of **a real fully uncensored vision model** based on Gemma-3 4B instruct.

Why do I say **"real"**? The few vision models we got (Qwen, Llama 3.2) were "censored," and their fine-tunes touched only the **text portion** of the model, as training the vision component is a serious pain.

The only actually trained and uncensored vision model I am aware of is [ToriiGate](https://huggingface.co/Minthy/ToriiGate-v0.4-7B); the rest of the vision models are just the stock vision encoder paired with a fine-tuned LLM.

# Does this even work?

<h2 style="color: green; font-weight: bold; font-size: 80px; text-align: center;">YES!</h2>

---

# Why is this Important?

Having a **fully compliant** vision model is a critical step toward democratizing vision capabilities for various tasks, especially **image tagging**. Tagging is essential both for making LoRAs for image diffusion models and for mass-tagging images to pretrain a diffusion model.

In other words, a fully compliant and accurate vision model will allow the open-source community to easily train LoRAs and even pretrain image diffusion models.

Another important task is content moderation and classification. Many use cases are not black and white: some content that corporations might consider NSFW is acceptable in one context and not in another; there's nuance. Today's vision models **do not let the users decide**, as they will flatly **refuse** to run inference on any content that Google or some other corporation has decided is not to their liking, which makes these stock models useless in a lot of cases.

What if someone wants to classify art that includes nudity? A naked statue over 1,000 years old displayed in the middle of a city, in a museum, or at the city square is perfectly acceptable; a stock vision model, however, will flatly refuse to describe something like that.

It's the same with the many "sensitive" topics that LLMs flatly **refuse to answer**, even though the content is **publicly available on Wikipedia**. This is an attitude of **cynical paternalism**: I say cynical because corporations **take private data to train their models** and treat that as "perfectly fine," yet they act as the **arbiters of morality** and indirectly preach to us from a position of supposed moral superiority. This **gatekeeping hurts innovation badly**, vision models **especially so**, since the task of **tagging cannot be done by a single person at scale**, but a corporation can do it.

# How can YOU help?

This is sort of a **"Pre-Alpha"**, a proof of concept. I took **A LOT** of shortcuts and did some "hacking" to make this work, and I would greatly appreciate some help to turn it into an accurate and powerful open tool. I am not asking for money, but for well-tagged data. I will take the burden and cost of the compute on myself, but I **cannot do tagging** at a large scale by myself.

## Bottom line, I need a lot of well-tagged, diverse data

So:

- If you have well-tagged images
- If you have a link to a well-tagged image dataset
- If you can, and are willing to, do image tagging

Then please send an email with [DATASET] in the title to:

```
[email protected]
```

As you probably figured from the address itself, this is not my main email, and I expect it to be spammed with junk, so **please use the [DATASET] tag** so I can more easily find the emails of **the good people** who are actually trying to help.

## Please see this dataset repo if you want to help:

[X-Ray_Community_Tagging](https://huggingface.co/datasets/SicariusSicariiStuff/X-Ray_Community_Tagging)


Also, if you don't want to upload to the repo (it's encouraged, and you can password-protect your upload for privacy), you can still help by linking a Google Drive or attaching the images with the corrected output to an email at the address above.

Let's make this happen. We can do it!

---

### TL;DR
- **Fully uncensored and trained**: there's no moderation in the vision model; I actually trained it.
- **The 2nd uncensored vision model in the world**: ToriiGate is the first, as far as I know; this one is the second.
- **In-depth descriptions**: very detailed, long descriptions.
- The text portion is **somewhat uncensored** as well; I didn't want to butcher and fry it too much, so it remains "smart".
- **NOT perfect**: this is a POC that shows the task can be done at all; a lot more work is needed.
- **Good roleplay & writing**: I used a massive corpus of high-quality human (**~60%**) and synthetic data.


---

# How to run it:


## VRAM needed for FP16: 15.9 GB

[Run inference with this](https://github.com/SicariusSicariiStuff/X-Ray_Vision)

# This is a pre-alpha POC (Proof Of Concept)

## Instructions:
Clone the repo:
```
git clone https://github.com/SicariusSicariiStuff/X-Ray_Vision.git
cd X-Ray_Vision/
```

Set up a venv (tested with Python 3.11; probably works with 3.10):
```
python3.11 -m venv env
source env/bin/activate
```

Install dependencies
```
pip install git+https://github.com/huggingface/[email protected]
pip install torch
pip install pillow
pip install accelerate
```
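
If you prefer to call the model directly instead of going through the X-Ray_Vision script, here is a minimal, hedged sketch of single-image captioning via the `transformers` image-text-to-text API. It assumes a transformers build with Gemma-3 support (the pinned install above); the image path, prompt, and generation settings are placeholders, so verify the exact processor behavior against the version you installed.

```
# Minimal single-image captioning sketch (not the X-Ray_Vision script itself).
# Assumes a transformers version with Gemma-3 support, per the pinned install above.
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "SicariusSicariiStuff/X-Ray_Alpha"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
).eval()

image = Image.open("example.jpg").convert("RGB")  # placeholder local image
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image in detail."},
    ],
}]

# Build the Gemma-3 chat prompt, then feed text + image through the processor.
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, dtype=torch.bfloat16
)

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Strip the prompt tokens and decode only the generated caption.
caption = processor.decode(
    output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(caption)
```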

# Running inference

Usage:
```
python xRay-Vision.py /path/to/model/ /dir/with/images/
```
The output is printed to the console, and the results are exported to a directory named after your image directory with the suffix "_TXT".

So if you run:
```
python xRay-Vision.py /some_path/x-Ray_model/ /home/images/weird_cats/
```
The results will be exported to:
```
/home/images/weird_cats_TXT/
```
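
If you then want to pair each image with its exported caption (for example, to build a LoRA training set), the short sketch below can help. It assumes the "_TXT" directory contains one .txt file per image, named after the image file; that naming is an assumption, so check it against the actual output of xRay-Vision.py.

```
# Hedged sketch: pair images with their exported captions for downstream use.
# Assumes one caption .txt per image, named after the image file (verify this
# against the real xRay-Vision.py output before relying on it).
from pathlib import Path

image_dir = Path("/home/images/weird_cats")        # input dir from the example above
caption_dir = Path("/home/images/weird_cats_TXT")  # dir exported by xRay-Vision.py

pairs = []
for img in sorted(image_dir.iterdir()):
    if img.suffix.lower() not in {".jpg", ".jpeg", ".png", ".webp"}:
        continue
    txt = caption_dir / (img.stem + ".txt")  # assumed naming convention
    if txt.exists():
        pairs.append((img, txt.read_text().strip()))

print(f"Collected {len(pairs)} image/caption pairs")
```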

---

<h2 style="color: green; font-weight: bold; font-size: 65px; text-align: center;">Your support = more models</h2>
<a href="https://ko-fi.com/sicarius" style="color: pink; font-weight: bold; font-size: 48px; text-decoration: none; display: block; text-align: center;">My Ko-fi page (Click here)</a>

---


## Citation Information

```
@llm{X-Ray_Alpha,
  author = {SicariusSicariiStuff},
  title = {X-Ray_Alpha},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/SicariusSicariiStuff/X-Ray_Alpha}
}
```

---

## Other stuff
- [X-Ray_Vision](https://github.com/SicariusSicariiStuff/X-Ray_Vision) Easy stand-alone bulk vision inference at scale (run inference on a folder of images).
- [SLOP_Detector](https://github.com/SicariusSicariiStuff/SLOP_Detector) Nuke GPTisms with SLOP_Detector.
- [LLAMA-3_8B_Unaligned](https://huggingface.co/SicariusSicariiStuff/LLAMA-3_8B_Unaligned) The grand project that started it all.
- [Blog and updates (Archived)](https://huggingface.co/SicariusSicariiStuff/Blog_And_Updates) Some updates, some rambles, sort of a mix between a diary and a blog.