nielsr (HF Staff) committed · verified
Commit 3a1ac52 · 1 Parent(s): 975380a

Improve model card: Add pipeline tag, library name, paper, project page, and code links


This PR enhances the model card by:
- Adding the `pipeline_tag: image-to-image` to the metadata, which helps users discover the model through relevant filters on the Hugging Face Hub.
- Specifying `library_name: pytorch` to enable the Hub's automated code snippets, since the model's inference code uses PyTorch directly.
- Including a direct link to the paper [Exploring Image Representation with Decoupled Classical Visual Descriptors](https://huggingface.co/papers/2510.14536).
- Adding links to the official project page (https://chenyuanqu.com/VisualSplit/) and the GitHub repository (https://github.com/HenryQUQ/VisualSplit) for easier access to more information and code.
- Integrating the paper's abstract into the "Model Description" section to provide a richer overview of the model.

These additions improve the discoverability, usability, and completeness of the model card.
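
As a concrete illustration of the discoverability point, the new metadata makes the model surface under Hub filter queries. A minimal sketch using `huggingface_hub` (the keyword filters assume a recent release; older versions used `ModelFilter` instead):

```python
from huggingface_hub import HfApi

api = HfApi()

# With `pipeline_tag: image-to-image` in the card metadata, the model is
# returned by the corresponding Hub filter. `pipeline_tag` and `search`
# keyword arguments assume a recent huggingface_hub version.
for model in api.list_models(pipeline_tag="image-to-image", search="VisualSplit", limit=5):
    print(model.id)
```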

Files changed (1): README.md (+8, -1)
README.md CHANGED
@@ -2,11 +2,16 @@
 license: apache-2.0
 tags:
 - vision
+pipeline_tag: image-to-image
+library_name: pytorch
 ---
 
-
 # VisualSplit
 
+This model is presented in the paper [Exploring Image Representation with Decoupled Classical Visual Descriptors](https://huggingface.co/papers/2510.14536).
+Project page: https://chenyuanqu.com/VisualSplit/
+Code: https://github.com/HenryQUQ/VisualSplit
+
 **VisualSplit** is a ViT-based model that explicitly factorises an image into **classical visual descriptors**—such as **edges**, **color segmentation**, and **grayscale histogram**—and learns to reconstruct the image conditioned on those descriptors. This design yields **interpretable representations** where geometry (edges), albedo/appearance (segmented colors), and global tone (histogram) can be reasoned about or varied independently.
 
 > **Training data**: ImageNet-1K.
@@ -14,6 +19,8 @@ tags:
 
 ## Model Description
 
+Exploring and understanding efficient image representations is a long-standing challenge in computer vision. While deep learning has achieved remarkable progress across image understanding tasks, its internal representations are often opaque, making it difficult to interpret how visual information is processed. In contrast, classical visual descriptors (e.g. edge, colour, and intensity distribution) have long been fundamental to image analysis and remain intuitively understandable to humans. Motivated by this gap, we ask a central question: Can modern learning benefit from these classical cues? In this paper, we answer it with VisualSplit, a framework that explicitly decomposes images into decoupled classical descriptors, treating each as an independent but complementary component of visual knowledge. Through a reconstruction-driven pre-training scheme, VisualSplit learns to capture the essence of each visual descriptor while preserving their interpretability. By explicitly decomposing visual attributes, our method inherently facilitates effective attribute control in various advanced visual tasks, including image generation and editing, extending beyond conventional classification and segmentation, suggesting the effectiveness of this new learning approach for visual understanding.
+
 - **Inputs** (at inference):
   - An RGB image (for convenience) which is converted to descriptors using the provided `FeatureExtractor` (edges, color segmentation, grayscale histogram).
 - **Outputs**:
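
The inference flow the card describes (RGB image, converted to descriptors by `FeatureExtractor`, then reconstructed) can be sketched in PyTorch roughly as follows. This is a minimal sketch, not the repository's confirmed API: the `visualsplit` module path, the `VisualSplit` class, the checkpoint filename, and the descriptor dictionary are all assumptions; `FeatureExtractor` is the name used in the model card, but its exact signature may differ.

```python
import torch
from PIL import Image

# Hypothetical imports: module and class names are assumptions, not the
# repository's confirmed API. See https://github.com/HenryQUQ/VisualSplit.
from visualsplit import VisualSplit, FeatureExtractor

# Load an RGB image; per the model card, the RGB input is converted into
# classical descriptors (edges, color segmentation, grayscale histogram).
image = Image.open("example.jpg").convert("RGB")

# Decompose the image into decoupled classical descriptors.
extractor = FeatureExtractor()  # signature assumed
descriptors = extractor(image)  # assumed to return a dict of descriptor tensors

# Load pretrained weights (loading mechanism and filename assumed) and
# reconstruct the image conditioned on the descriptors.
model = VisualSplit()
model.load_state_dict(torch.load("visualsplit.pt", map_location="cpu"))
model.eval()

with torch.no_grad():
    reconstruction = model(**descriptors)  # tensor shaped like the input image
```

Because the descriptors are decoupled, one could in principle replace a single descriptor (for example, the grayscale histogram) before reconstruction to vary global tone independently, which is the attribute-control behavior the card claims.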