Windsrain nielsr HF Staff committed on
Commit a03872e · verified · 1 Parent(s): e89f296

Improve model card for ZeroStereo (StereoGen): Add tags, paper info, links, and usage (#1)


- Improve model card for ZeroStereo (StereoGen): Add tags, paper info, links, and usage (38f61bd3d6e09c07fb02ee3ca58dbba7f7628dac)


Co-authored-by: Niels Rogge <[email protected]>

Files changed (1)
  1. README.md +93 -3
README.md CHANGED
---
license: mit
pipeline_tag: image-to-image
library_name: diffusers
---

# ZeroStereo: Zero-shot Stereo Matching from Single Images

This repository hosts the **StereoGen** model, a key component of the ZeroStereo framework. ZeroStereo introduces a novel pipeline for zero-shot stereo matching, capable of synthesizing high-quality right images from arbitrary single images. It achieves this by leveraging pseudo disparities generated by a monocular depth estimation model and fine-tuning a diffusion inpainting model to recover missing details while preserving semantic structure.

## Paper

The model was presented in the paper [ZeroStereo: Zero-shot Stereo Matching from Single Images](https://huggingface.co/papers/2501.08654).

### Abstract

State-of-the-art supervised stereo matching methods have achieved remarkable performance on various benchmarks. However, their generalization to real-world scenarios remains challenging due to the scarcity of annotated real-world stereo data. In this paper, we propose ZeroStereo, a novel stereo image generation pipeline for zero-shot stereo matching. Our approach synthesizes high-quality right images from arbitrary single images by leveraging pseudo disparities generated by a monocular depth estimation model. Unlike previous methods that address occluded regions by filling missing areas with neighboring pixels or random backgrounds, we fine-tune a diffusion inpainting model to recover missing details while preserving semantic structure. Additionally, we propose Training-Free Confidence Generation, which mitigates the impact of unreliable pseudo labels without additional training, and Adaptive Disparity Selection, which ensures a diverse and realistic disparity distribution while preventing excessive occlusion and foreground distortion. Experiments demonstrate that models trained with our pipeline achieve state-of-the-art zero-shot generalization across multiple datasets with only a dataset volume comparable to Scene Flow.

## Code

The official code, along with detailed instructions for fine-tuning, generation, training, and evaluation, can be found in the [GitHub repository](https://github.com/Windsrain/ZeroStereo).

![ZeroStereo](ZeroStereo.png)

## Pre-Trained Models

The following pre-trained models related to this project are available:

| Model | Link |
| :-: | :-: |
| SDv2I | [Download 🤗](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting/tree/main) |
| StereoGen | [Download 🤗](https://huggingface.co/Windsrain/ZeroStereo/tree/main/StereoGen) |
| Zero-RAFT-Stereo | [Download 🤗](https://huggingface.co/Windsrain/ZeroStereo/tree/main/Zero-RAFT-Stereo) |
| Zero-IGEV-Stereo | [Download 🤗](https://huggingface.co/Windsrain/ZeroStereo/tree/main/Zero-IGEV-Stereo) |
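If you only need the checkpoint files, they can also be fetched directly with `huggingface_hub` (a minimal sketch; the `allow_patterns` filter limiting the download to the StereoGen subfolder is this example's assumption, matching the repository layout in the table above):

```python
from huggingface_hub import snapshot_download

# Download only the StereoGen subfolder from the ZeroStereo repository.
# allow_patterns filters which files are fetched; repo_id matches the table above.
local_dir = snapshot_download(
    repo_id="Windsrain/ZeroStereo",
    allow_patterns=["StereoGen/*"],
)
print(local_dir)  # Path to the local snapshot containing StereoGen/
```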

## Usage

You can load the StereoGen model with the `diffusers` library. Note that full inference requires additional pre-processed inputs (e.g., depth maps and inpainting masks); refer to the official GitHub repository for the complete multi-step pipeline.

First, ensure you have the `diffusers` library and its dependencies installed:

```bash
pip install diffusers transformers accelerate torch
```

Here's a basic example to load the pipeline:

```python
import torch
from diffusers import DiffusionPipeline
from PIL import Image

# Load the StereoGen pipeline.
# This model synthesizes a right stereo image from a single left input image.
pipeline = DiffusionPipeline.from_pretrained("Windsrain/ZeroStereo", torch_dtype=torch.float16)

# Move the pipeline to GPU if available.
if torch.cuda.is_available():
    pipeline.to("cuda")

# Placeholder input image; replace it with your actual left image:
# input_image = Image.open("path/to/your/left_image.png").convert("RGB")
input_image = Image.new("RGB", (512, 512), color="blue")  # Dummy image for demonstration

# The actual inference call may require additional inputs
# (e.g., `image`, `depth_map`, `mask_image`), as shown in the generation
# scripts at https://github.com/Windsrain/ZeroStereo:
# generated_right_image = pipeline(image=input_image, depth_map=some_depth, mask_image=some_mask).images[0]
# generated_right_image.save("generated_stereo_right.png")
```
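The synthesis idea behind those extra inputs (forward-warp the left image by a pseudo disparity, then inpaint the resulting holes) can be sketched in plain NumPy. This is an illustrative toy, not the project's implementation; all names here are this example's own:

```python
import numpy as np

def forward_warp(left: np.ndarray, disparity: np.ndarray):
    """Shift each left-image pixel leftward by its disparity to form a crude
    right view. Returns the warped image and a mask of unfilled (occluded)
    pixels, which an inpainting model would be asked to complete."""
    h, w = disparity.shape
    right = np.zeros_like(left)
    filled = np.zeros((h, w), dtype=bool)
    for y in range(h):
        for x in range(w):
            xr = x - int(round(disparity[y, x]))  # x_right = x_left - d
            if 0 <= xr < w:
                right[y, xr] = left[y, x]
                filled[y, xr] = True
    mask = ~filled  # True where the right view has holes
    return right, mask

left = np.random.rand(4, 8, 3)           # tiny dummy left image
disp = np.full((4, 8), 2.0)              # constant disparity of 2 pixels
right, mask = forward_warp(left, disp)   # holes appear along the right border
```

With a constant disparity the holes sit along the right border; with a real pseudo-disparity map they also open around depth discontinuities, and filling those regions plausibly is exactly what the fine-tuned inpainting model is for.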

## Acknowledgement

This project is based on [MfS-Stereo](https://github.com/nianticlabs/stereo-from-mono), [Depth Anything V2](https://github.com/DepthAnything/Depth-Anything-V2), [Marigold](https://github.com/prs-eth/Marigold), [RAFT-Stereo](https://github.com/princeton-vl/RAFT-Stereo), and [IGEV-Stereo](https://github.com/gangweix/IGEV). We thank the original authors for their excellent work.

## Citation

If you find this work helpful, please cite the paper:

```bibtex
@article{wang2025zerostereo,
  title={ZeroStereo: Zero-shot Stereo Matching from Single Images},
  author={Wang, Xianqi and Yang, Hao and Xu, Gangwei and Cheng, Junda and Lin, Min and Deng, Yong and Zang, Jinliang and Chen, Yurui and Yang, Xin},
  journal={arXiv preprint arXiv:2501.08654},
  year={2025}
}
```