nielsr (HF Staff) committed
Commit ad41797 · verified · 1 Parent(s): 58b01a9

Improve model card: Add pipeline tag, library, code link, and usage


This PR enhances the model card by:
- Adding `pipeline_tag: image-text-to-text` to the metadata for better discoverability.
- Specifying `library_name: transformers` in the metadata, as the model's configuration (`model_type: qwen2`, `tokenizer_class: Qwen2Tokenizer`) indicates compatibility with the Hugging Face Transformers library. This will enable automated code snippets on the Hub.
- Adding a direct link to the GitHub repository in the main content.
- Including a "Sample Usage" section with a "Single Inference" code snippet, directly taken from the GitHub README, to help users get started easily.

All existing content and metadata, including the current arXiv paper link, have been preserved.
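
The metadata change described above can be checked locally. The following is a minimal sketch, assuming the README front matter uses only the simple `key: value` and flat `- item` forms seen in this PR; the helper name `read_front_matter` and the `SAMPLE` string are illustrative, and this is not a general YAML parser.

```python
# Hypothetical helper: parses only scalar keys and flat "- item" lists,
# which is all the front matter in this PR uses.
SAMPLE = """---
base_model:
- Qwen/Qwen2.5-Coder-7B-Instruct
datasets:
- luzimu/webgen-agent_train_step-grpo
- luzimu/webgen-agent_train_sft
license: mit
pipeline_tag: image-text-to-text
library_name: transformers
---

# WebGen-Agent
"""


def read_front_matter(text):
    """Return the front matter between the leading '---' markers as a dict."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing marker ends the block
            break
        if not stripped:
            continue
        if stripped.startswith("- "):
            # List item: attach to the most recent key that opened a list.
            if isinstance(meta.get(current_key), list):
                meta[current_key].append(stripped[2:].strip())
            continue
        key, _, value = stripped.partition(":")
        current_key = key.strip()
        value = value.strip()
        # A key with no inline value opens a list; otherwise store the scalar.
        meta[current_key] = value if value else []
    return meta


meta = read_front_matter(SAMPLE)
missing = [k for k in ("pipeline_tag", "library_name") if k not in meta]
print("missing keys:", missing)
```

For real model cards with richer YAML, a full YAML parser would be the safer choice; the hand-rolled parser here only keeps the sketch dependency-free.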

Files changed (1)
  1. README.md +25 -3
README.md CHANGED
@@ -1,16 +1,20 @@
  ---
- license: mit
+ base_model:
+ - Qwen/Qwen2.5-Coder-7B-Instruct
  datasets:
  - luzimu/webgen-agent_train_step-grpo
  - luzimu/webgen-agent_train_sft
- base_model:
- - Qwen/Qwen2.5-Coder-7B-Instruct
+ license: mit
+ pipeline_tag: image-text-to-text
+ library_name: transformers
  ---

  # WebGen-Agent

  WebGen-Agent is an advanced website generation agent designed to autonomously create websites from natural language instructions. It was introduced in the paper [WebGen-Agent: Enhancing Interactive Website Generation with Multi-Level Feedback and Step-Level Reinforcement Learning](https://arxiv.org/pdf/2509.22644v1).

+ Code: https://github.com/mnluzimu/WebGen-Agent
+
  ## Project Overview

  WebGen-Agent combines state-of-the-art language models with specialized training techniques to create a powerful website generation tool. The agent can understand natural language instructions specifying appearance and functional requirements, iteratively generate website codebases, and refine them using visual and functional feedback.
@@ -55,6 +59,24 @@ These dual rewards provide dense, reliable process supervision that significantl

  ![Step-GRPO with Screenshot and GUI-agent Feedback](fig/step-grpo.png)

+ ## Sample Usage
+
+ Before running inference, you should rename `.env.template` to `.env` and set the base urls and api keys for the agent-engine LLM and feedback VLM. They can be obtained from any openai-compatible providers such as [openrouter](https://openrouter.ai/), [modelscope](https://www.modelscope.cn/my/overview), [bailian](https://bailian.console.aliyun.com/#/home), and [llmprovider](https://llmprovider.ai/).
+
+ You can also deploy open-source VLMs and LLMs by running `src/scripts/deploy_qwenvl_32b.sh` and `src/scripts/deploy.sh`. Scripts for single inference and batch inference can be found at `src/scripts/infer_single.sh` and `src/scripts/infer_batch.sh`.
+
+ ```bash
+ python src/infer_single.py \
+ --model deepseek-chat \
+ --vlm_model Qwen/Qwen2.5-VL-32B-Instruct \
+ --instruction "Please implement a wheel of fortune website." \
+ --workspace-dir workspaces_root/test \
+ --log-dir service_logs/test \
+ --max-iter 20 \
+ --overwrite \
+ --error-limit 5
+ ```
+
  ## Citation

  If you find our project useful, please cite:
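
For scripted runs, the single-inference command from the "Sample Usage" section of this diff can be assembled as an argument list. This is a minimal sketch, assuming the flags behave exactly as written in the diff; the function name `build_infer_command` and its default values are illustrative and are not part of the WebGen-Agent repository.

```python
import shlex


def build_infer_command(
    instruction,
    model="deepseek-chat",
    vlm_model="Qwen/Qwen2.5-VL-32B-Instruct",
    workspace_dir="workspaces_root/test",
    log_dir="service_logs/test",
    max_iter=20,
    error_limit=5,
    overwrite=True,
):
    """Build the argv list for src/infer_single.py (flags copied from the diff)."""
    cmd = [
        "python", "src/infer_single.py",
        "--model", model,
        "--vlm_model", vlm_model,
        "--instruction", instruction,
        "--workspace-dir", workspace_dir,
        "--log-dir", log_dir,
        "--max-iter", str(max_iter),
    ]
    if overwrite:
        cmd.append("--overwrite")
    cmd += ["--error-limit", str(error_limit)]
    return cmd


cmd = build_infer_command("Please implement a wheel of fortune website.")
# shlex.join quotes the instruction so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` from the repository root would launch the run without shell quoting issues, provided `.env` has been configured as described in the added section.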