---
tags:
- text-to-image
- lora
- diffusers
- template:diffusion-lora
base_model: black-forest-labs/FLUX.1-Kontext-dev
instance_prompt: >-
[photo content], recreate the scene from a top-down perspective. Maintain all
visual proportions, lighting consistency, and realistic spatial relationships.
Ensure the background, textures, and environmental shadows remain naturally
aligned from this elevated angle.
license: other
license_name: flux-1-dev-non-commercial-license
license_link: LICENSE.md
language:
- en
pipeline_tag: image-to-image
library_name: diffusers
---

# **Kontext-Top-Down-View**
Kontext-Top-Down-View is an experimental adapter for black-forest-labs' FLUX.1-Kontext-dev, designed to transform scenes into a top-down perspective while maintaining accurate visual proportions, consistent lighting, and realistic spatial relationships. The adapter aims to keep backgrounds, textures, and environmental details natural and contextually coherent from the new viewpoint. It was trained on 800 image pairs (400 start images and 400 end images) for geometry-consistent top-down scene generation.
> [!note]
> [photo content], recreate the scene from a top-down perspective. Maintain all visual proportions, lighting consistency, and realistic spatial relationships. Ensure the background, textures, and environmental shadows remain naturally aligned from this elevated angle.
>
> You can modify the prompt to alter its properties and subjective elements. Note: this is an experimental adapter and may produce artifacts.
---
## Parameter Settings
| Setting | Value |
| ------------------------ | ------------------------ |
| Module Type | Adapter |
| Base Model | FLUX.1 Kontext Dev - fp8 |
| Trigger Words | [photo content], recreate the scene from a top-down perspective. Maintain all visual proportions, lighting consistency, and realistic spatial relationships. Ensure the background, textures, and environmental shadows remain naturally aligned from this elevated angle. |
| Image Processing Repeats | 50 |
| Epochs | 25 |
| Save Every N Epochs | 1 |

Labeling: DeepCaption-VLA-7B (natural language, English)

Total images used for training: 800 image pairs (400 start, 400 end)

## Training Parameters
| Setting | Value |
| --------------------------- | --------- |
| Seed | - |
| Clip Skip | - |
| Text Encoder LR | 0.00001 |
| UNet LR | 0.00005 |
| LR Scheduler | constant |
| Optimizer | AdamW8bit |
| Network Dimension | 64 |
| Network Alpha | 32 |
| Gradient Accumulation Steps | - |
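
The Network Dimension and Network Alpha values above interact: in the common LoRA convention, the low-rank update is multiplied by `alpha / dim` before it is added to the base weights. The card does not name the trainer, so treating these settings this way is an assumption; `lora_scale` is a hypothetical helper used only for illustration.

```python
def lora_scale(network_alpha: float, network_dim: int) -> float:
    """Return the multiplier applied to the low-rank update (alpha / rank).

    Assumes the common LoRA convention; the trainer used for this adapter
    is not stated in the card.
    """
    if network_dim <= 0:
        raise ValueError("network_dim must be positive")
    return network_alpha / network_dim


# Values taken from the Training Parameters table.
scale = lora_scale(network_alpha=32, network_dim=64)
print(scale)  # 0.5
```

Under that assumption, an alpha of half the rank halves the raw update, which generally allows a somewhat higher learning rate than `alpha == dim` would.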
## Label Parameters
| Setting | Value |
| --------------- | ----- |
| Shuffle Caption | - |
| Keep N Tokens | - |
## Advanced Parameters
| Setting | Value |
| ------------------------- | ----- |
| Noise Offset | 0.03 |
| Multires Noise Discount | 0.1 |
| Multires Noise Iterations | 10 |
| Conv Dimension | - |
| Conv Alpha | - |
| Batch Size | - |
| Steps | 3800 & 400 (warm-up) |
| Sampler | euler |
---
## Trigger words
You should use the full prompt below to trigger the image generation:

`[photo content], recreate the scene from a top-down perspective. Maintain all visual proportions, lighting consistency, and realistic spatial relationships. Ensure the background, textures, and environmental shadows remain naturally aligned from this elevated angle.`
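
A minimal sketch of how the trigger prompt might be used with diffusers. It rests on several assumptions: that `[photo content]` is a placeholder for a short description of the input image, that the adapter repository holds a single LoRA weight file that `load_lora_weights` can find without an explicit `weight_name`, and that a CUDA GPU is available. The helper names and the `guidance_scale` and `num_inference_steps` values are illustrative, not settings recommended by this card.

```python
# Full trigger prompt from this card.
PROMPT_TEMPLATE = (
    "[photo content], recreate the scene from a top-down perspective. "
    "Maintain all visual proportions, lighting consistency, and realistic "
    "spatial relationships. Ensure the background, textures, and "
    "environmental shadows remain naturally aligned from this elevated angle."
)


def build_prompt(photo_content: str) -> str:
    """Fill the placeholder with a description of the input photo (assumed usage)."""
    return PROMPT_TEMPLATE.replace("[photo content]", photo_content.strip())


def generate_top_down(image_path: str, photo_content: str,
                      output_path: str = "top_down.png") -> str:
    """Hypothetical helper: run FLUX.1 Kontext with the adapter on one image."""
    # Heavy imports stay inside the function so build_prompt works without them.
    import torch
    from diffusers import FluxKontextPipeline
    from diffusers.utils import load_image

    pipe = FluxKontextPipeline.from_pretrained(
        "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
    )
    # Assumption: the repo has one weight file; otherwise pass weight_name=...
    pipe.load_lora_weights("prithivMLmods/Kontext-Top-Down-View")
    pipe.to("cuda")

    image = load_image(image_path)
    result = pipe(
        image=image,
        prompt=build_prompt(photo_content),
        guidance_scale=2.5,        # illustrative value
        num_inference_steps=28,    # illustrative value
    ).images[0]
    result.save(output_path)
    return output_path
```

Calling `generate_top_down("room.jpg", "a living room with a sofa")` would then write the edited image to `top_down.png`; the file name and description here are placeholders.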
## Download model
[Download](/prithivMLmods/Kontext-Top-Down-View/tree/main) the model weights from the Files & versions tab.