LTX-2.3 22B IC-LoRA Reference Sheet Control

This is an IC-LoRA trained on top of LTX-2.3-22B, which conditions video generation on a reference sheet — a single composite image inventorying the characters, props, and location of a scene — so that generated videos keep those elements visually consistent.

It is based on the LTX-2.3 foundation model.

Prompt: ### Reference Sheet Description **Top Row Left (Setting):** A modern "FreshMart" grocery store exterior with a grey facade, a prominent green awning displaying the FreshMart logo (a leaf inside a shopping cart), and a white delivery car parked in front on a dark asphalt lot under a bright, partly cloudy sky. **Top Row Middle (Setting):** A long, brightly lit interior aisle of a grocery store fully stocked with various colorful packaged products on tall shelves, with polished floors reflecting the overhead fluorescent lights. **Top Row Right (Character):** A 3D animated owl mascot shown from multiple angles (front, sides, back, 3/4 view). The owl has grey and white feathers, large expressive eyes, and wears a green short-sleeved polo shirt, a dark green apron with the FreshMart logo, and a green visor cap. **Middle Row Left (Props):** Dark blue reusable shopping tote bags with bright green handles and trim. One features the FreshMart logo, and another features the text "FRESH CHOICES, BETTER LIVING". **Middle Row Right (Props):** Multiple angles (front, back, side) of a white 5-door FreshMart delivery hatchback car featuring the FreshMart logo on the side doors and a roof sign. **Bottom Row Left (Props):** Close-up angles of the FreshMart tote bags, showing their sturdy boxy shape, green fabric trim, and folded handles. **Bottom Row Right (Logo):** The standalone FreshMart logo featuring a minimalist shopping cart with a green leaf design above bold green text. ### Target Description A bright, high-quality 3D animated commercial shot set outside a modern "FreshMart" grocery store exterior during a sunny day. The camera holds a steady, static wide shot. In the foreground, standing on the grey pavement, is a cheerful 3D animated owl mascot with grey and white feathers, large expressive eyes, and a small yellow beak. The owl is dressed in a green short-sleeved polo shirt, a dark green apron featuring the FreshMart logo, and a matching green visor cap. 
Parked directly behind and slightly to the right of the owl is a white 5-door FreshMart delivery hatchback car with the store's leaf-and-cart logo on the side doors and a small roof sign. The store's grey facade and prominent green awning form the background, with the sliding glass entrance doors visible behind the character. The owl warmly spreads its wings outward and upward in an inviting, energetic welcoming gesture, blinking its large eyes naturally. It looks directly into the camera lens with a joyful, enthusiastic expression and opens its beak, confidently delivering its lines: "Welcome to FreshMart, where we're always ready to help you load up with everything you need. Fresh choices, better living!" The lighting is crisp and commercial, casting soft natural shadows on the asphalt and highlighting the clean, vibrant green accents of the store.

Prompt: ### Reference Sheet Description **Top Row (Character):** A 3D animated young, athletic man shown from multiple turnaround angles. He has short brown hair and brown eyes. He wears a teal hooded sleeveless vest with a high collar over a dark grey undershirt, brown leather bracers on his forearms, a cross-body leather satchel strap, brown pants with thick brown boots, and a glowing blue crystal pendant necklace. **Middle Row Left (Prop):** A casual short-sleeved button-up Hawaiian shirt in a light cream color, covered with large green tropical pine and palm leaf patterns. **Middle Row Right (Prop):** A brightly colored tropical soda can shown from three angles. The can has a yellow top transitioning to a teal bottom with abstract wave designs and a silver pull tab. **Bottom Row Left (Prop):** A magical wooden staff made of dark brown twisting wood. The top holds a large, sharp, glowing blue crystal bound tightly to the wood by dark leather straps. **Bottom Row Right (Setting):** A bright, sunny open-air tropical beach bar with wooden railings, sturdy wooden pillars, and warm hanging lantern lights. The background features a pristine white sand beach, lush green trees, and clear turquoise ocean water beneath a bright blue sky. ### Target Description A vibrant, high-quality 3D animated cinematic shot set inside a sun-drenched, open-air tropical beach bar. The camera frames a steady medium shot of the young, athletic man with short brown hair leaning casually near a sturdy wooden pillar. He is dressed in the light cream Hawaiian shirt covered in green palm leaf patterns, layered under the teal hooded sleeveless vest, wearing brown leather bracers and the glowing blue crystal pendant necklace. In his right hand, he firmly grips the twisting dark wooden staff with the glowing blue crystal at the top. In his left hand, he holds the brightly colored yellow and teal tropical soda can. 
He takes a slow, deep breath of the ocean air, looking slightly off-camera before turning his warm brown eyes forward. A soft, contented smile spreads across his face as he slightly raises the soda can in a subtle toasting motion. He speaks smoothly in a relaxed and satisfied way: "Nothing beats a cold drink after a long quest." He pauses for a beat, his smile widening as he gently shakes the can, adding: "Pure magic!" The background features a shallow depth of field, showcasing the pristine white sand, lush green trees, and clear turquoise ocean water. Warm, bright daylight illuminates his face, casting natural shadows, while a gentle ocean breeze slightly rustles his hair and the collar of his shirt.

Prompt: ### Reference Sheet Description **Top Row Left (Character):** A young woman with long, slightly wavy brown hair parted down the middle. She has a warm complexion and is wearing a dark olive green ribbed tank top and dark, baggy cargo pants with large side pockets. She is shown from front, side, and back profiles with a relaxed expression. **Top Row Right (Prop):** A sleek, modern black video game controller shown from top, front, and angled perspectives, featuring standard joysticks, a D-pad, and action buttons. **Middle Row Right (Prop):** A dark olive green short-sleeved t-shirt featuring a stylized retro graphic print of a bright orange sun setting behind dark silhouette pine trees and green rolling hills. **Bottom Row (Setting):** A wide shot of a cozy, dimly lit bedroom at night. The room features wood-paneled accents, a neatly made low bed with green bedding, string lights on the wall, and a gaming desk setup with dual monitors. A striking circular wall light emitting a warm yellow-orange ring illuminates the space above the bed, casting a neon glow over the dark, moody room. ### Target Description A cozy, cinematic scene inside a dimly lit bedroom at night with rich wood-paneled accents. The camera holds a steady, static medium shot. A young woman with long, slightly wavy brown hair parted down the middle and a warm complexion is sitting on the edge of her neatly made low bed with green bedding. She is gripping the sleek, modern black video game controller tightly in both hands, her thumbs actively moving on the thumbsticks. She wears the dark olive green short-sleeved t-shirt featuring the stylized retro graphic print of a bright orange sun setting behind silhouette pine trees, paired with her dark, baggy cargo pants. The room is bathed in deep, atmospheric shadows, heavily illuminated by the striking circular wall light emitting a warm yellow-orange ring above her bed, casting a vivid neon glow across her face. 
Her initial expression is one of intense focus and determination, but she suddenly relaxes as a triumphant smile breaks across her face. She throws her head back slightly, letting out a joyful, energetic laugh, and speaks to her off-screen opponent in a smug and playful way: "Oh, you thought you had me cornered, didn't you?" She pauses for a split second, leaning forward slightly as her smile widens, and exclaims: "Not today!" The background gaming desk setup with dual monitors remains softly blurred, emphasizing her vivid expressions under the warm ambient neon light.

Prompt: ### Reference Sheet Description **Top Row Left (Character):** A cute 3D animated hedgehog with a round, plump body, soft brown spiky fur on its back, a light tan belly, and a friendly, curious face. **Top Row Middle (Character):** A 3D animated grey and white rabbit shown from multiple angles. The rabbit has tall ears with pink insides, large expressive blue eyes, a small pink nose, and soft, fluffy fur. **Top Row Right (Character):** Multiple turnaround angles (front, back, sides) of the same brown hedgehog character. **Middle Row Left (Prop):** A green and white rectangular sign that reads "Greenfield Home & Garden" with a small leaf logo. Next to it is a digital menu board listing garden services. **Middle Row Right (Props):** A collection of bright green gardening tools against a white background, including a boxy green retractable wall-mounted hose reel, a handheld leaf blower, and a pair of sharp pruning shears with green handles. **Bottom Row (Setting):** The interior of a bright, spacious retail garden center. Aisles of metal shelving are heavily stocked with various potted green plants, bags of organic soil, and boxed gardening supplies, with hanging green aisle signs visible overhead. ### Target Description A bright, high-quality 3D animated commercial shot set inside the bright, spacious retail garden center. The camera frames a steady, well-lit medium shot of a wooden display counter. Sitting in the center of the counter is the boxy green retractable wall-mounted hose reel. Standing on the counter to the left of the hose reel is the cute 3D animated hedgehog with a round, plump body and soft brown spiky fur. Standing on the counter to the right is the 3D animated grey and white rabbit with tall ears, a small pink nose, and large expressive blue eyes. The hedgehog gestures warmly with its small paws, looking directly at the camera with a friendly, welcoming expression, and says cheerfully: "Welcome to Greenfield! 
We have the best new garden tools!" The rabbit then leans in slightly, raising a fluffy paw with an eager, wide-eyed expression, and interrupts to ask in a curious way: "Hold on... Does this dirt make carrots grow instantly? Asking for a friend." The scene features vibrant, cinematic Pixar-style lighting that highlights the lush potted green plants on the metal shelving in the softly blurred background, while the characters display smooth, lifelike micro-movements and natural breathing.

Prompt: ### Reference Sheet Description **Top Row Left (Character):** A young man with dark, messy curly hair and a mustache. He has a warm, natural skin tone and is wearing a black collared overshirt left unbuttoned over a simple white t-shirt. He has a relaxed, subtle smile. **Top Row Right (Character):** Multiple turnaround angles of the same man in the black overshirt, showing his full body wearing dark pants and dark casual shoes. **Middle Row Left (Prop):** A black rectangular sign displaying the white "SOUNDWAVE" logo alongside a stylized circular audio wave icon. **Middle Row Right (Prop):** Close-up angles of a pair of premium black over-ear wireless headphones with thick cushioned earcups and a smooth, matte black headband. To the far right is the sleek, tall black retail packaging box for the headphones. **Bottom Row (Setting):** A wide view of a modern, upscale "Soundwave" audio electronics retail store. The store features warm wood accents, large windows letting in natural sunlight, neat wooden display tables showcasing gadgets, and walls lined with neatly organized boxed products and headphones. A store clerk wearing a black polo shirt stands behind a wooden counter. ### Target Description A cinematic, high-quality commercial scene set inside a modern "Soundwave" audio electronics retail store. The camera frames a steady medium shot. In the foreground, a young man with dark messy curly hair and a mustache, wearing a black collared overshirt left unbuttoned over a simple white t-shirt, stands at a wooden counter. He is holding and closely inspecting a pair of premium black over-ear wireless headphones with thick cushioned earcups and a smooth, matte black headband. Behind the counter stands a male store clerk wearing a black polo shirt, watching the customer attentively. 
The customer gently turns the headphones in his hands to examine their texture, looks up at the clerk, and says in an impressed and enthusiastic way: "The build quality on these is honestly incredible. How is the active noise cancellation?" The clerk nods with a warm, welcoming smile and replies professionally and confidently: "It's our best yet, it completely blocks out the street noise." Natural sunlight streams through large windows out of frame, casting warm, cinematic lighting across the rich wood accents of the store and highlighting the sleek matte finish of the headphones. The background, featuring shelves lined with neatly organized boxed products and the Soundwave logo, remains softly out of focus.

Prompt: ### Reference Sheet Description **Top Row Left (Character):** A young Asian woman with a warm skin tone. Her dark hair is parted down the middle and styled into two long braids resting on her chest. She wears a short-sleeved olive-green t-shirt, khaki cargo pants, dark brown hiking boots, and a black wristwatch on her left arm. She has a serious, natural expression. **Top Row Right (Prop - Backpack):** A large, heavy-duty blue hiking backpack. It features an external silver metal frame, multiple side and top pouches, black adjustable straps, and a brown leather square patch near the bottom. **Middle Row Left (Prop - Walking Stick):** A simple, thick, natural wooden walking stick with rough bark texture and a slight fork or knot near one end. **Middle Row Right (Character/Animal - Yak):** A large, sturdy yak with long, shaggy white and blonde hair and curved grey horns. It wears a highly ornate saddle blanket featuring intricate blue, red, and yellow patterns, along with a saddle equipped with metal stirrups. Colorful tassels hang near its ears and chest. **Bottom Row Left (Setting - Landscape):** A sweeping, majestic mountain landscape. A dirt path winds through green, rocky slopes toward towering, snow-capped peaks in the distance, set under a bright blue sky with scattered white clouds. **Bottom Row Right (Setting - Building):** A small, traditional square stone building (a shrine or temple). It has a flat, slightly tiered roof adorned with a bright yellow fabric valance along the roofline. The windows have bright blue trim, and the wooden door is painted red. A small stone stupa sits beside it. ### Target Description cinematic adventure documentary scene captures a dynamic medium wide shot of a young asian woman with her dark hair in two long braids, wearing an olive-green t-shirt and khaki cargo pants. she is sitting on a rock along a mountain dirt path, resting beside a massive white shaggy yak with curved horns. 
the yak wears an ornate blue, red, and yellow patterned saddle blanket with metal stirrups. leaning against a nearby small stone building—which features blue window trim, a red door, and a yellow fabric roof valance—is her large blue external-frame backpack and thick wooden walking stick. majestic snow-capped mountains tower in the distant background under a bright blue sky. she looks at the yak, chest heaving slightly from exertion, and says with a breathy, tired, but gentle voice: "we've got a long way to go..." she pauses briefly, extending her hand to gently pat the thick white fur on the yak's neck. the yak shifts its weight slightly, causing the colorful tassels near its ears to sway. a faint, exhausted smile breaks across her face as she continues softly: "...big guy." lowering her hand, she grabs her thick wooden walking stick, leaning her weight onto it as she turns her gaze up toward the distant snowy peaks. her expression shifts from exhaustion to quiet determination as she adds, her voice growing slightly firmer: "but the pass..." she takes a deep, grounding breath. "...is just over that ridge." the woman looks profoundly weary but unbroken, embodying a deep connection to the harsh, beautiful environment and her animal companion. her acting is subtle and grounded; micro-expressions of fatigue are visible in the slight tension around her eyes and her parted lips as she catches her breath, while her jaw sets with firm resolve as she contemplates the climb ahead. the camera is dynamically handheld, slowly orbiting around the woman and the yak to reveal the depth of the valley and the towering mountains behind them. the film aesthetic is naturalistic and breathtaking, with bright, crisp sunlight casting sharp shadows across the rocky path and stone shrine. 
the audio is clear and immersive, featuring her wind-swept voice, the heavy, rhythmic breathing of the yak, the faint jingle of metal stirrups, and the distant, ambient howl of mountain winds, with no background music.

Model Files

ltx-2.3-22b-ic-lora-ingredients-0.9.safetensors

Model Details

Base Model: LTX-2.3-22B (dev)
Training Type: IC-LoRA (in-context LoRA)
Control Type: Reference-sheet conditioning — character / prop / location identity carried into the generated video
Reference Downscale Factor: 1 (the reference is provided at the same resolution as the output)
Pipeline details: The reference sheet is supplied as a static video (the still sheet looped to the output's length and frame rate). The model is trained with a video_to_video strategy over reference latents; no extra color/space transforms are applied at inference.

Intended Use & Out-of-Scope

Intended use: Generating short video clips that stay faithful to a supplied reference sheet — keeping recurring characters (face and costume), handled props, and the set/location consistent with the sheet while following an action described in the prompt.

Out of scope: This is not a general text-to-video model — it expects a reference sheet as conditioning. It was trained at a single resolution / length bucket (768×448, 121 frames, 24 fps); other resolutions, much longer clips, or use without a reference sheet are out of distribution. It does not reproduce identities that are absent from the supplied sheet.
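The scope limits above can be checked before a generation run. The following is a minimal sketch, assuming a hypothetical `ClipRequest` record that is not part of any LTX API; only the bucket values (768×448, 121 frames, 24 fps) come from this card.

```python
from dataclasses import dataclass
from typing import List

# Trained bucket as stated in this model card.
TRAINED_WIDTH, TRAINED_HEIGHT = 768, 448
TRAINED_FRAMES = 121
TRAINED_FPS = 24


@dataclass(frozen=True)
class ClipRequest:
    """Hypothetical description of a generation request (not an LTX type)."""
    width: int
    height: int
    num_frames: int
    fps: int
    has_reference_sheet: bool


def scope_warnings(req: ClipRequest) -> List[str]:
    """Return one warning per setting that falls outside the trained bucket."""
    warnings = []
    if not req.has_reference_sheet:
        warnings.append("no reference sheet: this LoRA expects one as conditioning")
    if (req.width, req.height) != (TRAINED_WIDTH, TRAINED_HEIGHT):
        warnings.append(
            f"resolution {req.width}x{req.height} differs from the trained "
            f"{TRAINED_WIDTH}x{TRAINED_HEIGHT}"
        )
    if req.num_frames != TRAINED_FRAMES:
        warnings.append(
            f"{req.num_frames} frames differs from the trained {TRAINED_FRAMES}"
        )
    if req.fps != TRAINED_FPS:
        warnings.append(f"{req.fps} fps differs from the trained {TRAINED_FPS}")
    return warnings


if __name__ == "__main__":
    for message in scope_warnings(ClipRequest(1280, 720, 241, 30, True)):
        print("warning:", message)
```

An empty list means the request sits inside the trained bucket; a non-empty list marks out-of-distribution settings, which may still run but are not what the LoRA was trained on.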

Control Signal Requirements

Control signal type: Reference sheet — a single composite image with one clean panel per distinct visual element (each character as a face close-up + body turnaround, each prop as a product-style render, and one clean location panel), laid out on a black background with no text.
Expected input: A static video built from the reference sheet, looped to match the output clip's length and frame rate, at the output resolution (downscale factor 1).
Preprocessing: Author the reference sheet with the element-driven reference-sheet generator, then loop the still into a static video. Frame count must be ≥ 121 so the reference-encoding / 121-frame read bucket is satisfied; all targets in training were ≥ 121 frames.
Alignment: The reference video should match the output resolution and frame rate; its frame count must be at least the output length (clamped to ≥ 121).
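One way to loop the still sheet into a static video is with ffmpeg. The sketch below only builds the command line and does not run it. It assumes ffmpeg is installed, that the sheet was authored at the output aspect ratio (the `scale` filter stretches otherwise), and that a lossy MP4 is acceptable as the reference; the function names and file paths are illustrative and do not come from the LTX tooling.

```python
from typing import List

MIN_REFERENCE_FRAMES = 121  # minimum reference length stated in this card


def reference_frame_count(output_frames: int) -> int:
    """Reference length: at least the output length, clamped to >= 121."""
    return max(output_frames, MIN_REFERENCE_FRAMES)


def build_ffmpeg_command(
    sheet_path: str,
    out_path: str,
    output_frames: int = 121,
    fps: int = 24,
    width: int = 768,
    height: int = 448,
) -> List[str]:
    """Build an ffmpeg command that repeats one image for the required frames."""
    frames = reference_frame_count(output_frames)
    return [
        "ffmpeg", "-y",
        "-loop", "1",                       # repeat the single input image
        "-framerate", str(fps),             # frame rate of the looped still
        "-i", sheet_path,
        "-vf", f"scale={width}:{height}",   # match the output resolution
        "-frames:v", str(frames),           # stop after the required frame count
        "-pix_fmt", "yuv420p",
        out_path,
    ]


if __name__ == "__main__":
    print(" ".join(build_ffmpeg_command("sheet.png", "reference.mp4")))
```

To run it, pass the returned list to `subprocess.run(cmd, check=True)`.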

How It Works

The prompt is split into two labeled parts, matching how the model was trained:

Reference sheet: <description of the panels in the sheet — characters, props, location>

Generated video: <description of the action / shot you want generated>

At inference, the reference sheet (as a static video) supplies what things look like, and the Generated video: portion of the prompt supplies what happens. The model reads the reference latents in-context and renders a new clip whose characters, props, and setting match the sheet.
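The two-part prompt can be assembled with a small helper. This is a sketch under assumptions: the function name is hypothetical, and separating the two parts with a blank line mirrors the layout shown above rather than a documented requirement.

```python
def build_prompt(reference_sheet: str, generated_video: str) -> str:
    """Join the two labeled parts into a single prompt string."""
    # Collapse internal whitespace so each part stays on one line.
    ref = " ".join(reference_sheet.split())
    gen = " ".join(generated_video.split())
    if not ref or not gen:
        raise ValueError("both the reference-sheet and generated-video parts are required")
    # Assumption: a blank line between the parts, as in the layout above.
    return f"Reference sheet: {ref}\n\nGenerated video: {gen}"


if __name__ == "__main__":
    print(build_prompt(
        "Top row: an owl mascot in a green apron. Bottom row: a grocery store exterior.",
        "The owl waves at the camera in front of the store.",
    ))
```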

Usage

🔌 ComfyUI

Copy the LoRA weights into models/loras.
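Load the LTX-2.3-22B base model and add the LoRA file copied in the previous step as the LoRA. These instructions name it lora_weights_step_12000.safetensors; the file listed under Model Files is ltx-2.3-22b-ic-lora-ingredients-0.9.safetensors.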
Start at strength 1.0 and adjust to taste (Recommended Settings lists 1.4).
Use an IC-LoRA / reference workflow from the LTX-2 ComfyUI repository, which already wires the reference (control) input. Connect the reference-sheet static video as the control/reference input; a generic LoRA loader that ignores the reference path will not apply the conditioning. See the IC-LoRA docs.

Recommended Settings

LoRA strength / weight: 1.4
Inference steps: 30
Guidance scale: 4.0
Resolution & frames: 768×448, 121 frames, 24 fps (the trained bucket — best results here)
Prompting: Use the two-part Reference sheet: … / Generated video: … structure above. The Reference sheet: text should describe the panels present; the Generated video: text drives the action. Suggested negative prompt: worst quality, inconsistent motion, blurry, jittery, distorted. Validation used spatiotemporal guidance (STG, mode stg_v, block 29, scale 1.0), which can help motion stability.
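For scripted runs, the recommended values can be collected in one place. The dictionary below is a sketch: the values are taken from the list above, but the key names are hypothetical and need to be mapped onto whatever parameters your pipeline or ComfyUI nodes actually expose.

```python
# Values from the Recommended Settings list; key names are illustrative only.
RECOMMENDED_SETTINGS = {
    "lora_strength": 1.4,
    "num_inference_steps": 30,
    "guidance_scale": 4.0,
    "width": 768,
    "height": 448,
    "num_frames": 121,
    "frame_rate": 24,
    "negative_prompt": "worst quality, inconsistent motion, blurry, jittery, distorted",
    # Spatiotemporal guidance as used for validation.
    "stg": {"mode": "stg_v", "block": 29, "scale": 1.0},
}


def clip_duration_seconds(settings: dict) -> float:
    """Length of the generated clip implied by frame count and frame rate."""
    return settings["num_frames"] / settings["frame_rate"]


if __name__ == "__main__":
    print(f"{clip_duration_seconds(RECOMMENDED_SETTINGS):.2f} s")
```

At 121 frames and 24 fps, the trained bucket corresponds to a clip of roughly five seconds.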

References

Code: GitHub Repository
IC-LoRA docs: docs.ltx.video — IC-LoRA usage guide

Tips & Troubleshooting

Bigger panels carry over better: The more space an element takes up in the reference image, the more faithfully it carries over into the generated video. Give important characters/props larger, more prominent panels rather than small or crowded ones.
Identity drift: If a character's face or costume drifts, make sure the reference sheet has a clean, front-facing close-up and full turnaround for that character, and that its panel isn't cluttered or text-laden.
Element not appearing: The model only reproduces elements present on the sheet — add a dedicated panel for any prop/character you need to persist, and describe it in the Reference sheet: portion of the prompt.
Reference too short: The reference static video must be ≥ 121 frames; shorter references break the reference-encoding bucket.

Dataset

The model was trained using a proprietary dataset of video clips paired with generated reference sheets.

Training

Technique: IC-LoRA (rank 128, alpha 128, dropout 0.0) on the DiT transformer — attn1/attn2 q/k/v/out projections and the feed-forward layers.
Hyperparameters: bf16 mixed precision, AdamW-8bit, gradient checkpointing, batch size 1, gradient accumulation 1, max grad norm 1.0, seed 42. Learning rate: 1.3e-4 (linear scheduler) for the first 6,000 steps, then a constant 1.3e-5 for the continuation to step 12,000.
Strategy: video_to_video over reference latents, first_frame_conditioning_p 0.0, reference downscale factor 1.
Steps: 12,000 (recommended checkpoint: step 12,000).
Infrastructure: LTX-2 Community Trainer, 8× GPU DDP.
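The rank and alpha above determine how strongly the learned update is scaled. The sketch below assumes the common LoRA convention, in which the update is multiplied by alpha / rank and then by the user-set strength. Whether a given loader applies exactly this formula is an assumption, not something this card states.

```python
RANK = 128   # from the Training section
ALPHA = 128  # from the Training section


def lora_scale(alpha: float, rank: int) -> float:
    """Built-in LoRA scaling factor under the common alpha / rank convention."""
    return alpha / rank


def effective_scale(strength: float, alpha: float = ALPHA, rank: int = RANK) -> float:
    """Overall multiplier on the low-rank update: strength * (alpha / rank)."""
    return strength * lora_scale(alpha, rank)


if __name__ == "__main__":
    print(lora_scale(ALPHA, RANK))   # alpha equals rank, so the built-in scale is 1.0
    print(effective_scale(1.4))
```

With alpha equal to rank, the built-in factor is 1.0, so under this convention the strength set at inference is the only multiplier on the update.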