Character consistency is hard in still imagery. In video, it's brutal. A 90-second animated short is typically built from 25–35 separate shots, each one a fresh keyframe generation followed by an animation pass. Without strong anchoring, your protagonist looks like a different person in shots 1, 12 and 27 — and the audience notices immediately, because faces in motion are even more diagnostic than faces in stills.
This post is about how to keep the same character across 30+ animated shots with AI: why video makes the problem worse, the critical distinction between keyframe consistency and animation consistency, and how Lumora's pipeline solves both.
If you want the general theory of character consistency across all formats, start with the complete guide. This post focuses on what's specific to animated video.
Why character consistency is brutal in AI video
A novel has ten illustrations. A comic has twenty-four pages. An animated short has thirty-plus shots, and each shot is composed of:
- A keyframe — a still image that defines what the shot looks like at its starting moment.
- An animation pass — usually 4–8 seconds of motion generated from that keyframe.
- Optionally, a second keyframe for shots that need a specific ending pose.
That's already 30–60 still images that need to share a face. But it gets worse:
- Camera angles vary wildly. Animation often calls for extreme low angles, dolly-ins, over-the-shoulder shots and full-body action, most of which a single 3-view character sheet doesn't directly cover.
- Lighting shifts between scenes. Day exterior, night interior, firelight, harsh fluorescent — each lighting condition pushes the model to re-render the character with different shading assumptions.
- Motion adds drift. Even if your keyframe nails the character, the animation step can soften features, smear faces during fast movement, or invent details during the in-between frames.
- The animation step never sees the character sheet. In the image-to-video mode used here, Veo and similar video models work from the keyframe alone.
This last point is the one that surprises people. In image-to-video mode, even the best current video generation models (Veo 3.1, Kling 3, Seedance) animate from the input frame they are given, or from a start and end frame. If that frame is inconsistent with the previous shot, no amount of clever prompting at the animation step will fix it. All your consistency battles are won or lost at the keyframe stage.
Keyframe consistency vs animation consistency
These are two different problems that get conflated. Solving them requires two different strategies.
Keyframe consistency is "does the protagonist's face match across the 30 keyframes I'm about to animate?" This is the same problem as in comics, just with more shots and more varied camera angles. It's solved with strong reference image conditioning at the keyframe generation step.
Animation consistency is "does the protagonist's face stay stable across the 4–8 seconds of motion generated from a single keyframe?" This is handled by the video model itself: it's a property of how well the model preserves identity during temporal generation. Most current video models do this well within a single shot, fast movement aside. Most of the drift happens between shots, not within them.
The practical implication: if your face changes between shot 5 and shot 6 but stays stable inside each shot, you have a keyframe consistency problem, and the fix is upstream of the animation step. This is almost always where AI video fails.
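One way to make that diagnosis concrete is to compare identity similarity inside each shot with similarity across shot boundaries. The sketch below assumes you already have a face embedding per frame from a face-recognition model of your choice, which is not something the pipeline described here provides. The function name `diagnose_drift` and the 0.8 threshold are illustrative, not calibrated values.

```python
from math import sqrt
from typing import List, Tuple

Vector = List[float]

def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def diagnose_drift(shots: List[List[Vector]], threshold: float = 0.8) -> List[Tuple[str, int]]:
    """shots[i] holds the per-frame face embeddings of shot i, first frame = keyframe.

    Returns findings as (kind, index):
      ("animation", i): the face changes between the first and last frame of shot i
      ("keyframe", i):  the keyframe of shot i+1 does not match the keyframe of shot i
    """
    findings = []
    for i, frames in enumerate(shots):
        if cosine(frames[0], frames[-1]) < threshold:
            findings.append(("animation", i))
    for i in range(len(shots) - 1):
        if cosine(shots[i][0], shots[i + 1][0]) < threshold:
            findings.append(("keyframe", i))
    return findings

# Toy 2-D embeddings: each shot is stable internally, but the two keyframes disagree.
example_shots = [
    [[1.0, 0.0], [0.99, 0.05]],
    [[0.0, 1.0], [0.05, 0.99]],
]
print(diagnose_drift(example_shots))
```

On the toy data the only finding is a keyframe mismatch at the boundary after shot 0, which is the "fix it upstream" case described above.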
How Lumora locks identity across 30+ shots
Lumora's video pipeline runs through six stages: preparation, planning, storyboard, keyframes, animation, assembly. Character consistency machinery is set up in planning and applied at keyframes. Here's how.
Planning stage: character sheets are generated and persisted to storage. During the planning phase of a video project, Lumora generates a 3-view character sheet for every named character: front, 3/4 and profile views on a neutral background, in the project's art style. This is the same approach as comics, with one key difference: video sheets are also saved to Supabase Storage at a stable path, not just held in memory. This matters because the video pipeline runs across multiple background jobs over many minutes, and an in-memory cache would not survive from one job to the next.
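The persist-then-hydrate idea can be sketched as follows. This is not Lumora's code: the path layout, the `BlobStore` interface and every function name are hypothetical, and an in-memory stand-in replaces the real storage bucket so the example runs without a network. The point it illustrates is that a deterministic path lets a later job rebuild its cache from storage alone.

```python
from typing import Dict, List, Protocol

class BlobStore(Protocol):
    """Hypothetical minimal storage interface; a real backend would wrap a bucket client."""
    def put(self, path: str, data: bytes) -> None: ...
    def get(self, path: str) -> bytes: ...
    def list_paths(self, prefix: str) -> List[str]: ...

class InMemoryStore:
    """Stand-in for durable storage, used only to keep the sketch runnable."""
    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, path: str, data: bytes) -> None:
        self._blobs[path] = data

    def get(self, path: str) -> bytes:
        return self._blobs[path]

    def list_paths(self, prefix: str) -> List[str]:
        return sorted(p for p in self._blobs if p.startswith(prefix))

def sheet_path(project_id: str, kind: str, name: str) -> str:
    """Deterministic path (assumed layout): same project and name, same object."""
    slug = "-".join(name.lower().split())
    return f"projects/{project_id}/sheets/{kind}/{slug}.png"

def persist_sheet(store: BlobStore, project_id: str, kind: str, name: str, image: bytes) -> str:
    path = sheet_path(project_id, kind, name)
    store.put(path, image)
    return path

def hydrate_sheets(store: BlobStore, project_id: str) -> Dict[str, bytes]:
    """Rebuild a fresh in-process cache from storage, as a new background job would."""
    prefix = f"projects/{project_id}/sheets/"
    return {path: store.get(path) for path in store.list_paths(prefix)}

store = InMemoryStore()
persist_sheet(store, "p1", "characters", "Mara Voss", b"png-bytes")  # planning job
cache = hydrate_sheets(store, "p1")                                  # later keyframe job
print(list(cache))
```

Because the path is derived only from the project, sheet kind and name, the keyframe job needs no state handed over from the planning job.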
Planning stage: location sheets too. Video stresses location consistency in a way novels and comics don't — a scene shot from three different angles still needs to look like the same room. Lumora generates a separate "location sheet" per major setting in the video, also persisted to storage.
Keyframe stage: sheets are hydrated and attached to every shot. When the keyframe stage runs, it loads all character and location sheets from storage back into the image service's cache. Then, for every shot in the storyboard, the keyframe generation call includes:
- The shot's prompt (action, framing, mood).
- The character sheet(s) for every character in the shot.
- The location sheet for the setting.
- An instruction to use the references for consistency.
That's typically 2–4 reference images per keyframe call, within Nano Banana 2's 5-reference budget. Every one of your 30 keyframes is generated with the same anchors, so consistency is built into the structure of the pipeline rather than left to chance.
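The four ingredients of a keyframe call can be sketched as a small request builder. The `Shot` and `KeyframeRequest` types, the function name and the sheet paths are assumptions made for illustration; the actual call into the image model is left out because its API is not described here. Only the 5-reference budget comes from the text.

```python
from dataclasses import dataclass
from typing import Dict, List

MAX_REFERENCES = 5  # reference budget the text cites for the image model
CONSISTENCY_NOTE = "Use the attached reference images to keep characters and setting consistent."

@dataclass
class Shot:
    prompt: str            # action, framing, mood
    characters: List[str]  # named characters visible in the shot
    location: str

@dataclass
class KeyframeRequest:
    prompt: str
    references: List[str]  # storage paths of the attached sheets (hypothetical)

def build_keyframe_request(shot: Shot,
                           character_sheets: Dict[str, str],
                           location_sheets: Dict[str, str]) -> KeyframeRequest:
    """Attach one sheet per character plus the location sheet, and enforce the budget."""
    refs = [character_sheets[name] for name in shot.characters]
    refs.append(location_sheets[shot.location])
    if len(refs) > MAX_REFERENCES:
        raise ValueError(f"{len(refs)} references exceeds the budget of {MAX_REFERENCES}")
    return KeyframeRequest(prompt=f"{shot.prompt}\n{CONSISTENCY_NOTE}", references=refs)

character_sheets = {"Mara": "sheets/characters/mara.png", "Iko": "sheets/characters/iko.png"}
location_sheets = {"workshop": "sheets/locations/workshop.png"}
shot_5 = Shot("Mara solders a circuit, close-up, warm lamp light", ["Mara"], "workshop")
shot_6 = Shot("Iko enters behind Mara, over-the-shoulder", ["Mara", "Iko"], "workshop")
req_5 = build_keyframe_request(shot_5, character_sheets, location_sheets)
req_6 = build_keyframe_request(shot_6, character_sheets, location_sheets)
print(req_5.references)
print(req_6.references)
```

Both requests point at the same file for Mara and for the workshop, which is what keeps adjacent shots anchored to identical references.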
Animation stage: Veo animates the consistent keyframes. With the keyframes locked, the animation step (Veo 3.1) takes each frame and produces 4–8 seconds of motion. Because every keyframe already represents the same character, the animations are internally consistent and they're consistent with the surrounding shots — not because Veo knows anything about the character, but because the input it received was already locked.
Assembly stage: ffmpeg stitches the clips. No consistency mechanism here — by this stage the visual identity is already determined upstream.
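For illustration, one common way to stitch clips is ffmpeg's concat demuxer, which reads a text file listing the clips in order. The text does not say which ffmpeg method Lumora uses, so treat this as an assumed approach; the file names are made up. The sketch only builds the list file contents and the argument list, and does not run ffmpeg.

```python
from typing import List

def concat_list_text(clip_paths: List[str]) -> str:
    """Contents of the concat list file: one `file '<path>'` line per clip, in shot order."""
    for path in clip_paths:
        if "'" in path:
            # Quotes would need escaping in the list file; this sketch simply refuses them.
            raise ValueError(f"single quote in path is not handled: {path}")
    return "".join(f"file '{path}'\n" for path in clip_paths)

def concat_command(list_file: str, output: str) -> List[str]:
    """Argument list for the concat demuxer with stream copy (no re-encode)."""
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file, "-c", "copy", output]

clips = [f"shots/shot_{n:02d}.mp4" for n in range(1, 4)]
print(concat_list_text(clips), end="")
print(" ".join(concat_command("clips.txt", "short.mp4")))
```

To use it, write the list text to `clips.txt` and pass the argument list to `subprocess.run(..., check=True)`. Stream copy only works when every clip shares the same codec, resolution and frame rate; if they differ, drop `-c copy` and let ffmpeg re-encode.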
What if a single shot drifts?
Sometimes one keyframe out of thirty looks slightly off — maybe the angle was too extreme, or the action prompt fought the reference. Lumora supports per-shot regeneration. You can regenerate that single keyframe (and re-animate that single shot) without touching the other twenty-nine.
The regeneration call uses the exact same character and location sheets. This is the key property: the regeneration is consistent with the rest of the video, not a fresh roll of the dice. If you fix shot 17, shot 17 will still match shots 16 and 18.
This is what makes per-shot regeneration viable as a workflow. Without reference-anchored consistency, regenerating one shot would risk breaking its neighbors. With it, you can iterate on individual shots until the whole video lands.
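A minimal sketch of that property, under assumptions: requests are plain dictionaries, `generate` stands in for the real keyframe call, and all names are hypothetical. The prompt wording may change on a retry, but the references are reused untouched and no other shot is regenerated.

```python
from typing import Callable, Dict, Optional

Request = Dict[str, object]  # assumed shape: {"prompt": str, "references": list of sheet paths}

def regenerate_shot(keyframes: Dict[int, str],
                    requests: Dict[int, Request],
                    shot_number: int,
                    generate: Callable[[Request], str],
                    prompt_override: Optional[str] = None) -> Dict[int, str]:
    """Return a new keyframe map in which only `shot_number` has been regenerated."""
    request = dict(requests[shot_number])      # copy so the stored request is not mutated
    if prompt_override is not None:
        request["prompt"] = prompt_override    # wording may change on a retry
    # The "references" entry is deliberately left as it was.
    updated = dict(keyframes)
    updated[shot_number] = generate(request)
    return updated

def fake_generate(request: Request) -> str:
    """Placeholder for the image model call: returns a description instead of an image."""
    return f"image for: {request['prompt']} | refs={request['references']}"

requests = {n: {"prompt": f"shot {n}", "references": ["mara.png", "workshop.png"]}
            for n in (16, 17, 18)}
keyframes = {n: fake_generate(r) for n, r in requests.items()}
fixed = regenerate_shot(keyframes, requests, 17, fake_generate,
                        prompt_override="shot 17, eye-level instead of low angle")
print(fixed[17])
```

Shots 16 and 18 come back byte-for-byte identical, and the new shot 17 was produced from the same two sheets as before.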
Common video-specific failure modes
Even with the pipeline doing the right thing, a few patterns cause trouble:
- Extreme angles with no matching reference view. If every shot in your storyboard is a high-angle bird's-eye view but your character sheet only covers front, 3/4 and profile, the model has to extrapolate. Add a reference view for the unusual angle.
- Very fast action across multiple shots. A chase scene with shots that all happen in 0.5 seconds of story time creates pressure for the animation model to invent details. Slowing the pacing slightly often improves perceived consistency.
- Many characters per shot. Same rule as comics: two named characters per shot is the safe ceiling. Three or more risks feature-mixing.
- Style/world inconsistency in prompts. "Cinematic realism" in shot 1 and "Pixar-style render" in shot 2 will produce two different worlds and two different versions of your protagonist. Lock the visual style at the planning stage and let the pipeline enforce it.
- Skipping storyboard review. The storyboard stage is where you can catch impossible character arrangements before they hit the (expensive) keyframe step. Use it.
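Several of these patterns can be caught mechanically during storyboard review. The sketch below is a hypothetical pre-flight check, not a Lumora feature: the `StoryboardShot` fields, the view labels and the function name are all assumptions, and labelling each shot with a single view is a crude simplification of real camera work.

```python
from dataclasses import dataclass
from typing import Iterable, List

SHEET_VIEWS = frozenset({"front", "three-quarter", "profile"})  # the 3-view sheet
MAX_NAMED_CHARACTERS = 2  # the "safe ceiling" from the list above

@dataclass
class StoryboardShot:
    number: int
    characters: List[str]
    view: str    # which view of the character the camera mostly sees (assumed label)
    style: str

def lint_storyboard(shots: Iterable[StoryboardShot],
                    project_style: str,
                    extra_views: Iterable[str] = ()) -> List[str]:
    """Return human-readable warnings; an empty list means nothing was flagged."""
    covered = SHEET_VIEWS | set(extra_views)
    warnings = []
    for shot in shots:
        if len(shot.characters) > MAX_NAMED_CHARACTERS:
            warnings.append(f"shot {shot.number}: {len(shot.characters)} named characters, "
                            "risk of feature-mixing")
        if shot.view not in covered:
            warnings.append(f"shot {shot.number}: no reference view for '{shot.view}'")
        if shot.style != project_style:
            warnings.append(f"shot {shot.number}: style '{shot.style}' differs from "
                            f"project style '{project_style}'")
    return warnings

storyboard = [
    StoryboardShot(1, ["Mara"], "front", "cinematic realism"),
    StoryboardShot(2, ["Mara", "Iko", "Tov"], "birds-eye", "cinematic realism"),
    StoryboardShot(3, ["Iko"], "profile", "pixar-style render"),
]
warnings = lint_storyboard(storyboard, "cinematic realism")
for line in warnings:
    print(line)
```

Passing `extra_views=["birds-eye"]` models the fix of adding a reference view for the unusual angle, which clears that warning while the other two remain.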
A practical checklist for video character consistency
- [ ] Each named character has a 3-view sheet generated in planning and reviewed.
- [ ] Each major location has a sheet generated in planning.
- [ ] Reference photos are uploaded for any character that needs to look like a real person.
- [ ] Storyboard is reviewed before keyframe generation runs.
- [ ] Shots with three or more named characters are flagged.
- [ ] Extreme camera angles have a reference view that covers them, or are accepted as looser.
- [ ] Visual style is locked at the planning stage and not changed.
Where to go next
The reason most AI animation feels off is that nobody's solving keyframe consistency rigorously. Animation models are good. Image models are good. The discipline of generating a sheet, persisting it through the pipeline, and attaching it to every keyframe call — that's where it stops being a tech demo and starts being a finished short.