Do you have a dumb question that you're kind of embarrassed to ask in the main thread? Is there something you're just not sure about?
This is your opportunity to ask questions. No question too simple or too silly.
Culture war topics are accepted, and proposals for a better intro post are appreciated.

If nothing else (and that's an if that won't hold), AI is to CGI as CGI was to stop-motion (and many other practical effects). CGI will soon be over as the state-of-the-art way to produce special effects. It will be reduced tremendously in its purpose.
I think that is a correct analogy.
My guess is that there might be an opening where very low-fidelity renderings are used to map out the action on screen, but AI is doing the work of dozens of other animators in texturing, lighting, simulating and 'rendering' the actual image on screen, with a human just nudging it along and rejecting outputs as they go.
The missing step seems to be fine-grained control over the details, but creators like Gossip Goblin have been able to keep an extremely consistent style, so either that's a solved problem or they've got their prompts refined to a point that they aren't having to toss out much.
The quality available at what has to be a fraction of the cost of traditional FX is going to lead to rapid uptake.
Something very much like this will be a near certainty, because trying to prompt detailed poses, positions, proportions, movement paths and so on is a fool's errand. Pure written language is a horribly inefficient way to do such things, while a 3D modeler uses an interface optimized for that and gets realtime feedback.
To a large extent, these tools already exist. They're just limited: SCAIL struggles with movement paths for more than three characters or over nine seconds, ControlNet Pose has to be tuned for each model and sometimes even each finetune, and LoRA can only handle three or four styles/characters/events/motions per output before they start getting funky interactions.
But even assuming that these problems can be fixed - plausible, but not a given! - there's a fundamental tradeoff between what you let the model do, and what you don't. Sometimes expressed as a double! And still hard to manage.
I mean if we actually get human-level AI in the picture, isn't this pretty much how traditional animation is done? Some storyboards plus a bunch of pure written language?
Yep. Unless you can hook the thing straight up to the animator's brain (hi there, Neuralink!) the fidgety little details will be hard to keep perfect and consistent, let alone going back and making minute changes without 'redoing' the whole shebang.
It still might beat having to go in and do all the detailed work manually, but I know way too little about digital animation to give a real guess.
I note that this isn't all that different from standard live-action filmmaking, where you would have actors give multiple 'takes' on a scene and edit in the best ones. You're still 'prompting' actors, and refining your instructions based on the 'output' they produce, then choosing which ones you like and discarding the rest.
In fact, that might be the way to think of it: a return from the sheer tedious craftsmanship of computer animation to the more 'organic' style of a Director/Prompter eliciting their ideal performance and massaging it into the final product.
Something like SCAIL plus LoRA abuse can probably do that today, and is probably already being used that way, but the current version of the technology goes a little nuts for segments longer than 9 seconds, and it's painful to do even short segments using the existing workflows, on top of being egregiously slow on consumer hardware. I've seen people take it to a couple of minutes by doing really aggressive generation of prompts to make a flipshow to start with, but anything longer than that tends to end up needing to compromise on either weird physics or ugly scene changes.
And the current implementations have some limits; pose info can't do talking heads well, going beyond three characters with pose info gets rough, and some particular pose changes can go full-on Exorcist. SCAIL's lipsync capabilities are worse than WAN animate, and while it's possible to combine them, it's even more finicky.
But compared to the cost and unpleasantness of traditional mocap, or even makeup? If you can possibly use this tech, there's a lot of good arguments in its favor.