Why AI-generated videos aren't watched to the end: What's wrong with the content and what new aggregators have to do with it

In December 2025, several global companies conducted studies of AI-generated content. Nielsen Norman Group, Kapwing, and LAMPA + Rambler&Co concluded that users are quite ready to watch content created by neural networks. However, most AI videos still lose the viewer's attention within the first 10–20 seconds. This creates a curious paradox: the tools are becoming increasingly powerful, yet audience retention often remains weak. Why do some videos make you want to watch until the final frame, while others are closed almost immediately?

If you look at modern social networks, it becomes obvious that the audience has learned to instantly recognize template videos. Every day, videos appear where historical figures blog, cats star in car commercials, and characters from old paintings suddenly start dancing to popular tracks. Technologically, this is impressive. But viewers increasingly treat such videos as visual noise.

Research by LAMPA and Rambler&Co showed that people do not dislike neural networks as such. Many of them regularly watch AI content. However, viewers are very good at sensing the difference between a story told in context and a demonstration of technology for its own sake.

An analysis of these studies and of hundreds of popular videos points to ten reasons why audiences most often do not watch AI videos to the end.

1. No story, only effect

Beautiful shots can attract attention for a few seconds, but it is the plot that holds it. If, after the first wow effect, the viewer doesn’t understand why it is worth continuing to watch, interest quickly fades.

2. Content optimized for clicks, not retention

Many creators focus on the first five seconds and open with a clickbait phrase like “This is what you dream about” or “Watch until the end to learn the secret,” then spend a minute repeating the same thing in different words with no change in the picture. Kapwing’s research directly calls this a feature of AI Slop content: attention is grabbed, but interest doesn’t develop.

3. Trying to hide AI usage

Paradoxically, the audience is more often annoyed not by AI itself but by attempts to pass its work off as entirely live-action footage. When viewers notice the deception, trust collapses.

4. The “almost human” effect

The famous uncanny valley has not disappeared. Characters’ faces look about 95% realistic and their movements are almost lifelike, but the viewer’s brain still notices inconsistencies with reality, and that is off-putting.

5. The story is fitted to the model’s capabilities

Very often the script is built around what the neural network can do, not around the idea. The result is a set of technology demos rather than a full narrative.

6. Quality starts to degrade after the first scenes

Lighting changes, clothing details shift, characters gradually become different people. The viewer may not consciously notice the problem, but the sense of quality is lost.

7. Lighting looks artificial

The real world is imperfect. AI still tends to make lighting too perfect, which causes scenes to appear plastic.

8. Characters lack psychology

A beautiful hero is no longer enough. Audiences want to understand a character’s motivation, emotions, and reasons for actions. Some AI series on TikTok have already solved this problem: they feature charismatic characters you can love, hate, or pity. But most videos still need work in this direction.

9. All videos start to look alike

The same camera moves, identical editing, and similar color grading create a feeling of endless repetition.

10. No author presence

People don’t come for the algorithm. They come for the author’s perspective on the topic. And this is where most AI projects start to lose out.

It turns out that many creators still perceive AI as a magic “Make it beautiful” button. But a good video still requires directing, scripting, and an understanding of the audience, just like live filming.

How AI for content creation turns into an editorial team: overview of the new StoryTube feature

Most problems of modern AI videos arise between the stages of production, when one service writes the script, another generates images, a third handles animation, a fourth does the voiceover, and only the fifth adds music and effects.

At some point, the author starts dealing not with content but with logistics between neural networks. That is why new aggregators that combine the entire video creation cycle within one platform are growing especially fast now.

Recently, Doitong launched StoryTube, a tool designed specifically for creating engaging documentary videos. Essentially, it’s an AI factory capable of producing both short clips and full-length films up to 25 minutes long.

Unified storyboard for creating AI video

Step 1: work starts with the topic

The more specifically the idea is formulated, the stronger the result. For example, like this:

Prompt in Doitong

Step 2: then choose the duration and aspect ratio. You can make a short 1-minute video or a full 25-minute film. But the most popular social media videos, which are convenient both to make and to watch, usually last about 9 minutes.

The format is set here as well: horizontal, vertical, or tailored to a specific platform.

Video time and format selection in Doitong

Step 3: the system offers to set the genre of the future video: documentary investigation, true crime, historical review, scientific explanation, technological exposé, or any author’s style. At this stage, the dramaturgy of the future video begins to form.

Genre selection in Doitong

Here the script model comes into play. Its task is not just to write text. It shapes the plot structure, places attention hooks, creates intrigue between blocks, and helps keep the viewer engaged throughout the video.

Step 4: the voiceover deserves special attention. First, there is a choice between a completely silent video, voiceover only, and voice with music and sound effects. Second, instead of a single synthetic presenter’s voice, 30+ options with different intonations and deliveries are available.

Voiceover selection in Doitong

Step 5: add a presenter. You can do without one, but then the problem from the first section arises again: the video will have no authorship. Here the presenter won’t look like a static avatar. With the fourth option, the presenter can accompany the plot throughout the film and create the feeling of an author’s show. The third option allows up to four additional characters who act as expert investigators. Or you can describe your own presenter.

Presenter avatar selection in Doitong

Step 6: the system moves to visualization. You can assemble a slideshow or create a full film with camera movement, atmospheric effects, scene dynamics, and cinematic delivery.

That’s it, press the generate button and wait for the finished video.

The convenience is obvious: in StoryTube, we no longer work with a set of disjointed tools, but with a full studio that includes a scriptwriter, cameraman, editor, sound engineer, and versatile narrator.

All of this happens inside one window. And this approach helps solve many of the problems that cause viewers to close videos prematurely.

As for cost, it’s impossible to give even an approximate figure, because each choice inside the StoryTube constructor deducts a different number of credits. Conveniently, though, once all components are selected, the number of credits to be deducted is displayed immediately.

Suppose we decide to make a horizontal 9-minute video for YouTube. We choose the historical genre with full voiceover, an avatar of one presenter plus 4 expert quotes, and the most expensive cinematic visual style. Here is the cost calculation provided by the platform:

Credits deduction in AI video generator

54 scenes in 9 minutes: that’s truly a full film!
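To put the 54-scene figure in perspective, the average scene length follows from simple arithmetic. The sketch below assumes scenes of roughly equal length, which the platform does not state, and the function name is purely illustrative.

```python
def average_scene_seconds(duration_minutes: float, scene_count: int) -> float:
    """Average scene length in seconds, assuming scenes of equal length."""
    if scene_count <= 0:
        raise ValueError("scene_count must be positive")
    return duration_minutes * 60 / scene_count


# Figures from the example: a 9-minute video split into 54 scenes.
print(average_scene_seconds(9, 54))  # 10.0
```

An average of about 10 seconds per scene means the picture changes roughly six times a minute.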

How to create AI video that will be watched until the end: 7 principles

Let’s try to highlight the top 7 attention retention principles that work regardless of whether you use a single standalone model or a modern ecosystem of AI tools.

1. Start with a question

“When will the last glacier melt in Antarctica and what will happen to the planet?”

The strongest videos create curiosity. The viewer should want to know the answer.

2. Refresh interest every 15–20 seconds

Each new block or scene should bring new information, an unexpected fact, or a plot twist. You can place hook phrases: “But here’s the paradox...,” “What about what we talked about at the beginning?,” “But the most interesting thing is...”
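When planning a script, it can help to mark in advance where each new hook should land. The following is a minimal sketch of that idea, not part of any tool mentioned here: the function names and the fixed-interval approach are assumptions made for illustration.

```python
def hook_timestamps(duration_s: int, interval_s: int = 15) -> list[int]:
    """Return the second marks at which a new hook should appear."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    # Start at the first interval: the opening question covers second 0.
    return list(range(interval_s, duration_s, interval_s))


def fmt(seconds: int) -> str:
    """Format a number of seconds as m:ss."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes}:{secs:02d}"


# Hook points for a 1-minute video with a 15-second interval.
marks = hook_timestamps(60, 15)
print([fmt(m) for m in marks])  # ['0:15', '0:30', '0:45']
```

Under the same assumption, a 9-minute video (540 seconds) would need between 26 hook points at a 20-second interval and 35 at a 15-second interval.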

3. Show the author

Even if the content is created by a neural network, the audience wants to feel a human presence behind the project. The voiceover should therefore sound as natural as possible, or the author should at least introduce themselves by voice at the beginning. For educational or documentary videos, an on-screen presenter is the best option.

4. Control characters

Stable heroes hold attention much better than characters who change from scene to scene. So test and adjust your AI tools and read their knowledge bases: developers usually explain carefully how to work with characters. Scenes change, the character doesn’t.

5. Use editing as a storytelling tool

Editing should guide the viewer through the story, not just connect beautiful shots. Shots should follow one another while maintaining intrigue, giving visual cues, creating impressions, and encouraging the viewer to study each frame. This sequencing is often built into AI models tuned for script creation.

6. Create visual variety

Changing pace, shots, and atmosphere helps hold attention much more effectively than any special effects.

7. Subordinate technology to the story

This is the most important principle. Many creators start with the question: “What can this new AI do?” But the best projects start with another question: “What story do I want to tell?”

Only then are tools selected accordingly.

The irony of the modern industry is that viewers are tired of neural network capability demonstrations. People are still interested in discoveries, emotions, investigations, strong heroes, and good stories. Therefore, AI-created videos stop being just technical experiments and become full-fledged media products only when technology serves the plot.

The market is gradually moving to a new norm: the frontrunners are not those with the most models, but those who have learned to turn AI into a team of specialists. That is why new aggregators are becoming one of the most interesting directions of development for the entire content industry.

-----------------------------------------------

Try the latest tools at a discount:

Doitong from Russia

Doitong outside Russia

📲iPhone mobile app

Promo code for a 10% discount and free starting generations: SEA

The links themselves also provide a 15% discount and free starting generations