I recently open-sourced a new project: Video Studio Skills.

It’s a multi-agent video creation toolkit built for Hermes Agent. Simply put, it lets you set up a virtual video studio staffed by 7 AI agents that take over the entire workflow, from in-depth research, scriptwriting, and TTS voiceovers to Remotion animation rendering and multi-platform SEO packaging.

Why Build This Tool

As AI capabilities continue to improve, creating short videos with AI has become very easy, and AI-generated videos now show up regularly on TikTok and Bilibili. However, even advanced models such as Seedance 2.0 or Gemini omni share the same problems: generation is slow and the output is hard to control. To get one satisfactory video, we often have to roll the dice again and again, then pick and splice the usable clips, which takes a great deal of time. Mass-producing short videos this way is even less practical.

I believe AI will eventually solve these problems at the current pace of development, but the pain points exist right now. So I started wondering: could a multi-agent collaborative workflow make AI video generation a little more controllable? At the very least, when we are unhappy with the final video, we could tweak one specific node in the workflow instead of starting from scratch, which saves a considerable amount of time.

Thus, this project was born. It is particularly suited to producing popular science short videos in batches. The whole process, from gathering materials, writing word-for-word scripts, recording or configuring TTS, editing, and animating to packaging titles and descriptions for each video platform, can be completed collaboratively by AI agents with different specialties. You only need to provide the topic, and the agents can produce videos for you like an assembly line. Of course, you can also record the voiceover yourself and add your own A-roll footage for even higher quality.

7 Agents, 6 Stages

In this project, I split the video production workflow across 7 specific roles:

  • Director: The general manager of the entire studio, responsible for receiving your topics, breaking down tasks, and dispatching them to other agents.
  • Researcher: Responsible for in-depth research and outputting structured research data.
  • Writer: Drafts the initial video script based on the research data.
  • Editor: Exclusively responsible for “de-AI polishing” and finalizing the script.
  • Narrator: Calls TTS tools to generate voiceovers and outputs timeline synchronization files.
  • Renderer: Transforms text and audio into dynamic video frames based on Remotion.
  • Packager: Generates titles, descriptions, and tags tailored for platforms like YouTube and Bilibili.

The entire pipeline runs fully automatically. You can choose between different working modes: if you want to chime in at any time, you can pull the agents into a “group chat” and watch them discuss; if you just want to see the results, you can contact the Director one-on-one and let it “delegate” work in the background; if you need mass production, you can also track progress in a Kanban mode.
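
To make the hand-off between roles more concrete, here is a minimal Python sketch of a Director that runs the stages in order and can restart from any one of them, which is the "tweak a specific node" idea from earlier. It is not the project's actual code. The stage order, the `Director` class, the `make_stub` helper, and the `start_at` parameter are all assumptions for illustration, and I am assuming the six stages map one-to-one onto the six non-Director roles.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Assumed stage order, taken from the role list above (the Director dispatches
# work rather than producing an artifact of its own).
STAGES: List[str] = ["researcher", "writer", "editor", "narrator", "renderer", "packager"]

# A handler receives the topic plus all upstream artifacts and returns its own artifact.
Handler = Callable[[str, Dict[str, str]], str]


def make_stub(name: str) -> Handler:
    """Hypothetical stand-in for a real agent call; it only records its input source."""
    def handler(topic: str, upstream: Dict[str, str]) -> str:
        source = list(upstream)[-1] if upstream else "topic"
        return f"{name} <- {source} ({topic})"
    return handler


@dataclass
class Director:
    handlers: Dict[str, Handler]
    artifacts: Dict[str, str] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def run(self, topic: str, start_at: str = STAGES[0]) -> Dict[str, str]:
        start = STAGES.index(start_at)
        # Keep everything produced before `start_at`, discard what comes after.
        for stage in STAGES[start:]:
            self.artifacts.pop(stage, None)
        for index in range(start, len(STAGES)):
            stage = STAGES[index]
            upstream = {s: self.artifacts[s] for s in STAGES[:index] if s in self.artifacts}
            self.artifacts[stage] = self.handlers[stage](topic, upstream)
            self.log.append(stage)
        return self.artifacts


director = Director({stage: make_stub(stage) for stage in STAGES})
director.run("black holes")                     # full run, all six stages
director.run("black holes", start_at="editor")  # redo only the editor and later stages
```

In this sketch, restarting at `editor` reuses the research and the first draft as they are, so only the downstream artifacts are regenerated.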

The Necessity of “De-AI Polishing”

Throughout the entire pipeline, the part I care about most, and spent the most effort tuning, is the Editor stage.

Current large models share a common flaw when they write: they default to an overly enthusiastic marketing tone full of grandiose adjectives. This is far from what we read day to day. I have always preferred restrained, natural, and clear writing. If I had to read such a script aloud myself, that strong “AI flavor” would make me very uncomfortable.

Therefore, I made the Editor a mandatory step after the Writer produces the first draft. Its task is to perform “de-AI polishing”: removing pompous vocabulary, breaking long sentences into short ones, eliminating stereotypical patterns, and bringing the script as close as possible to the natural tone of human speech.
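
As a rough illustration of what these rules look for, here is a small Python checker that flags hype words and overly long sentences in a script. This is only a sketch of the idea, not how the Editor agent works (the Editor is an AI agent, not a rule list). The word list, the 20-word limit, and the function names are assumptions I made up for this example.

```python
import re
from typing import List, Tuple

# Hypothetical hype vocabulary and length limit, chosen only for illustration.
HYPE_WORDS = {"revolutionary", "game-changing", "unprecedented", "mind-blowing"}
MAX_WORDS_PER_SENTENCE = 20


def split_sentences(text: str) -> List[str]:
    """Naive split on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def lint_script(text: str) -> List[Tuple[int, str]]:
    """Return (sentence number, issue) pairs for hype words and long sentences."""
    issues: List[Tuple[int, str]] = []
    for number, sentence in enumerate(split_sentences(text), start=1):
        # Words may contain apostrophes and hyphens, e.g. "game-changing".
        words = re.findall(r"[A-Za-z][A-Za-z'-]*", sentence)
        for word in words:
            if word.lower() in HYPE_WORDS:
                issues.append((number, f"hype word: {word.lower()}"))
        if len(words) > MAX_WORDS_PER_SENTENCE:
            issues.append((number, f"long sentence: {len(words)} words"))
    return issues


draft = "This revolutionary tool is fast. It works."
print(lint_script(draft))
```

A check like this could run after the Editor as a cheap sanity gate, sending the script back for another pass when issues remain.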

A Few Thoughts

After completing this project, one impression stands out: multi-agent collaboration might truly change how individual creators work and make a living.

In the past, when we talked about a “one-person studio,” it often carried a sense of ascetic tragedy. But now, if you know how to orchestrate AI, you can genuinely have your own virtual employees. In Video Studio, you only need to decide on a good topic, and the agents will handle the rest of the grunt work for you.

Technology is leveling the execution threshold at an unprecedented speed. When tools are no longer an obstacle, what ultimately widens the gap might once again return to those things in our minds that truly belong to humans: your taste, your restraint, and your unique understanding of the world.

If you are interested, feel free to check out the code and try deploying it into your Hermes Agent.