Faceless Video Factory
YouTube Shorts pipeline — BullMQ + FFmpeg + Edge TTS
Built by Rogue AI · Topic-to-MP4 Shorts pipeline on free tooling · Pipeline demo · Self-hosted
Built solo in a local lab as a pipeline-pattern demo; iterated over a single focused build cycle.
The problem
Cranking out short-form vertical video by hand is the same chore repeated forever: write a script, record a voiceover, cut captions, stack it into a 9:16 frame, export. None of those steps are hard — they are just slow and easy to get subtly wrong (caption timing drifts, the export ratio is off, the audio clips). I wanted to prove that the whole chain could be automated end to end as a queue of background jobs, and that it could run entirely on free tooling so the cost floor is zero. This is a demonstration of the automation pattern, not a launched content business.
What I built
A Next.js 16 web app where you submit a topic, and a BullMQ worker walks it through a fixed status flow: PENDING → SCRIPTING → DRAFT → VOICING → GENERATING → ASSEMBLING → REVIEW. Each video row in Postgres carries its own status, script, and asset paths, so the state of the queue lives in the database and you can watch a job move through the stages. The default 'Faceless' path uses only free services: a local model writes the script, Edge TTS speaks it, stock B-roll fills the frame, and FFmpeg burns the captions in and exports a 1080x1920 MP4. Paid providers (a hosted LLM, ElevenLabs, HeyGen, Kling, fal.ai) are wired in as opt-in upgrades behind env keys, but nothing requires them to produce a finished clip.
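The status flow above can be sketched as a small forward-only state machine. This is a minimal illustration, not the project's actual code: the `VideoRow` shape and the `nextStatus` and `advance` helpers are hypothetical names, and a real worker would persist each transition to Postgres rather than return a new object.

```typescript
// The seven stages named in the status flow, in pipeline order.
const STAGES = [
  "PENDING",
  "SCRIPTING",
  "DRAFT",
  "VOICING",
  "GENERATING",
  "ASSEMBLING",
  "REVIEW",
] as const;

type VideoStatus = (typeof STAGES)[number];

// Hypothetical row shape: only the fields this sketch needs.
interface VideoRow {
  id: string;
  topic: string;
  status: VideoStatus;
}

// Returns the stage that follows `current`, or null once the job is in REVIEW.
function nextStatus(current: VideoStatus): VideoStatus | null {
  return STAGES[STAGES.indexOf(current) + 1] ?? null;
}

// Moves a row forward by exactly one stage; stages cannot be skipped.
// Throws if the row is already in the final stage.
function advance(row: VideoRow): VideoRow {
  const next = nextStatus(row.status);
  if (next === null) {
    throw new Error(`video ${row.id} is already in REVIEW`);
  }
  return { ...row, status: next };
}
```

Because the only way to change a status is `advance`, a row that stops at, say, VOICING records exactly how far the job got, which is what makes it inspectable after a crash.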
Architecture
Tech stack
What broke first
- Make the queue the source of truth. Storing each job's status on its database row, rather than tracking it in the worker's memory, meant a crashed or restarted render left an inspectable, resumable row instead of a vanished process. Execution state belongs in the data.
- Concurrency 1 is a feature, not a limitation. FFmpeg saturates a CPU on its own; letting two encodes run together made everything slower. Matching worker concurrency to what the hardware can actually do beat any clever parallelism.
- Design for the free path first. Building the default route entirely on free tooling (local model, Edge TTS, stock B-roll, FFmpeg) and treating paid providers as opt-in upgrades kept the cost floor at zero and forced every step to have a working fallback, so a missing API key downshifts a stage instead of breaking the run.
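The free-first fallback idea in the last point can be sketched as a provider chain that skips any paid provider whose env key is absent. This is an assumption-laden illustration: the `Provider` interface, both helper functions, and the env key name `ELEVENLABS_API_KEY` are hypothetical, and the stub providers return placeholder paths instead of calling ElevenLabs or Edge TTS.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical provider shape. envKey === null marks a free provider
// that needs no API key and is therefore always available.
interface Provider<T> {
  name: string;
  envKey: string | null;
  run(input: string): Promise<T>;
}

// Keeps free providers and any paid provider whose key is set (opt-in).
function availableProviders<T>(chain: Provider<T>[], env: Env): Provider<T>[] {
  return chain.filter((p) => p.envKey === null || Boolean(env[p.envKey]));
}

// Tries each available provider in order and returns the first success.
// A failing provider downshifts to the next one instead of ending the run.
async function runWithFallback<T>(
  chain: Provider<T>[],
  env: Env,
  input: string,
): Promise<{ provider: string; result: T }> {
  const errors: string[] = [];
  for (const p of availableProviders(chain, env)) {
    try {
      return { provider: p.name, result: await p.run(input) };
    } catch (err) {
      errors.push(`${p.name}: ${String(err)}`);
    }
  }
  throw new Error(`all providers failed: ${errors.join("; ")}`);
}

// Stub voice chain: opt-in paid provider first, free provider as the floor.
const voiceChain: Provider<string>[] = [
  { name: "elevenlabs", envKey: "ELEVENLABS_API_KEY", run: async () => "voice-paid.mp3" },
  { name: "edge-tts", envKey: null, run: async () => "voice-free.mp3" },
];
```

With an empty environment, `availableProviders(voiceChain, {})` leaves only the free provider, so the stage still completes. Ordering the paid provider first is a design choice of this sketch: it is only ever tried when its key is present.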
Outcome
The pattern works: submit a topic, and a finished captioned vertical MP4 comes out the other end without a paid API key in sight. What it proves is the engineering shape — a durable job queue, a status machine you can watch, and a provider chain that prefers free tools and only reaches for paid ones when you opt in. It is a reference implementation of that pipeline, not a content operation; there is no upload scheduler, no channel, and no published volume to quote.
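For a concrete picture of the final assembly step, the sketch below builds an FFmpeg argument list that fits stock footage into a 1080x1920 frame, burns in captions, and muxes the narration. It is not the project's actual command: the `AssembleInput` shape and `buildFfmpegArgs` are hypothetical, the captions are assumed to be an SRT file with a simple path (no characters that need filter escaping), and the `subtitles` filter requires an FFmpeg build with libass.

```typescript
// Hypothetical inputs for one assembly job.
interface AssembleInput {
  brollPath: string;    // stock B-roll video
  voicePath: string;    // synthesized narration audio
  captionsPath: string; // SRT captions, assumed to be a simple relative path
  outputPath: string;   // final MP4
}

// Builds the argument array for an `ffmpeg` invocation (binary name excluded).
function buildFfmpegArgs(job: AssembleInput): string[] {
  const filter = [
    // Scale up until the 1080x1920 frame is fully covered...
    "scale=1080:1920:force_original_aspect_ratio=increase",
    // ...then crop the overflow to exactly 9:16.
    "crop=1080:1920",
    // Burn the captions into the video frames.
    `subtitles=${job.captionsPath}`,
  ].join(",");

  return [
    "-y",                 // overwrite the output file if it exists
    "-i", job.brollPath,  // input 0: video
    "-i", job.voicePath,  // input 1: audio
    "-map", "0:v:0",      // take video from the B-roll
    "-map", "1:a:0",      // take audio from the narration
    "-vf", filter,
    "-c:v", "libx264",
    "-pix_fmt", "yuv420p", // broadly compatible pixel format
    "-c:a", "aac",
    "-shortest",          // stop at the shorter of the two inputs
    job.outputPath,
  ];
}
```

The array form is meant to be passed to a process spawner such as Node's `child_process.spawn("ffmpeg", args)`, which avoids shell quoting issues with file paths.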
Honest limits
This is a pipeline-pattern demo, not a launched product. It is self-hosted, built solo, and runs in a local lab (the old VPS that once hosted experiments has been retired). The point is to prove the automation chain end to end on 100% free tooling — there are no real users, no published content volume, and no invented metrics. The paid provider integrations exist as opt-in branches; the demo runs the free path. Output quality is bounded by the free tools: stock B-roll and synthetic narration are serviceable for the pattern, not a substitute for crafted footage.
