# Reclara
YouTube video summarization system with automatic transcription and AI-powered summaries.
For installation and setup instructions, refer to the README on my GitHub.
## Project Structure
- `apps/web` - Next.js frontend
- `apps/mq` - Message queue workers (BullMQ)
- `packages/db` - Database schemas & queries (Drizzle)
- `packages/redis` - Redis connection & queue setup
- `packages/env` - Environment configuration
- `packages/constants` - Shared constants
## Technical Explanation

### Architecture

#### Workflow Overview
The Reclara system follows a distributed job queue architecture:
- User Submission - The user sends a YouTube video URL to the server
- Job Creation - The server creates a database record with state "pending" and enqueues a job for the transcript worker (see the sketch after this list)
- Transcription - The transcript worker uses yt-dlp to extract and clean the video's transcript
- Summarization - The summarizer worker generates an AI-powered summary from the cleaned transcript
- Polling - The client periodically polls the server for results. WebSockets or Server-Sent Events (SSE) would be more efficient for real-time updates, but polling aligns better with Vercel's serverless constraints
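Steps 1-2 might look roughly like the following sketch of a Next.js route handler. The route path, workspace package names (`@reclara/db`), queue name, and column names are illustrative assumptions, not Reclara's actual code:

```typescript
// apps/web/app/api/summaries/route.ts (hypothetical path)
import { NextResponse } from "next/server";
import { Queue } from "bullmq";
import { db } from "@reclara/db";              // assumed workspace package
import { summaries } from "@reclara/db/schema"; // assumed schema export

// Queue name and Redis connection details are assumptions.
const transcriptQueue = new Queue("transcript", {
  connection: { host: process.env.REDIS_HOST!, port: 6379 },
});

export async function POST(req: Request) {
  const { url, userId } = await req.json();
  const videoId = new URL(url).searchParams.get("v"); // naive ID extraction
  if (!videoId) {
    return NextResponse.json({ error: "invalid URL" }, { status: 400 });
  }

  // 1. Create the database record in state "pending".
  const [row] = await db
    .insert(summaries)
    .values({ userId, videoId, state: "pending" })
    .returning();

  // 2. Enqueue a job for the transcript worker.
  await transcriptQueue.add("transcribe", { summaryId: row.id, videoId });

  return NextResponse.json({ id: row.id, state: row.state });
}
```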
### Database Schema
The core of Reclara's functionality revolves around the Summary table:
| Field | Type | Description |
|---|---|---|
| id | UUID | Unique identifier (derived from YouTube video ID) |
| userId | UUID | Reference to the user who created the summary |
| transcript | text | Cleaned transcript extracted from the video |
| summarize | text | AI-generated summary result |
| model | enum | Type of LLM model used ("gpt-oss-120b" | "llama-4-maverick" | "Qwen3 Reranker 8B") |
| state | enum | Job state tracking ("pending" | "start_transcript" | "success_transcript" | "start_summarizing" | "finished" | "error") |
| createdAt | timestamp | Record creation time |
| updatedAt | timestamp | Last update time |
| videoId | string(11) | YouTube video identifier |
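For reference, a Drizzle definition of this table might look like the sketch below, assuming the SQLite (Turso/libSQL) dialect listed under Tools & Libraries; the column builders and table name are assumptions derived from the schema above:

```typescript
import { sqliteTable, text, integer } from "drizzle-orm/sqlite-core";

// Enum values copied from the state and model columns documented above.
const states = ["pending", "start_transcript", "success_transcript",
                "start_summarizing", "finished", "error"] as const;
const models = ["gpt-oss-120b", "llama-4-maverick", "Qwen3 Reranker 8B"] as const;

export const summary = sqliteTable("summary", {
  id: text("id").primaryKey(),            // UUID derived from the YouTube video ID
  userId: text("user_id").notNull(),      // FK to the user table (omitted here)
  transcript: text("transcript"),         // cleaned transcript
  summarize: text("summarize"),           // AI-generated summary result
  model: text("model", { enum: models }),
  state: text("state", { enum: states }).notNull().default("pending"),
  createdAt: integer("created_at", { mode: "timestamp" }).notNull(),
  updatedAt: integer("updated_at", { mode: "timestamp" }).notNull(),
  videoId: text("video_id", { length: 11 }).notNull(), // YouTube IDs are 11 chars
});
```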
### Transcript Processing
The transcript extraction pipeline includes retry and fallback mechanisms:
- Primary Extraction: Uses `yt-dlp` to fetch YouTube video transcripts
- Exponential Backoff Retry: On failure, retries after 2s, 4s, and 8s to respect YouTube rate limits (see the sketch after this list)
- Language Fallback: Attempts English first, then falls back to Indonesian if English is unavailable
- Cleaning: Strips VTT metadata (timestamps, positioning) to leave only the text content (a cleaning sketch follows the examples below)
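A sketch of the retry and fallback loop, assuming the worker shells out to yt-dlp via Bun's subprocess API; the flags are standard yt-dlp options, but the exact invocation Reclara uses is an assumption:

```typescript
// Fetch auto-generated subtitles, trying English first and then Indonesian,
// with exponential backoff (2s, 4s, 8s) between attempts per language.
async function fetchTranscript(videoId: string): Promise<void> {
  const delays = [2_000, 4_000, 8_000];
  for (const lang of ["en", "id"]) {
    for (let attempt = 0; attempt <= delays.length; attempt++) {
      const proc = Bun.spawn([
        "yt-dlp",
        "--skip-download",       // subtitles only, no media download
        "--write-auto-subs",     // accept auto-generated captions
        "--sub-langs", lang,
        "--sub-format", "vtt",
        "-o", `/tmp/${videoId}`, // yt-dlp appends .<lang>.vtt to this template
        `https://www.youtube.com/watch?v=${videoId}`,
      ]);
      if ((await proc.exited) === 0) return; // success: the .vtt file exists
      if (attempt < delays.length) {
        await Bun.sleep(delays[attempt]);    // back off to respect rate limits
      }
    }
  }
  throw new Error(`transcript unavailable for ${videoId}`);
}
```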
Example Raw VTT Format:
```
WEBVTT
Kind: captions
Language: en

00:00:14.310 --> 00:00:14.320 align:start position:0%
Kompas tonight on Kompas TV Independant.
```
After Cleaning:
```
Kompas tonight on Kompas TV Independant. Trustworthy, Brother. The first
information...
```
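A minimal cleaning pass of the kind described might look like this; the actual rules Reclara applies aren't shown in this document, so treat the logic below as an approximation:

```typescript
// Strip VTT header metadata, cue timings/positioning, inline tags, and
// consecutive duplicate lines (auto-captions repeat text across cues).
function cleanVtt(raw: string): string {
  const out: string[] = [];
  for (const line of raw.split("\n")) {
    const text = line.replace(/<[^>]+>/g, "").trim(); // drop inline <c>/timestamp tags
    if (!text) continue;                              // blank separators
    if (/^(WEBVTT|Kind:|Language:)/.test(text)) continue; // header metadata
    if (text.includes("-->")) continue;               // cue timing + positioning lines
    if (out[out.length - 1] === text) continue;       // dedupe repeated caption text
    out.push(text);
  }
  return out.join(" ");
}
```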
### AI-Powered Summarization
The summarizer worker uses the Fireworks API with structured output:
- Prompt Engineering: Builds a detailed prompt with explicit instructions for JSON output and Markdown formatting
- Model Inference: Sends the cleaned transcript to the LLM (GPT-OSS-120B, Llama 4 Maverick, or Qwen3) via the Fireworks API
- JSON Schema Validation: Enforces a JSON schema to guarantee a consistent output format (see the sketch after the structure example below)
- Result Storage: Saves the generated summary and updates the database state to "finished"
Generated Summary Structure:
```markdown
# Video Summary

[Opening paragraphs explaining main content]

## Main Points
- [Important point 1]
- [Important point 2]
- [Important point 3]

## Conclusion
[Closing paragraph with core message]
```
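A sketch of the inference call, assuming Fireworks' OpenAI-compatible chat completions endpoint with its JSON-mode `response_format`; the model slug, schema fields, and prompt wording are illustrative assumptions:

```typescript
// Hypothetical schema; fields mirror the summary structure shown above.
const schema = {
  type: "object",
  properties: {
    opening: { type: "string" },
    mainPoints: { type: "array", items: { type: "string" } },
    conclusion: { type: "string" },
  },
  required: ["opening", "mainPoints", "conclusion"],
};

async function summarize(transcript: string): Promise<string> {
  const res = await fetch("https://api.fireworks.ai/inference/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.FIREWORKS_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "accounts/fireworks/models/gpt-oss-120b", // slug is an assumption
      response_format: { type: "json_object", schema }, // Fireworks JSON mode
      messages: [
        {
          role: "system",
          content:
            "Summarize the transcript as JSON matching the schema, " +
            "with Markdown-formatted string values.",
        },
        { role: "user", content: transcript },
      ],
    }),
  });
  if (!res.ok) throw new Error(`Fireworks API error: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content; // JSON string constrained by the schema
}
```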
### Job Queue Architecture
Reclara uses a producer-consumer pattern with BullMQ and Redis:
- Producer: Creates jobs when users submit videos
- Transcript Worker: Processes transcript extraction jobs
- Summarizer Worker: Processes summarization jobs (both workers are sketched after this list)
- Redis: Acts as the message broker for reliable job distribution
- State Management: Each job progresses through defined states, enabling client-side polling
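The consumer side might look like the sketch below, using BullMQ's `Worker` API; queue names, payload shapes, and the chaining into the summarize queue are assumptions consistent with the flow described above:

```typescript
import { Queue, Worker } from "bullmq";

const connection = { host: process.env.REDIS_HOST!, port: 6379 };
const summarizeQueue = new Queue("summarize", { connection });

// Transcript worker: extract and clean the transcript, then hand off.
new Worker(
  "transcript",
  async (job) => {
    const { summaryId, videoId } = job.data;
    // ...set state to "start_transcript", run yt-dlp for videoId, clean the
    //    VTT, save the transcript, set state to "success_transcript"...
    await summarizeQueue.add("summarize", { summaryId });
  },
  { connection },
);

// Summarizer worker: call the LLM and persist the result.
new Worker(
  "summarize",
  async (job) => {
    // ...set state to "start_summarizing", call the Fireworks API with the
    //    stored transcript, save the summary, set state to "finished"
    //    (or "error" on failure)...
  },
  { connection },
);
```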
## Tools & Libraries
Frontend:
- Next.js - Full-stack React framework
- TypeScript - Static type checking
- Better-auth - Authentication system
Backend:
- Bun - JavaScript/TypeScript runtime
- Drizzle ORM - Type-safe database queries
- Turso + SQLite - Edge database
- Redis - In-memory data store used as the message broker
- BullMQ - Redis-based job queue library
- yt-dlp - YouTube transcript extraction
- Fireworks AI - LLM inference platform with GPT-OSS-120B, Llama 4, and Qwen3 models
Infrastructure:
- Docker & Docker Compose - Local development containerization
- GCP Compute Engine - Backend deployment
- Vercel - Next.js frontend hosting
- GitHub Actions - CI/CD automation