# Reclara
YouTube video summarization system with automatic transcription and AI-powered summaries.
For installation and setup instructions, refer to the README on my GitHub.
## Project Structure
- `apps/web` - Next.js frontend
- `apps/mq` - Message queue workers (BullMQ)
- `packages/db` - Database schemas & queries (Drizzle)
- `packages/redis` - Redis connection & queue setup
- `packages/env` - Environment configuration
- `packages/constants` - Shared constants
## Technical Explanation

### Architecture

#### Workflow Overview
The Reclara system follows a distributed job queue architecture:
- User Submission - The user sends a YouTube video URL to the server
- Job Creation - The server creates a database record with state "pending" and enqueues a job for the transcript worker (see the sketch after this list)
- Transcription - The transcript worker uses yt-dlp to extract and clean the video's transcript
- Summarization - The summarizer worker generates an AI-powered summary from the cleaned transcript
- Polling - The client periodically polls the server for results. WebSockets or Server-Sent Events (SSE) would be more efficient for real-time updates, but polling aligns better with Vercel's serverless constraints
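Steps 1-2 might look roughly like the following sketch of a Next.js route handler. The route path, workspace package names (`@reclara/db`), queue name, and column names are illustrative assumptions, not Reclara's actual code:

```typescript
// apps/web/app/api/summaries/route.ts (hypothetical path)
import { NextResponse } from "next/server";
import { Queue } from "bullmq";
import { db } from "@reclara/db";              // assumed workspace package
import { summaries } from "@reclara/db/schema"; // assumed schema export

// Queue name and Redis connection details are assumptions.
const transcriptQueue = new Queue("transcript", {
  connection: { host: process.env.REDIS_HOST!, port: 6379 },
});

export async function POST(req: Request) {
  const { url, userId } = await req.json();
  const videoId = new URL(url).searchParams.get("v"); // naive ID extraction
  if (!videoId) {
    return NextResponse.json({ error: "invalid URL" }, { status: 400 });
  }

  // 1. Create the database record in state "pending".
  const [row] = await db
    .insert(summaries)
    .values({ userId, videoId, state: "pending" })
    .returning();

  // 2. Enqueue a job for the transcript worker.
  await transcriptQueue.add("transcribe", { summaryId: row.id, videoId });

  return NextResponse.json({ id: row.id, state: row.state });
}
```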
### Database Schema
The core of Reclara's functionality revolves around the Summary table:
| Field | Type | Description |
|---|---|---|
| id | UUID | Unique identifier (derived from YouTube video ID) |
| userId | UUID | Reference to the user who created the summary |
| transcript | text | Cleaned transcript extracted from the video |
| summarize | text | AI-generated summary result |
| model | enum | Type of LLM model used ("gpt-oss-120b" | "llama-4-maverick" | "Qwen3 Reranker 8B") |
| state | enum | Job state tracking ("pending" | "start_transcript" | "success_transcript" | "start_summarizing" | "finished" | "error") |
| createdAt | timestamp | Record creation time |
| updatedAt | timestamp | Last update time |
| videoId | string(11) | YouTube video identifier |
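For reference, a Drizzle definition of this table might look like the sketch below, assuming the SQLite (Turso/libSQL) dialect listed under Tools & Libraries; the column builders and table name are assumptions derived from the schema above:

```typescript
import { sqliteTable, text, integer } from "drizzle-orm/sqlite-core";

// Enum values copied from the state and model columns documented above.
const states = ["pending", "start_transcript", "success_transcript",
                "start_summarizing", "finished", "error"] as const;
const models = ["gpt-oss-120b", "llama-4-maverick", "Qwen3 Reranker 8B"] as const;

export const summary = sqliteTable("summary", {
  id: text("id").primaryKey(),            // UUID derived from the YouTube video ID
  userId: text("user_id").notNull(),      // FK to the user table (omitted here)
  transcript: text("transcript"),         // cleaned transcript
  summarize: text("summarize"),           // AI-generated summary result
  model: text("model", { enum: models }),
  state: text("state", { enum: states }).notNull().default("pending"),
  createdAt: integer("created_at", { mode: "timestamp" }).notNull(),
  updatedAt: integer("updated_at", { mode: "timestamp" }).notNull(),
  videoId: text("video_id", { length: 11 }).notNull(), // YouTube IDs are 11 chars
});
```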
### Transcript Processing
The transcript extraction pipeline includes retry and fallback mechanisms:
- Primary Extraction: Uses `yt-dlp` to fetch YouTube video transcripts
- Exponential Backoff Retry: On failure, retries after 2s, 4s, and 8s to respect YouTube rate limits (see the sketch after this list)
- Language Fallback: Attempts English first, then falls back to Indonesian if English is unavailable
- Cleaning: Strips VTT metadata (timestamps, positioning) to leave only the text content (a cleaning sketch follows the examples below)
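A sketch of the retry and fallback loop, assuming the worker shells out to yt-dlp via Bun's subprocess API; the flags are standard yt-dlp options, but the exact invocation Reclara uses is an assumption:

```typescript
// Fetch auto-generated subtitles, trying English first and then Indonesian,
// with exponential backoff (2s, 4s, 8s) between attempts per language.
async function fetchTranscript(videoId: string): Promise<void> {
  const delays = [2_000, 4_000, 8_000];
  for (const lang of ["en", "id"]) {
    for (let attempt = 0; attempt <= delays.length; attempt++) {
      const proc = Bun.spawn([
        "yt-dlp",
        "--skip-download",       // subtitles only, no media download
        "--write-auto-subs",     // accept auto-generated captions
        "--sub-langs", lang,
        "--sub-format", "vtt",
        "-o", `/tmp/${videoId}`, // yt-dlp appends .<lang>.vtt to this template
        `https://www.youtube.com/watch?v=${videoId}`,
      ]);
      if ((await proc.exited) === 0) return; // success: the .vtt file exists
      if (attempt < delays.length) {
        await Bun.sleep(delays[attempt]);    // back off to respect rate limits
      }
    }
  }
  throw new Error(`transcript unavailable for ${videoId}`);
}
```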
Example Raw VTT Format:
```
WEBVTT
Kind: captions
Language: en

00:00:14.310 --> 00:00:14.320 align:start position:0%
Kompas tonight on Kompas TV Independant.
```
After Cleaning:
```
Kompas tonight on Kompas TV Independant. Trustworthy, Brother. The first
information...
```
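A minimal cleaning pass of the kind described might look like this; the actual rules Reclara applies aren't shown in this document, so treat the logic below as an approximation:

```typescript
// Strip VTT header metadata, cue timings/positioning, inline tags, and
// consecutive duplicate lines (auto-captions repeat text across cues).
function cleanVtt(raw: string): string {
  const out: string[] = [];
  for (const line of raw.split("\n")) {
    const text = line.replace(/<[^>]+>/g, "").trim(); // drop inline <c>/timestamp tags
    if (!text) continue;                              // blank separators
    if (/^(WEBVTT|Kind:|Language:)/.test(text)) continue; // header metadata
    if (text.includes("-->")) continue;               // cue timing + positioning lines
    if (out[out.length - 1] === text) continue;       // dedupe repeated caption text
    out.push(text);
  }
  return out.join(" ");
}
```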
### AI-Powered Summarization
The summarizer worker uses the Fireworks API with structured output:
- Prompt Engineering: Builds a detailed prompt with explicit instructions for JSON output and Markdown formatting
- Model Inference: Sends the cleaned transcript to the LLM (GPT-OSS-120B, Llama 4 Maverick, or Qwen3) via the Fireworks API
- JSON Schema Validation: Enforces a JSON schema to guarantee a consistent output format (see the sketch after the structure example below)
- Result Storage: Saves the generated summary and updates the database state to "finished"
Generated Summary Structure:
```markdown
# Video Summary

[Opening paragraphs explaining main content]

## Main Points
- [Important point 1]
- [Important point 2]
- [Important point 3]

## Conclusion
[Closing paragraph with core message]
```
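A sketch of the inference call, assuming Fireworks' OpenAI-compatible chat completions endpoint with its JSON-mode `response_format`; the model slug, schema fields, and prompt wording are illustrative assumptions:

```typescript
// Hypothetical schema; fields mirror the summary structure shown above.
const schema = {
  type: "object",
  properties: {
    opening: { type: "string" },
    mainPoints: { type: "array", items: { type: "string" } },
    conclusion: { type: "string" },
  },
  required: ["opening", "mainPoints", "conclusion"],
};

async function summarize(transcript: string): Promise<string> {
  const res = await fetch("https://api.fireworks.ai/inference/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.FIREWORKS_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "accounts/fireworks/models/gpt-oss-120b", // slug is an assumption
      response_format: { type: "json_object", schema }, // Fireworks JSON mode
      messages: [
        {
          role: "system",
          content:
            "Summarize the transcript as JSON matching the schema, " +
            "with Markdown-formatted string values.",
        },
        { role: "user", content: transcript },
      ],
    }),
  });
  if (!res.ok) throw new Error(`Fireworks API error: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content; // JSON string constrained by the schema
}
```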
### Job Queue Architecture
Reclara uses a producer-consumer pattern with BullMQ and Redis:
- Producer: Creates jobs when users submit videos
- Transcript Worker: Processes transcript extraction jobs
- Summarizer Worker: Processes summarization jobs (both workers are sketched after this list)
- Redis: Acts as the message broker for reliable job distribution
- State Management: Each job progresses through defined states, enabling client-side polling
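The consumer side might look like the sketch below, using BullMQ's `Worker` API; queue names, payload shapes, and the chaining into the summarize queue are assumptions consistent with the flow described above:

```typescript
import { Queue, Worker } from "bullmq";

const connection = { host: process.env.REDIS_HOST!, port: 6379 };
const summarizeQueue = new Queue("summarize", { connection });

// Transcript worker: extract and clean the transcript, then hand off.
new Worker(
  "transcript",
  async (job) => {
    const { summaryId, videoId } = job.data;
    // ...set state to "start_transcript", run yt-dlp for videoId, clean the
    //    VTT, save the transcript, set state to "success_transcript"...
    await summarizeQueue.add("summarize", { summaryId });
  },
  { connection },
);

// Summarizer worker: call the LLM and persist the result.
new Worker(
  "summarize",
  async (job) => {
    // ...set state to "start_summarizing", call the Fireworks API with the
    //    stored transcript, save the summary, set state to "finished"
    //    (or "error" on failure)...
  },
  { connection },
);
```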
## Tools & Libraries
Frontend:
- Next.js - Full-stack React framework
- TypeScript - Static type checking
- Better-auth - Authentication system
Backend:
- Bun - JavaScript/TypeScript runtime
- Drizzle ORM - Type-safe database queries
- Turso + SQLite - Edge database
- Redis - In-memory data store used as the message broker
- BullMQ - Redis-based job queue library
- yt-dlp - YouTube transcript extraction
- Fireworks AI - LLM inference platform with GPT-OSS-120B, Llama 4, and Qwen3 models
Infrastructure:
- Docker & Docker Compose - Local development containerization
- GCP Compute Engine - Backend deployment
- Vercel - Next.js frontend hosting
- GitHub Actions - CI/CD automation