Reclara

YouTube video summarization system with automatic transcription and AI-powered summaries.

For installation and setup instructions, refer to the README on my GitHub.

Project Structure

Technical Explanation

Architecture

Workflow Overview

The Reclara system follows a distributed job queue architecture:

  1. User Submission - User sends a YouTube video URL to the server
  2. Job Creation - The server creates a record in the database with state "pending" and enqueues a job for the transcriber worker
  3. Transcription - The transcriber worker processes the video with yt-dlp, extracting and cleaning the transcript
  4. Summarization - Summarizer worker generates an AI-powered summary using the cleaned transcript
  5. Polling - The client periodically polls the server for results. WebSockets or Server-Sent Events (SSE) would be more efficient for real-time updates, but polling was chosen to fit Vercel's serverless constraints
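The lifecycle above can be sketched as a simple state machine over the job states stored in the database. The function and table names here are illustrative, not the actual Reclara source:

```typescript
// Job states as stored in the Summary table's "state" column.
type JobState =
  | "pending"
  | "start_transcript"
  | "success_transcript"
  | "start_summarizing"
  | "finished"
  | "error";

// Happy-path transitions; each worker advances the job one step.
// Any worker failure moves the job to "error" instead.
const NEXT: Partial<Record<JobState, JobState>> = {
  pending: "start_transcript",
  start_transcript: "success_transcript",
  success_transcript: "start_summarizing",
  start_summarizing: "finished",
};

function nextState(current: JobState): JobState {
  return NEXT[current] ?? "error";
}
```

The client's polling loop only needs to read this single column to know whether to keep waiting, render the summary, or show an error.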

Database Schema

The core of Reclara's functionality revolves around the Summary table:

| Field | Type | Description |
| --- | --- | --- |
| id | UUID | Unique identifier (derived from the YouTube video ID) |
| userId | UUID | Reference to the user who created the summary |
| transcript | text | Cleaned transcript extracted from the video |
| summarize | text | AI-generated summary result |
| model | enum | LLM model used ("gpt-oss-120b" \| "llama-4-maverick" \| "Qwen3 Reranker 8B") |
| state | enum | Job state tracking ("pending" \| "start_transcript" \| "success_transcript" \| "start_summarizing" \| "finished" \| "error") |
| createdAt | timestamp | Record creation time |
| updatedAt | timestamp | Last update time |
| videoId | string(11) | YouTube video identifier |
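The table above maps onto a TypeScript shape like the following. This is a sketch only; the real schema lives in the database layer, and the nullability of the transcript and summary columns is an assumption:

```typescript
type SummaryModel = "gpt-oss-120b" | "llama-4-maverick" | "Qwen3 Reranker 8B";
type SummaryState =
  | "pending"
  | "start_transcript"
  | "success_transcript"
  | "start_summarizing"
  | "finished"
  | "error";

interface SummaryRow {
  id: string;                 // UUID derived from the YouTube video ID
  userId: string;             // UUID of the user who created the summary
  transcript: string | null;  // filled in by the transcriber worker
  summarize: string | null;   // filled in by the summarizer worker
  model: SummaryModel;
  state: SummaryState;
  createdAt: Date;
  updatedAt: Date;
  videoId: string;            // 11-character YouTube video ID
}

// Example row for a job that has just been submitted (placeholder UUIDs).
const pendingRow: SummaryRow = {
  id: "00000000-0000-0000-0000-000000000000",
  userId: "00000000-0000-0000-0000-000000000001",
  transcript: null,
  summarize: null,
  model: "gpt-oss-120b",
  state: "pending",
  createdAt: new Date(),
  updatedAt: new Date(),
  videoId: "dQw4w9WgXcQ",
};
```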

Transcript Processing

The transcript extraction includes fallback mechanisms and cleans the raw VTT captions:

Example Raw VTT Format:

```
WEBVTT
Kind: captions
Language: en

00:00:14.310 --> 00:00:14.320 align:start position:0%
Kompas tonight on Kompas TV Independant.
```

After Cleaning:

```
Kompas tonight on Kompas TV Independant. Trustworthy, Brother. The first information...
```
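A minimal sketch of that cleanup step, assuming the worker strips the VTT header, metadata, timestamp cues, inline tags, and consecutive duplicate caption lines (the real implementation may do more):

```typescript
// Turn a raw WebVTT caption file into a single line of plain text.
function cleanVtt(raw: string): string {
  const kept: string[] = [];
  for (const line of raw.split(/\r?\n/)) {
    const text = line.trim();
    if (text === "" || text === "WEBVTT") continue;   // header and blank lines
    if (/^(Kind|Language):/.test(text)) continue;     // file metadata
    if (text.includes("-->")) continue;               // timestamp cue lines
    const stripped = text.replace(/<[^>]+>/g, "").trim(); // inline styling tags
    if (stripped === "") continue;
    // Auto-captions often repeat the same line across cues; keep one copy.
    if (kept[kept.length - 1] !== stripped) kept.push(stripped);
  }
  return kept.join(" ");
}
```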

AI-Powered Summarization

The summarizer worker uses the Fireworks API with structured output:

  1. Prompt Engineering: Generates a detailed prompt with specific instructions for JSON output and Markdown formatting
  2. Model Inference: Sends the cleaned transcript to the LLM (GPT-OSS-120B, Llama 4, or Qwen3) via Fireworks API
  3. JSON Schema Validation: Enforces JSON schema to ensure consistent output format
  4. Result Storage: Saves the generated summary and updates the database state to "finished"
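The schema-validation step (step 3) can be sketched as a plain type guard over the model's JSON output. The field names here (title, opening, mainPoints, conclusion) are assumptions for illustration; the actual schema sent to the Fireworks API may differ:

```typescript
// Expected shape of the structured summary returned by the LLM.
interface SummaryJson {
  title: string;
  opening: string;
  mainPoints: string[];
  conclusion: string;
}

// Runtime check that a parsed JSON value matches SummaryJson.
function isSummaryJson(value: unknown): value is SummaryJson {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.title === "string" &&
    typeof v.opening === "string" &&
    Array.isArray(v.mainPoints) &&
    v.mainPoints.every((p) => typeof p === "string") &&
    typeof v.conclusion === "string"
  );
}
```

If validation fails, the worker can retry the inference or move the job's state to "error" rather than storing malformed output.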

Generated Summary Structure:

```markdown
# Video Summary

[Opening paragraphs explaining main content]

## Main Points

- [Important point 1]
- [Important point 2]
- [Important point 3]

## Conclusion

[Closing paragraph with core message]
```
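Rendering the validated JSON into that Markdown template is a straightforward string assembly. A sketch, assuming the same illustrative field names as above:

```typescript
// Assumed shape of the validated summary JSON (illustrative field names).
interface SummaryFields {
  opening: string;
  mainPoints: string[];
  conclusion: string;
}

// Assemble the Markdown document shown in the template above.
function renderSummary(s: SummaryFields): string {
  return [
    "# Video Summary",
    "",
    s.opening,
    "",
    "## Main Points",
    "",
    ...s.mainPoints.map((p) => `- ${p}`),
    "",
    "## Conclusion",
    "",
    s.conclusion,
  ].join("\n");
}
```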

Job Queue Architecture

Reclara uses a producer-consumer pattern with BullMQ and Redis.
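BullMQ itself requires a running Redis instance, so here is a stripped-down in-memory stand-in that shows the same producer-consumer shape: the API server enqueues a job, and a worker registers a processor that picks jobs up. All names are illustrative, not the actual Reclara code:

```typescript
type Job<T> = { name: string; data: T };

class InMemoryQueue<T> {
  private jobs: Job<T>[] = [];
  private worker?: (job: Job<T>) => void;

  // Producer side: the server calls add() when a URL is submitted
  // (analogous to queue.add() in BullMQ).
  add(name: string, data: T): void {
    this.jobs.push({ name, data });
    this.drain();
  }

  // Consumer side: a worker registers its processor
  // (analogous to new Worker() in BullMQ).
  process(fn: (job: Job<T>) => void): void {
    this.worker = fn;
    this.drain();
  }

  // Deliver queued jobs to the processor as soon as both exist.
  private drain(): void {
    while (this.worker && this.jobs.length > 0) {
      this.worker(this.jobs.shift()!);
    }
  }
}
```

In the real system, Redis sits between the two sides so the server and workers can run as separate processes and jobs survive worker restarts.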

Tools & Libraries

Frontend:

Backend:

Infrastructure: