Technical White Paper v1.0

A Private Media Cloud with Perceptual Proof of Possession

How Lumen scales to millions of users while maintaining strict compliance and near-zero storage overhead through perceptual hashing and content-addressed deduplication.

Lumen

Engineering

March 2026 · 25 min read

Lumen is a cross-platform private media cloud designed to bridge the gap between high-fidelity local media ownership and the convenience of global streaming. Users upload media files they already have on their computers, and Lumen transcodes, stores, and streams them back across every screen — phone, tablet, TV, desktop, and browser.

This paper outlines the proprietary Verification & Deduplication Protocol (VDP) that allows Lumen to scale to millions of users while maintaining strict legal compliance and low storage overhead. By combining perceptual hashing, client-side transcoding, and perceptual proof of possession, Lumen ensures that every user who accesses a piece of content has independently demonstrated ownership of the original file — while never storing more than one copy.


1. Product & Market Vision

The Problem

The home media landscape is fractured. Users who purchase or rip physical media face an impossible choice: run complex self-hosted infrastructure (Plex, Jellyfin) that requires always-on hardware, networking expertise, and ongoing maintenance — or abandon their collections for the convenience of subscription streaming, where content disappears without warning and libraries they've "purchased" can be revoked at any time.

Self-hosted solutions impose a hardware tax on every user. A single household running a Plex server consumes electricity, demands port forwarding, and produces a stream only as reliable as their home internet upload speed. This model doesn't scale. It doesn't travel. And it certainly doesn't work on a hotel TV.

The Opportunity

Lumen eliminates the hardware requirement entirely. There is no server in the user's closet. Instead, Lumen operates a cloud-native media infrastructure where users upload files they own, and the platform handles transcoding, storage, and global delivery. The user's only job is to prove they own the file.

This is not a novel concept in isolation — but the execution is. Lumen's Verification & Deduplication Protocol solves the fundamental tension between legal compliance (every user must own what they access) and operational efficiency (storing the same film 10,000 times is economically unviable). The result is a system where storage costs decrease as the user base grows, without compromising the legal requirement that each user independently proves possession.

Business Model

Lumen operates on a tiered subscription model:

Tier       Storage   Profiles   Concurrent Streams   Price
Free       50 GB     1          1                    $0
Starter    100 GB    1          1                    $4.99/mo
Standard   500 GB    3          3                    $9.99/mo
Ultra      1 TB      5          10                   $17.99/mo

Pricing is viable because of Cloudflare R2's zero-egress model — streaming costs $0 regardless of bandwidth consumed. Traditional S3 egress at $0.09/GB would make per-user streaming economically impossible at these price points. With R2, the dominant cost is storage ($0.015/GB/month), and deduplication drives that cost toward zero as the content pool grows.


2. The Lumen Ecosystem

Client Architecture

The Lumen platform ships production clients across every major consumer platform, each purpose-built for its target environment while sharing a unified API contract:

Lumen Desktop (macOS, Windows) — Built on Tauri 2 with a Rust backend and Next.js frontend. The desktop client is the only client capable of local transcoding and client-side perceptual hash computation. It bundles platform-native FFmpeg and FFprobe binaries as Tauri sidecars for each target architecture (macOS x86_64/aarch64, Windows x86_64), enabling offline video processing without external dependencies. Tauri's Rust layer handles file I/O, subprocess management, and presigned URL uploads via reqwest with streaming multipart support. Tauri's cross-platform compilation model means the same Rust codebase produces native binaries for both operating systems with no platform-specific application logic.

Lumen Web — A Next.js 16 application serving as both the primary upload interface and a full playback client. Web users who lack local transcoding capability upload raw files for server-side processing. The web client implements HLS playback via native HTML5 <video> with adaptive bitrate, real-time WebVTT subtitle rendering, and multipart upload orchestration with four concurrent part uploads.

Lumen for iOS — A SwiftUI application targeting iOS 17+ with AVPlayer-based HLS streaming, custom playback controls, background download support via URLSessionDownloadDelegate, and offline playback. The iOS client implements ConnectivityMonitor (Network.framework) for graceful degradation when offline.

Lumen for Apple TV — A SwiftUI application sharing models, API services, and subtitle management with the iOS client via a shared framework. The tvOS client implements focus-based navigation, device code authentication with QR code display, and AVPlayerViewController integration with custom subtitle overlays.

Lumen for Android — A Kotlin application built on Jetpack Compose with Material3 design. Video playback uses Media3 ExoPlayer with custom controls, playback speed adjustment, and a foreground service for background downloads. Token management uses DataStore for encrypted persistence.

Lumen for Smart TVs — A unified television client architecture distributed across the Google Play Store (Android TV), Amazon Appstore (Fire TV), Roku Channel Store, Samsung Galaxy Store (Tizen), and LG Content Store (webOS). The Android TV variant wraps a React-based web application in a lightweight Activity shell, intercepting D-pad events at the native layer and forwarding them as JavaScript KeyboardEvent dispatches into a WebView. A custom spatial navigation engine calculates 2D distances between focusable elements to determine the optimal navigation target for each directional input. This WebView-based architecture enables a single React codebase to target every major smart TV platform with minimal platform-specific adaptation.

Unified API Layer

All clients communicate through a single REST API hosted on Express.js 5. The API contract is versioned and consistent — a content item returns the same JSON structure whether requested by a Kotlin Retrofit call, a Swift URLSession request, or a browser fetch. Authentication uses short-lived JWTs (1-hour access tokens) with long-lived refresh tokens (30-day) stored server-side for revocation capability. Token refresh is implemented identically across all clients: intercept 401 responses, queue concurrent requests behind a single refresh call, retry the original request with the new token.
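The 401-interception pattern described above can be sketched as a "single-flight" refresh: concurrent requests that hit an expired token share one refresh call instead of issuing a stampede of them. This is an illustrative sketch, not the actual client code — the TokenManager name and the injected doRefresh callback are our own.

```typescript
// Sketch of the shared 401-retry pattern. All callers that need a new
// access token wait on one in-flight refresh rather than issuing
// parallel refresh calls.
type Tokens = { access: string; refresh: string };

class TokenManager {
  private refreshing: Promise<Tokens> | null = null;

  constructor(
    private tokens: Tokens,
    // Hypothetical refresh call, injected for testability.
    private doRefresh: (refreshToken: string) => Promise<Tokens>,
  ) {}

  get access(): string {
    return this.tokens.access;
  }

  // Concurrent callers share the same refresh promise (single-flight).
  async refresh(): Promise<string> {
    if (!this.refreshing) {
      this.refreshing = this.doRefresh(this.tokens.refresh).finally(() => {
        this.refreshing = null;
      });
    }
    this.tokens = await this.refreshing;
    return this.tokens.access;
  }
}
```

In a real client, the HTTP layer would call refresh() from its 401 handler and then retry the original request with the returned token.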


3. System Architecture

Infrastructure Overview

[Diagram: clients upload to Cloudflare R2 object storage via presigned URLs and stream back over HLS. R2 emits events to a Cloudflare Worker queue consumer, which forwards them to the Express.js API backed by PostgreSQL. The API dispatches work to the pHash Verifier and the FFmpeg transcoding droplet pool.]

Database Design

The PostgreSQL schema separates content identity from user access — a critical architectural decision that enables deduplication.

Content Layer — The content table stores one row per unique piece of media, keyed by TMDB ID. A film exists exactly once regardless of how many users have uploaded it. The series_episodes table extends this for episodic content with season/episode granularity.

Access Layer — The user_library table maps users to content (movies and series). The user_episodes table maps users to individual episodes. A user sees an episode in their library if and only if a row exists in user_episodes linking their user ID to that episode ID.

Verification Layer — The phashes column (JSONB) on both content and series_episodes stores the perceptual hash fingerprint of the canonical file. The phash_verified boolean tracks whether the server has independently verified the uploaded file against client-submitted hashes.

This three-layer separation means that granting or revoking access never touches the content itself. Deleting a user cascades through user_library and user_episodes (via ON DELETE CASCADE) without affecting the underlying media files or other users' access.
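A minimal DDL sketch of this three-layer separation follows. Only the table names, the phashes and phash_verified columns, and the ON DELETE CASCADE behavior come from the text; every other column and constraint is an assumption for illustration.

```sql
-- Illustrative sketch only. Columns beyond those cited in the text
-- (phashes, phash_verified, the cascade behavior) are assumptions.

-- Content layer: one row per unique piece of media, keyed by TMDB ID.
CREATE TABLE content (
  id             SERIAL PRIMARY KEY,
  tmdb_id        INTEGER UNIQUE NOT NULL,
  phashes        JSONB,            -- 10 per-frame hashes, hex-encoded
  phash_verified BOOLEAN NOT NULL DEFAULT FALSE
);

CREATE TABLE series_episodes (
  id             SERIAL PRIMARY KEY,
  content_id     INTEGER NOT NULL REFERENCES content(id) ON DELETE CASCADE,
  season         INTEGER NOT NULL,
  episode        INTEGER NOT NULL,
  phashes        JSONB,
  phash_verified BOOLEAN NOT NULL DEFAULT FALSE,
  UNIQUE (content_id, season, episode)
);

-- Access layer: deleting a user cascades through these rows only;
-- the media itself is never touched.
CREATE TABLE user_library (
  user_id    INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  content_id INTEGER NOT NULL REFERENCES content(id) ON DELETE CASCADE,
  PRIMARY KEY (user_id, content_id)
);

CREATE TABLE user_episodes (
  user_id    INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  episode_id INTEGER NOT NULL REFERENCES series_episodes(id) ON DELETE CASCADE,
  PRIMARY KEY (user_id, episode_id)
);
```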

Webhook Pipeline

R2 object storage events flow through a Cloudflare Worker queue consumer. When a file lands in R2 (via PutObject or CompleteMultipartUpload), the Worker forwards the event to the API server. An in-memory sequential queue on the API server prevents database connection stampedes during burst uploads — events are processed in FIFO order, one at a time, ensuring consistent state transitions.
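The in-memory sequential queue can be sketched as a promise chain: each incoming event is appended to the tail, so handlers run strictly one at a time in arrival order. The class and event names here are assumptions, not the actual server code.

```typescript
// Minimal sketch of an in-memory FIFO queue: each webhook event is
// chained onto the tail promise, so events are processed one at a time
// in arrival order.
type WebhookEvent = {
  key: string;
  action: "PutObject" | "CompleteMultipartUpload";
};

class SequentialQueue {
  private tail: Promise<void> = Promise.resolve();

  enqueue(
    event: WebhookEvent,
    handler: (e: WebhookEvent) => Promise<void>,
  ): Promise<void> {
    // Swallow handler errors so one bad event can't stall the queue.
    this.tail = this.tail
      .then(() => handler(event))
      .catch((err) => {
        console.error("webhook handler failed:", event.key, err);
      });
    return this.tail;
  }
}
```

Because the chain serializes all database writes triggered by webhooks, burst uploads never open more than one state-transition transaction at a time.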


4. The Verification & Deduplication Protocol (VDP)

The VDP is the core innovation that makes Lumen legally and economically viable. It ensures two invariants:

  1. Proof of Possession: Every user who accesses content has independently demonstrated that they possess the original media file.
  2. Single-Copy Storage: The platform stores at most one copy of any given piece of content, regardless of how many users have proven possession.

4.1 Perceptual Hashing Algorithm

Lumen uses a DCT-based perceptual hash (pHash) to generate a visual fingerprint of video content. Unlike traditional cryptographic hashes (SHA-256), perceptual hashes produce similar outputs for visually similar inputs — meaning the same film encoded at different bitrates, resolutions, or with minor compression artifacts will still produce a matching hash.

Frame Extraction: 10 frames are sampled at evenly-spaced positions throughout the video (at positions i/(N+1) for i = 1..10), avoiding the first and last frames which frequently contain logos or black screens.

Hash Computation (per frame):

  1. The frame is scaled to 32x32 pixels in grayscale (1,024 bytes).
  2. A 2D Discrete Cosine Transform (Type-II) is applied — first row-wise, then column-wise — with standard normalization factors.
  3. The top-left 8x8 block of DCT coefficients is extracted (excluding the DC component at [0,0]), yielding 63 frequency-domain values.
  4. The median of these 63 values is computed.
  5. A 64-bit hash is constructed: bit i is set to 1 if coefficient i exceeds the median, 0 otherwise.
  6. The hash is stored as a 16-character hexadecimal string.
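The per-frame computation above can be sketched end to end. Lumen's exact implementation is not public; this follows the listed steps (evenly spaced sampling, naive separable DCT-II with orthonormal scaling, top-left 8x8 block minus the DC term, median threshold) and all function names are ours. Note that excluding the DC component leaves 63 data bits inside the 64-bit value.

```typescript
// Frames at i/(N+1) of the duration — never the first or last frame.
function samplePositions(durationSec: number, n = 10): number[] {
  return Array.from({ length: n }, (_, i) => ((i + 1) / (n + 1)) * durationSec);
}

// Naive separable 2D DCT-II over a 32x32 grayscale frame (row-major
// Uint8Array). O(n^3) per axis is fine at this size.
function dct2d(pixels: Uint8Array, size = 32): Float64Array {
  const rows = new Float64Array(size * size);
  const out = new Float64Array(size * size);
  const scale = (k: number) =>
    k === 0 ? Math.sqrt(1 / size) : Math.sqrt(2 / size);
  for (let y = 0; y < size; y++) {       // row-wise pass
    for (let u = 0; u < size; u++) {
      let sum = 0;
      for (let x = 0; x < size; x++) {
        sum += pixels[y * size + x] *
          Math.cos(((2 * x + 1) * u * Math.PI) / (2 * size));
      }
      rows[y * size + u] = scale(u) * sum;
    }
  }
  for (let u = 0; u < size; u++) {       // column-wise pass
    for (let v = 0; v < size; v++) {
      let sum = 0;
      for (let y = 0; y < size; y++) {
        sum += rows[y * size + u] *
          Math.cos(((2 * y + 1) * v * Math.PI) / (2 * size));
      }
      out[v * size + u] = scale(v) * sum;
    }
  }
  return out;
}

function frameHash(pixels: Uint8Array): string {
  const coeffs = dct2d(pixels);
  // Top-left 8x8 block, excluding the DC term at [0,0] -> 63 values.
  const block: number[] = [];
  for (let v = 0; v < 8; v++)
    for (let u = 0; u < 8; u++)
      if (u !== 0 || v !== 0) block.push(coeffs[v * 32 + u]);
  const median = [...block].sort((a, b) => a - b)[31]; // middle of 63
  let hash = 0n;
  block.forEach((c, i) => {
    if (c > median) hash |= 1n << BigInt(i);
  });
  return hash.toString(16).padStart(16, "0"); // 16 hex chars
}
```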

Distance Metric: The Hamming distance between two 64-bit hashes counts the number of differing bits. For a pair of videos, the average Hamming distance across all 10 frame pairs produces a single distance score:

Distance   Interpretation
0 – 10     Identical content (same file or same encode)
10 – 32    Same content, different quality (resolution upgrade candidate)
> 32       Different content (verification failure)
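The distance metric and its interpretation bands reduce to a few lines. This is a sketch: the function names and the Verdict labels are ours, but the Hamming computation and the thresholds follow the text.

```typescript
// Hamming distance between two 16-hex-char (64-bit) frame hashes.
function hamming(a: string, b: string): number {
  let x = BigInt("0x" + a) ^ BigInt("0x" + b);
  let bits = 0;
  while (x) {
    bits += Number(x & 1n);
    x >>= 1n;
  }
  return bits;
}

// Average over the 10 frame pairs yields the single distance score.
function averageDistance(as: string[], bs: string[]): number {
  const total = as.reduce((sum, h, i) => sum + hamming(h, bs[i]), 0);
  return total / as.length;
}

type Verdict = "identical" | "upgrade-candidate" | "different";

function classify(distance: number): Verdict {
  if (distance <= 10) return "identical";
  if (distance <= 32) return "upgrade-candidate";
  return "different";
}
```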

4.2 The Four-Step Handshake

Every piece of content that enters Lumen passes through a four-step handshake. The exact flow varies between desktop and web clients, but the invariants are preserved.

Step 1: Client-Side Metadata Extraction ("Analyze")

The client parses the filename to extract a probable title, series indicators (S01E01 patterns), and year. The API searches the local database first (by TMDB ID), then falls back to the TMDB API for metadata matching. If the content already exists in the database, the API returns it as a database match with source "db". If not, it returns TMDB search results with source "tmdb". This distinction drives the deduplication logic in subsequent steps.

Step 1 — Analyze

  1. Client → API: sends the filename and file size for content identification (POST /analyze).
  2. API → TMDB: searches TMDB for metadata matches using the cleaned title.
  3. API: cross-references the TMDB IDs against the local database for existing content.
  4. API → Client: returns match candidates with a source flag indicating DB or TMDB origin (matches[] — source: "db" | "tmdb").

Step 2: The Challenge ("Confirm")

The client confirms the match and, if the content already exists in the database (a dedup scenario), must prove possession. The flow diverges based on client capability:

Desktop Path (local pHash available):

  1. The desktop client computes pHash locally using its bundled FFmpeg/FFprobe.
  2. The client sends the 10 frame hashes to the API alongside the confirm request.
  3. The API compares the submitted hashes against the stored hashes for the canonical file.
  4. If the average Hamming distance is ≤ 10, possession is proven. Access is granted immediately — no upload required.
  5. If the distance is > 10 but ≤ 32, the content may be a quality upgrade candidate.
  6. If the distance is > 32, the hashes don't match. The request is rejected.

Step 2 — Desktop Confirm (with local pHash)

  1. Desktop → API: sends the content ID with 10 locally-computed perceptual hashes (POST /confirm { content_id, phashes[10] }).
  2. API: compares the submitted hashes against the stored hashes using average Hamming distance.
  3. API → Desktop: if the distance is ≤ 10, proof of possession is confirmed and access is granted instantly — no upload needed ({ added_to_library: true }).
  4. API → Desktop: if no stored hash exists (new content), returns a presigned upload URL ({ upload_url, ... }).

Web Path (no local pHash):

  1. The web client cannot compute pHash (no FFmpeg in the browser).
  2. When confirming against existing content, the API returns upload_needed: true with a presigned URL targeting a verify.* key in R2.
  3. The client uploads the original file to this verification key.
  4. The R2 webhook detects the verify.* landing and sets the content status to verify_dedup.
  5. The server-side Verifier downloads the file, computes pHash, compares against the stored hashes, and either grants or denies access.
  6. The verification file is always deleted after processing. The content status is restored to ready.

Step 2 — Web Dedup Verification (no local pHash)

  1. Web Client → API: confirms the match; no pHash is sent, since the browser cannot compute one locally (POST /confirm).
  2. API → Web Client: returns a presigned URL for a verification upload ({ upload_needed: true, upload_url }).
  3. Web Client → R2: uploads the original file to a temporary verification key (PUT verify.mkv).
  4. R2 → API: the webhook notifies the API that the verification file has landed (status → verify_dedup).
  5. Verifier → R2: downloads the verification file, extracts 10 frames, computes pHash.
  6. Verifier: compares the computed hashes against the stored hashes for the canonical content.
  7. Verifier → API: if the distance is ≤ 10, grants the user access; if > 10, rejects. The verification file is always deleted (grant / deny → status → ready).

Step 3: The Verification ("Verify")

For new content (not a dedup scenario), the uploaded file must be verified to ensure the client-submitted hashes match the actual file:

  1. The client uploads the transcoded file (desktop) or original file (web) to R2.
  2. The webhook triggers a status transition to verifying (desktop) or transcoding (web).
  3. For web uploads, the elastic transcoding pool processes the file first (remux MKV to MP4, encode to H.264/AAC).
  4. The Verifier downloads the final stream.mp4, computes pHash, and compares against the client-submitted hashes.
  5. If the distance exceeds 20, the file is rejected and the status is set to error.
  6. If the distance is ≤ 20, the content is marked ready and phash_verified = true.

This step prevents a class of attack where a client submits hashes from Film A but uploads Film B.

Step 4: The Mapping ("Grant")

Access is granted at the moment of verification:

  • New content: An INSERT INTO user_episodes (or user_library for movies) links the uploading user to the content.
  • Desktop dedup: The INSERT happens inline during the confirm request, since pHash comparison is instantaneous.
  • Web dedup: The INSERT happens asynchronously when the Verifier completes successfully.

Access is scoped to the individual episode level for series content. A user who proves possession of S01E01 gains access to S01E01 only — not the entire series. This granularity ensures that proof of possession is never extrapolated beyond what the user has demonstrated.

4.3 Distance Thresholds and Security Margins

The choice of distance thresholds balances false positive rate (granting access to users who don't own the content) against false negative rate (rejecting legitimate owners whose files have minor encoding differences):

  • Threshold 10 (dedup match): Extremely conservative. Two encodes of the same source at different bitrates typically produce distances of 2–6. A threshold of 10 accommodates slight variations in frame extraction timing while remaining well below the distance between different films (typically 30–50+).
  • Threshold 20 (verification rejection): More permissive, accounting for the fact that a user's local transcode may introduce artifacts not present in the canonical server transcode. This threshold catches genuinely different content while accepting legitimate quality variations.
  • Threshold 32 (quality upgrade): Used only to flag potential quality upgrades, not for access control.

4.4 Automatic Quality Escalation

When multiple users upload the same content at different resolutions, Lumen automatically retains the highest quality version. Each upload carries a resolution_height value (e.g., 720, 1080, 2160) extracted during the probe phase. When a new upload for existing content arrives at a higher resolution than the canonical copy, the platform triggers a quality upgrade:

  1. The new file replaces the existing stream.mp4 in R2.
  2. The resolution_height on the content record is updated.
  3. The content status transitions through upgrading and back to ready.
  4. All users who have proven possession of that content — not just the uploader — immediately benefit from the higher-quality stream.

This creates a positive-sum dynamic: the content pool's quality monotonically increases over time. The first user to upload a DVD rip gets a watchable stream. When a second user uploads a Blu-ray encode of the same film, every user's stream upgrades to Blu-ray quality automatically. A subsequent 4K upload upgrades everyone again. No user action is required — the platform always serves the best available version.

The pHash distance thresholds are what make this possible. A 720p and a 4K encode of the same film produce perceptual hashes similar enough to confirm identity (typically a distance of 5–15), while the resolution delta routes the upload down the upgrade path rather than treating it as a simple dedup match.
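The dedup-versus-upgrade decision for an incoming upload that matched existing content can be sketched as follows. The function and variant names are ours; the logic combines the §4.3 thresholds with the resolution_height comparison described above.

```typescript
// Outcome for an upload confirmed against existing content (a sketch).
type Decision =
  | { kind: "grant" }    // same content, no better: dedup, no upload kept
  | { kind: "upgrade" }  // same content, higher resolution: replace canonical
  | { kind: "reject" };  // hashes don't match: not the same content

function decide(
  avgDistance: number,
  newHeight: number,       // resolution_height of the incoming file
  canonicalHeight: number, // resolution_height of the stored copy
): Decision {
  if (avgDistance > 32) return { kind: "reject" };
  if (newHeight > canonicalHeight) return { kind: "upgrade" };
  return { kind: "grant" };
}
```

Under this sketch, a 2160p upload matching a 1080p canonical copy at distance 8 triggers an upgrade; the same upload at 720p is a plain dedup grant.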


5. The Content Pipeline

5.1 Metadata Resolution

When a file enters the system, its filename is parsed through a multi-stage extraction pipeline:

  1. Series detection: Regex patterns match S01E01, 1x01, and bare 101 formats.
  2. Quality tag removal: Tokens like 2160p, 1080p, x264, BluRay, HDRip, and bracket-enclosed metadata are stripped.
  3. Year extraction: Four-digit years are identified and used as a search refinement signal.
  4. Title normalization: Remaining tokens are joined with spaces and cleaned of artifacts.
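The four stages above can be sketched as one parser. The regex patterns are assumptions reconstructed from the formats listed (the bare "101" form is omitted here for brevity); the actual pipeline's patterns are not public.

```typescript
type Parsed = { title: string; year?: number; season?: number; episode?: number };

function parseFilename(name: string): Parsed {
  // Strip the extension and normalize dot/underscore separators.
  let s = name.replace(/\.[a-z0-9]+$/i, "").replace(/[._]/g, " ");
  const out: Parsed = { title: "" };

  // 1. Series detection: S01E01 and 1x01 forms.
  const se =
    s.match(/\bS(\d{1,2})E(\d{2})\b/i) ?? s.match(/\b(\d{1,2})x(\d{2})\b/);
  if (se && se.index !== undefined) {
    out.season = parseInt(se[1], 10);
    out.episode = parseInt(se[2], 10);
    s = s.slice(0, se.index);
  }

  // 3. Year extraction (1900–2099) as a search refinement signal.
  const yr = s.match(/\b(19|20)\d{2}\b/);
  if (yr && yr.index !== undefined) {
    out.year = parseInt(yr[0], 10);
    s = s.slice(0, yr.index);
  }

  // 2. Quality tag and bracket removal.
  s = s
    .replace(/\[[^\]]*\]/g, " ")
    .replace(/[()[\]{}]/g, " ")
    .replace(
      /\b(2160p|1080p|720p|480p|x26[45]|h26[45]|bluray|hdrip|webrip|web-dl|remux)\b/gi,
      " ",
    );

  // 4. Title normalization: join remaining tokens, trim artifacts.
  out.title = s.replace(/\s+/g, " ").trim();
  return out;
}
```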

The cleaned title is searched against the TMDB API, which returns candidates with poster art, ratings, release years, and TMDB IDs. The API then cross-references these TMDB IDs against the local database to detect existing content — even when the user's filename differs from the canonical title (e.g., "The Office US" matching "The Office" in the database via shared TMDB ID 2316).

5.2 Background Metadata Enrichment

After a user confirms a match, the API returns the presigned upload URL immediately and populates extended metadata in the background. This includes:

  • Full cast and crew with roles and display order
  • Production companies with logos
  • Content keywords and genre tags
  • YouTube trailer keys
  • Similar/recommended content links
  • For series: complete episode listings with crew, guest stars, and per-episode ratings
  • Extended fields: IMDB ID, budget, revenue, spoken languages, production countries

This background enrichment ensures that the upload flow is never blocked by metadata fetching, while the content detail page is fully populated by the time the user navigates to it.

5.3 Subtitle Integration

Subtitles are sourced from two channels:

Embedded extraction: The desktop client uses FFmpeg to detect and extract text-based subtitle tracks from container formats (MKV, MP4). SRT subtitles are converted to WebVTT format and uploaded to R2 alongside the video.

OpenSubtitles API: All clients can search the OpenSubtitles database directly from the player interface. Selected subtitles are downloaded server-side, converted to VTT, and stored in R2. Subtitle playback uses a client-side VTT parser with binary search for O(log n) cue lookup, ensuring smooth real-time rendering even with large subtitle files.
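The O(log n) cue lookup reduces to finding the last cue that starts at or before the playhead, then checking containment. A sketch, assuming non-overlapping cues sorted by start time (real WebVTT permits overlaps, which need a slightly richer structure):

```typescript
type Cue = { start: number; end: number; text: string };

// Binary search for the active cue at time t over cues sorted by start.
function findCue(cues: Cue[], t: number): Cue | null {
  let lo = 0;
  let hi = cues.length - 1;
  let best = -1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (cues[mid].start <= t) {
      best = mid;        // candidate: last cue starting at or before t
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  if (best >= 0 && t < cues[best].end) return cues[best];
  return null;           // between cues, or before the first one
}
```

Called on every timeupdate tick, this keeps rendering cost logarithmic even for multi-thousand-cue subtitle files.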


6. Storage & Streaming Infrastructure

6.1 Cloudflare R2

Lumen's storage layer runs entirely on Cloudflare R2, chosen for a single decisive reason: zero egress fees. Traditional cloud storage providers charge $0.05–0.09 per GB of data transferred out. For a video streaming platform, egress is the dominant cost — a single user watching a 5 GB movie costs $0.25–0.45 in egress alone. At scale, this makes consumer-priced streaming economically impossible.

R2 charges $0.015/GB/month for storage with zero egress. A 500 GB content library costs $7.50/month to store, and streaming it costs nothing regardless of how many times it's watched or from how many devices.

6.2 Object Key Structure

content/{contentId}/original.{ext}              # Raw upload (pre-transcode)
content/{contentId}/stream.mp4                  # Transcoded playback file
content/{contentId}/s{SS}e{EE}/original.{ext}   # Episode raw upload
content/{contentId}/s{SS}e{EE}/stream.mp4       # Episode playback file
content/{contentId}/s{SS}e{EE}/verify.{ext}     # Dedup verification (temporary)
content/{contentId}/subtitles/{lang}.vtt         # Subtitle files

6.3 Upload Mechanism

Files are uploaded directly from the client to R2 via presigned URLs, bypassing the API server entirely. This eliminates the API as a bandwidth bottleneck and allows uploads to saturate the client's upstream connection.

  • Single-part upload: For files under 2 GB, a single presigned PUT URL with 6-hour expiry.
  • Multipart upload: For files over 2 GB, the API initiates a multipart upload and returns presigned URLs for each 100 MB part. The client uploads parts with up to four concurrent connections. After all parts complete, the client calls complete-multipart to finalize.
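The part fan-out described above can be sketched as a small worker pool: four workers pull part indexes from a shared cursor, so at most four PUTs are in flight at once. The uploadPart callback stands in for a presigned PUT request; names and the ETag-collection detail are assumptions.

```typescript
// Upload all parts through at most `concurrency` simultaneous workers.
// Returns per-part ETags in part order, as needed by complete-multipart.
async function uploadParts(
  partUrls: string[],
  uploadPart: (url: string, partNumber: number) => Promise<string>,
  concurrency = 4,
): Promise<string[]> {
  const etags = new Array<string>(partUrls.length);
  let next = 0; // shared cursor; safe because JS is single-threaded

  async function worker(): Promise<void> {
    while (next < partUrls.length) {
      const i = next++;
      etags[i] = await uploadPart(partUrls[i], i + 1); // parts are 1-based
    }
  }

  const workers = Array.from(
    { length: Math.min(concurrency, partUrls.length) },
    worker,
  );
  await Promise.all(workers);
  return etags;
}
```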

6.4 Streaming Delivery

Playback URLs are generated as time-limited presigned GET URLs from R2. Clients use platform-native players:

  • iOS/tvOS: AVPlayer with native HLS support
  • Android: Media3 ExoPlayer with HLS and MP4 support
  • Web/TV: HTML5 video element
  • Desktop: Embedded web player via Tauri URI scheme

7. Elastic Transcoding

7.1 The Problem

Video transcoding is computationally expensive and bursty. A platform with 1,000 uploads per day cannot maintain idle hardware for peak load, nor can it ask users to wait hours in a shared queue.

7.2 Pool Architecture

Lumen operates two elastic transcoding pools on DigitalOcean, separated by user tier:

Pool   Droplet Size     Max Droplets   Concurrency     Users
Free   4 vCPU, 8 GB     2              2 per droplet   Free tier
Paid   8 vCPU, 16 GB    10             2 per droplet   Starter, Standard, Ultra

Each droplet is provisioned on-demand with a cloud-init script that installs FFmpeg, clones the transcoder repository, and begins polling the database for work. Droplets are tagged by pool and auto-destroyed after 900 seconds of idle time.

7.3 Priority Queue

Within each pool, jobs are prioritized by subscription tier:

Tier       Priority   Pool
Free       0          Free
Starter    1          Paid
Standard   2          Paid
Ultra      3          Paid

7.4 Transcoding Specification

The transcoding pipeline prioritizes compatibility over compression efficiency:

  • Video: H.264 High Profile Level 4.1, CRF 20, veryfast preset, yuv420p pixel format. If the source is already H.264, the video stream is remuxed (copied) without re-encoding.
  • Audio: AAC stereo at 192 kbps. Multi-channel audio is downmixed.
  • Container: MP4 with faststart flag (moov atom at beginning for progressive playback).
  • Subtitles: Extracted from embedded tracks during transcode, converted to VTT, stored separately in R2.

7.5 Desktop Local Transcoding

The desktop client offers an alternative path: transcode locally before uploading. The client runs a 10-second benchmark at a random position in the video using the same encoding parameters as the server pipeline. The benchmark FPS, combined with the video's duration and frame rate, produces an estimated transcode time. If favorable, the client transcodes locally and uploads the final stream.mp4 directly — skipping the server transcoding queue entirely.
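The estimate implied above is simple arithmetic: the video's total frame count divided by the benchmark's encoding rate. Function and parameter names are ours.

```typescript
// Estimated wall-clock transcode time from a short benchmark run:
// total frames / benchmarked encode rate.
function estimateTranscodeSeconds(
  durationSec: number, // full video duration
  videoFps: number,    // source frame rate
  benchFps: number,    // frames-per-second achieved in the 10 s benchmark
): number {
  const totalFrames = durationSec * videoFps;
  return totalFrames / benchFps;
}

// e.g. a two-hour 24 fps film benchmarked at 120 encoded fps:
// 7200 * 24 / 120 = 1440 seconds (24 minutes)
```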

7.6 Crash Recovery

A background job runs every hour to detect and recover from transcoding failures:

  • Content stuck in uploading status for more than 1 hour is reset to pending.
  • Content stuck in transcoding or processing for more than 2 hours is re-queued.

8. Security & Session Architecture

8.1 Authentication

Lumen supports two authentication methods:

  1. Passwordless OTP: A 6-digit one-time passcode is sent via email with a 10-minute expiry. OTP codes are marked as used immediately upon verification to prevent replay.
  2. Password-based: Optional for users who prefer traditional login. Passwords are hashed with bcrypt.

8.2 Token Architecture

  • Access Token: JWT with 1-hour expiry. Stateless — the API validates the signature without a database lookup.
  • Refresh Token: 30-day expiry, stored in the database with device metadata (User-Agent, IP address, device type). Stored server-side to enable revocation.
  • Session Management: Users can view active sessions and revoke individual devices from the profile settings page.

8.3 TV Device Authorization

Television clients cannot easily input email addresses or passwords. Lumen implements a device code flow:

  1. The TV requests a 6-character alphanumeric user code and a longer device code.
  2. The TV displays the user code alongside a QR code linking to the verification URL.
  3. The user scans the QR code and enters the user code while authenticated on their phone or computer.
  4. The TV polls every 3 seconds. When the user code is verified, the poll returns access and refresh tokens.
  5. Device codes expire after 15 minutes.
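The TV-side polling loop from steps 4–5 can be sketched as below. The endpoint shape and status labels are assumptions; the flow mirrors the OAuth 2.0 device authorization grant.

```typescript
type PollResult =
  | { status: "pending" }
  | { status: "approved"; accessToken: string; refreshToken: string }
  | { status: "expired" };

// Poll every `intervalMs` until the user code is verified, the device
// code expires server-side, or the 15-minute window elapses.
async function pollForTokens(
  poll: (deviceCode: string) => Promise<PollResult>,
  deviceCode: string,
  intervalMs = 3000,
  timeoutMs = 15 * 60 * 1000,
): Promise<{ accessToken: string; refreshToken: string } | null> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const res = await poll(deviceCode);
    if (res.status === "approved") {
      return { accessToken: res.accessToken, refreshToken: res.refreshToken };
    }
    if (res.status === "expired") return null;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
  return null;
}
```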

8.4 Multi-Profile Support

Each account supports multiple viewer profiles (1 on Free and Starter, 3 on Standard, 5 on Ultra). Profiles have independent:

  • Watch progress (position, duration, percent per content item)
  • PIN protection (optional)
  • Display names and avatars

8.5 Presigned URL Security

All R2 interactions use presigned URLs with time-limited validity (6 hours for uploads, shorter for playback). The API server is the sole entity capable of generating these URLs — clients never receive R2 credentials.



9. Conclusion

Lumen solves a real problem: personal media is trapped on physical discs and local hard drives, inaccessible from the devices people actually use. Self-hosted solutions demand technical expertise and always-on hardware. Cloud alternatives either ignore copyright law or charge egress fees that make consumer pricing impossible.

The Verification & Deduplication Protocol is the key innovation that makes Lumen viable at scale. By requiring every user to independently prove possession of their media files — through perceptual hashing on the desktop or verification uploads on the web — Lumen maintains strict legal compliance while driving storage costs toward zero through content-addressed deduplication.

The platform ships today with production clients covering every major screen — macOS, Windows, iOS, Android, Apple TV, and smart TVs from Samsung, LG, Roku, Amazon, and Google — an elastic transcoding pipeline that scales with demand, and a zero-egress storage layer that makes $4.99/month pricing sustainable.

Lumen is not a streaming service. It is a private media cloud — a place where the media you own lives, accessible from anywhere, on any screen, without a server in your closet.


Lumen — Your media. Every screen. No hardware.

Confidential — For Authorized Recipients Only · Version 1.0 · March 2026
