Engineering
EBU R128 loudness in the browser
A quiet Twitch clip and a shouty TikTok shouldn't end up at wildly different volumes when an overlay plays them back-to-back on stream. Normalizing to an EBU R128-style LUFS target is the standard fix. Doing it server-side with ffmpeg is easy. Doing it client-side in an OBS browser source, with cross-origin clips you don't host, is the interesting problem.
The problem
The toolset has three overlays that play someone else's audio on your stream: Clip play (mods drop a video URL with !playclip), BRB player (auto-cycles your channel clips), and Video shout-out (auto-plays a clip of a target streamer with !vso).
Source loudness is wildly inconsistent. A Twitch clip from a quiet ASMR streamer might integrate to -28 LUFS. A clip from a hype shooter streamer might integrate to -10 LUFS. Played back-to-back on your overlay, that's an 18 dB difference. Your viewers reach for the volume knob; the loud one ear-blasts whoever didn't.
Why not just normalize server-side
The textbook answer is ffmpeg loudnorm in two-pass mode. Server transcodes every clip ahead of time, stores the normalized version, overlay plays the normalized version. Clean.
We don't do this. Reasons:
- We don't host the clips. Twitch clips live on Twitch's CDN. YouTube clips live on YouTube. Transcoding would mean downloading every clip the moment it's requested and storing it on our infrastructure. Bandwidth, storage, and legal exposure we don't want.
- Latency. !playclip is a chat command. The overlay should react in under a second. A server transcode adds 5-15 seconds and a progress spinner.
- The free-tier constraint. Cloudflare Workers don't do ffmpeg. We'd need to bolt on a paid transcoding worker per overlay event, which scales costs with chat activity in exactly the wrong way.
The Web Audio graph
Instead, we normalize client-side, in real time, inside the OBS browser source. The Web Audio graph for a playing clip:
<video> -> MediaElementSource
       -> AnalyserNode (measure)
       -> GainNode (boost or cut)
       -> CompressorNode (smooth peaks)
       -> a hard limiter (catch the rest)
       -> destination

The roles:
- AnalyserNode feeds a rolling 3-second window, polled at 4 Hz, that gives us a LUFS-like estimate. Not bit-exact EBU R128 — that needs K-weighting, gating, and a longer integration window — but accurate enough for the "don't ear-blast the stream" threshold.
- GainNode is the actual loudness fix. We compute a target gain from the measured loudness vs the target (default -16 LUFS, configurable per overlay). The clamp is asymmetric: up to +24 dB of boost, down to -12 dB of cut. Quiet clips need more headroom than loud ones.
- CompressorNode is the safety net for clips that have a quiet body and a sudden loud spike (think a jumpscare in an otherwise quiet horror clip). It catches the peak without making the whole clip sound squished.
- Hard limiter is the last defense. Anything still over 0 dBFS after compression gets clipped here rather than at the user's speakers. WaveShaperNode with a steep curve.
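The graph and the asymmetric clamp can be sketched roughly as below. This is a minimal illustration, not the project's shipped code: the helper names (`buildClipGraph`, `clampGainDb`), the compressor settings, and the limiter curve are all assumptions; only the node order and the -16 LUFS / +24 dB / -12 dB numbers come from the text above.

```javascript
const TARGET_LUFS = -16;  // per-overlay default from the settings page
const MAX_BOOST_DB = 24;  // asymmetric clamp: more headroom upward...
const MAX_CUT_DB = -12;   // ...than downward

const dbToLinear = (db) => Math.pow(10, db / 20);

// Correction needed to hit the target, clamped to the asymmetric range.
function clampGainDb(measuredLufs, targetLufs = TARGET_LUFS) {
  const correction = targetLufs - measuredLufs;
  return Math.min(MAX_BOOST_DB, Math.max(MAX_CUT_DB, correction));
}

// Browser-only wiring; defined here, called from the overlay page.
function buildClipGraph(ctx, videoEl) {
  const source = ctx.createMediaElementSource(videoEl);
  const analyser = ctx.createAnalyser();        // measure (passes audio through)
  analyser.fftSize = 2048;
  const gain = ctx.createGain();                // the actual loudness fix
  const comp = ctx.createDynamicsCompressor();  // smooth sudden peaks
  comp.threshold.value = -6;                    // illustrative settings
  comp.ratio.value = 12;
  const limiter = ctx.createWaveShaper();       // hard limiter
  // Identity curve inside [-1, 1]; samples outside the range clamp to the
  // endpoints, i.e. a hard clip at 0 dBFS.
  limiter.curve = new Float32Array([-1, 0, 1]);

  source.connect(analyser);
  analyser.connect(gain);
  gain.connect(comp);
  comp.connect(limiter);
  limiter.connect(ctx.destination);
  return { analyser, gain };
}
```

A quiet -28 LUFS clip gets the full correction (`clampGainDb(-28)` is +12 dB), while a -50 LUFS clip hits the +24 dB ceiling and a loud -2 LUFS clip is cut only to the -12 dB floor.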
The noise-floor problem
The naive version of this — "boost everything to -16 LUFS" — destroys clips that are intentionally quiet. Someone's "hellooo? is this thing on?" whisper becomes a normal-volume voice, and the room tone behind it becomes audible static.
We track a noise-floor estimate alongside the loudness estimate. When the gap between "loud parts of the clip" and "quiet parts of the clip" is large (high dynamic range), we cap the boost lower than the asymmetric default. The quiet whisper stays quiet; we don't turn the room tone into a chainsaw.
This is the part that no off-the-shelf loudnorm equivalent does for free. Real broadcast loudness tools care about it too — the EBU recommendation explicitly carves out intentional dynamic range from the normalization target.
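One way to express that cap, as a sketch: shrink the allowed boost as the clip's measured dynamic range grows past a threshold. The function name, the 20 dB threshold, and the linear shrink are illustrative assumptions; only the idea — wide range means less boost — is from the text.

```javascript
// Cap the boost for clips with a large gap between loud parts and the
// noise floor, so intentionally quiet material stays quiet.
function capBoostForDynamics(loudnessDb, noiseFloorDb, maxBoostDb = 24) {
  const range = loudnessDb - noiseFloorDb; // measured dynamic range
  const WIDE_RANGE_DB = 20;                // above this, start capping (assumed)
  if (range <= WIDE_RANGE_DB) return maxBoostDb;
  // Shrink the allowed boost by the excess range, never below 0 dB:
  // very dynamic clips can be cut but not boosted.
  return Math.max(0, maxBoostDb - (range - WIDE_RANGE_DB));
}
```

A clip integrating at -20 LUFS over a -60 dB room tone (40 dB of range) would only be allowed +4 dB of boost here, instead of the full +24 dB.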
Cross-origin landmines
The <video> element happily plays a cross-origin URL, and MediaElementSource happily accepts that element. But reading samples through the AnalyserNode requires the audio to be CORS-readable, and Twitch's clip CDN doesn't serve the necessary CORS headers.
Solution: the video element sets crossOrigin="anonymous", and we accept that for some sources the analyser will return zeros. In that case the gain is set from a stored per-clip estimate, computed once on first play and cached against the clip ID. On first play we ramp the gain over the first 3 seconds as the analyser converges; on every later play the cached value drives the gain from frame one.
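The cache-or-ramp decision might look like the sketch below. The cache shape, helper name, and Map-based storage are assumptions for illustration; the project's real cache is keyed against the clip ID as described, but its storage isn't specified here.

```javascript
// clipId -> integrated loudness measured on a previous play (dB, LUFS-like)
const loudnessCache = new Map();

function applyInitialGain(gainNode, ctx, clipId, targetDb = -16) {
  const cached = loudnessCache.get(clipId);
  if (cached !== undefined) {
    // Later plays: jump straight to the stored correction from frame one.
    gainNode.gain.value = Math.pow(10, (targetDb - cached) / 20);
  } else {
    // First play: start at unity; the measurement loop then ramps the gain
    // over ~3 s as the analyser's estimate converges.
    gainNode.gain.setValueAtTime(1, ctx.currentTime);
  }
}
```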
YouTube embeds are a different story — the iframe sandbox doesn't expose any AudioContext at all, so for YouTube clips we use the YouTube iframe API's setVolume and skip the dynamics graph entirely. Coarser, but it's what we get.
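Since the iframe API's setVolume takes a 0-100 percentage, the dB correction has to collapse to one number. The linear-amplitude mapping below is an illustrative assumption, not the shipped formula; note it can only attenuate, since the player caps at 100.

```javascript
// Map a desired gain in dB to a YouTube player volume percentage.
// 0 dB -> 100; any positive (boost) gain also clamps to 100.
function volumePercentForGain(gainDb) {
  const linear = Math.pow(10, gainDb / 20);
  return Math.max(0, Math.min(100, Math.round(100 * linear)));
}

// Usage with the iframe API: player.setVolume(volumePercentForGain(-6));
```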
Result, in practice
A back-to-back rotation of 10 random Twitch clips through the BRB player overlay used to vary by 15-20 dB peak-to-peak. With the normalization graph in place, it's consistently within 4-6 dB — about the same range as commercial broadcast TV. Enough that nobody reaches for the knob.
The settings page exposes the target LUFS, the boost/cut clamps, and a per-overlay toggle so you can turn it off on overlays where you actually want clip-loudness variation (e.g. the music widget, where the source is already mastered).
See the settings doc for the user-facing controls, and the "play YouTube clips on stream" guide for where this fires most often in practice.