You are building a web app that needs screen recording. Maybe an async-video tool. Maybe a Loom alternative. Maybe a video-chat product where users share their screen and you want a recorded copy on the server. Maybe a debugging tool that captures user sessions for support replay.

In 2026, the right answer almost always involves WebRTC. Browsers ship getDisplayMedia() and MediaRecorder, and a mature SFU ecosystem (LiveKit, Daily, Mediasoup, Twilio) sits on top. The path from "feature request" to "working capture" is shorter than it has ever been. The path from "working capture" to "production system" still has real engineering.

This guide walks through four implementation paths, the tradeoffs each carries, and the production considerations the docs page does not cover. Code snippets are JavaScript. The same patterns apply with TypeScript, React, Vue, Svelte, or whatever framework you prefer.

Quick Comparison

Method	Path	Where Recording Lives	Best For	Difficulty
1	`getDisplayMedia` + `MediaRecorder`	Client-side, browser	MVP, simple apps	Easy
2	Hybrid (client capture, server upload)	Client capture, server storage	Long recordings, reliability	Medium
3	Custom signaling + WebRTC SFU	Server-side via media server	Multi-user, scalable	Hard
4	Hosted SFU (LiveKit, Daily)	Server, managed	Production speed, less ops	Medium

WebRTC and Screen Capture in 2026

WebRTC is the W3C and IETF standard for real-time peer-to-peer media. Originally built for video calling, the spec evolved through the 2010s to include screen capture (getDisplayMedia), data channels, and increasingly granular control over codec selection and bitrate.

For screen recording specifically, the relevant pieces are:

getDisplayMedia() — prompts the user for screen-share permission, returns a MediaStream containing video tracks (and optionally audio).
MediaRecorder — encodes a MediaStream into a Blob in WebM or MP4 format.
RTCPeerConnection — establishes a peer connection for streaming the captured stream to another peer or to a media server.
SFU (Selective Forwarding Unit) — server software that ingests WebRTC streams and forwards them to other peers or records them.

Browser support in 2026 is excellent for desktop browsers. Chrome, Edge, Firefox, and Safari all ship getDisplayMedia and MediaRecorder. Mobile is far more constrained: mobile browsers, including iOS Safari and Android Chrome, have historically exposed getDisplayMedia either not at all or only in narrow contexts, so feature-detect before offering a record button.
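Because support differs by platform, it helps to check for the APIs before rendering any recording UI. A minimal sketch of that check follows; the function name `detectCaptureSupport` and the `#record` button in the usage comment are illustrative assumptions, and the globals are passed in as arguments so the check can also run in tests.

```javascript
// Feature-detect the capture APIs before rendering a "Record" button.
// `nav` and `Recorder` are injected so the check can run outside a browser;
// in a page, pass `navigator` and `window.MediaRecorder`.
function detectCaptureSupport(nav, Recorder) {
  const canCapture =
    !!nav &&
    !!nav.mediaDevices &&
    typeof nav.mediaDevices.getDisplayMedia === 'function';
  const canRecord = typeof Recorder === 'function';
  return { canCapture, canRecord, ok: canCapture && canRecord };
}

// Browser usage (assumes a hypothetical #record button):
// const support = detectCaptureSupport(navigator, window.MediaRecorder);
// document.querySelector('#record').hidden = !support.ok;
```

The presence of the function is not proof that capture will succeed, so the call itself still needs error handling.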

Method 1: Browser-Native Capture and Save

The simplest implementation. The user clicks a button, the browser prompts for screen share, the recording happens entirely in the browser, the file downloads when they stop.

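About 40 lines of JavaScript:

async function startRecording() {
  // Must be called from a user gesture such as a button click
  const stream = await navigator.mediaDevices.getDisplayMedia({
    video: {
      displaySurface: 'monitor',
      frameRate: 30,
    },
    audio: true,
  });

  // Throws NotSupportedError where WebM/VP9 is unavailable;
  // probe with MediaRecorder.isTypeSupported() in a real product
  const recorder = new MediaRecorder(stream, {
    mimeType: 'video/webm;codecs=vp9,opus',
    videoBitsPerSecond: 5_000_000,
  });

  const chunks = [];
  recorder.ondataavailable = (e) => {
    if (e.data.size > 0) chunks.push(e.data);
  };
  recorder.onstop = () => {
    // Release the capture so the browser's sharing indicator goes away
    stream.getTracks().forEach((track) => track.stop());

    const blob = new Blob(chunks, { type: 'video/webm' });
    const url = URL.createObjectURL(blob);
    const a = document.createElement('a');
    a.href = url;
    a.download = 'recording.webm';
    a.click();
    // Revoke only after the download has had a chance to start
    setTimeout(() => URL.revokeObjectURL(url), 1000);
  };

  // Stop when the user ends the screen share via the browser UI
  stream.getVideoTracks()[0].onended = () => {
    if (recorder.state !== 'inactive') recorder.stop();
  };

  recorder.start(1000); // emit a chunk per second
  return { recorder, stream };
}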

That snippet is enough for a working MVP. Things this version does not handle:

Pause and resume
Mic capture in addition to system audio
Webcam picture-in-picture
Long recordings that need streaming upload (the whole Blob lives in memory until stop)
Format conversion to MP4 for cross-platform playback

For a quick prototype, ship as-is. For a real product, plan to extend.

`getDisplayMedia` Options Worth Knowing

The getDisplayMedia constraints object has a handful of options that materially change behavior.

displaySurface hints which kind of surface the browser offers first in the picker:

'monitor' — full-screen capture
'window' — single application window
'browser' — a single browser tab
'' (empty / unset) — let the user choose between all three

Setting displaySurface: 'browser' for an in-app guide is a strong UX choice. The picker leads with tabs, which is typically what you want for a tutorial of your own product, although the value is a hint and the user can usually still switch to a window or screen. Setting 'monitor' for a full-desktop capture matches Loom's default flow.

audio: true captures system audio, with caveats:

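Chrome and Edge: tab capture includes the tab's audio, and full-screen capture can include system audio on Windows and ChromeOS. Window capture carries no audio, and macOS has historically blocked system audio capture through getDisplayMedia at the OS level.
Firefox: has historically returned no audio track from getDisplayMedia, so plan on mic audio only and confirm against current compatibility data.
Safari: video only as of Safari 18. Check the WebKit release notes before relying on audio capture in later versions.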

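For mic audio, capture separately with getUserMedia({ audio: true }). Browsers generally record only one audio track per MediaRecorder, so mix the mic and display audio into a single track with the Web Audio API before combining:

const display = await navigator.mediaDevices.getDisplayMedia({ video: true, audio: true });
const mic = await navigator.mediaDevices.getUserMedia({ audio: true });

// Mix both audio sources into one destination track
const ctx = new AudioContext();
const mixed = ctx.createMediaStreamDestination();
ctx.createMediaStreamSource(mic).connect(mixed);
if (display.getAudioTracks().length > 0) {
  // createMediaStreamSource throws on a stream with no audio track
  ctx.createMediaStreamSource(display).connect(mixed);
}
const combined = new MediaStream([
  ...display.getVideoTracks(),
  ...mixed.stream.getAudioTracks(),
]);

Pass combined to MediaRecorder. The output then carries a single audio track that contains both sources.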

frameRate is a hint, not a guarantee. Browsers cap screen capture at typically 30 or 60 fps depending on platform and source. Asking for 120 will not raise the cap.

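cursor: 'always' | 'motion' | 'never' controls cursor visibility in the capture. 'motion' shows the cursor only when it moves, which is useful for cleaner static captures. Support for this constraint varies by browser, so check navigator.mediaDevices.getSupportedConstraints() before relying on it.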
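Since several of these options are hints or are unevenly supported, one defensive approach is to drop the constraints the browser does not report as supported. The sketch below assumes a helper named `buildDisplayConstraints` (not part of any API) and takes the result of `navigator.mediaDevices.getSupportedConstraints()` as its first argument.

```javascript
// Keep only the video constraints the browser reports as supported.
// `supported` is the object returned by
// navigator.mediaDevices.getSupportedConstraints().
function buildDisplayConstraints(supported, wanted) {
  const video = {};
  for (const [key, value] of Object.entries(wanted)) {
    if (supported[key]) video[key] = value;
  }
  // Fall back to plain `video: true` when nothing survives the filter.
  return { video: Object.keys(video).length > 0 ? video : true };
}

// Browser usage, inside a click handler:
// const supported = navigator.mediaDevices.getSupportedConstraints();
// const constraints = buildDisplayConstraints(supported, {
//   displaySurface: 'browser',
//   frameRate: 30,
//   cursor: 'motion',
// });
// const stream = await navigator.mediaDevices.getDisplayMedia(constraints);
```

A constraint being listed as supported only means the browser recognizes it; the browser may still apply a different value.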

Method 2: Client-Side `MediaRecorder` Done Properly

Method 1 was the toy version. A real implementation needs more.

Streaming to a server while recording. The toy version accumulates Blobs in memory. A 30-minute 1080p recording at 5 Mbps is over 1 GB, which is enough to exhaust memory on many machines. Solution: stream chunks to your server as they arrive.

recorder.ondataavailable = async (e) => {
  if (e.data.size === 0) return;
  await fetch('/api/upload-chunk', {
    method: 'POST',
    body: e.data,
    headers: {
      'X-Recording-Id': recordingId,
      'X-Chunk-Index': String(chunkIndex++),
    },
  });
};
recorder.start(2000); // 2-second chunks

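On the server, append chunks in index order to a single file or use a multipart upload to S3, R2, or your storage of choice. Order matters because only the first chunk carries the container header; later chunks are not playable on their own. A 2-second chunk at 5 Mbps is around 1.25 MB, which is small enough for reliable HTTP uploads and large enough to keep request count manageable. S3 multipart uploads require every part except the last to be at least 5 MiB, so buffer several chunks per part on the server.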
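One way to batch small recorder chunks into larger multipart parts on the server is sketched below. It assumes a 5 MiB minimum part size (S3's documented minimum for all parts but the last) and a hypothetical `onPart` callback that performs the actual part upload; neither the class name nor the callback comes from a storage SDK.

```javascript
// Buffers small recorder chunks and emits parts of at least `minPartBytes`.
const MIN_PART_BYTES = 5 * 1024 * 1024;

class PartBuffer {
  constructor(onPart, minPartBytes = MIN_PART_BYTES) {
    this.onPart = onPart; // hypothetical: (bytes: Uint8Array, partNumber: number) => void
    this.minPartBytes = minPartBytes;
    this.pending = [];
    this.pendingBytes = 0;
    this.partNumber = 1; // multipart part numbers start at 1
  }

  // Add one chunk (a Uint8Array or Node Buffer), in upload order.
  push(chunk) {
    this.pending.push(chunk);
    this.pendingBytes += chunk.byteLength;
    if (this.pendingBytes >= this.minPartBytes) this.flush();
  }

  // Call once more after the final chunk; the last part may be smaller.
  flush() {
    if (this.pendingBytes === 0) return;
    const part = new Uint8Array(this.pendingBytes);
    let offset = 0;
    for (const chunk of this.pending) {
      part.set(chunk, offset);
      offset += chunk.byteLength;
    }
    this.pending = [];
    this.pendingBytes = 0;
    this.onPart(part, this.partNumber++);
  }
}
```

The buffer assumes chunks arrive already sorted by the chunk index header; reordering out-of-order arrivals would need to happen before `push`.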

Pause and resume. MediaRecorder exposes pause() and resume(). The output file remains a single contiguous WebM, with the gap simply skipped. No timecode discontinuity.
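A pause button can be wired to a single toggle that checks the recorder's state first. This is a minimal sketch; `togglePause` is an illustrative helper name, and `recorder` is any MediaRecorder instance.

```javascript
// Toggle between recording and paused. MediaRecorder.state is one of
// 'inactive', 'recording', or 'paused'; pause() and resume() throw when
// the recorder is inactive, so that state is left alone.
function togglePause(recorder) {
  if (recorder.state === 'recording') {
    recorder.pause();
  } else if (recorder.state === 'paused') {
    recorder.resume();
  }
  return recorder.state;
}
```

Returning the resulting state makes it easy to update the button label in the same handler.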

Codec selection. MediaRecorder.isTypeSupported(mimeType) tells you which codecs the browser can encode. Probe in this order:

const candidates = [
  'video/webm;codecs=av01,opus',     // AV1 — best compression, slower encode
  'video/webm;codecs=vp9,opus',      // VP9 — good compression, broad support
  'video/webm;codecs=vp8,opus',      // VP8 — fallback
  'video/mp4;codecs=avc1,mp4a',      // MP4/H.264 — Safari preferred
];
const mimeType = candidates.find((m) => MediaRecorder.isTypeSupported(m));

Chrome and Firefox encode VP9 efficiently. Safari prefers MP4/H.264 and has historically lacked WebM/VP9 encoding in MediaRecorder, which is why the probe matters more than any fixed assumption. AV1 encoding is supported in Chrome 116+ but is CPU-heavy on lower-end machines, so falling back to VP9 is sensible.

Bitrate. videoBitsPerSecond: 5_000_000 (5 Mbps) is a reasonable default for 1080p screen content. Screen content is mostly flat colors and text, which compresses well — you can push lower (3 Mbps) for clean captures and still get sharp text. 4K screen content typically needs 10-15 Mbps.
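The size figures in this section follow from bitrate times duration divided by eight. A small sketch of that arithmetic, counting video bits only (audio adds a little on top); `estimateBytes` is an illustrative helper name.

```javascript
// Approximate payload size for a constant video bitrate.
function estimateBytes(bitsPerSecond, seconds) {
  return (bitsPerSecond * seconds) / 8;
}

const MB = 1_000_000;
console.log(estimateBytes(5_000_000, 2) / MB);       // 1.25  (one 2-second chunk)
console.log(estimateBytes(5_000_000, 30 * 60) / MB); // 1125  (30 minutes, just over 1 GB)
```

Real encoders are variable-bitrate, so treat the result as a planning estimate.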

File container. WebM is the natural output for VP9 and AV1 streams. MP4 is the natural output for H.264. Cross-platform players prefer MP4. Many products convert WebM to MP4 server-side using ffmpeg before serving for download. When the recorded streams are already H.264 and AAC, the conversion is a fast remux with -c copy; VP9, AV1, or Opus streams generally need a re-encode for broad MP4 player support.
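The copy-versus-re-encode decision can be captured in a function that builds the ffmpeg argument list. This sketch assumes you already know the input codecs (for example from ffprobe) and uses the lowercase names `h264` and `aac` as the "safe to copy" cases; `buildMp4Args` is a hypothetical helper and does not run ffmpeg itself. The quality settings (`-preset veryfast`, `-crf 23`, `128k` audio) are placeholder choices.

```javascript
// Build ffmpeg arguments for a WebM-to-MP4 conversion.
function buildMp4Args(input, output, { videoCodec, audioCodec }) {
  const videoOk = videoCodec === 'h264';
  const audioOk = audioCodec === 'aac';
  return [
    '-i', input,
    // Copy streams MP4 players already accept; re-encode the rest.
    '-c:v', ...(videoOk ? ['copy'] : ['libx264', '-preset', 'veryfast', '-crf', '23']),
    '-c:a', ...(audioOk ? ['copy'] : ['aac', '-b:a', '128k']),
    // Move the index to the front of the file for progressive playback.
    '-movflags', '+faststart',
    output,
  ];
}

console.log(buildMp4Args('in.webm', 'out.mp4', { videoCodec: 'vp9', audioCodec: 'opus' }).join(' '));
```

The returned array can be passed to a process spawner as the arguments for the `ffmpeg` binary.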

Method 3: Server-Side Recording via SFU

Client-side recording works for one-user-records-their-own-screen scenarios. For multi-user video calls where you want a recorded copy of everyone's video and screen share, you need server-side recording through an SFU.

The architecture: each participant's browser opens an RTCPeerConnection to a media server. The media server (the SFU) ingests their tracks. A separate "recording bot" subscribes to the relevant tracks and writes them to disk or cloud storage.

In 2026 the practical SFU options are:

LiveKit — open source, mature, has hosted (LiveKit Cloud) and self-hosted options. Recording API records a composite of all participants' tracks. Free tier exists; paid tiers scale to enterprise.
Mediasoup — open source, low-level, requires more wiring than LiveKit. Battle-tested in production at multiple unicorns.
Janus — older, still maintained, broader plugin ecosystem. Less common for greenfield projects in 2026.
Daily.co — managed service. Recording API charges $0.0058 per participant-minute, so a 1-hour 4-person call costs roughly $1.39. Strong API, friendly pricing for early-stage.
Twilio Video — pay-per-use, more enterprise-flavored billing. Recording is bundled into the participant minute pricing.
Vonage Video API — formerly OpenTok, similar territory to Twilio.
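The per-participant-minute arithmetic quoted for Daily.co in the list above can be checked with a small helper. The rate is the figure from this article, not a value fetched from any pricing API, and `recordingCost` is an illustrative name.

```javascript
// Cost of recording a call billed per participant-minute.
function recordingCost(ratePerParticipantMinute, participants, minutes) {
  return ratePerParticipantMinute * participants * minutes;
}

const cost = recordingCost(0.0058, 4, 60);
console.log(cost.toFixed(2)); // "1.39"
```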

Picking between them is mostly a build-vs-buy question. LiveKit self-hosted is the open-source go-to. LiveKit Cloud, Daily, Twilio, and Vonage are managed services that get you to recording in days rather than weeks.

The recording bot pattern is broadly the same across all of them. The bot is a server-side WebRTC peer that subscribes to tracks, decodes them, mixes them into a composite frame, and encodes the result (typically with ffmpeg or GStreamer) into a single MP4. The composite layout (grid, presenter-plus-thumbnails, screen-share-plus-camera) is configurable. Each platform ships a default layout and lets you override.
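To make the grid layout concrete, here is a sketch of how a compositor might place equal-size tiles on the output canvas. It is an illustration of the idea only; the platforms named above each use their own layout engines, and `gridLayout` is a hypothetical helper.

```javascript
// Compute equal-size tiles for `count` participants on a width x height canvas.
function gridLayout(count, width, height) {
  if (count <= 0) return [];
  const cols = Math.ceil(Math.sqrt(count));
  const rows = Math.ceil(count / cols);
  const tileW = Math.floor(width / cols);
  const tileH = Math.floor(height / rows);
  return Array.from({ length: count }, (_, i) => ({
    x: (i % cols) * tileW,
    y: Math.floor(i / cols) * tileH,
    width: tileW,
    height: tileH,
  }));
}

// Four participants on a 1080p canvas produce a 2 x 2 grid of 960 x 540 tiles.
console.log(gridLayout(4, 1920, 1080));
```

A presenter-plus-thumbnails layout would replace the uniform tile size with one large region and a strip of small ones.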

Method 4: Hosted SFU with Built-In Recording

For most teams, hosted SFU is the right call. The build-it-yourself path involves operating media servers, managing recording bots, dealing with codec negotiation, and handling the long tail of network conditions. Hosted services do that for you.

A typical Daily.co integration:

import DailyIframe from '@daily-co/daily-js';

const call = DailyIframe.createCallObject();
await call.join({ url: 'https://yourcompany.daily.co/room' });

// Start cloud recording (recording must be enabled for the room or domain)
await call.startRecording({
  layout: { preset: 'active-participant' },
  width: 1920,
  height: 1080,
});

// Later, when the session ends: stop recording. The recording is uploaded
// to your S3 bucket or downloadable via Daily's API.
await call.stopRecording();

LiveKit's API is similar:

import { EgressClient } from 'livekit-server-sdk';

// Output field shapes differ between livekit-server-sdk major versions;
// check the EncodedFileOutput type for the version you install.
const egress = new EgressClient(LIVEKIT_URL, API_KEY, API_SECRET);
await egress.startRoomCompositeEgress('my-room', {
  file: {
    fileType: 'mp4',
    filepath: 'recording-{room_name}-{time}.mp4',
    s3: { accessKey, secret, bucket: 'recordings', region: 'us-east-1' },
  },
});

Both ship as a few lines of code. The complexity moves from "build and run an SFU" to "wire up auth, billing, and storage."

Browser Support and Quirks

Cross-browser quirks are where WebRTC projects spend a meaningful chunk of debugging time.

Chrome and Edge: the reference implementation. getDisplayMedia, MediaRecorder, and the core WebRTC features work as documented. Use Chrome as your dev target.

Firefox: works for most cases. Edge cases: Firefox does not support all VP9 profiles, and MediaRecorder mime-type negotiation can return slightly different strings than Chrome.

Safari: ships getDisplayMedia and MediaRecorder, with restrictions. Safari prefers MP4 over WebM. For end-to-end encryption customization, Safari exposes RTCRtpScriptTransform, while Chrome has long relied on the non-standard createEncodedStreams API, so E2EE code usually needs both paths. iOS Safari has the tightest restrictions; verify getDisplayMedia against current compatibility data before assuming it exists there.

Mobile: treat screen capture as unavailable until feature detection says otherwise. Android Chrome has historically not exposed getDisplayMedia, and iOS Safari is the most constrained; some apps that need iOS screen capture skip WebRTC and use ReplayKit broadcast extensions instead.

For developer-tool products specifically, see our best screen recorder for developers overview, which covers when to build versus integrate versus use a desktop app.

Privacy and Permissions

Browser screen capture is permission-gated by design. The browser shows a picker, the user explicitly grants access, and the chosen surface is the only thing that gets captured. There is no path to "auto-capture without user consent", and there should not be: this is a deliberate security boundary.

Practical implications for product design:

The picker UI is browser-controlled. You cannot style it. You can hint at a surface type with displaySurface, but you cannot pre-select a specific screen, window, or tab.
Permission is per-capture. Every getDisplayMedia call shows the picker again, including after a reload.
The Screen Capture spec does not let browsers persist a getDisplayMedia grant the way camera and microphone grants can be remembered. Do not design around remembered permissions.
Show a clear in-app explanation before triggering the permission prompt. Users decline more often when prompted out of context.

For team-shared environments (kiosks, shared computers), persistent permissions would be a security hole; the browser's per-capture model already protects you there.

Production Considerations

A handful of issues that bite at scale.

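Bitrate adaptation. WebRTC negotiates bitrate based on network conditions. For recording, you typically want a fixed high bitrate, not adaptive. On each RTCRtpSender, read the current parameters with getParameters(), set maxBitrate on the encodings (for example 8_000_000), and pass the same object back to setParameters() to pin a ceiling. This caps the bitrate but does not guarantee it; congestion control can still send less. For pure local MediaRecorder capture (not over WebRTC), the videoBitsPerSecond option sets the target instead.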
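A sketch of applying that ceiling to a concrete sender is shown below. `pinMaxBitrate` is an illustrative helper; `sender` would come from `peerConnection.getSenders()`. The sketch leaves the sender untouched when no encodings exist yet, on the assumption that the caller retries after negotiation.

```javascript
// Cap the send bitrate on one RTCRtpSender. setParameters() expects the
// object returned by getParameters(), modified in place, not a new literal.
async function pinMaxBitrate(sender, maxBitrate) {
  const params = sender.getParameters();
  if (!params.encodings || params.encodings.length === 0) {
    return []; // nothing to configure yet; try again after negotiation
  }
  for (const encoding of params.encodings) {
    encoding.maxBitrate = maxBitrate;
  }
  await sender.setParameters(params);
  return params.encodings.map((e) => e.maxBitrate);
}

// Browser usage (pc is an existing RTCPeerConnection):
// const videoSender = pc.getSenders().find((s) => s.track && s.track.kind === 'video');
// await pinMaxBitrate(videoSender, 8_000_000);
```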

Encryption. WebRTC encrypts in transit by default (DTLS-SRTP). For recordings, the encryption is removed at the SFU before saving. If the recording itself needs to be encrypted at rest, encrypt the MP4 file after the SFU writes it. End-to-end encryption (E2EE) for recorded content requires custom keying with RTCRtpScriptTransform and is rarely worth the complexity for screen-share use cases.

Scaling. A single SFU machine handles roughly 200-500 concurrent participants depending on hardware and codec. For larger scale, run multiple SFUs and route participants by room or geography. Hosted services do this automatically.
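Routing by room can be as simple as hashing the room ID onto a list of SFU nodes, so every participant in a room lands on the same machine. The sketch below uses an FNV-1a-style 32-bit hash over the string's UTF-16 code units; the node names are placeholders, and `pickSfu` is a hypothetical helper rather than part of any SFU's API.

```javascript
// FNV-1a-style 32-bit hash of a string.
function hashString(s) {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Map a room to one node; the same room ID always maps to the same node
// as long as the node list is unchanged.
function pickSfu(roomId, nodes) {
  if (nodes.length === 0) throw new Error('no SFU nodes configured');
  return nodes[hashString(roomId) % nodes.length];
}

// Example with placeholder node names:
// pickSfu('room-42', ['sfu-a', 'sfu-b', 'sfu-c']);
```

Plain modulo hashing reshuffles most rooms when a node is added or removed, so a production router would more likely use consistent hashing or an explicit room-to-node table.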

Storage and retention. At the 3-5 Mbps bitrates discussed above, a 1-hour 1080p recording is roughly 1.3-2.3 GB before any server-side re-encode; lower bitrates or a more aggressive transcode bring that down. A team using the product daily generates terabytes of recordings per month. Plan retention policies, lifecycle to cheaper storage tiers (S3 Glacier, R2), and offer users a "delete after N days" toggle.

Transcoding. Most browsers produce WebM; your customers want MP4. Run an ffmpeg transcoding job after upload. For shorter recordings, the conversion is a few seconds. For long recordings, run async with a queue.

Reference Implementations

Open source and commercial projects to study or build on:

Cap — open source Loom alternative. Not WebRTC-based itself (uses native Mac and Windows recording), but the sharing layer is web-native and worth reading.
Plug — open source web-based screen recorder built on MediaRecorder. Small enough to read end-to-end in an afternoon.
LiveKit's example apps — the livekit/agents-js repo includes recording-bot examples.
Daily.co's prebuilt UI — embed a fully-recordable call in 10 lines of HTML; useful as a baseline before customizing.
Mediasoup demos — the mediasoup-demo repo shows recording integration with ffmpeg.

For commercial APIs:

Daily.co — friendly free tier, recording API at $0.0058 per participant-minute.
LiveKit Cloud — free tier with bandwidth limits, paid plans by usage.
Twilio Video — enterprise-leaning, pay-per-use.
Vonage Video API — similar to Twilio, also pay-per-use.

Troubleshooting

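Permission prompt does not appear. getDisplayMedia must be called from a user-gesture handler (button click). If you call it programmatically (e.g., from useEffect in React), browsers that enforce the user-gesture requirement reject the promise without showing the picker, which looks like a silent failure when the rejection is not caught. Wire it to an onClick and handle the rejection.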
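Catching the rejection and mapping the error name to a message makes these failures visible. The sketch below covers the DOMException names the Screen Capture spec lists for getDisplayMedia; the message strings and the `describeCaptureError` helper are illustrative.

```javascript
// Translate a getDisplayMedia rejection into a user-facing message.
function describeCaptureError(err) {
  switch (err && err.name) {
    case 'NotAllowedError':
      return 'Screen sharing was declined or is blocked by policy.';
    case 'InvalidStateError':
      return 'Screen capture must be started from a click or key press.';
    case 'NotFoundError':
      return 'No screen, window, or tab is available to share.';
    case 'NotReadableError':
      return 'The operating system blocked access to the screen.';
    default:
      return 'Screen capture failed to start.';
  }
}

// Browser usage (assumes a hypothetical #record button and showToast helper):
// document.querySelector('#record').addEventListener('click', async () => {
//   try {
//     await navigator.mediaDevices.getDisplayMedia({ video: true });
//   } catch (err) {
//     showToast(describeCaptureError(err));
//   }
// });
```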

Recording is silent on macOS. macOS does not allow getDisplayMedia to capture system audio. Combine with mic-only audio, or instruct users to use a tool like BlackHole as a virtual audio device. For native Mac recording with system audio, see our guide on recording internal audio on Mac.

Saved file will not play in QuickTime. WebM files do not play in QuickTime by default. Either convert to MP4 server-side, or instruct users to open in VLC or a browser. For productized apps, transcode to MP4.

Long recordings hit memory limits. Switch to streaming uploads (chunks emitted every 1-2 seconds, sent to server immediately). Do not let the in-memory Blob array grow unbounded.
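Streaming uploads also need ordering and retries, since a dropped chunk corrupts the file. A minimal sketch of an in-order upload queue follows; `send(chunk, index)` is a hypothetical function you supply (for example a wrapper around a fetch POST to your chunk endpoint), and the retry counts and delays are placeholder values.

```javascript
// Upload chunks strictly in order, retrying each one before moving on.
function createUploadQueue(send, { maxAttempts = 3, delayMs = 500 } = {}) {
  let tail = Promise.resolve();
  let nextIndex = 0;

  async function sendWithRetry(chunk, index) {
    for (let attempt = 1; ; attempt++) {
      try {
        return await send(chunk, index);
      } catch (err) {
        if (attempt >= maxAttempts) throw err;
        // Linear backoff between attempts.
        await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
      }
    }
  }

  return {
    // Returns a promise that settles once this chunk has been uploaded.
    // If a chunk exhausts its retries, later chunks are not sent, because
    // a gap would make the assembled file unplayable.
    enqueue(chunk) {
      const index = nextIndex++;
      tail = tail.then(() => sendWithRetry(chunk, index));
      return tail;
    },
  };
}

// Browser usage:
// const queue = createUploadQueue(uploadChunk);
// recorder.ondataavailable = (e) => {
//   if (e.data.size > 0) queue.enqueue(e.data).catch(reportUploadFailure);
// };
```

Here `uploadChunk` and `reportUploadFailure` are placeholders for your own upload and error-reporting functions. Because each chunk is handed to the queue and released after upload, memory use stays bounded by the number of chunks in flight.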

MediaRecorder stops randomly. Most often caused by the underlying MediaStream losing a track — the user closed the shared window, the screen-share session was revoked by the OS, or a network blip dropped the WebRTC peer. Listen for track.onended and recorder.onerror to handle gracefully.
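A sketch of that handling: attach listeners to every track and to the recorder, stop the recorder once, and report why. `watchRecording` and the `onInterrupted` callback are illustrative names; the function works with any objects that expose `addEventListener`, which MediaStreamTrack and MediaRecorder both do.

```javascript
// Stop the recorder cleanly when any track ends or the recorder errors.
function watchRecording(recorder, tracks, onInterrupted) {
  let handled = false;
  const interrupt = (reason) => {
    if (handled) return; // report only the first cause
    handled = true;
    if (recorder.state !== 'inactive') recorder.stop();
    onInterrupted(reason);
  };

  for (const track of tracks) {
    track.addEventListener('ended', () => interrupt(`track ended: ${track.kind}`));
  }
  recorder.addEventListener('error', (event) => {
    const name = event.error && event.error.name ? event.error.name : 'unknown';
    interrupt(`recorder error: ${name}`);
  });
}

// Browser usage:
// watchRecording(recorder, stream.getTracks(), (reason) => {
//   console.warn('Recording interrupted:', reason);
// });
```

Calling `recorder.stop()` still fires the recorder's stop event, so any upload or save logic attached there runs as usual.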

Audio out of sync with video. Symptom of mixing two MediaStreams with different clock domains. Combine tracks into a single stream before passing to MediaRecorder (do not use multiple recorders). For long recordings, drift can still appear; the fix is a server-side re-mux with ffmpeg's aresample=async=1 audio filter (the older -async 1 flag is deprecated).

FAQ

Can I record getDisplayMedia output without sending it over WebRTC?

Yes. MediaRecorder works directly on a local MediaStream. WebRTC peer connections are needed only when you want to stream to another peer or to a server in real time. Pure local capture-and-save uses MediaRecorder alone.

What is the maximum recording length?

There is no hard browser limit. Practical limits come from memory (if you accumulate Blobs) and from MediaRecorder codec stability over long runs. Chrome reliably records 4+ hour sessions; longer than that occasionally hits encoder issues. Stream chunks to disk or server to avoid memory pressure.

How does WebRTC compare to native screen recording?

WebRTC runs in browsers, no install required, works on any OS. Native recorders like QuickTime, OBS, and Screenify Studio get higher quality, lower CPU, system audio access on macOS, and more editing features. WebRTC is right for in-app recording features. Native is right for standalone tools. For an AI-feature comparison across both categories, see our AI screen recording tools 2026 roundup.

Can I capture a single browser tab without showing the picker?

No. The picker is browser-enforced. The closest workaround is displaySurface: 'browser', which asks the browser to lead with tabs in the picker, but the user still has to choose which tab. There is no programmatic selection of an arbitrary tab.

Is WebRTC the right choice for a Loom alternative?

Often yes. For an MVP, getDisplayMedia plus MediaRecorder plus a simple upload endpoint is enough to ship a v1. As the product grows, you may want native desktop apps (for system-audio capture and better quality), but the web version makes onboarding frictionless. Loom itself uses native desktop apps for its primary recording flow and falls back to WebRTC for browser-only sessions.

What about recording video calls in 1-on-1 conversations?

For a 1-on-1 call, you can record both peers client-side (each peer records their own outgoing stream) or server-side via an SFU acting as a recording bot. Client-side is simpler and free; server-side is more reliable and lets you produce a single composite file. Most production tools use server-side.

Are there any patent or licensing concerns?

H.264 is patent-licensed through the MPEG LA pool. VP9 and AV1 are published as royalty-free by Google and the Alliance for Open Media (AOM) respectively. Browsers and operating systems carry the codec licensing for in-browser encode and decode, so your app does not need its own license to use them in the browser. If you re-encode server-side with ffmpeg, check both the H.264 patent terms for your distribution model and the license terms of x264. AV1's royalty-free licensing under the AOM agreement is one reason it is gaining adoption in 2026.

How do ChatGPT Plus and other AI services fit into this?

Increasingly, recording products integrate AI summarization on top of the captured WebRTC stream. ChatGPT Plus at $20 per month is a common starting point for individual creators wiring up post-recording AI workflows; teams often graduate to API access once volume justifies it. The AI runs after capture, on the recorded file, so it does not change the recording pipeline directly.

WebRTC Screen Share Recording (Developer Guide)
