How to Build a Custom FLV Stream Player (Step-by-Step)

Flash Video (FLV) remains in use in legacy systems and some niche streaming workflows. This guide walks through building a custom FLV stream player from scratch: design choices, key components, decoding and playback, networking for streaming, and a simple example implementation. It assumes familiarity with programming (C/C++, JavaScript, or similar), basic multimedia concepts, and access to development tools.
Overview and key considerations
Before coding, decide on these fundamentals:
- Purpose: playback on desktop, web, embedded device, or mobile.
- Performance vs compatibility: hardware-accelerated decoding (faster) or software decoding (wider portability).
- Licensing and codecs: FLV commonly wraps Sorenson Spark (an H.263 variant), VP6, or H.264 video and MP3 or AAC audio — ensure you have rights and the proper decoders.
- Latency requirements: live streaming needs low-latency buffering and fast reconnect logic; VOD can afford larger buffers.
- Target platform tools: desktop apps can use FFmpeg/libav, mobile can use platform decoders (Android MediaCodec, iOS VideoToolbox), web can use WASM builds.
If you only need broad compatibility with minimal code, use an existing library (FFmpeg/libav, GStreamer, libVLC). Building from scratch is educational or necessary for tight customization/size constraints.
Architecture: components and data flow
A basic FLV stream player contains these components:
- Network input (file or stream, e.g., HTTP/RTMP)
- FLV demuxer (parses FLV container, extracts audio/video packets and metadata)
- Packet queueing and buffering (separate audio/video queues, jitter/latency control)
- Decoders (audio and video codecs)
- Synchronization and clock (A/V sync, PTS/DTS handling)
- Renderers (video output to screen, audio to sound device)
- Control UI and event handling (play/pause/seek/reconnect/errors)
Data flow: Network -> FLV demuxer -> packet queues -> decoders -> sync -> renderers.
FLV container basics
FLV structure in brief:
- Header: signature “FLV”, version, flags (audio/video), header size.
- Tag stream: sequence of tags; each tag has TagType (8=audio, 9=video, 18=script/data), DataSize, Timestamp, StreamID, then Data.
- Script tags typically carry metadata (duration, width, height, codecs).
- Video tags contain codec ID (e.g., Sorenson, VP6, AVC/H.264) and frame type (key/inter). For H.264 in FLV, video data uses AVC packet types with extra NALU size fields.
- Audio tags include codec ID (MP3/ADPCM/AAC) and payload; for AAC, the first packet carries the AudioSpecificConfig (sequence header) and subsequent packets carry raw AAC frames without ADTS headers.
Understanding timestamps (32-bit with extended timestamp handling) and tag boundaries is critical for sync and seeking.
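The header layout and timestamp handling described above can be sketched in a few lines. This is an illustrative sketch (field names are my own); the byte offsets follow the FLV layout described in this section.

```javascript
// Parse the 9-byte FLV file header and reconstruct a tag's 32-bit timestamp.
function parseFlvHeader(bytes) {
  if (bytes.length < 9) throw new Error('Need at least 9 bytes');
  if (bytes[0] !== 0x46 || bytes[1] !== 0x4c || bytes[2] !== 0x56) {
    throw new Error('Missing "FLV" signature');
  }
  const flags = bytes[4];
  return {
    version: bytes[3],
    hasAudio: (flags & 0x04) !== 0,
    hasVideo: (flags & 0x01) !== 0,
    // DataOffset: big-endian uint32, normally 9
    headerSize: (bytes[5] << 24) | (bytes[6] << 16) | (bytes[7] << 8) | bytes[8],
  };
}

// The tag header stores the low 24 bits of the timestamp first and the
// high 8 bits in TimestampExtended; combine them into one millisecond value.
function fullTimestamp(ts24, tsExtended) {
  return ((tsExtended << 24) | ts24) >>> 0;
}
```

Note that the extended byte is the *most* significant byte even though it appears after the 24-bit field in the stream — a common source of wraparound bugs past timestamp 0xFFFFFF (about 4.6 hours).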
Demuxing FLV: parsing tags
Key steps for a demuxer:
- Read and validate FLV header (first 9 bytes, then PreviousTagSize0).
- Loop: read PreviousTagSize (4 bytes), then TagHeader (TagType 1 byte, DataSize 3 bytes, Timestamp 3 bytes + TimestampExtended 1 byte, StreamID 3 bytes), then read DataSize bytes as payload.
- Dispatch payload by TagType:
- Script/Data (18): parse AMF0/AMF3 to extract metadata (e.g., duration, width, height, codec info).
- Audio (8): parse first byte(s) for codec, sample rate, sample size, channel; then extract AAC/MP3 frames.
- Video (9): parse first byte for FrameType & CodecID; for AVC/H.264, read AVCPacketType and composition time then NALU lengths + NALUs.
Useful tips:
- Implement a robust byte buffer with incremental parsing to support streaming input.
- Handle partial reads and resume parsing when more data arrives.
- Validate timestamps and detect discontinuities for live streams.
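For the Script/Data branch above, a minimal AMF0 reader is enough for a typical onMetaData payload (numbers, booleans, strings, objects, and ECMA arrays). This is a sketch, not a complete AMF0/AMF3 implementation — real script tags can contain strict arrays, dates, and long strings, which this ignores.

```javascript
// Minimal AMF0 reader, sufficient for common onMetaData script tags.
class Amf0Reader {
  constructor(bytes) {
    this.view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    this.pos = 0;
  }
  readValue() {
    const type = this.view.getUint8(this.pos++);
    switch (type) {
      case 0: { const n = this.view.getFloat64(this.pos); this.pos += 8; return n; } // number
      case 1: return this.view.getUint8(this.pos++) !== 0;                           // boolean
      case 2: return this.readShortString();                                         // string
      case 3: return this.readObject();                                              // object
      case 8: this.pos += 4; return this.readObject(); // ECMA array: skip the count
      default: throw new Error('Unsupported AMF0 type ' + type);
    }
  }
  readShortString() {
    const len = this.view.getUint16(this.pos); this.pos += 2;
    let s = '';
    for (let i = 0; i < len; i++) s += String.fromCharCode(this.view.getUint8(this.pos + i));
    this.pos += len;
    return s;
  }
  readObject() {
    const obj = {};
    for (;;) {
      const key = this.readShortString();
      // Object end: empty key followed by marker byte 0x09
      if (key === '' && this.view.getUint8(this.pos) === 9) { this.pos++; break; }
      obj[key] = this.readValue();
    }
    return obj;
  }
}

// A script tag payload is typically the string "onMetaData"
// followed by an ECMA array of properties.
function parseScriptTag(payload) {
  const r = new Amf0Reader(payload);
  return { name: r.readValue(), value: r.readValue() };
}
```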
Choosing decoders
Options:
- FFmpeg/libav: supports most FLV codecs; easiest route — use avcodec for decoding and avformat for demuxing if you accept a full-featured dependency.
- GStreamer: modular, good for pipelines and platforms.
- Platform decoders: Android MediaCodec, iOS VideoToolbox for hardware acceleration.
- WASM ports: compile FFmpeg to WebAssembly for browser playback.
- Implementing codecs yourself is complex; avoid unless you need a tiny footprint and only one simple codec (e.g., MP3).
For a custom player, you might implement your own FLV demuxer and hand decoded packets to FFmpeg decoders or platform decoders.
Buffering, jitter, and synchronization
- Maintain separate queues for audio and video packets.
- Use the audio clock as master (most common), because listeners notice audio glitches far more readily than a dropped or repeated video frame. For muted or video-only streams, the video clock or system clock can be master.
- Convert timestamps to a unified clock (seconds or milliseconds). Use PTS (presentation timestamp) for rendering time.
- Buffer strategy:
- VOD: buffer enough to prevent stalls (e.g., 1–3 seconds).
- Live: keep a small buffer (100–500 ms) to reduce latency.
- Handle network jitter by dropping or duplicating frames if necessary. For H.264, drop non-keyframes when seeking or recovering.
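The sync policy above (audio as master, dropping hopelessly late frames) reduces to a small decision function. The threshold values here are illustrative, not normative; players tune them empirically.

```javascript
// Decide what to do with the next decoded video frame relative to the
// audio clock (the master). All times are in milliseconds.
const DROP_THRESHOLD_MS = -80; // frame is hopelessly late: drop it
const MAX_WAIT_MS = 500;       // clamp long waits (e.g. right after a seek)

function scheduleFrame(framePtsMs, audioClockMs) {
  const delay = framePtsMs - audioClockMs;
  if (delay < DROP_THRESHOLD_MS) return { action: 'drop', waitMs: 0 };
  if (delay <= 0) return { action: 'render', waitMs: 0 }; // slightly late: show now
  return { action: 'render', waitMs: Math.min(delay, MAX_WAIT_MS) }; // early: wait
}
```

The render loop calls this once per decoded frame, sleeps for `waitMs`, then presents or discards the frame.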
Implementing renderers
Video renderer:
- For desktop/mobile, upload decoded frames (YUV or RGB) to GPU textures and draw with shaders. Use double buffering to avoid tearing.
- For browser (WASM), use WebGL or WebCodecs if available.
- Convert color spaces (e.g., YUV420P -> RGB) using shaders for speed.
Audio renderer:
- Feed decoded PCM to audio output APIs: ALSA/PulseAudio, CoreAudio, Web Audio (browser/WASM), Android AudioTrack.
- Use ring buffers and audio callbacks to keep steady playback.
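A ring buffer between decoder and audio callback might look like the sketch below: the decoder thread writes samples, the audio callback reads exactly what the device needs and pads with silence on underrun. (A production single-producer/single-consumer queue would also need atomic or lock-protected indices.)

```javascript
// PCM ring buffer for the audio path.
class PcmRingBuffer {
  constructor(capacity) {
    this.data = new Float32Array(capacity);
    this.readPos = 0;
    this.writePos = 0;
    this.size = 0;
  }
  // Decoder side: returns how many samples were actually accepted.
  write(samples) {
    const n = Math.min(samples.length, this.data.length - this.size);
    for (let i = 0; i < n; i++) {
      this.data[this.writePos] = samples[i];
      this.writePos = (this.writePos + 1) % this.data.length;
    }
    this.size += n;
    return n;
  }
  // Audio callback side: fills `out`, padding with silence on underrun.
  read(out) {
    const n = Math.min(out.length, this.size);
    for (let i = 0; i < n; i++) {
      out[i] = this.data[this.readPos];
      this.readPos = (this.readPos + 1) % this.data.length;
    }
    this.size -= n;
    out.fill(0, n); // underrun: remaining samples are silence
    return n;
  }
}
```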
Networking: streaming protocols
Sources for FLV:
- HTTP progressive download (file or chunked responses).
- HTTP-FLV: live FLV tags delivered over a long-lived chunked HTTP response (distinct from HLS, which uses segmented MPEG-TS/fMP4 rather than FLV).
- RTMP (real-time messaging protocol) often carries FLV payloads — requires RTMP client implementation or library.
- WebSockets or custom TCP/UDP transports carry FLV tagged streams.
For HTTP:
- Use range requests for seeking (if server supports).
- Handle Content-Length unknown (chunked) for live.
For RTMP:
- Implement RTMP handshake, chunking, and message parsing OR use librtmp/rtmpdump libraries.
For unreliable networks:
- Implement reconnect with exponential backoff.
- Resume from last processed timestamp if server supports seek/resume.
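A pull loop combining the HTTP and reconnect points above could be sketched as follows. Assumptions: a browser or Node 18+ environment with global `fetch`, and a `parser` object exposing a `push(chunk)` method like the incremental FLV parser shown later; the retry limits are illustrative.

```javascript
// Exponential backoff: 500 ms, 1 s, 2 s, ... capped at 30 s.
function backoffMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// HTTP-FLV style pull loop: stream the response body into the parser,
// reconnecting with backoff on network or HTTP errors.
async function streamFlv(url, parser, maxAttempts = 8) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const resp = await fetch(url);
      if (!resp.ok) throw new Error('HTTP ' + resp.status);
      const reader = resp.body.getReader();
      for (;;) {
        const { done, value } = await reader.read();
        if (done) return;      // clean end of stream
        parser.push(value);    // feed the incremental FLV parser
      }
    } catch (err) {
      const wait = backoffMs(attempt);
      console.warn('stream error, retrying in', wait, 'ms:', err.message);
      await new Promise((resolve) => setTimeout(resolve, wait));
    }
  }
  throw new Error('giving up after ' + maxAttempts + ' attempts');
}
```

To resume rather than restart after a reconnect, track the last parsed timestamp and pass it to the server as a seek parameter if one is supported.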
Example: minimal player design (high-level)
We’ll outline a minimal native player using a custom FLV demuxer + FFmpeg decoders + SDL2 for audio/video output (C pseudo-steps):
- Open network/file and create an incremental read buffer.
- Start demuxer thread:
- Parse FLV tags, push audio/video packets onto respective thread-safe queues with their timestamps.
- Parse metadata and send to main thread.
- Start decoder threads:
- Audio decoder: pop audio packets, decode using avcodec_send_packet/receive_frame, enqueue decoded PCM frames to audio renderer.
- Video decoder: pop video packets, decode frames, enqueue decoded frames to video renderer.
- Start renderer:
- Audio: SDL audio callback pulls PCM from ring buffer.
- Video: main loop pops frames, calculates sleep based on audio clock and frame PTS, renders via SDL texture.
- Control UI: handles play/pause/seek by signaling threads and flushing queues/decoders.
This architecture separates concerns and improves responsiveness.
Code example: demuxing FLV tags (JavaScript, simplified)
Note: This is illustrative; production code needs error handling, partial reads, and codec handling.
```javascript
// Simple FLV tag parser for streamed ArrayBuffer chunks
class FlvParser {
  constructor() {
    this.buffer = new Uint8Array(0);
    this.headerRead = false;
    this.onTag = null; // callback(tagType, timestamp, data)
  }

  push(chunk) {
    // append new data
    const newBuf = new Uint8Array(this.buffer.length + chunk.byteLength);
    newBuf.set(this.buffer);
    newBuf.set(new Uint8Array(chunk), this.buffer.length);
    this.buffer = newBuf;
    this._parse();
  }

  _readUint24(off) {
    return (this.buffer[off] << 16) | (this.buffer[off + 1] << 8) | this.buffer[off + 2];
  }

  _parse() {
    let i = 0;
    // need at least the FLV header on first parse
    if (!this.headerRead) {
      if (this.buffer.length < 9) return;
      if (String.fromCharCode(...this.buffer.slice(0, 3)) !== 'FLV') {
        throw new Error('Not FLV');
      }
      this.headerRead = true;
      i = 9; // skip header
    }
    while (true) {
      if (this.buffer.length < i + 4) break; // need PrevTagSize
      // prevTagSize = readUint32BE(this.buffer, i) — not needed for playback
      i += 4;
      if (this.buffer.length < i + 11) { i -= 4; break; } // need full tag header
      const tagType = this.buffer[i];
      const dataSize = this._readUint24(i + 1);
      // 24-bit timestamp plus the extended (high) byte
      const timestamp = this._readUint24(i + 4) | (this.buffer[i + 7] << 24);
      // streamID = readUint24(i + 8), always 0
      i += 11;
      // payload incomplete: rewind past tag header AND PrevTagSize, wait for more data
      if (this.buffer.length < i + dataSize) { i -= 15; break; }
      const data = this.buffer.slice(i, i + dataSize);
      if (this.onTag) this.onTag(tagType, timestamp, data);
      i += dataSize;
    }
    // keep remaining bytes
    this.buffer = this.buffer.slice(i);
  }
}
```
Handling H.264 inside FLV
H.264 is common in modern FLV. Key points:
- FLV video payload for AVC/H.264 includes:
- 1 byte: FrameType(4 bits) | CodecID(4 bits) where CodecID==7 indicates AVC.
- 1 byte: AVCPacketType (0=config, 1=NALU, 2=end)
- 3 bytes: CompositionTime (signed)
- For NALU packets: sequence of [NALU length][NALU bytes]; the length field is usually 4 bytes (set by lengthSizeMinusOne in the configuration record).
- On receiving AVC sequence header (AVCPacketType==0), parse the AVCDecoderConfigurationRecord to extract SPS/PPS (needed to configure H.264 decoder).
- Feed raw NALUs to decoder; if decoder expects Annex B format (start codes), you may need to convert length-prefixed NALUs to start-code prefixed NALUs by inserting 0x00000001 before each NALU.
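The two operations above — extracting SPS/PPS from the sequence header and converting length-prefixed NALUs to Annex B — can be sketched like this. The layouts follow the AVCDecoderConfigurationRecord structure described in this section; a 4-byte NALU length is assumed in the converter for brevity.

```javascript
// Parse an AVCDecoderConfigurationRecord (AVCPacketType == 0 payload)
// to extract SPS/PPS needed to configure the H.264 decoder.
function parseAvcConfig(record) {
  let pos = 5; // skip configurationVersion, profile, compat, level
  const naluLengthSize = (record[4] & 0x03) + 1; // lengthSizeMinusOne + 1
  const sps = [], pps = [];
  const numSps = record[pos++] & 0x1f; // low 5 bits
  for (let i = 0; i < numSps; i++) {
    const len = (record[pos] << 8) | record[pos + 1]; pos += 2;
    sps.push(record.slice(pos, pos + len)); pos += len;
  }
  const numPps = record[pos++];
  for (let i = 0; i < numPps; i++) {
    const len = (record[pos] << 8) | record[pos + 1]; pos += 2;
    pps.push(record.slice(pos, pos + len)); pos += len;
  }
  return { naluLengthSize, sps, pps };
}

// Rewrite [4-byte length][NALU]... into Annex B [00 00 00 01][NALU]...
// for decoders that expect start codes.
function lengthPrefixedToAnnexB(data) {
  const out = [];
  let pos = 0;
  while (pos + 4 <= data.length) {
    const len = (data[pos] << 24) | (data[pos + 1] << 16) | (data[pos + 2] << 8) | data[pos + 3];
    pos += 4;
    out.push(0, 0, 0, 1, ...data.slice(pos, pos + len));
    pos += len;
  }
  return new Uint8Array(out);
}
```

Hardware decoders and WebCodecs can typically consume the length-prefixed ("avcC") form directly if handed the configuration record, in which case no Annex B conversion is needed.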
Seeking and random access
- FLV container itself supports seeking if you have an index or server supports byte-range requests.
- Script metadata sometimes contains “keyframes” table with timestamps and filepositions — parse it to implement accurate seeking.
- For live streams, seeking may be unsupported — implement rewind/seek UI accordingly.
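Given the keyframes table from onMetaData, seeking reduces to finding the last keyframe at or before the target time, then issuing an HTTP range request for its file position. The parallel `times`/`filepositions` arrays assumed here match the common onMetaData keyframes layout described above.

```javascript
// Binary-search the keyframes table for the last keyframe <= target,
// so decoding can start from a random-access point.
function findSeekPoint(keyframes, targetSeconds) {
  const { times, filepositions } = keyframes;
  let lo = 0, hi = times.length - 1, best = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (times[mid] <= targetSeconds) { best = mid; lo = mid + 1; }
    else hi = mid - 1;
  }
  return { time: times[best], position: filepositions[best] };
}
```

After the range request, flush decoders and drop frames until the decoded timestamps reach the requested target, since the nearest keyframe is usually earlier than the exact seek time.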
Error handling and robustness
- Handle partial tags and resume on next data chunk.
- Validate timestamps to detect backwards jumps or corrupt data.
- When decoder errors occur, flush decoder and resync on the next keyframe.
- For live network glitches, attempt reconnect and resume from last timestamp if supported.
Performance tips
- Use hardware decoders where possible.
- Perform color conversion on GPU via shaders.
- Avoid copying frames: use zero-copy APIs (e.g., media codec direct rendering to texture).
- Tune thread priorities: decoding and audio callback threads are higher priority.
- Preallocate buffers to avoid frequent GC/allocations (important in JS/WASM).
Testing and tooling
- Test with a variety of FLV files: H.264+AAC, VP6+MP3, legacy Sorenson.
- Use FFprobe/FFmpeg to inspect FLV files: codecs, timestamps, keyframe positions.
- Use network simulation tools (tc/netem, Browser devtools) to test jitter, packet loss, and latency.
- Use logs and verbose decoder output for diagnosing issues.
Security considerations
- Validate incoming data lengths and guard against oversized allocations to prevent DoS.
- Be careful when handling AMF data (script tags) — avoid executing untrusted code.
- Sanitize metadata and user-facing strings before rendering.
Summary checklist (practical steps)
- Choose whether to use libraries (FFmpeg/GStreamer) or custom demuxer + decoders.
- Implement or reuse a robust FLV demuxer.
- Extract and parse metadata, SPS/PPS for H.264.
- Decode audio/video with suitable decoders (hardware/software).
- Implement audio/video synchronization and buffering policies.
- Render video on GPU and audio to the sound device.
- Implement network resilience (reconnect, buffering, seek support).
- Test across codecs, players, and network conditions.
Building a custom FLV stream player is a multi-disciplinary task touching networking, systems programming, multimedia codecs, real-time synchronization, and UI. Start small: get a demuxer to print tags and timestamps, then wire in decoders and renderers incrementally.