How to Build a Custom FLV Stream Player (Step-by-Step)

Flash Video (FLV) remains in use in legacy systems and some niche streaming workflows. This guide walks through building a custom FLV stream player from scratch: design choices, key components, decoding and playback, networking for streaming, and a simple example implementation. It assumes familiarity with programming (C/C++, JavaScript, or similar), basic multimedia concepts, and access to development tools.
Overview and key considerations
Before coding, decide on these fundamentals:
- Purpose: playback on desktop, web, embedded device, or mobile.
- Performance vs compatibility: hardware-accelerated decoding (faster) or software decoding (wider portability).
- Licensing and codecs: FLV commonly wraps Sorenson Spark (an H.263 variant), VP6, or H.264 video and MP3 or AAC audio — ensure you have rights and the proper decoders.
- Latency requirements: live streaming needs low-latency buffering and fast reconnect logic; VOD can afford larger buffers.
- Target platform tools: desktop apps can use FFmpeg/libav, mobile can use platform decoders (Android MediaCodec, iOS VideoToolbox), web can use WASM builds.
If you only need broad compatibility with minimal code, use an existing library (FFmpeg/libav, GStreamer, libVLC). Building from scratch is educational or necessary for tight customization/size constraints.
Architecture: components and data flow
A basic FLV stream player contains these components:
- Network input (file or stream, e.g., HTTP/RTMP)
- FLV demuxer (parses FLV container, extracts audio/video packets and metadata)
- Packet queueing and buffering (separate audio/video queues, jitter/latency control)
- Decoders (audio and video codecs)
- Synchronization and clock (A/V sync, PTS/DTS handling)
- Renderers (video output to screen, audio to sound device)
- Control UI and event handling (play/pause/seek/reconnect/errors)
Data flow: Network -> FLV demuxer -> packet queues -> decoders -> sync -> renderers.
FLV container basics
FLV structure in brief:
- Header: signature “FLV”, version, flags (audio/video), header size.
- Tag stream: sequence of tags; each tag has TagType (8=audio, 9=video, 18=script/data), DataSize, Timestamp, StreamID, then Data.
- Script tags typically carry metadata (duration, width, height, codecs).
- Video tags contain codec ID (e.g., Sorenson, VP6, AVC/H.264) and frame type (key/inter). For H.264 in FLV, video data uses AVC packet types with extra NALU size fields.
- Audio tags include codec ID (MP3/ADPCM/AAC) and payload; for AAC, the first packet carries the AudioSpecificConfig (sequence header) and subsequent packets carry raw AAC frames without ADTS headers.
Understanding timestamps (32-bit with extended timestamp handling) and tag boundaries is critical for sync and seeking.
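The header layout and timestamp handling described above can be sketched in a few lines. This is an illustrative sketch (field names are my own); the byte offsets follow the FLV layout described in this section.

```javascript
// Parse the 9-byte FLV file header and reconstruct a tag's 32-bit timestamp.
function parseFlvHeader(bytes) {
  if (bytes.length < 9) throw new Error('Need at least 9 bytes');
  if (bytes[0] !== 0x46 || bytes[1] !== 0x4c || bytes[2] !== 0x56) {
    throw new Error('Missing "FLV" signature');
  }
  const flags = bytes[4];
  return {
    version: bytes[3],
    hasAudio: (flags & 0x04) !== 0,
    hasVideo: (flags & 0x01) !== 0,
    // DataOffset: big-endian uint32, normally 9
    headerSize: (bytes[5] << 24) | (bytes[6] << 16) | (bytes[7] << 8) | bytes[8],
  };
}

// The tag header stores the low 24 bits of the timestamp first and the
// high 8 bits in TimestampExtended; combine them into one millisecond value.
function fullTimestamp(ts24, tsExtended) {
  return ((tsExtended << 24) | ts24) >>> 0;
}
```

Note that the extended byte is the *most* significant byte even though it appears after the 24-bit field in the stream — a common source of wraparound bugs past timestamp 0xFFFFFF (about 4.6 hours).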
Demuxing FLV: parsing tags
Key steps for a demuxer:
- Read and validate FLV header (first 9 bytes, then PreviousTagSize0).
- Loop: read PreviousTagSize (4 bytes), then TagHeader (TagType 1 byte, DataSize 3 bytes, Timestamp 3 bytes + TimestampExtended 1 byte, StreamID 3 bytes), then read DataSize bytes as payload.
- Dispatch payload by TagType:
- Script/Data (18): parse AMF0/AMF3 to extract metadata (e.g., duration, width, height, codec info).
- Audio (8): parse first byte(s) for codec, sample rate, sample size, channel; then extract AAC/MP3 frames.
- Video (9): parse first byte for FrameType & CodecID; for AVC/H.264, read AVCPacketType and composition time then NALU lengths + NALUs.
Useful tips:
- Implement a robust byte buffer with incremental parsing to support streaming input.
- Handle partial reads and resume parsing when more data arrives.
- Validate timestamps and detect discontinuities for live streams.
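For the Script/Data branch above, a minimal AMF0 reader is enough for a typical onMetaData payload (numbers, booleans, strings, objects, and ECMA arrays). This is a sketch, not a complete AMF0/AMF3 implementation — real script tags can contain strict arrays, dates, and long strings, which this ignores.

```javascript
// Minimal AMF0 reader, sufficient for common onMetaData script tags.
class Amf0Reader {
  constructor(bytes) {
    this.view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
    this.pos = 0;
  }
  readValue() {
    const type = this.view.getUint8(this.pos++);
    switch (type) {
      case 0: { const n = this.view.getFloat64(this.pos); this.pos += 8; return n; } // number
      case 1: return this.view.getUint8(this.pos++) !== 0;                           // boolean
      case 2: return this.readShortString();                                         // string
      case 3: return this.readObject();                                              // object
      case 8: this.pos += 4; return this.readObject(); // ECMA array: skip the count
      default: throw new Error('Unsupported AMF0 type ' + type);
    }
  }
  readShortString() {
    const len = this.view.getUint16(this.pos); this.pos += 2;
    let s = '';
    for (let i = 0; i < len; i++) s += String.fromCharCode(this.view.getUint8(this.pos + i));
    this.pos += len;
    return s;
  }
  readObject() {
    const obj = {};
    for (;;) {
      const key = this.readShortString();
      // Object end: empty key followed by marker byte 0x09
      if (key === '' && this.view.getUint8(this.pos) === 9) { this.pos++; break; }
      obj[key] = this.readValue();
    }
    return obj;
  }
}

// A script tag payload is typically the string "onMetaData"
// followed by an ECMA array of properties.
function parseScriptTag(payload) {
  const r = new Amf0Reader(payload);
  return { name: r.readValue(), value: r.readValue() };
}
```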
Choosing decoders
Options:
- FFmpeg/libav: supports most FLV codecs; easiest route — use avcodec for decoding and avformat for demuxing if you accept a full-featured dependency.
- GStreamer: modular, good for pipelines and platforms.
- Platform decoders: Android MediaCodec, iOS VideoToolbox for hardware acceleration.
- WASM ports: compile FFmpeg to WebAssembly for browser playback.
- Implementing codecs yourself is complex; avoid unless you need a tiny footprint and only one simple codec (e.g., MP3).
For a custom player, you might implement your own FLV demuxer and hand decoded packets to FFmpeg decoders or platform decoders.
Buffering, jitter, and synchronization
- Maintain separate queues for audio and video packets.
- Use the audio clock as master (most common), because listeners notice audio glitches far more readily than a dropped or repeated video frame. For muted or video-only streams, the video clock or system clock can be master.
- Convert timestamps to a unified clock (seconds or milliseconds). Use PTS (presentation timestamp) for rendering time.
- Buffer strategy:
- VOD: buffer enough to prevent stalls (e.g., 1–3 seconds).
- Live: keep a small buffer (100–500 ms) to reduce latency.
- Handle network jitter by dropping or duplicating frames if necessary. For H.264, drop non-keyframes when seeking or recovering.
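The sync policy above (audio as master, dropping hopelessly late frames) reduces to a small decision function. The threshold values here are illustrative, not normative; players tune them empirically.

```javascript
// Decide what to do with the next decoded video frame relative to the
// audio clock (the master). All times are in milliseconds.
const DROP_THRESHOLD_MS = -80; // frame is hopelessly late: drop it
const MAX_WAIT_MS = 500;       // clamp long waits (e.g. right after a seek)

function scheduleFrame(framePtsMs, audioClockMs) {
  const delay = framePtsMs - audioClockMs;
  if (delay < DROP_THRESHOLD_MS) return { action: 'drop', waitMs: 0 };
  if (delay <= 0) return { action: 'render', waitMs: 0 }; // slightly late: show now
  return { action: 'render', waitMs: Math.min(delay, MAX_WAIT_MS) }; // early: wait
}
```

The render loop calls this once per decoded frame, sleeps for `waitMs`, then presents or discards the frame.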
Implementing renderers
Video renderer:
- For desktop/mobile, upload decoded frames (YUV or RGB) to GPU textures and draw with shaders. Use double buffering to avoid tearing.
- For browser (WASM), use WebGL or WebCodecs if available.
- Convert color spaces (e.g., YUV420P -> RGB) using shaders for speed.
Audio renderer:
- Feed decoded PCM to audio output APIs: ALSA/PulseAudio, CoreAudio, Web Audio (browser/WASM), Android AudioTrack.
- Use ring buffers and audio callbacks to keep steady playback.
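A ring buffer between decoder and audio callback might look like the sketch below: the decoder thread writes samples, the audio callback reads exactly what the device needs and pads with silence on underrun. (A production single-producer/single-consumer queue would also need atomic or lock-protected indices.)

```javascript
// PCM ring buffer for the audio path.
class PcmRingBuffer {
  constructor(capacity) {
    this.data = new Float32Array(capacity);
    this.readPos = 0;
    this.writePos = 0;
    this.size = 0;
  }
  // Decoder side: returns how many samples were actually accepted.
  write(samples) {
    const n = Math.min(samples.length, this.data.length - this.size);
    for (let i = 0; i < n; i++) {
      this.data[this.writePos] = samples[i];
      this.writePos = (this.writePos + 1) % this.data.length;
    }
    this.size += n;
    return n;
  }
  // Audio callback side: fills `out`, padding with silence on underrun.
  read(out) {
    const n = Math.min(out.length, this.size);
    for (let i = 0; i < n; i++) {
      out[i] = this.data[this.readPos];
      this.readPos = (this.readPos + 1) % this.data.length;
    }
    this.size -= n;
    out.fill(0, n); // underrun: remaining samples are silence
    return n;
  }
}
```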
Networking: streaming protocols
Sources for FLV:
- HTTP progressive download (file or chunked responses).
- HTTP-FLV: live FLV tags delivered over a long-lived chunked HTTP response (distinct from HLS, which uses segmented MPEG-TS/fMP4 rather than FLV).
- RTMP (real-time messaging protocol) often carries FLV payloads — requires RTMP client implementation or library.
- WebSockets or custom TCP/UDP transports carry FLV tagged streams.
For HTTP:
- Use range requests for seeking (if server supports).
- Handle Content-Length unknown (chunked) for live.
For RTMP:
- Implement RTMP handshake, chunking, and message parsing OR use librtmp/rtmpdump libraries.
For unreliable networks:
- Implement reconnect with exponential backoff.
- Resume from last processed timestamp if server supports seek/resume.
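A pull loop combining the HTTP and reconnect points above could be sketched as follows. Assumptions: a browser or Node 18+ environment with global `fetch`, and a `parser` object exposing a `push(chunk)` method like the incremental FLV parser shown later; the retry limits are illustrative.

```javascript
// Exponential backoff: 500 ms, 1 s, 2 s, ... capped at 30 s.
function backoffMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// HTTP-FLV style pull loop: stream the response body into the parser,
// reconnecting with backoff on network or HTTP errors.
async function streamFlv(url, parser, maxAttempts = 8) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const resp = await fetch(url);
      if (!resp.ok) throw new Error('HTTP ' + resp.status);
      const reader = resp.body.getReader();
      for (;;) {
        const { done, value } = await reader.read();
        if (done) return;      // clean end of stream
        parser.push(value);    // feed the incremental FLV parser
      }
    } catch (err) {
      const wait = backoffMs(attempt);
      console.warn('stream error, retrying in', wait, 'ms:', err.message);
      await new Promise((resolve) => setTimeout(resolve, wait));
    }
  }
  throw new Error('giving up after ' + maxAttempts + ' attempts');
}
```

To resume rather than restart after a reconnect, track the last parsed timestamp and pass it to the server as a seek parameter if one is supported.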
Example: minimal player design (high-level)
We’ll outline a minimal native player using a custom FLV demuxer + FFmpeg decoders + SDL2 for audio/video output (C pseudo-steps):
- Open network/file and create an incremental read buffer.
- Start demuxer thread:
- Parse FLV tags, push audio/video packets onto respective thread-safe queues with their timestamps.
- Parse metadata and send to main thread.
- Start decoder threads:
- Audio decoder: pop audio packets, decode using avcodec_send_packet/receive_frame, enqueue decoded PCM frames to audio renderer.
- Video decoder: pop video packets, decode frames, enqueue decoded frames to video renderer.
- Start renderer:
- Audio: SDL audio callback pulls PCM from ring buffer.
- Video: main loop pops frames, calculates sleep based on audio clock and frame PTS, renders via SDL texture.
- Control UI: handles play/pause/seek by signaling threads and flushing queues/decoders.
This architecture separates concerns and improves responsiveness.
Code example: demuxing FLV tags (JavaScript, simplified)
Note: This is illustrative; production code needs error handling, partial reads, and codec handling.
```javascript
// Simple FLV tag parser for streamed ArrayBuffer chunks
class FlvParser {
  constructor() {
    this.buffer = new Uint8Array(0);
    this.headerRead = false;
    this.onTag = null; // callback(tagType, timestamp, data)
  }

  push(chunk) {
    // append new data
    const newBuf = new Uint8Array(this.buffer.length + chunk.byteLength);
    newBuf.set(this.buffer);
    newBuf.set(new Uint8Array(chunk), this.buffer.length);
    this.buffer = newBuf;
    this._parse();
  }

  _readUint24(off) {
    return (this.buffer[off] << 16) | (this.buffer[off + 1] << 8) | this.buffer[off + 2];
  }

  _parse() {
    let i = 0;
    // need at least the FLV header on first parse
    if (!this.headerRead) {
      if (this.buffer.length < 9) return;
      if (String.fromCharCode(...this.buffer.slice(0, 3)) !== 'FLV') {
        throw new Error('Not FLV');
      }
      this.headerRead = true;
      i = 9; // skip header
    }
    while (true) {
      if (this.buffer.length < i + 4) break; // need PrevTagSize
      // prevTagSize = readUint32BE(this.buffer, i) — not needed for playback
      i += 4;
      if (this.buffer.length < i + 11) { i -= 4; break; } // need full tag header
      const tagType = this.buffer[i];
      const dataSize = this._readUint24(i + 1);
      // 24-bit timestamp plus the extended (high) byte
      const timestamp = this._readUint24(i + 4) | (this.buffer[i + 7] << 24);
      // streamID = readUint24(i + 8), always 0
      i += 11;
      // payload incomplete: rewind past tag header AND PrevTagSize, wait for more data
      if (this.buffer.length < i + dataSize) { i -= 15; break; }
      const data = this.buffer.slice(i, i + dataSize);
      if (this.onTag) this.onTag(tagType, timestamp, data);
      i += dataSize;
    }
    // keep remaining bytes
    this.buffer = this.buffer.slice(i);
  }
}
```
Handling H.264 inside FLV
H.264 is common in modern FLV. Key points:
- FLV video payload for AVC/H.264 includes:
- 1 byte: FrameType(4 bits) | CodecID(4 bits) where CodecID==7 indicates AVC.
- 1 byte: AVCPacketType (0=config, 1=NALU, 2=end)
- 3 bytes: CompositionTime (signed)
- For NALU packets: sequence of [NALU length][NALU bytes]; the length field is usually 4 bytes (set by lengthSizeMinusOne in the configuration record).
- On receiving AVC sequence header (AVCPacketType==0), parse the AVCDecoderConfigurationRecord to extract SPS/PPS (needed to configure H.264 decoder).
- Feed raw NALUs to decoder; if decoder expects Annex B format (start codes), you may need to convert length-prefixed NALUs to start-code prefixed NALUs by inserting 0x00000001 before each NALU.
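The two operations above — extracting SPS/PPS from the sequence header and converting length-prefixed NALUs to Annex B — can be sketched like this. The layouts follow the AVCDecoderConfigurationRecord structure described in this section; a 4-byte NALU length is assumed in the converter for brevity.

```javascript
// Parse an AVCDecoderConfigurationRecord (AVCPacketType == 0 payload)
// to extract SPS/PPS needed to configure the H.264 decoder.
function parseAvcConfig(record) {
  let pos = 5; // skip configurationVersion, profile, compat, level
  const naluLengthSize = (record[4] & 0x03) + 1; // lengthSizeMinusOne + 1
  const sps = [], pps = [];
  const numSps = record[pos++] & 0x1f; // low 5 bits
  for (let i = 0; i < numSps; i++) {
    const len = (record[pos] << 8) | record[pos + 1]; pos += 2;
    sps.push(record.slice(pos, pos + len)); pos += len;
  }
  const numPps = record[pos++];
  for (let i = 0; i < numPps; i++) {
    const len = (record[pos] << 8) | record[pos + 1]; pos += 2;
    pps.push(record.slice(pos, pos + len)); pos += len;
  }
  return { naluLengthSize, sps, pps };
}

// Rewrite [4-byte length][NALU]... into Annex B [00 00 00 01][NALU]...
// for decoders that expect start codes.
function lengthPrefixedToAnnexB(data) {
  const out = [];
  let pos = 0;
  while (pos + 4 <= data.length) {
    const len = (data[pos] << 24) | (data[pos + 1] << 16) | (data[pos + 2] << 8) | data[pos + 3];
    pos += 4;
    out.push(0, 0, 0, 1, ...data.slice(pos, pos + len));
    pos += len;
  }
  return new Uint8Array(out);
}
```

Hardware decoders and WebCodecs can typically consume the length-prefixed ("avcC") form directly if handed the configuration record, in which case no Annex B conversion is needed.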
Seeking and random access
- FLV container itself supports seeking if you have an index or server supports byte-range requests.
- Script metadata sometimes contains “keyframes” table with timestamps and filepositions — parse it to implement accurate seeking.
- For live streams, seeking may be unsupported — implement rewind/seek UI accordingly.
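Given the keyframes table from onMetaData, seeking reduces to finding the last keyframe at or before the target time, then issuing an HTTP range request for its file position. The parallel `times`/`filepositions` arrays assumed here match the common onMetaData keyframes layout described above.

```javascript
// Binary-search the keyframes table for the last keyframe <= target,
// so decoding can start from a random-access point.
function findSeekPoint(keyframes, targetSeconds) {
  const { times, filepositions } = keyframes;
  let lo = 0, hi = times.length - 1, best = 0;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (times[mid] <= targetSeconds) { best = mid; lo = mid + 1; }
    else hi = mid - 1;
  }
  return { time: times[best], position: filepositions[best] };
}
```

After the range request, flush decoders and drop frames until the decoded timestamps reach the requested target, since the nearest keyframe is usually earlier than the exact seek time.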
Error handling and robustness
- Handle partial tags and resume on next data chunk.
- Validate timestamps to detect backwards jumps or corrupt data.
- When decoder errors occur, flush decoder and resync on the next keyframe.
- For live network glitches, attempt reconnect and resume from last timestamp if supported.
Performance tips
- Use hardware decoders where possible.
- Perform color conversion on GPU via shaders.
- Avoid copying frames: use zero-copy APIs (e.g., media codec direct rendering to texture).
- Tune thread priorities: decoding and audio callback threads are higher priority.
- Preallocate buffers to avoid frequent GC/allocations (important in JS/WASM).
Testing and tooling
- Test with a variety of FLV files: H.264+AAC, VP6+MP3, legacy Sorenson.
- Use FFprobe/FFmpeg to inspect FLV files: codecs, timestamps, keyframe positions.
- Use network simulation tools (tc/netem, Browser devtools) to test jitter, packet loss, and latency.
- Use logs and verbose decoder output for diagnosing issues.
Security considerations
- Validate incoming data lengths and guard against oversized allocations to prevent DoS.
- Be careful when handling AMF data (script tags) — avoid executing untrusted code.
- Sanitize metadata and user-facing strings before rendering.
Summary checklist (practical steps)
- Choose whether to use libraries (FFmpeg/GStreamer) or custom demuxer + decoders.
- Implement or reuse a robust FLV demuxer.
- Extract and parse metadata, SPS/PPS for H.264.
- Decode audio/video with suitable decoders (hardware/software).
- Implement audio/video synchronization and buffering policies.
- Render video on GPU and audio to the sound device.
- Implement network resilience (reconnect, buffering, seek support).
- Test across codecs, players, and network conditions.
Building a custom FLV stream player is a multi-disciplinary task touching networking, systems programming, multimedia codecs, real-time synchronization, and UI. Start small: get a demuxer to print tags and timestamps, then wire in decoders and renderers incrementally.