HTTP Live Streaming Overview
With iPhone OS version 3.0 or later, iPhone is capable of receiving streaming audio and video over HTTP from an ordinary web server. Because the system uses HTTP, it is supported by nearly all edge servers, media distributors, caching systems, routers, and firewalls. To support content providers who wish to protect their media, the streaming system provides for media encryption and user authentication over HTTPS.
Note: Many existing streaming services require specialized servers to distribute content to end users. It requires specialized skills to set up and maintain these servers, and in a large-scale deployment these servers can be costly. Apple has designed a system that avoids this by using standard HTTP to deliver the streams.
Architecture
Conceptually, HTTP Live Streaming consists of three parts: the server component, the distribution component, and the client software.
The server component is responsible for taking input streams of media and encoding them digitally, encapsulating them in a format suitable for delivery, and preparing the encapsulated media for distribution.
The distribution component consists of standard web servers. They are responsible for accepting client requests and delivering prepared media and associated resources to the client. HTTP Live Streaming is designed to work seamlessly in conjunction with media distribution networks for large scale operations.
The client software is responsible for determining the appropriate media to request, downloading those resources, and then reassembling them so that the media can be presented to the user in a continuous stream.
iPhone includes built-in client software: the media player, which is automatically launched when Safari encounters an <OBJECT> or <VIDEO> tag with a URL whose MIME type is one that the media player supports. The media player can also be launched from custom iPhone applications using the media player framework.
In a typical configuration, a hardware encoder takes audio-video input and turns it into an MPEG-2 transport stream, which is then broken into a series of short media files by a software stream segmenter. The segmenter also creates and maintains an index file containing a list of the media files. The URL of the index file is published on the web server, which responds to file requests in the usual way. The client software reads the index, then requests the listed media files in order and displays them without any pauses or gaps between segments.
An example of a simple HTTP streaming configuration is shown in Figure 1-1.
As you can see, input can be live or from a prerecorded source. It is encoded into a stream, which is then broken into segments and saved as a series of one or more files. An index file lists the files that contain segments of the stream. The URL of the index file is accessed by the clients, which then request the indexed files in sequence.
Server Requirements
The server requires a media encoder, which can be off-the-shelf hardware, and a way to break the encoded media into segments and save them as files. The segmenting can be done in software such as the media stream segmenter provided by Apple, available in beta for download from the Apple Developer Connection member download site at https://connect.apple.com/cgi-bin/WebObjects/MemberSite.woa/wa/getSoftware?bundleID=20333.
Media Encoder
The media encoder takes a real-time signal from an audio-video device, encodes the media, and encapsulates it for delivery. Currently, the supported format is MPEG-2 transport streams or program streams containing H.264 video and AAC audio (HE-AAC or AAC-LC). Audio-only streams can alternatively consist of MPEG-2 elementary streams, HE-AAC or AAC-LC files with ADTS headers, or MP3 files.
Note: The protocol specification is capable of accommodating other formats, but only MPEG-2 streams (with H.264 video and AAC audio) and MP3 audio files are supported at this time.
The encoder delivers an MPEG-2 transport stream over the local network to the stream segmenter.
Stream Segmenter
The stream segmenter is a process—typically software—that reads the transport stream from the local network and divides it into a series of small media files of equal duration. Even though each segment is in a separate file, video files are made from a continuous stream which can be reconstructed seamlessly (audio broadcasts may be made up of discrete MP3 files).
The segmenter also creates an index file containing references to the individual media files. Each time the segmenter completes a new media file, the index file is updated. The index is used to track the availability and location of the media files. The segmenter may also encrypt each media segment and create a key file as part of the process.
Media segments are saved as .ts files (MPEG-2 streams) and index files are saved as .m3u8 files, an extension of the .m3u format used for MP3 playlists.
Note: Because the index file format is an extension of the .m3u file format, and because the system also supports .mp3 audio media files, the client software may also be compatible with typical MP3 playlists used for streaming Internet radio.
Here is a very simple example of an .m3u8 file a segmenter might produce if the entire stream were contained in three unencrypted 10-second media files:
#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10,
http://media.example.com/segment1.ts
#EXTINF:10,
http://media.example.com/segment2.ts
#EXTINF:10,
http://media.example.com/segment3.ts
#EXT-X-ENDLIST
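As a rough illustration, an index like the one above could be assembled by a few lines of Python. This is a minimal sketch, not the Apple segmenter: the function name build_index and its parameters are hypothetical, and it assumes every media file has the same duration.

```python
def build_index(segment_urls, target_duration, ended=True):
    """Return the text of a simple index file listing the given media files."""
    lines = ["#EXTM3U", f"#EXT-X-TARGETDURATION:{target_duration}"]
    for url in segment_urls:
        # Each media file URL is preceded by its duration in seconds.
        lines.append(f"#EXTINF:{target_duration},")
        lines.append(url)
    if ended:
        # Marks the end of the stream; left out while a broadcast is ongoing.
        lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


urls = [f"http://media.example.com/segment{i}.ts" for i in range(1, 4)]
print(build_index(urls, 10), end="")
```

Passing ended=False leaves out the #EXT-X-ENDLIST tag, which is how an index for an ongoing broadcast would look.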
The index file may also contain URLs for encryption key files or alternate index files for different bandwidths. For details of the index file format, see the HTTP Live Streaming Protocol Specification (available for download from the Apple Developer Connection website).
Distribution Requirements
The distribution system is a web server or a web caching system that delivers the media files and index files to the client over HTTP. No custom server modules are required to deliver the content, and typically very little configuration is needed on the web server.
Necessary configuration is typically limited to specifying MIME-type associations for .m3u8 files and .ts files.
File extension | MIME type
.m3u8 | application/x-mpegURL
.ts | video/MP2T
Tuning time-to-live (TTL) values for .m3u8 files may also be necessary to achieve desired caching behavior for downstream web caches, as these files are frequently overwritten, and the latest version should be downloaded for each request.
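The configuration described above can be pictured as a small mapping from file extension to response headers. The following Python sketch is illustrative only: the function response_headers is hypothetical, and the caching policy (a max-age equal to one segment duration for index files) is an assumption rather than a documented recommendation for any particular web server.

```python
import posixpath

# MIME-type associations from the table above.
MIME_TYPES = {
    ".m3u8": "application/x-mpegURL",
    ".ts": "video/MP2T",
}


def response_headers(path, segment_seconds=10):
    """Return the HTTP headers a server might send for the given file path."""
    extension = posixpath.splitext(path)[1].lower()
    headers = {
        "Content-Type": MIME_TYPES.get(extension, "application/octet-stream")
    }
    if extension == ".m3u8":
        # Assumed policy: index files are overwritten often, so downstream
        # caches should not keep them longer than one segment duration.
        headers["Cache-Control"] = f"max-age={segment_seconds}"
    return headers


print(response_headers("/live/index.m3u8"))
print(response_headers("/live/segment1.ts"))
```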
Client Requirements
The client software begins by fetching the index file, based on a URL identifying the stream. The index file in turn specifies the location of the available media files, decryption keys, and any alternate streams available. For the selected stream, the client downloads each available media file in sequence. Each file contains a consecutive segment of the stream. Once it has a sufficient amount of data downloaded, the client begins presenting the reassembled stream to the user.
The client is responsible for fetching any decryption keys, authenticating or presenting a user interface to allow authentication, and decrypting media files as needed.
This process continues until the client encounters the #EXT-X-ENDLIST tag in the index file. If no #EXT-X-ENDLIST tag is encountered, the index file is part of an ongoing broadcast. The client loads a new version of the index file when it has two media files remaining in its download queue. The client looks for new media files and encryption keys in the updated index and adds these URLs to its queue.
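The index handling described above might be sketched as follows. This is a simplified illustration, not the actual client: parse_index and refresh are hypothetical names, and the sketch ignores encryption keys, alternate streams, and downloading.

```python
def parse_index(text):
    """Return (media_urls, ended) for a simple index file."""
    media_urls = []
    ended = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line == "#EXT-X-ENDLIST":
            ended = True
        elif not line.startswith("#"):
            # Any non-blank line that is not a tag or comment is a media URL.
            media_urls.append(line)
    return media_urls, ended


def refresh(seen, queue, index_text):
    """Queue media URLs not seen before; return True once the stream has ended."""
    media_urls, ended = parse_index(index_text)
    for url in media_urls:
        if url not in seen:
            seen.add(url)
            queue.append(url)
    return ended


seen, queue = set(), []
live = "#EXTM3U\n#EXT-X-TARGETDURATION:10\n#EXTINF:10,\na.ts\n#EXTINF:10,\nb.ts\n"
print(refresh(seen, queue, live), queue)
final = live + "#EXTINF:10,\nc.ts\n#EXT-X-ENDLIST\n"
print(refresh(seen, queue, final), queue)
```

Tracking already-seen URLs in a set is one simple way to avoid queueing a media file twice when the refreshed index still lists older files.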
Session Types
The HTTP Live Streaming protocol supports live broadcast sessions and video on demand (VOD) sessions.
For live sessions, as new media files are created and made available the index file is updated. The new index file includes the new media files; older files are typically removed. The updated index file presents a moving window into a continuous stream. This type of session is suitable for continuous broadcasts.
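The moving window can be sketched with a fixed-length queue. The class below is a hypothetical illustration, not the Apple segmenter. It assumes equal-duration segments, and it uses the #EXT-X-MEDIA-SEQUENCE tag from the protocol specification to number the first listed file; the starting number of 0 is an assumption.

```python
from collections import deque


class LiveIndex:
    """Keeps only the most recent media files of an ongoing broadcast."""

    def __init__(self, window_size, target_duration):
        if window_size < 3:
            raise ValueError("a live index should list at least 3 media files")
        self.target_duration = target_duration
        self.window = deque(maxlen=window_size)
        self.first_sequence = 0  # assumed starting sequence number

    def add_segment(self, url):
        if len(self.window) == self.window.maxlen:
            # The oldest file is about to drop out of the window.
            self.first_sequence += 1
        self.window.append(url)

    def render(self):
        lines = [
            "#EXTM3U",
            f"#EXT-X-TARGETDURATION:{self.target_duration}",
            f"#EXT-X-MEDIA-SEQUENCE:{self.first_sequence}",
        ]
        for url in self.window:
            lines.append(f"#EXTINF:{self.target_duration},")
            lines.append(url)
        # No #EXT-X-ENDLIST tag: the broadcast is still in progress.
        return "\n".join(lines) + "\n"


index = LiveIndex(window_size=3, target_duration=10)
for n in range(1, 6):
    index.add_segment(f"http://media.example.com/segment{n}.ts")
print(index.render(), end="")
```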
For VOD sessions, media files are available representing the entire duration of the presentation. The index file is static and contains a complete list of all files created since the beginning of the presentation. This kind of session allows the client full access to the entire program.
It is possible to create a live broadcast of an event that is instantly available for video on demand. To convert a live broadcast to VOD, do not remove the old media files from the server or delete their URLs from the index file, and add an #EXT-X-ENDLIST tag to the index when the broadcast ends. This allows clients to join the broadcast late and still see the entire event. It also allows an event to be archived for rebroadcast with no additional time or effort.
VOD can also be used to deliver “canned” media. It is typically more efficient to deliver such media as a single file using QuickTime or MPEG-4 format, but HTTP streaming has the advantage of allowing for media encryption and supporting dynamic switching between streams of different bit rates in response to changing connection speeds. (QuickTime also supports multiple-data-rate movies, but it does not switch dynamically from one to another in mid-movie in response to changing bandwidth.)
Content Protection
Media files containing stream segments may be individually encrypted. When encryption is employed, references to the corresponding key files appear in the index file so that the client can retrieve the keys for decryption.
When a key file is listed in the index file, the key file contains a cipher key that must be used to decrypt subsequent media files listed in the index file. Currently HTTP Live Streaming supports AES-128 encryption using 16-octet keys. The format of the key file is a packed array of these 16 octets in binary format.
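Because the key file is simply 16 raw octets, creating and validating one takes very little code. The sketch below is an assumption-laden illustration: the helper names and the file name are hypothetical, and it only produces and reads the key file; it does not encrypt any media.

```python
import os
import tempfile

KEY_LENGTH = 16  # AES-128 uses 16-octet keys


def write_key_file(path):
    """Write a random 16-octet key to path in packed binary form."""
    key = os.urandom(KEY_LENGTH)
    with open(path, "wb") as key_file:
        key_file.write(key)
    return key


def read_key_file(path):
    """Read a key file and check that it holds exactly 16 octets."""
    with open(path, "rb") as key_file:
        key = key_file.read()
    if len(key) != KEY_LENGTH:
        raise ValueError(f"expected {KEY_LENGTH} octets, found {len(key)}")
    return key


with tempfile.TemporaryDirectory() as folder:
    key_path = os.path.join(folder, "stream.key")  # hypothetical file name
    written = write_key_file(key_path)
    print(read_key_file(key_path) == written)
```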
The media stream segmenter available from Apple provides encryption and supports three modes for configuring encryption.
The first mode allows you to specify a path to an existing key file on disk. In this mode the segmenter inserts the URL of the existing key file in the index file. It encrypts all media files using this key.
The second mode instructs the segmenter to generate a random key file, save it in a specified location, and reference it in the index file. All media files are encrypted using this randomly generated key.
The third mode instructs the segmenter to generate a random key file, save it in a specified location, reference it in the index file, and then regenerate and reference a new key file every n files. This mode is referred to as key rotation. Each group of n files is encrypted using a different key.
Note: All media files may be encrypted using the same key, or new keys may be required at intervals. The theoretical limit is one key per media file, but because each media key adds a file request and transfer to the overhead for presenting the following media segments, changing to a new key periodically is less likely to impact system performance than changing keys for each segment.
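The trade-off in key rotation comes down to simple arithmetic: with a new key every n files, the client makes one extra key request per group of n media files. The following sketch uses hypothetical function names to show which key a given media file would use and how many key requests a presentation would need.

```python
def key_number_for_segment(segment_index, files_per_key):
    """Return which key (0-based) encrypts the media file at segment_index."""
    if files_per_key < 1:
        raise ValueError("files_per_key must be at least 1")
    return segment_index // files_per_key


def key_requests(segment_count, files_per_key):
    """Return how many key files a client fetches to play every media file."""
    if files_per_key < 1:
        raise ValueError("files_per_key must be at least 1")
    # Ceiling division: a partial final group still needs its own key.
    return -(-segment_count // files_per_key)


# One hour of 10-second media files is 360 files.
print(key_requests(360, 1))   # a new key for every media file
print(key_requests(360, 30))  # a new key every 30 files (every 5 minutes)
```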
You can serve key files using either HTTP or HTTPS. You may also choose to protect the delivery of the key files using your own session-based authentication scheme.
Caching and Delivery Protocols
HTTPS is commonly used to deliver key files. It may also be used to deliver the content files and index files, but this is not recommended when scalability is important, since HTTPS requests often bypass web server caches, causing all content requests to be routed through your server and defeating the purpose of edge network distribution systems.
When content and index files are delivered over HTTP so that they can be cached, however, it is important to make sure that any content delivery network you use understands that the .m3u8 index files are not to be cached for longer than one media segment duration.
Stream Alternates
Index files may reference alternate streams of content. References can be used to support delivery of multiple streams of the same content with varying quality levels for different bandwidths or devices. The client software uses heuristics to determine appropriate times to switch between the alternates. Currently, these heuristics are based on recent trends in measured network throughput.
The index file points to alternate streams of media by including a specially tagged list of other index files, as illustrated in Figure 1-2.
Note that the client may choose to change to an alternate stream at any time, such as when a mobile device enters or leaves a Wi-Fi hotspot.
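An index file that lists alternates might be generated along these lines. This is a sketch under stated assumptions: the #EXT-X-STREAM-INF tag and its BANDWIDTH attribute come from the protocol specification rather than from this overview, the URLs are hypothetical, and the bandwidth figures are simply illustrative sums of video and audio rates in bits per second.

```python
def build_alternates_index(alternates):
    """Return index text that points to one index file per alternate stream.

    alternates is a list of (bandwidth_bps, index_url) pairs.
    """
    lines = ["#EXTM3U"]
    for bandwidth_bps, index_url in alternates:
        # Each alternate is announced by a tagged line followed by its URL.
        lines.append(f"#EXT-X-STREAM-INF:BANDWIDTH={bandwidth_bps}")
        lines.append(index_url)
    return "\n".join(lines) + "\n"


alternates = [
    (160_000, "http://media.example.com/low/index.m3u8"),
    (320_000, "http://media.example.com/medium/index.m3u8"),
    (864_000, "http://media.example.com/high/index.m3u8"),
]
print(build_alternates_index(alternates), end="")
```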
Frequently Asked Questions
What kinds of encoders are supported?
The protocol specification does not limit the encoder selection. However, the current Apple implementation should interoperate with encoders that produce MPEG-2 transport streams containing H.264 video and AAC audio (HE-AAC or AAC-LC). Encoders that are capable of broadcasting the output stream over UDP should also be compatible with the current implementation of the Apple-provided segmenter software.
Apple has tested the current implementation with the following commercial encoders:
Inlet Technologies Spinnaker 7000
Envivio 4Caster C4
What are the specifics of the video and audio formats supported?
Although the protocol specification does not limit the video and audio formats, the current Apple implementation supports the following formats:
Video: H.264 Baseline Level 3.0
Audio:
HE-AAC or AAC-LC up to 48 kHz, stereo audio
MP3 (MPEG-1 Audio Layer 3) up to 48 kHz, stereo audio
What duration should media files be?
The main point to consider is that shorter segments result in more frequent refreshes of the index file, which might create unnecessary network overhead for the client. Longer segments will extend the inherent latency of the broadcast and initial startup time. A duration of 10 seconds of media per file seems to strike a reasonable balance for most broadcast content.
How many files should be listed in the index file during a continuous, ongoing session?
The client identifies an ongoing session by the lack of an #EXT-X-ENDLIST tag in the index file. The client does not allow the user to seek into the last two files in the index for ongoing broadcasts. When it begins loading the third-to-last file, it requests a new copy of the index. The specification therefore requires at least 3 media files be listed in the index file at all times.
The important point to consider when choosing the optimum number is that the number of files available during a live session constrains the client's behavior when doing play/pause and seeking operations. The longer the list, the longer the client can be paused without losing its place in the broadcast, the further back in the broadcast a new client begins, and the wider the time range within which the client can seek. The trade-off is that a longer index file adds to network overhead—during live broadcasts, the clients are all refreshing the index file regularly, so it does add up, even when the index file is small.
Another point to consider is that clients typically request new copies of the index file at a higher rate when the index contains a shorter list of files.
Example: Assuming files are of 10-second duration, maintaining an index with 182 entries allows the client to seek within a 30-minute window. If the user watches the media files in sequence, without seeking ahead, the client requests a new index file every half hour. Similarly, an index with 3 entries gives the user a 10-second window and the client requests a new index file every 10 seconds.
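The figures in this example follow from the rule that the client does not seek into the last two files of an ongoing broadcast. A small sketch of the arithmetic, using a hypothetical function name, is shown here.

```python
def seek_window_seconds(index_entries, segment_seconds):
    """Return the seekable window for an ongoing broadcast, in seconds."""
    if index_entries < 3:
        raise ValueError("the index must list at least 3 media files")
    # The last two files in the index are not available for seeking.
    return (index_entries - 2) * segment_seconds


print(seek_window_seconds(182, 10) / 60)  # window in minutes for 182 entries
print(seek_window_seconds(3, 10))         # window in seconds for 3 entries
```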
What data rates are supported?
The current implementation has been tested using audio-video streams with data rates as low as 100 Kbps and as high as 1.6 Mbps to iPhone. The data rate that a content provider chooses for a stream is most influenced by the target client platform and the expected network topology. The streaming protocol places no limitations on the data rates that can be used.
Note: If the data rate exceeds the available bandwidth, there is more latency before startup and the client may have to pause to buffer more data periodically. During a broadcast using an index file that provides a moving window into the content, the client will eventually fall behind in such cases, causing one or more segments to be dropped. In the case of VOD, no segments are lost, but inadequate bandwidth does cause slower startup and periodic stalling while data buffers.
What is a .ts file?
A .ts file contains an MPEG-2 transport stream. This file format encapsulates a series of encoded media samples—typically audio and video. The file format supports a variety of compression formats, such as MP3, AAC, H.264, MPEG-2 video, and so on. Not all possible formats are currently supported in the Apple HTTP streaming implementation, however. (For a list of currently supported formats, see “Media Encoder.”)
What is an .m3u8 file?
An .m3u8 file is an extensible playlist file format. It is an m3u playlist containing UTF-8 encoded text. The m3u file format is a de facto standard playlist format suitable for carrying lists of media file URLs. This is the format used as the index file for NRT streaming over HTTP. For details, see HTTP Live Streaming Protocol, available on the Apple Developer Connection website.
How does the client software determine when to switch streams?
The current implementation of the client observes the effective bandwidth while playing a stream. If a higher-quality stream is available and the bandwidth appears sufficient to support it, the client switches to a higher quality. If a lower-quality stream is available and the current bandwidth appears insufficient to support the current stream, the client switches to a lower quality.
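One way to picture this kind of decision is the sketch below. It is not the actual heuristic used by the Apple client, which is not documented here: the function name, the headroom factor, and the sample bandwidth figures are all assumptions chosen for illustration.

```python
def choose_stream(stream_bandwidths, measured_bps, headroom=1.2):
    """Pick the highest-rate stream that fits the measured throughput.

    headroom is a hypothetical safety margin: a stream is considered
    sustainable only if the measured throughput exceeds its data rate
    by this factor. If no stream fits, the lowest-rate stream is used.
    """
    sustainable = [
        rate for rate in sorted(stream_bandwidths)
        if rate * headroom <= measured_bps
    ]
    return sustainable[-1] if sustainable else min(stream_bandwidths)


streams = [160_000, 320_000, 864_000]  # bits per second, illustrative
print(choose_stream(streams, 500_000))
print(choose_stream(streams, 100_000))
print(choose_stream(streams, 2_000_000))
```

Calling the function again each time a new throughput measurement arrives would move the client up or down as conditions change.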
Where can I download a copy of the media stream segmenter from Apple?
A beta version can be downloaded from the Apple Developer Connection member download site, at https://connect.apple.com/cgi-bin/WebObjects/MemberSite.woa/wa/getSoftware?bundleID=20333.
What settings are recommended for a typical HTTP stream, with alternates, using the media segmenter from Apple?
output .ts files
10s segments
H.264 Baseline 3.0 video
HE-AAC (version 1) stereo audio at 44.1 kHz
Three streams:
Low—96 Kbps video, 64 Kbps audio
Medium—256 Kbps video, 64 Kbps audio
High—800 Kbps video, 64 Kbps audio
Note: For concerts or broadcasts where audio quality is paramount, you might substitute an audio data rate of 128 Kbps, reducing video bandwidth proportionally.
Examples of typical command line arguments for the media stream segmenter from Apple are included in the read me file downloaded with the segmenter.
What are the hardware requirements or recommendations for servers?
See "What kinds of encoders are supported?" for encoder hardware recommendations.
The Apple stream segmenter is capable of running on any Intel-based Mac. We recommend using a Mac with two Ethernet network interfaces, such as a Mac Pro or an Xserve. One network interface can be used to obtain the encoded stream from the local network, while the second network interface can provide access to a wider network.
Does the Apple implementation of HTTP Live Streaming support DRM?
No. However, media can be encrypted and key access can be limited using HTTPS authentication.
What client platforms are supported?
iPhone and iPod touch (requires iPhone OS version 3.0 or later).
Is the protocol specification available?
Yes, from the Apple Developer Connection website, at http://developer.apple.com/iphone/prerelease/library/documentation/NetworkingInternet/Conceptual/HTTPLiveStreaming/index.html.
Does the client cache content?
The index file can contain an instruction to the client that content should not be cached. Otherwise, the client may cache data for performance optimization when seeking within the media.
Is this a real-time delivery system?
No. It has inherent latency corresponding to the size and duration of the media files containing stream segments. At least one segment must fully download before it can be viewed by the client, and two may be required to ensure seamless transitions between segments. In addition, the encoder and segmenter must create a file from the input; the duration of this file is the minimum latency before media is available for download. Typical latency with recommended settings is in the neighborhood of 30 seconds.
What is the latency?
Approximately 30 seconds, with recommended settings. See "Is this a real-time delivery system?"
Do I need to use a hardware encoder?
No. Using the protocol specification, it is possible to implement a software encoder.
What advantages does this approach have over RTP/RTSP?
HTTP is less likely to be disallowed by routers, NAT, or firewall settings. No ports need to be opened that are commonly closed by default. Content is therefore more likely to get through to the client in more locations and without special settings. HTTP is also supported by more content-distribution networks, which can affect cost in large distribution models. In general, more available hardware and software works unmodified and as intended with HTTP than with RTP/RTSP. Expertise in customizing HTTP content delivery using tools such as PHP is also more widespread.
Where can I get help or advice on setting up an HTTP audio/video server?
You can visit the Apple Developer Forum at http://devforums.apple.com/.







