Microsoft Corporation
June 2000
Summary: This paper discusses the multimedia streaming capabilities provided by the Windows Media™ components included in Microsoft® Windows® CE DirectX Platform Adaptation Kit 1.2 (or DXPAK), and how they differ from those of other versions of Microsoft Windows. (14 printed pages)
Contents
Overview
Technical Fundamentals
Windows Media Player Control
Windows Media Technologies
Windows Media Formats
Windows Media Features
Windows Media Protocols
Windows Media Codecs
DirectShow
Filter Graph Manager
Filters
For More Information
Overview
The digital revolution has taken the consumer electronics space by storm. Digital cable set-top boxes offering hundreds of channels are quickly replacing older, analog cable boxes. Portable digital audio players offer hours of music playback at much higher sound quality and in a fraction of the size of a portable tape player. The overall change in consumer electronics devices from analog to digital is both a radically new phenomenon and a natural evolution. Because of this evolution, it is now generally accepted that the PC will no longer be the only source for digital multimedia.
Many of these new consumer devices entertain consumers by playing digital music, movies, TV, or other multimedia content. The amount of data needed to store digital content, even when compressed using the best available coding algorithms such as Microsoft Windows Media™ Audio, is very large. For example, digital audio compressed at 120 Kbits/sec requires 3.6 MB to store a four-minute song, while a two-hour movie stored as digital video at 300 Kbits/sec uses 270 MB, and at 4 Mbits/sec would require 3.6 GB!
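These figures follow directly from multiplying bit rate by duration and dividing by 8 to convert bits to bytes:
- 120 Kbits/sec × 240 sec ÷ 8 = 3,600 KB, or 3.6 MB
- 300 Kbits/sec × 7,200 sec ÷ 8 = 270,000 KB, or 270 MB
- 4 Mbits/sec × 7,200 sec ÷ 8 = 3,600 MB, or 3.6 GB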
Large media files can be managed by either:
- Downloading the entire file into local memory and then processing a sequence of stored data blocks (local streaming) or processing the entire stored file (non-streaming)
- Downloading and processing a sequence of individual data blocks (network streaming)
Network streaming can also be used for processing media data created on the server in real-time and never stored as a single file.
Sending the next relatively small amount of data needed for immediate processing and playback (rather than the complete data set) works because multimedia content is a series of digital data without strong long-term temporal coupling. In other words, a block of sound values in a song or pixels in a video frame can be processed and displayed independently, at least when they are separated by sufficient time. This allows a stream of multimedia data to be broken up into temporal groups of independent data. These groups are then encoded, transmitted, and played back in temporal order, independently of the preceding or succeeding groups of data that will be displayed at different times. This method of sending blocks of time-ordered, temporally decoupled data to the client device is called streaming. The client device needs only to buffer enough packets of data to allow for server, network, or client-side interruptions or irregularities in the creation, transmission, and time-correct display of the streamed data.
How does Windows CE fit into this picture? By providing Windows Media components, Windows CE DirectX Platform Adaptation Kit 1.2 (or DXPAK) enables many of the rich multimedia playback and streaming capabilities found on PCs, but does so with smaller, more configurable components. These components run on many of the high-performance CPUs supported by Windows CE (x86, MIPS R4300 and compatible, and SH4, available now with DXPAK 1.1; ARM, StrongARM, and integer MIPS planned for DXPAK 1.2). In addition, the modularity of Windows Media components for Windows CE gives you flexibility in choosing which components your platform uses. When building an operating system image for your hardware using Platform Builder, you can decide whether you want a particular DirectX or user interface component, communications protocol, or file system. This kind of flexibility allows you to ship only those technologies you are actually using on your platform, saving space and reducing complexity.
Windows CE 3.0 with the DirectX Platform Adaptation Kit 1.2 (or DXPAK) provides a complete solution for developing the next “killer” consumer appliance or application. It is a robust, powerful real-time operating system that now provides a rich set of components for enabling digital multimedia devices.
Technical Fundamentals
The Microsoft DirectShow® portion of Microsoft DirectX® provides the foundation for all multimedia services on Windows CE. It is possible to build a rich multimedia application using DirectShow alone, and in fact many companies are doing just that, but it is not the only way to proceed. A communication structure has been built on top of DirectShow to make the application developer's job easier. How everything fits together is illustrated in the following diagram:
Figure 1. Communication structure based on DirectShow
The user sees the top layer of this diagram: the web browser or other application. The application either has the Windows Media Player (WMP) control embedded within it, or communicates via COM with the DirectShow interfaces. In either case, DirectShow manages the flow of data from the source to the hardware. The application developer is most concerned with the second and third layers of information flow. The driver developer is most concerned with the bottom layers. What follows is a look at the sections the application developer needs to understand: the WMP control, WMT, and DirectShow.
Windows Media Player Control
Recognizing the importance of multimedia to web content, Microsoft created the Windows Media Player (WMP) control. This technology enables the Windows Media Player to exist as a Microsoft ActiveX control inside a web page along with other content. The WMP control is a versatile tool for presenting local and streaming multimedia files. It supports playback of nearly all major media file formats, including Windows Media format files (ASF, ASX, WMA, WMX, WMV, WVX, and WM files), Moving Picture Experts Group formats (MPG and MPE files), audio formats such as MP3, MIDI, WAV, and AIFF, and the AVI multimedia format. All these file formats can be streamed from locally stored files using just the WMP control, and, when combined with the Microsoft Windows Media Technologies, streaming over networks is supported. DirectX Platform Adaptation Kit 1.2 (or DXPAK) supports the Windows Media Player 6.4 version of the control.
Placing the WMP Control in a Web Page
The OBJECT tag is used to embed ActiveX objects into a Web page. The following example shows how to use the OBJECT tag to insert the Windows Media Player (WMP) control.
<OBJECT ID="MediaPlayer"
    CLASSID="CLSID:22d6f312-b0f6-11d0-94ab-0080c74c7e95"
    TYPE="application/x-oleobject"
    WIDTH="320" HEIGHT="240"
    STYLE="position:absolute; left:50px; top:50px;">
    <PARAM NAME="FileName" VALUE="http://example.microsoft.com/media/sample.asf">
    <PARAM NAME="ShowControls" VALUE="1">
    <PARAM NAME="ShowStatusBar" VALUE="1">
</OBJECT>
The ID attribute of the OBJECT tag specifies a name for the WMP object, for later use in scripting. The CLASSID attribute is required for Internet Explorer to create the object on the page, and should always be the string listed in the preceding example. The TYPE attribute indicates to the browser that the type of embedded OBJECT is an ActiveX object. The optional WIDTH and HEIGHT parameters set the size of the window used for the WMP object. The STYLE parameters enable you to position the object window anywhere on the Web page.
All WMP functionality is exposed to the Microsoft JScript web scripting language. There is currently no support for any other scripting language.
Note that the CODEBASE attribute is conspicuously absent from the OBJECT tag. The CODEBASE attribute contains a Uniform Resource Locator (URL) pointing to a location where the WMP control can be downloaded if it is unavailable on a user's system. This functionality is not supported on Windows CE. The WMP control for Windows CE must be included in your OS image if it is going to be used by your application.
The PARAM tags have two attributes: the first is the name of the property being set, and the second specifies the value of that property. The PARAM tags initialize the WMP object with the specified data when it is created. In this example, the first PARAM tag sets the FileName property to the URL http://example.microsoft.com/media/sample.asf, defining which file the WMP control will play. The value could also be a path to a local media file, such as C:\sample.asf. The remaining PARAM tags specify that the playback controls and status bar should both be visible for this object. Any of these elements could be hidden instead, enabling you to customize the appearance of the WMP control and user interface items.
After you have created the object and specified a valid file name, you should see the WMP control on your Web page.
There are three other HTML tags that can be used to include audio and video in Web pages—the embed object tag <EMBED>, the image tag <IMG>, and the anchor tag <A>. The <EMBED> tag was created by Netscape to support browser plug-ins. Netscape browsers do not support embedding objects with the <OBJECT> tag, so the <EMBED> tag should be used if you are trying to maintain compatibility with them. Although the <IMG> tag can be used to include video clips in a Web page, its use is limited to certain media types (MPEG, QT, and AVI files). The <IMG> tag does not provide access to any of the WMP control parameters and does not work at all with audio media, so its use is not recommended if you need the full functionality of the WMP control.
The <A> tag can also be used to create links to media. The media is played either by the browser or by a helper application determined by the media type; the <A> tag cannot be used to embed media in a Web page as the <OBJECT> or <EMBED> tags can. One key point to keep in mind is that the WMP control is distinct from the Windows Media Player application, so not all Internet media content can be handled by the WMP control. An example of this is content accessed with the <A> tag. Normally, when a user browsing the Internet on a PC clicks a link to play a media file, the Windows Media Player application appears on the desktop and control over the media content is passed from the browser to it. If you are trying to build a device or application that supports all existing Internet media content, you must make sure the control recognizes and correctly handles all the different ways to deliver content over the Internet. The WMPHLPR sample included with DXPAK 1.2 provides an example of how to enable the WMP control to play back media accessed via the <A> tag. Using this sample, when a user clicks a media file link, the browser navigates to a page that hosts the control; the media file is passed to the control, and playback begins from within the browser. This behavior is quite useful for set-top boxes, Internet portal devices, or any other device with limited memory or a desire to run completely within a single window.
The Windows Media Player control for Windows CE contains a few differences from the version of the control that is available for x86-based PCs. The driving force behind including just a subset of the desktop WMP control's features is to provide a smaller, robust control that encapsulates the key features required by embedded devices. The WMP control for Windows CE therefore supports a subset of the properties, methods, and events of the desktop control; those that were omitted have no practical value for non-PC devices such as set-top boxes and audio jukeboxes. There is no support for backward compatibility with the Microsoft NetShow® player control, since all of the NetShow functionality has been encapsulated in the WMP control. Certain UI elements such as the context menu, Display panel, Closed Captioning panel, and Go To Bar are not supported, but their functionality can be authored into a Web page with scripting.
The Microsoft PowerPoint® (PPT) streaming and hotspot ASF authoring features are not supported. With URL flipping, it is still possible to have the WMP control play media in one frame while displaying slides or other graphics in another frame. Clickable hotspots, which turn images or video clips into hyperlinks or script locations, can also be implemented with the proper use of URL script commands embedded at specific times in an ASF file.
Windows Media Technologies
Windows Media Technologies (WMT) is a set of COM interfaces and codecs that support a broad range of server and client applications that stream audio, video, and script commands as a continuous flow of data.
Today, Windows CE 3.0 with the DirectX Platform Adaptation Kit 1.2 (or DXPAK) provides Windows Media Technologies version 4.1 components. These components support client playback using advanced Windows Media formats and world-class codecs, such as Windows Media Audio, Microsoft MPEG-4 video, and Sipro ACELP.net low bit-rate speech.
What follows is a look at the formats, features, protocols, and codecs supported by WMT on Windows CE. Occasionally, the Windows CE implementation of WMT differs from other versions of Windows. When this is the case, the differences and their workarounds are discussed.
Windows Media Formats
To store and stream data, WMT uses the Advanced Streaming Format (ASF). ASF is an application-level multimedia transmission file format (as opposed to a wire or transmission control format) for arranging and organizing synchronized multimedia data. ASF supports media data delivery over a wide variety of networks, network bandwidths, and protocols. It is optimized for streaming multimedia packets over both low bit-rate and broadband networks.
Windows CE also supports the Advanced Stream Redirector v3 (ASX) and Windows Media Station (NSC) metafiles. The ASX metafile provides mechanisms by which a client can support hyperlinks to streams, specify multiple pieces of source content along with the protocol rollover rules used to process them, and support media playlists.
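As an illustration, a minimal ASX v3.0 metafile might look like the following (the URLs are hypothetical). The two REF elements within the ENTRY define rollover sources that the client tries in order, and additional ENTRY elements would form a playlist:
<ASX VERSION="3.0">
    <TITLE>Sample Playlist</TITLE>
    <ENTRY>
        <!-- The client tries each REF in order, providing protocol rollover. -->
        <REF HREF="mms://example.microsoft.com/media/sample.asf" />
        <REF HREF="http://example.microsoft.com/media/sample.asf" />
    </ENTRY>
</ASX>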
The Microsoft Windows Media Station metafile serves to describe a particular channel to an ASF client wishing to access that channel. The model for access to a channel is similar to a television accessing a broadcast channel. This metafile is used for multicasting support.
Windows Media Features
Windows CE provides WMT client DirectShow filters that allow playback of ASF streams sent using the UDP, TCP, and HTTP protocols (as described in the next section, Windows Media Protocols). Windows CE WMT supports smart streaming using a multi-data-rate encoded ASF file, in which multiple streams with different bit rates are created in one ASF file and the client negotiates with the server for the appropriate stream. The server then automatically adjusts the stream depending on playback conditions and can select from multiple video streams based on available network bandwidth.
With smart streaming, the Windows CE WMT client can dynamically thin the stream based on the available bandwidth using an algorithm that adjusts delivery smoothly from full frames down to key-frame only. If necessary, the WMT client can ask the server to send only audio and no video packets. As bandwidth is reduced, audio is always given the highest priority, since it is usually critical to the user experience. As network bandwidth conditions improve, WMT can progressively step the video bit-rate back up to restore the viewing to an optimal level. In addition, the WMT UDP resend capability allows the client, if time is available, to request missing packets from the server. Finally, WMT also provides ASX event-driven stream switching where the client sends ASX control commands to the server.
Windows CE WMT does not support the older ASX v2.0 or v1.0 formats. All of the functionality of these earlier versions has been encapsulated in ASX v3.0. In addition, the PREVIEWDURATION, BANNER, and LOGO ASX elements are not supported on Windows CE. Preview mode can be implemented within an application using the WMP control by providing access to playback control via scripting. The functionality of the BANNER and LOGO elements can be implemented using DHTML and scripting.
Windows Media Technologies for Windows CE provides support for authentication. Authentication involves user validation before any information exchange takes place. When a client initiates a request to a server that has authentication enabled, the server challenges the client to confirm its identity. Typically, this amounts to inspecting the name and password of the user account under various authentication protocols. For any given interaction, both client and server must adhere to one agreed-upon protocol. WMT supports two such protocols: HTTP-Basic, for Internet applications, and NTLM, which is suitable for intranet applications.
On the desktop, NTLM uses authentication information established when the user logs on, so it requires the client and server to be on the same domain or on trusted domains. Since Windows CE does not allow a user to log on, WMT displays a dialog box to collect the authentication information when NTLM authentication is required.
Windows Media Protocols
WMT supports the following delivery mechanisms: multicasting, local file streaming, HTTP streaming, and MMS streaming.
Multicast enables the client to receive multicast streams, allowing an administrator to send one copy of the content to many users on the network, as long as that network is multicast-enabled. IP multicast streaming is done through ASF with the Microsoft Windows Media Station metafile. On networks that are not multicast-enabled, and for ASF files not streamed from a Windows Media server, content is sent through unicast, meaning that one stream is sent for every request.
WMT can provide local file streaming for systems with persistent storage. Data is read from persistent storage into a buffer in main memory and rendered. Local file streaming provides lower latency and a significant physical memory savings over reading the entire ASF file from the persistent store into main physical memory before rendering the file.
MMS is Microsoft's proprietary protocol for streaming media. A typical MMS session uses a TCP connection for sending and receiving media control commands, and a UDP or TCP connection for streaming the data. Invoking the MMS protocol using mms:// invokes the protocol rollover mechanism: the client first tries to receive the stream through UDP; if UDP does not work, the stream automatically rolls over to TCP; and finally, if TCP does not work, the client tries to receive the stream through HTTP. MMSU enables the client to receive streams through UDP. It is well suited to audio because packets are sent regardless of connection quality, so users hear fewer delays or pauses; if time allows, missed packets are requested and resent. MMST enables the client to receive streams through TCP. TCP forms a reliable stream: if packets are lost, the stream stops until the lost packets are recovered, so users experience more delays and pauses over a congested network when using MMST.
A regular HTTP server can be used to deliver ASF data streams, but there are several reasons to use the Windows Media Server instead. The packets within an ASF data stream must be delivered sequentially, one per network packet, for the full benefit of data streaming to be realized. Only an ASF-compatible server, such as Windows Media Server, will avoid fragmentation by transmitting ASF packets one at a time, encapsulated neatly within individual Internet or other network protocol packets. The error correction, streaming playback, and bit-rate optimization inherent to ASF depend on the client and server not having to figure out where ASF data packets begin and end on the fly. An HTTP server doesn't have this ability because it doesn't recognize the significance of ASF packets; it simply pushes data to the client as quickly as possible by filling each network packet with an arbitrary amount of data. Additionally, several features of Windows Media, such as the ability to fast-forward or rewind ASF data streams, are not available on a regular Web server.
Windows Media Codecs
The following table lists the supported codecs that can be contained within an ASF file. Windows CE WMT supports only content that is created with the Windows Media Tools. The Windows Media Encoder uses templates to encode live source or AVI, WAV, or MP3 content into ASF format with the codecs listed in the table below. The templates also provide the option of using other codecs, but DirectShow for Windows CE supports only the WMT codecs. While other codecs that are supported by DirectShow for Windows CE (such as Cinepak or MPEG-1) can be placed within an ASF file using other authoring tools, there is no guarantee as to their streaming performance, and their use is not recommended.
Codec name | Description
MPEG-4 v3, v2 | MS MPEG-4 video codec; up to 30 fps at QCIF (176x144) to CIF (352x288) resolution at 28.8 kbps – 300 kbps
WMAudio v2 | New Windows Media audio codec based on non-uniform modulated lapped biorthogonal transforms (NMLBT) in place of the DCT for perceptual coding of both voice and high-fidelity audio; 8 – 48 kHz stereo at 56 – 128 kbps; near-FM quality at 28.8 kbps and near-CD quality at 64 kbps
ACELP.net | Sipro ACELP.net voice codec; speech-quality 8 – 16 kHz mono at 5 – 16 kbps
MPEG-1 Layer 3 | Fraunhofer MP3 perceptual audio codec; near-CD quality at 128 kbps
The componentization of WMT for Windows CE allows you to build a fully customizable streaming media client that is tailored to your specific streaming environment. WMT for Windows CE has been factored so that you can decide which components to include in your application. Each of the following components can be selected as appropriate:
- MMS streaming
- HTTP streaming
- File streaming
- Broadcast and Multi Bit Rate Streaming
- Windows Media Station support
- ASX support
- Codecs
DirectShow
DirectShow provides the underlying services for playback of multimedia streams from either local files or over a network from a server. Specifically, DirectShow enables playback of video and audio content compressed in various file and streaming formats, including Windows Media, MPEG, Audio-Video Interleaved (AVI), and WAV.
Applications control filter graph activities by communicating with the filter graph manager. You can do this either indirectly by using the Microsoft Windows Media Player control, or directly by calling COM interface methods.
At the heart of the DirectShow services are modular sets of pluggable components called filters that can be arranged depending upon media type into a connected configuration called a filter graph. Filters operate on data streams to read, parse, decode, format, or render them.
The filter graph is controlled by the Filter Graph Manager (FGM). A DirectShow filter graph (see Figure 2) consists of a directed sequence of filters from source to final renderers, all connected by input and output filter pins. Filter pins negotiate which media types they will support. The FGM controls the multimedia data flow between the graph filters. Because DirectShow has a flexible, reconfigurable filter graph architecture, it can support playback and streaming of many media types using the same software components. Developers can also extend DirectShow multimedia support by writing their own filters.
Figure 2. DirectShow Filter Graph
Filter Graph Manager
An application uses the Filter Graph Manager (FGM) interfaces to create, connect, and control filter graphs. Filters use the FGM interfaces to post event notifications and to force reconnection of the filter pins as needed. In particular, the IGraphBuilder interface allows applications to call the filter graph manager to attempt to build a complete filter graph, or a partial filter graph if given only partial information such as the name of a file or the interfaces of two separate pins. The filter mapper looks up the available filters in the registry to configure the filter graph in a meaningful way. The IGraphBuilder interface creates a filter graph, adds filters to or removes filters from a filter graph, enumerates all the filters in a filter graph, and forces connections when adding a filter. To cause the appropriate filter graph to be constructed, an application just needs to instantiate the filter graph object, obtain its IGraphBuilder interface, and call the RenderFile method.
In addition, the FGM exposes media control and media positioning interfaces to the application. The media control interface, IMediaControl, allows the application to issue commands to run, pause, and stop the stream. Playback starts when the Run method is invoked. The positioning interface, IMediaSeeking, lets the application specify which section of the stream to play.
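For illustration, here is a minimal sketch (a hypothetical fragment with error handling omitted) of seeking to the ten-second mark, assuming a graph builder pointer pGB obtained as in the Example section later in this paper. By default, IMediaSeeking expresses positions in 100-nanosecond units:
IMediaSeeking *pMS = NULL;
// Query the filter graph manager for the positioning interface.
HRESULT hr = pGB->QueryInterface(IID_IMediaSeeking, (void **) &pMS);
if (SUCCEEDED(hr))
{
    // Default time format is 100-ns units, so 10 seconds = 10 * 10,000,000.
    LONGLONG llPosition = 10 * 10000000i64;
    hr = pMS->SetPositions(&llPosition, AM_SEEKING_AbsolutePositioning,
                           NULL, AM_SEEKING_NoPositioning);
    pMS->Release();
}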
Internally, the FGM uses the individual filter's well-known IBaseFilter interface to locate and enumerate a filter's input and output pins.
Filters
Filters are registered DirectShow classes and perform most media processing tasks. Filter tasks include:
- Source acquisition (e.g., acquire a media stream)
- Parsing (e.g., perform packet reading, splitting, and formatting on the stream)
- Transformation (e.g., decode WMA and MPEG-4 audio and video streams)
- Rendering (e.g., generate audio PCM or video RGB/YUV output at the right time and pass data on to DirectSound and DirectDraw)
Filters use several types of interfaces, such as pins, enumerators, transports, and clock interfaces to perform their tasks. Filters implement and expose numerous interfaces. The FGM uses these interfaces to create, connect, and control the graph. A filter will always implement the IBaseFilter interface that contains methods to:
- Run, stop, and pause filter state
- Retrieve filter and vendor information
- Get and set the reference clock
- Retrieve filter state information
- Enumerate filter pins
- Locate pins when rebuilding a filter graph
Individual filters expose an IBaseFilter interface so that the Filter Graph Manager can issue the run, pause, and stop commands. The Filter Graph Manager is responsible for calling these methods in the correct order on all the filters in the filter graph. Your application should not do this directly.
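As a minimal sketch of the pin enumeration this interface supports (error handling omitted; pFilter is assumed to be a valid IBaseFilter pointer), a caller can walk a filter's pins like this:
IEnumPins *pEnumPins = NULL;
IPin *pPin = NULL;
// Ask the filter for an enumerator over its pins.
HRESULT hr = pFilter->EnumPins(&pEnumPins);
if (SUCCEEDED(hr))
{
    // Walk the pins one at a time.
    while (pEnumPins->Next(1, &pPin, NULL) == S_OK)
    {
        PIN_DIRECTION dir;   // PINDIR_INPUT or PINDIR_OUTPUT
        pPin->QueryDirection(&dir);
        // ... inspect or connect the pin here ...
        pPin->Release();
    }
    pEnumPins->Release();
}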
However, unlike the IBaseFilter interface, only the renderer filter exposes an IMediaSeeking interface. Therefore, the Filter Graph Manager calls only the renderer filter with positioning information. The renderer then passes this position control information upstream through IMediaSeeking interfaces exposed on the pins, which simply pass it on. The positioning of the media stream is actually handled by the output pin on the filter that is able to seek to a particular position, usually a parser filter such as the AVI splitter.
Supported Filters
Windows CE DXPAK 1.2 provides the following DX 6.1 DirectShow filters:
- Source Filters—File Source (Asynchronous and URL)
- Parsers and splitters—MPEG-1 Stream Splitter, AVI Splitter, QuickTime Parser, WAVE Parser, MIDI Parser, ASF Parser, ASX Parser, ASF Splitter
- Decoders—MSAudio Decoder, MPEG-1 Audio Decoder, MPEG-1 Video Decoder, MPEG-4 Video Decoder, AVI Decompressor, ACM Audio Decompressor
- Renderers—Audio Renderer (using WaveOut), DirectSound Audio Renderer, MIDI Audio Renderer, Video Renderer (using either DirectDraw or GDI)
- Miscellaneous filters—Overlay Mixer
In order to support streaming of Windows Media Formats, special ASF/ASX streamer source and WMA codec transform filters are provided. In addition, the Fraunhofer MP3 audio and Sipro ACELP.net speech codecs use the Audio Compression Manager (ACM) wrapper Audio Decompressor transform filter.
DirectShow broadcast technology and DV filters are not included in DXPAK, but are available as part of the Windows CE WebTV Microsoft TV (MSTV) Kit.
Example
DirectShow makes it easy to play or stream multimedia files. Here is a sample code fragment showing how to write a trivial multimedia file player application (note that we have, among other simplifications, suppressed checking the QueryInterface return status).
HRESULT PlayMovie(LPTSTR lpszMovie)
{
    // We will use several DirectShow interfaces.
    IMediaControl *pMC = NULL;
    IGraphBuilder *pGB = NULL;
    IMediaEventEx *pME = NULL;
    HRESULT hr;
    long evCode;   // something to hold a returned event code

    // Instantiate a filter graph as an in-proc server; we'll use the
    // IGraphBuilder interface it returns to build the graph.
    hr = CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC,
                          IID_IGraphBuilder, (void **) &pGB);

    // We'll use this interface to run, pause, and stop the stream.
    hr = pGB->QueryInterface(IID_IMediaControl, (void **) &pMC);

    // We'll want to wait for completion of the rendering,
    // so we need a media event interface.
    hr = pGB->QueryInterface(IID_IMediaEventEx, (void **) &pME);

    // Now we're ready to build the filter graph
    // based on the source file data types.
    hr = pGB->RenderFile(lpszMovie, NULL);

    // Play the source file.
    hr = pMC->Run();

    // Block the application until video rendering operations finish.
    hr = pME->WaitForCompletion(INFINITE, &evCode);

    // Release the interfaces.
    pME->Release();
    pMC->Release();
    pGB->Release();

    return hr;
}
Several comments are in order. CoCreateInstance instantiates a filter graph object, but no filters, as it does not yet know what media types it needs for playback. It returns the IGraphBuilder interface needed to build the filter graph once the media type is known. A query interface is made to get IMediaControl for running, pausing, and stopping the streaming of media through its filters. Since Windows CE currently supports only in-process COM servers, CLSCTX_INPROC_SERVER is the only valid server context for CoCreateInstance. Trying anything else will return E_NOTIMPL.
IGraphBuilder is used to create a filter graph, add filters to or remove filters from a filter graph, enumerate all the filters in a filter graph, and force connections when adding a filter. We are using its RenderFile method to build the graph. The final graph construction depends upon the video and audio formats contained in the source file. Finally, we can play back the file using IMediaControl::Run. Since we want the application to wait until the rendering is finished, we have added IMediaEventEx::WaitForCompletion.
For More Information
You can find additional information about Windows CE DXPAK at http://www.microsoft.com/presspass/press/2000/Feb00/DxpackPR.asp.