Title: A Survey of Works That Suggest Using
MPEG-4 for Delivery of Multimedia over Wireless and Mobile Networks

Xuebin Xu

Email: xxu1@kent.edu

Prepared for Prof. Javed I. Khan
Department of Computer Science, Kent State University
Date: November 2003

Abstract: MPEG-4 is an object-based multimedia standard that is intended, among other goals, to specify techniques for improving video robustness in error-prone environments such as mobile networks. This paper surveys applications in which MPEG-4 is used, along with other techniques, in real-time wireless/mobile multimedia networking environments. We classify these applications into several groups and compare their advantages and disadvantages.

[Keyword]: MPEG 4, mobile networking, QoS, mobile multimedia internet.



Overview of MPEG-4

Examples of MPEG-4 applications in wireless/mobile multimedia network

Transcoding - a video bitstream requantization and framework for multimedia network

Transcaling - a video coding and framework for wireless IP multimedia services

Shape-based scene composition application on mobile platforms



Research Groups



The ISO MPEG (Moving Picture Experts Group) committee completed its first-generation video compression standard, MPEG-1, in the early 1990s. MPEG-1 was intended for bit rates up to 1.5 Mbit/s. Manufacturers tried selling motion pictures pressed on CDs in this format under the name VCD, but the quality of VCD motion pictures was lower than that of regular VCR tapes. The VCD was never a huge commercial success because of conceptual flaws: there was no real demand for such a product. Far more successful was MPEG-1 Layer 3, a codec for storing music (audio) developed at the Fraunhofer Institute in Germany. MPEG-1 Layer 3, also known as MP3, was and still is widespread for digital storage of music [Rickard Neehr].

In the following years, MPEG published MPEG-2, whose bit rate ranges from 1.5 to 15 Mbit/s, so MPEG-2 offers much better quality than MPEG-1. MPEG-2 addressed the multiplexing of one or more elementary streams of video and audio, as well as other data streams, into single or multiple streams suitable for storage or transmission. But high-bit-rate MPEG-2 is very space consuming: a two-hour film at DVD quality requires more than four gigabytes of space. This is no problem on DVDs, but it becomes an issue when video is transferred over computer networks [Rickard Neehr].
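As a rough check of the storage figure above, the following Python sketch computes the size of a two-hour constant-bit-rate stream. The 5 Mbit/s average bit rate is an assumption picked from within the 1.5 to 15 Mbit/s range quoted above; it is not a figure taken from the cited source.

```python
def stream_size_gigabytes(bitrate_mbit_s: float, duration_s: float) -> float:
    """Return the storage needed for a constant-bit-rate stream, in decimal GB."""
    bits = bitrate_mbit_s * 1_000_000 * duration_s
    return bits / 8 / 1_000_000_000


two_hours_s = 2 * 60 * 60
# Assumed average bit rate of 5 Mbit/s (illustrative value only).
size_gb = stream_size_gigabytes(5.0, two_hours_s)
print(f"{size_gb:.1f} GB")  # 4.5 GB
```

Under this assumption the film needs about 4.5 GB, which agrees with the "more than four gigabytes" estimate.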

In 1993, the MPEG group initiated work on the new MPEG-4 standard. The goal of MPEG-4 is to develop algorithms and tools for high-efficiency coding and representation of audio and video to meet the challenges of video conferencing applications. The most important additions to the standard are content-based coding (i.e. the ability to represent a scene as a set of audiovisual objects), universal accessibility (which includes video robustness to errors), and good coding efficiency. The next section gives a more detailed overview of MPEG-4 techniques.

I then review some applications in which MPEG-4 techniques, along with other techniques, are used to maintain quality of service in unstable and severe wireless/mobile computing environments. I also classify these applications and attempt to interpret the advantages and disadvantages of each class.


Overview of MPEG-4

MPEG-4 is a content-based multimedia standard. During encoding, a scene is decomposed into a set of objects, as illustrated in Figure 1. In Figure 1, a scene of a tennis game is decomposed into two planes: one is the object of the player and the other is the background of the scene.

Figure 1. Decomposition of a video frame.

After decomposition, these object planes can be coded with traditional compression algorithms to decrease the bit rate, and the coded objects are multiplexed on the encoding side. Besides the object planes, there is also metadata that defines the relationships among the objects in the scene. On the decoding side, the compressed object planes are used to compose the original scene according to these relationships. The working process is illustrated in Figure 2, which shows a high-level view of an MPEG-4 terminal and explains how the objects are assembled and how the resulting scene is rendered. A set of individually coded audiovisual objects (natural or synthetic) is obtained in multiplexed form from a storage or transmission medium. The objects are accompanied by scene description information, which describes how they should be combined in space and time to form the scene intended by the content creator. The scene description is thus used during composition and rendering, which results in individual frames or audio samples being presented to the user. In addition, the user may have the option to interact with the content, either locally or with the source, using an upstream channel (if available).

Figure 2. A high level view of an MPEG-4 terminal.
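The composition step described above can be illustrated with a minimal Python sketch. It is a toy model under stated assumptions, not the MPEG-4 scene description format: the `ObjectPlane` class, the one-character "pixels", and the `depth` ordering field are all hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

Pixel = Optional[str]  # None marks a transparent pixel outside the object's shape


@dataclass
class ObjectPlane:
    name: str
    pixels: List[List[Pixel]]  # decoded object plane (toy one-character pixels)
    x: int                     # horizontal position taken from the scene description
    y: int                     # vertical position taken from the scene description
    depth: int                 # drawing order: lower values are drawn first


def compose(width: int, height: int, planes: List[ObjectPlane]) -> List[str]:
    """Paint the object planes onto an empty frame in depth order."""
    frame = [[" "] * width for _ in range(height)]
    for plane in sorted(planes, key=lambda p: p.depth):
        for row, line in enumerate(plane.pixels):
            for col, pixel in enumerate(line):
                fx, fy = plane.x + col, plane.y + row
                if pixel is not None and 0 <= fx < width and 0 <= fy < height:
                    frame[fy][fx] = pixel
    return ["".join(row) for row in frame]


background = ObjectPlane("background", [["."] * 6 for _ in range(3)], x=0, y=0, depth=0)
player = ObjectPlane("player", [[None, "P"], ["P", "P"]], x=2, y=1, depth=1)

frame = compose(6, 3, [player, background])
print("\n".join(frame))
```

Because each object carries its own position and depth, the receiver could move or drop the player object without touching the background data, which is the property the applications surveyed later rely on.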

The MPEG-4 standard also has good error resilience features to improve video robustness in error-prone environments such as mobile/wireless computing environments. These features are used to detect and localize errors, to recover data after errors, and to visually conceal the effect of errors. To use many of these features, error detection must be implemented in the decoder to detect invalid, out-of-range, or excess data. The standard supports flexible resynchronization markers to detect and localize errors. Data partitioning can also be used with video packet mode to support additional error localization. Intra frame refresh techniques, such as cyclic and adaptive intra frame refresh, further reduce the propagation and persistence of errors by forcing macroblocks to be intra coded, which shortens the length of predictive sequences. Together, these features make MPEG-4 decoding very robust to errors.
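To make the cyclic intra refresh idea concrete, here is a minimal Python sketch of one possible refresh schedule. The function names and the simple round-robin pattern are assumptions made for illustration; an actual encoder is free to choose its own pattern. The example uses a QCIF frame (176x144 pixels), which contains 11 x 9 = 99 macroblocks of 16x16 pixels.

```python
from typing import List


def cyclic_intra_refresh(total_macroblocks: int, refresh_per_frame: int,
                         frame_index: int) -> List[int]:
    """Return the macroblock indices forced to intra mode in the given frame."""
    start = (frame_index * refresh_per_frame) % total_macroblocks
    return [(start + i) % total_macroblocks for i in range(refresh_per_frame)]


def frames_until_full_refresh(total_macroblocks: int, refresh_per_frame: int) -> int:
    """Number of frames after which every macroblock has been intra coded once."""
    return -(-total_macroblocks // refresh_per_frame)  # ceiling division


QCIF_MACROBLOCKS = 11 * 9  # 99 macroblocks in a 176x144 frame
print(cyclic_intra_refresh(QCIF_MACROBLOCKS, 11, 0))     # first row of macroblocks
print(frames_until_full_refresh(QCIF_MACROBLOCKS, 11))   # 9
```

With 11 macroblocks refreshed per frame, an error in this toy schedule cannot persist in any macroblock for more than 9 frames, at the cost of spending extra bits on intra-coded macroblocks in every frame.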

In general, MPEG-4 has achieved a unique position in delivering high-quality video and audio to consumers. Among its highlights, it offers highly efficient compression, error resilience, bandwidth scalability ranging from 5 kbit/s to 20 Mbit/s, network and transport protocol independence, as well as content security and object-based interactivity [Hassan Shojania et al., 2001]. These characteristics make MPEG-4 well suited to unstable, low-bandwidth mobile/wireless networking environments. With respect to streaming, its low-bit-rate audio and video coding capability and its built-in error resilience are especially attractive.

In the following section, I use some example MPEG-4 applications to show how MPEG-4, along with other techniques, provides good quality of service in mobile computing environments. I also try to classify these applications into several groups to obtain a better and more systematic understanding of this field.


Examples of MPEG-4 applications in wireless/mobile multimedia network

I survey some wireless/mobile multimedia networking applications in which MPEG-4 is used to achieve better quality of service. These examples show how characteristics of the MPEG-4 standard, such as object-based encoding and error resilience, are used.

Transcoding - a video bitstream requantization and framework for multimedia network 

Video transcoding maps a non-scalable video stream into another non-scalable stream coded at a lower bit rate than the first. It is a requantization-based technique. It is well known that multimedia networking needs large bandwidth to provide high-quality service, but in wireless/mobile networks the bandwidth is very low and the environment is severely unstable. To provide uninterrupted multimedia service, video transcoding is applied, with video quality sacrificed to a certain degree. MPEG-4 video bitstreams are the input of the transcoder: the important information in the bitstream is kept, while the rest is downsampled. MPEG-4 makes this idea possible because it is a content (object) based coding mechanism. On the multimedia server side, the scene is segmented based on some criteria. Important objects are extracted and transcoded at a low ratio, while objects that are not important are transcoded at a much higher ratio to decrease the required bandwidth. In this way, a wireless/mobile multimedia network can run a video transcoder to adapt to low and unstable bandwidth and provide uninterrupted service. The remaining issues are how to decide which objects in the video scene are important and how to react to bandwidth variation in the wireless/mobile network. The work done by Dr. Khan’s lab largely answers these questions:

Which object(s) are important?

An eye tracker can be used to track the audience's spot of interest. This method needs a special physical instrument to track human eyeball movement and then decides the spot of interest in the video scene from the eyeball's position (Javed I. Khan and Oleg Komogortsev). Besides this method, motion information can also be used to identify the moving objects in the scene. Based on psychology, moving objects are the most likely spots of interest for audiences (Javed I. Khan and Zhong Guo).
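The combination of motion-based importance and per-object requantization can be sketched in Python as follows. This is only a toy model under stated assumptions, not the algorithm of the cited papers: the `VideoObject` class, the `mean_motion` field, the motion threshold, and the two ratios are hypothetical, and requantization is reduced to rescaling integer coefficient levels from one quantizer step to a coarser one.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class VideoObject:
    name: str
    levels: List[int]    # quantized coefficient levels (toy data)
    q_step: int          # quantizer step the levels were coded with
    mean_motion: float   # hypothetical average motion-vector magnitude


def requantize(levels: List[int], q_in: int, q_out: int) -> List[int]:
    """Re-express levels coded with step q_in using the coarser step q_out."""
    out = []
    for level in levels:
        magnitude = abs(level) * q_in // q_out  # truncate toward zero
        out.append(magnitude if level >= 0 else -magnitude)
    return out


def transcode_scene(objects: List[VideoObject], motion_threshold: float,
                    fine_ratio: int, coarse_ratio: int) -> Dict[str, Tuple[int, List[int]]]:
    """Apply a low ratio to moving (important) objects and a high ratio to the rest."""
    result = {}
    for obj in objects:
        important = obj.mean_motion >= motion_threshold
        ratio = fine_ratio if important else coarse_ratio
        q_out = obj.q_step * ratio
        result[obj.name] = (q_out, requantize(obj.levels, obj.q_step, q_out))
    return result


scene = [
    VideoObject("player", [12, -7, 3, 1], q_step=2, mean_motion=5.0),
    VideoObject("background", [12, -7, 3, 1], q_step=2, mean_motion=0.2),
]
transcoded = transcode_scene(scene, motion_threshold=1.0, fine_ratio=1, coarse_ratio=4)
print(transcoded["player"])      # (2, [12, -7, 3, 1])
print(transcoded["background"])  # (8, [3, -1, 0, 0])
```

In this toy run the moving player keeps all of its coefficient levels, while the static background loses its small coefficients to zero, which is where the bit-rate saving would come from after entropy coding.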

How to react to the highly variable wireless/mobile bandwidth?

Two kinds of mechanisms are used to make the video transcoder adapt to network bandwidth variation: one is an active network framework; the other is iTCP.

i) active network framework mechanism

Dr. Khan and S. Yang proposed an active network framework. Media transcoders reside on some active nodes. Packets can be routed to those active nodes, where the video bitstream is requantized based on current network traffic and other conditions, such as the destination's computing power and display size. This mechanism is illustrated in Figure 3. In this figure, the multimedia server serves three different clients. Between the server and the clients there is a splitter, an active router whose function is to decide which route the packets will take according to network conditions. Some nodes in the network provide media transcoders for the packets.

Figure 3. The base topology between one server and three players.
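The splitter's routing decision can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions rather than the framework's actual interface: the `Client` and `ActiveNode` classes, the `choose_route` function, and the rule "transcode only when the client's bandwidth is below the source bit rate" are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Client:
    name: str
    bandwidth_kbps: int  # bandwidth currently available toward this client


@dataclass
class ActiveNode:
    name: str
    has_transcoder: bool


def choose_route(client: Client, source_kbps: int,
                 nodes: List[ActiveNode]) -> Tuple[str, int]:
    """Splitter decision for one client: (next hop, target bit rate in kbit/s)."""
    if client.bandwidth_kbps >= source_kbps:
        return ("direct", source_kbps)            # no adaptation needed
    for node in nodes:
        if node.has_transcoder:
            return (node.name, client.bandwidth_kbps)  # requantize down to fit
    return ("direct", source_kbps)                # no transcoder reachable


clients = [Client("desktop", 1500), Client("laptop", 600), Client("handheld", 128)]
nodes = [ActiveNode("router-a", False), ActiveNode("node-b", True)]
routes = {c.name: choose_route(c, 1000, nodes) for c in clients}
print(routes)
```

A fuller model would also take the destination's computing power and display size into account, as the text describes; they are left out here to keep the sketch short.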

ii) iTCP

Dr. Khan and R. Zagal proposed an interactive TCP (iTCP) layer as the mechanism to invoke the media transcoder in response to network traffic variation. Interactive TCP defines some custom network messages and the corresponding handlers. When network traffic changes, the iTCP layer raises an iTCP message, and the proper message handler in the library is called. In the message handler, the video transcoder is invoked to make the multimedia stream adapt to the bandwidth change. The idea is illustrated in Figure 4.

Figure 4. The base topology between one server and three players.
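The message-and-handler pattern described for iTCP can be mimicked with a small Python sketch. None of the names below come from the iTCP implementation: `InteractiveLayer`, `Transcoder`, and the `"BANDWIDTH_CHANGED"` message are hypothetical stand-ins that only show how a transport-level event could trigger the transcoder.

```python
from typing import Callable, Dict, List


class Transcoder:
    """Toy transcoder that only remembers its target bit rate."""

    def __init__(self, bitrate_kbps: int) -> None:
        self.bitrate_kbps = bitrate_kbps

    def set_target(self, bitrate_kbps: int) -> None:
        self.bitrate_kbps = bitrate_kbps


class InteractiveLayer:
    """Hypothetical stand-in for a transport layer that raises messages."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[int], None]]] = {}

    def subscribe(self, message: str, handler: Callable[[int], None]) -> None:
        self._handlers.setdefault(message, []).append(handler)

    def raise_message(self, message: str, value: int) -> None:
        # Call every handler registered for this message type.
        for handler in self._handlers.get(message, []):
            handler(value)


layer = InteractiveLayer()
transcoder = Transcoder(bitrate_kbps=512)
layer.subscribe("BANDWIDTH_CHANGED", transcoder.set_target)

# The layer notices that the available bandwidth dropped to 200 kbit/s.
layer.raise_message("BANDWIDTH_CHANGED", 200)
print(transcoder.bitrate_kbps)  # 200
```

The design point is that the application does not poll the network; it registers a handler once, and the transport layer invokes it whenever conditions change.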

Transcaling - a video coding and framework for wireless IP multimedia services 

Unlike the transcoding discussed above, transcaling derives one or more scalable streams covering different bandwidth ranges from another scalable stream. Hayder Radha used transcaling to adapt to the frequent bandwidth variations in wireless IP multimedia networks. He also defined a framework in which a transcaling-based gateway is deployed to achieve better service.

The transcaling-based technique for video over the wireless Internet is described within the context of Receiver Driven Multicast (RDM). A simplified RDM architecture is illustrated in Figure 5. RDM of video generates a layered coded video bitstream that consists of multiple streams. The minimum-quality stream is known as the base layer (BL) and the other streams are the enhancement layers (ELs). A receiver can “subscribe” to and “unsubscribe” from ELs based on network traffic. This approach results in efficient distribution of video by using minimal bandwidth resources over a varying network environment. In a wireless LAN, RDM is deployed on the edge router, where an environmental monitor tracks network traffic changes and adapts to them by using a different number of ELs.


Figure 5. A simplified view of the RDM architecture and a transcaling-based node.
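The subscribe/unsubscribe behaviour of an RDM receiver can be sketched in Python as a simple layer-selection rule. The function name `select_layers` and the example bit rates are assumptions made for illustration; the sketch also assumes that enhancement layers are cumulative, so a layer is only useful if all lower layers are received.

```python
from typing import List


def select_layers(base_kbps: int, enhancement_kbps: List[int],
                  available_kbps: int) -> int:
    """Return how many enhancement layers fit on top of the base layer."""
    used = base_kbps
    count = 0
    for rate in enhancement_kbps:
        if used + rate > available_kbps:
            break  # this layer and all higher ones are unsubscribed
        used += rate
        count += 1
    return count


ENHANCEMENT_LAYERS = [64, 128, 256]  # hypothetical EL bit rates in kbit/s

print(select_layers(64, ENHANCEMENT_LAYERS, 300))   # 2
print(select_layers(64, ENHANCEMENT_LAYERS, 100))   # 0
print(select_layers(64, ENHANCEMENT_LAYERS, 1000))  # 3
```

The coarse steps between layers are visible here: at 300 kbit/s the receiver uses only 256 kbit/s because the next layer does not fit, which is the kind of gap that finer-grained layering is meant to close.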

How does MPEG-4 make this idea possible? To meet the bandwidth variation requirements of the Internet and wireless networks, the MPEG-4 fine granular scalability (FGS) video coding method is used to cover any desired bandwidth range while maintaining a very simple scalability structure. As illustrated in Figure 6, the FGS structure consists of only two layers: a base layer coded at a bit rate Rb and a single enhancement layer coded using a fine-grained (or totally embedded) scheme up to a maximum bit rate of Rc. The encoder only needs to know the range of bandwidth over which it has to code the content; it does not need to be aware of the particular bit rate at which the content will be streamed. The streaming server (i.e. the edge gateway), on the other hand, has total flexibility in sending any desired portion of any enhancement-layer frame. On the receiver side, the FGS framework composes the video according to MPEG-4.

Figure 6. Examples of the MPEG-4 FGS scalability structure.
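The server-side flexibility of FGS can be illustrated with a short Python sketch in which the server cuts each enhancement-layer frame to fit the current bandwidth. This is a simplification under stated assumptions: the function names are hypothetical, the base layer is assumed to consume exactly Rb, the bandwidth is shared evenly across frames, and truncation is done on whole bytes, whereas the actual bitstream syntax is more detailed.

```python
def fgs_send_rate(r_b_kbps: int, r_c_kbps: int, available_kbps: int) -> int:
    """Clamp the available bandwidth into the coded range [Rb, Rc]."""
    return max(r_b_kbps, min(available_kbps, r_c_kbps))


def truncate_enhancement(el_frame: bytes, r_b_kbps: int, send_kbps: int,
                         frame_rate: int) -> bytes:
    """Keep only the leading part of one enhancement-layer frame that fits."""
    budget_bytes = (send_kbps - r_b_kbps) * 1000 // (frame_rate * 8)
    return el_frame[:budget_bytes]


R_B, R_C = 100, 1000       # hypothetical base and maximum bit rates in kbit/s
el_frame = bytes(5000)     # one embedded enhancement-layer frame (toy data)

rate = fgs_send_rate(R_B, R_C, available_kbps=340)
sent = truncate_enhancement(el_frame, R_B, rate, frame_rate=10)
print(rate, len(sent))  # 340 3000
```

Because the enhancement layer is embedded, any prefix of the frame is decodable, so the encoder never has to know the value 340 in advance; it only needs the range from Rb to Rc.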

Shape-based scene composition application on mobile platforms

I. Pandzic proposed a facial animation framework for the web and mobile platforms. In this application, facial animation is implemented in MPEG-4. MPEG-4 can encode the shape information of an object. The facial animation application uses a vertex array to describe a 3D facial model, as illustrated in Figure 7. The MPEG-4 facial animation framework defines 66 low-level Facial Animation Parameters (FAPs) and two high-level FAPs. The low-level FAPs are based on the study of minimal facial actions and are closely related to muscle actions. They represent a complete set of basic facial actions and therefore allow the representation of most natural facial expressions. Each facial scene is thus described by encoding the vertex array of the facial model together with these FAPs. The paper presents several good facial animation examples on mobile platforms. We can imagine that these “vector” 3D models are much smaller than “bitmap” representations of facial animation scenes, which is why they can obtain good results on mobile platforms.


Figure 7. Facial data vertices in the facial animation application.
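The size argument above can be made concrete with a back-of-the-envelope Python calculation. All sizes are assumptions chosen for illustration: two bytes per uncompressed FAP value, and an uncompressed QCIF (176x144) RGB bitmap at three bytes per pixel. The vertex array of the face model itself is not counted, and neither side is compressed, so the result is only a rough indication.

```python
LOW_LEVEL_FAPS = 66
HIGH_LEVEL_FAPS = 2


def fap_frame_bytes(bytes_per_fap: int) -> int:
    """Size of one animation frame when every FAP value is sent uncompressed."""
    return (LOW_LEVEL_FAPS + HIGH_LEVEL_FAPS) * bytes_per_fap


def bitmap_frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed bitmap frame."""
    return width * height * bytes_per_pixel


fap_bytes = fap_frame_bytes(2)                   # assumed 2 bytes per FAP
bitmap_bytes = bitmap_frame_bytes(176, 144, 3)   # assumed QCIF RGB frame
print(fap_bytes, bitmap_bytes, bitmap_bytes // fap_bytes)  # 136 76032 559
```

Under these assumptions a parameter frame is several hundred times smaller than a raw bitmap frame, which is consistent with the intuition that parameter-driven animation suits low-bandwidth mobile platforms.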


From the above example applications, we can see that MPEG-4 encoding is flexible enough to be used in unstable wireless/mobile multimedia networks to adapt to the severe environment. This is because MPEG-4 is a content (object) based standard, unlike its predecessors MPEG-1 and MPEG-2, which are pixel-based standards. Wireless applications can therefore operate on these objects to trade off quality of service against the required bit rate of the scene. MPEG-4 also has error resilience features, which make it robust in the wireless/mobile environment.


[1] Andreas Vogel et al., Distributed Multimedia Applications and Quality of Service - A Survey (October 1994);

[2] Gerald Kühne, Christoph Kuhmünch, Transmitting MPEG-4 Video Streams over the Internet: Problems and Solutions (October 1999);

[3] Hassan Shojania, Baochun Li, Experiences with MPEG-4 Multimedia Streaming (October 2001);


[5] Igor S. Pandzic, Facial Animation Framework for the Web and Mobile Platforms (February 2002);

[6] Javed I. Khan and Oleg Komogortsev, A Hybrid Scheme for Perceptual Object Window Design with Joint Scene Analysis and Eye-Gaze Tracking for Media Encoding based on Perceptual Attention, Proceedings of the IS&T/SPIE Symposium of Visual Communications and Image Processing 2004, EI04 Electronic Imaging 2004, January 2004, San Jose, California (accepted, to appear);

[7] Javed I. Khan and Seung S. Yang, A Framework for Building Complex Netcentric Systems on Active Network, DARPA Active Network Research Conference and Exposition, DANCE 2002, San Francisco, May 28-31, 2002, pp. 409-426;

[8] Javed I. Khan and Zhong Guo, Flock-of-Bird Algorithm for Fast Motion Based Object Tracking and Transcoding in Video Streaming, The 13th IEEE International Packet Video Workshop 2003, Nantes, France, April 2003;


[10] Rickard Neehr, MPEG4 – Short introduction, found at http://www.cdt.luth.se/~peppar/kurs/smd074/seminars/1/2/1/mpeg4.pdf, 2001-10-27; 

[11] Peter Schojer et al., Architecture of a Quality Based Intelligent Proxy (QBIX) for MPEG-4 Videos (May 2003);

[12] Steven Gringeri, Roman Egorov, et al., Robust Compression and Transmission of MPEG-4 Video (October 1999);

Research Groups

BS-Immersive Media & 3D Video Group
MediaNet Lab of Kent State University
IBM MPEG-Technologies


This survey paper used "MPEG 4", "mobile networking", "QoS", and "mobile multimedia internet" as keywords to search the digital libraries of www.ACM.org and www.COMPUTER.org.