Prepared for Prof. Javed I. Khan
Department of Computer Science, Kent State University
Date: November 2003
The ISO MPEG (Moving Picture Experts Group) committee completed its
first-generation video compression standard, MPEG-1, in the early 1990s.
MPEG-1 was intended for bit-rates up to 1.5 Mbit/s. Manufacturers tried
selling motion pictures pressed on CDs in this format under the name VCD,
but the quality of VCD motion pictures was lower than that of regular VCR
tapes. The VCD was never a huge commercial success: it had conceptual
flaws, and there was no real demand for such a product. Far more successful
was MPEG-1 Layer 3, a codec for storing music (audio) developed at the
Fraunhofer Institute in Germany. MPEG-1 Layer 3, also known as MP3, was,
and still is, widespread for digital storage of music [Rickard Neehr].
In the following years, MPEG published MPEG-2, whose bit-rate ranges from 1.5 to 15 Mbit/s, so MPEG-2 offers much better quality than MPEG-1. MPEG-2 addressed the multiplexing of one or more elementary streams of video and audio, as well as other data streams, into single or multiple streams suitable for storage or transmission. But high-bit-rate MPEG-2 is very space consuming: a two-hour DVD-quality film requires more than four gigabytes of space. This is no problem on DVDs, but it becomes an issue when transferring video over computer networks [Rickard Neehr].
In 1993, the MPEG group initiated the new MPEG-4 standards. The goal of MPEG-4 is to develop algorithms and tools for high-efficiency coding and representation of audio and video that meet the challenges of video conferencing applications. The most important additions to the standards are content-based coding (i.e., the ability to represent a scene as a set of audiovisual objects), universal accessibility (which includes video robustness to errors), and good coding efficiency. The next section gives a more detailed overview of MPEG-4 techniques.
Besides that, I will review some applications in which MPEG-4
techniques, along with some other techniques, are used to obtain a
content-aware quality of service in unstable and severe wireless/mobile
computing environments. I also classify these applications and attempt
to interpret the advantages and disadvantages of each class.
MPEG-4 is a content-based multimedia standard. When a picture scene is encoded in MPEG-4, it is decomposed into a set of objects, as illustrated in Figure 1.
The MPEG-4 standards also have good error resilience features to improve video robustness in error-prone environments, such as mobile/wireless computing environments. These features are used to detect and localize errors, to recover data after errors, and to visually conceal the effect of errors. To utilize many of these features, error detection must be implemented in the decoder to detect invalid, out-of-range, or excess data. The standards support the use of flexible re-synchronization markers to detect and localize errors. Data partitioning can also be used with video packet mode to support additional error localization. Intra-frame refresh techniques such as cyclic and adaptive intra-frame refresh further reduce the propagation and persistence of errors by forcing macroblocks to be intra coded, which shortens the length of predictive sequences. Together, these features make MPEG-4 decoding very robust to errors.
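As a small illustration of one of these tools, the scheduling behind cyclic intra refresh can be sketched as follows. This is only a sketch of the idea, not the standard's implementation; the frame size and refresh rate below are illustrative assumptions.

```python
# Sketch of cyclic intra refresh (CIR): force a rotating subset of
# macroblocks to be intra coded each frame, so that every macroblock
# is refreshed within a bounded number of frames and prediction-error
# propagation is cut short.

def cyclic_intra_refresh(num_macroblocks, refresh_per_frame, num_frames):
    """Yield, per frame, the set of macroblock indices forced to intra
    coding, cycling through the frame in order."""
    position = 0
    for _ in range(num_frames):
        forced = {(position + i) % num_macroblocks
                  for i in range(refresh_per_frame)}
        position = (position + refresh_per_frame) % num_macroblocks
        yield forced

# A QCIF frame (176x144) has 11x9 = 99 macroblocks; refreshing 9 per
# frame bounds error persistence to roughly 11 frames.
schedule = list(cyclic_intra_refresh(99, 9, 11))
all_refreshed = set().union(*schedule)
```

Adaptive intra refresh works similarly but would prioritize macroblocks with high motion instead of cycling blindly.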
So, in general, the MPEG-4 standards have achieved a unique position in realizing revolutionary advances in delivering high-quality video and audio to consumers. Among their highlights, they offer highly efficient compression, error resilience, bandwidth scalability ranging from 5 Kbit/s to 20 Mbit/s, network and transport protocol independence, as well as content security and object-based interactivity [Hassan Shojania et al., 2001]. These characteristics give MPEG-4 many opportunities to be applied in unstable, low-bandwidth mobile/wireless networking environments. With respect to streaming, its low-bit-rate audio and video coding capability and its built-in error resilience are especially attractive.
In the following section, I use some example MPEG-4 applications to show how MPEG-4, along with other techniques, provides a good quality of service in mobile computing environments. I also try to classify these applications into several groups in order to obtain a better and more systematic understanding of this field.
I survey some wireless/mobile multimedia networking applications in
which MPEG-4 is used to achieve a better quality of service. In these
examples, we will see how the MPEG-4 standards' characteristics, such
as object-based encoding and error resilience, are put to use.
Video transcoding maps a non-scalable video stream into another non-scalable stream coded at a lower bitrate than the original; it is a requantization-based technique. It is well known that a multimedia network needs a large bandwidth to provide high-quality service. But in a wireless/mobile network, bandwidth is very low and the environment is severely unstable. In order to get uninterrupted multimedia service, video transcoding techniques are applied, with video quality sacrificed to a certain degree. MPEG-4 video bit streams are the objects of these video transcoding techniques: the important information in the video bit stream is kept, while the rest is downsampled. The MPEG-4 standards make this idea possible because their coding mechanism is content (object) based. On the multimedia server side, the scene is segmented based on some criteria. Important objects are extracted and given a low transcoding ratio, while a much higher transcoding ratio is applied to unimportant objects to decrease the required bandwidth. Using this idea, a wireless/mobile multimedia network can run a video transcoder that adapts to the low and unstable bandwidth to provide uninterrupted service. The issues, then, are how to decide which objects in the video scene are important and how to react to bandwidth variation in the wireless/mobile network. The work done in Dr. Khan's lab has largely answered these questions:
An eye tracker can be used to track the spot the audience is interested in. This method needs a special physical instrument to track the viewer's eyeball movement and then determines the interest spot in the video scene from the eyeball's position (Javed I. Khan and Oleg Komogortsev). Alternatively, motion information can be used to identify the moving object(s) in the scene; psychology suggests that moving objects are most likely the audience's spot of interest (Javed I. Khan and Zhong Guo).
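Once the important objects are known, the per-object transcoding idea amounts to a bit-allocation problem. A minimal sketch, assuming illustrative object names, bit counts, and importance weights (none of which come from the cited papers):

```python
# Hypothetical sketch of object-based bit allocation for transcoding:
# objects with high importance receive a low transcoding ratio (keep
# more bits), background objects a high one, so the total fits the
# target bitrate.

def allocate_bits(objects, target_bitrate):
    """objects: list of (name, original_bits, importance in (0, 1]).
    Returns a dict mapping object name to its allocated bits."""
    total_weight = sum(bits * imp for _, bits, imp in objects)
    budget = {}
    for name, bits, imp in objects:
        share = target_bitrate * (bits * imp) / total_weight
        # never allocate more bits than the original stream carried
        budget[name] = min(bits, share)
    return budget

# A talking-head scene squeezed into a 400 kbit/s wireless channel.
scene = [("speaker", 600_000, 1.0), ("background", 900_000, 0.2)]
budget = allocate_bits(scene, 400_000)
```

The speaker object, though smaller in the original stream, ends up with the larger share of the channel, which is the whole point of content-based transcoding.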
Two kinds of mechanisms are used to make the video transcoder adapt to network bandwidth variation: one is an active network framework; the other is iTCP.
Dr. Khan and S. Yang proposed an active network framework. A media
transcoder resides on certain active nodes. Packets can be routed to
those active nodes, where the video bitstream is re-quantized based on
the current network traffic and other conditions, such as the
destination's computing power, display size, etc. This mechanism is
illustrated in Figure 3. In this figure, the multimedia server serves
three different clients. Between the server and the clients, there is a
splitter, an active router whose function is to decide which route the
packets will take according to network conditions. In the network, some
nodes can provide a media transcoder for the packets.
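The splitter's routing decision can be sketched roughly as follows. This is my own simplification of the framework, and all field names, node names, and rates are illustrative assumptions:

```python
# Rough sketch of the splitter's decision in an active-network setup:
# route the stream through a transcoder-equipped active node only when
# the client's link cannot carry the stream directly.

def choose_route(stream_kbps, client):
    """client: dict with link capacity and a nearby active node
    hosting a media transcoder."""
    if client["link_kbps"] >= stream_kbps:
        return ["server", client["name"]]  # direct route, no transcoding
    # otherwise detour through an active node that re-quantizes the
    # stream down to what the client's link can carry
    return ["server", client["transcoder_node"], client["name"]]

pda = {"name": "pda", "link_kbps": 128, "transcoder_node": "active-node-1"}
desktop = {"name": "desktop", "link_kbps": 2000,
           "transcoder_node": "active-node-1"}
```

A real splitter would also weigh the destination's computing power and display size, as the framework describes, but the shape of the decision is the same.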
Dr. Khan and R. Zagal proposed an interactive TCP (iTCP)
layer as the mechanism to invoke a media transcoder in response to
network traffic variation. Interactive TCP defines some custom network
messages and their corresponding handlers. When network traffic
changes, an iTCP message is raised by the iTCP layer and the proper
message handler in the library is called. In the message handler, the
video transcoder is invoked to make the multimedia stream adapt to the
bandwidth change. The idea is illustrated in figure 4.
Figure 4. The base topology between one server and three players.
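The event-and-handler pattern behind iTCP can be sketched as below. This is a toy model of the idea only; the class, event names, and handler signature are my assumptions, not the actual iTCP API:

```python
# Sketch of the iTCP idea: the transport layer raises an event when
# network conditions change, and an application-registered handler
# reacts, e.g. by invoking the transcoder to lower the send rate.

class ITCPSession:
    def __init__(self):
        self.handlers = {}
        self.send_rate_kbps = 1500  # initial stream rate

    def subscribe(self, event, handler):
        self.handlers[event] = handler

    def raise_event(self, event, **info):
        # in real iTCP this would be triggered from inside the
        # transport layer; here we call it directly for illustration
        if event in self.handlers:
            self.handlers[event](self, **info)

def on_congestion(session, available_kbps):
    # handler: "invoke the transcoder" by clamping the output bitrate
    session.send_rate_kbps = min(session.send_rate_kbps, available_kbps)

session = ITCPSession()
session.subscribe("congestion", on_congestion)
session.raise_event("congestion", available_kbps=400)
```

The key design point is that the application, not the transport layer, owns the reaction: TCP merely reports the change, and the handler decides to transcode.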
Unlike the transcoding just discussed, transcaling derives one or more scalable streams covering different bandwidth ranges from another scalable stream. Hayder Radha used transcaling techniques to adapt to the frequent bandwidth variations of wireless IP multimedia networks. He also defined a framework that deploys a transcaling-based gateway to achieve better service.
The transcaling-based technique for video over the wireless Internet is
described within the context of Receiver-Driven Multicast (RDM). A
simplified RDM architecture is illustrated in figure 5. RDM generates a
layered coded video bitstream that consists of multiple streams. The
minimum-quality stream is known as the base layer (BL), and the other
streams are the enhancement layers (ELs). A receiver can "subscribe" to
and "unsubscribe" from ELs based on network traffic. This approach
results in efficient distribution of video, utilizing minimal bandwidth
resources over varying network conditions. In a wireless LAN, RDM is
deployed on the edge router, where an environmental monitor watches for
network traffic changes and adapts to them by varying the number of ELs
used.
Figure 5. A simplified view of the RDM architecture and a transcaling-based node.
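The receiver's subscription decision reduces to fitting as many enhancement layers as the measured bandwidth allows. A minimal sketch, with illustrative layer rates that are my own assumptions:

```python
# Sketch of receiver-driven layer subscription in RDM: given the base
# layer rate and the rate of each enhancement layer, subscribe to the
# largest prefix of ELs that fits the available bandwidth.

def subscribed_layers(base_kbps, el_kbps_list, available_kbps):
    """Return the number of enhancement layers to subscribe to."""
    used = base_kbps
    count = 0
    for el in el_kbps_list:
        if used + el > available_kbps:
            break
        used += el
        count += 1
    return count

# base layer at 64 kbit/s plus three enhancement layers
els = [64, 128, 256]
n = subscribed_layers(64, els, available_kbps=300)
```

When bandwidth drops, the receiver simply unsubscribes from the top layers; when it rises, it subscribes again, with no change needed at the sender.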
How, then, does MPEG-4 make this idea possible? In order to meet the bandwidth variation requirements of the Internet and wireless networks, the MPEG-4 fine granular scalability (FGS) video coding method is used to cover any desired bandwidth range while maintaining a very simple scalability structure. As illustrated in figure 6, the FGS structure consists of only two layers: a base layer coded at a bitrate Rb and a single enhancement layer coded using a fine-grained (i.e., totally embedded) scheme up to a maximum bitrate of Rc. The encoder only needs to know the range of bandwidth over which it has to code the content; it does not need to be aware of the particular bitrate at which the content will be streamed. The streaming server (i.e., the edge gateway), on the other hand, has total flexibility in sending any desired portion of any enhancement-layer frame. On the receiver side, the FGS framework composes the video according to MPEG-4.
Figure 6. Examples of the MPEG-4 FGS scalability structure.
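Because the enhancement layer is embedded, the server's job at send time is just a truncation. The following sketch shows that arithmetic with illustrative values of Rb and Rc:

```python
# Sketch of FGS streaming at the server/gateway: the fine-grained
# enhancement layer can be cut at any point, so the sender transmits
# the base layer intact plus whatever slice of the enhancement layer
# fits the channel between Rb and Rc.

def fgs_send_plan(rb_kbps, rc_kbps, channel_kbps):
    """Return (base_kbps, enhancement_kbps) actually transmitted."""
    if channel_kbps < rb_kbps:
        raise ValueError("channel cannot carry even the base layer")
    # send the base layer whole; fill remaining capacity with the
    # enhancement layer, capped at its maximum rate Rc
    enhancement = min(channel_kbps, rc_kbps) - rb_kbps
    return rb_kbps, enhancement

# content coded for the range Rb = 100 kbit/s to Rc = 1000 kbit/s,
# streamed over a 450 kbit/s channel
base, enh = fgs_send_plan(rb_kbps=100, rc_kbps=1000, channel_kbps=450)
```

This is exactly why the encoder only needs the range [Rb, Rc]: every channel rate inside that range is served from the same two-layer bitstream.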
I. Pandzic proposed a facial animation framework for the web and mobile platforms. In this application, facial animation is implemented in MPEG-4. As we know, MPEG-4 can encode arbitrary shape information for an object. The facial animation application uses a vertex array to describe a 3D facial model, illustrated in figure 7, and the MPEG-4 facial animation framework defines 66 low-level Facial Animation Parameters (FAPs) and two high-level FAPs. The low-level FAPs are based on the study of minimal facial actions and are closely related to muscle actions. They represent a complete set of basic facial actions, and therefore allow the representation of most natural facial expressions. Each frame, the array of vertices of the facial model and these FAPs are encoded to describe a facial scene. The paper presents several good facial animation examples on mobile platforms. We can imagine that these "vector" 3D models are much smaller than "bitmap" representations of facial animation scenes, which is why the approach obtains good results on mobile platforms.
Figure 7. Facial data vertices in the facial animation application.
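A back-of-the-envelope comparison makes the "vector versus bitmap" claim concrete. The per-FAP byte count and frame dimensions below are rough assumptions of mine, not figures from the paper:

```python
# Why FAP-driven animation is compact: after the static 3D face model
# is downloaded once, each animation frame only carries FAP values,
# not pixels.

NUM_FAPS = 68        # 66 low-level + 2 high-level FAPs in MPEG-4
BYTES_PER_FAP = 2    # assume 16-bit quantized values, uncompressed

def fap_stream_bytes(frames):
    """Bytes for an animation of the given length sent as raw FAPs."""
    return frames * NUM_FAPS * BYTES_PER_FAP

def bitmap_stream_bytes(frames, width=176, height=144, bytes_per_pixel=3):
    """Bytes for the same animation sent as raw QCIF RGB frames."""
    return frames * width * height * bytes_per_pixel

fap = fap_stream_bytes(25)       # one second at 25 fps
video = bitmap_stream_bytes(25)  # the same second as raw frames
```

Even before any entropy coding, the parameter stream is several orders of magnitude smaller than the pixel stream, which is what makes the approach viable on 2003-era mobile links.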
From the above example applications, we can see that MPEG-4 encoding is
flexible enough to be used in unstable wireless/mobile multimedia
networks and to adapt to their severe environments. This is because
MPEG-4 is a content (object)-based standard, unlike its predecessors
MPEG-1 and MPEG-2, which are pixel-based. Wireless applications can
therefore operate on these objects to trade off quality of service
against the required bitrate of the scene. MPEG-4 also has error
resilience capability, which further makes it robust in the
wireless/mobile environment.
Andreas Vogel et al., Distributed Multimedia Applications and Quality of Service - A Survey (October 1994);
Gerald Kühne and Christoph Kuhmünch, Transmitting MPEG-4 Video Streams over the Internet: Problems and Solutions (October 1999);
Hassan Shojania and Baochun Li, Experiences with MPEG-4 Multimedia Streaming (October 2001);
Hayder Radha, Transcaling: A Video Coding and Multicasting Framework for Wireless IP Multimedia Services (July 2001);
Igor S. Pandzic, Facial Animation Framework for the Web and Mobile Platforms (February 2002);
Javed I. Khan and Oleg Komogortsev, A Hybrid Scheme for Perceptual Object Window Design with Joint Scene Analysis and Eye-Gaze Tracking for Media Encoding based on Perceptual Attention, Proceedings of the IS&T/SPIE Symposium on Visual Communications and Image Processing 2004, Electronic Imaging 2004, January 2004, San Jose, California (accepted, to appear);
Javed I. Khan and Seung S. Yang, A Framework for Building Complex Netcentric Systems on Active Network, DARPA Active Network Research Conference and Exposition, DANCE 2002, San Francisco, May 28-31, 2002, pp. 409-426;
Javed I. Khan and Zhong Guo, Flock-of-Bird Algorithm for Fast Motion Based Object Tracking and Transcoding in Video Streaming, The 13th IEEE International Packet Video Workshop 2003, Nantes, France, April 2003;
Peter Kauff et al., An Immersive 3D Video-Conferencing System Using Shared Virtual Team User Environments (September 2002);
Rickard Neehr, MPEG4 - Short Introduction, http://www.cdt.luth.se/~peppar/kurs/smd074/seminars/1/2/1/mpeg4.pdf, 2001-10-27;
Peter Schojer et al., Architecture of a Quality Based Intelligent Proxy (QBIX) for MPEG4 Videos (May 2003);
Steven Gringeri, Roman Egorov, et al., Robust Compression and Transmission of MPEG-4 Video (October 1999);
MediaNet Lab of Kent State University