email: xxu1@kent.edu
Prepared for Prof. Javed I. Khan
Department of Computer Science, Kent State University
Date: November 2003
The ISO MPEG (Moving Picture Experts Group) committee completed the
first-generation video compression standard, MPEG-1, in the early 1990s. MPEG-1
was intended for bit-rates up to 1.5 Mbit/s. Manufacturers tried selling
motion pictures pressed on CDs in this format under the name VCD,
but the quality of VCD motion pictures was lower than that of regular VCR
tapes. The VCD was never a huge commercial success due to a conceptual
flaw: there was no real demand for such a product. Far more successful
was MPEG-1 Layer 3, a codec for storing music (audio) developed at the
Fraunhofer Institute in Germany. MPEG-1 Layer 3, also known as MP3, was,
and still is, widespread for digital storage of music [Rickard Neehr].
In the following years,
MPEG published MPEG-2, whose bit-rate ranges between 1.5 and 15 Mbit/s,
so MPEG-2 offers much better quality than MPEG-1. MPEG-2 addressed
the functions of multiplexing one or more elementary streams of video
and audio, as well as other data streams, into single or multiple
streams suitable for storage or transmission. But high bit-rate MPEG-2
is very space consuming: a two-hour DVD-quality film requires
more than four gigabytes of space. This is no problem on DVDs, but it
becomes an issue when transferring video over computer networks [Rickard Neehr].
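The storage figure above follows from simple arithmetic. As a quick check, assuming a roughly 5 Mbit/s DVD-quality MPEG-2 stream (the exact rate is an assumption for illustration):

```python
# Back-of-envelope check of the "more than four gigabytes" claim for a
# two-hour film, assuming a ~5 Mbit/s DVD-quality MPEG-2 stream.
bitrate_mbit = 5                  # assumed average bitrate in Mbit/s
seconds = 2 * 60 * 60             # two hours
total_gb = bitrate_mbit * seconds / 8 / 1000  # Mbit -> MB -> GB (decimal)
print(total_gb)  # 4.5
```

At lower average rates the figure shrinks, but any DVD-quality setting stays in the multi-gigabyte range, which is what makes network transfer an issue.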
In 1993, the MPEG group initiated the new
MPEG-4 standards. The goal of MPEG-4 is to develop algorithms and tools
for high-efficiency coding and representation of audio and video that
meet the challenges of video conferencing applications. The most
important additions to the standards are content-based coding (i.e., the
ability to represent a scene as a set of audiovisual objects),
universal accessibility (which includes video robustness to errors), and
good coding efficiency. I will give a more detailed overview of MPEG-4
techniques in the next section.
Besides that, I will review some
applications in which MPEG-4 techniques, along with other
techniques, are used to obtain good quality of service in unstable
and severe wireless/mobile computing environments. I will also
classify these applications and attempt to interpret the advantages
and disadvantages of each class.
MPEG-4 is a content-based multimedia standard. When encoding, a scene in MPEG-4 is decomposed into a set of objects, as illustrated in Figure 1.
The MPEG-4 standards also have good error-resilience features to improve video robustness in error-prone environments, such as mobile/wireless computing environments. These features are used to detect and localize errors, to recover data after errors, and to visually conceal the effect of errors. To utilize many of these features, error detection must be implemented in the decoder to detect invalid, out-of-range, or excess data. The standards support the use of flexible re-synchronization markers to detect and localize errors. Data partitioning can also be used with video packet mode to support additional error localization. Intra-frame refresh techniques, such as cyclic and adaptive intra refresh, further reduce the propagation and persistence of errors by forcing macroblocks to be intra coded, which shortens the length of predictive sequences. All these features make the MPEG-4 standards very robust to errors when decoding.
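The resynchronization idea can be sketched roughly as follows. The marker byte pattern, packet layout, and validity check here are invented stand-ins for this sketch, not the actual MPEG-4 syntax; the point is only that a corrupted packet costs one packet, not the rest of the frame:

```python
# Illustrative sketch of resync-marker error recovery. MARKER and the
# packet format are made up; real MPEG-4 markers differ.
MARKER = b"\x00\x00\x01"  # stand-in for a resynchronization marker

def split_packets(bitstream: bytes):
    """Split a bitstream into video packets at each resync marker."""
    parts = bitstream.split(MARKER)
    return [p for p in parts if p]  # drop the empty chunk before the first marker

def decode_with_recovery(bitstream: bytes, is_valid):
    """Decode packet by packet; on error, skip to the next marker."""
    decoded, lost = [], 0
    for packet in split_packets(bitstream):
        if is_valid(packet):
            decoded.append(packet)
        else:
            lost += 1  # the error is confined to this one packet
    return decoded, lost

stream = MARKER + b"good1" + MARKER + b"BAD!!" + MARKER + b"good2"
ok, lost = decode_with_recovery(stream, lambda p: not p.startswith(b"BAD"))
print(len(ok), lost)  # two packets survive, one is lost
```

Without the markers, the decoder would lose synchronization at the first error and discard everything after it; with them, loss is localized.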
So, in general, the MPEG-4 standards have achieved
a unique position in realizing revolutionary advances in
delivering high-quality video and audio to consumers. Among their
highlights and features, they offer highly efficient compression, error
resilience, bandwidth scalability ranging from 5 Kbit/s to
20 Mbit/s, network and transport protocol independence, as well as
content security and object-based interactivity [Hassan Shojania et al.,
2001]. These characteristics offer many opportunities for MPEG-4 to be
applied in unstable, low-bandwidth mobile/wireless networking
environments. With respect to streaming, its low-bit-rate audio and
video coding capability and its built-in error resilience are
especially attractive.
In the following section, I will use some example MPEG-4 applications to show how MPEG-4, along with other techniques, provides good quality of service in mobile computing environments. I will also try to classify these applications into several groups in order to obtain a better and more systematic understanding of this field.
I survey some wireless/mobile multimedia networking applications in
which MPEG-4 is used to achieve better quality of service. In these
examples, we will see how the MPEG-4 standards' characteristics,
such as object-based encoding and error resilience, are used.
The video transcoding technique maps a non-scalable
video stream into another non-scalable stream coded at a bitrate lower
than the first stream's. It is a requantization-based technique.
It is well known that a multimedia network needs large bandwidth to
provide high-quality service. But in a wireless/mobile network,
bandwidth is very low and the environment is severely unstable. In order
to get uninterrupted multimedia service, video transcoding
techniques are applied, with video quality sacrificed to a certain
degree. MPEG-4 video bit streams are the objects of video transcoding
techniques: the important information in the video bit streams is kept,
while the rest is down-sampled. The MPEG-4 standards make this idea
possible because of their content-based (object-based) coding mechanism. On
the multimedia server side, the scene is segmented based on some
criteria. Important object(s) are extracted and given a low
transcoding ratio; for object(s) that are not important, a much higher
transcoding ratio is used to decrease the required bandwidth. By using
this idea, a wireless/mobile multimedia network can run a video transcoder
to adapt to the low and unstable bandwidth and provide uninterrupted
service. So the issues are how to decide which object(s) in the
video scene are important and how to react to the bandwidth variation in
a wireless/mobile network. The work done in Dr. Khan's lab has largely
answered these questions:
An eye tracker can be used to track the
interest spot of the audience. This method needs a special physical
instrument to track the human eyeball's movement and then decides the
interest spot in the video scene from the eyeball's position (Javed I. Khan and Oleg Komogortsev).
Besides this method, motion information can also be used to detect
the moving object(s) in the scene; based on psychology, moving
object(s) are most likely the interest spots of audiences (Javed
I. Khan and Zhong Guo).
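The object-prioritized requantization described above can be sketched roughly as follows. The object names, sample values, and quantizer steps are invented for illustration; the point is that a coarser quantizer step means fewer distinct levels and therefore fewer bits for unimportant objects:

```python
# Simplified requantization sketch (assumed model): the important object
# keeps a fine quantizer step, the background gets a coarse one, trading
# background fidelity for bitrate.
def requantize(samples, step):
    """Snap samples to multiples of the step; coarser step -> fewer bits."""
    return [round(s / step) * step for s in samples]

def transcode_scene(objects, important_ids, fine=2, coarse=16):
    out = {}
    for oid, samples in objects.items():
        step = fine if oid in important_ids else coarse
        out[oid] = requantize(samples, step)
    return out

# Hypothetical scene with two segmented objects carrying the same samples
scene = {"face": [17, 33, 41], "background": [17, 33, 41]}
result = transcode_scene(scene, important_ids={"face"})
print(result["face"])        # stays close to the original values
print(result["background"])  # heavily quantized
```

A real transcoder requantizes DCT coefficients inside the bitstream rather than pixel samples, but the trade-off it makes per object is the same.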
Two kinds of mechanisms are used to make the video transcoder adapt to network bandwidth variation: one is the active network framework; the other is iTCP.
Dr. Khan and S. Yang proposed an active
network framework. Media transcoders exist on some active nodes. Packets
can be routed to those active nodes with media transcoders,
where the video bitstream is re-quantized based on the current network
traffic and other conditions, such as the destination's computing power,
display size, etc. This mechanism is illustrated in
Figure 3. In this figure, the multimedia server serves three different
clients. Between the server and the clients, there is a splitter, which
is an active router whose function is to decide which route the
packets will take according to the network conditions. In the network,
some nodes can provide a media transcoder for the packets.
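A toy sketch of the splitter's routing decision, under assumptions: the threshold logic and bandwidth numbers are invented here, standing in for whatever network-condition criteria the active router actually uses:

```python
# Hypothetical splitter logic: if a client's available bandwidth cannot
# carry the stream as-is, route its packets through an active node that
# hosts a media transcoder; otherwise send them directly.
def choose_route(stream_kbps, client_kbps):
    if client_kbps >= stream_kbps:
        return "direct"
    return "via-transcoder"  # active node requantizes the bitstream down

# Three clients with assumed downstream bandwidths (kbit/s)
clients = {"desktop": 2000, "laptop": 800, "pda": 64}
routes = {name: choose_route(1000, bw) for name, bw in clients.items()}
print(routes)
```

The benefit of placing this decision in the network rather than at the server is that one encoded stream can serve heterogeneous clients, with transcoding happening only on the paths that need it.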
Dr. Khan and R. Zagal proposed an interactive TCP
layer as the mechanism to invoke a media transcoder in response to
network traffic variation. Interactive TCP defines some custom network
messages and their corresponding handlers. When network traffic changes,
an iTCP message is thrown by the iTCP layer, and the proper message
handler in the library is called. In the message handler, the video
transcoder is invoked to make the multimedia adapt to the bandwidth
change. The idea is illustrated in Figure 4.
Figure 4. The
base topology between one server and three players.
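The event-handler pattern behind iTCP can be sketched as follows. The class, event name, and bitrates are invented for this sketch and are not the actual iTCP API; the point is only that a transport-layer event, not application polling, triggers the transcoder:

```python
# Toy sketch of transport-event-driven adaptation (names are invented):
# the transport layer raises a congestion event, and a registered handler
# asks the transcoder for a lower target bitrate.
class InteractiveTCP:
    def __init__(self):
        self.handlers = {}

    def on(self, event, handler):
        """Register a handler for a named transport event."""
        self.handlers[event] = handler

    def raise_event(self, event, **info):
        """Called by the transport layer when conditions change."""
        if event in self.handlers:
            return self.handlers[event](**info)

current_rate = {"kbps": 512}  # the transcoder's current target bitrate

def congestion_handler(available_kbps):
    # reduce the transcoding target to fit the reported bandwidth
    current_rate["kbps"] = min(current_rate["kbps"], available_kbps)
    return current_rate["kbps"]

itcp = InteractiveTCP()
itcp.on("congestion", congestion_handler)
itcp.raise_event("congestion", available_kbps=128)
print(current_rate["kbps"])  # target dropped to the available bandwidth
```

Because the application reacts to an explicit event rather than inferring congestion from losses, adaptation can begin as soon as the transport layer sees the change.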
Unlike the transcoding just discussed,
transcaling derives one or more scalable streams covering different
bandwidth ranges from another scalable stream. Hayder Radha used
transcaling techniques to adapt to the frequent bandwidth variations in
wireless IP multimedia networks. He also defined a framework in
which he deployed a transcaling-based gateway to achieve better
service.
The transcaling-based technique for video over
the wireless Internet is described within the context of
Receiver-Driven Multicast (RDM). A simplified RDM architecture is
illustrated in Figure 5. RDM of video generates a layered coded video
bitstream that consists of multiple streams. The minimum-quality stream
is known as the base layer (BL), and the other streams are the
enhancement layers (ELs). A receiver can "subscribe" to and "unsubscribe"
from EL(s) based on the network traffic. This approach results in
efficient distribution of video by utilizing minimal bandwidth
resources over varying network environments. In a wireless LAN, RDM is
deployed on the edge router, where an environmental
monitor watches for network traffic changes and adapts to them
by using a different number of ELs.
Figure 5. A simplified view of the RDM
architecture and a transcaling-based node.
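The receiver-driven subscribe/unsubscribe decision can be sketched as below. The layer bitrates are made-up numbers; the logic is simply to take the base layer plus as many enhancement layers as the measured bandwidth allows:

```python
# Illustrative receiver-driven layer subscription: given the base-layer
# rate and the enhancement-layer rates (all assumed figures), subscribe
# to as many ELs as fit within the available bandwidth.
def subscribe_layers(bandwidth_kbps, base_kbps, el_kbps):
    assert bandwidth_kbps >= base_kbps, "base layer must always fit"
    used, layers = base_kbps, 0
    for rate in el_kbps:
        if used + rate > bandwidth_kbps:
            break  # unsubscribe from (i.e. never join) the remaining ELs
        used += rate
        layers += 1
    return layers, used

# 64 kbit/s base layer plus three 100 kbit/s enhancement layers
print(subscribe_layers(300, 64, [100, 100, 100]))  # (2, 264)
```

When the environmental monitor reports a bandwidth change, the receiver simply re-runs this decision and joins or leaves EL multicast groups accordingly.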
How, then, does MPEG-4 make this idea possible? In order to meet the
bandwidth-variation requirements of the Internet and wireless networks,
the MPEG-4 fine granular scalability (FGS) video coding method is used
to cover any desired bandwidth range while maintaining a very simple
scalability structure. As illustrated in figure 6, the FGS structure
consists of only two layers: a base layer coded at a bitrate Rb
and a single enhancement layer coded using a fine-grained (or
totally embedded) scheme up to a maximum bitrate of Rc. The encoder
only needs to know the range of bandwidth over which it has to code the
content; it does not need to be aware of the particular bitrate at which
the content will be streamed. The streaming server (i.e., the edge gateway),
on the other hand, has total flexibility in sending any desired
portion of any enhancement-layer frame. On the receiver side, the FGS
framework composes the video according to MPEG-4.
Figure 6. Examples of the MPEG-4 FGS scalability structure.
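The server's flexibility follows from the embedded property of the enhancement layer: any prefix of it is decodable. A minimal sketch, with byte counts standing in for the bitrates Rb and Rc (the sizes are illustrative assumptions):

```python
# Minimal FGS serving sketch: always send the whole base layer, then
# truncate the embedded enhancement layer to whatever the channel allows.
def serve_fgs_frame(base, enhancement, budget_bytes):
    assert budget_bytes >= len(base), "base layer must always be delivered"
    room = budget_bytes - len(base)
    return base + enhancement[:room]  # any prefix of the EL is decodable

base = b"B" * 10         # base layer, coded at bitrate Rb
enhancement = b"E" * 40  # embedded enhancement layer, up to bitrate Rc
sent = serve_fgs_frame(base, enhancement, budget_bytes=25)
print(len(sent))  # exactly fills the per-frame budget
```

This is why the encoder never needs to know the streaming bitrate in advance: the cut point is chosen per frame at the server, anywhere between Rb and Rc.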
I. Pandzic proposed a facial animation framework for web and mobile platforms. In this application, facial animation is implemented in MPEG-4. As we know, MPEG-4 can encode any shape information of an object. The facial animation application uses a vertex array to describe a 3D facial model, illustrated in Figure 7. The MPEG-4 facial animation framework defines 66 low-level Facial Animation Parameters (FAPs) and two high-level FAPs. The low-level FAPs are based on the study of minimal facial actions and are closely related to muscle actions. They represent a complete set of basic facial actions and therefore allow the representation of most natural facial expressions. So, each time, the array of vertices of the facial model and these FAPs are encoded to describe a facial scene. The paper presents several good facial animation examples on mobile platforms. We can imagine that these "vector" 3D models are much smaller than "bitmaps" of facial animation scenes, which is why the approach obtains good results on mobile platforms.
Figure 7. Facial data vertices in
the facial animation application.
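A back-of-envelope comparison makes the "vector versus bitmap" advantage concrete. The byte counts below are illustrative assumptions (a plausible per-FAP encoding size and a small QCIF frame), not figures from the MPEG-4 specification:

```python
# Rough per-frame size comparison: streaming 68 animation parameters
# versus streaming a raw bitmap of the face. All sizes are assumptions.
FAP_COUNT = 68            # 66 low-level + 2 high-level FAPs
BYTES_PER_FAP = 2         # assumed encoding size per parameter

def fap_frame_bytes():
    return FAP_COUNT * BYTES_PER_FAP

def bitmap_frame_bytes(width, height, bytes_per_pixel=3):
    return width * height * bytes_per_pixel

fap = fap_frame_bytes()             # parameter stream per frame
bmp = bitmap_frame_bytes(176, 144)  # uncompressed QCIF frame
print(fap, bmp, bmp // fap)         # FAPs are orders of magnitude smaller
```

Even against compressed video rather than raw bitmaps, sending parameters that drive a model already resident on the device remains far cheaper, which is what makes the framework viable on mobile links.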
From the above example applications, we can see that
MPEG-4 encoding is flexible enough to be used in unstable
wireless/mobile multimedia networks and to adapt to that severe environment.
This is because MPEG-4 is a content-based (object-based) standard, unlike
its predecessors MPEG-1 and MPEG-2, which are pixel-based standards. So
wireless applications can operate on these objects to trade off
quality of service against the required bitrate of the scene. Also,
MPEG-4 has error-resilience abilities; this characteristic also makes it
robust in the wireless/mobile environment.
[1] Andreas Vogel et al., Distributed Multimedia
Applications and Quality of Service: A Survey (October 1994);
[2] Gerald Kühne, Christoph Kuhmünch, Transmitting
MPEG-4 Video Streams over the Internet: Problems and Solutions (October 1999);
[3] Hassan Shojania,
Baochun Li, Experiences with MPEG-4 Multimedia
Streaming (October 2001);
[4] Hayder Radha, Transcaling:
A Video Coding and Multicasting Framework for Wireless IP Multimedia
Services (July 2001);
[5] Igor S. Pandzic, Facial
Animation Framework for the Web and Mobile Platforms (February 2002);
[6] Javed I. Khan and Oleg Komogortsev, A
Hybrid Scheme for Perceptual Object Window Design with Joint Scene
Analysis and Eye-Gaze Tracking for Media Encoding based on Perceptual
Attention, Proceedings of the IS&T/SPIE Symposium of Visual
Communications and Image Processing 2004, EI04 Electronic Imaging
2004, January 2004, San Jose, California (accepted, to appear);
[7] Javed I. Khan and Seung S. Yang, A
Framework for Building Complex Netcentric Systems on Active Network,
DARPA Active Network Research Conference and Exposition (DANCE 2002),
San Francisco, May 28-31, 2002, pp. 409-426;
[8] Javed I. Khan and Zhong Guo,
Flock-of-Bird Algorithm for Fast Motion Based Object Tracking and
Transcoding in Video Streaming, The 13th IEEE International Packet
Video Workshop 2003, Nantes, France, April 2003;
[9] Peter Kauff et al., An
Immersive 3D Video-Conferencing System Using Shared Virtual Team User
Environments (September 2002);
[10] Rickard Neehr, MPEG4 – Short Introduction,
found at http://www.cdt.luth.se/~peppar/kurs/smd074/seminars/1/2/1/mpeg4.pdf,
2001-10-27;
[11] Peter Schojer et al., Architecture
of a Quality Based Intelligent Proxy (QBIX) for MPEG4 Videos (May 2003);
[12] Steven Gringeri,
Roman Egorov, et al., Robust Compression and Transmission of
MPEG-4 Video (October 1999);