Prepared for Prof. Javed I. Khan
Department of Computer Science, Kent State University
Date: November 2003
Sensor networks may potentially be used in many different applications. The most commonly cited example is a military one: sensor nodes may be scattered on a battlefield and used to collect information on enemy troop locations, movements, and other information that is vital to gaining the upper hand in a conflict. Other possible uses include monitoring the habitats of certain types of animals; measuring temperatures in large forests in an effort to detect fires before they grow out of control; measuring the temperature of international travelers to prevent or slow the spread of contagious diseases such as SARS; and many others.
Unlike conventional wired networks, which are relatively unconstrained with regard to how often the network interface can be used and how much processing can be done, sensor networks are much more sensitive to power demands. Conventional systems have, for all practical purposes, unlimited energy for communications and processing, and are only really constrained by how long computation and communication take. The situation is quite different for wireless sensornets. The most important constraining factor in most sensor networks comes down to one thing: the battery power of each node. With this constraint in mind, almost all algorithms dealing with data storage, retrieval, and transfer must be modified.
As the capabilities of sensor nodes may vary widely based on their intended application, the types of data collected also vary greatly. In almost all cases, the data are indexed in some way by the time of collection, and possibly by the location of the collecting node. The measurements may consist of scalar readings such as temperature, magnetic field, or light intensity, or more complex measurements such as still images or video footage from cameras. The storage and transmission costs of some of these (such as scalar measurements) are most likely quite low, but more sophisticated measurements will necessarily require more energy and memory to transmit, store, and process.
The second paper is by Ghose, Grossklags, and Chuang, entitled "Resilient Data-Centric Storage in Wireless Ad-Hoc Sensor Networks". This paper proposes a method called R-DCS (Resilient Data-Centric Storage), which extends the work done by Ratnasamy et al. The way that data is stored in the network is quite simple: there is one principal node for each type of data to be stored, and a number of backup nodes (or replica nodes) that store copies of the data. In addition, the authors describe monitor nodes, which are geographically distributed and store summaries of the data and control information about the network. In this scheme, every piece of data is given a unique name (a hash-table key) and is inserted into a distributed hash table stored at the principal and replica nodes. The nodes are broken up into zones, and each zone has a number of replica nodes and monitor nodes. In this way, robustness and load balancing are inherent in the system, provided there are enough zones.
The next paper is from Intanagonwiwat, Govindan, and Estrin, entitled "Directed Diffusion: A Scalable and Robust Communication Paradigm for Sensor Networks". This paper describes a system in which each piece of data to be transmitted is described by a name. This name is inherently task-dependent, but generally takes a form that contains an identification of the task being performed, as well as identifying information about the data, such as time, location, confidence, etc. When a request for data (known as an interest) is sent, it goes out to all the nodes in the network, and those that have data matching the interest pass their data back along the path the interest took. Thus this particular data-handling scheme does not globally and permanently name a piece of data; the name of a particular datum may change depending on the interest to which it corresponds.
Fourth we consider a paper from Kulik, Heinzelman, and Balakrishnan, with the title "Negotiation-Based Protocols for Disseminating Information in Wireless Sensor Networks". This paper describes a family of protocols that the authors have named SPIN (Sensor Protocols for Information via Negotiation). The SPIN architecture describes methods of disseminating data based on information describing the data, which they call meta-data. This metadata is specific to the application, but should be unique (i.e., no two pieces of data should be described by the same metadata unless they are identical). It may consist of the geographic and temporal coordinates, the identifier of the particular sensor that made the reading, or any number of other possible identifiers. The one constraint on metadata is that it must be smaller than the data it describes; otherwise it does no good. The protocols use a negotiation approach: before sending data, a node will negotiate with its neighbors over whether or not the data should be sent, based on factors such as the amount of available energy, similarity to data already sent, and so on. The idea is that energy will be conserved and less traffic generated by intelligently transmitting and forwarding data.
The final paper is by Ratnasamy et al., and is entitled "Data-Centric Storage in Sensornets with GHT, a Geographic Hash Table". This is the paper that Ghose et al. built on. In this paper, each piece of data is described by a unique name, and this name hashes to a particular geographic location (unrelated to the location where it was collected). When it is entered into the hash table, the piece of data is routed to the node closest to the geographic location it hashed to. A node will periodically make backups with the nodes nearest to it, and in this way, even when a node moves or fails, its data will still be retained in the network.
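The put operation just described can be sketched in a few lines. This is purely an illustration, not the paper's implementation: the field size, node names, and the use of SHA-1 to derive coordinates are all assumptions made for the example.

```python
# Illustrative sketch of a GHT-style put: a data name hashes to a point
# in the deployment area, and the datum is stored at the node closest to
# that point. Field size and node layout are assumed for the example.
import hashlib

FIELD = 100.0  # assumed FIELD x FIELD deployment area

def hash_to_point(name):
    """Hash a data name to (x, y) coordinates inside the field."""
    d = hashlib.sha1(name.encode()).digest()
    x = int.from_bytes(d[0:4], "big") / 2**32 * FIELD
    y = int.from_bytes(d[4:8], "big") / 2**32 * FIELD
    return (x, y)

def nearest_node(nodes, point):
    """The node geographically closest to the hashed point is the home node."""
    return min(nodes,
               key=lambda n: (n[1][0] - point[0]) ** 2 + (n[1][1] - point[1]) ** 2)

def ght_put(table, nodes, name, value):
    """Route the datum to the home node chosen by its name's hash."""
    home_id, _ = nearest_node(nodes, hash_to_point(name))
    table.setdefault(home_id, {})[name] = value
    return home_id

nodes = [("a", (10.0, 10.0)), ("b", (80.0, 20.0)), ("c", (50.0, 85.0))]
table = {}
home = ght_put(table, nodes, "elephant-sighting", (37.2, -5.1))
```

The data-centric property falls out of the determinism: a later get for the same name hashes to the same point and reaches the same home node, with no directory lookup anywhere in the network.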
With the basic description of the papers done, we now come to the meat of this article: categorization. The approaches of these papers fall into two basic categories: those in which the data being stored or requested is kept for long-term use, and those which are concerned only with data as it becomes available. We may better examine the advances and tradeoffs of each paper alongside those that deal with similar issues.
Ratnasamy's paper deals with Geographic Hash Tables (GHT). As described above, when data is entered into the hash table, it is routed to the node nearest to the geographic coordinates that the particular key maps to. This is quite a clever idea, and the result is that, provided the sensor nodes are evenly distributed, no node should hold significantly more data than any other. One downside to this approach is that the data collected at one node is not stored there, but may end up on the opposite side of the sensornet, leading to quite a large amount of data being sent around unnecessarily. On the other hand, if one node fails, the data it has collected and stored will not all be lost, as the hashing scheme (if it is working properly) should have distributed that data to many different nodes all over the network. In addition to this type of robustness, the authors added the further precaution of mirroring the data stored at each node onto its neighboring nodes, which takes care of the cases in which a node moves or fails -- in those cases, the neighbors should be able to supply the data they have mirrored from the failed node.
Ghose's paper takes a different approach to this whole issue. It keeps the idea of data-centric storage, but does not use the Geographic Hash Table idea. Individual pieces of data are still stored based on their keys, but the data is not disseminated across the entire sensornet based on geographic coordinates. Instead, the sensornet is divided into zones, each with (hopefully) similar numbers of nodes. For each type of data being collected in the sensornet, there is a particular node which is in charge of storing that type of data. In addition, there are a number of replica nodes for each type of data, which communicate amongst themselves and make backup copies of all the data. In each of the zones there is also a monitor node, which keeps track of the replica nodes; when it is necessary to retrieve or store data, the request or storage instruction is forwarded to the closest replica node, which then handles the request.
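The forward-to-closest-replica step can be sketched as follows. The zone layout, class names, and distance metric are all assumptions made for the illustration; the paper's actual protocol also involves principal and monitor nodes exchanging summaries, which is omitted here.

```python
# Hypothetical sketch of R-DCS-style storage: a reading is forwarded to
# the closest replica node for its data type. Zone positions and names
# are invented for the example; the real protocol is richer than this.
class Zone:
    def __init__(self, zone_id, replica_pos):
        self.zone_id = zone_id
        self.replica_pos = replica_pos   # (x, y) of this zone's replica node
        self.store = {}                  # event_type -> list of readings

def closest_replica(zones, src_pos):
    """Pick the zone whose replica node is nearest the requesting node."""
    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return min(zones, key=lambda z: dist2(z.replica_pos, src_pos))

def put(zones, src_pos, event_type, reading):
    """Store a reading at the nearest replica for its type."""
    zone = closest_replica(zones, src_pos)
    zone.store.setdefault(event_type, []).append(reading)
    return zone.zone_id

zones = [Zone(0, (10, 10)), Zone(1, (90, 10)), Zone(2, (50, 90))]
zid = put(zones, src_pos=(85, 20), event_type="temperature", reading=21.5)
# The reading lands at the replica nearest the source (zone 1 here).
```

The contrast with GHT is visible in the routing target: here data travels only as far as the nearest replica in-zone, rather than to a hash-determined point that may be across the network.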
Ganesan's paper, on the other hand, takes a very different type of approach. In that paper, techniques based on signal processing are used, specifically wavelet compression. Their technique involves grouping all of the nodes into clusters. Within these clusters, each node provides some portion of its data to the cluster head, which performs wavelet compression on the combined data from all of its cluster members. The cluster heads then send their summaries up to higher-level cluster heads, which eventually send them to the highest-level cluster head. In this way the data should be available at multiple scales, with coarser resolutions at the higher levels.
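The multi-resolution idea can be illustrated with one level of a Haar transform, the simplest wavelet: pairwise averages form a half-size coarse summary, and the differences retained as detail coefficients allow exact reconstruction on demand. This is only a toy stand-in for the paper's actual codec and cluster protocol.

```python
# Toy illustration of multi-resolution summaries via one Haar level:
# a cluster head keeps pairwise averages (coarse view) and the detail
# coefficients needed to recover the originals. Not the paper's codec.
def haar_step(samples):
    """One Haar level: pairwise averages (coarse) and differences (detail)."""
    coarse = [(a + b) / 2 for a, b in zip(samples[::2], samples[1::2])]
    detail = [(a - b) / 2 for a, b in zip(samples[::2], samples[1::2])]
    return coarse, detail

def reconstruct(coarse, detail):
    """Invert haar_step exactly: each pair is (c + d, c - d)."""
    out = []
    for c, d in zip(coarse, detail):
        out.extend([c + d, c - d])
    return out

readings = [20.0, 22.0, 21.0, 19.0, 30.0, 28.0, 25.0, 27.0]
coarse, detail = haar_step(readings)   # half the data, lower resolution
```

A higher-level cluster head could apply `haar_step` to `coarse` again, giving an even coarser summary of the whole region, which is the scale hierarchy the paper aims for.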
Which of these three approaches is most useful certainly depends on the application. The approach from Ganesan is very useful if it is important to be able to quickly retrieve data at different resolutions, and to store this data for a long time; this is valuable when trying to mine for patterns within the collected data. Ratnasamy and Ghose are similar in the applications they would seem to be useful for, but it is not clear at this time whether the GHT approach or the R-DCS approach is more practical. It is certainly clear that more research needs to be done before any claims can be made either way.
Intanagonwiwat's approach is a novel one: no information is directed toward a particular node or geographic region, unlike almost all other approaches. Instead, a node requesting information from others sends out what is known as an interest message. This interest message may specify such things as the time period for which information should be sent, how often, what type of data, and other criteria to narrow down which nodes should respond. This interest message is broadcast out to all the surrounding nodes, a process known as diffusion. Each node that forwards the interest keeps track of which node it received it from, and if it receives a duplicate interest message, it simply drops it. This sets up what are known as gradients, which are followed in the reverse direction when a node sends data back in response to an interest message. This has the benefit of both choosing a good path and avoiding the need to build complex routing tables or lists of nodes that hold particular data, as the network is essentially self-managing in this way.
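The flood-and-remember mechanic can be sketched as a breadth-first flood over an assumed adjacency-list topology: each node records the neighbor it first heard the interest from (its gradient), and matching data later retraces those gradients back to the sink. The topology and names here are invented for the example.

```python
# Minimal sketch of interest diffusion and gradient setup. The graph,
# node names, and single-gradient simplification are assumptions; the
# real scheme maintains per-neighbor gradients with rates and lifetimes.
from collections import deque

def diffuse_interest(graph, sink):
    """Flood an interest from the sink; each node records the neighbor
    it first received the interest from (duplicates are dropped)."""
    gradient = {sink: None}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in gradient:      # duplicate interest -> drop
                gradient[nbr] = node     # reverse path toward the sink
                queue.append(nbr)
    return gradient

def send_data(gradient, source):
    """Return the hop sequence a matching datum takes back to the sink."""
    path = [source]
    while gradient[path[-1]] is not None:
        path.append(gradient[path[-1]])
    return path

graph = {"sink": ["a", "b"], "a": ["sink", "c"],
         "b": ["sink", "c"], "c": ["a", "b"]}
grad = diffuse_interest(graph, "sink")
print(send_data(grad, "c"))   # ['c', 'a', 'sink']
```

Note that no node needed a routing table or any global knowledge: the gradients are a side effect of the flood itself, which is the self-managing property described above.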
The approach taken in Kulik's paper appears useful for avoiding the unnecessary transmission of data. One major idea put forth in the paper concerns nodes with overlapping sensor coverage ranges. In this case, it is quite possible that both nodes have made the same measurement, so for both of them to send it on would be a waste of effort and energy. A classic flooding scheme, where all nodes send all their data to everyone in their immediate vicinity, who are then expected to pass this data on in turn to their neighbors, is clearly inadequate in such a situation. So under the negotiation-based protocols, a node with new data first advertises it to its upstream neighbors. These can then choose whether or not to request the new data, depending on whether they have already heard it and whether they have the energy to expend on non-critical communication.
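The advertise/request/send exchange can be sketched as below. The class names, the energy threshold, and the metadata tuples are all assumptions made for the illustration; the paper leaves the exact resource policy application-specific.

```python
# Hedged sketch of a SPIN-style three-way handshake (ADV/REQ/DATA):
# a node advertises metadata, and a neighbor requests the data only if
# it has not seen it and has energy to spare. Thresholds are invented.
class SpinNode:
    def __init__(self, name, energy):
        self.name = name
        self.energy = energy
        self.seen = set()      # metadata of data this node already holds

    def on_adv(self, meta):
        """Decide whether to send a REQ for advertised data."""
        if meta in self.seen:
            return False       # duplicate -- no request
        if self.energy < 5:
            return False       # too depleted for non-critical traffic
        return True

    def on_data(self, meta, data):
        """Receive the DATA message (toy energy accounting)."""
        self.seen.add(meta)
        self.energy -= 1

def advertise(meta, data, neighbors):
    """Sender broadcasts ADV; interested neighbors REQ and receive DATA."""
    receivers = []
    for n in neighbors:
        if n.on_adv(meta):               # REQ only from interested nodes
            n.on_data(meta, data)
            receivers.append(n.name)
    return receivers

a = SpinNode("a", energy=10)
b = SpinNode("b", energy=2)              # too depleted to request
a.seen.add(("temp", "cell-7"))           # a already holds this reading
got = advertise(("temp", "cell-7"), 21.5, [a, b])
print(got)   # [] -- one neighbor has the data, the other lacks energy
```

The saving over blind flooding is that the full reading crosses the link only when a neighbor actually wants it; otherwise only the small metadata advertisement is transmitted.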
These two approaches clearly solve some of the same problems, but seem suited to different types of applications. In a case where a lot of duplicated data may be passed around, a negotiation-based approach seems quite useful. On the other hand, in a situation where a requesting node does not necessarily know which nodes to ask for particular task data, a directed diffusion approach may well yield much better results. It is not clear which approach will win out overall, or perhaps both will find their own niches.
At this point it is still not clear which of these five approaches will win out, or perhaps some new approach not yet developed will combine the best points of all of them. One thing is clear, however: wireless sensornets are here to stay, and they will only keep improving, both in complexity and in capabilities. It is critical, then, that there be good solutions for how to store, retrieve, transmit, and recover data even after nodes fail, lose power, or simply go out of range. Whatever researcher comes up with a good solution will likely not have much trouble finding work or funding in the future.