"Distributed Storage Challenges Data Glut"

IEEE Computer August 2002 Issue, p. 23; Paulson, Linda Dailey

Several research projects promise improved distributed storage technologies as a way to cope with rapidly growing data volumes and increasingly scattered organizations.

The OceanStore Project (http://oceanstore.cs.berkeley.edu) at the University of California, Berkeley, stores data on servers throughout a network. Other computers will be able to join the network dynamically. John D. Kubiatowicz, an assistant professor in UC Berkeley's Department of Electrical Engineering and Computer Sciences, said the approach is akin to an electric grid in that the system is designed to transparently and dependably provide storage resources to users.

OceanStore will function as platform-neutral software that can be integrated with other applications, he explained. It will differ from private, closed storage area networks (SANs), because data will be globally accessible.

The prototype system uses servers in Australia and the US, and researchers hope to expand this network in the near future.

University of Tennessee researchers have developed the Internet Backplane Protocol (http://loci.cs.utk.edu/ibp/), which functions as middleware for managing and using remote storage. IBP lets storage nodes communicate easily via the Internet, giving users access to external storage resources.

Micah Beck, an associate professor and director of the university's Logistical Computing and Internetworking Laboratory, said the researchers' goal was to create a protocol that would serve as the storage equivalent of IP.

Internet service providers could use the new protocol to provide storage, freeing users from deploying their own resources. ISPs could offer their customers various value-added storage-related resources, including point-to-point networks and community-wide resources such as Web data cached for subsequent use.

Users' resources, rather than the storage network, would provide the system's intelligence and accompanying overhead, Beck explained. This would let the networks scale more easily than SANs and network-attached storage, which must provide their own intelligence.

The researchers have submitted a proposal to the US National Science Foundation for a 100-terabyte storage testbed that would attract various organizations and universities throughout the country and connect to the high-bandwidth Internet2 research project.

Hewlett-Packard's HP Labs is working on the iShadow distributed storage project. Storage would be distributed across nodes in a network, including disks, servers, and data centers. SANs or Ethernet networks using standards such as Fibre Channel and the Internet small computer system interface (iSCSI) would link the nodes.

HP Labs senior research scientist Ram Swaminathan said services such as authentication and security will be layered on top of iShadow. HP has also developed self-managing storage technologies. For example, the Hippodrome prototype can analyze an existing storage system, create a new design for optimizing the way data will be stored, and migrate the system to the new design.

Computing and storage resources are dropping in price faster than bandwidth, which could make bandwidth a price and technology bottleneck for new storage systems.