PARA'04 State-of-the-Art
in Scientific Computing
June 20-23, 2004

Updated: February 4, 2004

Parallel and Distributed Techniques for Merging and Extracting Large Ontologies as Resource in a Grid Environment

Andrew Flahive, Mehul Bhatt, Carlo Wouters, Wenny Rahayu
Computer Science Department
La Trobe University, Australia
emails: {apflahiv,mbhatt,cewouter,wennyr}@cs.latrobe.edu.au
and
David Taniar
Monash University, Australia
email: David.Taniar@infotech.monash.edu.au

Large ontologies have become increasingly common over the past few years. Many highly theoretical professions are finding that ontologies are key to meeting their ever-growing information storage needs. Ontologies can store huge amounts of data whilst maintaining the vastly complex relationships between the sets of data. The main objective of using ontologies is information sharing.

The idea of Grid Computing is to share computer resources over a wide area. Each Grid location has one or more resources that it maintains locally and shares with the wider grid community. In this setting, several locations may each maintain their own ontology with their own data.

This paper presents a system that merges several ontologies from different sources and extracts a smaller, interconnected sub-ontology from them, using parallel and distributed methods. The system is designed as a resource in a grid environment. A remote user of the system describes where to locate the large ontologies and pre-selects the specific information required from these different ontologies. The system thus aids the sharing of information between parties without the need for a central ontology location or a local processing facility.
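As a rough illustration of the merge-then-extract idea described above, the following Python sketch works on a toy in-memory representation. Everything in it is an assumption made for illustration: the `Ontology` type, the `merge` and `extract` functions, and the example concepts are hypothetical, and the extraction step is a simple filter rather than the derivation rules of [1,2].

```python
from typing import Dict, Set, Tuple

# Hypothetical toy representation (not from the paper):
# concept name -> set of (relationship, target concept) pairs.
Ontology = Dict[str, Set[Tuple[str, str]]]


def merge(*ontologies: Ontology) -> Ontology:
    """Union the concepts and relationships of several ontologies."""
    merged: Ontology = {}
    for onto in ontologies:
        for concept, edges in onto.items():
            merged.setdefault(concept, set()).update(edges)
            # Ensure every relationship target is itself a known concept.
            for _, target in edges:
                merged.setdefault(target, set())
    return merged


def extract(onto: Ontology, selected: Set[str]) -> Ontology:
    """Keep only the selected concepts and the relationships between them."""
    return {
        concept: {(rel, tgt) for rel, tgt in edges if tgt in selected}
        for concept, edges in onto.items()
        if concept in selected
    }


# Two ontologies maintained at different (hypothetical) grid locations.
site_a: Ontology = {"Protein": {("part_of", "Cell")}, "Cell": set()}
site_b: Ontology = {
    "Protein": {("encoded_by", "Gene")},
    "Gene": {("located_in", "Cell")},
}

merged = merge(site_a, site_b)
sub = extract(merged, {"Protein", "Gene"})
```

Here the user's pre-selection is the set `{"Protein", "Gene"}`; relationships that point outside the selection are dropped, so the result is self-contained.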

For large ontologies it is crucial to split up and distribute the huge amount of processing that is required to merge and extract. The system locates the ontologies to be merged and extracts the key features as specified by the user. The final sub-ontology contains only the information required by the user and is, in its own right, a complete and valid ontology [1,2]. The system uses an extension to the classic task-farming approach to distribute the workload of merging and extracting the ontologies among a number of processors at an HPC facility. The system is designed to be a grid resource and to be deployed on an HPC system.
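For readers unfamiliar with task farming, the sketch below shows the classic pattern only: a master places independent tasks in a shared queue and idle workers pull from it until the queue is empty. It does not reproduce the extension used by the system, which is not detailed here. The `task_farm` function and the example data are hypothetical, and Python threads stand in for the processors of an HPC facility.

```python
import queue
import threading
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def task_farm(tasks: Sequence[T], work: Callable[[T], R], n_workers: int = 4) -> List[R]:
    """Classic task farm: workers repeatedly pull tasks from a shared queue."""
    todo: "queue.Queue" = queue.Queue()
    results: List = [None] * len(tasks)
    for index, task in enumerate(tasks):
        todo.put((index, task))

    def worker() -> None:
        while True:
            try:
                index, task = todo.get_nowait()
            except queue.Empty:
                return  # No tasks left: this worker is finished.
            # Each worker writes to its own slot, so results keep task order.
            results[index] = work(task)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Hypothetical workload: filter chunks of concepts against the user's selection.
concepts = ["Protein", "Cell", "Gene", "Enzyme", "Tissue", "Organ"]
chunks = [concepts[i:i + 2] for i in range(0, len(concepts), 2)]
selected = {"Protein", "Gene", "Organ"}

kept_per_chunk = task_farm(chunks, lambda chunk: [c for c in chunk if c in selected], n_workers=3)
kept = [c for part in kept_per_chunk for c in part]
```

Because workers take a new task whenever they become free, faster workers naturally process more chunks, which is the load-balancing property that makes task farming attractive for uneven workloads.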

This paper describes a system that is part of a bigger project [1,2,3]. The results from the distributed extraction of a sub-ontology on an HPC system have already been reported in [3]. The main focus of this paper is the extension of this project: the merging of several ontologies from different sources and the design of this system as a grid resource. The parallel and distributed techniques used for both merging and extracting the ontologies are described, as well as the evolution of this system into a grid resource.

References:
1. Wouters, C., Dillon, T., Rahayu, W., Chang, E., and Meersman, R. A Practical Walkthrough of the Ontology Derivation Rules. In Web Information Systems, Ch. 6, IDEA Group Pub., 2004.
2. Wouters, C., Dillon, T., Rahayu, W., Chang, E., and Meersman, R. Ontologies on the MOVE. In Proceedings of the 9th International Conference on Database Systems for Advanced Applications (DASFAA 2004), Lecture Notes in Computer Science, Springer-Verlag, April 2004.
3. Bhatt, M., Flahive, A., Wouters, C., Rahayu, W., and Taniar, D. A Distributed Approach to Sub-Ontology Extraction. In Proceedings of the 18th International Conference on Advanced Information Networking and Applications (AINA'04), Fukuoka, Japan, March 2004.


