University of Tennessee
College of Engineering
Michael W. Berry 18 May, 2008  
Computational Science Projects

LSI: Latent Semantic Indexing (LSI) is a concept-based information retrieval model. Terms and documents are both encoded as vectors in a common vector space, so that documents can lie near each other (semantically) even when they share no common terms. LSI addresses the two fundamental problems that plague traditional lexical-matching indexing schemes: synonymy (different words with the same meaning) and polysemy (one word with several meanings). Content Analyst Company, LLC owns the original patent to LSI: Computer information retrieval using latent semantic structure, U.S. Patent No. 4,839,853, June 13, 1989.

SVDPACK: This package comprises four numerical (iterative) methods for computing the singular value decomposition (SVD) of large sparse matrices, written in double-precision ANSI Fortran-77. A compatible ANSI C version (SVDPACKC) is also available. SVDPACK and SVDPACKC implement Lanczos and subspace iteration-based methods for determining several of the largest singular triplets of large sparse matrices. The development of SVDPACK was motivated by the need to compute large-rank approximations to sparse term-document matrices from information retrieval applications such as Latent Semantic Indexing (described above). SVDPACKC is now used in the InfoMap project developed in the Computational Semantics Laboratory at Stanford University.
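The retrieval step behind LSI can be sketched in a few lines. This is a minimal illustration, not SVDPACK or the patented system: it assumes a tiny hypothetical term-document matrix, uses NumPy's dense `np.linalg.svd` in place of a sparse Lanczos solver such as SVDPACK (for large sparse matrices, SciPy's `scipy.sparse.linalg.svds` is one readily available iterative alternative), keeps k = 2 factors, and folds a query into the reduced space as q_hat = q^T U_k S_k^{-1} before ranking documents by cosine similarity against the rows of V_k.

```python
import numpy as np

# Hypothetical toy term-document matrix (rows = terms, columns = documents),
# holding raw term counts. Document 1 uses "automobile" where others use "car".
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 1, 0],  # car
    [0, 1, 1, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 0, 1],  # flower
    [0, 0, 0, 1],  # petal
], dtype=float)


def truncated_svd(A, k):
    """Rank-k approximation A ~ U_k diag(s_k) V_k^T (dense SVD; fine for a toy matrix)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def fold_in(q, Uk, sk):
    """Map a term-space query vector q into the k-dimensional space: q^T U_k S_k^{-1}."""
    return (q @ Uk) / sk


def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))


Uk, sk, Vtk = truncated_svd(A, k=2)
doc_vectors = Vtk.T                       # row j represents document j
query = np.array([1, 0, 0, 0, 0], float)  # the one-word query "car"
q_hat = fold_in(query, Uk, sk)
scores = [cosine(q_hat, d) for d in doc_vectors]
lexical_overlap = query @ A               # raw term matches per document
```

Here the query "car" shares no term with document 1 (automobile, engine), yet in the two-factor space document 1 scores as highly as the documents that do contain "car"; the flower document scores near zero. That is the synonymy effect in miniature.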
IMP: The Integrated Modeling Project (IMP), sponsored by the Environmental Impacts Program of the USDA Forest Service, is an integrated assessment of forest health and productivity in southern and southeastern forests in relation to changes in climate, air quality, and land use. The primary research focus of Prof. Michael W. Berry and Research Associate Karen S. Minser (Dept. of Computer Science) is the development of a problem-solving environment (PSE) that facilitates the horizontal integration of forest responses to environmental stresses and disturbances through the use of micro-scale cellular automata.
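For readers unfamiliar with cellular automata, the sketch below shows a single synchronous update on a small grid. The source does not state the rules used in IMP; the rule here (a healthy cell becomes stressed when at least `threshold` of its four neighbors are stressed, and stressed cells stay stressed) and the names `step`, `HEALTHY`, and `STRESSED` are hypothetical, chosen only to show how local neighbor rules propagate a disturbance across a landscape.

```python
from typing import List

Grid = List[List[int]]
HEALTHY, STRESSED = 0, 1  # hypothetical two-state cells


def step(grid: Grid, threshold: int = 2) -> Grid:
    """One synchronous update: every cell reads the OLD grid, writes the NEW one."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == STRESSED:
                continue  # hypothetical rule: stress is never reversed
            # Count stressed von Neumann neighbors (N, S, W, E) inside the grid.
            stressed_neighbors = sum(
                grid[nr][nc]
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= nr < rows and 0 <= nc < cols
            )
            if stressed_neighbors >= threshold:
                new[r][c] = STRESSED
    return new


g0 = [[0, 1, 0],
      [1, 0, 0],
      [0, 0, 0]]
g1 = step(g0)  # [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
```

Updating from a copy of the grid keeps the result independent of the order in which cells are visited.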

ICAT: The Interactive Cluster Analysis Toolkit (ICAT) uses the Enhanced Hoshen-Kopelman algorithm to provide a highly adaptable method for cluster analysis. In the context of diabetic retinopathy, the different neighborhood rules implemented in ICAT provide better approaches for classifying retinal features such as neovascularization and exudates. ICAT's flexible design allows new metrics for characterizing cluster geometry, or new neighborhood rules for cluster identification, to be incorporated easily.
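The Enhanced Hoshen-Kopelman algorithm used in ICAT is not described here in enough detail to reproduce. The sketch below shows only the classic Hoshen-Kopelman idea it builds on: a single raster scan that assigns provisional labels, union-find to merge labels that turn out to belong to the same cluster, and a final relabeling pass. The neighborhood rule is a parameter; the `"4"`/`"8"` options and the function name are illustrative assumptions, not ICAT's interface.

```python
def hoshen_kopelman(grid, neighborhood="4"):
    """Label clusters of occupied (nonzero) cells; return (labels, cluster_count)."""
    # Already-scanned neighbors to inspect for each cell in a row-major scan.
    if neighborhood == "4":
        back = [(-1, 0), (0, -1)]                      # up, left
    elif neighborhood == "8":
        back = [(-1, -1), (-1, 0), (-1, 1), (0, -1)]   # also the two upper diagonals
    else:
        raise ValueError("neighborhood must be '4' or '8'")

    parent = [0]  # union-find forest over provisional labels; index 0 = background

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    rows, cols = len(grid), len(grid[0])
    labels = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c]:
                continue
            roots = {
                find(labels[r + dr][c + dc])
                for dr, dc in back
                if 0 <= r + dr < rows and 0 <= c + dc < cols and labels[r + dr][c + dc]
            }
            if not roots:
                parent.append(len(parent))         # start a new provisional cluster
                labels[r][c] = len(parent) - 1
            else:
                root = min(roots)
                for other in roots:
                    parent[other] = root           # record that these clusters meet
                labels[r][c] = root

    # Second pass: replace provisional labels with compact final labels 1..n.
    final = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                labels[r][c] = final.setdefault(find(labels[r][c]), len(final) + 1)
    return labels, len(final)


diagonal = [[1, 0, 0],
            [0, 1, 0],
            [0, 0, 1]]
_, n4 = hoshen_kopelman(diagonal, "4")  # 3 clusters
_, n8 = hoshen_kopelman(diagonal, "8")  # 1 cluster
```

With the four-neighbor rule the diagonal line forms three separate clusters; with the eight-neighbor rule it forms one, which shows how the choice of neighborhood rule alone can change what counts as a single feature.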

RSim: The Regional Simulation model (RSim) is designed to integrate the environmental effects of on-base military training and testing as well as off-base development. Effects considered include air and water quality, noise, and habitats for endangered and game species. A risk assessment approach is being used to determine the impacts of single and integrated risks. The RSim simulation will eventually be available on the Web and will be used in a gaming mode so that users can explore the repercussions of military and land-use decisions. RSim is currently being developed for the region around Fort Benning, Georgia, but is broadly applicable. This project is sponsored by the Strategic Environmental Research & Development Program (SERDP), an initiative funded by the U.S. Departments of Energy and Defense and the U.S. Environmental Protection Agency (EPA). A user interface for RSim is currently under development.
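As an illustration of how single risks can be combined into an integrated risk, the sketch below uses the textbook formula for the probability that at least one of several adverse effects occurs, 1 - prod(1 - p_i). This formula assumes the individual risks are independent, which real environmental stressors often are not; the stressor names and probabilities are made up, and the source does not say that RSim uses this rule.

```python
from math import prod


def integrated_risk(single_risks):
    """P(at least one adverse effect), ASSUMING independent risks: 1 - prod(1 - p_i)."""
    risks = list(single_risks)
    for p in risks:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"risk {p} is not a probability")
    return 1.0 - prod(1.0 - p for p in risks)


# Hypothetical probabilities of adverse effects from three stressors.
risks = {"noise": 0.10, "water_quality": 0.20, "habitat_loss": 0.05}
combined = integrated_risk(risks.values())  # 1 - 0.9 * 0.8 * 0.95, about 0.316
```

The integrated risk is never smaller than the largest single risk, which is one reason assessing stressors only in isolation can understate overall impact.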

LUCAS: The Land-Use Change Analysis System (LUCAS) simulates land-cover change in a heterogeneous (distributed) computing environment. LUCAS generates new land-cover maps representing the amount of land-cover change, so that issues such as biodiversity conservation, the importance of landscape elements for meeting conservation goals, and long-term landscape integrity can be addressed.
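A map-update step of the kind LUCAS performs can be sketched as a per-cell stochastic transition. Everything specific here is hypothetical: the three land-cover classes, the fixed transition probabilities, and the assumption that each cell changes independently of its neighbors. The source does not describe how LUCAS derives its transition probabilities or how it distributes work across a heterogeneous computing environment.

```python
import random

# Hypothetical land-cover classes and one-step transition probabilities:
# P[a][b] is the probability that a cell of class a becomes class b.
CLASSES = ["forest", "agriculture", "urban"]
P = {
    "forest":      {"forest": 0.90, "agriculture": 0.07, "urban": 0.03},
    "agriculture": {"forest": 0.05, "agriculture": 0.85, "urban": 0.10},
    "urban":       {"forest": 0.00, "agriculture": 0.00, "urban": 1.00},
}


def simulate_step(land_cover, P, rng):
    """Return a new map, drawing each cell's next class independently from row P[cell]."""
    return [
        [rng.choices(CLASSES, weights=[P[cell][b] for b in CLASSES])[0] for cell in row]
        for row in land_cover
    ]


def fraction_changed(before, after):
    """Fraction of cells whose land-cover class differs between two maps."""
    pairs = [(a, b) for row_a, row_b in zip(before, after) for a, b in zip(row_a, row_b)]
    return sum(a != b for a, b in pairs) / len(pairs)


rng = random.Random(42)  # fixed seed so the run is reproducible
start = [["forest"] * 4 for _ in range(3)] + [["agriculture"] * 4]
end = simulate_step(start, P, rng)
change = fraction_changed(start, end)
print(f"{change:.0%} of cells changed class")
```

Measuring the fraction of changed cells corresponds to the "amount of land-cover change" that the generated maps are meant to represent; running many seeded replicates would give a distribution rather than a single outcome.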
Whole Genome Phylogeny: As whole genome sequences continue to grow in number and complexity, effective methods are required for comparing and categorizing both the genes and the species represented within extremely large datasets. Current methods have generally used incomplete (and likely insufficient) subsets of the available data, even as additional data become available at a rapid rate. In collaboration with Prof. Gary Stuart at Indiana State University, an accurate and efficient method for producing robust gene and species phylogenies from very large whole genome protein datasets has been developed. This method relies on multidimensional protein vector definitions supplied by the singular value decomposition (SVD) of large sparse data matrices in which each protein is uniquely represented as a vector of overlapping tetrapeptide frequencies. Presentation slides on this work were shown on March 23 at the UT-ORNL Bioinformatics Summit 2002, and an updated presentation was given at an Indiana University School of Informatics Colloquium on Nov. 14, 2003 (audio/slides).
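The protein encoding step can be sketched as follows, assuming the 20 standard amino-acid letters. Each protein becomes a sparse vector of overlapping tetrapeptide (length-4 window) counts in a space of 20^4 = 160,000 possible tetrapeptides, so a whole-genome dataset yields a very large, very sparse tetrapeptide-by-protein matrix. The sequences below are made up, raw counts (rather than normalized frequencies or another weighting) are an assumption, and the SVD and tree-building steps of the actual method are not shown; the cosine comparison is included only to illustrate working with such vectors.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard amino acids
DIMENSION = len(AMINO_ACIDS) ** 4      # 160,000 possible tetrapeptides


def tetrapeptide_counts(sequence, k=4):
    """Sparse vector of overlapping length-k window counts (k=4: tetrapeptides)."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine(a, b):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(a[key] * b[key] for key in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical sequence: 9 residues give 9 - 4 + 1 = 6 overlapping windows
# (MKVL, KVLA, VLAA, LAAG, AAGI, AGIV).
counts = tetrapeptide_counts("MKVLAAGIV")
```

Storing only the tetrapeptides that actually occur keeps each protein's vector small even though the full space has 160,000 dimensions.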

SGO: Understanding the functional relationships between genes remains a major challenge in the interpretation of genomic data. Bioinformatics tools are being developed to automate the extraction and use of gene information from biological databases and the scientific literature. We present a new software environment called Semantic Gene Organizer © (SGO), which uses Latent Semantic Indexing (LSI), a concept-based vector space model, to automatically extract gene relationships from the titles and abstracts of MEDLINE citations.
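Before the SVD step, LSI systems usually weight the raw term-by-document counts. Log-entropy weighting is one common choice in LSI practice; the source does not say which scheme SGO applies to its MEDLINE titles and abstracts, so treat this as an assumed, illustrative step. The local weight log2(1 + f_ij) damps repeated occurrences, and the global entropy weight g_i is 1 for a term concentrated in a single document and falls to 0 for a term spread evenly over every document.

```python
import math


def log_entropy_weights(counts):
    """Log-entropy weighting of a term-by-document count matrix (list of rows).

    counts[i][j] = frequency f_ij of term i in document j, with n >= 2 documents.
      local weight   l_ij = log2(1 + f_ij)
      global weight  g_i  = 1 + sum_j p_ij * log2(p_ij) / log2(n),  p_ij = f_ij / sum_j f_ij
      result         a_ij = l_ij * g_i
    """
    n = len(counts[0])
    if n < 2:
        raise ValueError("need at least two documents")
    weighted = []
    for row in counts:
        total = sum(row)
        g = 1.0
        if total > 0:
            for f in row:
                if f > 0:
                    p = f / total
                    g += p * math.log2(p) / math.log2(n)
        weighted.append([math.log2(1 + f) * g for f in row])
    return weighted


# Hypothetical 2-term x 4-document counts: term 0 occurs in one document only,
# term 1 occurs once in every document.
W = log_entropy_weights([[1, 0, 0, 0],
                         [1, 1, 1, 1]])
# W[0] == [1.0, 0.0, 0.0, 0.0]; every entry of W[1] is 0.0
```

A term that appears uniformly in every abstract carries no information for distinguishing genes, so its weight vanishes, while a term tied to a single document keeps full weight.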

FAUN: We have developed a Web-based bioinformatics tool called Feature Annotation Using Nonnegative matrix factorization (FAUN) to facilitate both the discovery and the classification of functional relationships among genes. Both the computational complexity and the parameterization of nonnegative matrix factorization (NMF) for processing gene sets are currently being investigated. FAUN has been tested on several manually constructed gene collections (ranging in size from 50 to 800 genes) and has been specifically engineered to analyze several microarray-derived gene sets obtained from studies of the developing cerebellum in normal and mutant mice. FAUN provides utilities for collaborative knowledge discovery and for identifying new gene relationships from text streams and repositories (e.g., MEDLINE). It is particularly useful for validating and analyzing gene associations suggested by microarray experiments.
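The factorization at the core of FAUN can be sketched with the standard Lee-Seung multiplicative updates for the Frobenius-norm NMF objective, A ~ WH with W, H >= 0. This is a generic NMF sketch, not FAUN's code: the 6 x 4 term-by-gene matrix is made up (built from two nonnegative rank-one blocks), k = 2, the iteration count and the small `eps` guarding against division by zero are arbitrary choices, and assigning each gene to its largest entry in H is only one simple way to read the factors as gene groupings.

```python
import numpy as np


def nmf(A, k, iters=1000, eps=1e-9, seed=0):
    """Factor nonnegative A (m x n) as W (m x k) @ H (k x n), both nonnegative,
    using Lee-Seung multiplicative updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))  # random positive initialization
    H = rng.random((k, n))
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H


# Hypothetical term-by-gene matrix: genes 0-1 share terms 0-2, genes 2-3 share terms 3-5.
A = np.array([
    [2, 1, 0, 0],
    [4, 2, 0, 0],
    [2, 1, 0, 0],
    [0, 0, 1, 3],
    [0, 0, 1, 3],
    [0, 0, 2, 6],
], dtype=float)

W, H = nmf(A, k=2)
groups = H.argmax(axis=0)  # dominant factor ("feature") for each gene
```

Because each update multiplies by a nonnegative ratio, W and H stay nonnegative throughout, which is what makes the factors readable as additive combinations of terms.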

GST Retreat Poster (March 14, 2008; 4.7 MB PPT)
BioPerl Links: (July 2005)


Encyclopedia of Computer Science and Engineering: Dr. Michael W. Berry is serving as the Applications area editor of the Encyclopedia of Computer Science and Engineering (Wiley Interscience) which is being edited by Prof. Benjamin Wah at the University of Illinois at Urbana-Champaign. Publication anticipated for 2004.
Three-Day Seminar Course on Information Retrieval, Facultad de Matemáticas, Universidad Autónoma de Yucatán (UADY), Mérida, México, March 10-12, 2004.
News Release. All links are PowerPoint files (password-protected).

 