Computational Science Projects
Latent Semantic Indexing
(or LSI) is a concept-based information retrieval model. Terms and documents
are both represented as vectors in the same space, so that documents can be
clustered (semantically) near each other even when they share no common terms. LSI
addresses the two fundamental problems that plague traditional
lexical-matching indexing schemes: synonymy and polysemy.
Content Analyst Company, LLC owns the original patent to LSI:
Computer information retrieval using latent semantic structure,
U.S. Patent No. 4,839,853, June 13, 1989.
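To make the "no common terms" point concrete, here is a minimal NumPy sketch of LSI-style retrieval, assuming a small dense term-by-document count matrix, a rank-k truncated SVD, and query folding as q^T U_k S_k^{-1} compared by cosine against the rows of V_k. The function names and the toy vocabulary are illustrative only; this is not the patented system or any particular LSI package.

```python
import numpy as np

def lsi_fit(A: np.ndarray, k: int):
    """Rank-k truncated SVD of a term-by-document matrix A (rows = terms, columns = documents)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

def fold_in_query(q: np.ndarray, Uk: np.ndarray, sk: np.ndarray) -> np.ndarray:
    """Map a term-count query vector q into the k-dimensional LSI space as q^T U_k S_k^{-1}."""
    return (q @ Uk) / sk

def rank_documents(q, Uk, sk, Vtk):
    """Cosine similarity between the folded-in query and each document (rows of V_k)."""
    qk = fold_in_query(q, Uk, sk)
    docs = Vtk.T
    sims = docs @ qk / (np.linalg.norm(docs, axis=1) * np.linalg.norm(qk) + 1e-12)
    return np.argsort(-sims), sims

# Toy terms (rows): car, automobile, engine, flower, petal.
# Toy documents (columns): d0 = "car engine", d1 = "automobile engine", d2 = "flower petal".
A = np.array([[1, 0, 0],
              [0, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 0, 1]], dtype=float)

Uk, sk, Vtk = lsi_fit(A, k=2)
query = np.array([1, 0, 0, 0, 0], dtype=float)   # the single term "car"
order, sims = rank_documents(query, Uk, sk, Vtk)
print("lexical overlap with d1:", query @ A[:, 1])  # d1 shares no term with the query
print("LSI similarities:", np.round(sims, 3))
```

Because "car" and "automobile" both co-occur with "engine", the rank-2 space places d1 next to d0, so d1 scores close to 1.0 for the query "car" even though a purely lexical match scores it 0.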
SVDPACK
comprises four numerical (iterative) methods for computing the singular
value decomposition (SVD) of large sparse matrices using double-precision ANSI
Fortran-77. A compatible ANSI C version (SVDPACKC) is also available. SVDPACK
and SVDPACKC implement Lanczos and subspace iteration-based methods for
determining several of the largest singular triplets of large sparse matrices.
The development of SVDPACK was motivated by the need to compute large-rank
approximations to sparse term-document matrices from information
retrieval applications such as Latent Semantic Indexing
(described above). SVDPACKC is now used
in the InfoMap project
developed in the Computational Semantics Laboratory at
Stanford University.
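As an illustration of the Lanczos idea only (not SVDPACK's Fortran or C code), the following Python sketch runs Golub-Kahan-Lanczos bidiagonalization with full reorthogonalization. The matrix is touched only through products A @ x and A.T @ y, which is what makes Lanczos methods attractive for large sparse matrices. The function name `lanczos_svd`, the default step count, and the random sparse test matrix are assumptions made for this example.

```python
import numpy as np
import scipy.sparse as sp

def lanczos_svd(A, k, steps=None, seed=0):
    """Approximate the k largest singular triplets of A by Golub-Kahan-Lanczos
    bidiagonalization with full reorthogonalization. A only needs to support
    A @ x and A.T @ y, so a scipy.sparse matrix works unchanged."""
    m, n = A.shape
    steps = steps or min(m, n, 2 * k + 10)
    rng = np.random.default_rng(seed)
    U = np.zeros((m, steps))
    V = np.zeros((n, steps))
    alpha = np.zeros(steps)
    beta = np.zeros(steps)
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    u_prev, b = np.zeros(m), 0.0
    for j in range(steps):
        V[:, j] = v
        # Left vector: u_j = (A v_j - beta_{j-1} u_{j-1}) / alpha_j, reorthogonalized.
        u = A @ v - b * u_prev
        u -= U[:, :j] @ (U[:, :j].T @ u)
        alpha[j] = np.linalg.norm(u)
        u /= alpha[j]
        U[:, j] = u
        # Right vector: v_{j+1} = (A^T u_j - alpha_j v_j) / beta_j, reorthogonalized.
        w = A.T @ u - alpha[j] * v
        w -= V[:, :j + 1] @ (V[:, :j + 1].T @ w)
        b = np.linalg.norm(w)
        beta[j] = b
        if b < 1e-12:          # Krylov subspace exhausted: stop early
            steps = j + 1
            break
        v, u_prev = w / b, u
    # A V = U B with B upper bidiagonal; the SVD of the small B yields Ritz approximations.
    B = np.diag(alpha[:steps]) + np.diag(beta[:steps - 1], 1)
    P, s, Qt = np.linalg.svd(B)
    Uk = U[:, :steps] @ P[:, :k]
    Vk = V[:, :steps] @ Qt.T[:, :k]
    return Uk, s[:k], Vk.T

# Hypothetical sparse test matrix with about 5% nonzero entries.
rng = np.random.default_rng(1)
dense = rng.random((200, 100)) * (rng.random((200, 100)) < 0.05)
A = sp.csr_matrix(dense)

Uk, sk, Vtk = lanczos_svd(A, k=5)
print("Lanczos estimates:", sk)
print("dense SVD:        ", np.linalg.svd(dense, compute_uv=False)[:5])
```

Full reorthogonalization keeps the sketch numerically simple at the cost of O(steps) extra work per iteration; production codes use more economical strategies, and the top Ritz values converge first when the leading singular values are well separated.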
IMP:
The Integrated Modeling Project (IMP), sponsored by the
Environmental Impacts Program of the USDA Forest Service,
is an integrated assessment of forest health and productivity
in southern and southeastern forests in relation to
changes in climate, air quality, and land use. The
primary research focus of Prof. Michael W. Berry and Research
Associate Karen S. Minser (Dept. of Computer Science)
is the development of a problem-solving environment (PSE)
that facilitates the horizontal integration of
forest responses to environmental stresses and disturbances
through the use of micro-scale cellular automata.
ICAT:
The Interactive Cluster
Analysis Toolkit (or ICAT)
utilizes the Enhanced Hoshen-Kopelman algorithm to provide a highly adaptable
method for cluster analysis. Within the context of diabetic retinopathy,
different neighborhood rules implemented within ICAT
provide better approaches for
classifying retinal features such as neovascularization and
exudates. The flexible design of ICAT allows
new metrics for characterizing cluster geometry or new neighborhood rules
for cluster identification to be easily incorporated.
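The effect of swapping neighborhood rules can be seen in a small sketch of the classic two-pass Hoshen-Kopelman labeling with union-find, where the neighborhood is simply a list of offsets. This is the textbook algorithm, not ICAT's Enhanced Hoshen-Kopelman variant, and the names `FOUR`, `EIGHT`, and `hoshen_kopelman` are hypothetical.

```python
import numpy as np

# Neighborhood rules as offsets to cells already visited in a row-major scan.
FOUR = [(-1, 0), (0, -1)]                      # edge-adjacent (4-connectivity)
EIGHT = [(-1, -1), (-1, 0), (-1, 1), (0, -1)]  # edge- or corner-adjacent (8-connectivity)

def find(parent, x):
    """Return the root label of x, halving the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def hoshen_kopelman(grid, neighbors=FOUR):
    """Label clusters of nonzero cells; returns (labels, number_of_clusters)."""
    rows, cols = grid.shape
    labels = np.zeros((rows, cols), dtype=int)
    parent = [0]  # index 0 is reserved for background
    for r in range(rows):
        for c in range(cols):
            if not grid[r, c]:
                continue
            roots = {find(parent, labels[r + dr, c + dc])
                     for dr, dc in neighbors
                     if 0 <= r + dr < rows and 0 <= c + dc < cols and labels[r + dr, c + dc]}
            if not roots:                      # start a new provisional cluster
                parent.append(len(parent))
                labels[r, c] = len(parent) - 1
            else:                              # join, merging any distinct clusters
                smallest = min(roots)
                for root in roots:
                    parent[root] = smallest
                labels[r, c] = smallest
    # Second pass: resolve provisional labels to consecutive cluster ids 1..n.
    remap = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r, c]:
                labels[r, c] = remap.setdefault(find(parent, labels[r, c]), len(remap) + 1)
    return labels, len(remap)

diagonal = np.eye(3, dtype=int)
print(hoshen_kopelman(diagonal, FOUR)[1])   # 3: diagonal cells are separate
print(hoshen_kopelman(diagonal, EIGHT)[1])  # 1: corners connect them
```

Because the neighborhood is data rather than code, a new rule (for example, a larger or anisotropic stencil) can be added without touching the labeling loop, as long as every offset points to a cell visited earlier in the scan.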
RSim:
The Regional Simulation model (RSim) is designed
to integrate the environmental effects of on-base military training and
testing as well as off-base development. Effects considered include
air and water quality, noise, and habitats for endangered and game
species. A risk assessment approach is being used to determine
the impacts of single and integrated risks. The RSim simulation
will eventually be available on the Web and will be used in a
gaming mode so that users can explore the repercussions of
military and land-use decisions. RSim is currently being developed
for the region around Fort Benning, Georgia, but is broadly applicable.
This project is sponsored by the
Strategic Environmental Research &
Development Program (SERDP), an initiative funded by the
U.S. Departments of Energy and Defense and the U.S. Environmental
Protection Agency (EPA). A
user interface
for RSim is currently under development.
LUCAS:
The Land-Use Change Analysis System (LUCAS) simulates land-cover changes
on a heterogeneous (distributed) computing environment. LUCAS generates
new land-cover maps representing the amount of
land-cover change, so that issues such as biodiversity
conservation, the importance of landscape
elements in meeting conservation goals, and long-term
landscape integrity can be addressed.
Whole Genome Phylogeny:
As whole genome sequences continue to expand in number and
complexity, effective methods for comparing and categorizing both genes
and species represented within extremely large datasets are required.
Current methods have generally utilized incomplete (and likely
insufficient) subsets of the available data even as additional data
becomes available at
a rapid rate. In collaboration with Prof. Gary Stuart at Indiana
State University, an accurate and efficient method for
producing robust gene and species phylogenies using very large whole genome
protein datasets has been developed.
This method relies on multidimensional protein vector
definitions supplied by the singular value decomposition (SVD) of
large sparse data matrices in which each protein is uniquely represented as
a vector of overlapping tetrapeptide frequencies. The link above points to
presentation slides shown on March 23 at the
UT-ORNL
Bioinformatics Summit 2002, and an updated presentation
was given at an
Indiana University
School of Informatics Colloquium on Nov. 14, 2003 (audio/slides).
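The vector construction described above can be sketched as follows, assuming raw overlapping tetrapeptide counts, a dense NumPy SVD truncated to k dimensions, and cosine distance between proteins. The weighting, distance measure, and tree-building step of the actual method are not given here, so these choices (and all function names) are assumptions; a real whole-genome matrix would be stored sparse and factored with an iterative solver.

```python
from collections import Counter
import numpy as np

def tetrapeptide_counts(seq: str) -> Counter:
    """Frequencies of overlapping length-4 substrings (tetrapeptides) in one protein."""
    return Counter(seq[i:i + 4] for i in range(len(seq) - 3))

def tetrapeptide_matrix(proteins):
    """Tetrapeptide-by-protein frequency matrix (rows: tetrapeptides seen, columns: proteins)."""
    counts = [tetrapeptide_counts(p) for p in proteins]
    vocab = sorted(set().union(*counts))
    row = {t: i for i, t in enumerate(vocab)}
    M = np.zeros((len(vocab), len(proteins)))
    for j, c in enumerate(counts):
        for t, n in c.items():
            M[row[t], j] = n
    return M

def protein_vectors(M, k):
    """k-dimensional protein coordinates from a truncated SVD (columns of S_k V_k^T)."""
    _, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (s[:k, None] * Vt[:k]).T

def cosine_distances(X):
    """Pairwise 1 - cosine similarity between the rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return 1.0 - Xn @ Xn.T

proteins = ["MKTAYIAKQRQISFVKSHFSRQ",   # made-up toy sequences
            "MKTAYIAKQRQISFVKSHFSRA",   # differs from the first only in the last residue
            "GDVEKGKKIFVQKCAQCHTVEK"]
D = cosine_distances(protein_vectors(tetrapeptide_matrix(proteins), k=2))
print(np.round(D, 3))
```

A distance matrix like `D` could then be handed to a standard distance-based tree method (such as neighbor joining) to produce a phylogeny.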
SGO:
Understanding the functional relationships between genes remains a
major challenge in the interpretation of genomic data. Bioinformatics tools
to automate extraction and utilization of gene information from the
biological databases and the scientific literature are being developed.
We present a new software environment called Semantic Gene Organizer
© (SGO) which utilizes Latent Semantic Indexing (LSI), a
concept-based vector space model, to automatically extract gene
relationships from titles and abstracts in MEDLINE citations.
FAUN:
We have developed a Web-based bioinformatics tool called Feature
Annotation Using Nonnegative matrix factorization
(FAUN) to facilitate
both the discovery and classification of functional relationships among
genes. Both the computational complexity and parameterization of
nonnegative matrix factorization (NMF) for processing gene sets are
currently being investigated.
FAUN has been tested on several manually constructed gene
collections (sizes ranging from 50 to 800 genes) and has been specifically
engineered to analyze several microarray-derived gene sets obtained
from studies
of the developing cerebellum in normal and mutant mice.
FAUN provides
utilities for collaborative knowledge discovery and identification of new
gene relationships from text streams and repositories (e.g., MEDLINE). It
is particularly useful for the validation and analysis of gene
associations suggested by microarray experimentation.
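For readers unfamiliar with NMF, here is a minimal sketch of how a term-by-gene matrix can be factored into nonnegative "features", assuming the classic Lee-Seung multiplicative updates for the Frobenius-norm objective. FAUN's actual NMF variant, initialization, and parameter choices are not described here, so everything below, including the toy terms and the matrix, is hypothetical.

```python
import numpy as np

def nmf(A, k, iters=500, eps=1e-10, seed=0):
    """Approximate a nonnegative matrix A (m x n) by W @ H with W (m x k), H (k x n) >= 0,
    using Lee-Seung multiplicative updates for the Frobenius-norm objective."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, k))
    H = rng.random((k, n))
    for _ in range(iters):
        H *= (W.T @ A) / (W.T @ W @ H + eps)   # eps guards against division by zero
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical term-by-gene matrix (rows: terms from abstracts, columns: genes).
terms = ["axon", "synapse", "insulin", "glucose"]
A = np.array([[3, 3, 0, 0],
              [2, 2, 0, 0],
              [0, 0, 4, 4],
              [0, 0, 1, 1]], dtype=float)

W, H = nmf(A, k=2)
for f in range(W.shape[1]):
    top = np.argsort(-W[:, f])[:2]                    # strongest terms label the feature
    genes = np.flatnonzero(H[f] > 0.5 * H[f].max())   # genes that load on the feature
    print(f"feature {f}: terms {[terms[i] for i in top]}, genes {genes.tolist()}")
```

Nonnegativity is what makes the factors readable: each column of W is an additive bundle of terms, and each column of H says how strongly a gene draws on each bundle, so genes sharing a dominant feature are candidate functional relationships.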
GST Retreat Poster (March 14, 2008, 4.7 MB ppt)
BioPerl Links: (July 2005)
Encyclopedia of
Computer Science and Engineering:
Dr. Michael W. Berry is
serving as the Applications area editor of the Encyclopedia of
Computer Science and Engineering (Wiley Interscience) which is
being edited by
Prof. Benjamin Wah
at the University of Illinois at Urbana-Champaign. Publication is
anticipated in 2004.
Three-Day Seminar Course on Information Retrieval,
Facultad de
Matemáticas, Universidad Autónoma de
Yucatán (UADY), Mérida, México,
March 10-12, 2004. News Release.
All links are PowerPoint files (password protected).