History of the Workshop
This is the sixth in the series of Text Mining workshops held in conjunction with SDM. Previous workshops took place in 2001, 2002, 2003, 2006, and 2007. Last year in Minneapolis, 31 authors representing industry, academia, and national research laboratories from 8 different countries submitted a total of 12 papers. After careful review, 7 papers were selected for publication and presentation. In addition, NASA Ames sponsored a text mining competition based on anomaly detection using documents from the Aviation Safety Reporting System (ASRS). Photos shown above and below are from the 2007 SDM Text Mining Workshop.
General Topics
The
proliferation of digital computing devices and their use in
communication has resulted in an increased demand for systems and
algorithms capable of mining textual data. Thus, the development of
techniques for mining unstructured, semi-structured, and fully
structured textual data has become quite important in both academia and
industry. As a result, this Workshop will survey the emerging field of
Text Mining - the application of techniques of machine learning in
conjunction with natural language processing, information extraction
and algebraic/mathematical approaches to computational information
retrieval. Many issues are being addressed in this field ranging from
the development of new learning approaches to the parallelization of
existing algorithms. The goal of this workshop is to provide a venue
for researchers to share initial approaches and preliminary results of
recent research in Text Mining. Through the careful selection and
review of submitted workshop papers, we hope to provide a suitable
selection of topics that will both generate interest and provide
insight into the state of the field of Text Mining.
Special Topics - Text Mining with the Enron Data Set and VAST 2007 Contest Data
Because of the continued interest generated by the availability of the Enron data set of 1.3 million email messages (see Enron Email Dataset) and its versatility in terms of potential research topics (link analysis, pattern matching), researchers are encouraged to submit papers based on this data set to this workshop. In addition, the dataset of news stories and blog entries used in the IEEE Symposium on Visual Analytics Science and Technology (VAST) 2007 Contest is an interesting corpus for research in topic detection/tracking, role playing, and scenario analysis (see VAST 2007 Contest for more details on this dataset). Researchers whose work is more focused on social networking models of the Enron and VAST 2007 datasets should contact the organizers of the SDM Link Analysis (SLA) Workshop. With the authors' permission, a paper may be reassigned to the SLA workshop (especially if the Program Committee recommends this based on the content of the paper).
Other Specific Topics of Interest Include:
Attendees are required to register for SDM 2008; no separate registration is
needed for this workshop.
New for SDM08: a one-day registration for the conference.
Workshop attendees do not have to register at the full
conference rate.
Click
here
for more details.
To submit a paper, upload your paper in PDF format (papers should be printable
on 8.5 × 11 inch paper only and be roughly 10 pages in length, using an
11pt font in two-column format with 1-inch margins)
by accessing the review system via
http://www.cs.utk.edu/TextMiningPapers.
In the Authors section you will find the instructions:
1. Use the abstract submission interface to provide the main information
on your paper. You will be given an id/password which must later be used
to access the system during the following steps, so save the login information
message that you will receive from the system.
2. Once an abstract has been submitted, you can upload your paper.
To guarantee consideration, manuscripts must be received by
January 11, 2008 (deadline has passed).
Submission of work in progress is
also encouraged.
Papers due: January 11, 2008 (deadline has passed)
Notifications sent: January 25, 2008 (notifications have been made)
Camera ready: Final papers due to workshop: February 11, 2008 (all papers have been submitted to SIAM)
Program in PDF format (posted February 5, 2008)
Title of Presentation: Engineering Knowledge for the Humanities
Abstract: Over the last decade NCSA's Automated Learning Group has innovated data mining technologies for industry, government, and the sciences. In the past few years, we have broadened our focus to include knowledge discovery in the humanities. My presentation will focus on how we are negotiating humanities computing's special challenges for data mining and analysis. I will discuss our early collaborative projects, FeatureLens and Nora, and SEASR (Software Environment for the Advancement of Scholarly Research), the Andrew W. Mellon Foundation-funded project we are now leading. Each of these projects has developed technologies customized to meet specific needs of the digital humanities community. FeatureLens--an early MONK (Metadata Offer New Knowledge) application--uses the machine learning approach of frequent pattern mining to identify fuzzy repetition patterns in a data collection, and with no initial human input. Nora--a case study for eighteenth- and nineteenth-century British and American literature--uses predictive modeling techniques to classify documents, even given complex and notoriously indistinct expert classes such as sentimental fiction. SEASR is our most ambitious project yet, employing a semantic-based, service-oriented architecture to build software bridges that allow users to access data stored in disparate formats and on incompatible platforms and to provide an enhanced environment for workflow and data sharing. The essential infrastructure SEASR provides will advance the capabilities of projects like our partner, MONK, a digital environment designed to help humanities scholars discover and analyze patterns.
Biography: Loretta Auvil directs the Automated Learning Group, a unit of the Data-Intensive Technology and Applications division at the National Center for Supercomputing Applications. After receiving an M.S. in Computer Science from Virginia Tech in 1992, Loretta spent several years creating tools for visualizing performance data of parallel computer programs at Rome Laboratory and Oak Ridge National Laboratory. At NCSA, Loretta worked on the team that developed D2K (Data to Knowledge). For nearly ten years, she has served on the IEEE Visualization Conference Committee. Her main research interests are data mining and information visualization approaches for multi-dimensional data.
Co-Chairs:
Michael W. Berry, University of Tennessee
and Jacob Kogan, University of Maryland, Baltimore County
Devasis Bassu, Telcordia Technologies, Inc.
Roger Bilisoly, Central Connecticut State University
Malu Castellanos, Hewlett-Packard Laboratories
Carlotta Domeniconi, George Mason University
Kyle Gallivan, Florida State University
Efstratios Gallopoulos, University of Patras, Greece
Efim Gendler, iboogie.tv
Mei Kobayashi, IBM Tokyo Research Lab
Andrew Knyazev, University of Colorado at Denver
Peg Howland, Utah State University
April Kontostathis, Ursinus University
Choudur Lakshminarayan, Hewlett-Packard Laboratories
Mark Last, Ben-Gurion University, Israel
Bill Pottenger, DIMACS, Rutgers
Padma Raghavan, Penn State University
Stephen Soderland, University of Washington
Michael Steinbach, University of Minnesota
Shi Zhong, Yahoo!
Co-Chairs:
Michael W. Berry
Department of Electrical Engineering & Computer Science
203 Claxton Complex
University of Tennessee
Knoxville, TN 37996-3450
Phone: (865) 974-3838
Fax: (865) 974-4404
berry AT eecs DOT utk DOT edu
Jacob Kogan
Department of Mathematics and Statistics
University of Maryland, Baltimore County
Baltimore, MD 21250
Phone: (410) 455-3297
Fax: (410) 455-1066
kogan AT math DOT umbc DOT edu
