Viet Ha-Thuc

[LtoR] Maojin Jiang, Loretta Auvil, Wei Chen,
Raymond Pon, and Jane Mason

Hyatt Regency Hotel
Atlanta, GA

April 26, 2008

[LtoR] Nikita Lytkin, Ziqiu Su, Anirban Chatterjee,
and Andrea Tagarelli

to be held in conjunction with
Eighth SIAM International Conference on Data Mining (SDM 2008)



History of the Workshop

This is the sixth in the series of Text Mining workshops held in conjunction with SDM. Previous ones took place in 2001, 2002, 2003, 2006, and 2007. Last year in Minneapolis, 31 authors representing industry, academia, and national research laboratories from 8 different countries submitted a total of 12 papers. After careful review, 7 papers were selected for publication and presentation. In addition, NASA Ames sponsored a text mining competition based on anomaly detection using documents from the Aviation Safety Reporting System (ASRS). Photos shown above and below are from the 2007 SDM Text Mining Workshop.

General Topics

The proliferation of digital computing devices and their use in communication has resulted in an increased demand for systems and algorithms capable of mining textual data. Thus, the development of techniques for mining unstructured, semi-structured, and fully structured textual data has become quite important in both academia and industry. As a result, this Workshop will survey the emerging field of Text Mining - the application of techniques of machine learning in conjunction with natural language processing, information extraction and algebraic/mathematical approaches to computational information retrieval. Many issues are being addressed in this field ranging from the development of new learning approaches to the parallelization of existing algorithms. The goal of this workshop is to provide a venue for researchers to share initial approaches and preliminary results of recent research in Text Mining. Through the careful selection and review of submitted workshop papers, we hope to provide a suitable selection of topics that will both generate interest and provide insight into the state of the field of Text Mining.

Special Topics - Text Mining with the Enron Data Set and VAST 2007 Contest Data

Because of the continued interest generated by the availability of the Enron data set of 1.3 million email messages (see Enron Email Dataset) and its versatility in terms of potential research topics (link analysis, pattern matching), researchers are encouraged to submit papers that use this data set to this workshop. In addition, the dataset of news stories and blog entries used in the IEEE Symposium on Visual Analytics Science and Technology (VAST) 2007 Contest is an interesting corpus for research in topic detection/tracking, role playing, and scenario analysis (see VAST 2007 Contest for more details on this dataset). Researchers whose work is more focused on social networking models of the Enron and VAST-2007 datasets should contact the organizers of the SDM Link Analysis (SLA) Workshop. With the authors' permission, a paper may be re-assigned to the SLA workshop (especially if the Program Committee makes the recommendation based on the content of the paper).

Other Specific Topics of Interest Include:

    Algorithms and Models

  • Bayesian Models
  • Concept Decomposition
  • Orthogonal Decomposition
  • Probabilistic Models
  • Vector Space Models
  • Latent Semantic Indexing
  • Graph-based Models
  • Text Streaming Models
    Applications
  • Clustering
  • Factor Analysis
  • Visualization Techniques
  • Metadata Generation
  • Information Extraction
  • Text Classification
  • Text Purification
  • Text Segmentation
  • Text Summarization
  • Query Structures
  • Trend Detection
  • Distributed Storage and Retrieval

Registration

Attendees must register for SDM 2008; no separate registration is needed for this workshop.
New for SDM08: a one-day registration for the conference. Workshop attendees do not have to register at the complete conference rate.


Submission Requirements

To submit a paper, upload it in PDF format by accessing the review system via http://www.cs.utk.edu/TextMiningPapers. Papers should be printable on 8.5 × 11 paper only and be roughly 10 pages in length, using an 11pt font in two-column format with 1-inch margins.

In the Authors section you will find the instructions:

1. Use the abstract submission interface to provide the main information
on your paper. You will be given an id/password which must later be used
to access the system during the following steps, so save the login information message that you will receive from the system.

2. Once an abstract has been submitted, you can upload your paper.

To guarantee consideration, manuscripts must be received by January 11, 2008 (the deadline has passed).
Submission of work in progress is also encouraged.


Important Dates

Papers Due: January 11, 2008 (deadline has passed)

Notifications sent: January 25, 2008 (notifications have been made)

Camera ready: Final papers due to workshop: February 11, 2008 (all papers have been submitted to SIAM)



Keynote speaker: Loretta Auvil, NCSA, University of Illinois at Urbana-Champaign (Link to Slides)
Title of Presentation: Engineering Knowledge for the Humanities
Abstract: Over the last decade NCSA's Automated Learning Group has innovated data mining technologies for industry, government, and the sciences. In the past few years, we have broadened our focus to include knowledge discovery in the humanities. My presentation will focus on how we are negotiating humanities computing's special challenges for data mining and analysis. I will discuss our early collaborative projects, FeatureLens and Nora, and SEASR (Software Environment for the Advancement of Scholarly Research), the Andrew W. Mellon Foundation-funded project we are now leading. Each of these projects has developed technologies customized to meet specific needs of the digital humanities community. FeatureLens--an early MONK (Metadata Offer New Knowledge) application--uses the machine learning approach of frequent pattern mining to identify fuzzy repetition patterns in a data collection with no initial human input. Nora--a case study for eighteenth- and nineteenth-century British and American literature--uses predictive modeling techniques to classify documents, even given complex and notoriously indistinct expert classes such as sentimental fiction. SEASR is our most ambitious project yet, employing a semantic-based, service-oriented architecture to build software bridges that allow users to access data stored in disparate formats and on incompatible platforms and to provide an enhanced environment for workflow and data sharing. The essential infrastructure SEASR provides will advance the capabilities of projects like our partner, MONK, a digital environment designed to help humanities scholars discover and analyze patterns.

Biography: Loretta Auvil directs the Automated Learning Group, a unit of the Data-Intensive Technology and Applications division at the National Center for Supercomputing Applications. After receiving an M.S. in Computer Science from Virginia Tech in 1992, Loretta spent several years creating tools for visualizing performance data of parallel computer programs at Rome Laboratory and Oak Ridge National Laboratory. At NCSA Loretta worked on the team that developed D2K (Data to Knowledge). For nearly ten years, she has served on the IEEE Visualization Conference Committee. Her main research interests are data mining and information visualization approaches for multi-dimensional data.

Program in PDF format (posted February 5, 2008)
Workshop Poster (posted April 8, 2008)

Sponsors: BeliefNetworks Inc. of Charleston, SC and SAS Institute Inc. of Cary, NC.


Program Committee

Co-Chairs: Michael W. Berry, University of Tennessee and Jacob Kogan, University of Maryland, Baltimore County

Devasis Bassu, Telcordia Technologies, Inc.
Roger Bilisoly, Central Connecticut State University
Malu Castellanos, Hewlett-Packard Laboratories
Carlotta Domeniconi, George Mason University
Kyle Gallivan, Florida State University
Efstratios Gallopoulos, University of Patras, Greece
Efim Gendler, iboogie.tv
Mei Kobayashi, IBM Tokyo Research Lab
Andrew Knyazev, University of Colorado at Denver
Peg Howland, Utah State University
April Kontostathis, Ursinus College
Choudur Lakshminarayan, Hewlett-Packard Laboratories
Mark Last, Ben-Gurion University, Israel
Bill Pottenger, DIMACS, Rutgers
Padma Raghavan, Penn State University
Stephen Soderland, University of Washington
Michael Steinbach, University of Minnesota
Shi Zhong, Yahoo!


Organizational Committee

Co-Chairs:
Michael W. Berry
Department of Electrical Engineering & Computer Science
203 Claxton Complex
University of Tennessee
Knoxville, TN 37996-3450
Phone: (865) 974-3838
Fax: (865) 974-4404
berry AT eecs DOT utk DOT edu

Jacob Kogan
Department of Mathematics and Statistics
University of Maryland, Baltimore County
Baltimore, MD 21250
Phone: (410) 455-3297
Fax: (410) 455-1066
kogan AT math DOT umbc DOT edu




Last modified on April 29, 2008