Keynote Speaker |
Ashok N. Srivastava, NASA Ames Research Center |
The Needle in the Haystack Problem: Discovering Recurring Anomalies in Text Documents |
Abstract: An important problem facing many governmental and industrial organizations is that of discovering the description of a recurring phenomenon in text documents. In many applications, the recurring phenomenon has a low frequency of occurrence, which complicates its discovery. We call such low-frequency events that tend to co-occur 'recurring anomalies'. Conventional text mining methods tend to overlook these low-frequency events. The problem of discovering recurring anomalies arises in numerous application domains, including fraud, counter-terrorism and security, analysis of complex systems, and warranty and maintenance reports. This talk describes the problem in some detail from a mathematical perspective and then discusses past and current work in the field. We compare the performance of several existing methods with that of novel text mining methods that we have developed, using text reports regarding complex aerospace systems.

Biography: Ashok N. Srivastava, Ph.D., has seventeen years of research, development, and consulting experience in data analysis, machine learning, and data mining, with applications in business and scientific domains. He currently co-manages the Discovery and System Health research department and leads the Data Mining group at NASA Ames Research Center. Dr. Srivastava represented NASA in a Homeland Security Presidential Directive (HSPD-11) regarding terrorist-related screening procedures. He also led numerous data mining projects in the banking, e-commerce, finance, and retail industries prior to joining NASA. His research interests include kernel machines, text analysis, and representations of high-dimensional data, with applications in a diverse set of areas. |