Data mining is an increasingly popular way to search for patterns, correlations and trends in crime statistics, genomics data and other very large data sets, and UT Dallas researchers have now created a repository of tools intended to advance this young field.
The Data Mining Tool Repository provides researchers and developers with a number of useful data sets and tools, according to Dr. Latifur Khan, an associate professor of computer science, who developed the repository in conjunction with Dr. Mehedy Masud, a postdoctoral fellow, and students and other collaborators.
“The repository initially contains three tools that implement our algorithms as a service to the data-mining and security communities, preventing them from having to develop such tools from scratch,” Khan said.
The first tool implements malware code detection, the second implements novel class detection for stream data, and the third implements stream data classification with limited labeled data. More tools will be added as they become available, including a privacy-preserving data-mining toolkit and a security tool for cloud computing.
Data-mining research at UT Dallas is conducted at the University’s Cyber Security Research Center, which turned 5 years old in October. The center has won more than $10 million in research funds since its creation, including grants from the National Science Foundation, the Air Force Office of Scientific Research, the Intelligence Advanced Research Projects Activity, the Office of Naval Research, the National Geospatial-Intelligence Agency, NASA and the National Institutes of Health.
The Cyber Security team is also collaborating closely with several corporations. It has developed a string of partnerships with Raytheon and is developing a collaborative relationship with two IBM research centers as well as Rockwell Collins, Lockheed Martin and L-3 Communications. The team is also working with federal research labs such as the National Institute of Standards and Technology on the National Vulnerability Database project.
“We are developing a holistic approach to research,” said Dr. Bhavani Thuraisingham, the director of the center. “Our primary goal is to solve hard problems for our sponsors and publish papers in premier journals and at important conferences. At the same time, we also develop software tools, including open-source tools, and contribute to standards such as the Open Geospatial Consortium. And our students’ PhD dissertations regularly become books that we can use in our graduate classes.”
Khan and his team have also developed packet anonymization tools that have been released as open source. The team has contributed to government projects such as the intelligence community’s Blackbook and to open-source tools such as JENA from HP Labs.
Other members of the Cyber Security team include Dr. Murat Kantarcioglu, a recent NSF CAREER Award winner, and Dr. Kevin Hamlen, a recent Air Force Career Award recipient.
To celebrate its fifth anniversary, the Cyber Security team is organizing the University’s second Cyber Security Symposium in April 2010.
“The first symposium was held in April 2005, and we are very pleased to show the world what we have accomplished in five years,” Thuraisingham said.
The Data Mining Tool Repository can be found at dml.utdallas.edu/Mehedy/.
Source: UT Dallas