DAME Project overview

In astrophysics, as in many (if not all) fields of human endeavour, data mining research over the last decade has been driven by advances in database technology and the creation of huge datasets.

The growth of “visual and/or numerical analytics” has emphasized the danger of information overload and the need to harness new technology, such as high-resolution and multivariate analysis systems, in order to maintain an overview of the data and keep it under control.

Despite these advances, finding suitable methods for exploring and analysing large astronomical datasets remains a formidable problem. The emphasis is on methods that bridge the gap between accurate representation and mining of the data on one side and the capabilities of current technology and users on the other. Analytical methods based partly on random statistical choices (crossover/mutation) and partly on acquired experience (supervised and/or unsupervised adaptive learning) could realistically uncover the hidden laws behind the phenomena under study, which are often governed by laws of nature.
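
As an illustration of the "statistical random choices" mentioned above, the following is a minimal Java sketch of the two classic genetic operators, single-point crossover and Gaussian mutation, acting on real-valued chromosomes. The class name, method names and parameters (GeneticOperators, crossover, mutate, rate, sigma) are assumptions made for this example and are not taken from the DAME code base.

```java
import java.util.Random;

// Hypothetical illustration of crossover/mutation; not part of the DAME suite.
final class GeneticOperators {
    private GeneticOperators() {}

    /** Single-point crossover: the child takes genes [0, point) from a and [point, n) from b. */
    static double[] crossover(double[] a, double[] b, int point) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("parents must have equal length");
        }
        if (point < 0 || point > a.length) {
            throw new IllegalArgumentException("crossover point out of range");
        }
        double[] child = new double[a.length];
        System.arraycopy(a, 0, child, 0, point);
        System.arraycopy(b, point, child, point, b.length - point);
        return child;
    }

    /** Gaussian mutation: each gene is perturbed with probability rate; the input is not modified. */
    static double[] mutate(double[] genes, double rate, double sigma, Random rng) {
        double[] out = genes.clone();
        for (int i = 0; i < out.length; i++) {
            if (rng.nextDouble() < rate) {
                out[i] += sigma * rng.nextGaussian();
            }
        }
        return out;
    }
}
```

In a full genetic algorithm these operators would be applied repeatedly to a population, with a fitness function (the "acquired experience") deciding which chromosomes survive; that selection step is omitted here.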

DAME is a data mining suite designed to work on huge datasets. The main guidelines for its approach to problem solving are shown below.

In other words, DAME is a web-oriented and Virtual Observatory (VO) aware suite. To support data mining and exploration experiments, we adopted a top-down strategy, starting from a taxonomy of data mining and research functionalities, each of which is associated with specific algorithms and processing methods. In the first release, the suite offers tools for the following functionalities:

The main features of the complete package can be summarised as follows:
  • Object Oriented Programming & UML
  • Internal standards and protocols (XML)
  • Java language (generic for DMM)
  • User/Session Registry DB (MySQL)
  • Web-based User I/O (GWT-Ext)
  • Web Application and Web Service Technology
  • Plugin Modularity (easy to integrate/modify)
  • Hardware independence through a platform driver
  • Data conversion and manipulation support

The project is based on five main components: Front End (FE), Framework (FW), Registry and Data Base (REDB), Driver (DR) and Data Mining Models (DMM). The following diagram shows the components of the entire suite, together with the layout of their main interfaces and information exchanges.

[Figure: component diagram of the DAME suite]
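
To make the roles of the components more concrete, here is a minimal Java sketch of how a Framework might dispatch an experiment to a plugin-style Data Mining Model through a platform Driver. It is only an illustration under stated assumptions: every type and method name below (ArchitectureSketch, DataMiningModel, PlatformDriver, LocalDriver, Framework, MinMaxScaler, runExperiment) is hypothetical and does not come from the DAME code base, and the Front End and Registry/Data Base components are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of the FW / DR / DMM roles; not the actual DAME interfaces.
final class ArchitectureSketch {
    private ArchitectureSketch() {}

    /** DMM: a data mining model packaged as a plugin. */
    interface DataMiningModel {
        String name();
        double[] run(double[] input);
    }

    /** DR: hides the execution platform from the rest of the suite. */
    interface PlatformDriver {
        double[] execute(DataMiningModel model, double[] input);
    }

    /** A driver that simply runs the model in the current JVM. */
    static final class LocalDriver implements PlatformDriver {
        @Override
        public double[] execute(DataMiningModel model, double[] input) {
            return model.run(input);
        }
    }

    /** FW: keeps track of registered plugins and dispatches experiments to the driver. */
    static final class Framework {
        private final Map<String, DataMiningModel> models = new LinkedHashMap<>();
        private final PlatformDriver driver;

        Framework(PlatformDriver driver) {
            this.driver = driver;
        }

        void register(DataMiningModel model) {
            models.put(model.name(), model);
        }

        double[] runExperiment(String modelName, double[] input) {
            DataMiningModel model = models.get(modelName);
            if (model == null) {
                throw new IllegalArgumentException("unknown model: " + modelName);
            }
            return driver.execute(model, input);
        }
    }

    /** Toy plugin used only to exercise the sketch: rescales values to [0, 1]. */
    static final class MinMaxScaler implements DataMiningModel {
        @Override
        public String name() {
            return "minmax";
        }

        @Override
        public double[] run(double[] input) {
            double min = Double.POSITIVE_INFINITY;
            double max = Double.NEGATIVE_INFINITY;
            for (double v : input) {
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
            double range = max - min;
            double[] out = new double[input.length];
            for (int i = 0; i < input.length; i++) {
                // A constant input has zero range; map it to 0 to avoid dividing by zero.
                out[i] = range == 0 ? 0.0 : (input[i] - min) / range;
            }
            return out;
        }
    }
}
```

Because the Framework in this sketch depends only on the two interfaces, a new model or a different execution platform could be added by registering another implementation, which is one way to read the "plugin modularity" and "hardware independence" features listed earlier.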

The algorithms and methods we chose to explore for each functionality are collected in the next picture, which also shows the relation between functionalities and models. They include both models that are already implemented and those planned for upcoming releases.