gcs logo

Globular Cluster Search

 

Page Index

 

 

Inside DAME


DAta Mining & Exploration Program

Welcome on the GCS (Globular Cluster Search) Science Case page

The project is based on the application of a Neural Network technique (Multi Layer Perceptron trained by the Quasi-Newton learning rule) to show that it is possible to effectively identify Globular Clusters in external galaxies, using single-band photometry and marginally resolved images. The experiments are performed through the DAME Web Application Suite.

The scientific problem

The study of Globular Clusters populations in external galaxies requires the use of wide-field, multi-band photometry. However to minimize contamination problems and to measure some of the GC properties, such as sizes and structural parameters (core radius, concentration, binary formation rates) high-resolution data are required as well, which are only available through the use of space facilities (i.e. HST).

The use of HST data is however challenging since the optimal dataset should be  deep, multi-band and with wide-field coverage in order to minimize projection effects, as well as to study the overall properties of the GC populations, which often differ from those inferred from observations of the central region of a galaxy.

The use of single-band HST data reduces the cost (in terms of observing time) of such studies, and can be eventually integrated with ground-based photometry in other bands to obtain the required color informations.

In this project we intend to show that even the use of single band photometry can yield very complete datasets with low contamination, through the use of Neural Network (Multi Layer Perceptron trained by a Quasi Newton rule) algorithm. This approach will minimize the observing time requirements, thus allowing to extend such studies to large areas and to the outskirts of nearby galaxies, thus minimizing the observational biases in studies where a very complete dataset is required, such as the study of Low Mass X-ray Binaries in GCs.

 

back to top page

The Data Set

The dataset used in this experiment consists in wide field HST observations of the giant elliptical NGC1399 in the Fornax cluster. This galaxy represents an ideal test case since, due to it's distance (20 Mpc), it is possible to cover a large fraction of its GC system (out to >5 Re) with a limited number of observations. Furthermore at this distance GC are only marginally resolved even by HST, allowing to verify our experiment in a worst-case scenario.

The optical data were taken with the HST Advanced Camera for Surveys (ACS, program GO-10129), in the F606W filter, with integration time of 2108 seconds for each field. The observations were arranged in a 3x3 ACS mosaic, and combined into a single image using the MultiDrizzle routine (Koekemoer et al. 2002). The final scale of the images is 0.03”/pix, providing Nyquist sampling of the ACS PSF. The field of view of the ACS mosaic covers ~100 square arcmin and extends out to a projected galactocentric distance of ~55 kpc, i.e. 4.9 r_e of the GC system (~5.7 r_e^gal).

Source catalogs were generated with SExtractor, requiring a minimum area of 20 pixels and reaching a 7 sigma depth of m_V=27.5, i.e. ~4 mag below the GC luminosity function turnover, thus sampling the entire GC population.. The catalog astrometric solution was registered to the USNO-B1 reference frame, obtaining a final accuracy of 0.2" r.m.s. Since no complete color catalog was available for the whole field, GC candidates were selected based on their magnitude and morphology, choosing sources with stellarity index >0.9 and m_V<26. In fact the distribution follows the GC luminosity function down to mV=26; at fainter magnitudes background unresolved sources dominate the overall distribution.

The final catalog used for this experiment thus contains photometric and morphological parameters for 2100 sources:

  • isophotal magnitude;
  • kron radius;
  • aperture magnitudes within a 2, 6 and 20 pixels (0.06", 0.18" and 0.6") diameter;
  • ellipticity;
  • position angle;
  • FWHM;
  • SExtractor stellarity index;

In addition for these sources we were able to measure structural parameters, fitting King surface brightness profile models with the Galfit software (Peng et al. 2002),
deriving:

  • tidal
  • core
  • effective radii
  • central surface brightness

The accuracy of these measurements was estimated simulating artificial GCs with the Multiking code (available at this page) specifically written to account for field distortion, PSF variation, dithering pattern.

In addition we use two multi-band datasets to obtain color informations for part of our sources that are needed in order to train our algorithms and validate the results: an HST/ACS dataset covering the central region of the galaxy in the g and z filters (Kundu et al. 2005), and a lower resolution ground-based dataset in C and R covering the entire galaxy (Bassino et al. 2006).

back to top page

The Data Mìning Model (MLP-QNA)

As typical in DAME Program, we decided to investigate the above scientific case as a data mining problem on Massive Data Sets (MDS), i.e. by using a mathematical model based on an automatic learning of information, correlations and significant features from huge datasets (bases ok knowledge) related to processes of wide different nature. This approach is basically motivated by the need to analyze and understand complex phenomena, often described by only partial or completely not explicit information in their parameter space.

The neural network model adopted in this experiment is the Multi Layer Perceptron (MLP). Its architecture is one of the most typical feed-forward neural network model. The term feed-forward is used to identify basic behavior of such neural models, in which the impulse is propagated always in the same direction, e.g. from neuron input layer towards output layer, through one or more hidden layers (the network brain), by combining weighted sum of weights associated to all neurons (except the input layer).

As easy to understand, the neurons are organized in layers, with proper own role. The input signal, simply propagated throughout the neurons of the input layer, is used to stimulate next hidden and output neuron layers. The output of each neuron is obtained by means of an activation function, applied to the weighted sum of its inputs. Different shape of this activation function can be applied, from the simplest linear one up to sigmoid. The number of hidden layers represents the degree of the complexity achieved for the energy solution space in which the network output moves looking for the best solution. As an example, in a typical classification problem, the number of hidden layers indicates the number of hyper-planes  used to split the parameter space (i.e. number of possible classes) in order to classify each input pattern. What is different in such a neural network architecture is typically the learning algorithm used to train the network. It exists a dichotomy between supervised and unsupervised learning methods. In the presente case we refer to supervised class.

The network must be firstly trained (training phase), in which the input patterns are submitted to the network as couples (input, desired known output). The feed-forward algorithm is then achieved and at the end of the input submission, the network output is compared with the corresponding desired output in order to quantify the learning quote. It is possible to perform the comparison in a batch way (after an entire input pattern set submission) or incremental (the comparison is done after each input pattern submission) and also the metric used for the distance measure between desired and obtained outputs, can be chosen accordingly problem specific requirements (in the MLP-BP the MSE, Mean Square Error, is used).
After each comparison and until a desired error distance is unreached (typically the error tolerance is a pre-calculated value or a constant imposed by the user), the weights of hidden layers must be changed accordingly to a particular law or learning technique.

After the training phase is finished (or arbitrarily stopped), the network should be able not only to recognize correct output for each input already used as training set, but also to achieve a certain degree of generalization, i.e. to give correct output for those inputs never used before to train it. The degree of generalization varies, as obvious, depending on how “good” has been the learning phase. This important feature is realized because the network doesn’t associates a single input to the output, but it discovers the relationship present behind their association. After training, such a neural network can be seen as a black box able to perform a particular function (input-output correlation) whose analytical shape is a priori not known. In order to gain the best training, it must be as much homogeneous as possible and able to describe a great variety of samples. Bigger the training set, higher will be the network generalization capability.
Despite of these considerations, it should always taken into account that neural networks application field should be usually referred to problems where it is needed high flexibility (quantitative result) more than high precision (qualitative results).

Concerning the hidden layer choice, there is the possibility to define zero hidden layers (SLP, Single Layer Perceptron, able to solve only linear separation of the parameter space), 1 or 2 hidden layers, depending on the complexity the user wants to introduce in the not linear problem solving experiment.

The standard BP learning rule (Back Propagation) is replaced by an adapted version of the classical Newton method for optimization problems.
The Newton method is the general basis for a whole family of so called Quasi-Newton methods (QNA). One of those methods, implemented here is the L-BFGS algorithm, [4]. More rigorously, the QNA is an optimization of learning rule, also because, as described below, the implementation is based on a statistical approximation of the Hessian by cyclic gradient calculation, that, as said in the previous section, is at the base of BP method.
As known, the classical Newton method uses the Hessian of a function. The step of the method is defined as a product of an inverse Hessian matrix and a function gradient. If the function is a positive definite quadratic form, we can reach the function minimum in one step. In case of an indefinite quadratic form (which has no minimum), we will reach the maximum or saddle point. In short, the method finds the stationary point of a quadratic form.


In practice, we usually have functions which are not quadratic forms. If such a function is smooth, it is sufficiently good described by a quadratic form in the minimum neighborhood. However, the Newton method can converge both to a minimum and a maximum (taking a step into the direction of a function increasing).

Quasi-Newton methods solve this problem as follows: they use a positive definite approximation instead of a Hessian. If Hessian is positive definite, we make the step using the Newton method. If Hessian is indefinite, we modify it to make it positive definite, and then perform a step using the Newton method. The step is always performed in the direction of the function decrement. In case of a positive definite Hessian, we use it to generate a quadratic surface approximation. This should make the convergence better. If Hessian is indefinite, we just move to where function decreases.

Some modifications of Quasi-Newton methods perform a precise linear minimum search along the indicated line, but it is proved that it's enough to sufficiently decrease the function value, and not necessary to find a precise minimum value. The L-BFGS algorithm tries to perform a step using the Newton method. If it does not lead to a function value decreasing, it lessens the step length to find a lesser function value.
Up to here it seems quite simple…but it is not!

The Hessian of a function isn't always available and in many cases is too much complicated; more often we can only calculate the function gradient.
Therefore, the following operation is used: the Hessian of a function is generated on the basis of the N consequent gradient calculations, and the quasi-Newton step is performed. There is a special formulas which allows to iteratively get a Hessian approximation. On each step approximation, the matrix remains positive definite. The algorithm uses the L-BFGS update scheme. BFGS stands for Broyden-Fletcher-Goldfarb-Shanno (more precisely, this scheme generates not the Hessian, but its inverse matrix, so we don't have to waste time inverting a Hessian).
The L letter in the scheme name comes from the words "Limited memory". In case of big dimensions, the amount of memory required to store a Hessian (N^2) is too big, along with the machine time required to process it. Therefore, instead of using N gradient values to generate a Hessian we can use a smaller number of values, which requires a memory capacity of order of N·M. In practice, M is usually chosen from 3 to 7, in difficult cases it is reasonable to increase this constant to 20. Of course, as a result we'll get not the Hessian but its approximation. On the one hand, the convergence slows down. On the other hand, the performance could even grow up. At first sight, this statement is paradoxical. But it contains no contradictions: the convergence is measured by a number of iterations, whereas the performance depends on the number of processor's time units spent to calculate the result.

As a matter of fact, this method was designed to optimize the functions of a number of arguments (hundreds and thousands), because in this case it is worth having an increasing iteration number due to the lower approximation precision because the overheads become much lower. This is particularly useful in astrophysical data mining problems, where usually the parameter space is dimensionally huge and confused by a low signal-to-noise ratio. But we can use these methods for small dimension problems too. The main advantage of the method is scalability, because it provides high performance when solving high dimensionality problems, and it allows to solve small dimension problems too.

 

back to top page

Preliminary Results

work in progress

 

back to top page

Bibliography & References

  1. Jordán, Andrés et al., The ACS Virgo Cluster Survey XVI. Selection Procedure and Catalogs of Globular Cluster Candidates, The Astrophysical Journal Supplement, Volume 180, Issue 1, pp. 54-66, 2009;
  2. Paolillo, Maurizio et al., Probing the GC-LMXB Connection in NGC 1399: A Wide-Field Study with HST and CHANDRA, Draft version September 3, 2010 (in preparation);
  3. Brescia, Massimo, MLP with QNA model design and user manual, DAME Technical Documentation, mlpGP_DAME-MAN-NA-0008-Rel1.0, September 02, 2010;
  4. Byrd, R.H et al., Representations of Quasi-Newton Matrices and their use in Limited Memory Methods", Mathematical Programming, 63, 4, pp. 129-156, 1994;

 

back to top page

DAME is a collaboration between INAF OACN, Dip. of Physics Science of University Federico II and Caltech Institute aimed at designing and developing instruments and tools for scientific data mining, based on information and comunication technology, funded by the Italian Ministry of Foreign Affairs as well as by the European project VOTECH (Virtual Observatory Technological Infrastructures) and by the Italian PON-S.Co.P.E.