\section{Filtering}

\gam{Filtering suggests that some data will be left out. Shouldn't we call
  this ``Smoothing'' instead?}

\subsection{Convolution}
\label{filter-conv}
Detectability is generally limited at the faintest flux levels by the background noise.
The power-spectrum of the noise and that of the superimposed signal can be significantly different.
Some \index{gain} gain in the ability to detect sources may therefore be obtained simply through
appropriate \index{linear filtering} linear filtering of the data, prior to \index{segmentation} segmentation. In low density fields,
an optimal \index{convolution} convolution kernel $h$ (``matched filter'') can be found that maximizes
detectability. An estimator of detectability is for instance the \index{signal-to-noise ratio} signal-to-noise ratio
at source position $(x_0,y_{\,0}) \equiv (0,0)$:
\begin{equation}
\left( \frac{\rm S}{\rm N}\right)^2 \equiv \frac{\left( (s * h)(x_0,y_{\,0}) \right)^2}
{\overline{(n * h)^2}}\,,
\end{equation}
where $s$ is the signal to be detected, $n$ the noise, and `$*$' the \index{convolution} convolution operator.
Moving to Fourier space, we get:
\begin{equation}
\left( \frac{\rm S}{\rm N}\right)^2 = \frac{\left(\int{{\cal S}{\cal H}\,d\omega}\right)^2}
                                        {\int{|{\cal N}|^2 |{\cal H}|^2\,d\omega}}\,,
\end{equation}
where ${\cal S}$ and ${\cal H}$ are the \index{Fourier-transforms} Fourier-transforms of $s$ and $h$, respectively, and
$|{\cal N}|^2$ is the power-spectrum of the noise.
\gam{This equation seems dimensionally correct only if $\omega$ is dimensionless.}
Noting, from the \index{Schwarz inequality} Schwarz inequality, that
\begin{equation}
\label{eq:schwartz1}
\left|\int{{\cal S}{\cal H}\, d\omega}\right|^2 \leq
                \int{\frac{|{\cal S}|^2}{|{\cal N}|^2} d\omega} \, \int{|{\cal N}|^2 |{\cal H}|^2
                d\omega}\,,
\end{equation}
we see that
\begin{equation}
\label{eq:schwartz2}
\left( \frac{\rm S}{\rm N}\right)^2 \leq \int{\frac{|{\cal S}|^2}{|{\cal N}|^2} d\omega}\,.
\end{equation}
Equality (maximum S/N) in (\ref{eq:schwartz1}) and (\ref{eq:schwartz2}) is achieved for
\begin{equation}
\frac{\cal S}{|{\cal N}|} \propto |{\cal N}| {\cal H}^*\,,
\end{equation}
that is,
\begin{equation}
\label{eq:conv}
{\cal H} \propto \frac{{\cal S}^*}{|{\cal N}|^2}.
\end{equation}
In the case of white noise (a valid approximation for many astronomical \index{image} images, especially
\index{CCD} CCD ones), $|{\cal N}|^2 = {\rm const}$; the optimal \index{convolution} convolution kernel for detecting \index{stars} stars is
then the \emph{point spread function}\footnote{The \index{PSF} PSF is the \index{convolution} convolution of
  the instrumental \index{PSF} PSF and the atmospheric seeing.} (PSF) flipped over the $x$ and $y$ directions. Filtering with this
kernel may equivalently be described as a cross-correlation of the image with the template of the sources to be detected (for more
details see e.g., \cite{bijaoui:dantel:1970}) \gam{missing a recent book citation here}.
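
As an illustration of (\ref{eq:conv}), the white-noise case can be written out explicitly. This is only a sketch: the constant $N_0$ (the flat noise power), the assumption of a real-valued template $s$, and the use of continuous integrals are introduced here for the example. With $|{\cal N}|^2 = N_0$, (\ref{eq:conv}) reduces to ${\cal H} \propto {\cal S}^*$, which for real $s$ reads in direct space
\begin{equation}
h(x,y) \propto s(-x,-y)\,.
\end{equation}
The filtered signal at the source position is then
\begin{equation}
(s*h)(0,0) = \int\!\!\int s(x,y)\,h(-x,-y)\,dx\,dy \propto \int\!\!\int s^2(x,y)\,dx\,dy\,,
\end{equation}
which is the zero-lag cross-correlation of the source with its own template, and the bound (\ref{eq:schwartz2}) becomes
\begin{equation}
\left(\frac{\rm S}{\rm N}\right)^2_{\rm max} = \frac{1}{N_0} \int{|{\cal S}|^2\,d\omega}\,.
\end{equation}
Under these assumptions the best achievable S/N depends only on the total power of the source profile relative to the noise level.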

There are of course a few problems with this method. First of all,
many sources of unquestionable interest, like galaxies, appear in a variety of shapes and scales
on astronomical \index{image} images.
A perfectly optimized detection routine should ultimately apply all relevant
\index{convolution} convolution kernels one after the other in order to make a complete catalogue. Approximations
to this approach are the (isotropic) \index{wavelet} wavelet analysis mentioned earlier, or the more empirical
\index{ImCat} ImCat algorithm (Kaiser \etal 1995), both of which assume that
the sources are
reasonably round. Such refinements are currently judged too costly in \index{memory} memory usage and processing speed
to be applied in {\sc SExtractor}. Simple filtering does a good job in general:
the topological constraints added by the \index{segmentation} segmentation process make the detection somewhat tolerant
towards larger objects. Extended, very Low-Surface-Brightness (LSB) features found in astronomical
\index{image} images are often artifacts (flat-fielding errors, optical ``ghosts'' or halos). However, it is
true that some of them can be genuine objects, like \index{LSB} LSB galaxies, or distant \index{galaxy clusters} galaxy clusters
buried in the background noise. For detecting those with software like {\sc SExtractor},
specific processing is needed (see for instance Dalcanton \etal 1997 and references therein). The
simplest way to achieve the detection of extended \index{LSB} LSB objects in {\sc SExtractor} is to work
on {\tt MINIBACK} \index{check-image} \index{check-images} check-images (see \S\ref{chap:miniback}).

A second problem may occur because of overlaps with other objects. Convolving with a low-pass
filter (the \index{PSF} PSF has no negative side-lobes) diminishes the contrast between objects, and makes
\index{segmentation} segmentation less effective in isolating individual sources. The lost separation can to some extent be recovered
by \index{deblending} deblending (see \S\ref{chap:deblending}). In severely crowded fields however, confusion noise
becomes the limiting factor for detection, and it is then advisable not to filter at all, or to
use a bandpass-filter (compensated filter \gam{what is this? One with
  negative side-lobes?}).

Finally, the \index{PSF} PSF can vary across the field. The \index{convolution} convolution mask
should ideally follow these variations in order to allow for optimal detection everywhere in the
\index{image} image. However, considering approximately-Gaussian \index{PSF} PSF cores and \index{convolution} convolution kernels,
detectability is a rather slowly varying function of their \index{FWHM} FWHMs\footnote{Full-Width at Half-Maximum}: a
mismatch as large as 50\% between the kernel \index{FWHM} FWHM and that of the \index{PSF} PSF will lead to no more than a
10\% loss in peak S/N (Irwin 1985). Considering that \index{PSF} PSF variations are generally much smaller
than this, filtering in {\sc SExtractor} is limited to constant kernels.
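
The weak dependence on kernel width can be checked with a short calculation. The following is a sketch under idealised assumptions that are introduced here and are not part of the cited work: circular Gaussian source and kernel with widths $\sigma_s$ and $\sigma_h$, white noise with variance $\sigma_n^2$ per unit area, and integrals in place of pixel sums. Writing $s(r) = A\,e^{-r^2/2\sigma_s^2}$ and $h(r) = e^{-r^2/2\sigma_h^2}$, one finds
\begin{equation}
(s*h)(0,0) = \frac{2\pi A\,\sigma_s^2\sigma_h^2}{\sigma_s^2+\sigma_h^2}\,, \qquad
\overline{(n*h)^2} = \pi\,\sigma_n^2\,\sigma_h^2\,,
\end{equation}
so that the peak S/N, relative to the matched case $\sigma_h = \sigma_s$, is
\begin{equation}
\frac{({\rm S/N})}{({\rm S/N})_{\rm max}} = \frac{2\,\sigma_s\sigma_h}{\sigma_s^2+\sigma_h^2}
= \frac{2q}{1+q^2}\,, \qquad q \equiv \frac{\sigma_h}{\sigma_s}\,.
\end{equation}
Since the FWHM of a Gaussian is proportional to its $\sigma$, $q$ is also the ratio of FWHMs. For a kernel 50\% broader than the PSF ($q = 1.5$) this gives $3/3.25 \approx 0.92$, a loss of about 8\%, consistent with the figure quoted above. In this simple model the loss is not symmetric in $q$: a kernel half as wide as the PSF ($q = 0.5$) gives $0.8$.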

\subsection{Non-linear filtering}
\label{filter-non-linear}
There are many situations in which \index{convolution} convolution is of little help:
filtering of (strongly) non-Gaussian noise, extraction of specific \index{image} image patterns, etc.
In those cases, one would like to extend the concept of a \index{convolution} convolution kernel to that of a more
general stationary filter, able for instance to mimic boolean-like operations on pixels. What
one wants is thus a mapping from ${\mathbf R}^n$ to ${\mathbf R}$ around each pixel, where $n$ is the
number of input pixels. But the
more general the filter, the more difficult it is to design ``by hand'' for each case, specifying
how input pixel \#i should be taken into account with respect to input pixel \#j to form the
output, etc. The solution to this is \index{machine-learning} machine-learning. Given a training set containing input and
output pixels, \index{machine-learning} machine-learning software adapts its internal parameters in order to minimize
a ``cost function'' (generally a $\chi^2$ error) and converge toward the desired mapping function.
These parameters can then for example be reloaded by a ``read-only'' routine to provide the
actual filtering.
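
To make the notion of cost function concrete, a generic $\chi^2$ form is sketched below. The notation is introduced here for illustration only and is not taken from any particular software: $f_{\mathbf w}$ denotes the mapping with internal parameters ${\mathbf w}$, ${\mathbf p}_k \in {\mathbf R}^n$ the $k$-th input pixel neighbourhood of the training set, $t_k$ the corresponding desired output pixel, and $\sigma_k$ its assumed uncertainty:
\begin{equation}
\chi^2({\mathbf w}) = \sum_k \frac{\left( f_{\mathbf w}({\mathbf p}_k) - t_k \right)^2}{\sigma_k^2}\,.
\end{equation}
Learning amounts to searching for the ${\mathbf w}$ that minimizes $\chi^2$; the ``read-only'' routine then only needs to evaluate $f_{\mathbf w}$ around each pixel. Convolution is the special case in which $f_{\mathbf w}({\mathbf p}) = {\mathbf w}\cdot{\mathbf p}$ is linear.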

{\sc SExtractor} implements this kind of ``read-only'' functionality in the form of the so-called
``retina filtering''. The {\sc \index{EyE} EyE}\footnote{{\em Enhance Your Extraction} \gam{URL?}} software (Bertin
1997) performs neural-network-learning on input and output \index{image} images to produce
``retina files''.
These files contain weights that describe the behaviour of the \index{neural network} neural network. The neural network
can thus be seen as an ``artificial retina'' that takes its stimuli from a small rectangular array
of pixels and produces a response according to prior learning (for more details, see the {\sc \index{EyE} EyE}
documentation). A typical application of the retina is the identification of \index{glitch} \index{glitches} glitches.

\subsection{What is filtered, and what isn't}
Although filtering is a benefit for detection, it distorts profiles
and correlates the noise; it is therefore detrimental for most measurement tasks. Because of this,
filtering is applied ``on the fly'' to the \index{image} image, and {\em directly} affects only the detection
process and the isophotal parameters described in \S\ref{chap:isoparam}. Other catalogue parameters
are indirectly affected (through the exact position of the \index{barycenter} barycenter and typical object extent),
but the effect is considerably smaller. Obviously, in \index{double-image mode} double-image mode, filtering is only
applied to the {\em detection}\, \index{image} image.

\subsection{Image boundaries and bad pixels}
\index{boundaries}\index{bad pixel}\index{bad pixels}
``Virtual'' pixels that lie outside \index{image} image \index{boundaries} boundaries are arbitrarily set to zero. This makes sense
since filtering occurs on a background-subtracted \index{image} image. When weighting is applied
(\S\ref{chap:weight}), \index{bad pixel} \index{bad pixels} bad pixels (pixels with weight $<$ {\tt WEIGHT\_THRESH}) are interpolated
by default (\S\ref{chap:interp}) and should therefore not cause much trouble. It is recommended
not to turn off \index{interpolation} interpolation of \index{bad pixel} \index{bad pixels} bad pixels when filtering is on.

\subsection{Configuration parameters}
Filtering is triggered when the {\tt FILTER} keyword is set to {\tt Y}. If active, a file with name
specified by {\tt FILTER\_NAME} is searched for and loaded. Filtering with large retinas can be
extremely time consuming. In many cases, one is only interested in filtering pixels whose values
stand out from the background noise. The {\tt FILTER\_THRESH} keyword can be given to specify the
range of pixel values within which retina-filtering will be applied, in units of background noise
\index{standard deviation} standard deviation. If one value is given, it is interpreted as a lower \index{threshold} threshold. For instance:
\begin{verbatim}
FILTER_THRESH   3.0
\end{verbatim}
will allow filtering for pixel values more than $3\sigma$ above the \index{local background} local background, whereas
\begin{verbatim}
FILTER_THRESH   -10.0,3.0
\end{verbatim}
will only allow filtering for pixel values between $-10\sigma$ and $+3\sigma$.
{\tt FILTER\_THRESH} has no effect on \index{convolution} convolution.

The result of the filtering process can be verified through a {\tt FILTERED} \index{check-image} check-image: see
\S\ref{chap:check}.
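
Putting these keywords together, a configuration fragment that enables convolution and saves the filtered image for inspection might look as follows. This is a sketch: {\tt CHECKIMAGE\_TYPE} and {\tt CHECKIMAGE\_NAME} are assumed here to be the keywords controlling check-images (see \S\ref{chap:check}), and the output file name is purely illustrative.
\begin{verbatim}
# Convolve the detection image with a small Gaussian mask
FILTER           Y
FILTER_NAME      gauss_2.0_5x5.conv
# Write the filtered image to disk (hypothetical output name)
CHECKIMAGE_TYPE  FILTERED
CHECKIMAGE_NAME  filtered.fits
\end{verbatim}
Comparing the check-image with the input is a quick way to confirm that the kernel width is sensible for the seeing of the data.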

\subsection{CPU cost}
The {\sc SExtractor} filtering routine is particularly optimized for small kernels. It thus
provides a convenient way of filtering large \index{image} images. On a 2\,GHz machine, a \index{convolution} convolution by a
$5\times5$ kernel will contribute less than 1 second to the processing time of a $2048\times4096$
\index{image} image. The numbers for non-linear (retina) filtering depend on the complexity of the neural
network, but can be a hundred times larger.
\gam{Update time?}
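
As a rough check of the orders of magnitude involved (a back-of-the-envelope estimate assuming a direct, pixel-by-pixel convolution, which may differ from the actual implementation), the number of multiply-add operations is the number of pixels times the number of kernel elements:
\begin{equation}
N_{\rm op} = 2048 \times 4096 \times 5 \times 5 \approx 2.1\times10^{8}\,.
\end{equation}
At 2\,GHz, a budget of 1 second therefore corresponds to about ten clock cycles per operation. Under the same assumption the cost grows linearly with the kernel area, so that a $9\times9$ kernel would require roughly $81/25 \approx 3.2$ times more operations.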

\subsection{Filter file formats}
As described above, two kinds of filter
files are recognized by {\sc SExtractor}: \index{convolution} convolution files (traditionally suffixed with
``{\tt .conv}''), and ``retina'' files (``{\tt .ret}'' extensions\footnote{In {\sc SExtractor},
file name extensions are just conventions; they are not used by the software to distinguish
between different file formats.}).

Retina files are written exclusively by the {\sc \index{EyE} EyE} software, as \index{FITS binary-tables} FITS binary-tables.

Convolution files are in ASCII format. The following example shows the content of the
{\tt gauss\_2.0\_5x5.conv} file which can be found in the {\tt config/} sub-directory of the
{\sc SExtractor} distribution:
\begin{verbatim}
CONV NORM
# 5x5 convolution mask of a gaussian PSF with FWHM = 2.0 pixels.
0.006319 0.040599 0.075183 0.040599 0.006319
0.040599 0.260856 0.483068 0.260856 0.040599
0.075183 0.483068 0.894573 0.483068 0.075183
0.040599 0.260856 0.483068 0.260856 0.040599
0.006319 0.040599 0.075183 0.040599 0.006319
\end{verbatim}
The {\tt CONV} keyword appearing at the beginning of the first line
tells {\sc SExtractor} that the file contains the description of a
\index{convolution} convolution mask (kernel). It can be followed by {\tt NORM} if the
mask is to be normalized to 1 before being applied, or {\tt NONORM}
otherwise\footnote{If the sum of the kernel coefficients happens to be
exactly zero, the kernel is normalized to variance unity.}. The
following lines should contain an equal number of kernel coefficients,
separated by $<$space$>$ or $<$TAB$>$ characters. Coefficients in the
example above are read from left to right and top to bottom,
corresponding to increasing {\tt NAXIS1} ($x$) and {\tt NAXIS2} ($y$)
in the \index{image} image. Formatting is free, and number representations like {\tt
-0.14}, {\tt -0.1400}, {\tt -1.4e-1} or {\tt -1.4E-01} are equivalent.
The width of the kernel is set by the number of values per line, and
its height is given by the number of lines. Lines beginning with
``{\tt \#}'' are treated as comments.
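
As a further illustration of the format, a minimal $3\times3$ mask could be written as below. The coefficients are chosen for this example only. With {\tt NORM}, and assuming that normalization divides each coefficient by their sum (here 16), the effective weights would be $1/16$, $2/16$ and $4/16$.
\begin{verbatim}
CONV NORM
# 3x3 example smoothing mask (illustrative coefficients).
1 2 1
2 4 2
1 2 1
\end{verbatim}
Replacing {\tt NORM} by {\tt NONORM} on the first line would apply the coefficients as written, so that, under the same assumption, the filtered image would be scaled by a factor 16 relative to the normalized case.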