ABSTRACT
Detecting motions is an important aspect of machine vision. However real world vision tasks often contain interfering mo- tion information which is
not of interest. To tackle
this difficult task, we adapted Genetic Programming into
this domain.
The
GP-based methodology presented in this pa- per does not require the implementation of existing motion
detection algorithms. The evolved programs can detect gen- uine moving
objects such
as cars
and
boats, while ignoring
background movements such as waving
trees, rippling
wa- ter surface and even
pedestrians.
These
programs provide reliable performance under different lighting conditions, ei- ther indoors and outdoors. Furthermore no preprocessing of
video input is
required which is usually mandatory in con- ventional vision approaches.
Categories and Subject Descriptors
I.2.10 [Artificial Intelligence]: Vision and Scene Under-
standing—Video analysis ; I.4.8 [Image Processing and computer Vision]: Scene
Analysis—Motion ; I.4.6 [Image Processing
and computer Vision]: Segmentation—Pixel
classification ; I.5.4 [Pattern Recognition]:
Applications
General Terms
Algorithms, Design, Experimentation
Keywords
Genetic Programming,
Motion
Detection, Video
Analysis, Tracking
1. INTRODUCTION
This study extends the use of Genetic Programing to
a complex domain, motion detection. In real world
applica- tions, automatic reporting of movement is an important and highly desirable component. The detected motion signals can be crucial
for the systems. For example, they might be considered as
moving vehicles
in traffic monitoring systems,
or can be used to trigger video recording in security
surveil- lance
systems, or can be used to
mark potential targets in
robotic systems.
Conventional motion detection algorithms are based on comparing the differences between consecutive video frames,
Copyright is held by the author/owner(s).
GECCO'09, July 8–12, 2009, Montréal,
Québec, Canada.
ACM 978-1-60558-325-9/09/07.
such as counting the pixels which show prominent variations from previous frame
to the current frame. However uninter- esting
variations caused by factors
like lighting changes
and disturbance in image sensor itself would also
be considered
as motions by such
approaches.
Therefore pre-processing, which is to reduce false
positives, is often critical in these methods.
Furthermore, in many
real
world applications “real” mo- tion is not
necessarily
the movement of interest. For ex- ample,
waving tree branches are
truly
moving objects but have little value in an application. Similarly ripples on rivers provide
little
extra
information compared to
moving boats. In some cases,
seemingly meaningful
motions might also be considered uninteresting
as well, for instance pedestrians in
a vehicle monitoring system
and
birds
in a surveillance ap- plication. Ignoring these kinds of motion
while
reporting genuine
movement is a challenge in real world vision sys- tems.
The above issues are addressed in this study which presents
a methodology of
evolving motion detectors by Genetic Pro- gramming.
The
methodology can be considered as the ex- tension
of our previous studies in which GP has been success- fully used for similar video analysis tasks
such as segmenting texture regions[10] and tracking robosoccer
balls[4].
These previous
investigations have shown the feasibility of
evolving
video analysis programs without
task-dependent represen- tation, and also without prior
knowledge in machine vision techniques. Hence by such GP approaches, there is no need to
change representation from
task
to
task. Also
feature extraction is
not necessary which
can be a time
consuming process and requires expertise in vision algorithms
as well as manual implementation of these algorithms. Moreover the evolved
programs are
capable of handling certain degree of variations which is a highly
desirable feature in
real-world applications.
1.1 Goals
The goals of this
study are:
• to investigate the suitable
GP methodology for
motion detection
• to investigate
the performance of this method, espe- cially
in terms of the
capability of handling variations and filtering uninteresting
motions.
2. BACKGROUND
The relevant background
of this study includes
two ar- eas, the use of GP
in
image
and
vision
related tasks
and
the traditional motion detection. They are briefly
described below.
2.1
GP for Vision
Similar to other application domains
of Genetic Program-
ming[6], vision related areas also attract many interests. A few relevant studies
are briefly
discussed here,
especially on their representation schemes.
In the early stage,
Daniel Howard [3] used GP to detect ve- hicles in Infrared Line Scan images. They
used pixel statis- tics
and discrete
Fourier transforms as
inputs. They found
that the Discrete
fourier transforms terminal set performed only
slightly better than the pixel statistics terminal
set. Riccardo Poli [8] used
GP
to
perform
detection and seg- mentation on
X-ray coronarograms and Magnetic resonance images. Poli
also
used
a terminal set which accessed the pixel values directly.
Recently Leonardo Trujillo
and
Gustavo Olague
[12] used GP to detect interest
points within an image. Detected in- terest points
give an indication of
the regions within an im- age that contain crucial
information.
Once
identified these regions can be further processed
for more complex tasks like
object classification or recognition.
Bir Bhanu and Yingqiang Lin [1] used GP to evolve pro- grams that successfully identified and marked regions in syn- thetic aperture radar images that contained fields, roads and a lake.
They were also able
to
detect vehicles in color im- ages
using
the
same
methodology. The terminal set used by
them
contained the original image
along
with
15 other images
that are modifications of the original
image
that in- cluded the mean,
minimum, maximum and deviation values. Mengjie Zhang et al[7]
used GP to evolve detectors on multi-class problems
of increasing difficulty, which included
identifying haemorrhages and microaneurisms in
retina im- ages. They experimented with different pixel statistics
as
their terminal set and found that the
rectilinear statistics, in which an image is divided into 5 rectangles i.e four corners
and a center rectangle, performed well.
These studies
show that GP is capable
in a wide range of vision-related tasks. Most of them
used simple pixel
based representation which is also used in this investigation.
2.2
Motion
Detection
Traditional methods
of motion detection invlove [9] pre- processing
to filter out
the
noise present in the images. Dai et al [2] used intensity normalization, where intensity values of images are normalized
to have the same mean and vari- ance.
Using a selected threshold, respective to the mean, the relevant foreground images could be identified from the background images. Other techniques involve shading model, where the illumination component of a pixel is filtered out so as to reduce the effects of varying illumination, which is then
processed to identify the foreground pixels.
Background
modeling is another widely used method in
motion detection. Variations in
the pixels are modelled us- ing Gaussian-Mixtures.
If variations in
a pixel is observed
outside the
modelled mixtures, by a predetermined thresh-
old, the pixels are marked as foreground pixels. Stauffer and
Grimson [11] developed a background model with sev- eral Gaussians
to represent dynamic backgound objects like
swaying trees or riples of water
on a lake. Haritaoglu et al [5]
proposed a simplistic method, where a section
of the video
containing only the background
is analysed to find the
min-
imum
and
maximum intensity values and also
the largest inter-frame difference. A
foreground pixel would
then
be identified
as any pixel that differs from the minimum and
the maximum by the
largest
inter-frame difference value.
In
our methodology, these manually designed processes and algorithms are
not
necessary. At the
same
time,
the GP-based method addresses the problems mentioned above.
3. METHODOLOGY
Figure 1: The Basic Framework
Figure
1 shows the basic framework of evolving
motion detectors. There are two fundamental components known as the
evolution
phase and the application phase. The evo-
lution phase
is the process of evolving
motion detection pro- grams. Video frames are
provided in
this phase for
training and evaluation, which is guided by
the
manually marked “interesting” movements in these videos. Programs achieved
the best accuracy during evaluation
are used in application phase to process real-time video signals.
3.1 Data
Representation
Videos for training are processed. Every frame
containing
desirable movements are identified. The size and the loca-
tion of those
moving objects
are marked on these frames.
Each video
frame is usually 384×288 pixels
in size which is not suitable as
direct inputs
to GP programs. To reduce
the input size, a sweeping window
is used
to sample small
cut- outs from
frame images. Cut-outs containing marked mov- ing
objects are
labeled as
positive class.
Cut-outs mainly containing unmarked areas, e.g. no meaningful motions,
are marked as negative class. This set of
cutouts generated from a sequence of video frames is then
split
into two parts. One for training, one
for evaluation.
In our previous studies, each cut-out images contains static pixel information such as intensity, hue value or simple fea- ture values. In this study, they
contains the dynamics at each pixel position. Every pixel catches the intensity changes at that position
over consecutive frames,
by
the following formula:
|
n−
M (x) = i=0 (| xi − xi+1 | × (n − i))
n
Where x is one pixel position in the video. There are n consecutive frames, frame i = 0
is the most current one and frame n is the
first
frame
in the queue.
The weight
n − i results more influence
for more recent frames. The sum
of weighted differences is divided
by n to produce the average change
at one position.
3.1.1
Function Set
The function set (Table 1) comprises
three groups
of func- tions. The first
group is the four basic arithmetic operators add, subtract,
multiply and divide of which the return
Population Size
|
30
|
Maximum Depth
|
12
|
Minimum Depth
|
2
|
Generations
|
300
|
Mutation Rate
|
5%
|
Crossover Rate
|
85%
|
Elitism Rate
|
10%
|
Table 3: GP Runtime Parameters
Table 1: Function Set
|
values are double.
The second
group is the
four logic oper- ators equals, less than and greater than which return booleans
and
the if function. The
if takes
in a boolean and two doubles,
if the boolean
is true then the first double is returned, else the second double is returned.
The functions in the third group
take four
double values as arguments which represent the x, y coordinates
of top-left corner and the x, y
coordinates of
bottom-right corner of a region.
These functions
return the average,
the
maximum and the minimum pixel values
of that region respectively.
They validate the input coordinates before
performing the calculation on the specified
region, so arguments passed from child nodes will not
cause errors.
3.1.2 Terminal Set
The terminal set(Table
2) is rather simple. It contains
two different kinds of terminals, random doubles(drand) and attributes(Att).
Attributes are the input of
a GP program. They are read in from position
x at the cutout images.
Terminal
|
Return Type
|
drand
Att[x]
|
Double
Double
|
Table 2: Terminal Set
The choice
of function set and terminal set
is consistent
with our previous studies and similar
to
that in studies of GP on vision-related tasks.
3.2 Evolution Phase
The fitness
measure in the evolution is the accuracy of
an individual program classifying
whether a cutout image contains motion or not.
It can be expressed
as:
F itness = T P + T N T OT AL
where T OT AL is the
total number of training cases, T P and T N are the numbers of true positives and true negatives respectively.
The GP runtime parameters are listed
in Table 3. To gen- erate
the initial population the ramped half-and-half method was chosen as it could
provide a wide variety of
initial pro- gram
trees [6].
The
mutation rate
was set to 5% as work done
on similar problems utilised
mutation rates
that were either
very low or zero.
Such choice of parameters is based on
other related work,
where
we used GP in recognizing
texture pictures, analyzing
X-ray images.
This parameter
choice is also
consistent with
vision
related GP
work re- ported by other researchers.
There are two termination conditions
in evolution.
The first
one is when
the
training accuracy reaches 100%
while the second is when the number of specified generations has been reached.
For each run of training, the
best GP programs are
passed to evaluation. When the
evolution process finishes,
the best performing programs during the evaluation are then selected
for the next
phase.
3.3 Application Phase
Figure 2: Applying Evolved Motion Detectors
The basic procedure of
applying evolved motion
detectors is illustrated in
Figure 2.
The main steps are:
retrieving the current frame from video input, calculating M values
at each
position
based
on current frame, sampling cutouts by a sweeping window
of which
the
size is identical to
that
in evolution phase,
classifying these cutouts by evolved motion detectors, assembling the labeled cutouts back to a
frame and producing
the processed
frames
continuously
as video output.
During the
sampling, most of the pixels are sampled sev- eral
times
by windows
at different locations
because there
are overlaps
between
adjacent window positions. Accord-
ingly one
pixel might be classified
multiple times
with
dif- ferent cutouts. A voting
strategy is used
here to determine the final class label for each pixel. For example if a
pixel has been classified as “motion” more times than being classified
as “no-motion”, then this pixel is considered a “motion” pixel in the final
output.
The degree of overlap
during sampling is adjustable. Larger overlap will generate smoother output,
but result larger amount of cutouts, hence requires more
processing time. One can use this parameter to balance the computational cost and out- put quality. The overlap
used in these
experiments is always
8 pixels.
Note that in
this methodology there is no pre-processing. Motion detectors are
directly trained on cutouts generated
from videos. The trained programs are then directly ap-
plied on raw video inputs. This feature is important in
real- world applications, because
determining and fine-tuning a pre-processing method
is a non-trivial exercise
which has direct
impact on performance in
traditional methods.
4.
EXPERIMENTS
The methodology described in the previous section has been applied on scenarios with different difficulties, includ- ing
indoor and various outdoor environments.
4.1
Moving Ball
The task
in this set
of experiments is to detect a moving
ball in an arbitrary indoor setting.
The
video for training was taken in a normal computer laboratory. The results from the evolution phase are listed
in
Table 4.
The first data column shows the highest
accuracies obtained
during the training
and evaluation in
the evolution phase. The “TP Rate” and “TN Rate” columns show the ratio
of true posi- tives
among all labeled positives
and
ratio
of true negatives
among all labeled negatives.
They are
similar to the
tradi- tional performance indicators, precision and recall.
|
Accuracy
|
TP Rate
|
TN Rate
|
Training
Evaluation
|
99.83%
97.05%
|
99.83%
97.26%
|
99.83%
96.85%
|
Table 4:
Results
from Evolution
Phase
(Moving
Ball)
Table
4 shows that GP can construct programs with
low false
positive
rate
and
low false
negative rate on training and test video. The best
performing program is selected to process real-time video streams.
Figure 3: Detecting the Moving Ball
A frame
captured from the output of
the best
motion
de- tector is
presented in
Figure 3. The soccer ball is the
only moving object in this video. The
region with white boundary
is marked by the
detector as an area
that contains
motion. This region can follow the movement of the ball and roughly match with
the
shape of the
ball.
The
window size
here is
32 × 32. So the region marked is quite coarse.
However such
marking is already sufficient to locate the moving object
and to estimate its size and
speed.
To test the reliability of the evolved detector, we applied it
on different lighting conditions. Figure 4 shows its output in
a much brighter setting.
The
moving ball can still be marked properly.
Figure 4: Under Different
Lighting Conditions
4.2 Moving Vehicles
The video for this set of experiments was taken over
a highway with building and trees as the background. The task here
is to detect the moving vehicles
on the highway. It is a much more difficult problem compared to detecting a ball under indoor
environment although the experimental
processes were identical.
The cut-out image size was 20 × 20. The training and evaluation results are listed
in Table
5 which uses the
same format as
the previous table. The best
detector on evaluation
data set achieved high accuracy which
is 98.42%.
|
Accuracy
|
TP Rate
|
TN Rate
|
Training
Evaluation
|
99.75%
98.42%
|
100%
99.51%
|
99.51%
97.32%
|
Table 5:
Results from Evolution Phase (Moving Ve- hicle)
The variation is
much greater here, partially due to the unnoticeable ambient changes and partially due
to the au- tomatic adjustment of the
camera itself. Subtracting two consecutive frames
results
in a noisy background. Further- more the
swaying trees make
the
task
more challenging.
Figure
5 is a typical frame generated during the
applica- tion phase.
The
thick lines
are manually drawn
to highlight
the swaying
trees of which
the
movement are very
notice- able on video. As suggested by the
figure,
these trees were ignored
by
our trained motion detector which quite accu-
rately captured the
moving cars. Due to
the movement of the trees, their shadows also
move. Such movement was ig- nored
by the detector as well. However there
was
a small false positive
at the
right bottom corner.
To further test the performance, the
same
detector was applied
on inner city
environment. One
output frame
is
Figure 5: Detecting Moving Vehicles
Figure 7: Ignoring Pedestrians
|
Figure 8: Detecting Moving Ball by a Different
De- tector
Figure 6: Moving Vehicles
in a Different
Setting
shown
in
Figure 6.
The overall
brightness is
much lower
in this case. However the moving
vehicles were accurately
marked by the detector. The
parked cars were ignored.
Furthermore the uninteresting
motions were also ignored. On Figure 7, a pedestrian who was crossing
the road (high-
lighted in the oval) was
not
marked. The
moving trees in the video also did not cause
false positives.
The same detector was also applied
on the
indoor moving ball
detecting tasks. The result is illustrated in Figure
8.
4.3 Moving Boats
The task here is to detect moving
boats which
obviously has
waving water
surface
in the background. The process in this
set
of experiments was again identical to the previous experiments. The results from
its evolution phase and appli- cation phase are shown in Table
6 and Figure 9 respectively.
The cut-out image size was also 20
× 20.
In comparison with the training for
balls
and
vehicles, the accuracies here were lower.
One reason is the
ripples
in the background. In some cases, they were
very significant
especial in the areas close
to the moving boats.
Table 6: Results
from Evolution
Phase
(Moving
Boats)
Despite the relatively low accuracy in
evolution, the
per- formance
during application phase is satisfactory. The two moving boats
were properly marked by the trained detec- tor. The boat on the
left
was stationary and was ignored. The constantly moving ripples did not confuse the detector. However this
detector had
some false positives
on swaying trees which are
also highlighted in
Figure 9.
To test the applicability of the GP methodology, we ap- plied the detector trained for
moving vehicles on this moving
boat problem. The result
is shown
in Figure
10.
It did not generate false
positives and was
still
able
to
find
the
two moving
boats. However the marked areas
are much smaller
than that in Figure 9, mainly
because the boats
were mov- ing much slower than vehicles on video.
Movements that are not
so significant were ignored
by the
moving vehicle detec- tor (in Figure 10),
but picked up by the detector trained
for
5.
CONCLUSIONS
In the
paper, we presented a methodology based
on Ge- netic Programming to evolve motion
detectors. Different to conventional
motion detection methods, this method does
not require any manual development of
a motion detection algorithm. Preprocessing
is also not necessary.
This method has been used to
generate detectors for in- door
and outdoor applications, including
detecting moving ball, moving
vehicles and moving
boats. All the videos in the experiments were taken without any specific settings just
like many real-world applications. The tests were performed on unprocessed video.
Based on the results we conclude that Genetic Program- ming
is capable in handling this complex task,
motion
de- tection. The methodology presented in this study is suitable to adapt GP in this domain. Motion
detectors generated by this method is
able to
accurately detect
desirable motion, while ignoring motion
in the background
and uninterest- ing motion. Furthermore they can handle
lighting condition
changes and even perform
in a completely different environ- ment. This kind of reliability and
wide applicability makes
this methodology a suitable candidate for developing real motion
detection systems.
5.1 Future Work
This study opens many
possibilities for future investiga- tion. We will extend the
methodology for videos that are taken
on moving platforms. To extract moving objects
from such videos is much
more difficult. The combination
of mo- tion
detection and object detection is
another future direc- tion, which
is to
recognize the moving
objects rather than just locating them.
Additionally we will further study the
effects of window
sizes and ensemble
of motion detectors.
6. REFERENCES
[1] B. Bhanu and Y. Lin. Object detection
in multi-modal images using
genetic programming. 2003. Center for
Research in Intelligent Systems, University of California, USA.
[2] X. Dai and S. Khorram. The effects of image misregistration on the accuracy of remotely sensed change detection. IEEE Transactions on Geoscience and Remote Sensing, 36(5):1566–1577, 1998.
[3] S. C. R. Daniel
Howard and C. Ryan. Pragmatic genetic programming strategy
for the problem
of
vehicle detection in
airborne reconnaissance. 2005. Malvern Technology Center, Worcestershire, UK
University of Limerick,
Limerick, Ireland.
[4] D. Fang and A. Song. Robust method of
detecting moving objects
in videos
evolved by genetic programming. GECCO 08 July 12-16, 2008, Atlanta Georgia, USA, 2008.
[5] D. H. Ismail Haritaoglu
and L. S. Davis.
W4:
Real-time surveillance
of people
and
their
activities. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 22(8):809–830, 2000.
[6] J. Koza. Genetic programming:
On the programming
of computers by means of natural selection. 1992.
[7] V. B. C. Mengjie Zhang and P.
Andreae. A
domain independent window approach to
multiclass object detection using genetic programming.
2002.
[8] R. Poli. Genetic programming for
feature detection and image segmentation. School of Computer Science, The University of Birmingham, UK.
[9] O. A.-K.
Richard J. Radke, Srinivas
Andra
and
B. Roysam. Image
change detection algorithms: A systematic survey. IEEE Transactions On
Image Processing Vol 14, 2005. Rensselaer Polytechnic
Institute, Troy,
NY, 12180
USA.
[10]
A. Song. Fast video analysis by genetic programming.
IEEE Congress on Evolutionary Computation, 2008. CEC 2008, pages 3237–3243, 2008.
[11] C. Stauffer
and
W. E. L. Grimson. Learning
patterns of activity using real-time tracking. IEEE Transactions on Pattern Analysis and Machine
Intelligence, 22(8):747–757,
2000.
[12] L. Trujillo and
G. Olague. Automated design of image operators that detect interest points. Evolutionary
Computation, 16(4), 2008.