
Wednesday, 25 March 2015

Motion Detection in Complex Environments by Genetic Programming








ABSTRACT
Detecting motion is an important aspect of machine vision. However, real-world vision tasks often contain interfering motion information which is not of interest. To tackle this difficult task, we adapted Genetic Programming to this domain. The GP-based methodology presented in this paper does not require the implementation of existing motion detection algorithms. The evolved programs can detect genuine moving objects such as cars and boats, while ignoring background movements such as waving trees, a rippling water surface and even pedestrians. These programs provide reliable performance under different lighting conditions, both indoors and outdoors. Furthermore, no preprocessing of video input is required, which is usually mandatory in conventional vision approaches.

Categories and Subject Descriptors
I.2.10 [Artificial Intelligence]: Vision and Scene Understanding - Video analysis; I.4.8 [Image Processing and Computer Vision]: Scene Analysis - Motion; I.4.6 [Image Processing and Computer Vision]: Segmentation - Pixel classification; I.5.4 [Pattern Recognition]: Applications

General Terms
Algorithms, Design,  Experimentation

Keywords
Genetic Programming, Motion  Detection, Video  Analysis, Tracking

1.   INTRODUCTION
This study extends the use of Genetic Programming to a complex domain, motion detection. In real-world applications, automatic reporting of movement is an important and highly desirable component. The detected motion signals can be crucial for these systems. For example, they might be considered as moving vehicles in traffic monitoring systems, used to trigger video recording in security surveillance systems, or used to mark potential targets in robotic systems.
Conventional motion   detection algorithms are  based  on comparing the differences between  consecutive video frames,


Copyright is held by the author/owner(s).
GECCO'09, July 8–12, 2009, Montréal, Québec, Canada.
ACM 978-1-60558-325-9/09/07.

such as counting the pixels which show prominent variations from the previous frame to the current frame. However, uninteresting variations caused by factors like lighting changes and disturbance in the image sensor itself would also be considered as motion by such approaches. Therefore pre-processing, which reduces false positives, is often critical in these methods.
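As a concrete illustration, the frame-differencing approach described above can be sketched as follows. The threshold values are illustrative assumptions, not taken from any particular system.

```python
import numpy as np

def naive_motion(prev, curr, diff_thresh=25, count_thresh=50):
    """Report motion when enough pixels change prominently between frames.

    Both thresholds are illustrative assumptions; real systems tune them.
    """
    # Absolute per-pixel intensity change from the previous to the current frame.
    changed = np.abs(curr.astype(int) - prev.astype(int)) > diff_thresh
    # Motion is reported if the number of changed pixels exceeds a count threshold.
    return int(changed.sum()) > count_thresh
```

Note that this is exactly the kind of detector that mistakes lighting changes and sensor noise for motion, which is why pre-processing becomes critical.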
Furthermore, in many real-world applications "real" motion is not necessarily the movement of interest. For example, waving tree branches are truly moving objects but have little value in an application. Similarly, ripples on rivers provide little extra information compared to moving boats. In some cases, seemingly meaningful motions might be considered uninteresting as well, for instance pedestrians in a vehicle monitoring system and birds in a surveillance application. Ignoring these kinds of motion while reporting genuine movement is a challenge in real-world vision systems.
The above issues are addressed in this study, which presents a methodology of evolving motion detectors by Genetic Programming. The methodology can be considered an extension of our previous studies in which GP has been successfully used for similar video analysis tasks such as segmenting texture regions [10] and tracking robosoccer balls [4]. These previous investigations have shown the feasibility of evolving video analysis programs without task-dependent representation, and without prior knowledge of machine vision techniques. Hence, with such GP approaches there is no need to change the representation from task to task. Feature extraction is also not necessary, which can be a time-consuming process and requires expertise in vision algorithms as well as manual implementation of those algorithms. Moreover, the evolved programs are capable of handling a certain degree of variation, which is a highly desirable feature in real-world applications.
1.1   Goals
The  goals of this  study are:

to investigate a suitable GP methodology for motion detection

to investigate the performance of this method, especially in terms of its capability of handling variations and filtering uninteresting motions.

2.   BACKGROUND
The relevant background of this study includes two areas: the use of GP in image and vision-related tasks, and


traditional motion detection. Both are briefly described below.

2.1   GP for Vision
Similar to other application domains of Genetic Programming [6], vision-related areas also attract much interest. A few relevant studies are briefly discussed here, with emphasis on their representation schemes.
In an early study, Daniel Howard et al. [3] used GP to detect vehicles in Infrared Line Scan images. They used pixel statistics and discrete Fourier transforms as inputs, and found that the discrete Fourier transform terminal set performed only slightly better than the pixel statistics terminal set. Riccardo Poli [8] used GP to perform detection and segmentation on X-ray coronarograms and magnetic resonance images. Poli also used a terminal set which accessed the pixel values directly.
Recently, Leonardo Trujillo and Gustavo Olague [12] used GP to detect interest points within an image. Detected interest points give an indication of the regions within an image that contain crucial information. Once identified, these regions can be further processed for more complex tasks like object classification or recognition.
Bir Bhanu and Yingqiang Lin [1] used GP to evolve programs that successfully identified and marked regions in synthetic aperture radar images containing fields, roads and a lake. They were also able to detect vehicles in color images using the same methodology. Their terminal set contained the original image along with 15 other images derived from the original, including mean, minimum, maximum and deviation values. Mengjie Zhang et al. [7] used GP to evolve detectors on multi-class problems of increasing difficulty, which included identifying haemorrhages and microaneurisms in retina images. They experimented with different pixel statistics as their terminal set and found that the rectilinear statistics, in which an image is divided into five rectangles (four corners and a center rectangle), performed well.
These studies show that GP is capable of a wide range of vision-related tasks. Most of them used a simple pixel-based representation, which is also used in this investigation.

2.2   Motion Detection
Traditional methods of motion detection [9] involve pre-processing to filter out the noise present in the images. Dai et al. [2] used intensity normalization, where the intensity values of images are normalized to have the same mean and variance. Using a threshold selected relative to the mean, the relevant foreground images could be separated from the background images. Other techniques involve a shading model, where the illumination component of a pixel is filtered out to reduce the effects of varying illumination, and the result is then processed to identify the foreground pixels.
Background modeling is another widely used method in motion detection. Variations in the pixels are modelled using Gaussian mixtures. If the variation in a pixel falls outside the modelled mixtures by a predetermined threshold, the pixel is marked as a foreground pixel. Stauffer and Grimson [11] developed a background model with several Gaussians to represent dynamic background objects like swaying trees or ripples of water on a lake. Haritaoglu et al. [5] proposed a simpler method, where a section of the video containing only the background is analysed to find the minimum and maximum intensity values and also the largest inter-frame difference. A foreground pixel is then identified as any pixel that differs from the minimum or the maximum by more than the largest inter-frame difference value.
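The min/max/inter-frame-difference model of Haritaoglu et al. can be sketched in a few lines. This is an illustrative reading of the description above, not their actual implementation.

```python
import numpy as np

def w4_model(bg_frames):
    """Analyse background-only frames for per-pixel minimum, maximum and
    the largest inter-frame difference, as in the W4-style model above."""
    stack = np.stack([f.astype(int) for f in bg_frames])
    lo, hi = stack.min(axis=0), stack.max(axis=0)
    # Largest absolute difference between consecutive frames, per pixel.
    d = np.abs(np.diff(stack, axis=0)).max(axis=0)
    return lo, hi, d

def w4_foreground(frame, model):
    """Mark as foreground any pixel that departs from the observed
    min/max range by more than the largest inter-frame difference."""
    lo, hi, d = model
    f = frame.astype(int)
    return (f < lo - d) | (f > hi + d)
```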
In  our  methodology, these  manually designed  processes and  algorithms are  not  necessary. At the  same  time,  the GP-based method addresses the problems mentioned above.

3.   METHODOLOGY



Figure 1:  The Basic Framework


Figure 1 shows the basic framework of evolving motion detectors. There are two fundamental components, known as the evolution phase and the application phase. The evolution phase is the process of evolving motion detection programs. Video frames are provided in this phase for training and evaluation, which is guided by the manually marked "interesting" movements in these videos. The programs that achieved the best accuracy during evaluation are used in the application phase to process real-time video signals.

3.1   Data Representation
Videos for training are processed first: every frame containing desirable movements is identified, and the size and location of those moving objects are marked on these frames.
Each video frame is usually 384×288 pixels in size, which is not suitable as direct input to GP programs. To reduce the input size, a sweeping window is used to sample small cut-outs from frame images. Cut-outs containing marked moving objects are labeled as the positive class. Cut-outs mainly containing unmarked areas, i.e. no meaningful motions, are labeled as the negative class. The set of cut-outs generated from a sequence of video frames is then split into two parts: one for training and one for evaluation.
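The sampling step can be sketched as below. The 20×20 window matches the size used in later experiments; the stride and the 50%-coverage labeling rule are assumptions for illustration.

```python
import numpy as np

def sample_cutouts(frame, mask, win=20, step=12):
    """Sweep a window over a frame and label each cut-out from a motion mask.

    `mask` is 1 where moving objects were manually marked. The stride of 12
    (window size minus an 8-pixel overlap) and the 0.5 coverage rule below
    are illustrative assumptions.
    """
    cutouts = []
    h, w = frame.shape
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            patch = frame[y:y + win, x:x + win]
            # Positive class if the window mostly covers marked moving pixels.
            label = mask[y:y + win, x:x + win].mean() > 0.5
            cutouts.append((patch, bool(label)))
    return cutouts
```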
In our previous studies, each cut-out image contained static pixel information such as intensity, hue value or simple feature values. In this study, they contain the dynamics at each pixel position. Every pixel captures the intensity changes at that position over consecutive frames, computed by the following formula:
M(x) = (1/n) · Σ_{i=0}^{n−1} ( |x_i − x_{i+1}| × (n − i) )
where x is one pixel position in the video. There are n consecutive frames; frame i = 0 is the most current one and frame n is the first frame in the queue. The weight (n − i) gives more influence to more recent frames. The sum of weighted differences is divided by n to produce the average change at one position.
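In code, this weighted average change at one pixel position is a direct transcription of the formula above:

```python
def motion_value(history, n):
    """M(x): weighted average intensity change at one pixel position.

    `history` holds intensities x_0 .. x_n, where x_0 comes from the most
    current frame and x_n from the oldest frame in the queue.
    """
    # Recent frame differences get the larger weight (n - i).
    total = sum(abs(history[i] - history[i + 1]) * (n - i) for i in range(n))
    return total / n
```

For example, with n = 2 a change between the two newest frames counts twice as much as the same change between the two oldest frames.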

3.1.1   Function Set
The function set (Table 1) comprises three groups of functions. The first group is the four basic arithmetic operators add, subtract, multiply and divide, of which the return



Population Size     30
Maximum Depth       12
Minimum Depth       2
Generations         300
Mutation Rate       5%
Crossover Rate      85%
Elitism Rate        10%

Table 3:  GP Runtime Parameters







Table 1:  Function Set


Function   Return Type   Arguments
+          Double        Double, Double
-          Double        Double, Double
×          Double        Double, Double
/          Double        Double, Double
=          Boolean       Double, Double
<          Boolean       Double, Double
>          Boolean       Double, Double
if         Double        Boolean, Double, Double
AVG        Double        4 Doubles
MAX        Double        4 Doubles
MIN        Double        4 Doubles

 
values are double. The second group contains the logic operators equals, less than and greater than, which return booleans, plus the if function. The if function takes a boolean and two doubles: if the boolean is true, the first double is returned; otherwise the second double is returned.
The functions in the third group take four double values as arguments, which represent the x, y coordinates of the top-left corner and the x, y coordinates of the bottom-right corner of a region. These functions return the average, the maximum and the minimum pixel values of that region respectively. They validate the input coordinates before performing the calculation on the specified region, so arguments passed from child nodes will not cause errors.
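The coordinate validation can be sketched as follows. Clamping and re-ordering the corners is one plausible way to meet the "no errors from child nodes" requirement; the paper does not specify the exact scheme.

```python
def valid_region(img, x1, y1, x2, y2):
    """Clamp and order corner coordinates so any child-node output yields
    a legal region (an assumption; the paper does not give details)."""
    h, w = len(img), len(img[0])
    x1, x2 = sorted((min(max(int(x1), 0), w - 1), min(max(int(x2), 0), w - 1)))
    y1, y2 = sorted((min(max(int(y1), 0), h - 1), min(max(int(y2), 0), h - 1)))
    return [img[y][x] for y in range(y1, y2 + 1) for x in range(x1, x2 + 1)]

def region_avg(img, x1, y1, x2, y2):
    # AVG node: mean pixel value of the validated region.
    vals = valid_region(img, x1, y1, x2, y2)
    return sum(vals) / len(vals)

def region_max(img, x1, y1, x2, y2):
    # MAX node: largest pixel value of the validated region.
    return max(valid_region(img, x1, y1, x2, y2))

def region_min(img, x1, y1, x2, y2):
    # MIN node: smallest pixel value of the validated region.
    return min(valid_region(img, x1, y1, x2, y2))
```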

3.1.2   Terminal Set
The terminal set (Table 2) is rather simple. It contains two kinds of terminals: random doubles (drand) and attributes (Att). Attributes are the input of a GP program; they are read from position x of the cut-out images.

Terminal   Return Type
drand      Double
Att[x]     Double

Table 2:  Terminal Set

The  choice  of function set  and  terminal set  is consistent with  our  previous  studies  and  similar  to  that in studies  of GP  on vision-related tasks.

3.2   Evolution Phase
The fitness measure in the evolution is the accuracy of an individual program in classifying whether a cut-out image contains motion or not. It can be expressed as:


Fitness = (TP + TN) / TOTAL

where TOTAL is the total number of training cases, and TP and TN are the numbers of true positives and true negatives respectively.
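As a sketch, the fitness evaluation amounts to plain classification accuracy over the labeled cut-outs:

```python
def fitness(program, cases):
    """Accuracy (TP + TN) / TOTAL of a program over labeled cut-outs.

    `program` is a callable standing in for an evolved tree; it returns
    True when it classifies its input cut-out as containing motion.
    `cases` is a list of (inputs, label) pairs.
    """
    tp = sum(1 for inputs, label in cases if label and program(inputs))
    tn = sum(1 for inputs, label in cases if not label and not program(inputs))
    return (tp + tn) / len(cases)
```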
The GP runtime parameters are listed in Table 3. To generate the initial population, the ramped half-and-half method was chosen as it provides a wide variety of initial program trees [6]. The mutation rate was set to 5%, as work done on similar problems utilised mutation rates that were either very low or zero. This choice of parameters is based on our other related work, where we used GP to recognize texture pictures and analyze X-ray images. The parameter choice is also consistent with vision-related GP work reported by other researchers.
There are two termination conditions in evolution: the first is when the training accuracy reaches 100%; the second is when the specified number of generations has been reached.
For each training run, the best GP programs are passed to evaluation. When the evolution process finishes, the best performing programs during evaluation are selected for the next phase.

3.3   Application Phase

Figure 2:  Applying Evolved Motion Detectors


The basic procedure for applying evolved motion detectors is illustrated in Figure 2. The main steps are: retrieving the current frame from the video input; calculating M values at each position based on the current frame; sampling cut-outs with a sweeping window whose size is identical to that used in the evolution phase; classifying these cut-outs with the evolved motion detectors; assembling the labeled cut-outs back into a frame; and producing the processed frames continuously as video output.
During sampling, most pixels are sampled several times by windows at different locations, because there are overlaps between adjacent window positions. Accordingly, one pixel might be classified multiple times within different cut-outs. A voting strategy is used to determine the final class label for each pixel: if a pixel has been classified as "motion" more times than "no-motion", then it is considered a "motion" pixel in the final output.
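The per-pixel voting can be sketched as follows. Here `detector` stands for any evolved classifier, and the stride of 12 is an assumption about how the 8-pixel overlap used in the experiments maps to window placement.

```python
import numpy as np

def vote_frame(m_frame, detector, win=20, step=12):
    """Classify overlapping cut-outs of an M-value frame and vote per pixel.

    `detector` maps a win x win array of M values to True ("motion").
    The stride of 12 (window size minus an 8-pixel overlap) is an
    illustrative assumption.
    """
    h, w = m_frame.shape
    votes = np.zeros((h, w), dtype=int)   # motion votes minus no-motion votes
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            vote = 1 if detector(m_frame[y:y + win, x:x + win]) else -1
            votes[y:y + win, x:x + win] += vote
    # A pixel is "motion" if it was classified as motion more often than not.
    return votes > 0
```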


The degree of overlap during sampling is adjustable. A larger overlap will generate smoother output, but results in a larger number of cut-outs and hence requires more processing time. One can use this parameter to balance computational cost and output quality. The overlap used in these experiments is always 8 pixels.
Note that in this methodology there is no pre-processing. Motion detectors are trained directly on cut-outs generated from videos. The trained programs are then applied directly to raw video inputs. This feature is important in real-world applications, because determining and fine-tuning a pre-processing method is a non-trivial exercise which has a direct impact on performance in traditional methods.

4.   EXPERIMENTS
The methodology described in the previous section has been applied to scenarios of different difficulty, including indoor and various outdoor environments.

4.1   Moving Ball
The task in this set of experiments is to detect a moving ball in an arbitrary indoor setting. The video for training was taken in a normal computer laboratory. The results from the evolution phase are listed in Table 4. The first data column shows the highest accuracies obtained during training and evaluation in the evolution phase. The "TP Rate" and "TN Rate" columns show the ratio of true positives among all labeled positives and the ratio of true negatives among all labeled negatives. They are similar to the traditional performance indicators, precision and recall.


             Accuracy   TP Rate   TN Rate
Training     99.83%     99.83%    99.83%
Evaluation   97.05%     97.26%    96.85%

Table  4:    Results  from Evolution  Phase  (Moving
Ball)

Table 4 shows that GP can construct programs with low false positive and low false negative rates on the training and test video. The best performing program is selected to process real-time video streams.



Figure 3:  Detecting the Moving Ball


A frame captured from the output of the best motion detector is presented in Figure 3. The soccer ball is the only moving object in this video. The region with the white boundary

is marked by the detector as an area that contains motion. This region can follow the movement of the ball and roughly match the shape of the ball. The window size here is 32 × 32, so the region marked is quite coarse. However, such marking is already sufficient to locate the moving object and to estimate its size and speed.
To test the reliability of the evolved detector, we applied it under different lighting conditions. Figure 4 shows its output in a much brighter setting. The moving ball can still be marked properly.



Figure 4:  Under Different  Lighting Conditions



4.2   Moving Vehicles
The video for this set of experiments was taken over a highway with buildings and trees in the background. The task here is to detect the moving vehicles on the highway. It is a much more difficult problem compared to detecting a ball in an indoor environment, although the experimental processes were identical. The cut-out image size was 20 × 20. The training and evaluation results are listed in Table 5, which uses the same format as the previous table. The best detector achieved high accuracy on the evaluation data set: 98.42%.


             Accuracy   TP Rate   TN Rate
Training     99.75%     100%      99.51%
Evaluation   98.42%     99.51%    97.32%

Table 5:  Results from Evolution Phase (Moving Ve- hicle)

The variation is much greater here, partially due to unnoticeable ambient changes and partially due to the automatic adjustment of the camera itself. Subtracting two consecutive frames results in a noisy background. Furthermore, the swaying trees make the task more challenging.
Figure 5 is a typical frame generated during the application phase. The thick lines are manually drawn to highlight the swaying trees, whose movement is very noticeable in the video. As the figure suggests, these trees were ignored by our trained motion detector, which quite accurately captured the moving cars. Due to the movement of the trees, their shadows also move; such movement was ignored by the detector as well. However, there was a small false positive at the bottom right corner.
To further test the performance, the same detector was applied in an inner-city environment. One output frame is





Figure 5: Detecting Moving Vehicles


Figure 7:  Ignoring Pedestrians





             Accuracy   TP Rate   TN Rate
Training     95.52%     95.60%    95.43%
Evaluation   88.75%     90.42%    87.08%

 
Figure 8:  Detecting Moving Ball by a Different  De- tector



Figure 6:  Moving Vehicles in a Different  Setting





shown in Figure 6. The overall brightness is much lower in this case. However, the moving vehicles were accurately marked by the detector, while the parked cars were ignored.
Furthermore, the uninteresting motions were also ignored. In Figure 7, a pedestrian who was crossing the road (highlighted in the oval) was not marked. The moving trees in the video also did not cause false positives.
The same detector was also applied to the indoor moving-ball detection task. The result is illustrated in Figure 8.

4.3   Moving Boats
The task here is to detect moving boats, with a waving water surface in the background. The process in this set of experiments was again identical to the previous experiments. The results from its evolution phase and application phase are shown in Table 6 and Figure 9 respectively. The cut-out image size was also 20 × 20.
In comparison with the training for balls and vehicles, the accuracies here were lower. One reason is the ripples in the background; in some cases they were very significant, especially in the areas close to the moving boats.

Table 6: Results from Evolution Phase (Moving Boats)


Despite the relatively low accuracy in evolution, the performance during the application phase is satisfactory. The two moving boats were properly marked by the trained detector. The boat on the left was stationary and was ignored. The constantly moving ripples did not confuse the detector. However, this detector had some false positives on swaying trees, which are also highlighted in Figure 9.
To test the applicability of the GP methodology, we applied the detector trained for moving vehicles to this moving boat problem. The result is shown in Figure 10. It did not generate false positives and was still able to find the two moving boats. However, the marked areas are much smaller than those in Figure 9, mainly because the boats were moving much slower than the vehicles in the earlier video. Movements that are not so significant were ignored by the moving vehicle detector (Figure 10), but picked up by the detector trained for boats (Figure 9).



5.   CONCLUSIONS
In this paper, we presented a methodology based on Genetic Programming to evolve motion detectors. Unlike conventional motion detection methods, this method does not require any manual development of a motion detection algorithm. Preprocessing is also not necessary.
This method has been used to generate detectors for indoor and outdoor applications, including detecting a moving ball, moving vehicles and moving boats. All the videos in the experiments were taken without any specific settings, just like many real-world applications. The tests were performed on unprocessed video.
Based on the results, we conclude that Genetic Programming is capable of handling this complex task, motion detection. The methodology presented in this study is suitable for adapting GP to this domain. Motion detectors generated by this method are able to accurately detect desirable motion, while ignoring motion in the background and uninteresting motion. Furthermore, they can handle lighting condition changes and even perform in a completely different environment. This kind of reliability and wide applicability makes this methodology a suitable candidate for developing real motion detection systems.

5.1   Future Work
This study opens many possibilities for future investigation. We will extend the methodology to videos taken from moving platforms; extracting moving objects from such videos is much more difficult. The combination of motion detection and object detection is another future direction, which is to recognize the moving objects rather than just locating them. Additionally, we will further study the effects of window sizes and ensembles of motion detectors.

6.   REFERENCES
[1] B. Bhanu and Y. Lin. Object detection in multi-modal images using genetic programming. Center for Research in Intelligent Systems, University of California, USA, 2003.
[2] X. Dai and S. Khorram. The effects of image misregistration on the accuracy of remotely sensed change detection. IEEE Transactions on Geoscience and Remote Sensing, 36(5):1566–1577, 1998.
[3] D. Howard, S. C. Roberts, and C. Ryan. Pragmatic genetic programming strategy for the problem of vehicle detection in airborne reconnaissance. Malvern Technology Centre, Worcestershire, UK and University of Limerick, Limerick, Ireland, 2005.
[4] D. Fang and A. Song. Robust method of detecting moving objects in videos evolved by genetic programming. In Proceedings of GECCO '08, Atlanta, Georgia, USA, 2008.
[5] I. Haritaoglu, D. Harwood, and L. S. Davis. W4: Real-time surveillance of people and their activities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):809–830, 2000.
[6] J. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. 1992.
[7] M. Zhang, V. B. Ciesielski, and P. Andreae. A domain independent window approach to multiclass object detection using genetic programming. 2002.
[8] R. Poli. Genetic programming for feature detection and image segmentation. School of Computer Science, The University of Birmingham, UK.
[9] R. J. Radke, S. Andra, O. Al-Kofahi, and B. Roysam. Image change detection algorithms: A systematic survey. IEEE Transactions on Image Processing, 14, 2005.
[10] A. Song. Fast video analysis by genetic programming. In IEEE Congress on Evolutionary Computation (CEC 2008), pages 3237–3243, 2008.
[11] C. Stauffer and W. E. L. Grimson. Learning patterns of activity using real-time tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):747–757, 2000.
[12] L. Trujillo and G. Olague. Automated design of image operators that detect interest points. Evolutionary Computation, 16(4), 2008.

Wednesday, 25 March 2015

Motion Detection in Complex Environments by Genetic Programmi








ABSTRACT
Detecting motions  is an important aspect  of machine vision. However real world vision tasks often contain interfering mo- tion  information which  is  not  of interest.   To  tackle  this difficult  task,   we  adapted  Genetic Programming into  this domain.  The  GP-based methodology presented in this  pa- per does not  require  the  implementation of existing  motion detection algorithms. The evolved programs can detect gen- uine  moving  objects  such  as cars  and  boats, while ignoring background movements such  as  waving  trees, rippling  wa- ter  surface  and  even  pedestrians.  These  programs provide reliable  performance under  different lighting  conditions,  ei- ther  indoors  and outdoors. Furthermore no preprocessing of video  input is required which  is usually  mandatory in con- ventional vision approaches.

Categories and Subject Descriptors
I.2.10  [Artificial Intelligence]:  Vision  and  Scene  Under- standingVideo analysis ;  I.4.8  [Image  Processing and computer Vision]:  Scene AnalysisMotion ; I.4.6 [Image Processing and computer Vision]:  SegmentationPixel classification ; I.5.4 [Pattern  Recognition]:  Applications

General Terms
Algorithms, Design,  Experimentation

Keywords
Genetic Programming, Motion  Detection, Video  Analysis, Tracking

1.   INTRODUCTION
This  study extends the  use  of Genetic Programing to  a complex  domain, motion  detection.  In  real  world  applica- tions,  automatic reporting of movement is an important and highly  desirable component.   The  detected motion  signals can be crucial  for the  systems. For  example,  they  might be considered as moving  vehicles in traffic monitoring systems, or can be used to trigger video recording  in security surveil- lance  systems, or can  be used  to  mark  potential targets in robotic  systems.
Conventional motion   detection algorithms are  based  on comparing the differences between  consecutive video frames,


Copyright is held by the author/owner(s).
GECCO'09, July 8–12, 2009, Montréal, Québec, Canada.
ACM 978-1-60558-325-9/09/07.

such as counting the pixels which show prominent variations from previous  frame to the current frame.  However uninter- esting  variations caused  by factors  like lighting  changes  and disturbance in image  sensor  itself  would  also be considered as  motions  by  such  approaches.  Therefore pre-processing, which  is to  reduce  false positives, is often  critical  in these methods.
Furthermore, in many  real  world  applications “real”  mo- tion  is not  necessarily   the  movement of interest.  For  ex- ample,  waving  tree branches are  truly moving  objects  but have little  value in an application. Similarly  ripples on rivers provide  little  extra  information compared to moving  boats. In some cases, seemingly  meaningful motions  might also be considered uninteresting as well, for instance pedestrians in a vehicle monitoring system  and  birds  in a surveillance ap- plication.   Ignoring  these  kinds  of motion  while  reporting genuine  movement is challenge  in  real  world  vision  sys- tems.
The above issues are addressed in this study which presents a methodology of evolving motion  detectors by Genetic Pro- gramming.  The  methodology can  be considered as the  ex- tension  of our previous studies  in which GP has been success- fully used for similar video analysis  tasks  such as segmenting texture regions[10]  and  tracking robosoccer  balls[4].  These previous  investigations have shown the feasibility of evolving video  analysis  programs without task-dependent represen- tation, and  also without prior  knowledge  in machine vision techniques. Hence by such GP  approaches, there  is no need to  change  representation from  task  to  task.    Also  feature extraction is not  necessary  which  can  be a time  consuming process and requires  expertise in vision algorithms as well as manual implementation of these  algorithms.  Moreover  the evolved  programs are  capable  of handling certain degree  of variations which  is a highly  desirable feature in real-world applications.
1.1   Goals
The  goals of this  study are:

to investigate the suitable GP methodology for motion detection

to  investigate the  performance of this  method, espe- cially in terms  of the  capability of handling variations and  filtering  uninteresting motions.

2.   BACKGROUND
The  relevant  background of this  study includes  two  ar- eas,  the  use  of GP  in  image  and  vision  related tasks  and


the  traditional motion  detection. They  are briefly described below.

2.1   GP for Vision
Similar  to other  application domains of Genetic Program- ming[6], vision  related areas  also attract many  interests.  A few relevant studies  are briefly discussed  here,  especially  on their  representation schemes.
In the early stage,  Daniel Howard [3] used GP to detect ve- hicles in Infrared Line Scan images.  They  used pixel statis- tics  and  discrete  Fourier transforms as inputs. They  found that the  Discrete  fourier  transforms terminal set performed only  slightly  better than the  pixel  statistics terminal set. Riccardo Poli  [8] used  GP  to  perform   detection and  seg- mentation on X-ray  coronarograms and  Magnetic resonance images.    Poli  also  used  a  terminal set  which  accessed  the pixel values  directly.
Recently Leonardo Trujillo and  Gustavo Olague  [12] used GP  to detect interest points  within  an image.  Detected in- terest points  give an indication of the  regions within  an im- age that contain crucial  information.  Once  identified these regions can be further processed  for more complex tasks  like object  classification or recognition.
Bhanu and Lin [1] used GP to evolve programs that successfully identified and marked regions in synthetic aperture radar images containing fields, roads and a lake. They were also able to detect vehicles in color images using the same methodology. Their terminal set contained the original image along with 15 other images derived from it, including mean, minimum, maximum and deviation values.
Zhang et al. [7] used GP to evolve detectors on multi-class problems of increasing difficulty, which included identifying haemorrhages and microaneurysms in retina images. They experimented with different pixel statistics as their terminal set and found that rectilinear statistics, in which an image is divided into five rectangles (four corners and a center rectangle), performed well.
These studies show that GP is capable of a wide range of vision-related tasks. Most of them used a simple pixel-based representation, which is also used in this investigation.

2.2   Motion Detection
Traditional methods of motion detection involve preprocessing to filter out the noise present in the images [9]. Dai et al. [2] used intensity normalization, where the intensity values of images are normalized to have the same mean and variance. Using a selected threshold relative to the mean, the relevant foreground can then be separated from the background. Other techniques use a shading model, where the illumination component of a pixel is filtered out to reduce the effects of varying illumination before the foreground pixels are identified.
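The intensity normalization step described above can be sketched as follows. This is a minimal illustration, not code from any of the cited works; the function name and the target mean/variance values are illustrative.

```python
# Sketch of intensity normalization in the spirit of Dai et al. [2]:
# shift and scale a frame so its intensities match a target mean and
# standard deviation. `frame` is a flat list of pixel intensities.

def normalize_intensity(frame, target_mean=128.0, target_std=32.0):
    n = len(frame)
    mean = sum(frame) / n
    var = sum((p - mean) ** 2 for p in frame) / n
    std = var ** 0.5 or 1.0  # guard against flat frames (zero variance)
    return [(p - mean) / std * target_std + target_mean for p in frame]
```

After normalization, a threshold relative to the (now common) mean can be used to separate foreground from background pixels.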
Background modeling is another widely used method in motion detection. Variations in the pixels are modelled using Gaussian mixtures; if the variation in a pixel falls outside the modelled mixtures by a predetermined threshold, the pixel is marked as foreground. Stauffer and Grimson [11] developed a background model with several Gaussians to represent dynamic background objects such as swaying trees or ripples of water on a lake. Haritaoglu et al. [5] proposed a simpler method, where a section of the video containing only the background is analysed to find the minimum and maximum intensity values and also the largest inter-frame difference. A foreground pixel is then identified as any pixel that differs from the minimum and the maximum by more than the largest inter-frame difference value.
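The min/max/inter-frame-difference model of Haritaoglu et al. [5] can be sketched for a single pixel position as below. This is an illustrative reading of the description above, not the authors' implementation; the function names and the foreground test are assumptions.

```python
# Sketch of a W4-style background model for one pixel position.
# From a background-only training clip we record the min/max intensity
# and the largest inter-frame difference d; a pixel is then considered
# foreground if it departs from the [min, max] envelope by more than d.

def train_background(samples):
    """samples: intensities of one pixel over a background-only clip."""
    m, M = min(samples), max(samples)
    d = max(abs(a - b) for a, b in zip(samples, samples[1:]))
    return m, M, d

def is_foreground(pixel, model):
    m, M, d = model
    return (m - pixel) > d or (pixel - M) > d
```

For example, training on the sequence 50, 52, 51, 54, 53 yields min 50, max 54 and a largest inter-frame difference of 3, so only pixels below 47 or above 57 would be flagged.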
In  our  methodology, these  manually designed  processes and  algorithms are  not  necessary. At the  same  time,  the GP-based method addresses the problems mentioned above.

3.   METHODOLOGY



Figure 1:  The Basic Framework


Figure 1 shows the basic framework of evolving motion detectors. There are two fundamental components, known as the evolution phase and the application phase. The evolution phase is the process of evolving motion detection programs. Video frames are provided in this phase for training and evaluation, guided by the manually marked "interesting" movements in these videos. The programs that achieve the best accuracy during evaluation are used in the application phase to process real-time video signals.

3.1   Data Representation
Videos for training are processed first. Every frame containing desirable movements is identified, and the size and location of the moving objects are marked on these frames.
Each video frame is usually 384×288 pixels in size, which is not suitable as direct input to GP programs. To reduce the input size, a sweeping window is used to sample small cut-outs from the frame images. Cut-outs containing marked moving objects are labeled as the positive class; cut-outs mainly containing unmarked areas, i.e. no meaningful motion, are labeled as the negative class. The set of cut-outs generated from a sequence of video frames is then split into two parts: one for training and one for evaluation.
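The sweeping-window sampling described above can be sketched as follows. The frame size, window size, step and box format here are illustrative, not the paper's exact values.

```python
# Sketch of sweeping-window sampling: slide a fixed-size window over a
# frame and collect cut-outs, labelling a cut-out positive if it
# overlaps any manually marked moving-object box.

def sample_cutouts(frame, win, step, marked_boxes):
    """frame: 2-D list of pixels; marked_boxes: list of (x0, y0, x1, y1)."""
    h, w = len(frame), len(frame[0])
    cutouts = []
    for y in range(0, h - win + 1, step):
        for x in range(0, w - win + 1, step):
            patch = [row[x:x + win] for row in frame[y:y + win]]
            label = any(x < bx1 and bx0 < x + win and
                        y < by1 and by0 < y + win
                        for bx0, by0, bx1, by1 in marked_boxes)
            cutouts.append((patch, label))
    return cutouts
```

The resulting labelled cut-outs would then be split into training and evaluation sets as described.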
In our previous studies, each cut-out image contained static pixel information such as intensity, hue values or simple feature values. In this study, they contain the dynamics at each pixel position. Every pixel captures the intensity changes at that position over consecutive frames, by the following formula:
M(x) = ( Σ_{i=0}^{n−1} |x_i − x_{i+1}| × (n − i) ) / n

where x is one pixel position in the video. There are n consecutive frames; frame i = 0 is the most current one and frame n is the first frame in the queue. The weight n − i gives more recent frames more influence. The sum of weighted differences is divided by n to produce the average change at one position.
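The formula above can be transcribed directly. This is a minimal sketch; the function name and the queue layout (most recent sample first) are illustrative assumptions.

```python
# Direct transcription of M(x): a recency-weighted average of absolute
# inter-frame intensity differences at one pixel position.

def motion_value(history):
    """history: intensities at one pixel, most recent first (n+1 samples)."""
    n = len(history) - 1
    total = sum(abs(history[i] - history[i + 1]) * (n - i)
                for i in range(n))
    return total / n
```

A static pixel yields 0, while a recent intensity jump contributes more than an old one of the same size.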

3.1.1   Function Set
The function set (Table 1) comprises three groups of functions. The first group is the four basic arithmetic operators add, subtract, multiply and divide, of which the return values are double. The second group contains the logic operators equals, less than and greater than, which return booleans, and the if function. The if function takes a boolean and two doubles; if the boolean is true the first double is returned, otherwise the second double is returned.

Function   Return Type   Arguments
+          Double        Double, Double
-          Double        Double, Double
×          Double        Double, Double
/          Double        Double, Double
=          Boolean       Double, Double
<          Boolean       Double, Double
>          Boolean       Double, Double
if         Double        Boolean, Double, Double
AVG        Double        4 Doubles
MAX        Double        4 Doubles
MIN        Double        4 Doubles

Table 1: Function Set

Population Size   30
Maximum Depth     12
Minimum Depth     2
Generations       300
Mutation Rate     5%
Crossover Rate    85%
Elitism Rate      10%

Table 3: GP Runtime Parameters
The functions in the third group take four double values as arguments, which represent the x, y coordinates of the top-left corner and the bottom-right corner of a region. These functions return the average, the maximum and the minimum pixel values of that region respectively. They validate the input coordinates before performing the calculation on the specified region, so arguments passed from child nodes cannot cause errors.
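The coordinate validation mentioned above can be sketched as follows. This is an assumed implementation for illustration (the paper does not give one): coordinates are clamped to the image bounds and sorted so any four doubles produced by child nodes form a valid region.

```python
# Sketch of the third function group: AVG/MAX/MIN over a rectangular
# region of a cut-out, with the four coordinate arguments clamped and
# ordered first so evolved subtrees never index out of bounds.

def _clamp_region(x0, y0, x1, y1, w, h):
    x0, x1 = sorted((max(0, min(w - 1, int(x0))), max(0, min(w - 1, int(x1)))))
    y0, y1 = sorted((max(0, min(h - 1, int(y0))), max(0, min(h - 1, int(y1)))))
    return x0, y0, x1, y1

def region_stats(img, x0, y0, x1, y1):
    """img: 2-D list; returns (average, maximum, minimum) of the region."""
    h, w = len(img), len(img[0])
    x0, y0, x1, y1 = _clamp_region(x0, y0, x1, y1, w, h)
    pixels = [img[y][x] for y in range(y0, y1 + 1) for x in range(x0, x1 + 1)]
    return sum(pixels) / len(pixels), max(pixels), min(pixels)
```

With this guard, even nonsensical coordinates such as negative or out-of-range values still yield a well-defined region.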

3.1.2   Terminal Set
The terminal set (Table 2) is rather simple. It contains two kinds of terminals: random doubles (drand) and attributes (Att). Attributes are the inputs of a GP program; they are read from position x in the cut-out images.

Terminal   Return Type
drand      Double
Att[x]     Double

Table 2: Terminal Set

The  choice  of function set  and  terminal set  is consistent with  our  previous  studies  and  similar  to  that in studies  of GP  on vision-related tasks.

3.2   Evolution Phase
The fitness measure in the evolution phase is the accuracy of an individual program in classifying whether a cut-out image contains motion or not. It can be expressed as:


Fitness = (TP + TN) / TOTAL
where TOTAL is the total number of training cases, and TP and TN are the numbers of true positives and true negatives respectively.
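The fitness measure is straightforward to compute from a program's predictions on the training cut-outs; the sketch below is illustrative (names are assumptions).

```python
# Fitness = (TP + TN) / TOTAL, computed from boolean predictions and
# boolean ground-truth labels over the training cut-outs.

def fitness(predictions, labels):
    tp = sum(1 for p, l in zip(predictions, labels) if p and l)
    tn = sum(1 for p, l in zip(predictions, labels) if not p and not l)
    return (tp + tn) / len(labels)
```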
The GP runtime parameters are listed in Table 3. To generate the initial population, the ramped half-and-half method was chosen as it provides a wide variety of initial program trees [6]. The mutation rate was set to 5%, as work done on similar problems utilised mutation rates that were either very low or zero. This choice of parameters is based on our other related work, where we used GP for recognizing texture images and analyzing X-ray images, and is also consistent with vision-related GP work reported by other researchers.
There are two termination conditions in evolution: the first is when the training accuracy reaches 100%; the second is when the specified number of generations has been reached.
For each run of training, the best GP programs are passed to evaluation. When  the  evolution process finishes, the  best performing programs during  the evaluation are then  selected for the  next  phase.

3.3   Application Phase




















Figure 2:  Applying Evolved Motion Detectors


The basic procedure of applying evolved motion detectors is illustrated in Figure 2. The main steps are: retrieving the current frame from the video input; calculating M values at each position based on the current frame; sampling cut-outs with a sweeping window whose size is identical to that used in the evolution phase; classifying these cut-outs with the evolved motion detectors; assembling the labeled cut-outs back into a frame; and producing the processed frames continuously as video output.
During sampling, most pixels are sampled several times by windows at different locations because adjacent window positions overlap. Accordingly, one pixel might be classified multiple times within different cut-outs. A voting strategy is used to determine the final class label for each pixel: if a pixel has been classified as "motion" more times than as "no-motion", then it is considered a "motion" pixel in the final output.
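The voting step can be sketched as below. This is a minimal illustration under assumed data shapes (a list of window placements with their classifications), not the paper's implementation.

```python
# Sketch of per-pixel voting over overlapping windows: each window
# casts one vote (+1 "motion", -1 "no-motion") for every pixel it
# covers, and a pixel is labelled motion when its vote total is positive.

def vote_pixels(h, w, window_results, win):
    """window_results: list of (x, y, is_motion) for each window placement."""
    votes = [[0] * w for _ in range(h)]
    for x, y, is_motion in window_results:
        for yy in range(y, min(y + win, h)):
            for xx in range(x, min(x + win, w)):
                votes[yy][xx] += 1 if is_motion else -1
    return [[v > 0 for v in row] for row in votes]
```

A tie (equal motion and no-motion votes) is resolved here as "no-motion"; the paper does not specify its tie-breaking rule.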


The degree of overlap during sampling is adjustable. A larger overlap generates smoother output but results in a larger number of cut-outs, hence requiring more processing time. One can use this parameter to balance computational cost and output quality. The overlap used in these experiments is always 8 pixels.
Note that in this methodology there is no pre-processing: motion detectors are trained directly on cut-outs generated from videos, and the trained programs are then applied directly to raw video input. This feature is important in real-world applications, because determining and fine-tuning a pre-processing method is a non-trivial exercise which has a direct impact on performance in traditional methods.

4.   EXPERIMENTS
The methodology described in the previous section has been applied to scenarios of varying difficulty, including indoor and various outdoor environments.

4.1   Moving Ball
The task in this set of experiments is to detect a moving ball in an arbitrary indoor setting. The video for training was taken in a normal computer laboratory. The results from the evolution phase are listed in Table 4. The first data column shows the highest accuracies obtained during training and evaluation in the evolution phase. The "TP Rate" and "TN Rate" columns show the ratio of true positives among all labeled positives and the ratio of true negatives among all labeled negatives. They are similar to the traditional performance indicators, precision and recall.


             Accuracy   TP Rate   TN Rate
Training     99.83%     99.83%    99.83%
Evaluation   97.05%     97.26%    96.85%

Table 4: Results from Evolution Phase (Moving Ball)

Table 4 shows that GP can construct programs with low false positive and false negative rates on the training and test videos. The best-performing program was selected to process real-time video streams.



Figure 3:  Detecting the Moving Ball


A frame captured from the output of the best motion detector is presented in Figure 3. The soccer ball is the only moving object in this video. The region with the white boundary is marked by the detector as an area that contains motion. This region follows the movement of the ball and roughly matches the shape of the ball. The window size here is 32 × 32, so the marked region is quite coarse. However, such marking is already sufficient to locate the moving object and to estimate its size and speed.
To test the reliability of the evolved detector, we applied it under different lighting conditions. Figure 4 shows its output in a much brighter setting. The moving ball can still be marked properly.



Figure 4:  Under Different  Lighting Conditions



4.2   Moving Vehicles
The video for this set of experiments was taken over a highway with buildings and trees in the background. The task here is to detect the moving vehicles on the highway. It is a much more difficult problem compared to detecting a ball in an indoor environment, although the experimental processes were identical. The cut-out image size was 20 × 20. The training and evaluation results are listed in Table 5, which uses the same format as the previous table. The best detector on the evaluation data set achieved a high accuracy of 98.42%.


             Accuracy   TP Rate   TN Rate
Training     99.75%     100%      99.51%
Evaluation   98.42%     99.51%    97.32%

Table 5: Results from Evolution Phase (Moving Vehicle)

The variation is much greater here, partially due to barely noticeable ambient changes and partially due to the automatic adjustment of the camera itself. Subtracting two consecutive frames results in a noisy background. Furthermore, the swaying trees make the task more challenging.
Figure 5 shows a typical frame generated during the application phase. The thick lines are manually drawn to highlight the swaying trees, whose movement is very noticeable on video. As the figure suggests, these trees were ignored by our trained motion detector, which quite accurately captured the moving cars. Due to the movement of the trees, their shadows also move; such movement was ignored by the detector as well. However, there was a small false positive at the bottom right corner.
To  further test  the  performance, the  same  detector was applied   on  inner  city  environment.   One  output frame  is





Figure 5: Detecting Moving Vehicles


Figure 7:  Ignoring Pedestrians





             Accuracy   TP Rate   TN Rate
Training     95.52%     95.60%    95.43%
Evaluation   88.75%     90.42%    87.08%

 
Figure 8: Detecting Moving Ball by a Different Detector



Figure 6:  Moving Vehicles in a Different  Setting





shown in Figure 6. The overall brightness is much lower in this case. However, the moving vehicles were accurately marked by the detector, and the parked cars were ignored.
Furthermore, the uninteresting motions were also ignored. In Figure 7, a pedestrian crossing the road (highlighted by the oval) was not marked. The moving trees in the video also did not cause false positives.
The same detector was also applied to the indoor moving-ball detection task. The result is illustrated in Figure 8.

4.3   Moving Boats
The task here is to detect moving boats, with a waving water surface in the background. The process in this set of experiments was again identical to the previous experiments. The results from the evolution phase and the application phase are shown in Table 6 and Figure 9 respectively. The cut-out image size was also 20 × 20.
In comparison with the training for balls and vehicles, the accuracies here were lower. One reason is the ripples in the background: in some cases they were very significant, especially in the areas close to the moving boats.

Table 6: Results from Evolution Phase (Moving Boats)


Despite the relatively low accuracy in evolution, the performance during the application phase is satisfactory. The two moving boats were properly marked by the trained detector. The boat on the left was stationary and was ignored. The constantly moving ripples did not confuse the detector. However, this detector had some false positives on swaying trees, which are also highlighted in Figure 9.
To test the applicability of the GP methodology, we applied the detector trained for moving vehicles to this moving boat problem. The result is shown in Figure 10. It did not generate false positives and was still able to find the two moving boats. However, the marked areas are much smaller than those in Figure 9, mainly because the boats were moving much more slowly than the vehicles on video. Movements that are not so significant were ignored by the moving vehicle detector (Figure 10), but picked up by the detector trained for boats.



5.   CONCLUSIONS
In this paper, we presented a methodology based on Genetic Programming to evolve motion detectors. Unlike conventional motion detection methods, this method does not require the manual development of a motion detection algorithm. Preprocessing is also unnecessary.
This method has been used to generate detectors for indoor and outdoor applications, including detecting a moving ball, moving vehicles and moving boats. All the videos in the experiments were taken without any specific settings, just like many real-world applications. The tests were performed on unprocessed video.
Based on the results, we conclude that Genetic Programming is capable of handling this complex task of motion detection, and the methodology presented in this study is a suitable way to adapt GP to this domain. Motion detectors generated by this method are able to accurately detect desirable motion while ignoring background and other uninteresting motion. Furthermore, they can handle lighting condition changes and even perform in a completely different environment. This kind of reliability and wide applicability makes this methodology a suitable candidate for developing real motion detection systems.

5.1   Future Work
This study opens many possibilities for future investigation. We will extend the methodology to videos taken from moving platforms; extracting moving objects from such videos is much more difficult. The combination of motion detection and object detection is another future direction, namely recognizing the moving objects rather than just locating them. Additionally, we will further study the effects of window sizes and ensembles of motion detectors.

6.   REFERENCES
[1] B. Bhanu and  Y. Lin. Object detection in multi-modal images  using genetic  programming. 2003. Center for Research in Intelligent Systems, University of California, USA.
[2] X. Dai and  S. Khorram. The  effects of image misregistration on the  accuracy of remotely sensed change  detection. IEEE Transactions on Geoscience and  Remote  Sensing,  36(5):1566–1577,  1998.
[3] D. Howard, S. C. Roberts, and C. Ryan. Pragmatic genetic programming strategy for the problem of vehicle detection in airborne reconnaissance. 2005. Malvern Technology Centre, Worcestershire, UK; University of Limerick, Limerick, Ireland.
[4] D. Fang and A. Song. Robust method of detecting moving objects in videos evolved by genetic programming. In GECCO '08, July 12–16, 2008, Atlanta, Georgia, USA, 2008.
[5] I. Haritaoglu, D. Harwood, and L. S. Davis. W4: Real-time surveillance of people and their activities. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8):809–830, 2000.
[6] J. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992.
[7] M. Zhang, V. Ciesielski, and P. Andreae. A domain independent window approach to multiclass object detection using genetic programming. 2002.
[8] R. Poli.  Genetic programming for feature detection and image  segmentation. School of Computer Science, The  University of Birmingham, UK.
[9] R. J. Radke, S. Andra, O. Al-Kofahi, and B. Roysam. Image change detection algorithms: A systematic survey. IEEE Transactions on Image Processing, Vol. 14, 2005. Rensselaer Polytechnic Institute, Troy, NY, 12180 USA.
[10] A. Song. Fast video analysis by genetic programming. In IEEE Congress on Evolutionary Computation (CEC 2008), pages 3237–3243, 2008.
[11] C. Stauffer  and  W. E. L. Grimson. Learning patterns of activity using real-time tracking. IEEE Transactions on Pattern Analysis  and  Machine Intelligence, 22(8):747–757,  2000.

[12] L. Trujillo and  G. Olague.  Automated design of image operators that detect interest points. Evolutionary Computation, 16(4),  2008.