Vision Based Activity Monitoring for Surveillance Applications
Project Overview

Video surveillance is an important application of computer vision for an organization from the security point of view. It is one application where an automated system can replace human beings and can also provide inputs which are not possible through human surveillance alone. This is not an easy task, even given state-of-the-art hardware and existing algorithms. The human body is a highly non-rigid 3-D object, which one has to analyze from 2-D images of a given scene. Inherent in the problem specification is the need to be robust to noise of different forms. The noise is not just image and sensor noise. There could be multiple moving objects in the scene, both animate and inanimate. Additionally, there could be unmodeled objects in the scene, that is, a cluttered background. This project aims to examine different aspects of the general problem of surveillance in a collaborative manner (between I.I.T, Bombay and I.I.T, Delhi). The project mainly focuses on:

  • Activity Recognition:

    The basic aim is to have a computer vision system that can analyze video sequences to understand patterns of ongoing activity and generate a warning when something amiss is detected. Algorithms are developed for analyzing video streams as they are, at their current sampling rate and resolution. This is a fall-out from the now-completing joint I.I.T, Bombay and I.I.T, Delhi NRB-sponsored project, "An Automated Gesture Based System for Telerobotic Applications". Algorithms are developed to look at the semantics of multiple gestures and other body movements, in order to draw conclusions about the nature of the activity being performed in a scene. This draws on the work done on hand-gesture recognition using temporal and motion information and spatial characterization of hand shapes.

    Activity recognition using manual and/or automated means can be facilitated through post-facto high-resolution slow-motion analysis of surveillance data. Given surveillance data that has been taken over a period of time, the aim is to make inferences (either automatically or through human intervention) by getting something more out of the data. Camera systems have limitations on their frame grabbing rates. For example, cameras operating in the infra-red (IR) range typically have a very slow frame grabbing rate. The idea is to develop algorithms for obtaining super-resolution data, through which one will be able to see the video at a level of detail that is not possible from the original video stream.

  • Multisensor Visual Surveillance:

    Multisensor visual surveillance mainly involves using more than one camera (possibly fixed) for analyzing scenes. This project looks at the problem of having a rig of one or more video cameras in a room, or around a place, and analyzing scenes with their input. The other issue is to use cameras in the visual band and to look at problems such as synchronizing different camera streams, since calibration parameters in different bands are quite different.

    • Collaborative Tracking

      Multi-view tracking is primarily geometry-based. The problem of multi-model tracking is also examined, wherein the tracker is assumed to have learnt a number of models of motion. At any stage, the tracker selects a model and follows it and, on failure, automatically switches to another model.

    • Multi-band visual surveillance

      Collaborative tracking forms the first part of a multi-modal camera system, wherein the object of interest is tracked across frames in a reliable and robust manner. This part of the project aims at using the synchronized input from different camera bands for visual surveillance.
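
The select, follow, and switch-on-failure behaviour described under Collaborative Tracking can be illustrated with a minimal Python sketch. It is not the project's tracker. The two motion models (`stationary`, `constant_velocity`), the Euclidean prediction error, and the failure `threshold` are all assumptions chosen for illustration, and the "learnt" models are hand-written here rather than learnt from data.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]
# A motion model maps the two most recent positions to a predicted next one.
MotionModel = Callable[[Point, Point], Point]


def stationary(prev: Point, curr: Point) -> Point:
    """Hypothetical model: the object stays where it is."""
    return curr


def constant_velocity(prev: Point, curr: Point) -> Point:
    """Hypothetical model: the object repeats its last displacement."""
    return (2 * curr[0] - prev[0], 2 * curr[1] - prev[1])


def prediction_error(predicted: Point, observed: Point) -> float:
    """Euclidean distance between predicted and observed positions."""
    dx = predicted[0] - observed[0]
    dy = predicted[1] - observed[1]
    return (dx * dx + dy * dy) ** 0.5


def track(observations: Sequence[Point],
          models: List[MotionModel],
          threshold: float = 1.0) -> List[int]:
    """Follow one model at a time and switch when it fails.

    Returns the index of the active model after each observation,
    starting from the third one (two earlier points are needed to predict).
    """
    active = 0  # assumed starting model: the first in the list
    history: List[int] = []
    triples = zip(observations, observations[1:], observations[2:])
    for prev, curr, observed in triples:
        predicted = models[active](prev, curr)
        if prediction_error(predicted, observed) > threshold:
            # Failure: re-score every model on this step and pick the best.
            errors = [prediction_error(m(prev, curr), observed) for m in models]
            active = min(range(len(models)), key=errors.__getitem__)
        history.append(active)
    return history


# Object rests at the origin, then moves 2 units per frame along x.
path = [(0, 0), (0, 0), (0, 0), (0, 0), (2, 0), (4, 0), (6, 0)]
model_history = track(path, [stationary, constant_velocity])
print(model_history)
```

On this synthetic path the sketch stays with the stationary model while the object rests and switches to the constant-velocity model one frame after motion begins, printing `[0, 0, 0, 1, 1]`. The one-frame lag arises because at the first moving frame both models predict the same position, so the tie is resolved in favour of the first model.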

Prof. Shantanu Chaudhary

Kavita, Ritu, Ruchi

Ministry of Communication and Information Technology

Participating Institutes
  • IIT-Delhi
  • CEERI Pilani
  • IIT Bombay
  • Jadavpur University
  • IIIT Hyderabad
  • CDAC-Kolkata