Scalable Video Data Management and Visual Querying System for Autonomous Camera Networks

Khanijo, Bharati

Abstract

Video data has historically been known for its unstructured nature, rich semantic content, and storage scalability issues. With advances in computer vision and Deep Neural Networks (DNNs), it is now possible to automatically extract rich semantic information from video data. This has increased interest in developing and exploring applications in which stored video data is used to observe and study the world retrospectively. However, recent research has highlighted the compute-intensive nature of such deep models (e.g., accurate object detection models); the resulting high cost limits their applicability when video data is analyzed naively for retrospective analysis. In addition, developing and efficiently implementing such applications often requires co-analyzing video data with its associated geospatial and temporal metadata, which the research community has acknowledged to be difficult because of the associated cognitive load. Drone cameras are increasingly used to capture video data because of their mobility and ease of deployment. These videos are complemented with temporally varying location and orientation metadata, which eases their exploration. Video query systems are required to allow intuitive querying over the geospatial, temporal and semantic information associated with such videos. In this thesis, we develop a geospatial-temporal video query system that supports semantic queries over drone videos by extending an existing spatial-temporal database and leveraging DNN models. Specifically, we include query abstractions over the level of detail of visual information captured in the videos, and propose simple heuristics for better reuse of semantic object detection results from different object detection model configurations.
The optimizations needed for such retrospective analysis motivate the proposed novel DDownscale method, along with an Ingest Pipeline, to efficiently acquire, store and query drone videos in a Video Repository. A key requirement for this video repository is to conserve storage space and compute time for semantic video queries. Reducing the resolution of videos reduces both the video size and the inferencing time during querying. Existing methods that reduce the resolution of video data for such optimizations often leverage the stationary spatial and temporal characteristics of videos from static cameras, which are absent in videos from mobile drone fleets. Drones fly at different altitudes, record videos from different viewpoints, and capture videos with varying levels of detail of visual information. In addition, drone videos are typically of short duration, unlike those captured by static cameras. We propose the novel DDownscale method to dynamically select the downscale factor for a video such that the level of detail required for effective object detection is not compromised. We model the relative recall drop caused by downscaling as a function of the object size in the downscaled video and the downscaling factor used. We observe that, for a given object detection DNN model and class of interest, our method generalizes well to the evaluated test datasets. Using this model, we derive the DDownscale inequality as a relation between the relative recall drop of the video and the hyperparameters of DDownscale. This relation is satisfied by ≈ 98% of the dynamically downscaled videos across different datasets.
For user-specified target reductions in recall ranging from 1% to 30%, the proposed DDownscale algorithm helps achieve a > 25% reduction in total object detection time and a > 31% reduction in storage on average, compared to the baseline of storing and evaluating the videos uniformly at the original resolution, with ≈ 96% of dynamically downscaled videos having a relative recall drop within the user-specified target. Additionally, we explore a simpler specification of the target level of detail; derive a relation between this specification and a statistic of the relative drop in recall of the smallest object of interest when detected by the selected model; and propose a drone video ingest pipeline that preprocesses videos on arrival from the drones, including dynamically downscaling them, before insertion into the video repository residing on a central server. The pipeline uses scheduling strategies over a cluster of heterogeneous edge accelerators to reduce the time to ingest the drone videos and make them quickly available for analysis. The pipeline reduces the average turnaround time for ingest by ≈ 66% despite the downscaling overhead, compared to uploading the original-resolution video without downscaling, for the evaluated workload and experimental setup.
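
As an illustration only, the idea of dynamically selecting a downscale factor under a user-specified recall-drop target might be sketched as follows. The abstract does not give the form of the fitted recall-drop model, so `hypothetical_recall_drop`, its constant `k`, the candidate factors, and the function `select_downscale_factor` are all assumptions made for this sketch and are not the thesis's DDownscale algorithm.

```python
from typing import Callable, Sequence


def hypothetical_recall_drop(object_size_px: float, factor: float, k: float = 10.0) -> float:
    """Placeholder model (an assumption, not the thesis's fitted model).

    Returns a relative recall drop in [0, 1] that grows as the object
    becomes smaller in the downscaled video and as the factor increases.
    """
    if factor <= 1.0:
        return 0.0  # no downscaling, so no drop
    downscaled_size = object_size_px / factor
    size_term = min(1.0, k / downscaled_size)
    factor_term = 1.0 - 1.0 / factor
    return size_term * factor_term


def select_downscale_factor(
    smallest_object_px: float,
    target_drop: float,
    candidates: Sequence[float],
    model: Callable[[float, float], float] = hypothetical_recall_drop,
) -> float:
    """Pick the largest candidate factor whose predicted drop stays within the target.

    Falls back to 1.0 (original resolution) if no candidate qualifies.
    """
    feasible = [f for f in candidates if model(smallest_object_px, f) <= target_drop]
    return max(feasible, default=1.0)


if __name__ == "__main__":
    factors = [1.0, 1.5, 2.0, 4.0]  # hypothetical candidate factors
    for target in (0.01, 0.10, 0.30):
        chosen = select_downscale_factor(80.0, target, factors)
        print(f"target drop {target:.2f} -> downscale factor {chosen}")
```

In this sketch a stricter target keeps the video closer to its original resolution, while a looser target permits a larger factor and therefore more storage and inference savings; any real use would substitute a recall-drop model fitted for the chosen detector and object class.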

URI

https://etd.iisc.ac.in/handle/2005/6612

Collections

Department of Computational and Data Sciences (CDS) [102]