Access Type

WSU Access

Date of Award

January 2022

Degree Type

Dissertation

Degree Name

Ph.D.

Department

Electrical and Computer Engineering

First Advisor

Nabil J. Sarhan

Abstract

Despite their use in a wide range of applications, Computer Vision (CV) algorithms struggle to maintain consistent performance under varying image/video quality. This variation (commonly known as adaptation) is due to changes in resource availability. Furthermore, deep learning-based CV algorithms require a considerable amount of resources to perform complex tasks such as action recognition. In this dissertation, we study the impact of adaptation techniques on three vital CV algorithms, namely face detection, face recognition, and object detection (OD). The first two algorithms are used jointly to identify subjects in a scene, whereas OD is used in many applications such as autonomous driving, scene understanding, and video captioning. We characterize the relationship between various video adaptation parameters and CV accuracy by proposing three novel analytical models. The first two models, BRMODA and QRMODA, relate face detection and recognition accuracies to video bitrate/resolution and to quantization/resolution, respectively. We empirically validate both models using two video datasets and a large image dataset. We determine that both face detection and recognition accuracies can be expressed as the sum of two exponentials in terms of the video bitrate, with the resolution acting as a multiplicative factor of one exponential. Furthermore, the same accuracies can be expressed as a logistic function of the video quantization parameter, with the resolution representing the sigmoid’s midpoint. We use several metrics (Recall, Precision, and F1-Score) to represent the face detection and recognition accuracies. We also validate the models using both deep learning-based and statistical-based algorithms for face detection and recognition. Moreover, we propose a novel analytical model to express OD accuracy in terms of the video bitrate and resolution. We validate this model using over 3 million videos based on the YouTube-8M dataset.
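As a rough illustration of the two functional forms described above for face detection and recognition accuracy, the following Python sketch may help. It is not the dissertation's implementation: the function names, the coefficient names (`a`, `b`, `c`, `d`, `steepness`), and the assumption that the resolution enters as a plain scalar are all hypothetical, since the abstract gives the shape of the models but not their parameters.

```python
import math


def accuracy_vs_bitrate(bitrate, resolution_factor, a, b, c, d):
    """Sum of two exponentials in the bitrate.

    The resolution-dependent factor multiplies only the first exponential.
    All coefficients are hypothetical placeholders, not fitted values.
    """
    return resolution_factor * a * math.exp(b * bitrate) + c * math.exp(d * bitrate)


def accuracy_vs_qp(qp, midpoint, steepness, max_accuracy=1.0):
    """Logistic (sigmoid) function of the quantization parameter.

    `midpoint` stands in for the resolution-dependent midpoint; how it is
    derived from the resolution is not stated in the abstract.
    With a positive `steepness`, accuracy falls as the QP grows.
    """
    return max_accuracy / (1.0 + math.exp(steepness * (qp - midpoint)))
```

In this sketch a higher quantization parameter (coarser quantization) lowers the predicted accuracy, and the curve passes through half of `max_accuracy` exactly at the midpoint.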
Since these videos are captured in the wild by different sources and come in different resolutions/qualities, we develop an adaptation methodology that changes the video resolution without altering the original aspect ratio, hence preserving the shape of objects in the video. Additionally, we study the impacts of individual encoding parameters on OD accuracy using detected object types. The empirical results show that the model is accurate for a widely used deep learning-based OD algorithm. Furthermore, we propose a novel methodology for training action classification neural networks based on extracted holistic body landmarks. We show that the proposed method reduces the training time, and thus the resources required for action recognition, while simultaneously boosting accuracy. We utilize the proposed methodology to develop a Violence Recognition (VR) system as a case study for action recognition. We evaluate the performance of the proposed system by conducting extensive experiments on four violence video datasets. The empirical results show that the system surpasses the state-of-the-art accuracy on two benchmark datasets. The results also show that the proposed model size is 240% smaller than the smallest state-of-the-art model. Finally, we study the impact of various factors, such as environmental variables and system design choices, on VR accuracy.
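The aspect-ratio-preserving resolution change mentioned above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the dissertation's methodology: the function name `scale_to_height` is hypothetical, and rounding both dimensions up to even numbers is an added assumption (many video encoders expect even dimensions) that can shift the aspect ratio by a fraction of a pixel.

```python
def scale_to_height(width, height, target_height):
    """Return (new_width, new_height) for a target height, keeping the
    original aspect ratio as closely as integer dimensions allow.

    Hypothetical helper: both dimensions are rounded up to the nearest even
    number, which is an assumption and not something the abstract specifies.
    """
    if width <= 0 or height <= 0 or target_height <= 0:
        raise ValueError("all dimensions must be positive")
    # Scale the width by the same ratio applied to the height.
    new_width = round(width * target_height / height)
    # Bump odd values up by one so both dimensions are even.
    new_width += new_width % 2
    new_height = target_height + target_height % 2
    return new_width, new_height
```

For example, `scale_to_height(1920, 1080, 480)` returns `(854, 480)`, so a 16:9 source stays approximately 16:9 rather than being stretched to a fixed 4:3 frame.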
