Access Type

WSU Access

Date of Award

January 2022

Degree Type

Dissertation

Degree Name

Ph.D.

Department

Computer Science

First Advisor

Zichun Zhong

Abstract

Nowadays, the ability to understand and analyze 3D data is becoming increasingly important in the computer vision and computer graphics communities. In this dissertation, we explore several interesting yet challenging cutting-edge problems at the intersection of vision, geometry, graphics, and machine learning, i.e., learning geometry-aware neural explicit and implicit 3D representations, to push the boundary of 3D deep learning performance.

We first review related work in the relevant application field, i.e., computer vision, covering topics such as 3D shape reconstruction, generation, and analysis. We then present the three major contributions of this dissertation, organized as follows.

Explicit 3D Representation: We propose a new approach to defining convolutions on point clouds. The proposed annular convolutions can define arbitrary kernel sizes on each local ring-shaped region and help capture better geometric representations of 3D shapes. We also propose a new multi-level hierarchical method based on dilated rings, which better captures and abstracts the geometric details of shapes. This new dilation strategy on point clouds benefits our proposed closed-loop convolutions and poolings. Our network models achieve new state-of-the-art performance on object classification, part segmentation, and semantic segmentation of large-scale scenes across a variety of standard benchmark datasets.

Joint Explicit 3D Representation: We propose a new joint latent space-mixer approach for learning high-quality multimodal representations and generative models of 3D objects. It provides an intrinsic and unified representation and correlation for cross-modality data by synergistically integrating and complementing the encoded features from 3D geometry and 2D content via the proposed inter-modality feature mapping and intra-modality feature consistency designs. We design a new geometry-aware autoencoder for 3D shapes, built on a full-resolution shape feature extractor and a multi-resolution geometric feature extractor, which enhances the geometric variability and scalability of the joint latent representation. Our network models achieve new state-of-the-art performance on shape (point cloud) autoencoding and on several novel 3D shape tasks, such as simultaneous multimodal (SMM) shape and color image generation and interpolation, and SMM semantic-aware generation (i.e., generation with part-level semantic annotations on shape and image), extending the capabilities of the corresponding single-modality, single-task methods.

Explicit-Guided Implicit 3D Representation: We propose a new self-supervised learning method that represents a complicated implicit surface as a new depth-aware occupancy function (DOF) and adopts an end-to-end differentiable surface rendering paradigm to train the neural DOF representation relying only on a single-view image with highly sparse depth information. The proposed surface-aware sampling, occupancy self-labeling, and differentiable surface rendering with inverse computation jointly optimize the implicit neural surface and appearance. Our approach not only achieves new state-of-the-art numerical performance, but also produces surface reconstructions with qualitatively better geometric details and more accurate textures, and it exhibits good generalizability and flexibility.
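To make the annular-convolution idea concrete, here is a minimal sketch, not the dissertation's implementation: neighbors of each point are grouped into a ring-shaped region bounded by an inner and an outer radius, and a shared point-wise MLP followed by max-pooling aggregates features over each ring. The radii, channel sizes, and the dense pairwise-distance formulation are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AnnularConv(nn.Module):
    """Sketch of a convolution over a ring-shaped (annular) neighborhood."""

    def __init__(self, in_ch, out_ch, r_inner=0.1, r_outer=0.2):
        super().__init__()
        self.r_inner, self.r_outer = r_inner, r_outer
        # Shared MLP applied to every neighbor falling inside the ring.
        self.mlp = nn.Sequential(
            nn.Linear(in_ch + 3, out_ch), nn.ReLU(),
            nn.Linear(out_ch, out_ch),
        )

    def forward(self, xyz, feats):
        # xyz: (N, 3) point coordinates; feats: (N, C) per-point features.
        dist = torch.cdist(xyz, xyz)                        # (N, N) pairwise distances
        ring = (dist > self.r_inner) & (dist <= self.r_outer)
        rel = xyz.unsqueeze(1) - xyz.unsqueeze(0)           # (N, N, 3) relative offsets
        nbr = torch.cat(
            [rel, feats.unsqueeze(0).expand(len(xyz), -1, -1)], dim=-1)
        h = self.mlp(nbr)                                   # (N, N, out_ch)
        h = h.masked_fill(~ring.unsqueeze(-1), float('-inf'))
        out = h.max(dim=1).values                           # max-pool over ring neighbors
        # Points whose ring is empty end up -inf; replace with zeros.
        return torch.where(torch.isfinite(out), out, torch.zeros_like(out))
```

Stacking several such layers with progressively larger (dilated) ring radii approximates, in spirit, the multi-level hierarchical scheme described above.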
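The joint latent space-mixer design can be summarized at a very high level by its training objective. The sketch below is an assumption-laden simplification: the encoder and decoder architectures are omitted, the losses are plain MSE/L1 stand-ins (a Chamfer distance would typically replace the point-cloud term), and the unweighted sum is illustrative.

```python
import torch.nn.functional as F

def joint_latent_losses(z_shape, z_image, shape_rec, shape_gt, img_rec, img_gt):
    # Inter-modality feature mapping: paired 3D/2D latents should coincide.
    inter = F.mse_loss(z_shape, z_image)
    # Intra-modality feature consistency: each decoder reconstructs its own
    # modality from the shared latent space.
    intra_shape = F.mse_loss(shape_rec, shape_gt)  # Chamfer loss in practice
    intra_image = F.l1_loss(img_rec, img_gt)
    return inter + intra_shape + intra_image
```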
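For the implicit part, the core of differentiable surface rendering over an occupancy field can be sketched as follows. This is a generic occupancy ray-marching sketch under stated assumptions (a hypothetical occ_net returning values in [0, 1], a fixed 0.5 level set, uniform samples along the ray), not the dissertation's exact depth-aware occupancy function or surface-aware sampling.

```python
import torch

def render_surface_depth(occ_net, ray_o, ray_d, t_near=0.1, t_far=4.0, n=64):
    # ray_o, ray_d: (B, 3) ray origins and unit directions.
    t = torch.linspace(t_near, t_far, n, device=ray_o.device)          # (n,)
    pts = ray_o[:, None, :] + t[None, :, None] * ray_d[:, None, :]     # (B, n, 3)
    occ = occ_net(pts).squeeze(-1)                                     # (B, n) in [0, 1]
    inside = occ > 0.5
    # Index of the first sample inside the surface along each ray
    # (rays that never enter any surface are ignored here for brevity).
    first = inside.float().argmax(dim=1).clamp(min=1)                  # (B,)
    idx = first[:, None]
    o1 = torch.gather(occ, 1, idx - 1).squeeze(1)   # last sample outside
    o2 = torch.gather(occ, 1, idx).squeeze(1)       # first sample inside
    # Linear root-finding between the two samples for the 0.5 crossing,
    # keeping the rendered depth differentiable w.r.t. the occupancy field.
    w = (0.5 - o1) / (o2 - o1 + 1e-8)
    return t[idx.squeeze(1) - 1] + w * (t[1] - t[0])
```

Occupancy self-labeling, in this simplified picture, would use each sparse depth observation as free supervision: samples in front of the observed depth are labeled empty (0) and samples just behind it occupied (1), which is what lets such a field train from highly sparse depth alone.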
