Access Type

Open Access Dissertation

Date of Award

January 2017

Degree Type


Degree Name



Medical Physics

First Advisor

Michael G. Snyder


For a radiation oncology clinic, the number of devices available to assist in the workflow for radiotherapy treatments are quite numerous. Processes such as patient verification, motion management, or respiratory motion tracking can all be improved upon by devices currently on the market. These three specific processes can directly impact patient safety and treatment efficacy and, as such, are important to track and quantify. Most products available will only provide a solution for one of these processes and may be outside the reach of a typical radiation oncology clinic due to difficult implementation and incorporation with already existing hardware. This manuscript investigates the use of the Microsoft Kinect v2 sensor to provide solutions for all three processes all while maintaining a relatively simple and easy to use implementation.

To assist with patient verification, the Kinect system was programmed to create a facial recognition and recall process. The basis of the facial recognition algorithm was created by utilizing a facial mapping library distributed by Microsoft within the Software Developers Toolkit (SDK). Here, the system extracts 31 fiducial points representing various facial landmarks. 3D vectors are created between each of the 31 points and the magnitude of each vector is calculated by the system. This allows for a face to be defined as a collection of 465 specific vector magnitudes. The 465 vector magnitudes defining a face are then used in both the creation of a facial reference data set and subsequent evaluations of real-time sensor data in the matching algorithm. To test the algorithm, a database of 39 faces was created, each with 465 vectors derived from the fiducial points, and a one-to-one matching procedure was performed to obtain sensitivity and specificity data of the facial identification system.

In total, 5299 trials were performed and threshold parameters were created for match determination. Optimization of said parameters in the matching algorithm by way of ROC curves indicated the sensitivity of the system for was 96.5% and the specificity was 96.7%. These results indicate a fairly robust methodology for verifying, in real-time, a specific face through comparison from a pre-collected reference data set. In its current implementation, the process of data collection for each face and subsequent matching session averaged approximately 30 seconds, which may be too onerous to provide a realistic supplement to patient identification in a clinical setting. Despite the time commitment, the data collection process was well tolerated by all participants. It was found that ambient light played a crucial role in the accuracy and reproducibility of the facial recognition system. Testing with various light levels found that ambient light greater than 200 lux produced the most accurate results. As such, the acquisition process should be setup in such a way to ensure consistent ambient light conditions across both the reference recording session and subsequent real-time identification sessions.

In developing a motion management process with the Kinect, two separate, but complimentary processes were created. First, to track large scale anatomical movements, the automatic skeletal tracking capabilities of the Kinect were utilized. 25 specific body joints (head, elbow, knee, etc) make up the skeletal frame and are locked to relative positions on the body. Using code written in C#, these joints are tracked, in 3D space, and compared to an initial state of the patient allowing for an indication of anatomical motion. Additionally, to track smaller, more subtle movements on a specific area of the body, a user drawn ROI can be created. Here, the depth values of all pixels associated with the body in the ROI are compared to the initial state. The system counts the number of live pixels with a depth difference greater than a specified threshold compared to the initial state and the area of each of those pixels is calculated based on their depth. The percentage of area moved (PAM) compared to the ROI area then becomes an indication of gross movement within the ROI.

In this study, 9 specific joints proved to be stable during data acquisition. When moved in orthogonal directions, each coordinate recorded had a relatively linear trend of movement but not the expected 1:1 relationship to couch movement. Instead, calculation of the vector magnitude between the initial and current position proved a better indicator of movement. 5 of the 9 joints (Left/Right Elbow, Left/Right Hip, and Spine-Base) showed relatively consistent values for radial movements of 5mm and 10mm, achieving 20% - 25% coefficient of variation. For these 5 joints, this allowed for threshold values for calculated radial distances of 3mm and 7.5 mm to be set for 5mm and 10mm of actual movement, respectively. When monitoring a drawn ROI, it was found that the depth sensor had very little sensitivity of movement in the X (Left/Right) or Y (Superior/Inferior) direction, but exceptional sensitivity in the Z (Anterior/Posterior) direction. As such, the PAM values could only be coordinated with motion in the Z direction. PAM values of over 60% were shown to be indicative of movement in the Z direction equal to that of the threshold value set for movement as small as 3mm.

Lastly, the Kinect was utilized to create a marker-less, respiratory motion tracking system. Code was written to access the Kinect’s depth sensor and create a process to track the respiratory motion of a subject by recording the depth (distance) values obtained at several user selected points on the subject, with each point representing one pixel on the depth image. As a patient breathes, a specific anatomical point on the chest/abdomen will move slightly within the depth image across a number of pixels. By tracking how depth values change for a specific pixel, instead of how the anatomical point moves throughout the image, a respiratory trace can be obtained based on changing depth values of the selected pixel. Tracking of these values can then be implemented via marker-less setup. Varian’s RPM system and the Anzai belt system were used in tandem with the Kinect in order to compare respiratory traces obtained by each using two different subjects.

Analysis of the depth information from the Kinect for purposes of phase based and amplitude based binning proved to be correlated well with the RPM and Anzai systems. IQR values were obtained which compared times correlated with specific amplitude and phase percentage values against each product. The IQR spans of time indicated the Kinect would measure a specific percentage value within 0.077 s for Subject 1 and 0.164s for Subject 2 when compared to values obtained with RPM or Anzai. For 4D-CT scans, these times correlate to less than 1mm of couch movement and would create an offset of one half an acquired slice. These minimal deviations between the traces created by the Kinect and RPM or Anzai indicate that by tracking the depth values of user selected pixels within the depth image, rather than tracking specific anatomical locations, respiratory motion can be tracked and visualized utilizing the Kinect with results comparable to that of commercially available products.