Region based convolutional neural networks for object detection



KAUL-THESIS-2017


REGION BASED CONVOLUTIONAL NEURAL NETWORKS FOR OBJECT DETECTION 
AND RECOGNITION IN ADAS APPLICATION 
by 
SACHIT KAUL 
Presented to the Faculty of the Graduate School of 
The University of Texas at Arlington in Partial Fulfillment 
of the Requirements 
for the Degree of 
MASTER OF SCIENCE IN MECHANICAL ENGINEERING 
THE UNIVERSITY OF TEXAS AT ARLINGTON 
December 2017 


Copyright © by Sachit Kaul 2017 
All Rights Reserved 


Acknowledgements 
First, I would like to thank Dr. Kamesh Subbarao for his guidance, constant 
support, and motivation throughout the development of my thesis. When we 
discussed the area in which I wanted to pursue my graduate thesis, it was he who 
introduced me to Object Detection and Recognition using Computer Vision 
and Deep Learning and generously agreed to be my faculty advisor.
Since then, he has been instrumental in developing my understanding of the field. 
He helped me build my research in a structured manner while giving me the freedom to 
innovate and pursue my own ideas.
I would also like to thank Dr. Ratan Kumar and Dr. Ashfaq Adnan for serving on 
my thesis committee and taking the time to give the input that made this work possible.
Second, I would like to thank my family and friends for always being there 
for me when I needed them and for believing in me whatever the circumstances. They have 
filled me with enthusiasm and have always encouraged me to pursue my dreams. It is 
their trust in me that has made me the person I am today. 
Last, I would like to thank MathWorks, Inc. for providing me with all the tools 
necessary for my thesis. It is their tools that have made my thesis possible. 
November 30, 2017 


Abstract 
REGION BASED CONVOLUTIONAL NEURAL NETWORKS FOR OBJECT DETECTION 
AND RECOGNITION IN ADAS APPLICATION 
Sachit Kaul, MS
The University of Texas at Arlington, 2017 
Supervising Professor: Kamesh Subbarao 
Object Detection and Recognition using Computer Vision has been an 
interesting and challenging field of study for the past three decades. Recent 
advances in Deep Learning, together with the increase in computational power, have 
reignited researchers' interest in this field over the last decade. 
Applying Machine Learning and Computer Vision techniques to scene 
classification and object localization, particularly for automated driving, has been 
a topic of discussion for the last half decade, and there have been notable advances 
recently as self-driving cars become a reality. In this thesis we focus on 
Region based Convolutional Neural Networks (R-CNN) for object recognition and 
localization to enable Advanced Driver Assistance Systems (ADAS). R-CNN combines 
two ideas: (1) one can apply high-capacity Convolutional Neural Networks (CNN) to bottom-up 
region proposals in order to localize and segment objects, and (2) when labelled training data is 
scarce, supervised pre-training for an auxiliary task, followed by domain-specific 
fine-tuning, boosts performance significantly. 
In this thesis, inspired by the R-CNN framework, we describe an object detection 
and segmentation system that uses a multilayer convolutional network to compute 
highly discriminative yet invariant features, classify image regions, and output those 
regions as detected bounding boxes. The system targets driving scenarios, detecting objects 
commonly found on the road such as traffic signs, cars, and pedestrians.
We also discuss different types of region based convolutional networks, namely 
R-CNN, Fast R-CNN, and Faster R-CNN, describe their architectures, and perform a time 
study to determine which of them achieves real-time object detection for a driving scenario 
when implemented on a regular PC architecture. 
Further, we discuss how such an R-CNN can be used to determine the distance of 
objects on the road, such as cars, traffic signs, and pedestrians, from a sensor (camera) 
mounted on the vehicle. This shows how Computer Vision and Machine Learning 
techniques are useful in automated braking systems (ABS) and in perception algorithms 
such as Simultaneous Localization and Mapping (SLAM).
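As a rough illustration of the distance estimation mentioned above, the following Python sketch applies the standard rectified-stereo relation Z = f * B / d to the disparities inside a detected bounding box. It is not the thesis implementation, which was built with MathWorks tools. The function names, the (x, y, width, height) box format, the use of the median disparity, and all numeric values are assumptions made for illustration.

```python
from statistics import median


def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Depth in metres for a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px


def object_distance(disparity_map, box, focal_length_px, baseline_m):
    """Estimate the distance to a detected object.

    disparity_map: list of rows of disparities in pixels (0 = no match).
    box: hypothetical detection as (x, y, width, height) in pixels.
    """
    x, y, w, h = box
    # Collect valid disparities inside the bounding box only.
    values = [d for row in disparity_map[y:y + h] for d in row[x:x + w] if d > 0]
    if not values:
        raise ValueError("no valid disparity inside the bounding box")
    # The median is one simple way to suppress outliers from background pixels.
    return depth_from_disparity(median(values), focal_length_px, baseline_m)


# Illustrative values only: 700 px focal length, 0.5 m baseline.
toy_disparity = [
    [0, 0, 0, 0],
    [0, 35, 35, 0],
    [0, 35, 36, 0],
    [0, 0, 0, 0],
]
distance_m = object_distance(toy_disparity, (1, 1, 2, 2), 700.0, 0.5)
```

With these toy numbers the median disparity is 35 px, so the estimated distance is 700 * 0.5 / 35 = 10 m. Because depth is inversely proportional to disparity, small disparity errors on distant objects translate into large distance errors.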


Table of Contents 
Acknowledgements ..... iii
Abstract ..... iv
List of Illustrations ..... viii
List of Tables ..... ix
Chapter 1: Introduction ..... 1
Chapter 2: Literature Review ..... 3
Chapter 3: Introduction to Convolutional Neural Networks ..... 7
3.1 Architecture ..... 7
3.1.1 Convolution Layer ..... 8
3.1.2 Pooling Layer ..... 10
3.1.3 Normalization Layer ..... 11
3.1.4 Fully-Connected Layer ..... 12
3.2 Famous Convolutional Network Architectures ..... 12
Chapter 4: Transfer Learning ..... 14
4.1 AlexNet Architecture ..... 14
4.1.1 ReLU Nonlinearity ..... 14
4.1.2 Training on Multiple GPUs ..... 16
4.1.3 Local Response Normalization ..... 16
4.1.4 Overlapping Pooling ..... 17
4.1.5 Overall Architecture ..... 17
4.2 Fine-tuning AlexNet for Road Objects Detection ..... 18
Chapter 5: Region Based Convolutional Networks for Object Detection ..... 20
5.1 R-CNN ..... 20
5.1.1 R-CNN Architecture ..... 21
5.1.2 Implementation of R-CNN for Road Object Detection ..... 22
5.1.3 Results of R-CNN ..... 22
5.1.4 Conclusion ..... 24
5.2 Fast R-CNN ..... 25
5.2.2 Implementation of Fast R-CNN ..... 26
5.2.3 Results Fast R-CNN ..... 26
5.2.4 Conclusion ..... 27
5.3 Faster R-CNN ..... 28
5.3.2 Implementation of Faster R-CNN ..... 29
5.3.3 Results Faster R-CNN ..... 30
5.3.4 Conclusion ..... 32
Chapter 6: Implementation of Faster R-CNN for Depth Estimation ..... 33
6.1 Introduction ..... 33
6.2 Stereo Vision ..... 33
6.3 Camera Calibration ..... 35
6.4 Disparity Mapping ..... 37
6.5 Object Detection ..... 38
6.6 3D Reconstruction And Depth Estimation ..... 38
6.7 Results of Depth Estimation ..... 40
Chapter 7: Conclusion ..... 41
Chapter 8: Future Scope ..... 42
References ..... 43
Appendix ..... 48
Biographical Information ..... 51


List of Illustrations 
Figure 3.1 Regular 3 Layer Neural Network Vs Convolutional Neural Network ..... 8
Figure 3.2 Representation of Layers of Convolutional Neural Network ..... 12
Figure 4.1 Training error rate vs Epochs ..... 15
Figure 4.2 AlexNet Final Architecture ..... 18
Figure 5.1 R-CNN Architecture ..... 21
Figure 5.2 Procedure for Object detection and Localization in R-CNN ..... 22
Figure 5.3 Time study ..... 24
Figure 5.4 Fast R-CNN Architecture ..... 25
Figure 5.5 Detection results ..... 27
Figure 5.6 Faster R-CNN Architecture ..... 28
Figure 5.7 Comparison of Detection Time Per Image for R-CNN Architectures ..... 30
Figure 5.8 Detection Results Faster R-CNN ..... 31
Figure 5.9 Mean Average Precision (mAP) of our Faster RCNN ..... 32
Figure 6.1 A Stereo Imaging System ..... 33
Figure 6.2 Depth Estimation Process Work Flow ..... 35
Figure 6.3 Examples of what you can do after calibrating the camera ..... 36
Figure 6.4 Intrinsic and Extrinsic parameters ..... 36
Figure 6.5 Camera Calibration Session in MATLAB R2017b ..... 37
Figure 6.6 Stereo Camera Geometry ..... 39
Figure 6.7 Results of Depth Estimation ..... 40


List of Tables 
Table 5.1 Time Study of R-CNN ..... 23
Table 5.2 Time Study of Fast R-CNN ..... 26



