




Appendix A

Real-time object recognition using local features on a DSP-based embedded system

Abstract

In the last few years, object recognition has become one of the most popular tasks in computer vision. In particular, this was driven by the development of powerful new algorithms for local appearance based object recognition. So-called smart cameras, with enough power for decentralized image processing, have become more and more popular for all kinds of tasks, especially in the field of surveillance. Recognition is a very important tool, as the robust recognition of suspicious vehicles, persons or objects is a matter of public safety. This makes the deployment of recognition capabilities on embedded platforms necessary. In our work we investigate the task of object recognition based on state-of-the-art algorithms in the context of a DSP-based embedded system. We implement several powerful algorithms for object recognition, namely an interest point detector together with a region descriptor, and build a medium-sized object database based on a vocabulary tree, which is suitable for our dedicated hardware setup. We carefully investigate the parameters of the algorithms with respect to the performance on the embedded platform. We show that state-of-the-art object recognition algorithms can be successfully deployed on today's smart cameras, even with strictly limited computational and memory resources.

Keywords: DSP; Object recognition; Local features; Vocabulary tree

1. Introduction

Object recognition is one of the most popular tasks in the field of computer vision. In the past decade, big efforts were made to build robust object recognition systems based on appearance features with local extent. For such a framework to be applicable in the real world, several attributes are very important: insensitivity to rotation, illumination or viewpoint changes, as well as real-time behavior and large-scale operation.
Current systems already have many of these properties and, though not all problems have been solved yet, they are becoming more and more attractive to industry for inclusion in products for the consumer market. In turn, embedded vision platforms such as smart cameras have recently emerged successfully, though they only offer a limited amount of computational and memory resources. Nevertheless, embedded vision systems are already present in our everyday life. Almost everyone's mobile phone is equipped with a camera and can thus be treated as a small embedded vision system. Clearly this gives rise to new applications, like navigation tools for visually impaired persons, or collaborative public monitoring using millions of artificial eyes. In addition, the low price of digital sensors and the increased need for security in public places have led to a tremendous growth in the number of cameras mounted for surveillance purposes. These cameras have to be small in size and have to process the huge amounts of available data on site. Furthermore, they have to perform dedicated operations automatically and without human interaction. Not only in the field of surveillance, but also in the areas of household robotics, entertainment, military and industrial robotics, embedded computer vision platforms are becoming more and more popular due to their robustness against environmental adversities. DSP-based embedded platforms are especially popular, as DSPs are powerful and cheap processors that are small in size and efficient in terms of power consumption. As DSPs offer maximum flexibility in the software to be run, compared to other embedded units like FPGAs, ASICs or GPUs, their current success is not surprising.

For the reasons already mentioned, recognition tasks are a very important area of research. However, in this respect some attributes of embedded platforms strictly limit the practicability of current state-of-the-art approaches.
For example, the amount of memory available on a device strictly limits the number of objects in the database. Therefore, for building an embedded object recognition system, one goal is to make the amount of data representing a single object as small as possible, in order to maximize the number of recognizable objects. Another important aspect is the real-time capability of these systems. Algorithms have to be fast enough to be operational in the real world. They have to be robust and user-friendly; otherwise, a product equipped with such functionality is simply unattractive to a potential customer. For example, in an interactive tour through a museum, object recognition on a mobile device has to be fast enough to allow for continuity in guidance. Formally speaking, we consider this to be an application requiring soft real-time system behavior. Clearly, this is just one example, and the exact meaning of the term real-time depends on the concrete application. We still consider an object recognition system to be real-time capable if it is able to deliver at least one result per second. This already suffices for many applications, like the interactive museum example introduced above. However, it is clear that this definition does not meet the needs of other applications, and that an improvement in throughput is needed for object recognition at frame rate, for instance in combination with object tracking. To summarize, building a full-featured recognition system on an embedded platform turns out to be a challenging problem, given all the different aspects and environmental restrictions to consider.

In this work, we describe a method to deploy a medium-sized object recognition system on a prototypical DSP-based embedded platform.
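The memory argument above can be made concrete with a back-of-the-envelope calculation. The numbers below (descriptor count per object, dimensionality, bytes per component, memory budget) are purely hypothetical illustrations of the trade-off, not figures from this paper:

```python
# Hypothetical budget sketch: how many objects fit into a fixed
# descriptor-database memory budget? All numbers are illustrative.

def max_objects(budget_bytes, descriptors_per_object, descriptor_dim,
                bytes_per_component):
    """Return how many objects a descriptor database of the given size can hold."""
    bytes_per_descriptor = descriptor_dim * bytes_per_component
    bytes_per_object = descriptors_per_object * bytes_per_descriptor
    return budget_bytes // bytes_per_object

# 8 MB budget, 200 descriptors per object, 36-D descriptors,
# 1 byte per quantized component:
print(max_objects(8 * 1024 * 1024, 200, 36, 1))  # 1165
# Same budget, but 4-byte float components:
print(max_objects(8 * 1024 * 1024, 200, 36, 4))  # 291
```

The comparison shows why compressing or quantizing descriptors directly multiplies the number of recognizable objects under a fixed memory budget.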
To the best of our knowledge, we are the first to extensively investigate issues related to object recognition in the context of embedded systems; to date, this is the only work studying the influence of various parameters on recognition performance and runtime behavior. We pick a set of high-level algorithms to describe objects by a set of appearance features. As a prototypical local feature based recognition system, we use difference of Gaussian (DoG) keypoints and principal component analysis scale invariant feature transform (PCA-SIFT) descriptors to build compact object representations. By arranging this information in a clever tree-like data structure based on k-means clustering, a so-called vocabulary tree, real-time behavior is achieved. By applying a dedicated compression mechanism, the size of the data structure can be traded off against recognition performance, and thereby the properties of a recognition system can be accurately tuned to a given hardware platform. As extensive evaluations show, by considering both the special properties of the algorithms and the dedicated advantages of the hardware, considerable gains in recognition performance and throughput can be achieved.

The remainder of this paper is structured as follows. In Sect. 2 we give an overview of developments in both areas that we bring together in our work: on the one hand, we list a number of references in the context of object recognition by computer vision; on the other hand, we cite a number of publications from the area of embedded smart sensors. A detailed description of the methods involved in building our object recognition algorithm is given in Sect. 3. In Sect. 4 we outline our framework and give details about the training and implementation of our system. We closely describe all steps in designing our approach and give side notes on alternative methods. In Sect. 5, we experimentally evaluate our system on a challenging object database and discuss real-time and real-world issues. Furthermore, we investigate some special features of our approach and elucidate the influence of several parameters on the overall system performance. The work concludes with some final notes and an outlook on future work in Sect. 6.

2. Related Work

In the following we give a short introduction to the topic of local feature based object recognition. Due to the huge amount of literature available, we focus on the most promising approaches using local features and refer to those algorithms which are related to our work. We also give a short overview of object recognition in the context of embedded systems which, due to the sparseness of existing approaches, covers both global and local methods, as well as algorithms implemented on FPGA- and DSP-based platforms.

Local appearance based visual object recognition became popular after the development of powerful interest region detectors and descriptors. Early full-featured object recognition systems dealing with all the individual algorithmic steps and their related problems were proposed by Schmid and Mohr, and by Schiele and Crowley. The main idea behind local feature based object recognition is maintaining object representations as collections of locally sampled descriptions. In other words, the appearance of local parts of a single object is encoded in descriptors, and a set of these descriptors forms the final object representation. For finding distinguishable regions, so-called interest region detectors are used, which find regions or points of special visual distinctiveness. The neighborhood of such regions is subsequently encoded using a special transform to build a description inherently providing several desirable properties.
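A recognition system built on such collections of locally sampled descriptors can be illustrated with a toy matching loop: each database object stores a bag of descriptors, every query descriptor votes for the object owning its nearest neighbor, and the most-voted object wins. This is a minimal sketch with made-up 2-D "descriptors" and brute-force search; real descriptors are high-dimensional (e.g. 128-D SIFT), and practical systems replace exhaustive search with approximate methods:

```python
import math

# Toy database: object id -> bag of descriptors (here just 2-D vectors).
database = {
    "cup":  [(0.1, 0.9), (0.2, 0.8), (0.9, 0.9)],
    "book": [(0.8, 0.1), (0.7, 0.2), (0.6, 0.1)],
}

def nearest_object(query, db):
    """Exhaustive nearest-neighbor search: return the object owning the
    descriptor closest to `query`."""
    best_obj, best_dist = None, math.inf
    for obj, bag in db.items():
        for d in bag:
            dist = math.dist(query, d)
            if dist < best_dist:
                best_obj, best_dist = obj, dist
    return best_obj

def recognize(query_bag, db):
    """Each query descriptor casts one vote; the most-voted object wins."""
    votes = {}
    for q in query_bag:
        obj = nearest_object(q, db)
        votes[obj] = votes.get(obj, 0) + 1
    return max(votes, key=votes.get)

# A query bag resembling the "cup" descriptors, with one outlier vote:
print(recognize([(0.15, 0.85), (0.22, 0.78), (0.75, 0.15)], database))  # cup
```

The voting step is what makes the representation robust to clutter and partial occlusion: a few wrong correspondences are simply outvoted by the correct ones.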
Besides insensitivity to illumination changes and partial viewpoint invariance, representations as sets of local descriptors offer robustness against background clutter and partial occlusions. Needless to say, such a so-called bag-of-descriptors representation can be built using a single detector and descriptor, or combinations of several different ones.

The collectivity of all descriptors from multiple objects (i.e., bags of descriptors) is used to build a database. Given this database and a new representation of an object to be recognized, correspondences are counted in a voting scheme to determine the correct match. Determining these correspondences is a complex task. Descriptors are high-dimensional feature vectors, and matching a query descriptor means determining its exact nearest neighbor in the database. Unfortunately, no known algorithm can determine the exact nearest neighbor of a point in a high-dimensional space more efficiently than exhaustive search. Due to the large number of objects, and correspondingly the large number of local descriptors, this type of information management is unwieldy and inefficient. Thus, a number of methods to approximate the solution in an efficient way have been proposed to keep the performance of an overall object recognition system manageable.

The basic principle of interest points and regions is the search for spots and areas in an image which exhibit a predefined property making them special in relation to their local neighborhood. This property should make the region distinguishable from its neighborhood and repeatably detectable. Furthermore, the detection of these features should be, as far as possible, illumination and viewpoint invariant.

The first important interest point detector, the so-called Harris corner detector, was proposed in 1988 by Harris and Stephens. It exhibits excellent repeatability and was subsequently used for object recognition purposes by Schmid and Mohr.
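The Harris and Stephens measure can be sketched as follows: sum gradient products over a local window into the structure matrix M = [[Sxx, Sxy], [Sxy, Syy]] and score each pixel with R = det(M) − k·trace(M)²; corners yield a large positive response, edges a negative one. The tiny synthetic image and the value k = 0.04 below are standard illustrative choices, not taken from this paper:

```python
# Minimal Harris corner response on a tiny synthetic image (pure Python).
# Structure matrix M = [[Sxx, Sxy], [Sxy, Syy]] summed over a window;
# response R = det(M) - k * trace(M)^2 (Harris & Stephens, 1988).

def harris_response(img, x, y, win=1, k=0.04):
    sxx = sxy = syy = 0.0
    for j in range(y - win, y + win + 1):
        for i in range(x - win, x + win + 1):
            # central-difference image gradients
            ix = (img[j][i + 1] - img[j][i - 1]) / 2.0
            iy = (img[j + 1][i] - img[j - 1][i]) / 2.0
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace

# A bright rectangle: its top-left corner is at (3, 3), and (5, 3) lies on
# the straight top edge.
img = [[1.0 if (3 <= x <= 8 and 3 <= y <= 6) else 0.0 for x in range(10)]
       for y in range(8)]
corner = harris_response(img, 3, 3)
edge = harris_response(img, 5, 3)
print(corner > 0 > edge)  # True: positive at corners, negative along edges
```

The sign behavior is the detector's key property: both eigenvalues of M are large only where the intensity varies in two directions at once, which is exactly what the corner point exhibits.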
An extension of the Harris detector to include scale information was later reported by Mikolajczyk and Schmid as the Harris-Laplace detector, and was used by Schaffalitzky and Zisserman for multi-view matching of unordered image sets. Another approach to detecting blob-like image structures is to search for points where the determinant of the Hessian matrix assumes a local extremum, which is called the Hessian detector. Further developments to include affine covariance resulted in the Harris-Affine and Hessian-Affine detectors proposed by Mikolajczyk and Schmid.

The currently most popular two-part approach, known as the scale invariant feature transform (SIFT), was proposed by Lowe; its first part is an interest point detector. The DoG detector takes differences of Gaussian-blurred images as an approximation of the scale-normalized Laplacian and uses local maxima of the responses in scale space as indicators of keypoints.

A complementary feature detector, the maximally stable extremal regions (MSER) detector, was proposed by Matas et al. In short, the MSER detector searches for regions which are brighter or darker than their surroundings, i.e., are surrounded by darker or, respectively, brighter pixels. First, pixels are sorted in ascending or descending order of their intensity value, depending on the region type to be detected. The pixel array is sequentially fed into a union-find algorithm and a tree-shaped data structure is maintained, where the nodes contain information about pixel neighborhoods as well as about intensity value relationships. Finally, nodes which satisfy a set of predefined criteria are sought by a tree-traversing algorithm.

Two affine covariant region detectors were proposed by Tuytelaars and Van Gool: intensity-based regions (IBR) and edge-based regions (EBR). IBRs are based on extrema in intensity. Given a local intensity extremum, the brightness function along rays emanating from the extremum is studied.
This function itself exhibits an extremum at locations where the image intensity suddenly changes. Linking all points of the emanating rays corresponding to this extremum forms an IBR. EBRs are determined from corner points and nearby edges. Given a single corner point and walking along the edges in opposite directions with two control points, a one-dimensional family of parallelograms is spanned by the corner itself and the vectors pointing from the corner to the control points. Studying a function of texture and using additional constraints, a single parallelogram is selected to be an EBR.

Another algorithm, termed the Salient Region detector, was proposed by Kadir et al. and is based on the probability density function (PDF) of intensity values computed over an elliptical region. For each pixel, the entropy extremum for an ellipse centered at this pixel is recorded over the ellipse parameters: orientation, scale and the ratio of major to minor axis. From a sorted list of all region candidates, the n most salient ones are chosen. For an extensive evaluation of a large number of affine region detectors, refer to the comparative study by Mikolajczyk et al.

Generally speaking, a descriptor is an abstract characterization of an image patch. Usually, the image patch is chosen to be the local environment of an interest region. Based on various algorithms or transformations, the resulting characterization can be made rotation invariant or, at least partially, insensitive to affine transformations. Most approaches are based on gradient calculations or image brightness values.

As the second part of the SIFT approach, Lowe proposed the use of descriptors based on stacked gradient histograms. The individual histograms are calculated on a subdivided patch and describe the gradient orientations, so as to capture spatial information. Finally, they are concatenated to form a 128-dimensional descriptor. More recently, Ke and Sukthankar proposed the so-called PCA-SIFT descriptor based on eigenspace analysis.
They calculated a principal component analysis (PCA) eigenspace on the gradient images of a representative set of over 20,000 image patches. The descriptor of a new image tile is generated by projecting the gradients of the tile onto the precalculated eigenspace, keeping only the d most significant eigenvectors. Thus, an efficient compression in descriptor dimensionality is achieved, while keeping the performance at a rate comparable to the original SIFT descriptor.

Closely related to the SIFT approach, the gradient location and orientation histogram (GLOH) descriptor was proposed by Mikolajczyk and Schmid. As opposed to SIFT, gradient histograms are calculated on a finer circular rather than a coarser rectangular grid, which results in a 272-dimensional histogram. PCA is subsequently used to reduce the descriptor dimensionality to 128 again.

Two rotation invariant descriptors were proposed by Lazebnik et al.: the rotation-invariant feature transform (RIFT) and the SPIN image descriptors. The RIFT descriptor is calculated on a circular normalized patch which is divided into concentric rings of equal width. Within each ring, the gradient orientation histogram is computed, with the gradient direction taken relative to the direction of the vector pointing outward from the center. The SPIN image is a two-dimensional histogram encoding the distribution of image brightness values in the neighborhood of a particular center point. The histogram has two dimensions, namely the distance from the center point and the intensity value.
Quantizing the distance, the value of a bin corresponds to the histogram of the intensity values of pixels located at a fixed distance from the center point.
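The vocabulary tree mentioned in Sect. 1 can be sketched as hierarchical k-means: descriptors are recursively clustered into k groups per level down to a fixed depth, and a query descriptor is quantized by greedily descending to the nearest cluster center at each level; the path taken is its visual word. The sketch below is a toy pure-Python version with 1-D "descriptors" and a fixed random seed, not the paper's DSP implementation:

```python
import math
import random

def kmeans(points, k, iters=10):
    """Plain k-means on 1-D points; returns k cluster centers."""
    centers = random.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: abs(p - centers[i]))].append(p)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

def build_tree(points, k, depth):
    """Hierarchical k-means: recursively split the points into k children."""
    if depth == 0 or len(points) < k:
        return {"leaf": True}
    centers = kmeans(points, k)
    groups = [[] for _ in range(k)]
    for p in points:
        groups[min(range(k), key=lambda i: abs(p - centers[i]))].append(p)
    return {"leaf": False, "centers": centers,
            "children": [build_tree(g, k, depth - 1) for g in groups]}

def quantize(tree, p, path=()):
    """Descend to the nearest center at each level; the path is the visual word."""
    if tree["leaf"]:
        return path
    i = min(range(len(tree["centers"])),
            key=lambda j: abs(p - tree["centers"][j]))
    return quantize(tree["children"][i], p, path + (i,))

random.seed(0)
# Toy 1-D descriptors drawn from two well-separated "objects":
descs = [random.gauss(0.0, 0.5) for _ in range(50)] + \
        [random.gauss(10.0, 0.5) for _ in range(50)]
tree = build_tree(descs, k=2, depth=2)
# Distant descriptors fall into different branches of the tree:
print(quantize(tree, 0.1) == quantize(tree, 10.1))  # False
```

The appeal of this structure on a memory-constrained platform is that quantization costs only k distance computations per level instead of a search over the whole descriptor database, and the tree's branching factor and depth give direct knobs for trading memory against recognition performance.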