Enhancing visual intelligence for smart cities

January 30, 2024

Raffaele Imbriaco defended his PhD thesis at the Department of Electrical Engineering on January 18th.

In the bustling landscape of modern cities, a silent revolution is underway, fueled by the omnipresence of sensors and the growing power of artificial intelligence. Imagine a city where cameras not only capture moments but also serve as guardians of safety, guides for navigation, and stewards of traffic flow. This vision is not mere futuristic fantasy; recent advances in visual intelligence are bringing it within reach. The PhD research of Raffaele Imbriaco sheds light on how vast datasets and cutting-edge algorithms can be harnessed to shape the urban environments of tomorrow.

With sensors becoming ubiquitous in urbanized regions, it is now possible to collect large datasets from a myriad of devices. Among these, cameras are arguably the most common. These image collections often include geographical data, enabling the development of systems for visual navigation, traffic management, and surveillance. Demand for such applications is expected to grow as cities become smarter in how they manage and use data. Artificial intelligence is a natural foundation for these city-scale systems. However, to enable such high-level applications, the underlying models must adequately represent the varied content captured in these extensive image collections. This is very challenging, because cities and urban regions change over time, as do the conditions under which the data are acquired. Automated systems should therefore be able to extract meaningful representations from the data, enabling the wide range of applications needed for Smart Cities and surveillance.

Enriching the capabilities of CBIR systems

In this thesis, several information selection techniques are explored for representation learning with images. What connects these techniques is the underlying system concept, image retrieval, which is used to recognize the content of images and rank them by similarity. The thesis can be broadly divided into two areas. The first area concentrates on improving the descriptiveness of a Content-Based Image Retrieval (CBIR) system. This is studied either through improved representation learning (e.g. via different techniques and convolutional architectures) or through enhanced re-ranking algorithms. The related contributions are applied to specific domains and tasks relevant to Smart Cities, namely geolocation, image matching on Remote Sensing datasets, and vehicle identification. The second area extends image retrieval systems by introducing additional input data modalities. The modalities studied are textual descriptions, used for text-to-image retrieval, and multiple labels, used for multi-label Remote Sensing image retrieval.
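
To make the basic idea of image retrieval concrete, the Python sketch below shows the generic ranking step of a CBIR system. It assumes that every image has already been encoded into a fixed-length feature vector (in practice, for example, by a convolutional network) and simply sorts the database images by cosine similarity to the query. The function names and toy vectors are hypothetical; this is a textbook illustration, not the models developed in the thesis.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_database(query, database):
    """Return database indices ordered from most to least similar to the query.

    Hypothetical helper: `query` and each entry of `database` are assumed to be
    precomputed image descriptors of equal length.
    """
    scores = [(cosine_similarity(query, d), i) for i, d in enumerate(database)]
    # Sort by descending similarity; ties are broken by the lower index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores]
```

For example, ranking the query descriptor [1.0, 0.0] against the toy descriptors [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]] returns [1, 2, 0]: the nearly parallel vector comes first and the orthogonal one last.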

Paving the path to urban intelligence

In conclusion, the contributions of this thesis present various improvements to CBIR systems used in the context of Smart Cities and surveillance. The first notable contribution is a flexible yet accurate feature extraction stage that exploits both local and global features, so that both complete images and image patches can be processed within one framework. A second contribution is the extension of CBIR systems to cope with modalities besides images, i.e. textual descriptions or multiple labels. This extension integrates well with the visual processing and offers significant performance gains, at the cost of additionally encoding the extra modality. A third contribution is the development of advanced post-processing algorithms that enhance the retrieval performance of a system. Overall, the contributions of this thesis not only improve the performance of CBIR systems, but also bring their implementation closer to Smart City applications and management, paving the way for further work in this promising field.
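
Post-processing in image retrieval often takes the form of re-ranking the initial result list. One widely used, generic example is average query expansion: the query descriptor is averaged with the descriptors of its top-k initial results, and the database is searched again with this expanded query. The Python sketch below assumes descriptors are L2-normalized before comparison and uses a hypothetical parameter k; it illustrates the general idea of post-processing and is not one of the algorithms proposed in the thesis.

```python
import math


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def rank(query, database):
    """Rank database indices by descending cosine similarity to the query."""
    q = normalize(query)
    scores = []
    for i, d in enumerate(database):
        dn = normalize(d)
        scores.append((sum(a * b for a, b in zip(q, dn)), i))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores]


def average_query_expansion(query, database, k=2):
    """Re-rank with a query averaged over itself and its top-k initial results.

    Hypothetical helper: k controls how many initial results are trusted.
    """
    initial = rank(query, database)
    vectors = [normalize(query)] + [normalize(database[i]) for i in initial[:k]]
    # Element-wise mean of the query and its top-k neighbours.
    expanded = [sum(col) / len(vectors) for col in zip(*vectors)]
    return rank(expanded, database)
```

Averaging with the top results pulls the query toward descriptors of other views of the same place or object, which is why query expansion is a common re-ranking baseline; its benefit depends on the top-k initial results actually being correct matches.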

Title of PhD thesis: Representation learning for street-view and aerial image retrieval. Supervisors: Prof. Peter de With and Dr. Egor Bondarau.

Media Contact

Rianne Sanders
(Communications Advisor ME/EE)