Summary
There is a growing need for accurate and scalable techniques for understanding, searching and retrieving satellite Earth Observation (EO) images from massive archives (e.g., the Copernicus archives). In the era of big data, however, the semantic content of satellite data is much more relevant than keywords or tags. To keep up with the growing need for automation, image search engines that extract and exploit the content of satellite images are necessary, drawing on cutting-edge advances in Natural Language Processing (NLP), Machine Learning (ML) and Computer Vision (CV) applied to EO challenges (NLP4EO). In other words, there is an emerging need to go beyond the traditional querying of EO data catalogues based on technical image metadata (location, time of acquisition, technical parameters) and to enrich the semantic content of image catalogues, enabling a brand new class of queries powered by the combination of NLP (to understand the query and to describe the content of the data) and CV (to massively annotate data and to implement multi-modal text-to-image and image-to-image searches). Such search engines with ‘query by content’ functionalities do not yet exist, either within the DIAS platforms or in other satellite EO data platforms.
Moreover, the added value of a Digital Assistant capable of understanding complex requests related to geospatial data searches could go well beyond expanding the dimensions available for querying EO data archives. It could also include advanced capabilities to understand and process a User request, select the most suitable workflow to satisfy it, autonomously execute processing on EO and non-EO data and, finally, answer the question initially posed by the User.
In this scenario, the development of a precursor demonstrator of a Digital Assistant will pursue the following high-level objectives:
- [OBJ-1] Explore innovative CV and NLP techniques for Content Based Image Retrieval (CBIR), taking into account both their level of maturity and their applicability to real EO Use Cases.
- [OBJ-2] Develop a prototype Digital Assistant that exploits currently available capabilities for massive processing of EO data, both to train the “Content-based” Query Engines and to implement the assistant itself, which lets Users interact with EO data by asking questions in natural language and starting a conversation with the Digital Assistant.
- [OBJ-3] Demonstrate the value of the Digital Assistant in real-life EO Use Cases, to make sure that the demonstrator Digital Assistant can have a positive impact on the community of users.
- [OBJ-4] Engage with the ML, NLP and CV communities, since the exchange of ideas between different fields of application of similar technologies is extremely important for accelerating the development of innovative solutions.