Artificial Intelligence (AI) and Machine Learning (ML) algorithms have great potential to advance the processing and analysis of Earth Observation (EO) data. Training datasets (TDS) are crucial for AI and ML applications, but they are becoming a major bottleneck to the more widespread and systematic application of AI/ML in EO. The issues include:
- General lack and inaccessibility of high-quality TDS
- Absence of standards, resulting in inconsistent and heterogeneous TDS (data structures, file formats, quality control, metadata, repositories, licenses, etc.)
- Limited discoverability and interoperability of TDS
- Lack of best practices and guidelines for generating, structuring, describing and curating TDS
Another obstacle to the use of AI/ML in EO applications by non-EO experts is a lack of domain-specific knowledge of topics such as map projections, file formats, calibration and quality assurance. As such, AI-Ready EO Training Datasets (AIREO) should be self-explanatory, follow FAIR principles and be directly ingestible by AI/ML applications.
- Review current initiatives, activities, techniques, tools, practices and requirements for preparing, using and sharing AI-Ready EO Training Datasets
- Set up an AIREO network of stakeholders and practitioners in the AI/ML, EO and data science communities and from other relevant science disciplines
- Capture community requirements and develop: Specifications for AIREO datasets by leveraging existing formats and standards; Best-practice guidelines for preparing, using and sharing AIREO TDS; Pilot and benchmark AIREO datasets for selected use-case applications; A Python library, compatible with OGC web interface standards and RESTful APIs, for ingesting AIREO TDS into workflows; Jupyter notebooks showing the use of the AIREO pilot datasets and Python library
- AIREO specifications, best practices and datasets will: Meet FAIR (Findable, Accessible, Interoperable, Reusable) data principles; Involve and build on top of relevant community initiatives
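To make the idea of a self-explanatory, FAIR-oriented training dataset more concrete, the following is a minimal sketch of a metadata record and a completeness check. It is not the AIREO specification or the AIREO Python library: the class `TrainingDatasetRecord`, the function `missing_fair_fields`, all field names, and the example values (including the `example.org` URL) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field, asdict
import json


# Hypothetical record layout for illustration only; these field names are
# assumptions and are NOT taken from the AIREO specification.
@dataclass
class TrainingDatasetRecord:
    identifier: str       # Findable: a unique, persistent identifier
    title: str            # Findable: human-readable name
    access_url: str       # Accessible: where the data can be retrieved
    license: str          # Reusable: explicit terms of use
    file_format: str      # Interoperable: an open, documented format
    crs: str              # map projection, e.g. an EPSG code
    provenance: str = ""  # Reusable: how the data and labels were produced
    keywords: list = field(default_factory=list)


def missing_fair_fields(record):
    """Return the names of text fields that are empty or whitespace-only."""
    return [
        name
        for name, value in asdict(record).items()
        if isinstance(value, str) and not value.strip()
    ]


# Example values are invented placeholders.
record = TrainingDatasetRecord(
    identifier="example-tds-001",
    title="Example land-cover training set",
    access_url="https://example.org/tds/001",
    license="CC-BY-4.0",
    file_format="GeoTIFF",
    crs="EPSG:4326",
)

# Report which descriptive fields still need to be filled in.
print(missing_fair_fields(record))

# Serialise the record so it can travel alongside the data files.
print(json.dumps(asdict(record), indent=2))
```

Keeping the description in a small machine-readable record next to the data is one way to let non-EO users discover the projection, format and license without prior domain knowledge; an actual specification would likely build on existing community metadata standards rather than an ad hoc layout like this one.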