Semantic Data Integration with Chimera

This half-day tutorial introduces participants to the practical challenges of achieving data interoperability across heterogeneous sources and to the advantages of an approach based on knowledge graphs [1]. Considering a practical scenario in the mobility domain (the integration of public transport data with open knowledge from Wikidata), participants will learn how knowledge graphs can support data harmonisation and fusion.

The session combines a conceptual introduction with a guided hands-on exercise using Chimera [2], an open-source framework for building declarative and composable semantic data transformation pipelines. Participants will design and execute a complete data integration pipeline — from ingestion of structured data to RDF lifting, SPARQL-based enrichment and construction, and RDF lowering — using only YAML route definitions and declarative mapping templates [3]. No programming experience is required.


Learning Outcomes

By the end of this tutorial, participants will be able to:


Running Example

To illustrate the pipeline stages, participants will work with a scenario involving the integration of public transport stop data (in GTFS format) with geographic and descriptive information retrieved from Wikidata. The resulting knowledge graph is visualised on an interactive online map that updates as data flows through the pipelines built by the participants.

Figure: Interactive map showing public transport stops enriched with Wikidata landmarks. An interactive dashboard fed by the Chimera pipeline that will be built during the hands-on session.

This scenario is representative of a broad class of integration problems encountered in domains such as smart cities, Industry 4.0, and health data management, where heterogeneous sources can be unified under a common semantic model [4,5].


Tutorial Structure

| Segment | Duration | Content |
| --- | --- | --- |
| Part 1: Data Interoperability Challenges | 30 min + [15 min hands-on] | Key challenges in heterogeneous data integration; limitations of ad-hoc approaches; knowledge graphs as a unifying model |
| Part 2: The Chimera Framework | 30 min + [15 min hands-on] | Architecture overview; the any-to-RDF-to-any pattern; Chimera framework and components; RDF Mapping Language (RML) vs Mapping Template Language (MTL) |
| Break | 30 min | |
| Part 3: Hands-on Session | 1 h 30 min | Guided pipeline construction: ingestion, lifting, SPARQL enrichment, construction, lowering, and visualisation |
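
The any-to-RDF-to-any pattern named in Part 2 can be illustrated with a minimal Python sketch. It is not Chimera code and does not use MTL or YAML routes: it only mimics the idea of lifting tabular records into triples and lowering them back to CSV, using the standard library. The `http://example.org/stop/` namespace, the choice of predicates, and the two sample stops are assumptions made for illustration; the column names follow the GTFS `stops.txt` file.

```python
import csv
import io

# Hypothetical namespace for stop identifiers (illustration only).
STOP_NS = "http://example.org/stop/"
# Well-known vocabulary terms, chosen here as an assumption about the target model.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
GEO_LAT = "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
GEO_LONG = "http://www.w3.org/2003/01/geo/wgs84_pos#long"

# Declarative mapping: GTFS column name -> predicate IRI.
MAPPING = {"stop_name": RDFS_LABEL, "stop_lat": GEO_LAT, "stop_lon": GEO_LONG}


def lift(csv_text):
    """Lift: turn each CSV row into (subject, predicate, object) triples."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = STOP_NS + row["stop_id"]
        for column, predicate in MAPPING.items():
            triples.append((subject, predicate, row[column]))
    return triples


def lower(triples):
    """Lower: regroup triples by subject and serialise them back to CSV."""
    column_of = {predicate: column for column, predicate in MAPPING.items()}
    rows = {}
    for subject, predicate, obj in triples:
        row = rows.setdefault(subject, {"stop_id": subject[len(STOP_NS):]})
        row[column_of[predicate]] = obj
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["stop_id", *MAPPING], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows.values())
    return out.getvalue()


# Invented sample data in the shape of a GTFS stops.txt file.
SAMPLE = (
    "stop_id,stop_name,stop_lat,stop_lon\n"
    "S1,Duomo,45.4641,9.1919\n"
    "S2,Cadorna,45.4680,9.1760\n"
)

triples = lift(SAMPLE)
for triple in triples:
    print(triple)
print(lower(triples))
```

Keeping the column-to-predicate mapping in a single dictionary means the same declaration drives both directions, which loosely mirrors why a declarative mapping is easier to maintain than ad-hoc conversion code.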

Pipeline Stages Covered

Participants will configure and run each of the following stages during the hands-on session:

| Stage | Description |
| --- | --- |
| Ingest | Read structured data files (CSV within ZIP archives) re-using the wide library of Apache Camel components within Chimera pipelines |
| Lift | Convert tabular records to RDF triples using MTL lifting templates |
| Enrich | Query a remote SPARQL endpoint (Wikidata) to retrieve additional structured information |
| Construct | Shape the knowledge graph using SPARQL CONSTRUCT queries |
| Lower | Serialise RDF back to a target format (CSV) using MTL lowering templates |
| Visualise | Observe pipeline output in a live interactive map interface |
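
As a rough idea of what the Enrich stage asks of Wikidata, the Python sketch below builds a SPARQL query for items located near a stop and the corresponding request URL for the public Wikidata Query Service. It is an illustration under assumptions, not the query used in the tutorial: the function names, the 0.3 km radius, the result limit, and the sample coordinates are all hypothetical. The query relies on the `wikibase:around` service and on the prefixes that the Wikidata endpoint predefines.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def nearby_items_query(lat, lon, radius_km=0.3, limit=5):
    """Build a SPARQL query for Wikidata items with coordinates near a point.

    WKT points are written as "Point(longitude latitude)".
    The radius and limit defaults are illustrative assumptions.
    """
    return f"""
SELECT ?item ?itemLabel ?location WHERE {{
  SERVICE wikibase:around {{
    ?item wdt:P625 ?location .
    bd:serviceParam wikibase:center "Point({lon} {lat})"^^geo:wktLiteral .
    bd:serviceParam wikibase:radius "{radius_km}" .
  }}
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}
LIMIT {limit}
""".strip()


def request_url(query):
    """Encode the query as a GET request URL asking for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


# Invented sample coordinates for a single stop.
query = nearby_items_query(45.4641, 9.1919)
print(query)
print(request_url(query))
```

The sketch stops before sending the request, so it runs offline; fetching the URL with any HTTP client would return the matching items as JSON.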

Prerequisites

Participants are expected to have:


Tutorial Materials

Slides and all required materials will be made available on this page before the start of the conference.


Presenters

Marco Grassi

Instructor. Knowledge Technologies Researcher, Cefriel

Marco Grassi's research focuses on semantic technologies and data interoperability. He is the lead developer of the Chimera framework and the principal author of its tutorial materials.

Mario Scrocca

Instructor. Senior Knowledge Technologies Researcher, Cefriel

Mario Scrocca's research interests include knowledge representation, data management, and semantic interoperability, with applications in mobility and industrial domains. He is a maintainer of the Chimera framework and has co-organised tutorials and courses on Knowledge Graph Construction topics.

Alessio Carenini

Organizer. Senior Researcher and Software Architect, Cefriel

Alessio Carenini has over 18 years of experience in European research projects, with a focus on the application of Semantic Web technologies to knowledge management in data-sharing ecosystems, including metadata modelling and data spaces.

Irene Celino

Organizer. Research Line Manager, Cefriel

Irene Celino coordinates research activities at Cefriel. Her interests span knowledge graphs, semantic interoperability, human-in-the-loop AI, and the human-centric evaluation of AI systems, with over 20 years of experience in cooperative research projects.


References

[1] Scrocca, M., Comerio, M., Carenini, A., Celino, I. Turning transport data to comply with EU standards while enabling a multimodal transport knowledge graph. In: Proceedings of the 19th International Semantic Web Conference (ISWC 2020). Lecture Notes in Computer Science, vol. 12507, pp. 411–429. Springer (2020).

[2] Grassi, M., Scrocca, M., Carenini, A., Comerio, M., Celino, I. Composable semantic data transformation pipelines with Chimera. In: Proceedings of the 4th International Workshop on Knowledge Graph Construction, co-located with ESWC 2023. CEUR Workshop Proceedings, vol. 3471. CEUR (May 2023).

[3] Scrocca, M., Carenini, A., Grassi, M., Comerio, M., Celino, I. Not everybody speaks RDF: Knowledge conversion between different data representations. In: Proceedings of the 5th International Workshop on Knowledge Graph Construction, co-located with ESWC 2024. CEUR Workshop Proceedings, vol. 3718. CEUR (May 2024).

[4] Scrocca, M., et al. Intelligent Urban Traffic Management via Semantic Interoperability Across Multiple Heterogeneous Mobility Data Sources. In: Proceedings of the 23rd International Semantic Web Conference (ISWC 2024). Springer Nature Switzerland, Cham (November 2024).

[5] Scrocca, M., Grassi, M., Carenini, A., Anicic, D., Calbimonte, J. P., Celino, I. A DataOps Toolbox Enabling Continuous Semantic Integration of Devices for Edge-Cloud AI Applications. In: Proceedings of the 24th International Semantic Web Conference (ISWC 2025). Springer Nature Switzerland, Cham (October 2025).


This work has been partially funded by the European Union's Horizon Europe research and innovation programme under grant agreement No. 101140087 (SMARTY, Chips Joint Undertaking).
This work has been partially funded by the European Union's Horizon Europe research and innovation programme under grant agreement No. 101092908 (SmartEdge).
This work has been partially funded by the European Union's Horizon Europe research and innovation programme under grant agreement No. 101239472 (UrbanFlow).