Semantic Data Integration with Chimera
This half-day tutorial introduces participants to the practical challenges of achieving data interoperability across heterogeneous sources and to the advantages of an approach based on knowledge graphs [1]. Considering a practical scenario in the mobility domain (the integration of public transport data with open knowledge from Wikidata), participants will learn how knowledge graphs can support data harmonisation and fusion.
The session combines a conceptual introduction with a guided hands-on exercise using Chimera [2], an open-source framework for building declarative and composable semantic data transformation pipelines. Participants will design and execute a complete data integration pipeline, from ingestion of structured data through RDF lifting and SPARQL-based enrichment and construction to RDF lowering, using only YAML route definitions and declarative mapping templates [3]. No programming experience is required.
Learning Outcomes
By the end of this tutorial, participants will be able to:
- Explain the any-to-RDF-to-any integration pattern and its role in enabling semantic interoperability
- Configure Apache Camel routes augmented with Chimera components for data transformation tasks
- Write lifting and lowering templates using the Mapping Template Language (MTL) to convert between arbitrary formats and RDF
- Apply semantic transformations within a pipeline to build and reshape knowledge graphs
- Integrate external data sources (e.g. Wikidata SPARQL endpoint) as enrichment sources in a declarative pipeline
- Deploy and run a complete end-to-end pipeline using the provided Docker environment
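The idea behind declarative lifting and lowering templates can be illustrated with a minimal Python sketch. This is not MTL syntax and does not use Chimera: the templates, the `example.org` namespace, and the `apply_template` helper are hypothetical, and Python's `string.Template` only stands in for a real template engine. The sketch shows how a mapping can be stated as a template with placeholders rather than as procedural code.

```python
from string import Template

# Hypothetical lifting template: one N-Triples statement per input record.
# The example.org namespace is an assumption made for illustration only.
LIFTING_TEMPLATE = Template(
    "<http://example.org/stop/$stop_id> "
    '<http://www.w3.org/2000/01/rdf-schema#label> "$stop_name" .'
)

# Hypothetical lowering template: one semicolon-separated line per record.
LOWERING_TEMPLATE = Template("$stop_id;$stop_name")


def apply_template(template, records):
    """Fill the template once per record and return the resulting lines.

    No escaping of literal values is performed, so this is only suitable
    for simple demonstration data.
    """
    return [template.substitute(record) for record in records]


records = [{"stop_id": "S1", "stop_name": "Central Station"}]
lifted = apply_template(LIFTING_TEMPLATE, records)
lowered = apply_template(LOWERING_TEMPLATE, records)
```

The same records feed both templates here; in an actual any-to-RDF-to-any pipeline the lowering step would instead read its values from the knowledge graph produced by lifting and enrichment.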
Running Example
To illustrate the pipeline stages, participants will work with a scenario involving the integration of public transport stop data (in GTFS format) with geographic and descriptive information retrieved from Wikidata. The resulting knowledge graph is visualised on an interactive online map that updates as data flows through the pipelines built by the participants.

Figure: An interactive dashboard fed by the Chimera pipeline that will be built during the hands-on session.
This scenario is representative of a broad class of integration problems encountered in domains such as smart cities, Industry 4.0, and health data management, where heterogeneous sources can be unified under a common semantic model [4,5].
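The enrichment step of this scenario relies on the public Wikidata SPARQL endpoint. The following Python sketch shows what such a request could look like, under the assumption that a stop is matched by its exact label; the tutorial's actual matching strategy is not specified here, and the function names are hypothetical. `wdt:P625` is Wikidata's coordinate location property. The code only builds the request and does not send it.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def build_enrichment_query(label, language="en", limit=5):
    """Build a SPARQL query for items with the given label and coordinates.

    Matching by exact label is an assumption made for this sketch.
    """
    # Escape backslashes and double quotes so the label is a valid literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "PREFIX wdt: <http://www.wikidata.org/prop/direct/>\n"
        "SELECT ?item ?coord WHERE {\n"
        f'  ?item rdfs:label "{escaped}"@{language} ;\n'
        "        wdt:P625 ?coord .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(query):
    """Return a GET URL asking the endpoint for JSON results."""
    return WIKIDATA_SPARQL + "?" + urlencode({"query": query, "format": "json"})


query = build_enrichment_query("Milano Centrale railway station")
url = build_request_url(query)
```

In a declarative pipeline the query would be a configuration artefact rather than Python code; the sketch only makes the shape of the request concrete.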
Tutorial Structure
| Segment | Duration | Content |
|---|---|---|
| Part 1: Data Interoperability Challenges | 30 min + 15 min hands-on | Key challenges in heterogeneous data integration; limitations of ad-hoc approaches; knowledge graphs as a unifying model |
| Part 2: The Chimera Framework | 30 min + 15 min hands-on | Architecture overview; the any-to-RDF-to-any pattern; Chimera framework and components; RDF Mapping Language (RML) vs Mapping Template Language (MTL) |
| Break | 30 min | |
| Part 3: Hands-on Session | 1 h 30 min | Guided pipeline construction: ingestion, lifting, SPARQL enrichment, construction, lowering, and visualisation |
Pipeline Stages Covered
Participants will configure and run each of the following stages during the hands-on session:
| Stage | Description |
|---|---|
| Ingest | Read structured data files (CSV within ZIP archives), reusing the extensive Apache Camel component library within Chimera pipelines |
| Lift | Convert tabular records to RDF triples using MTL lifting templates |
| Enrich | Query a remote SPARQL endpoint (Wikidata) to retrieve additional structured information |
| Construct | Shape the knowledge graph using SPARQL CONSTRUCT queries |
| Lower | Serialise RDF back to a target format (CSV) using MTL lowering templates |
| Visualise | Observe pipeline output in a live interactive map interface |
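How these stages chain together can be sketched in plain Python. This is a conceptual illustration only: in the tutorial the stages are configured declaratively with Chimera, not coded by hand. The function names, the `example.org` namespace, and the sample data are hypothetical; `stops.txt` with its `stop_id`, `stop_name`, `stop_lat`, and `stop_lon` columns is the standard GTFS stops file. The Enrich stage is left out because it needs network access, and the Construct stage is a plain function standing in for a SPARQL CONSTRUCT query.

```python
import csv
import io
import zipfile

EX = "http://example.org/"  # hypothetical namespace, for illustration only
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def ingest(zip_bytes):
    """Ingest: read stops.txt from a ZIP archive into a list of dicts."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        with archive.open("stops.txt") as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))


def lift(rows):
    """Lift: turn each row into (subject, predicate, object) triples."""
    return [
        (EX + "stop/" + row["stop_id"], EX + key, row[key])
        for row in rows
        for key in ("stop_name", "stop_lat", "stop_lon")
    ]


def construct(triples):
    """Construct: reshape the graph, here by exposing names as rdfs:label."""
    return [(s, RDFS_LABEL, o) for s, p, o in triples if p == EX + "stop_name"]


def lower(triples):
    """Lower: serialise the reshaped triples back to CSV text."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["subject", "label"])
    writer.writerows((s, o) for s, _, o in triples)
    return out.getvalue()


def run(zip_bytes):
    """Pass the data through every stage in order."""
    data = zip_bytes
    for stage in (ingest, lift, construct, lower):
        data = stage(data)
    return data


def make_sample_zip():
    """Build a tiny in-memory GTFS-like archive so the sketch is runnable."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "stops.txt",
            "stop_id,stop_name,stop_lat,stop_lon\nS1,Central Station,45.48,9.20\n",
        )
    return buffer.getvalue()


result = run(make_sample_zip())
```

Because each stage only consumes the output of the previous one, stages can be swapped or extended independently, which is the property a composable pipeline aims for.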
Prerequisites
Participants are expected to have:
- A laptop with Docker installed (see "How to install Docker"). All software dependencies are provided as container images, so no local JDK or Python installation is required
- Basic familiarity with structured data formats (CSV, JSON)
- Basic knowledge of RDF and the Semantic Web stack (recommended)
Tutorial Materials
Slides and all required materials will be made available on this page before the start of the conference.
- Slides: To be published
- Docker image & setup instructions: To be published
- Chimera repository: https://github.com/cefriel/chimera
Presenters
Marco Grassi
Knowledge Technologies Researcher, Cefriel (Instructor)
Marco Grassi's research focuses on semantic technologies and data interoperability. He is the lead developer of the Chimera framework and the principal author of its tutorial materials.
Mario Scrocca
Senior Knowledge Technologies Researcher, Cefriel (Instructor)
Mario Scrocca's research interests include knowledge representation, data management, and semantic interoperability, with applications in mobility and industrial domains. He is a maintainer of the Chimera framework and has co-organised tutorials and courses on Knowledge Graph Construction topics.
Alessio Carenini
Senior Researcher and Software Architect, Cefriel (Organizer)
Alessio Carenini has over 18 years of experience in European research projects, with a focus on the application of Semantic Web technologies to knowledge management in data-sharing ecosystems, including metadata modelling and data spaces.
Irene Celino
Research Line Manager, Cefriel (Organizer)
Irene Celino coordinates research activities at Cefriel. Her interests span knowledge graphs, semantic interoperability, human-in-the-loop AI, and the human-centric evaluation of AI systems, with over 20 years of experience in cooperative research projects.
References
[1] Scrocca, M., Comerio, M., Carenini, A., Celino, I. Turning transport data to comply with EU standards while enabling a multimodal transport knowledge graph. In: Proceedings of the 19th International Semantic Web Conference (ISWC 2020). Lecture Notes in Computer Science, vol. 12507, pp. 411–429. Springer (2020).
[2] Grassi, M., Scrocca, M., Carenini, A., Comerio, M., Celino, I. Composable semantic data transformation pipelines with Chimera. In: Proceedings of the 4th International Workshop on Knowledge Graph Construction, co-located with ESWC 2023. CEUR Workshop Proceedings, vol. 3471. CEUR (May 2023).
[3] Scrocca, M., Carenini, A., Grassi, M., Comerio, M., Celino, I. Not everybody speaks RDF: Knowledge conversion between different data representations. In: Proceedings of the 5th International Workshop on Knowledge Graph Construction, co-located with ESWC 2024. CEUR Workshop Proceedings, vol. 3718. CEUR (May 2024).
[4] Scrocca, M., et al. Intelligent Urban Traffic Management via Semantic Interoperability Across Multiple Heterogeneous Mobility Data Sources. In: Proceedings of the 23rd International Semantic Web Conference (ISWC 2024). Springer Nature Switzerland, Cham (November 2024).
[5] Scrocca, M., Grassi, M., Carenini, A., Anicic, D., Calbimonte, J. P., Celino, I. A DataOps Toolbox Enabling Continuous Semantic Integration of Devices for Edge-Cloud AI Applications. In: Proceedings of the 24th International Semantic Web Conference (ISWC 2025). Springer Nature Switzerland, Cham (October 2025).
This work has been partially funded by the European Union's Horizon Europe research and innovation programme under grant agreements No. 101140087, No. 101092908, and No. 101239472.