Creating lasting data archives is more pressing than ever. That is the core message of the European Commission’s eArchiving initiative, which has just unveiled version 2.0 of its architecture and secured another two years of funding.
The initiative aims to define clear processes based on open formats and metadata, so that organizations do not have to keep outdated IT equipment running just to access old data. Gregor Završnik, a researcher from the University of Ljubljana and consultant on geospatial data archiving, described the challenges of retrieving historical data. “You need to access the storage media and read the file format,” he explained. “But the real issue comes after you extract data, like from an Excel table. You often lack the context. You’re left wondering what those numbers actually mean, how they were collected, and whether they are authentic.”
eArchiving builds on the E-Ark project, which was launched in 2014 and focuses on creating sustainable tools for validating, reformatting, and archiving data. The main goal? To ensure archives can communicate with one another through common encoding while meeting regulatory requirements.
Initially, the E-Ark team envisioned a universal archiving format. As the project evolved, they realized most archives are maintained by the original data creators, who often believe their data will hold commercial value in the future. “We need to create a standard that enables companies to restore their archives even years later,” Završnik noted.
However, a significant hurdle has been bringing the major storage and backup vendors on board; so far the initiative consists mostly of research teams. Now, for E-Ark to transition into eArchiving under the European Commission, the technical work needs to become a widely accepted market standard. A crucial step is ensuring that the universal archive format aligns with the latest revision of ISO 14721, the reference model for an Open Archival Information System (OAIS).
“Even if the Commission mandates that public sector entities in the EU adopt our archive format, it can’t compel private companies to follow suit,” Završnik pointed out. “But it can highlight the benefits of using an open format, including escaping the trap of commercial tools and facilitating easier data exchange.”
The eArchiving initiative proposes the Common Specification for Information Packages (CSIP) as its file format. The format is open to anyone who wants to convert data into a long-lasting archive and to software developers who want to implement it. “CSIP is free from commercial licensing. It’s structured for easy re-reading and can be used across various software platforms. Each archive gets a unique numeric ID, and it can define dependencies linked to other data,” Završnik explained.
He compared these data dependencies to Linux packages or software that requires third-party libraries to work. A land registry archive, for example, may need the map data held in another archive.
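The package-style dependencies described above can be sketched as a small dependency graph. This is only an illustration, not CSIP's actual mechanism: the numeric IDs, the `dependencies` mapping, and the `restore_order` helper are all hypothetical, and the ordering relies on Python's standard `graphlib` module (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical archive IDs mapped to the IDs of the archives they depend on.
# The IDs and the land registry scenario are illustrative assumptions only.
dependencies = {
    1001: {2002},  # land registry archive needs the map archive
    2002: set(),   # map archive has no dependencies
    3003: {1001},  # some further archive that refers to the land registry
}

def restore_order(deps):
    """Return archive IDs so that every dependency precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())

order = restore_order(dependencies)
print(order)  # [2002, 1001, 3003]
```

Restoring archives in this order means the map data is already available when the land registry archive is opened, much as a package manager installs libraries before the software that needs them.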
CSIP is used within OAIS (Open Archival Information System), the reference model defined by ISO 14721. The model distinguishes three package types: the SIP (Submission Information Package) for processing source data, the AIP (Archival Information Package) for preserving it after reformatting, and the DIP (Dissemination Information Package) for distributing only the data needed for a specific use.
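The flow from submission to preservation to dissemination can be sketched as follows. This is a minimal sketch under stated assumptions: the `InformationPackage` class and the `ingest` and `disseminate` functions are hypothetical names, and lowercasing file names is only a stand-in for real reformatting.

```python
from dataclasses import dataclass, field

@dataclass
class InformationPackage:
    package_id: int   # hypothetical numeric archive ID
    kind: str         # "SIP", "AIP" or "DIP"
    files: dict       # file name -> content
    metadata: dict = field(default_factory=dict)

def ingest(sip):
    """Turn a submission package into an archival package."""
    if sip.kind != "SIP":
        raise ValueError("ingest expects a SIP")
    # Stand-in for reformatting: normalise file names to lower case.
    files = {name.lower(): content for name, content in sip.files.items()}
    return InformationPackage(sip.package_id, "AIP", files, dict(sip.metadata))

def disseminate(aip, wanted):
    """Build a dissemination package holding only the requested files."""
    if aip.kind != "AIP":
        raise ValueError("disseminate expects an AIP")
    files = {name: content for name, content in aip.files.items() if name in wanted}
    return InformationPackage(aip.package_id, "DIP", files, dict(aip.metadata))

sip = InformationPackage(
    42, "SIP",
    {"Parcels.CSV": "id,owner\n1,A", "Notes.TXT": "draft"},
    {"creator": "example office"},
)
aip = ingest(sip)
dip = disseminate(aip, {"parcels.csv"})
print(dip.kind, sorted(dip.files))  # DIP ['parcels.csv']
```

The archival package keeps everything that was submitted, while the dissemination package is a filtered copy, so a consumer receives only the data relevant to their use.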
Each package type carries its own metadata. For instance, a DIP contains metadata for different fields, such as medical, commercial, architectural, or cartographic. The latest update, version 2.0, refines the metadata and sorts it into six groups: strategy, business, application, technology, implementation, and migration. Each group is further divided into passive structure, behavior, active structure, and motivation.
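The two-level categorization can be pictured as a grid of six groups by four aspects. The sketch below simply enumerates that grid; the `GROUPS` and `ASPECTS` lists come from the text, while the `tag` helper and its `group/aspect` label format are assumptions made for illustration.

```python
from itertools import product

GROUPS = ["strategy", "business", "application",
          "technology", "implementation", "migration"]
ASPECTS = ["passive structure", "behavior", "active structure", "motivation"]

# Every (group, aspect) pair is one possible metadata category.
categories = list(product(GROUPS, ASPECTS))

def tag(group, aspect):
    """Return a label for a metadata category, rejecting unknown pairs."""
    if (group, aspect) not in categories:
        raise ValueError(f"unknown category: {group}/{aspect}")
    return f"{group}/{aspect}"

print(len(categories))              # 24
print(tag("technology", "behavior"))  # technology/behavior
```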