OAIS Reference Model

UIS RUSSIA Data Management

Metadata Harvesting

Since we have to harvest SIPs (Submission Information Packages) from data producers ourselves, we also handle metadata extraction. There is a (very short) list of absolutely necessary metadata fields:
While far from fully descriptive, this list nevertheless allows a publication to be identified unambiguously. Additionally, some other metadata is extracted whenever possible:
For tabular data there is an additional list of mandatory metadata to be extracted and processed:
This metadata list is extended with any additional metadata we are able to harvest.

Data storage details

The metadata is stored in a MySQL database.
Rosstat statistical publications are currently stored as a plain directory tree with index files. By 01.09.2016 this storage will be modified to handle checksums: every file will be renamed to the SHA-256 checksum of its contents. The logical structure of each Rosstat statistical report will be stored in MySQL and will serve as an additional layer of metadata.
All other analytical publications are stored in MongoDB, whose storage procedures already incorporate checksums of file contents. By 31.12.2016 they will also be moved into the above-mentioned checksum-based storage for consistency.
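
The checksum-based renaming planned for the Rosstat storage could look roughly like the sketch below. It is only an illustration under stated assumptions, not the actual migration procedure: the function names (sha256_of_file, rename_to_checksum, verify) are hypothetical, and keeping the original file extension after the checksum is an assumption made here for readability.

```python
import hashlib
import os
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rename_to_checksum(path: Path) -> Path:
    """Rename a file to '<sha256><original extension>' in the same directory.

    Assumption: the extension is preserved. Files with identical contents
    map to the same name, so a later duplicate replaces the earlier copy.
    """
    target = path.with_name(sha256_of_file(path) + path.suffix)
    if target != path:
        os.replace(path, target)
    return target


def verify(path: Path) -> bool:
    """Check that a stored file's name still matches the checksum of its contents."""
    return path.stem == sha256_of_file(path)
```

With such a layout, a fixity check amounts to calling verify on every stored file, while the original file names and the report structure would have to live in the metadata layer, since the names on disk no longer carry them.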