OAIS Reference Model
UIS RUSSIA Data Management
Because we harvest SIPs (Submission Information Packages) from data producers ourselves, we also handle metadata extraction.
There is a (very short) list of absolutely necessary metadata fields:
- Source name
- Publication date
- Original URL
While far from fully descriptive, this list nevertheless allows a publication to be
identified definitively. Additionally, some other metadata is extracted whenever possible:
- Journal number/part/volume/page
- Publication title
- Publication type
- Publication category
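The split between mandatory and optional publication metadata can be sketched as a small record type. This is only an illustration: the class name, field names and the `missing_mandatory` helper are hypothetical and are not taken from the actual system.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class PublicationMetadata:
    """Hypothetical record; field names are assumptions for illustration."""

    # Mandatory fields: together they identify a publication.
    source_name: str
    publication_date: Optional[date]
    original_url: str
    # Optional fields: filled in only when they can be extracted.
    journal_reference: Optional[str] = None  # number/part/volume/page
    title: Optional[str] = None
    publication_type: Optional[str] = None
    category: Optional[str] = None

    def missing_mandatory(self) -> List[str]:
        """Return the names of mandatory fields that are empty."""
        missing = []
        if not self.source_name.strip():
            missing.append("source_name")
        if self.publication_date is None:
            missing.append("publication_date")
        if not self.original_url.strip():
            missing.append("original_url")
        return missing
```

A harvested record would be accepted only when `missing_mandatory()` returns an empty list; the optional fields may stay unset.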
For tabular data there is an additional list of mandatory metadata to be extracted and processed:
- Indicator name
- Associated region(s)
- Measurement unit
- Comments (if present)
This metadata list is extended with any additional metadata we are able to harvest.
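The tabular metadata described above could be modelled along the following lines. This is a minimal sketch: the names `TabularMetadata`, `extra` and `is_complete` are hypothetical, and the free-form `extra` mapping is just one possible way to hold additional harvested metadata.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TabularMetadata:
    """Hypothetical record for one statistical indicator."""

    indicator_name: str
    regions: List[str]
    measurement_unit: str
    comments: Optional[str] = None  # kept only if present in the source
    # Any additional harvested metadata, stored as key/value pairs (assumption).
    extra: Dict[str, str] = field(default_factory=dict)


def is_complete(meta: TabularMetadata) -> bool:
    """Check that name, region(s) and measurement unit are all non-empty."""
    return (
        bool(meta.indicator_name.strip())
        and len(meta.regions) > 0
        and bool(meta.measurement_unit.strip())
    )
```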
Data storage details
The metadata is stored in a MySQL database.
Rosstat statistical publications are currently stored as a plain directory tree with index files.
By 01.09.2016 this storage will be modified to handle checksums: every file will be renamed to the SHA-256 checksum of its contents.
The logical structure of each Rosstat statistical report will be stored in MySQL and will serve as an additional layer of metadata.
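The checksum-based renaming planned for the Rosstat files might look roughly like the sketch below. The function names are hypothetical, and keeping the original file extension is an assumption made here for convenience, not something the plan specifies.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rename_to_checksum(path: Path) -> Path:
    """Rename a file to '<sha256><original extension>' in the same directory.

    Keeping the extension is an assumption of this sketch.
    """
    target = path.with_name(sha256_of_file(path) + path.suffix)
    if target != path:
        # An existing target necessarily has the same checksum, so replacing
        # it effectively deduplicates identical files.
        path.replace(target)
    return target
```

With names derived from content, a later integrity check only needs to recompute the digest and compare it with the file name.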
All other analytical publications are stored in MongoDB, which already incorporates file content checksums into its storage procedures.
By 31.12.2016 they will also be moved into the storage described above for consistency.