Data preparation in ASEP

  1. Preparation of data files
  2. Selection, size, file formats
  3. Documentation, license, access, classification
  4. Storage agreement
  5. Linking data and bibliographic records

1. Preparation of data files

The ASEP data repository stores metadata records that describe each deposited dataset. The dataset itself can be stored directly in the ASEP repository or in another storage facility that does not allow the stored data to be described.

When preparing files to be described and stored, you need to:

  • Decide which files will be stored,
  • Check whether the selected files can be stored,
  • Select the size and formats of the files,
  • Name the files appropriately,
  • Prepare a description or other documentation for users,
  • Select a license,
  • Select file access,
  • Prepare keywords and the OECD subject classification.

These instructions for preparing stored datasets may be revised based on practical experience. If you need help storing larger files, please contact the repository administrator at arl@lib.cas.cz.

2. Selection, size, file formats

File selection

Scientific research generates a large number of files; the ASEP repository is intended for storing the final ones. When stored files are changed, a new version of the record must be created.

Files may only be deleted in exceptional and justified cases.

Data files are stored in a zip archive format.

 

The size of the files must correspond to the capabilities of the ASEP repository.

  • The maximum size of a single data record in ASEP is 50 GB.
  • If you need to upload a single file larger than 50 GB or store a larger dataset, please contact arl@lib.cas.cz to discuss the options.
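
The packaging step and the size limit above can be sketched in Python. This is only a minimal illustration, not an ASEP tool: the names `build_archive` and `fits_in_record` are hypothetical, and reading 50 GB as 50 × 10^9 bytes is an assumption, since the text does not say whether decimal or binary gigabytes are meant.

```python
import zipfile
from pathlib import Path

# Assumption: "50 GB" is read as decimal gigabytes (50 * 10**9 bytes).
MAX_RECORD_BYTES = 50 * 10**9


def build_archive(files, archive_path):
    """Pack the given files into one zip archive and return its size in bytes."""
    archive_path = Path(archive_path)
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            f = Path(f)
            # Store each file under its bare name, without local directories.
            zf.write(f, arcname=f.name)
    return archive_path.stat().st_size


def fits_in_record(total_bytes, limit=MAX_RECORD_BYTES):
    """Return True if the combined size of a record's files is within the limit."""
    return total_bytes <= limit
```

If `fits_in_record` returns False for the files you plan to attach to one record, that is the case in which the text advises contacting arl@lib.cas.cz.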

 

Choosing a file format

When choosing the format of files stored in a zip archive, it is a good idea to ensure that the format you choose will be accessible in the future. If necessary, the same data can be stored in several formats. When selecting the file format for a dataset, we recommend following these general principles, which we will refine based on practical experience.

 

To ensure long-term access and usability of data in the repository, it is advisable to use standard formats suited to long-term preservation (LTP). Formats suitable for long-term preservation are primarily those that are open, well documented, and widely supported by software manufacturers, i.e., there are multiple programs from different manufacturers that are capable of opening and displaying the files. When selecting suitable formats for LTP, it is best to follow community recommendations on best practices and generally accepted standards.

 

File naming

Saved files should be named appropriately. The name should reflect the content of the saved files. File names must not contain diacritical marks, spaces, underscores as separators, or colons. The maximum length of a file name is 127 characters.
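
The naming rules above can be checked before upload with a short script. The following is a sketch under stated assumptions, not an official validator: the function name `file_name_problems` is hypothetical, the wording on underscores is taken literally (they are rejected by default), and the `allow_underscores` switch is an added assumption for cases where local practice differs.

```python
import unicodedata

MAX_NAME_LENGTH = 127  # maximum file name length stated in the text


def file_name_problems(name, allow_underscores=False):
    """Return a list of rule violations found in a file name (empty if none)."""
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append("longer than 127 characters")
    if " " in name:
        problems.append("contains a space")
    if ":" in name:
        problems.append("contains a colon")
    if "_" in name and not allow_underscores:
        problems.append("contains an underscore")
    # Decompose accented letters into base letter + combining mark;
    # any combining mark means the name contained a diacritic.
    # Letters without a decomposition (e.g. "ł") are not caught by this check.
    decomposed = unicodedata.normalize("NFD", name)
    if any(unicodedata.combining(ch) for ch in decomposed):
        problems.append("contains a diacritical mark")
    return problems
```

An empty list means the name passed all checks; the function says nothing about whether the name reflects the content of the file, which still has to be judged by the depositor.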

3. Documentation, license, access, classification

Documentation

It is important to describe the contents of the datasets clearly so that users who download the files for further use know what files the dataset contains and how they can work with them. If the description field in the data form is not sufficient, we recommend attaching additional "read_me" files (preferably in txt format) that explain what is stored in the individual data files and give any other important instructions for users. For data that is created, edited, or compressed using special software, the description should state which version of the software was used and which encoding, compression, etc., was applied, so that users can open and reuse the data easily.
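
A first draft of such a read_me file can be generated from the archive itself. The sketch below is only a starting point under assumptions: the function name `draft_read_me` and the layout of the text are hypothetical, and the per-file descriptions still have to be written by hand.

```python
import zipfile


def draft_read_me(archive_path, software_note=""):
    """Return read_me text listing each file in a zip archive with its size."""
    lines = ["Contents of the dataset archive:", ""]
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # list files only, not directory entries
            # Placeholder text marks where a human description is still needed.
            lines.append(
                f"{info.filename}\t{info.file_size} bytes\t<describe this file>"
            )
    if software_note:
        lines += ["", "Software, encoding, compression:", software_note]
    return "\n".join(lines) + "\n"
```

The returned text can be saved as a txt file and completed with the software version, encoding, and compression details mentioned above.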

 

Information about file formats can also be entered in the technical information field of the metadata form for the data record; see the field description.

 

License

For each dataset, the depositor must specify the license, i.e., the rules that apply to users who download the data and how they may use it. The predefined Creative Commons licenses can be used. If another license applies, its text must be saved with the dataset. Further information about licenses can be found here. If you need to add a special license, please contact arl@lib.cas.cz.

Files containing licenses, documentation, and other important information describing the files should be stored separately from the zipped data. Appropriate file naming will also help with clarity and should not be underestimated.

 

File access

  1. Open access: users download the files and store them on their computers, and can work with them according to the license published with the record.

  2. Open access with embargo on data files: the files are made publicly available only after a certain period, which can be set when uploading.

  3. Open access for an institute: only employees of the institute where the work was created have access to the files; other users can request the data.

  4. Not publicly accessible (on request): files that the user must request by filling out a form. The decision to make the files accessible is up to the author.

 

If the files are published under an open license (e.g., Creative Commons), the files must be publicly accessible.

 

Classification

For each dataset, it is mandatory to fill in keywords in English and field codes – OECD FORD categories.

4. Storage agreement

Before submitting data for publication, the author must accept the agreement on storing datasets in the data repository:

Agreement on storing datasets in the ASEP repository. If KNAV receives evidence of copyright infringement, the relevant item will be removed immediately.

 

Before publishing your prepared data files, make sure that:

  • You have all the necessary rights to make the data available.
  • You have the consent of co-authors and other data rights holders.
  • You have sufficiently anonymised your data or obtained the express consent of all subjects whose identity could be revealed from the data.

5. Linking data and bibliographic records

Bibliographic records in ASEP and data records in various repositories can be linked to each other.

Example: