File storage

Two different concepts are involved in storing files in InvenioRDM. One is the backend, meaning the actual technology used to store a file, for example the local file system or S3. The other is the origin, also known as the method, which describes how the files are transported. Three such methods are defined:

  • Local, which represents files that are managed by the InvenioRDM instance, independently of the backend.
  • Fetch, which represents files that are not yet managed by the instance because they need to be downloaded first. Once downloaded, they become local files.
  • Remote, which represents files referenced in an external storage system. Since these files are not managed by the instance, their availability and integrity cannot be guaranteed. At the moment this method is not supported by InvenioRDM.

The file storage origin/method is stored in the storage_class attribute of the file model and is represented by a one-character encoding:

Type    Representation
Local   L
Fetch   F
Remote  R
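
As a rough illustration, the encoding above could be modelled with a small Python enum. The class name StorageClass is hypothetical and is not taken from the InvenioRDM code base; only the three one-character values come from the table.

```python
from enum import Enum


class StorageClass(Enum):
    """Hypothetical helper mirroring the one-character encoding above."""

    LOCAL = "L"
    FETCH = "F"
    REMOTE = "R"  # defined, but not supported by InvenioRDM at the moment


# Look up a member from the value stored in the storage_class attribute.
origin = StorageClass("F")
print(origin.name)
```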

Local files (L)

Local files are managed as defined in the records and drafts reference section.

Files fetching (F)

Introduced in InvenioRDM v11

Experimental feature

The file fetching mechanism in InvenioRDM v11 has a few limitations. Be aware that future releases of InvenioRDM might introduce breaking changes. We will document them as extensively as possible.

Use it at your own risk!

Fetched files accept two more arguments than local files in their initialization: storage_class and uri.

Parameters

Name           Type    Location  Description
storage_class  string  body      "F"
uri            string  body      URL to fetch the file from

The uri must be a URL that is accessible from the server's network and resolves to a file that can be fetched. No authentication mechanism (e.g. an Authorization header) is supported for the fetch request, so any authentication has to be part of the URL itself (e.g. a token passed in the query string).
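
Given the parameters described above, a client might assemble the request body roughly as follows. This is a minimal sketch: the helper name build_fetch_entries is a hypothetical convenience function, not part of InvenioRDM, and only the key, uri and storage_class fields are taken from this page.

```python
import json


def build_fetch_entries(files):
    """Build the JSON body for POST /api/records/{id}/draft/files.

    `files` maps a file key to the URL it should be fetched from.
    (Hypothetical helper, not part of InvenioRDM.)
    """
    return [
        {"key": key, "uri": uri, "storage_class": "F"}
        for key, uri in files.items()
    ]


entries = build_fetch_entries(
    {"dataset.zip": "https://example.org/files/dataset.zip?token=<auth token>"}
)
# Serialize for use as the request body (Content-Type: application/json).
body = json.dumps(entries, indent=4)
print(body)
```

Because no Authorization header is sent when the file is fetched, any token has to be embedded in the uri value itself, as in the example entry.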

Request

POST /api/records/{id}/draft/files HTTP/1.1
Content-Type: application/json

[
    {
        "key": "dataset.zip",
        "uri": "https://example.org/files/dataset.zip?token=<auth token>",
        "storage_class": "F"
    },
    ...
]

Response

HTTP/1.1 201 CREATED
Content-Type: application/json

{
  "enabled": true,
  "default_preview": null,
  "order": [],
  "entries": [
    {
      "key": "dataset.zip",
      "updated": "2020-11-27 11:17:11.002624",
      "created": "2020-11-27 11:17:10.998919",
      "metadata": null,
      "status": "pending",
      "storage_class": "F",
      "uri": "https://example.org/files/dataset.zip?token=<auth token>",
      "links": {
        "content": "/api/records/{id}/draft/files/dataset.zip/content",
        "self": "/api/records/{id}/draft/files/dataset.zip",
        "commit": "/api/records/{id}/draft/files/dataset.zip/commit"
      }
    }
  ],
  "links": {
    "self": "/api/records/{id}/draft/files"
  }
}

At this point an asynchronous task is launched to transfer the file into the InvenioRDM instance. Once the transfer is complete, the status field changes to completed and the storage_class of the file changes to L. The status can be checked using the files URL (/api/records/{id}/draft/files). Note that the record cannot be published until all files have been transferred (i.e. their status is completed).

Moreover, while files are being transferred, requests to the content and commit endpoints are not allowed (disabled).
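
The publish precondition described above can be sketched as a small client-side check. The function names are hypothetical, and the listing returned by the files URL is assumed to have the same shape as the response shown earlier (an entries list whose items carry key and status).

```python
def all_transferred(listing):
    """Return True when every entry in a files listing has status 'completed'."""
    return all(entry.get("status") == "completed" for entry in listing["entries"])


def pending_keys(listing):
    """Return the keys of files that are still being transferred."""
    return [e["key"] for e in listing["entries"] if e.get("status") != "completed"]


# Example listing, shaped like the response of /api/records/{id}/draft/files.
listing = {
    "entries": [
        {"key": "dataset.zip", "status": "pending", "storage_class": "F"},
        {"key": "readme.txt", "status": "completed", "storage_class": "L"},
    ]
}
print(all_transferred(listing))  # False
print(pending_keys(listing))  # ['dataset.zip']
```

A client would typically re-request the files URL until all_transferred returns True before attempting to publish the record.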

Security

By default, file fetching is refused. Files can only be fetched from a list of trusted domains, which is configured in the invenio.cfg file:

RECORDS_RESOURCES_FILES_ALLOWED_DOMAINS = [
    "example.org",
    "mystoragehosting.com",
]
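
To illustrate the idea of a domain allow-list, the sketch below compares the host of a uri against such a list. It is not the actual InvenioRDM implementation, which may treat subdomains, ports or schemes differently; the function is_allowed and the exact-match rule are assumptions made for this example.

```python
from urllib.parse import urlparse

# Same values as the configuration example above.
ALLOWED_DOMAINS = ["example.org", "mystoragehosting.com"]


def is_allowed(uri, allowed_domains=ALLOWED_DOMAINS):
    """Return True if the host of `uri` exactly matches a trusted domain.

    Illustration only: the actual check in InvenioRDM may differ.
    """
    host = urlparse(uri).hostname
    return host is not None and host in allowed_domains


print(is_allowed("https://example.org/files/dataset.zip?token=abc"))
print(is_allowed("https://untrusted.example.com/dataset.zip"))
```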

Remote files (R)

Not supported

Remote files are currently not supported.