File storage¶
Two different concepts are involved in storing files in InvenioRDM. One is the backend, meaning the actual technology used to store a file, for example the local file system or S3. The other is the origin, also known as the method used to transport the files. Three such methods are defined:
- Local, which represents the files that are managed by the InvenioRDM instance, independently of the backend.
- Fetch, which represents files that are not immediately managed by the instance because they need to be downloaded first. Once downloaded, they become local files.
- Remote, which represents files referenced in an external storage system. Since these files are not managed by the instance, there is no way to guarantee their availability or integrity. At the moment this method is not supported by InvenioRDM.
These file storage origins (methods) are stored in the `storage_class` attribute of the file model and represented by a one-character encoding:
Type | Representation |
---|---|
Local | L |
Fetch | F |
Remote | R |
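As a minimal sketch, the one-character codes in the table above could be mirrored in client code with an enumeration. The names `StorageClass` and `is_managed_by_instance` are hypothetical helpers for illustration and are not part of InvenioRDM.

```python
from enum import Enum


class StorageClass(str, Enum):
    """Hypothetical helper; the codes come from the table above."""

    LOCAL = "L"
    FETCH = "F"
    REMOTE = "R"


def is_managed_by_instance(code: str) -> bool:
    """Return True only for local files, which the instance manages directly.

    Raises ValueError if the code is not one of "L", "F" or "R".
    """
    return StorageClass(code) is StorageClass.LOCAL
```

A fetched file would report `False` here until its transfer finishes and its code becomes `L`.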
Local files (L)¶
Local files are managed as defined in the records and drafts reference section.
Files fetching (F)¶
Introduced in InvenioRDM v11
Experimental feature
The file fetching mechanism in InvenioRDM v11 has a few limitations. Be aware that future releases of InvenioRDM might introduce breaking changes. We will document them as extensively as possible.
Use it at your own risk!
Fetched files accept two more arguments than local files in their initialization, `storage_class` and `uri`:
Parameters
Name | Type | Location | Description |
---|---|---|---|
`storage_class` | string | body | "F", which marks the file as one to be fetched |
`uri` | string | body | URL to fetch the file from |
The `uri` must be a URL that is accessible from the server's network and resolves to a file that can be fetched. No authentication mechanism (e.g. an `Authorization` header) is supported for the request, so any authentication has to be part of the URL itself (e.g. a token passed in a query string).
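The following sketch shows one way a client might assemble an entry for the request body with the credential embedded in the query string. The function `build_fetch_entry` is hypothetical, and the query parameter name `token` is an assumption: the remote host decides which parameter it actually expects.

```python
import json
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_fetch_entry(key: str, base_url: str, token: str) -> dict:
    """Build one entry for the draft files request body (hypothetical helper).

    The token is appended to the query string because no Authorization
    header is sent when the file is fetched.
    """
    parts = urlsplit(base_url)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URL: {base_url!r}")
    # "token" is an assumed parameter name; use whatever the remote host expects.
    query = urlencode({"token": token})
    if parts.query:
        query = f"{parts.query}&{query}"
    uri = urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))
    return {"key": key, "uri": uri, "storage_class": "F"}


entry = build_fetch_entry(
    "dataset.zip", "https://example.org/files/dataset.zip", "abc"
)
body = json.dumps([entry])  # JSON array to send as the request body
```

Keep in mind that a token placed in a URL is visible wherever the URL is stored or logged, so short-lived tokens are a sensible choice.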
Request
POST /api/records/{id}/draft/files HTTP/1.1
Content-Type: application/json

[
  {
    "key": "dataset.zip",
    "uri": "https://example.org/files/dataset.zip?token=<auth token>",
    "storage_class": "F"
  },
  ...
]
Response
HTTP/1.1 201 CREATED
Content-Type: application/json

{
  "enabled": true,
  "default_preview": null,
  "order": [],
  "entries": [
    {
      "key": "dataset.zip",
      "updated": "2020-11-27 11:17:11.002624",
      "created": "2020-11-27 11:17:10.998919",
      "metadata": null,
      "status": "pending",
      "storage_class": "F",
      "uri": "https://example.org/files/dataset.zip?token=<auth token>",
      "links": {
        "content": "/api/records/{id}/draft/files/dataset.zip/content",
        "self": "/api/records/{id}/draft/files/dataset.zip",
        "commit": "/api/records/{id}/draft/files/dataset.zip/commit"
      }
    }
  ],
  "links": {
    "self": "/api/records/{id}/draft/files"
  }
}
Once the request is accepted, an asynchronous task is launched and the file is transferred into the InvenioRDM instance. When the transfer finishes, the `status` field changes to `completed` and the `storage_class` of the file changes to `L`. The status can be checked using the files URL (`/api/records/{id}/draft/files`).
Note that the record cannot be published until all the files have been transferred (i.e. their status is `completed`).
Moreover, while files are being transferred, requests to the `content` and `commit` endpoints are not allowed (disabled).
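A client could apply the rule above by inspecting the JSON returned from the files URL before it attempts to publish. The sketch below assumes the listing has the shape of the response shown earlier; the helpers `pending_files` and `transfers_done` and the `readme.txt` entry are hypothetical.

```python
from __future__ import annotations


def pending_files(listing: dict) -> list[str]:
    """Return the keys of entries whose status is not "completed"."""
    return [
        entry["key"]
        for entry in listing.get("entries", [])
        if entry.get("status") != "completed"
    ]


def transfers_done(listing: dict) -> bool:
    """True when no entry is still waiting for its transfer to finish."""
    return not pending_files(listing)


# Sample listing shaped like the response above, trimmed to the relevant fields.
listing = {
    "entries": [
        {"key": "dataset.zip", "status": "pending", "storage_class": "F"},
        {"key": "readme.txt", "status": "completed", "storage_class": "L"},
    ]
}
print(pending_files(listing))  # ['dataset.zip']
```

`transfers_done` only reflects the file transfer condition described here; it does not cover any other requirement a record may have to meet before publication.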
Security¶
By default, file fetching is refused. Files can only be fetched from a list of trusted domains, which is configured in the `invenio.cfg` file.
RECORDS_RESOURCES_FILES_ALLOWED_DOMAINS = [
    "example.org",
    "mystoragehosting.com",
]
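To illustrate the idea of an allow-list, the sketch below checks a URI's host against such a list. It is not the InvenioRDM implementation: the function `is_allowed` is hypothetical, and exact hostname matching (no subdomains, port ignored) and an empty list as the default are assumptions made for this example.

```python
from urllib.parse import urlsplit

# Same values as the configuration example above.
ALLOWED_DOMAINS = ["example.org", "mystoragehosting.com"]


def is_allowed(uri: str, allowed=ALLOWED_DOMAINS) -> bool:
    """Return True if the URI's hostname exactly matches a trusted domain.

    Assumption: exact match on the lowercased hostname. The real check in
    InvenioRDM may compare differently.
    """
    host = urlsplit(uri).hostname  # None when the URI has no network location
    return host is not None and host in allowed


print(is_allowed("https://example.org/files/dataset.zip?token=abc"))
print(is_allowed("https://untrusted.example.com/dataset.zip"))
```

With an empty list, every URI is rejected, which corresponds to fetching being refused when no domain has been configured.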
Remote files (R)¶
Not supported
Remote files are currently not supported.