General
The depa.tech content delivery service (cds) is an HTTP service that provides access to information stored in the depa.tech document store. Depending on the publishing office, we store a variety of artifacts, some directly from the office itself, others generated during the import process.
It is possible to download either single artifacts, all artifacts of a single document or all artifacts of a list of documents. Multiple artifacts/documents are downloaded as ZIP archives.
Information Required
The only information required to access the cds is a valid depa.tech document ID (such as DE.000112014003467.A5). Accessing the endpoint of a document directly will return a list, in JSON format, of all artifacts we currently hold for that particular document.
Authentication
Authentication is required to use the cds: customers need an account for the depa.tech proxy. Please send an email to marc.haus@mtc.berlin if you need credentials.
Naming Conventions
To allow for uniform access to the various publishing offices that depa.tech supports, a naming convention has been employed for common artifacts. For example, the PDF is always called DOCUMENT.PDF, the XML DOCUMENT.XML, and the individual page files PAGEnnnn. This conforms to the standard DEPAROM naming conventions. A DEPAROM client is not required to use cds.
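As a small illustration of the naming convention, the sketch below builds page artifact names. It assumes that `nnnn` is a 1-based page number zero-padded to four digits, which matches the `PAGE0001` entry in the example responses but is not stated explicitly. The helper name `page_artifact_name` is hypothetical.

```python
# Fixed artifact names taken from the naming convention above.
PDF_ARTIFACT = "DOCUMENT.PDF"
XML_ARTIFACT = "DOCUMENT.XML"


def page_artifact_name(page_number: int) -> str:
    """Return the artifact name for a page.

    Assumption: pages are numbered from 1 and padded to four digits.
    """
    if not 1 <= page_number <= 9999:
        raise ValueError("page number must be between 1 and 9999")
    return f"PAGE{page_number:04d}"


if __name__ == "__main__":
    print(page_artifact_name(1))
```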
Artifacts Stored in cds
Currently, depa.tech holds the following artifacts:
- Office XML
The XML source from the publishing office. It contains, at minimum, the bibliographic data. For some offices, full text is available. See table below.
- Office PDF
The PDF file from the publishing office.
- PAGEnnnn files
Each page as a single-page TIFF in CCITT format (single bit, black and white) at 300 DPI. Generally rendered from the Office PDF.
- Embedded Images
Images and drawings in TIFF format. The naming convention depends on the publishing office. Used in combination with the XML.
- mtc JSON
The document XML in JSON format. The artifact name is mtc.json.
- mtc Simple JSON
A simplified JSON format that complies more closely with DEPAROM. Generally speaking, the mtc JSON format is more appropriate for most uses. The artifact name is mtc.simple.json.
- mtc Artifacts JSON
A file in JSON format containing a list of all available artifacts. The file is called artifacts.json. It is similar in usage to what is returned by the List Artifacts endpoint. This file also contains structural information about which document sections are on which pages.
HTTP API
General Error Responses
The HTTP API returns the general error codes listed here. Other responses are listed in the tables further down.
HTTP Code | Reason | Comments |
---|---|---|
404 | Document Not Found | Returned if a document is requested that is not in the store or if the document ID is malformed. Also returned for non-existent artifacts. |
500 | Internal Server Error | This error code indicates that something went wrong during the request. If errors persist, please contact mtc (support@depa.tech). |
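A client might translate these status codes into readable messages along the following lines. This is a minimal sketch: the function name `explain_status` and the message wording are illustrative and not part of the service.

```python
# General error codes from the table above, with short explanations.
GENERAL_ERRORS = {
    404: "Document Not Found: unknown or malformed document ID, or non-existent artifact",
    500: "Internal Server Error: something went wrong during the request",
}


def explain_status(status_code: int) -> str:
    """Map an HTTP status code returned by the cds to a readable message."""
    if status_code == 200:
        return "OK"
    # Codes outside the documented set are reported as unexpected.
    return GENERAL_ERRORS.get(status_code, f"Unexpected HTTP status {status_code}")


if __name__ == "__main__":
    for code in (200, 404, 500):
        print(code, explain_status(code))
```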
Service Endpoints
Action | Method | Path | Body | Response | Comment |
---|---|---|---|---|---|
Base URL | | https://api.depa.tech | | | |
List Artifacts | GET | /cds/:docid | Returns a JSON response | 200 OK | Lists the available artifacts of a document. :docid is a depa.tech document ID. |
Download Artifact | GET | /cds/:docid/:artifact | Response depends on artifact MIME type | 200 OK | Downloads the artifact directly. :artifact is the name of the artifact as listed in the artifacts.json file or by the “List Artifacts” endpoint. |
Download all Artifacts | GET | /cds/zip/:docid | A ZIP stream containing all artifacts of a document. | 200 OK | Downloads all available artifacts of a document as a ZIP file. The filename is generated and has a “cds-” prefix. |
Bulk Download | POST | /cds/zip | Request must contain a JSON body describing the documents to download. A filter can be used to specify which artifacts to download. Response is a ZIP stream. | 200 OK | Downloads all artifacts of all requested documents as a single ZIP file. The artifacts of each document are stored in a separate folder. If a document is not available, it will be missing from the ZIP archive. There will be no error code in this case. You may want to use the “Check Availability” endpoint to identify missing documents. |
Check Availability | POST | /cds/check | JSON body contains a list of docids that are missing in the store. | 200 OK | Checks whether the requested docids are available in the store. |
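The endpoint paths in the table can be assembled from the base URL as sketched below. The helper names are hypothetical, and the sketch only builds URLs. It does not send requests or handle authentication, because the authentication mechanism is not described here.

```python
from urllib.parse import quote

BASE_URL = "https://api.depa.tech"

# POST endpoints have fixed paths.
BULK_DOWNLOAD_URL = f"{BASE_URL}/cds/zip"
CHECK_AVAILABILITY_URL = f"{BASE_URL}/cds/check"


def list_artifacts_url(docid: str) -> str:
    """URL for GET /cds/:docid."""
    return f"{BASE_URL}/cds/{quote(docid, safe='')}"


def download_artifact_url(docid: str, artifact: str) -> str:
    """URL for GET /cds/:docid/:artifact."""
    return f"{list_artifacts_url(docid)}/{quote(artifact, safe='')}"


def download_all_url(docid: str) -> str:
    """URL for GET /cds/zip/:docid."""
    return f"{BASE_URL}/cds/zip/{quote(docid, safe='')}"


if __name__ == "__main__":
    print(download_artifact_url("DE.000112014003467.A5", "DOCUMENT.PDF"))
```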
JSON Formats
Response from List Artifacts
Field | Format | Usage / Comments |
---|---|---|
artifacts | JSON array of strings | Contains file names within the requested directory |
container | String | Actual container name in the store |
docid | String | docid without revision suffix |
Example
{
  "container": "DE.000112014003467.A5",
  "docid": "DE.000112014003467.A5",
  "artifacts": [
    "artifacts.json",
    "DOCUMENT.PDF",
    "DOCUMENT.XML",
    "mtc.json",
    "mtc.simple.json",
    "mtc_source.json",
    "PAGE0001"
  ]
}
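A List Artifacts response like the one above could be processed as follows. This is a sketch only: the function name `split_artifacts` is hypothetical, and recognising page files as `PAGE` followed by digits is an assumption based on the naming convention.

```python
import json

# Shortened sample modelled on the List Artifacts example response.
SAMPLE_RESPONSE = """
{
  "container": "DE.000112014003467.A5",
  "docid": "DE.000112014003467.A5",
  "artifacts": ["artifacts.json", "DOCUMENT.PDF", "DOCUMENT.XML", "mtc.json", "PAGE0001"]
}
"""


def split_artifacts(response_text: str):
    """Return (docid, page artifacts, other artifacts) from a List Artifacts response."""
    data = json.loads(response_text)
    names = data["artifacts"]
    # Assumption: page files are named PAGE followed only by digits.
    pages = sorted(n for n in names if n.startswith("PAGE") and n[4:].isdigit())
    others = [n for n in names if n not in pages]
    return data["docid"], pages, others


if __name__ == "__main__":
    docid, pages, others = split_artifacts(SAMPLE_RESPONSE)
    print(docid, pages, others)
```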
Example (request body for Bulk Download)
{
  "docids": [
    "EP.000000002678869.A2",
    "DE.000202011110740.U1"
  ],
  "filter": [
    "*.xml",
    "*.pdf"
  ]
}
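The example above has the shape of a Bulk Download request body (a list of docids plus a filter). A minimal sketch for building such a body is shown below. The field names `docids` and `filter` come from the example. Treating the filter as optional is an assumption, and `bulk_download_body` is a hypothetical helper name.

```python
import json


def bulk_download_body(docids, filters=None) -> str:
    """Serialize a request body for POST /cds/zip.

    Assumption: the "filter" field may be omitted when no filtering is wanted.
    """
    body = {"docids": list(docids)}
    if filters:
        body["filter"] = list(filters)
    return json.dumps(body)


if __name__ == "__main__":
    print(bulk_download_body(
        ["EP.000000002678869.A2", "DE.000202011110740.U1"],
        ["*.xml", "*.pdf"],
    ))
```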
MTC Artifacts JSON
Field | Format | Usage / Comments |
---|---|---|
id | JSON String | depa.tech ID of the corresponding document. |
artifacts | List of JSON Strings | List of available artifacts, as returned by the List Artifacts endpoint. |
sections | List of JSON Dictionaries | For each dictionary, the key is the section name and the fields start and end denote the start and end page numbers of that section. depa.tech supports the following sections:
The presence of a Claims or Description section does not necessarily mean that the XML document is full text. |
Example
{
  "id": "DE.000112014003467.A5",
  "artifacts": [
    "DOCUMENT.PDF",
    "DOCUMENT.XML",
    "mtc.json",
    "mtc.simple.json",
    "mtc_source.json",
    "PAGE0001",
    "artifacts.json"
  ],
  "sections": [
    {
      "section": "Title",
      "start": 1,
      "end": 1
    }
  ]
}
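The `sections` list can be used to find out which sections appear on a given page. The sketch below follows the field names in the example (`section`, `start`, `end`) and assumes that `start` and `end` are inclusive page numbers. The function name `sections_on_page` is hypothetical.

```python
# Shortened sample modelled on the artifacts.json example.
SAMPLE_ARTIFACTS_JSON = {
    "id": "DE.000112014003467.A5",
    "artifacts": ["DOCUMENT.PDF", "PAGE0001", "artifacts.json"],
    "sections": [{"section": "Title", "start": 1, "end": 1}],
}


def sections_on_page(artifacts_json: dict, page: int) -> list:
    """Return the names of all sections whose page range covers `page`.

    Assumption: `start` and `end` are both inclusive.
    """
    return [
        entry["section"]
        for entry in artifacts_json.get("sections", [])
        if entry["start"] <= page <= entry["end"]
    ]


if __name__ == "__main__":
    print(sections_on_page(SAMPLE_ARTIFACTS_JSON, 1))
```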
Response JSON for Check Availability
Field | Format | Usage / Comments |
---|---|---|
docids | String | List of docids |
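Since Bulk Download silently skips unavailable documents, a client may want to work out which requested documents are available before downloading. The sketch below assumes that the `docids` field of the Check Availability response is a JSON array of the docids missing from the store, as the endpoint table suggests. The function name `available_docids` is hypothetical.

```python
def available_docids(requested, check_response: dict) -> list:
    """Return the requested docids that are not reported as missing.

    Assumption: check_response["docids"] lists the docids missing from the store.
    """
    missing = set(check_response.get("docids", []))
    return [docid for docid in requested if docid not in missing]


if __name__ == "__main__":
    requested = ["EP.000000002678869.A2", "DE.000202011110740.U1"]
    # Hypothetical response reporting one missing document.
    response = {"docids": ["DE.000202011110740.U1"]}
    print(available_docids(requested, response))
```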