depa.store
This service provides access to document artifacts stored in depa.tech. Use this service to download PDF files, XML files and individual page files.
General
The depa.tech content delivery service (cds) is an HTTP service that provides access to information stored in the depa.tech document store. Depending on the publishing office, we store a variety of artifacts: some come directly from the office itself, others are generated during the import process.
It is possible to download either single artifacts, all artifacts of a single document or all artifacts of a list of documents. Multiple artifacts/documents are downloaded as ZIP archives.
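As a rough sketch of working with a bulk ZIP archive after it has been downloaded, the following groups archive members by their top-level folder. It relies on the statement in the Service Endpoints table that the artifacts of each document are stored in a separate folder. The function name `group_zip_members` is hypothetical, and treating the first path component as the document folder is an assumption.

```python
import io
import zipfile
from collections import defaultdict


def group_zip_members(zip_bytes: bytes) -> dict[str, list[str]]:
    """Group the file names in a ZIP archive by their top-level folder.

    Assumption: the first path component identifies the document.
    Files at the archive root are collected under the empty string.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            folder, _, filename = name.partition("/")
            if filename:
                groups[folder].append(filename)
            else:
                # No folder component: the whole name is a root-level file.
                groups[""].append(folder)
    return dict(groups)
```

Operating on bytes rather than a file path keeps the helper usable both with a response body held in memory and with a file read from disk.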
Information Required
The only information required to access the cds is a valid depa.tech document ID (such as DE.000112014003467.A5). Accessing the endpoint of a document directly will return a list, in JSON format, of all artifacts we currently hold for that particular document.
Naming Conventions
To allow for uniform access to the various publishing offices that depa.tech supports, a naming convention has been employed for common artifacts. For example, the PDF is always called DOCUMENT.PDF, the XML DOCUMENT.XML, and the individual page files PAGEnnnn. This conforms to the standard DEPAROM naming conventions. A DEPAROM client is not required to use cds.
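The naming convention can be captured in a few constants and a helper, sketched here. The helper name `page_artifact_name` is hypothetical, and reading `nnnn` as a four-digit, zero-padded, 1-based page number is an assumption. Actual artifact names may also carry a file extension, so the output of the List Artifacts endpoint remains the authority.

```python
# Common artifact names as described in the naming convention.
PDF_ARTIFACT = "DOCUMENT.PDF"
XML_ARTIFACT = "DOCUMENT.XML"


def page_artifact_name(page: int) -> str:
    """Return the PAGEnnnn name for a page number.

    Assumption: 'nnnn' is a four-digit, zero-padded, 1-based page number.
    """
    if not 1 <= page <= 9999:
        raise ValueError("page must be between 1 and 9999")
    return f"PAGE{page:04d}"
```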
Artifacts Stored in cds
Currently, depa.tech holds the following artifacts:
Office XML
The XML source from the publishing office contains, at minimum, the bibliographic data. For some offices, full text is available. See table below.
Office PDF
The PDF file from the publishing office.
PAGEnnnn files
Each page as a single-page TIFF in CCITT format (1-bit, black and white) at 300 DPI. Generally rendered from the Office PDF.
Embedded Images
Images and drawings in TIFF format. Naming convention depends on publishing office. Used in combination with the XML.
mtc JSON
The document XML in JSON format. The artifact name is mtc.json.
mtc Simple JSON
A simplified JSON format that complies more closely with DEPAROM. Generally speaking, the mtc JSON format is more appropriate for most uses. The artifact name is mtc.simple.json.
mtc Artifacts JSON
A file in JSON format containing a list of all available artifacts. The file is called artifacts.json. It is similar in usage to what is returned by the List Artifacts endpoint. This file also contains structural information about which document sections are on which pages.
HTTP API
General Error Responses
The HTTP API returns the following general error codes. Other responses are listed in the tables further down.
HTTP Code | Reason | Comments |
---|---|---|
404 | Document Not Found | Returned if a document is requested that is not in the store or if the document ID is malformed. Also returned for non-existent artifacts. |
500 | Internal Server Error | This error code indicates that something went wrong during the request. If errors persist, please contact MTC (support@depa.tech). |
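A client might translate these codes into readable messages along the following lines. This is a minimal sketch using only the Python standard library. The names `explain_status` and `fetch` are hypothetical, and the sketch assumes the request needs no extra headers, which this manual does not state.

```python
import urllib.error
import urllib.request


def explain_status(code: int) -> str:
    """Map the general cds error codes to a short explanation."""
    if code == 404:
        return "document or artifact not found, or malformed document ID"
    if code == 500:
        return "internal server error; contact support@depa.tech if it persists"
    return f"unexpected HTTP status {code}"


def fetch(url: str) -> bytes:
    """GET a URL and return the response body.

    Raises RuntimeError with an explanation if the server answers
    with an HTTP error status.
    """
    try:
        with urllib.request.urlopen(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        raise RuntimeError(explain_status(err.code)) from err
```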
Service Endpoints
Action | Method | Path | Body | Response | Comment |
---|---|---|---|---|---|
Base URL | | https://api.depa.tech | | | Base URL for API endpoint |
List Artifacts | GET | /cds/:docid | | 200 OK | :docid is a depa.tech document ID. List available artifacts of a document. |
Download Artifact | GET | /cds/:docid/:artifact | | 200 OK | Downloads the artifact directly. :artifact is the name of the artifact as listed in the artifacts.json file or the "List Artifacts" endpoint. |
Download All Artifacts | GET | /cds/zip/:docid | | 200 OK | Downloads all available artifacts of a document as a ZIP file. The filename is generated and has a "cds-" prefix. |
Bulk Download | POST | /cds/zip | Request must contain a JSON body describing the documents to download. A Filter can be used to specify documents to download. | 200 OK | Downloads all artifacts of all documents requested as a single ZIP file. The artifacts of each document are stored in a separate folder. |
Check Availability | POST | /cds/check | JSON body contains a list of docids that are missing in store. | 200 OK | Check if requested docids are available in the store. |
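The paths in the table can be assembled into full URLs as sketched below. The function and constant names are hypothetical. Percent-encoding the path segments with `urllib.parse.quote` is a defensive choice rather than a documented requirement.

```python
from urllib.parse import quote

BASE_URL = "https://api.depa.tech"

# POST endpoints take no path parameters.
BULK_ZIP_URL = f"{BASE_URL}/cds/zip"
CHECK_URL = f"{BASE_URL}/cds/check"


def list_artifacts_url(docid: str) -> str:
    """URL for the List Artifacts endpoint (GET /cds/:docid)."""
    return f"{BASE_URL}/cds/{quote(docid, safe='')}"


def artifact_url(docid: str, artifact: str) -> str:
    """URL for the Download Artifact endpoint (GET /cds/:docid/:artifact)."""
    return f"{list_artifacts_url(docid)}/{quote(artifact, safe='')}"


def zip_url(docid: str) -> str:
    """URL for the Download All Artifacts endpoint (GET /cds/zip/:docid)."""
    return f"{BASE_URL}/cds/zip/{quote(docid, safe='')}"
```

For the document ID used earlier in this manual, `artifact_url("DE.000112014003467.A5", "DOCUMENT.PDF")` yields `https://api.depa.tech/cds/DE.000112014003467.A5/DOCUMENT.PDF`.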
JSON Formats
Response from List Artifacts
Field | Format | Usage / Comments |
---|---|---|
artifacts | JSON array of strings | Contains file names within the requested directory |
container | String | Actual container name in the store |
docid | String | docid without revision suffix |
Example:
response example
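A response with the three fields above could be parsed as in the following sketch. The class and function names are hypothetical, and the sample values in the usage note are placeholders for illustration, not actual service output.

```python
import json
from dataclasses import dataclass


@dataclass
class ArtifactListing:
    """The fields of a List Artifacts response."""
    docid: str
    container: str
    artifacts: list[str]


def parse_listing(body: str) -> ArtifactListing:
    """Parse the JSON text of a List Artifacts response.

    Raises KeyError if one of the three documented fields is missing.
    """
    data = json.loads(body)
    return ArtifactListing(
        docid=data["docid"],
        container=data["container"],
        artifacts=list(data["artifacts"]),
    )
```

With a placeholder body such as `{"docid": "DE.000112014003467.A5", "container": "example-container", "artifacts": ["DOCUMENT.PDF", "DOCUMENT.XML"]}`, the check `"DOCUMENT.PDF" in parse_listing(body).artifacts` tells a client whether the PDF can be requested.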
Request JSON for Bulk Download
Field | Format | Usage / Comments |
---|---|---|
docids | JSON array of strings | List of docids. All documents will be added to the ZIP file. |
filter | JSON array of strings | List of filters, wildcards (*) are supported. The Filter section is optional. |
Example:
Request JSON for Bulk Download Example
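The request body can be built from the two fields in the table, as in this sketch. The function name `bulk_download_body` is hypothetical. That filters are matched against artifact names (for example `*.PDF`) is an assumption, since the table only states that wildcards are supported.

```python
import json
from typing import Iterable, Optional


def bulk_download_body(docids: Iterable[str],
                       filters: Optional[Iterable[str]] = None) -> str:
    """Build the JSON body for POST /cds/zip.

    'filter' is only included when at least one filter is given,
    because the filter section is optional.
    """
    docid_list = list(docids)
    if not docid_list:
        raise ValueError("at least one docid is required")
    body: dict[str, list[str]] = {"docids": docid_list}
    filter_list = list(filters) if filters is not None else []
    if filter_list:
        body["filter"] = filter_list
    return json.dumps(body)
```

Calling `bulk_download_body(["DE.000112014003467.A5"], ["*.PDF"])` produces a body with both `docids` and `filter`. Omitting the second argument produces a body with `docids` only.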
MTC Artifacts JSON
Field | Format | Usage / Comments |
---|---|---|
id | JSON string | depa.tech ID of the corresponding document. |
artifacts | List of JSON Strings | List of available artifacts, as returned by the List Artifacts endpoint. |
sections | List of JSON Dictionaries | For each dictionary, the key is the section name and the fields start and end denote the start and end page numbers of that section. depa.tech supports the following sections:
The presence of a Claims or Description section does not necessarily mean that the XML document is full text. |
Example:
mtc artifacts JSON Example
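The `sections` field can be used to find out which sections appear on a given page. The sketch below follows the structure described in the table: a list of dictionaries, each keyed by section name, with `start` and `end` page numbers. The function name is hypothetical, the section names in the usage note are illustrative spellings, and treating the page range as inclusive at both ends is an assumption.

```python
def sections_on_page(sections: list[dict], page: int) -> list[str]:
    """Return the names of all sections that cover the given page.

    Assumption: 'start' and 'end' are inclusive page numbers.
    """
    names: list[str] = []
    for entry in sections:
        for name, span in entry.items():
            if span["start"] <= page <= span["end"]:
                names.append(name)
    return names
```

Given illustrative data such as `[{"description": {"start": 1, "end": 4}}, {"claims": {"start": 4, "end": 5}}]`, page 4 belongs to both sections, which can happen when one section ends and the next begins on the same page.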
Response JSON for Check Availability
Field | Format | Usage / Comments |
---|---|---|
docids | String | List of docids |
Example:
Response JSON for Check Availability Example
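A client could combine its own request list with the response as sketched below. This assumes that the `docids` field of the response lists the requested documents that are missing from the store, which is one reading of the Service Endpoints table and should be confirmed against real responses. The function name and the second document ID in the usage note are hypothetical.

```python
def split_by_availability(requested: list[str],
                          response: dict) -> tuple[list[str], list[str]]:
    """Split requested docids into (available, missing).

    Assumption: response['docids'] lists the docids that are
    missing from the store.
    """
    missing_set = set(response.get("docids", []))
    available = [d for d in requested if d not in missing_set]
    missing = [d for d in requested if d in missing_set]
    return available, missing
```

For a request of `["DE.000112014003467.A5", "XX.000000000000000.A1"]` and a response of `{"docids": ["XX.000000000000000.A1"]}`, the first document would be reported as available and the second as missing.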