depa.store

This service provides access to document artifacts stored in depa.tech. Use this service to download PDF files, XML files and individual page files.

General

The depa.tech content delivery service (cds) is an HTTP service that provides access to information stored in the depa.tech document store. Depending on the publishing office, we store a variety of artifacts, some directly from the office itself, others generated during the import process.

It is possible to download either single artifacts, all artifacts of a single document or all artifacts of a list of documents. Multiple artifacts/documents are downloaded as ZIP archives.
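As a rough illustration of working with such a ZIP archive, the sketch below groups the archive members by folder, since a bulk download stores the artifacts of each document in a separate folder. The function name group_zip_members is hypothetical, and the assumption that the folder is the path prefix of each member name is not confirmed by this manual.

```python
import io
import zipfile
from collections import defaultdict


def group_zip_members(zip_bytes: bytes) -> dict[str, list[str]]:
    """Group the file names in a ZIP archive by the folder that contains them."""
    groups: dict[str, list[str]] = defaultdict(list)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            # Assumption: everything before the last "/" is the document folder.
            folder, _, filename = name.rpartition("/")
            groups[folder].append(filename)
    return dict(groups)
```

Members stored at the top level of the archive end up under the empty string key.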

Information Required

The only information required to access the cds is a valid depa.tech document ID (such as DE.000112014003467.A5). Accessing the endpoint of a document directly will return a list, in JSON format, of all artifacts we currently hold for that particular document.

Naming Conventions

To allow for uniform access to the various publishing offices that depa.tech supports, a naming convention has been employed for common artifacts. For example, the PDF is always called DOCUMENT.PDF, the XML DOCUMENT.XML, and the individual page files PAGEnnnn. This conforms to the standard DEPAROM naming conventions. A DEPAROM client is not required to use cds.
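The naming convention can be captured in a few constants and a helper. This is only a sketch: the helper name page_artifact_name is hypothetical, and it assumes that "nnnn" stands for a zero-padded, four-digit page number starting at 1, as the artifact name PAGE0001 suggests.

```python
PDF_ARTIFACT = "DOCUMENT.PDF"
XML_ARTIFACT = "DOCUMENT.XML"


def page_artifact_name(page_number: int) -> str:
    """Return the artifact name of a single page file, e.g. 1 -> 'PAGE0001'."""
    if page_number < 1:
        raise ValueError("page numbers are assumed to start at 1")
    # Assumption: 'nnnn' is the page number, zero-padded to four digits.
    return f"PAGE{page_number:04d}"
```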

Artifacts Stored in cds

Currently, depa.tech holds the following artifacts:

  • Office XML

    The XML source from the publishing office contains, at minimum, the bibliographic data. For some offices, full text is available. See table below.

  • Office PDF

    The PDF file from the publishing office.

  • PAGEnnnn files

    Each page as a single-page TIFF in CCITT format (single bit, black and white) at 300 DPI. Generally rendered from the Office PDF.

  • Embedded Images

    Images and drawings in TIFF format. Naming convention depends on publishing office. Used in combination with the XML.

  • mtc JSON

    The document XML in JSON format. The artifact name is mtc.json.

  • mtc Simple JSON

    A simplified JSON format that complies more closely with DEPAROM. Generally speaking, for most uses the mtc JSON format is more appropriate since it is more. The artifact name is mtc.simple.json.

  • mtc Artifacts JSON

    A file in JSON format containing a list of all available artifacts. The file is called artifacts.json. It is similar in usage to what is returned by the List Artifacts endpoint. This file also contains structural information about which document sections are on which pages.

HTTP API

General Error Responses

The HTTP API returns the following general error codes. Other responses are listed in the tables further down.

HTTP Code | Reason | Comments
404 | Document Not Found | Returned if a document is requested that is not in the store or if the document ID is malformed. Also returned for non-existent artifacts.
500 | Internal Server Error | This error code indicates that something went wrong during the request. If errors persist, please contact MTC (support@depa.tech).
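A client might translate these two codes into exceptions so that calling code can tell them apart. The following is a minimal sketch; the exception classes and the function check_cds_status are hypothetical names, not part of the service.

```python
class DocumentNotFound(Exception):
    """HTTP 404: unknown or malformed document ID, or non-existent artifact."""


class CdsServerError(Exception):
    """HTTP 500: something went wrong while the request was processed."""


def check_cds_status(status: int, target: str) -> None:
    """Raise a descriptive exception for the two general error codes."""
    if status == 404:
        raise DocumentNotFound(
            f"{target}: not in the store, malformed ID, or no such artifact"
        )
    if status == 500:
        raise CdsServerError(
            f"{target}: internal server error; contact support@depa.tech if it persists"
        )
    # Any other status code is left to the caller.
```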

Service Endpoints

Action | Method | Path | Body | Response | Comment
Base URL | - | https://api.depa.tech | - | - | Base URL for API endpoint
List Artifacts | GET | /cds/:docid | - | 200 OK | :docid is a depa.tech document ID. List available artifacts of a document.
Download Artifact | GET | /cds/:docid/:artifact | - | 200 OK | Downloads the artifact directly. :artifact is the name of the artifact as listed in the artifacts.json file or the "List Artifacts" endpoint.
Download All Artifacts | GET | /cds/zip/:docid | - | 200 OK | Downloads all available artifacts of a document as a ZIP file. The filename is generated and has a "cds-" prefix.
Bulk Download | POST | /cds/zip | Request must contain a JSON body describing the documents to download. A Filter can be used to specify documents to download. | 200 OK | Downloads all artifacts of all documents requested as a single ZIP file. The artifacts of each document are stored in a separate folder.
Check Availability | POST | /cds/check | JSON body contains a list of docids that are missing in store. | 200 OK | Check if requested docids are available in the store.
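The GET endpoints can be addressed with a few URL helpers. This is a sketch only: the function names are hypothetical, and any authentication the service may require is not described in this manual and is therefore left out.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.depa.tech"


def list_artifacts_url(docid: str) -> str:
    """URL of the List Artifacts endpoint for one document."""
    return f"{BASE_URL}/cds/{quote(docid, safe='')}"


def artifact_url(docid: str, artifact: str) -> str:
    """URL of the Download Artifact endpoint for one artifact."""
    return f"{list_artifacts_url(docid)}/{quote(artifact, safe='')}"


def all_artifacts_zip_url(docid: str) -> str:
    """URL of the Download All Artifacts endpoint (returns a ZIP file)."""
    return f"{BASE_URL}/cds/zip/{quote(docid, safe='')}"


def fetch(url: str) -> bytes:
    """Perform a plain GET request and return the response body."""
    with urllib.request.urlopen(url) as response:
        return response.read()
```

For example, fetch(artifact_url("DE.000112014003467.A5", "DOCUMENT.PDF")) would request the PDF of that document.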

JSON Formats

Response from List Artifacts

Field | Format | Usage / Comments
artifacts | JSON array of strings | Contains file names within the requested directory
container | String | Actual container name in the store
docid | String | docid without revision suffix

Example:

response example

{
  "container": "DE.000112014003467.A5",
  "docid": "DE.000112014003467.A5",
  "artifacts": [
    "artifacts.json",
    "DOCUMENT.PDF",
    "DOCUMENT.XML",
    "mtc.json",
    "mtc.simple.json",
    "mtc_source.json",
    "PAGE0001"
  ]
}
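Once parsed, a response of this shape is easy to work with. The sketch below separates the page files from the other artifacts; the function name split_artifacts is hypothetical, and it assumes that only page files have names starting with "PAGE".

```python
import json


def split_artifacts(listing: dict) -> tuple[list[str], list[str]]:
    """Split the artifact names of a List Artifacts response into pages and others."""
    names = listing["artifacts"]
    # Assumption: only page files start with the prefix "PAGE".
    pages = sorted(name for name in names if name.startswith("PAGE"))
    others = [name for name in names if not name.startswith("PAGE")]
    return pages, others


sample = json.loads("""
{
  "container": "DE.000112014003467.A5",
  "docid": "DE.000112014003467.A5",
  "artifacts": ["artifacts.json", "DOCUMENT.PDF", "DOCUMENT.XML", "PAGE0001"]
}
""")
pages, others = split_artifacts(sample)
```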

Request JSON for Bulk Download

Field | Format | Usage / Comments
docids | JSON array of strings | List of docids. All documents will be added to the ZIP file.
filter | JSON array of strings | List of filters, wildcards (*) are supported. The Filter section is optional.

Example:

Request JSON for Bulk Download Example

{
  "docids": [
    "EP.000000002678869.A2",
    "DE.000202011110740.U1"
  ],
  "filter": [
    "*.xml",
    "*.pdf"
  ]
}
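A request of this form could be assembled as follows. This is a sketch under assumptions: the function names are hypothetical, and the Content-Type header is a guess at what the service expects, since this manual does not specify request headers.

```python
import json
import urllib.request

BASE_URL = "https://api.depa.tech"


def bulk_download_body(docids, filters=None) -> dict:
    """Build the JSON body; the filter list is optional and omitted when empty."""
    body = {"docids": list(docids)}
    if filters:
        body["filter"] = list(filters)
    return body


def bulk_download_request(docids, filters=None) -> urllib.request.Request:
    """Prepare (but do not send) the POST request for the Bulk Download endpoint."""
    data = json.dumps(bulk_download_body(docids, filters)).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/cds/zip",
        data=data,
        # Assumption: the service expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the prepared request to urllib.request.urlopen would send it; the response body is the ZIP file.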

MTC Artifacts JSON

Field | Format | Usage / Comments
id | JSON string | depa.tech ID of the corresponding document.
artifacts | List of JSON strings | List of available artifacts, as returned by the List Artifacts endpoint.
sections | List of JSON dictionaries | For each dictionary, the section field holds the section name and the fields start and end denote the start and end page numbers of that section.

depa.tech supports the following sections:

  • Title

    The title page(s) of the document

  • Abstract

    The pages that contain the abstract. Usually the same as Title. Sometimes missing, depending on the data source.

  • Drawing

    The pages that contain drawings. Not always present, for example if the document has no drawings.

  • Claim

    The pages that contain the claims. Not always present.

  • Description

    The pages that contain the description. Not always present.

The presence of a Claim or Description section does not necessarily mean that the XML document is full text.

Example:

mtc artifacts JSON Example

{
  "id": "DE.000112014003467.A5",
  "artifacts": [
    "DOCUMENT.PDF",
    "DOCUMENT.XML",
    "mtc.json",
    "mtc.simple.json",
    "mtc_source.json",
    "PAGE0001",
    "artifacts.json"
  ],
  "sections": [
    {
      "section": "Title",
      "start": 1,
      "end": 1
    }
  ]
}
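The section information can be used to find out which sections a given page belongs to. The sketch below assumes that start and end are inclusive, 1-based page numbers; the function name sections_for_page is hypothetical.

```python
def sections_for_page(artifacts_doc: dict, page: int) -> list[str]:
    """Return the names of all sections whose page range contains the given page."""
    return [
        entry["section"]
        for entry in artifacts_doc.get("sections", [])
        # Assumption: start and end are both inclusive.
        if entry["start"] <= page <= entry["end"]
    ]


sample = {
    "id": "DE.000112014003467.A5",
    "artifacts": ["DOCUMENT.PDF", "PAGE0001", "artifacts.json"],
    "sections": [{"section": "Title", "start": 1, "end": 1}],
}
title_page_sections = sections_for_page(sample, 1)
```

A page can belong to several sections at once, for instance when Title and Abstract cover the same page, which is why a list is returned.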

Response JSON for Check Availability

Field | Format | Usage / Comments
docids | JSON array of strings | List of docids

Example:

Response JSON for Check Availability Example

{
  "docids": [
    "EP.000000002678869.A2"
  ]
}
