depa.store

This service provides access to document artifacts stored in depa.tech. Use this service to download PDF files, XML files and individual page files.

General

The depa.tech content delivery service (cds) is an HTTP service that provides access to information stored in the depa.tech document store. Depending on the publishing office, we store a variety of artifacts, some directly from the office itself, others generated during the import process.
It is possible to download single artifacts, all artifacts of a single document, or all artifacts of a list of documents. Multiple artifacts or documents are downloaded as ZIP archives.

Information Required

The only information required to access the cds is a valid depa.tech document ID (such as DE.000112014003467.A5). Accessing the endpoint of a document directly will return a list, in JSON format, of all artifacts we currently hold for that particular document.

Authentication

Authentication is required to use the cds; customers need an account for the depa.tech proxy. Please send an email to marc.haus@mtc.berlin if you need credentials.

Naming Conventions

To allow uniform access to the various publishing offices that depa.tech supports, a naming convention is used for common artifacts. For example, the PDF is always called DOCUMENT.PDF, the XML DOCUMENT.XML, and the individual page files PAGEnnnn. This conforms to the standard DEPAROM naming conventions; however, a DEPAROM client is not required to use the cds.
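Because these names are fixed, a client can derive artifact names without a lookup. The Python sketch below assumes, from the PAGEnnnn pattern and the PAGE0001 example, that page numbers are 1-based and zero-padded to four digits; the constant and helper names are hypothetical.

```python
import re

# Fixed artifact names from the naming convention.
DOCUMENT_PDF = "DOCUMENT.PDF"
DOCUMENT_XML = "DOCUMENT.XML"

# Assumption: page artifacts are "PAGE" followed by a zero-padded,
# four-digit, 1-based page number (inferred from "PAGEnnnn" and "PAGE0001").
PAGE_NAME = re.compile(r"PAGE(\d{4})")


def page_artifact_name(page: int) -> str:
    """Return the artifact name for a page number, e.g. 12 -> "PAGE0012"."""
    if not 1 <= page <= 9999:
        raise ValueError(f"page number out of range: {page}")
    return f"PAGE{page:04d}"


def page_numbers(artifacts: list[str]) -> list[int]:
    """Return the sorted page numbers of all PAGEnnnn entries in an artifact list."""
    pages = []
    for name in artifacts:
        match = PAGE_NAME.fullmatch(name)
        if match:
            pages.append(int(match.group(1)))
    return sorted(pages)


print(page_artifact_name(12))                                  # PAGE0012
print(page_numbers([DOCUMENT_PDF, "PAGE0002", "PAGE0001"]))    # [1, 2]
```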

Artifacts Stored in cds

Currently, depa.tech holds the following artifacts:

  • Office XML
    The XML source from the publishing office, containing at minimum the bibliographic data. For some offices, full text is available. See table below.
  • Office PDF
    The PDF file from the publishing office.
  • PAGEnnnn files
    Each page as a single-page TIFF in CCITT format (1-bit, black and white) at 300 DPI, generally rendered from the Office PDF.
  • Embedded Images
    Images and drawings in TIFF format. The naming convention depends on the publishing office. Used in combination with the XML.
  • mtc JSON
    The document XML in JSON format. The artifact name is mtc.json.
  • mtc Simple JSON
    A simplified JSON format that complies more closely with DEPAROM. Generally speaking, the mtc JSON format is more appropriate for most uses since it is more complete. The artifact name is mtc.simple.json.
  • mtc Artifacts JSON
    A file in JSON format containing a list of all available artifacts. The file is called artifacts.json. It is similar in usage to the response of the List Artifacts endpoint, but also contains structural information about which document sections are on which pages.

HTTP API

General Error Responses

The HTTP API returns the following general error codes. Other responses are listed in the tables further down.

HTTP Code | Reason | Comments
404 | Document Not Found | Returned if the requested document is not in the store or if the document ID is malformed. Also returned for non-existent artifacts.
500 | Internal Server Error | Something went wrong while processing the request. If errors persist, please contact mtc (support@depa.tech).

Service Endpoints

Base URL: https://api.depa.tech

  • List Artifacts: GET /cds/:docid
    Response: JSON, 200 OK. :docid is a depa.tech document ID. Lists the available artifacts of a document.
  • Download Artifact: GET /cds/:docid/:artifact
    Response: depends on the artifact's MIME type, 200 OK. Downloads the artifact directly. :artifact is the name of the artifact as listed in the artifacts.json file or returned by the List Artifacts endpoint.
  • Download All Artifacts: GET /cds/zip/:docid
    Response: a ZIP stream containing all artifacts of the document, 200 OK. Downloads all available artifacts of a document as a ZIP file. The file name is generated and has a “cds-” prefix.
  • Bulk Download: POST /cds/zip
    Body: a JSON body describing the documents to download; an optional filter restricts which artifacts are included.
    Response: a ZIP stream, 200 OK. Downloads all artifacts of all requested documents as a single ZIP file, with the artifacts of each document stored in a separate folder. If a document is not available, it is simply missing from the ZIP archive; no error code is returned in this case. Use the Check Availability endpoint to identify missing documents.
  • Check Availability: POST /cds/check
    Response: a JSON body containing the requested docids that are missing from the store, 200 OK. Checks whether the requested docids are available in the store.
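As a rough illustration, the GET endpoints above can be called with nothing but the Python standard library. This is a sketch under assumptions: the function names are hypothetical, and it sends credentials via HTTP Basic authentication, which this page does not confirm (it only says an account for the depa.tech proxy is needed).

```python
import base64
import urllib.error
import urllib.parse
import urllib.request
from typing import Optional

BASE_URL = "https://api.depa.tech"


def _segment(value: str) -> str:
    # Percent-encode one path segment; a no-op for IDs like DE.000112014003467.A5.
    return urllib.parse.quote(value, safe="")


def list_artifacts_url(docid: str) -> str:
    """GET /cds/:docid: JSON list of a document's artifacts."""
    return f"{BASE_URL}/cds/{_segment(docid)}"


def artifact_url(docid: str, artifact: str) -> str:
    """GET /cds/:docid/:artifact: one artifact, e.g. DOCUMENT.PDF."""
    return f"{BASE_URL}/cds/{_segment(docid)}/{_segment(artifact)}"


def zip_url(docid: str) -> str:
    """GET /cds/zip/:docid: all artifacts of one document as a ZIP stream."""
    return f"{BASE_URL}/cds/zip/{_segment(docid)}"


def build_get_request(url: str, username: str, password: str) -> urllib.request.Request:
    # Assumption: the proxy accepts HTTP Basic authentication. Adapt this
    # to whatever scheme your depa.tech credentials actually use.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


def fetch(url: str, username: str, password: str) -> Optional[bytes]:
    """Download a resource; return None on 404 (unknown document/artifact or malformed docid)."""
    try:
        with urllib.request.urlopen(build_get_request(url, username, password)) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise  # e.g. 500 Internal Server Error
```

For example, fetch(artifact_url("DE.000112014003467.A5", "DOCUMENT.PDF"), user, password) would return the PDF bytes, or None if the document or artifact does not exist. Mapping 404 to None keeps "not found" separate from real failures such as a 500, which still raise.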

JSON Formats

Response from List Artifacts

Field | Format | Usage / Comments
artifacts | JSON array of strings | Contains the file names within the requested directory
container | String | Actual container name in the store
docid | String | docid without revision suffix

Example


{
    "container": "DE.000112014003467.A5",
    "docid": "DE.000112014003467.A5",
    "artifacts": [
        "artifacts.json",
        "DOCUMENT.PDF",
        "DOCUMENT.XML",
        "mtc.json",
        "mtc.simple.json",
        "mtc_source.json",
        "PAGE0001"
    ]
}
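A client will typically parse this response to check whether an artifact exists before requesting it. A minimal Python sketch using the example above; the ArtifactList and parse_artifact_list names are hypothetical.

```python
import json
from dataclasses import dataclass


@dataclass
class ArtifactList:
    """Fields of the List Artifacts response (GET /cds/:docid)."""
    container: str        # actual container name in the store
    docid: str            # docid without revision suffix
    artifacts: list[str]  # file names within the requested directory


def parse_artifact_list(payload: str) -> ArtifactList:
    data = json.loads(payload)
    return ArtifactList(container=data["container"], docid=data["docid"],
                        artifacts=list(data["artifacts"]))


example = """{
    "container": "DE.000112014003467.A5",
    "docid": "DE.000112014003467.A5",
    "artifacts": ["artifacts.json", "DOCUMENT.PDF", "DOCUMENT.XML", "mtc.json",
                  "mtc.simple.json", "mtc_source.json", "PAGE0001"]
}"""

listing = parse_artifact_list(example)
print(listing.docid)                        # DE.000112014003467.A5
print("DOCUMENT.PDF" in listing.artifacts)  # True
```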


Request JSON for Bulk Download

Field | Format | Usage / Comments
docids | JSON array of strings | List of docids. All documents will be added to the ZIP file.
filter | JSON array of strings | List of filters; wildcards (*) are supported.
The filter section is optional.

Example


{
    "docids": [
        "EP.000000002678869.A2",
        "DE.000202011110740.U1"
    ],
    "filter": [
        "*.xml",
        "*.pdf"
    ]
}
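A sketch of sending this request from Python follows. It assumes the body is posted as application/json with HTTP Basic authentication (neither is stated on this page), omits the optional filter when none is given, and streams the ZIP to disk; all function names are hypothetical.

```python
import base64
import json
import shutil
import urllib.request
from typing import Optional

BULK_URL = "https://api.depa.tech/cds/zip"


def bulk_download_body(docids: list[str], filters: Optional[list[str]] = None) -> bytes:
    """Serialize the Bulk Download request; "filter" is optional and omitted if empty."""
    body: dict = {"docids": list(docids)}
    if filters:
        body["filter"] = list(filters)
    return json.dumps(body).encode("utf-8")


def bulk_download_request(docids: list[str], filters: Optional[list[str]],
                          username: str, password: str) -> urllib.request.Request:
    # Assumption: HTTP Basic authentication and a JSON content type.
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        BULK_URL,
        data=bulk_download_body(docids, filters),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )


def save_zip(request: urllib.request.Request, path: str) -> None:
    """Stream the ZIP response to a file without holding it all in memory."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        shutil.copyfileobj(response, out)
```

Streaming with shutil.copyfileobj matters here because a bulk ZIP of many documents can be large.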

MTC Artifacts JSON

Field | Format | Usage / Comments
id | JSON string | depa.tech ID of the corresponding document.
artifacts | List of JSON strings | List of available artifacts, as returned by the List Artifacts endpoint.
sections | List of JSON dictionaries | In each dictionary, the field section holds the section name, and the fields start and end denote the start and end page numbers of that section.

depa.tech supports the following sections:

  • Title
    The title page(s) of the document.

  • Abstract
    The pages that contain the abstract. Usually the same as Title; sometimes missing, depending on the data source.

  • Drawing
    The pages that contain drawings. Not always present, for example if the document has no drawings.

  • Claim
    The pages that contain the claims. Not always present.

  • Description
    The pages that contain the description. Not always present.

The presence of a Claim or Description section does not necessarily mean that the XML document contains full text.

Example


{
    "id": "DE.000112014003467.A5",
    "artifacts": [
        "DOCUMENT.PDF",
        "DOCUMENT.XML",
        "mtc.json",
        "mtc.simple.json",
        "mtc_source.json",
        "PAGE0001",
        "artifacts.json"
    ],
    "sections": [
        {
            "section": "Title",
            "start": 1,
            "end": 1
        }
    ]
}
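The sections list makes it possible to map pages to sections, for example to fetch only the PAGEnnnn files of the claims. A minimal Python sketch, assuming start and end are inclusive 1-based page numbers (consistent with the Title example, which starts and ends on page 1); the function names are hypothetical.

```python
import json


def sections_on_page(artifacts_json: str, page: int) -> list[str]:
    """Names of all sections whose page range includes the given page."""
    data = json.loads(artifacts_json)
    return [
        entry["section"]
        for entry in data.get("sections", [])      # sections may be absent
        if entry["start"] <= page <= entry["end"]  # assumed inclusive range
    ]


def section_pages(artifacts_json: str, name: str) -> list[int]:
    """Page numbers of the named section, or [] if the document lacks it."""
    data = json.loads(artifacts_json)
    for entry in data.get("sections", []):
        if entry["section"] == name:
            return list(range(entry["start"], entry["end"] + 1))
    return []


example = """{
    "id": "DE.000112014003467.A5",
    "artifacts": ["DOCUMENT.PDF", "DOCUMENT.XML", "PAGE0001", "artifacts.json"],
    "sections": [{"section": "Title", "start": 1, "end": 1}]
}"""

print(sections_on_page(example, 1))     # ['Title']
print(section_pages(example, "Claim"))  # []
```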


Response JSON for Check Availability

Field | Format | Usage / Comments
docids | JSON array of strings | List of the requested docids that are missing from the store

Example


{
    "docids": [
        "EP.000000002678869.A2"
    ]
}
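Since Bulk Download silently omits unavailable documents, it can be useful to split the requested docids using this response. A minimal Python sketch; the function name is hypothetical, and it only processes the response, because the request format for Check Availability is not specified here.

```python
import json


def split_by_availability(requested: list[str], check_response: str) -> tuple[list[str], list[str]]:
    """Split requested docids into (available, missing), preserving request order.

    check_response is the JSON returned by Check Availability, whose "docids"
    field lists the requested docids that are missing from the store.
    """
    missing = set(json.loads(check_response).get("docids", []))
    available = [docid for docid in requested if docid not in missing]
    not_found = [docid for docid in requested if docid in missing]
    return available, not_found


requested = ["EP.000000002678869.A2", "DE.000202011110740.U1"]
available, not_found = split_by_availability(requested, '{"docids": ["EP.000000002678869.A2"]}')
print(available)  # ['DE.000202011110740.U1']
print(not_found)  # ['EP.000000002678869.A2']
```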