Submitted documents must be in XHTML, HTML, or plain text. Other formats (e.g., PDF) must be converted by the application before submission. XHTML and HTML are preferred because they yield better sentence boundary detection. An Internet search query is treated as a “document” and is normally submitted as plain text. The supported document types and their associated MIME types are listed in Document Text MIME Types.
The service provides an HTTP(S) interface that allows the document to be transmitted either as an attachment or as a request parameter. The request also indicates whether the result is returned as an attachment or stored at a specified URI (batch mode). Several other parameters are available; see text/disambiguate.
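As a rough illustration of the request-parameter form, the sketch below builds (but does not send) a POST request using only the Python standard library. The endpoint URL and the parameter names document, mimeType, resultUri, maxTokens and recipe are placeholders invented for this example; the actual names are defined in text/disambiguate.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; substitute the URL given for your account.
ENDPOINT = "https://api.example.com/text/disambiguate"


def build_request(text, mime_type="text/plain", result_uri=None,
                  max_tokens=None, recipe=None):
    """Build a POST request carrying the document as a request parameter.

    All parameter names below are assumptions made for illustration.
    """
    params = {"document": text, "mimeType": mime_type}
    if result_uri is not None:
        # Batch mode: ask the server to store the result at this URI.
        params["resultUri"] = result_uri
    if max_tokens is not None:
        params["maxTokens"] = str(max_tokens)
    if recipe is not None:
        params["recipe"] = recipe
    body = urlencode(params).encode("utf-8")
    return Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen to perform the call. Leaving result_uri unset corresponds to the normal mode, in which the result comes back in the response.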
Normally, the server responds to the client once the document has been processed. However, batch mode can be used to submit a collection of documents. In this mode, the server responds as soon as it has validated the request, so the client can transmit the next request immediately. The same process or another one can then monitor a drop box where the server deposits the sense-annotated documents.
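The monitoring side of batch mode might look like the following sketch. It assumes the drop box is reachable as a local directory and that the client knows the file name under which each result will be deposited; both are assumptions for illustration, since the drop box is identified by a URI and may need a different access method.

```python
import time
from pathlib import Path


def wait_for_results(drop_dir, expected_names, poll_seconds=5.0,
                     timeout_seconds=600.0):
    """Poll a drop-box directory until the expected files appear.

    Returns (found, pending): a dict mapping each deposited file name to
    its path, and the set of names still missing when the wait ended.
    """
    pending = set(expected_names)
    found = {}
    deadline = time.monotonic() + timeout_seconds
    while True:
        for name in sorted(pending):
            path = Path(drop_dir) / name
            if path.is_file():
                found[name] = path
        pending.difference_update(found)
        # Stop when everything has arrived or the time budget is spent.
        if not pending or time.monotonic() >= deadline:
            return found, pending
        time.sleep(poll_seconds)
```

Because the server answers each batch request right after validation, a client can submit the whole collection first and call a function like this afterwards, or run it in a separate process.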
The request specifies the maximum number of tokens to be processed. This can be used to limit the processing of large documents. If this limit exceeds the limit allowed by the account profile, the request is rejected. When an XHTML or HTML document is too large, it is truncated at the next tag interpreted as a paragraph boundary. A plain text document is truncated as soon as the maximum number of tokens has been reached.
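The token limit rules can be approximated on the client side as in the sketch below. It is only a model: tokens are taken to be whitespace-separated words, which is an assumption because the service's tokenizer is not described here, and an (X)HTML document is represented simply as a list of paragraph strings.

```python
def validate_token_limit(requested, profile_limit):
    """Mirror the server rule: a limit above the profile limit is rejected."""
    if requested > profile_limit:
        raise ValueError("requested token limit exceeds the account profile limit")
    return requested


def truncate_plain_text(text, max_tokens):
    """Plain text: cut as soon as max_tokens tokens have been taken."""
    return " ".join(text.split()[:max_tokens])


def truncate_paragraphs(paragraphs, max_tokens):
    """(X)HTML model: finish the paragraph in which the limit is reached."""
    kept = []
    count = 0
    for paragraph in paragraphs:
        kept.append(paragraph)
        count += len(paragraph.split())
        if count >= max_tokens:
            break  # stop at this paragraph boundary
    return kept
```

In this model an (X)HTML document can run somewhat past the requested limit, since it ends on a paragraph boundary, whereas plain text stops exactly at the limit.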
The request specifies a recipe to use when processing the document. The recipe defines the linguistic algorithms used to compute the senses. Different recipes trade off processing time (and cost) against the precision of the result. Faster recipes leave more ambiguity and can be slightly less accurate. See Sense Analysis Recipes. Because even the fastest recipe performs extensive analysis, the application should expect to wait several seconds for a document, and up to a few minutes for a long one. Internet search queries are almost always processed within a few tenths of a second.
If the request is invalid or cannot be processed, a response with a standard HTTP error code is returned. The response also includes an “errMsg” element with a verbose description of the cause of the error.
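A client might extract the error description along the lines of the sketch below. It assumes the error response body is XML with an errMsg element somewhere in it; the surrounding structure is not specified here, so the sample body in the usage note is purely illustrative.

```python
import xml.etree.ElementTree as ET


def extract_error_message(body):
    """Return the text of the first errMsg element, or None if there is none.

    Assumes an XML body; anything that fails to parse yields None.
    """
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None
    # iter() also visits the root, so a bare <errMsg> document is handled too.
    node = next(root.iter("errMsg"), None)
    if node is None:
        return None
    return (node.text or "").strip()
```

For example, given the hypothetical body "<response><errMsg>Token limit exceeds profile</errMsg></response>", the function returns the string "Token limit exceeds profile". With urllib, the body of a failed request can be obtained by catching urllib.error.HTTPError and calling its read() method.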