azure.ai.formrecognizer package Azure SDK for Python 2.0.0 documentation
class azure.ai.formrecognizer.AccountProperties¶Summary of all the custom models on the account.
New in version v2.1: Support for to_dict and from_dict methods
classmethod from_dict(data: Dict) → AccountProperties [source]¶Converts a dict in the shape of an AccountProperties to the model itself.
Parameters: data (dict) – A dictionary in the shape of AccountProperties.
Returns: AccountProperties
Return type: AccountProperties
to_dict() → Dict [source]¶Returns a dict representation of AccountProperties.
Returns: dict
Return type: dict
custom_model_count: int¶Current count of trained custom models.
custom_model_limit: int¶Max number of models that can be trained for this account.
class azure.ai.formrecognizer.AddressValue(**kwargs: Any)[source]¶An address field value.
New in version 2023-07-31: The unit, city_district, state_district, suburb, house, and level properties.
classmethod from_dict(data: Dict) → AddressValue [source]¶Converts a dict in the shape of an AddressValue to the model itself.
Parameters: data (dict) – A dictionary in the shape of AddressValue.
Returns: AddressValue
Return type: AddressValue
to_dict() → Dict [source]¶Returns a dict representation of AddressValue.
Returns: dict
Return type: dict
city: str | None¶Name of city, town, village, etc.
city_district: str | None¶Districts or boroughs within a city, such as Brooklyn in New York City or City of Westminster in London.
country_region: str | None¶Country/region.
house: str | None¶Building name, such as World Trade Center.
house_number: str | None¶House or building number.
level: str | None¶Floor number, such as 3F.
po_box: str | None¶Post office box number.
postal_code: str | None¶Postal code used for mail sorting.
road: str | None¶Street name.
state: str | None¶First-level administrative division.
state_district: str | None¶Second-level administrative division used in certain locales.
street_address: str | None¶Street-level address, excluding city, state, countryRegion, and postalCode.
suburb: str | None¶Unofficial neighborhood name, like Chinatown.
unit: str | None¶Apartment or office number.
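As a rough illustration of how these components fit together, the sketch below assembles a one-line address from a plain dictionary. It assumes the dictionary keys mirror the attribute names listed above; the helper format_address and the sample values are hypothetical and not part of the SDK.

```python
from typing import Dict, Optional


def format_address(address: Dict[str, Optional[str]]) -> str:
    """Join the populated address components into one line.

    Hypothetical helper: `address` is assumed to be a dict whose keys
    mirror the AddressValue attribute names. Missing or None values
    are skipped.
    """
    # Prefer house_number + road; fall back to street_address if both are absent.
    street = " ".join(
        part for part in (address.get("house_number"), address.get("road")) if part
    )
    ordered = [
        address.get("unit"),
        street or address.get("street_address"),
        address.get("city"),
        address.get("state"),
        address.get("postal_code"),
        address.get("country_region"),
    ]
    return ", ".join(part for part in ordered if part)


# Sample values are made up for illustration.
sample = {
    "house_number": "123",
    "road": "Main Street",
    "city": "Redmond",
    "state": "WA",
    "postal_code": "98052",
    "unit": None,
}
formatted = format_address(sample)
print(formatted)
```

Because every attribute is typed str | None, the helper filters out empty components rather than assuming any of them is present.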
class azure.ai.formrecognizer.AnalysisFeature(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶Document analysis features to enable.
capitalize()¶Return a capitalized version of the string.
More specifically, make the first character have upper case and the rest lower case.
casefold()¶Return a version of the string suitable for caseless comparisons.
center(width, fillchar=' ', /)¶Return a centered string of length width.
Padding is done using the specified fill character (default is a space).
count(sub[, start[, end]]) → int¶Return the number of non-overlapping occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.
encode(encoding='utf-8', errors='strict')¶Encode the string using the codec registered for encoding.
encoding: The encoding in which to encode the string.
errors: The error handling scheme to use for encoding errors. The default is 'strict', meaning that encoding errors raise a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and 'xmlcharrefreplace', as well as any other name registered with codecs.register_error that can handle UnicodeEncodeErrors.
endswith(suffix[, start[, end]]) → bool¶Return True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. suffix can also be a tuple of strings to try.
expandtabs(tabsize=8)¶Return a copy where all tab characters are expanded using spaces.
If tabsize is not given, a tab size of 8 characters is assumed.
find(sub[, start[, end]]) → int¶Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
format(*args, **kwargs) → str¶Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces ('{' and '}').
format_map(mapping) → str¶Return a formatted version of S, using substitutions from mapping. The substitutions are identified by braces ('{' and '}').
index(sub[, start[, end]]) → int¶Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Raises ValueError when the substring is not found.
isalnum()¶Return True if the string is an alpha-numeric string, False otherwise.
A string is alpha-numeric if all characters in the string are alpha-numeric and there is at least one character in the string.
isalpha()¶Return True if the string is an alphabetic string, False otherwise.
A string is alphabetic if all characters in the string are alphabetic and there is at least one character in the string.
isascii()¶Return True if all characters in the string are ASCII, False otherwise.
ASCII characters have code points in the range U+0000-U+007F. Empty string is ASCII too.
isdecimal()¶Return True if the string is a decimal string, False otherwise.
A string is a decimal string if all characters in the string are decimal and there is at least one character in the string.
isdigit()¶Return True if the string is a digit string, False otherwise.
A string is a digit string if all characters in the string are digits and there is at least one character in the string.
isidentifier()¶Return True if the string is a valid Python identifier, False otherwise.
Call keyword.iskeyword(s) to test whether string s is a reserved identifier, such as “def” or “class”.
islower()¶Return True if the string is a lowercase string, False otherwise.
A string is lowercase if all cased characters in the string are lowercase and there is at least one cased character in the string.
isnumeric()¶Return True if the string is a numeric string, False otherwise.
A string is numeric if all characters in the string are numeric and there is at least one character in the string.
isprintable()¶Return True if the string is printable, False otherwise.
A string is printable if all of its characters are considered printable in repr() or if it is empty.
isspace()¶Return True if the string is a whitespace string, False otherwise.
A string is whitespace if all characters in the string are whitespace and there is at least one character in the string.
istitle()¶Return True if the string is a title-cased string, False otherwise.
In a title-cased string, upper- and title-case characters may only follow uncased characters and lowercase characters only cased ones.
isupper()¶Return True if the string is an uppercase string, False otherwise.
A string is uppercase if all cased characters in the string are uppercase and there is at least one cased character in the string.
join(iterable, /)¶Concatenate any number of strings.
The string whose method is called is inserted in between each given string. The result is returned as a new string.
Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'
ljust(width, fillchar=' ', /)¶Return a left-justified string of length width.
Padding is done using the specified fill character (default is a space).
lower()¶Return a copy of the string converted to lowercase.
lstrip(chars=None, /)¶Return a copy of the string with leading whitespace removed.
If chars is given and not None, remove characters in chars instead.
static maketrans()¶Return a translation table usable for str.translate().
If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters to Unicode ordinals, strings or None. Character keys will be then converted to ordinals. If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y. If there is a third argument, it must be a string, whose characters will be mapped to None in the result.
partition(sep, /)¶Partition the string into three parts using the given separator.
This will search for the separator in the string. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.
If the separator is not found, returns a 3-tuple containing the original string and two empty strings.
removeprefix(prefix, /)¶Return a str with the given prefix string removed if present.
If the string starts with the prefix string, return string[len(prefix):]. Otherwise, return a copy of the original string.
removesuffix(suffix, /)¶Return a str with the given suffix string removed if present.
If the string ends with the suffix string and that suffix is not empty, return string[:-len(suffix)]. Otherwise, return a copy of the original string.
replace(old, new, count=-1, /)¶Return a copy with all occurrences of substring old replaced by new.
count: Maximum number of occurrences to replace. -1 (the default value) means replace all occurrences.
If the optional argument count is given, only the first count occurrences are replaced.
rfind(sub[, start[, end]]) → int¶Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
rindex(sub[, start[, end]]) → int¶Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Raises ValueError when the substring is not found.
rjust(width, fillchar=' ', /)¶Return a right-justified string of length width.
Padding is done using the specified fill character (default is a space).
rpartition(sep, /)¶Partition the string into three parts using the given separator.
This will search for the separator in the string, starting at the end. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.
If the separator is not found, returns a 3-tuple containing two empty strings and the original string.
rsplit(sep=None, maxsplit=-1)¶Return a list of the substrings in the string, using sep as the separator string.
sep: The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit: Maximum number of splits. -1 (the default value) means no limit.
Splitting starts at the end of the string and works to the front.
rstrip(chars=None, /)¶Return a copy of the string with trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
split(sep=None, maxsplit=-1)¶Return a list of the substrings in the string, using sep as the separator string.
sep: The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit: Maximum number of splits. -1 (the default value) means no limit.
Splitting starts at the front of the string and works to the end.
Note, str.split() is mainly useful for data that has been intentionally delimited. With natural text that includes punctuation, consider using the regular expression module.
splitlines(keepends=False)¶Return a list of the lines in the string, breaking at line boundaries.
Line breaks are not included in the resulting list unless keepends is given and true.
startswith(prefix[, start[, end]]) → bool¶Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. prefix can also be a tuple of strings to try.
strip(chars=None, /)¶Return a copy of the string with leading and trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
swapcase()¶Convert uppercase characters to lowercase and lowercase characters to uppercase.
title()¶Return a version of the string where each word is titlecased.
More specifically, words start with uppercased characters and all remaining cased characters have lower case.
translate(table, /)¶Replace each character in the string using the given translation table.
table: Translation table, which must be a mapping of Unicode ordinals to Unicode ordinals, strings, or None.
The table must implement lookup/indexing via __getitem__, for instance a dictionary or list. If this operation raises LookupError, the character is left untouched. Characters mapped to None are deleted.
upper()¶Return a copy of the string converted to uppercase.
zfill(width, /)¶Pad a numeric string with zeros on the left, to fill a field of the given width.
The string is never truncated.
BARCODES = 'barcodes'¶Enable the detection of barcodes in the document.
FORMULAS = 'formulas'¶Enable the detection of mathematical expressions in the document.
KEY_VALUE_PAIRS = 'keyValuePairs'¶Enable the detection of general key value pairs (form fields) in the document.
LANGUAGES = 'languages'¶Enable the detection of the text content language.
OCR_HIGH_RESOLUTION = 'ocrHighResolution'¶Perform OCR at a higher resolution to handle documents with fine print.
STYLE_FONT = 'styleFont'¶Enable the recognition of various font styles.
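The string methods listed above show up because the enum members are also str instances. The sketch below illustrates that pattern with a stand-in enum; FeatureSketch is a hypothetical name that mirrors two of the values above, and the SDK's own class may be defined differently.

```python
from enum import Enum


class FeatureSketch(str, Enum):
    """Stand-in for illustration only; not the SDK's AnalysisFeature."""

    BARCODES = "barcodes"
    KEY_VALUE_PAIRS = "keyValuePairs"


# Members compare equal to their plain string values...
is_equal = FeatureSketch.BARCODES == "barcodes"
# ...and inherit ordinary str methods such as upper().
shouted = FeatureSketch.KEY_VALUE_PAIRS.upper()
# A member can be looked up from its string value.
member = FeatureSketch("keyValuePairs")
# Collect the raw string values, e.g. to build a list of features to request.
requested = [feature.value for feature in (FeatureSketch.BARCODES, member)]
print(is_equal, shouted, requested)
```

Mixing str into the enum lets callers pass either the member or its raw string wherever a feature name is expected.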
class azure.ai.formrecognizer.AnalyzeResult(**kwargs: Any)[source]¶Document analysis result.
classmethod from_dict(data: Dict) → AnalyzeResult [source]¶Converts a dict in the shape of an AnalyzeResult to the model itself.
Parameters: data (dict) – A dictionary in the shape of AnalyzeResult.
Returns: AnalyzeResult
Return type: AnalyzeResult
to_dict() → Dict [source]¶Returns a dict representation of AnalyzeResult.
Returns: dict
Return type: dict
api_version: str¶API version used to produce this result.
content: str¶Concatenated string representation of all textual and visual elements in reading order.
documents: List[AnalyzedDocument] | None¶Extracted documents.
key_value_pairs: List[DocumentKeyValuePair] | None¶Extracted key-value pairs.
languages: List[DocumentLanguage] | None¶Detected languages in the document.
model_id: str¶Model ID used to produce this result.
pages: List[DocumentPage]¶Analyzed pages.
paragraphs: List[DocumentParagraph] | None¶Extracted paragraphs.
styles: List[DocumentStyle] | None¶Extracted font styles.
tables: List[DocumentTable] | None¶Extracted tables.
class azure.ai.formrecognizer.AnalyzedDocument(**kwargs: Any)[source]¶An object describing the location and semantic content of a document.
classmethod from_dict(data: Dict) → AnalyzedDocument [source]¶Converts a dict in the shape of an AnalyzedDocument to the model itself.
Parameters: data (dict) – A dictionary in the shape of AnalyzedDocument.
Returns: AnalyzedDocument
Return type: AnalyzedDocument
to_dict() → Dict [source]¶Returns a dict representation of AnalyzedDocument.
Returns: dict
Return type: dict
bounding_regions: List[BoundingRegion] | None¶Bounding regions covering the document.
confidence: float¶Confidence of correctly extracting the document.
doc_type: str¶The type of document that was analyzed.
fields: Dict[str, DocumentField] | None¶A dictionary of named field values.
spans: List[DocumentSpan]¶The location of the document in the reading order concatenated content.
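To make the relationship between spans and the concatenated content concrete, here is a minimal sketch that slices a document's text out of a content string. It assumes each span carries an offset and a length counted in Python string characters; the Span type, the text_for_spans helper, and the sample content are hypothetical stand-ins rather than SDK objects.

```python
from typing import List, NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for DocumentSpan: a start offset plus a length."""

    offset: int
    length: int


def text_for_spans(content: str, spans: List[Span]) -> str:
    """Return the slices of `content` covered by `spans`, joined in order."""
    return "".join(content[s.offset : s.offset + s.length] for s in spans)


# Made-up content string standing in for AnalyzeResult.content.
content = "Invoice 42\nTotal: $10.00"
spans = [Span(offset=0, length=10)]
document_text = text_for_spans(content, spans)
print(document_text)
```

A document can have several spans, so the helper joins every slice instead of assuming a single contiguous range.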
class azure.ai.formrecognizer.BlobFileListSource(container_url: str, file_list: str)[source]¶Content source for a file list in Azure Blob Storage.
classmethod from_dict(data: Dict[str, Any]) → BlobFileListSource [source]¶Converts a dict in the shape of a BlobFileListSource to the model itself.
Parameters: data (dict) – A dictionary in the shape of BlobFileListSource.
Returns: BlobFileListSource
Return type: BlobFileListSource
to_dict() → Dict[str, Any] [source]¶Returns a dict representation of BlobFileListSource.
Returns: Dict[str, Any]
Return type: Dict[str, Any]
container_url: str¶Azure Blob Storage container URL.
file_list: str¶Path to a JSONL file within the container specifying a subset of documents for training.
class azure.ai.formrecognizer.BlobSource(container_url: str, *, prefix: str | None = None)[source]¶Content source for Azure Blob Storage.
classmethod from_dict(data: Dict[str, Any]) → BlobSource [source]¶Converts a dict in the shape of a BlobSource to the model itself.
Parameters: data (dict) – A dictionary in the shape of BlobSource.
Returns: BlobSource
Return type: BlobSource
to_dict() → Dict[str, Any] [source]¶Returns a dict representation of BlobSource.
Returns: Dict[str, Any]
Return type: Dict[str, Any]
container_url: str¶Azure Blob Storage container URL.
prefix: str | None¶Blob name prefix.
class azure.ai.formrecognizer.BoundingRegion(**kwargs: Any)[source]¶The bounding region corresponding to a page.
classmethod from_dict(data: Dict) → BoundingRegion [source]¶Converts a dict in the shape of a BoundingRegion to the model itself.
Parameters: data (dict) – A dictionary in the shape of BoundingRegion.
Returns: BoundingRegion
Return type: BoundingRegion
to_dict() → Dict [source]¶Returns a dict representation of BoundingRegion.
Returns: dict
Return type: dict
page_number: int¶The 1-based number of the page in which this content is present.
polygon: Sequence[Point]¶A list of points representing the bounding polygon that outlines the document component. The points are listed in clockwise order relative to the document component orientation starting from the top-left. Units are in pixels for images and inches for PDF.
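A polygon given as an ordered list of points can be reduced to an axis-aligned bounding box or an area with a few lines of arithmetic. The sketch below assumes each point exposes x and y coordinates; the Point type here is a local stand-in and the helper names are hypothetical.

```python
from typing import NamedTuple, Sequence, Tuple


class Point(NamedTuple):
    """Hypothetical stand-in for the SDK's Point: an (x, y) coordinate."""

    x: float
    y: float


def bounding_box(polygon: Sequence[Point]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) for the polygon's vertices."""
    xs = [p.x for p in polygon]
    ys = [p.y for p in polygon]
    return min(xs), min(ys), max(xs), max(ys)


def polygon_area(polygon: Sequence[Point]) -> float:
    """Area via the shoelace formula, in squared pixels or squared inches."""
    total = 0.0
    for i, p in enumerate(polygon):
        q = polygon[(i + 1) % len(polygon)]  # next vertex, wrapping around
        total += p.x * q.y - q.x * p.y
    return abs(total) / 2.0


# Four corners listed clockwise from the top-left, as described above.
polygon = [Point(1.0, 1.0), Point(4.0, 1.0), Point(4.0, 3.0), Point(1.0, 3.0)]
box = bounding_box(polygon)
area = polygon_area(polygon)
print(box, area)
```

Taking the absolute value makes the area independent of vertex winding order, which is convenient because the clockwise ordering is relative to the component's orientation rather than to the page.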
class azure.ai.formrecognizer.ClassifierDocumentTypeDetails(source: BlobSource | BlobFileListSource)[source]¶Training data source.
classmethod from_dict(data: Dict[str, Any]) → ClassifierDocumentTypeDetails [source]¶Converts a dict in the shape of a ClassifierDocumentTypeDetails to the model itself.
Parameters: data (dict) – A dictionary in the shape of ClassifierDocumentTypeDetails.
Returns: ClassifierDocumentTypeDetails
Return type: ClassifierDocumentTypeDetails
to_dict() → Dict[str, Any] [source]¶Returns a dict representation of ClassifierDocumentTypeDetails.
Returns: Dict[str, Any]
Return type: Dict[str, Any]
source: BlobSource | BlobFileListSource¶Content source containing the training data.
source_kind: Literal['azureBlob', 'azureBlobFileList']¶Type of training data source. Known values are “azureBlob” and “azureBlobFileList”.
class azure.ai.formrecognizer.CurrencyValue(**kwargs: Any)[source]¶A currency value element.
New in version 2023-07-31: The code property.
classmethod from_dict(data: Dict) → CurrencyValue [source]¶Converts a dict in the shape of a CurrencyValue to the model itself.
Parameters: data (dict) – A dictionary in the shape of CurrencyValue.
Returns: CurrencyValue
Return type: CurrencyValue
to_dict() → Dict [source]¶Returns a dict representation of CurrencyValue.
Returns: dict
Return type: dict
amount: float¶The currency amount.
code: str | None¶Resolved currency code (ISO 4217), if any.
symbol: str | None¶The currency symbol, if found.
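Since both code and symbol may be None, display logic needs a fallback order. The sketch below is one possible approach, assuming two decimal places are appropriate (which does not hold for every currency); format_currency is a hypothetical helper, not an SDK function.

```python
from typing import Optional


def format_currency(
    amount: float, symbol: Optional[str] = None, code: Optional[str] = None
) -> str:
    """Render an amount, preferring the ISO 4217 code, then the symbol."""
    number = f"{amount:,.2f}"  # assumption: two decimals, comma thousands separator
    if code:
        return f"{number} {code}"
    if symbol:
        return f"{symbol}{number}"
    return number


with_code = format_currency(1234.5, symbol="$", code="USD")
with_symbol = format_currency(10, symbol="$")
bare = format_currency(3)
print(with_code, with_symbol, bare)
```

Preferring the code over the symbol avoids ambiguity, since a symbol such as "$" is shared by several currencies.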
class azure.ai.formrecognizer.CustomDocumentModelsDetails(**kwargs: Any)[source]¶Details regarding the custom models under the Form Recognizer resource.
classmethod from_dict(data: Dict) → CustomDocumentModelsDetails [source]¶Converts a dict in the shape of a CustomDocumentModelsDetails to the model itself.
Parameters: data (dict) – A dictionary in the shape of CustomDocumentModelsDetails.
Returns: CustomDocumentModelsDetails
Return type: CustomDocumentModelsDetails
to_dict() → Dict [source]¶Returns a dict representation of CustomDocumentModelsDetails.
Returns: dict
Return type: dict
count: int¶Number of custom models in the current resource.
limit: int¶Maximum number of custom models supported in the current resource.
class azure.ai.formrecognizer.CustomFormModel(**kwargs: Any)[source]¶Represents a trained model.
New in version v2.1: The model_name and properties properties, support for to_dict and from_dict methods
classmethod from_dict(data: Dict) → CustomFormModel [source]¶Converts a dict in the shape of a CustomFormModel to the model itself.
Parameters: data (dict) – A dictionary in the shape of CustomFormModel.
Returns: CustomFormModel
Return type: CustomFormModel
to_dict() → Dict [source]¶Returns a dict representation of CustomFormModel.
Returns: dict
Return type: dict
errors: List[FormRecognizerError]¶List of any training errors.
model_id: str¶The unique identifier of this model.
model_name: str¶Optional user defined model name.
properties: CustomFormModelProperties¶Optional model properties.
status: str¶Status indicating the model’s readiness for use, CustomFormModelStatus. Possible values include: ‘creating’, ‘ready’, ‘invalid’.
submodels: List[CustomFormSubmodel]¶A list of submodels that are part of this model, each of which can recognize and extract fields from a different type of form.
training_completed_on: datetime¶Date and time (UTC) when model training completed.
training_documents: List[TrainingDocumentInfo]¶Metadata about each of the documents used to train the model.
training_started_on: datetime¶The date and time (UTC) when model training was started.
class azure.ai.formrecognizer.CustomFormModelField(**kwargs: Any)[source]¶A field that the model will extract from forms it analyzes.
New in version v2.1: Support for to_dict and from_dict methods
classmethod from_dict(data: Dict) → CustomFormModelField [source]¶Converts a dict in the shape of a CustomFormModelField to the model itself.
Parameters: data (dict) – A dictionary in the shape of CustomFormModelField.
Returns: CustomFormModelField
Return type: CustomFormModelField
to_dict() → Dict [source]¶Returns a dict representation of CustomFormModelField.
Returns: dict
Return type: dict
accuracy: float¶The estimated recognition accuracy for this field.
label: str¶The form field’s label on the form.
name: str¶Canonical name; uniquely identifies a field within the form.
class azure.ai.formrecognizer.CustomFormModelInfo(**kwargs: Any)[source]¶Custom model information.
New in version v2.1: The model_name and properties properties, support for to_dict and from_dict methods
classmethod from_dict(data: Dict) → CustomFormModelInfo [source]¶Converts a dict in the shape of a CustomFormModelInfo to the model itself.
Parameters: data (dict) – A dictionary in the shape of CustomFormModelInfo.
Returns: CustomFormModelInfo
Return type: CustomFormModelInfo
to_dict() → Dict [source]¶Returns a dict representation of CustomFormModelInfo.
Returns: dict
Return type: dict
model_id: str¶The unique identifier of the model.
model_name: str¶Optional user defined model name.
properties: CustomFormModelProperties¶Optional model properties.
status: str¶The status of the model, CustomFormModelStatus. Possible values include: ‘creating’, ‘ready’, ‘invalid’.
training_completed_on: datetime¶Date and time (UTC) when model training completed.
training_started_on: datetime¶Date and time (UTC) when model training was started.
class azure.ai.formrecognizer.CustomFormModelProperties(**kwargs: Any)[source]¶Optional model properties.
New in version v2.1: Support for to_dict and from_dict methods
classmethod from_dict(data: Dict) → CustomFormModelProperties [source]¶Converts a dict in the shape of a CustomFormModelProperties to the model itself.
Parameters: data (dict) – A dictionary in the shape of CustomFormModelProperties.
Returns: CustomFormModelProperties
Return type: CustomFormModelProperties
to_dict() → Dict [source]¶Returns a dict representation of CustomFormModelProperties.
Returns: dict
Return type: dict
is_composed_model: bool¶Is this model composed? (default: false).
class azure.ai.formrecognizer.CustomFormModelStatus(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶Status indicating the model’s readiness for use.
capitalize()¶Return a capitalized version of the string.
More specifically, make the first character have upper case and the rest lower case.
casefold()¶Return a version of the string suitable for caseless comparisons.
center(width, fillchar=' ', /)¶Return a centered string of length width.
Padding is done using the specified fill character (default is a space).
count(sub[, start[, end]]) int¶Return the number of non-overlapping occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.
encode(encoding='utf-8', errors='strict')¶Encode the string using the codec registered for encoding.
encodingThe encoding in which to encode the string.
errorsThe error handling scheme to use for encoding errors. The default is ‘strict’ meaning that encoding errors raise a UnicodeEncodeError. Other possible values are ‘ignore’, ‘replace’ and ‘xmlcharrefreplace’ as well as any other name registered with codecs.register_error that can handle UnicodeEncodeErrors.
endswith(suffix[, start[, end]]) bool¶Return True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. suffix can also be a tuple of strings to try.
expandtabs(tabsize=8)¶Return a copy where all tab characters are expanded using spaces.
If tabsize is not given, a tab size of 8 characters is assumed.
find(sub[, start[, end]]) int¶Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
format(*args, **kwargs) str¶Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces (‘{’ and ‘}’).
format_map(mapping) str¶Return a formatted version of S, using substitutions from mapping. The substitutions are identified by braces (‘{’ and ‘}’).
index(sub[, start[, end]]) int¶Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Raises ValueError when the substring is not found.
isalnum()¶Return True if the string is an alpha-numeric string, False otherwise.
A string is alpha-numeric if all characters in the string are alpha-numeric and there is at least one character in the string.
isalpha()¶Return True if the string is an alphabetic string, False otherwise.
A string is alphabetic if all characters in the string are alphabetic and there is at least one character in the string.
isascii()¶Return True if all characters in the string are ASCII, False otherwise.
ASCII characters have code points in the range U+0000-U+007F. Empty string is ASCII too.
isdecimal()¶Return True if the string is a decimal string, False otherwise.
A string is a decimal string if all characters in the string are decimal and there is at least one character in the string.
isdigit()¶Return True if the string is a digit string, False otherwise.
A string is a digit string if all characters in the string are digits and there is at least one character in the string.
isidentifier()¶Return True if the string is a valid Python identifier, False otherwise.
Call keyword.iskeyword(s) to test whether string s is a reserved identifier, such as “def” or “class”.
islower()¶Return True if the string is a lowercase string, False otherwise.
A string is lowercase if all cased characters in the string are lowercase and there is at least one cased character in the string.
isnumeric()¶Return True if the string is a numeric string, False otherwise.
A string is numeric if all characters in the string are numeric and there is at least one character in the string.
isprintable()¶Return True if the string is printable, False otherwise.
A string is printable if all of its characters are considered printable in repr() or if it is empty.
isspace()¶Return True if the string is a whitespace string, False otherwise.
A string is whitespace if all characters in the string are whitespace and there is at least one character in the string.
istitle()¶Return True if the string is a title-cased string, False otherwise.
In a title-cased string, upper- and title-case characters may only follow uncased characters and lowercase characters only cased ones.
isupper()¶Return True if the string is an uppercase string, False otherwise.
A string is uppercase if all cased characters in the string are uppercase and there is at least one cased character in the string.
join(iterable, /)¶Concatenate any number of strings.
The string whose method is called is inserted in between each given string. The result is returned as a new string.
Example: ‘.’.join([‘ab’, ‘pq’, ‘rs’]) -> ‘ab.pq.rs’
ljust(width, fillchar=' ', /)¶Return a left-justified string of length width.
Padding is done using the specified fill character (default is a space).
lower()¶Return a copy of the string converted to lowercase.
lstrip(chars=None, /)¶Return a copy of the string with leading whitespace removed.
If chars is given and not None, remove characters in chars instead.
static maketrans()¶Return a translation table usable for str.translate().
If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters to Unicode ordinals, strings or None. Character keys will be then converted to ordinals. If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y. If there is a third argument, it must be a string, whose characters will be mapped to None in the result.
partition(sep, /)¶Partition the string into three parts using the given separator.
This will search for the separator in the string. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.
If the separator is not found, returns a 3-tuple containing the original string and two empty strings.
removeprefix(prefix, /)¶Return a str with the given prefix string removed if present.
If the string starts with the prefix string, return string[len(prefix):]. Otherwise, return a copy of the original string.
removesuffix(suffix, /)¶Return a str with the given suffix string removed if present.
If the string ends with the suffix string and that suffix is not empty, return string[:-len(suffix)]. Otherwise, return a copy of the original string.
replace(old, new, count=-1, /)¶Return a copy with all occurrences of substring old replaced by new.
countMaximum number of occurrences to replace. -1 (the default value) means replace all occurrences.
If the optional argument count is given, only the first count occurrences are replaced.
rfind(sub[, start[, end]]) int¶Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
rindex(sub[, start[, end]]) int¶Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Raises ValueError when the substring is not found.
rjust(width, fillchar=' ', /)¶Return a right-justified string of length width.
Padding is done using the specified fill character (default is a space).
rpartition(sep, /)¶Partition the string into three parts using the given separator.
This will search for the separator in the string, starting at the end. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.
If the separator is not found, returns a 3-tuple containing two empty strings and the original string.
rsplit(sep=None, maxsplit=-1)¶Return a list of the substrings in the string, using sep as the separator string.
sepThe separator used to split the string.
When set to None (the default value), will split on any whitespace character (including n r t f and spaces) and will discard empty strings from the result.
maxsplitMaximum number of splits. -1 (the default value) means no limit.
Splitting starts at the end of the string and works to the front.
rstrip(chars=None, /)¶Return a copy of the string with trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
split(sep=None, maxsplit=-1)¶Return a list of the substrings in the string, using sep as the separator string.
sep – The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit – Maximum number of splits. -1 (the default value) means no limit.
Splitting starts at the front of the string and works to the end.
Note, str.split() is mainly useful for data that has been intentionally delimited. With natural text that includes punctuation, consider using the regular expression module.
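As a rough illustration of the notes above, this sketch compares split and rsplit with a maxsplit limit, shows the whitespace behaviour of sep=None, and uses the re module for natural text. The sample strings are made up for the example.

```python
import re

row = "a,b,c,d"
from_front = row.split(",", maxsplit=1)   # splitting starts at the front
from_back = row.rsplit(",", maxsplit=1)   # splitting starts at the end
print(from_front)  # ['a', 'b,c,d']
print(from_back)   # ['a,b,c', 'd']

# sep=None collapses runs of whitespace and drops empty strings.
tokens = "  one \t two\nthree ".split()
print(tokens)  # ['one', 'two', 'three']

# For natural text, a regular expression keeps punctuation out of the words.
sentence = "Hello, world! How are you?"
words = re.findall(r"\w+", sentence)
print(words)  # ['Hello', 'world', 'How', 'are', 'you']
```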
splitlines(keepends=False)¶Return a list of the lines in the string, breaking at line boundaries.
Line breaks are not included in the resulting list unless keepends is given and true.
startswith(prefix[, start[, end]]) bool¶Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. prefix can also be a tuple of strings to try.
strip(chars=None, /)¶Return a copy of the string with leading and trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
swapcase()¶Convert uppercase characters to lowercase and lowercase characters to uppercase.
title()¶Return a version of the string where each word is titlecased.
More specifically, words start with uppercased characters and all remaining cased characters have lower case.
translate(table, /)¶Replace each character in the string using the given translation table.
table – Translation table, which must be a mapping of Unicode ordinals to Unicode ordinals, strings, or None.
The table must implement lookup/indexing via __getitem__, for instance a dictionary or list. If this operation raises LookupError, the character is left untouched. Characters mapped to None are deleted.
upper()¶Return a copy of the string converted to uppercase.
zfill(width, /)¶Pad a numeric string with zeros on the left, to fill a field of the given width.
The string is never truncated.
CREATING = 'creating'¶
INVALID = 'invalid'¶
READY = 'ready'¶
class azure.ai.formrecognizer.CustomFormSubmodel(**kwargs: Any)[source]¶Represents a submodel that extracts fields from a specific type of form.
New in version v2.1: The model_id property, support for to_dict and from_dict methods
classmethod from_dict(data: Dict) CustomFormSubmodel [source]¶Converts a dict in the shape of a CustomFormSubmodel to the model itself.
Parameters:data (dict) – A dictionary in the shape of CustomFormSubmodel.
Returns:CustomFormSubmodel
Return type:CustomFormSubmodel
to_dict() Dict [source]¶Returns a dict representation of CustomFormSubmodel.
Returns:dict
Return type:dict
accuracy: float¶The mean of the model’s field accuracies.
fields: Dict[str, CustomFormModelField]¶A dictionary of the fields that this submodel will recognize from the input document. The fields dictionary keys are the name of the field. For models trained with labels, this is the training-time label of the field. For models trained without labels, a unique name is generated for each field.
form_type: str¶Type of form this submodel recognizes.
model_id: str¶Model identifier of the submodel.
class azure.ai.formrecognizer.DocumentAnalysisApiVersion(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶Form Recognizer API versions supported by DocumentAnalysisClient and DocumentModelAdministrationClient.
capitalize()¶Return a capitalized version of the string.
More specifically, make the first character have upper case and the rest lower case.
casefold()¶Return a version of the string suitable for caseless comparisons.
center(width, fillchar=' ', /)¶Return a centered string of length width.
Padding is done using the specified fill character (default is a space).
count(sub[, start[, end]]) int¶Return the number of non-overlapping occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.
encode(encoding='utf-8', errors='strict')¶Encode the string using the codec registered for encoding.
encoding – The encoding in which to encode the string.
errors – The error handling scheme to use for encoding errors. The default is 'strict', meaning that encoding errors raise a UnicodeEncodeError. Other possible values are 'ignore', 'replace' and 'xmlcharrefreplace', as well as any other name registered with codecs.register_error that can handle UnicodeEncodeErrors.
endswith(suffix[, start[, end]]) bool¶Return True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. suffix can also be a tuple of strings to try.
expandtabs(tabsize=8)¶Return a copy where all tab characters are expanded using spaces.
If tabsize is not given, a tab size of 8 characters is assumed.
find(sub[, start[, end]]) int¶Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
format(*args, **kwargs) str¶Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces (‘{’ and ‘}’).
format_map(mapping) str¶Return a formatted version of S, using substitutions from mapping. The substitutions are identified by braces (‘{’ and ‘}’).
index(sub[, start[, end]]) int¶Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Raises ValueError when the substring is not found.
isalnum()¶Return True if the string is an alpha-numeric string, False otherwise.
A string is alpha-numeric if all characters in the string are alpha-numeric and there is at least one character in the string.
isalpha()¶Return True if the string is an alphabetic string, False otherwise.
A string is alphabetic if all characters in the string are alphabetic and there is at least one character in the string.
isascii()¶Return True if all characters in the string are ASCII, False otherwise.
ASCII characters have code points in the range U+0000-U+007F. Empty string is ASCII too.
isdecimal()¶Return True if the string is a decimal string, False otherwise.
A string is a decimal string if all characters in the string are decimal and there is at least one character in the string.
isdigit()¶Return True if the string is a digit string, False otherwise.
A string is a digit string if all characters in the string are digits and there is at least one character in the string.
isidentifier()¶Return True if the string is a valid Python identifier, False otherwise.
Call keyword.iskeyword(s) to test whether string s is a reserved identifier, such as “def” or “class”.
islower()¶Return True if the string is a lowercase string, False otherwise.
A string is lowercase if all cased characters in the string are lowercase and there is at least one cased character in the string.
isnumeric()¶Return True if the string is a numeric string, False otherwise.
A string is numeric if all characters in the string are numeric and there is at least one character in the string.
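The three numeric predicates above (isdecimal, isdigit, isnumeric) accept progressively wider sets of characters. A small sketch with three sample characters, chosen here only for illustration, makes the nesting visible.

```python
# Each row: (isdecimal, isdigit, isnumeric) for one sample character.
samples = ["5", "²", "½"]  # ASCII digit, superscript two, vulgar fraction one half

results = {s: (s.isdecimal(), s.isdigit(), s.isnumeric()) for s in samples}

for char, flags in results.items():
    print(repr(char), flags)
# '5' (True, True, True)
# '²' (False, True, True)
# '½' (False, False, True)
```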
isprintable()¶Return True if the string is printable, False otherwise.
A string is printable if all of its characters are considered printable in repr() or if it is empty.
isspace()¶Return True if the string is a whitespace string, False otherwise.
A string is whitespace if all characters in the string are whitespace and there is at least one character in the string.
istitle()¶Return True if the string is a title-cased string, False otherwise.
In a title-cased string, upper- and title-case characters may only follow uncased characters and lowercase characters only cased ones.
isupper()¶Return True if the string is an uppercase string, False otherwise.
A string is uppercase if all cased characters in the string are uppercase and there is at least one cased character in the string.
join(iterable, /)¶Concatenate any number of strings.
The string whose method is called is inserted in between each given string. The result is returned as a new string.
Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'
ljust(width, fillchar=' ', /)¶Return a left-justified string of length width.
Padding is done using the specified fill character (default is a space).
lower()¶Return a copy of the string converted to lowercase.
lstrip(chars=None, /)¶Return a copy of the string with leading whitespace removed.
If chars is given and not None, remove characters in chars instead.
static maketrans()¶Return a translation table usable for str.translate().
If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters to Unicode ordinals, strings or None. Character keys will be then converted to ordinals. If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y. If there is a third argument, it must be a string, whose characters will be mapped to None in the result.
partition(sep, /)¶Partition the string into three parts using the given separator.
This will search for the separator in the string. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.
If the separator is not found, returns a 3-tuple containing the original string and two empty strings.
removeprefix(prefix, /)¶Return a str with the given prefix string removed if present.
If the string starts with the prefix string, return string[len(prefix):]. Otherwise, return a copy of the original string.
removesuffix(suffix, /)¶Return a str with the given suffix string removed if present.
If the string ends with the suffix string and that suffix is not empty, return string[:-len(suffix)]. Otherwise, return a copy of the original string.
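A short sketch of removeprefix and removesuffix (available in Python 3.9 and later), contrasted with rstrip, which removes a set of characters rather than a suffix. The sample strings are illustrative only.

```python
model_id = "prebuilt-invoice"
without_prefix = model_id.removeprefix("prebuilt-")
unchanged = model_id.removeprefix("custom-")   # prefix absent: string is returned as-is
print(without_prefix)  # invoice
print(unchanged)       # prebuilt-invoice

file_name = "app.pdf"
without_suffix = file_name.removesuffix(".pdf")
print(without_suffix)  # app

# rstrip treats its argument as a set of characters, so it strips too much here.
stripped = file_name.rstrip(".pdf")
print(stripped)  # a
```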
replace(old, new, count=-1, /)¶Return a copy with all occurrences of substring old replaced by new.
count – Maximum number of occurrences to replace. -1 (the default value) means replace all occurrences.
If the optional argument count is given, only the first count occurrences are replaced.
rfind(sub[, start[, end]]) int¶Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
rindex(sub[, start[, end]]) int¶Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Raises ValueError when the substring is not found.
rjust(width, fillchar=' ', /)¶Return a right-justified string of length width.
Padding is done using the specified fill character (default is a space).
rpartition(sep, /)¶Partition the string into three parts using the given separator.
This will search for the separator in the string, starting at the end. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.
If the separator is not found, returns a 3-tuple containing two empty strings and the original string.
rsplit(sep=None, maxsplit=-1)¶Return a list of the substrings in the string, using sep as the separator string.
sep – The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit – Maximum number of splits. -1 (the default value) means no limit.
Splitting starts at the end of the string and works to the front.
rstrip(chars=None, /)¶Return a copy of the string with trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
split(sep=None, maxsplit=-1)¶Return a list of the substrings in the string, using sep as the separator string.
sep – The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit – Maximum number of splits. -1 (the default value) means no limit.
Splitting starts at the front of the string and works to the end.
Note, str.split() is mainly useful for data that has been intentionally delimited. With natural text that includes punctuation, consider using the regular expression module.
splitlines(keepends=False)¶Return a list of the lines in the string, breaking at line boundaries.
Line breaks are not included in the resulting list unless keepends is given and true.
startswith(prefix[, start[, end]]) bool¶Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. prefix can also be a tuple of strings to try.
strip(chars=None, /)¶Return a copy of the string with leading and trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
swapcase()¶Convert uppercase characters to lowercase and lowercase characters to uppercase.
title()¶Return a version of the string where each word is titlecased.
More specifically, words start with uppercased characters and all remaining cased characters have lower case.
translate(table, /)¶Replace each character in the string using the given translation table.
table – Translation table, which must be a mapping of Unicode ordinals to Unicode ordinals, strings, or None.
The table must implement lookup/indexing via __getitem__, for instance a dictionary or list. If this operation raises LookupError, the character is left untouched. Characters mapped to None are deleted.
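The translation table is usually built with str.maketrans. The sketch below shows the three-argument form and the dictionary form; the input strings are made up for the example.

```python
# Three arguments: map "abc" to "xyz" position by position, and delete "-".
table = str.maketrans("abc", "xyz", "-")
translated = "a-b-c-d".translate(table)
print(translated)  # xyzd  ("d" has no entry, so it is left untouched)

# One argument: a dict whose values may be strings or None (None deletes).
ascii_table = str.maketrans({"é": "e", "ß": "ss", "!": None})
ascii_text = "Straße café!".translate(ascii_table)
print(ascii_text)  # Strasse cafe
```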
upper()¶Return a copy of the string converted to uppercase.
zfill(width, /)¶Pad a numeric string with zeros on the left, to fill a field of the given width.
The string is never truncated.
V2022_08_31 = '2022-08-31'¶
V2023_07_31 = '2023-07-31'¶This is the default version.
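The inherited string methods listed for this enum suggest that its members behave like plain strings. The sketch below illustrates that pattern with a hypothetical stand-in class, ApiVersion, defined locally; it is not the SDK class, and it assumes the str-plus-Enum design.

```python
from enum import Enum


class ApiVersion(str, Enum):
    """Hypothetical stand-in for a string-valued version enum (not the SDK class)."""

    V2022_08_31 = "2022-08-31"
    V2023_07_31 = "2023-07-31"


version = ApiVersion.V2023_07_31

print(version.value)               # 2023-07-31
print(version == "2023-07-31")     # True: members compare equal to their string value
print(version.startswith("2023"))  # True: str methods work directly on a member
print(version.split("-"))          # ['2023', '07', '31']

# Looking a member up by value returns the same member object.
looked_up = ApiVersion("2022-08-31")
print(looked_up is ApiVersion.V2022_08_31)  # True
```

Under this assumption, either the enum member or the equivalent string could be supplied wherever an API version is expected.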
class azure.ai.formrecognizer.DocumentAnalysisClient(endpoint: str, credential: AzureKeyCredential | TokenCredential, **kwargs: Any)[source]¶DocumentAnalysisClient analyzes information from documents and images, and classifies documents. It is the interface to use for analyzing with prebuilt models (receipts, business cards, invoices, identity documents, among others), analyzing layout from documents, analyzing general document types, and analyzing custom documents with built models (to see a full list of models supported by the service, see: https://aka.ms/azsdk/formrecognizer/models). It provides different methods based on inputs from a URL and inputs from a stream.
Note
DocumentAnalysisClient should be used with API versions 2022-08-31 and up. To use API versions <=v2.1, instantiate a FormRecognizerClient.
Parameters: Keyword Arguments:api_version (str or DocumentAnalysisApiVersion) – The API version of the service to use for requests. It defaults to the latest service version. Setting to an older version may result in reduced feature compatibility. To use API versions <=v2.1, instantiate a FormRecognizerClient.
New in version 2022-08-31: The DocumentAnalysisClient and its client methods.
Example:
    import os
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.formrecognizer import DocumentAnalysisClient

    endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
    key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

    document_analysis_client = DocumentAnalysisClient(endpoint, AzureKeyCredential(key))

    """DefaultAzureCredential will use the values from these environment variables:
    AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET
    """
    import os
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.identity import DefaultAzureCredential

    endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
    credential = DefaultAzureCredential()

    document_analysis_client = DocumentAnalysisClient(endpoint, credential)

begin_analyze_document(model_id: str, document: bytes | IO[bytes], **kwargs: Any) LROPoller[AnalyzeResult] [source]¶
Analyze field text and semantic values from a given document.
Parameters: Keyword Arguments:pages (str) – Custom page numbers for multi-page documents (PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages="1-3, 5-6". Separate each page number or range with a comma.
locale (str) – Locale hint of the input document. See supported locales here: https://aka.ms/azsdk/formrecognizer/supportedlocales.
features (list[str]) – Document analysis features to enable.
Returns:An instance of an LROPoller. Call result() on the poller object to return an AnalyzeResult.
New in version 2023-07-31: The features keyword argument.
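The pages keyword expects a comma-separated string of page numbers and hyphenated ranges. The following sketch shows one way to build such a string; format_pages is a hypothetical helper, not part of the SDK, and it assumes page numbers start at 1.

```python
def format_pages(pages):
    """Hypothetical helper: build a pages string such as "1-3, 5-6".

    Each item is either a single page number (int) or a (start, end) tuple.
    Assumes 1-based page numbers.
    """
    parts = []
    for item in pages:
        if isinstance(item, tuple):
            start, end = item
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {item!r}")
            parts.append(f"{start}-{end}")
        else:
            if item < 1:
                raise ValueError(f"invalid page number: {item!r}")
            parts.append(str(item))
    return ", ".join(parts)


ranges_only = format_pages([(1, 3), (5, 6)])
mixed = format_pages([1, (4, 7), 10])
print(ranges_only)  # 1-3, 5-6
print(mixed)        # 1, 4-7, 10
```

The resulting string could then be passed as the pages keyword argument described above.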
Example:
    import os
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.formrecognizer import DocumentAnalysisClient

    endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
    key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

    document_analysis_client = DocumentAnalysisClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )
    # path_to_sample_documents is the path to a local invoice file, defined elsewhere in the sample.
    with open(path_to_sample_documents, "rb") as f:
        poller = document_analysis_client.begin_analyze_document(
            "prebuilt-invoice", document=f, locale="en-US"
        )
    invoices = poller.result()

    for idx, invoice in enumerate(invoices.documents):
        print(f"--------Analyzing invoice #{idx + 1}--------")
        vendor_name = invoice.fields.get("VendorName")
        if vendor_name:
            print(f"Vendor Name: {vendor_name.value} has confidence: {vendor_name.confidence}")
        vendor_address = invoice.fields.get("VendorAddress")
        if vendor_address:
            print(f"Vendor Address: {vendor_address.value} has confidence: {vendor_address.confidence}")
        vendor_address_recipient = invoice.fields.get("VendorAddressRecipient")
        if vendor_address_recipient:
            print(
                f"Vendor Address Recipient: {vendor_address_recipient.value} has confidence: {vendor_address_recipient.confidence}"
            )
        customer_name = invoice.fields.get("CustomerName")
        if customer_name:
            print(f"Customer Name: {customer_name.value} has confidence: {customer_name.confidence}")
        customer_id = invoice.fields.get("CustomerId")
        if customer_id:
            print(f"Customer Id: {customer_id.value} has confidence: {customer_id.confidence}")
        customer_address = invoice.fields.get("CustomerAddress")
        if customer_address:
            print(f"Customer Address: {customer_address.value} has confidence: {customer_address.confidence}")
        customer_address_recipient = invoice.fields.get("CustomerAddressRecipient")
        if customer_address_recipient:
            print(
                f"Customer Address Recipient: {customer_address_recipient.value} has confidence: {customer_address_recipient.confidence}"
            )
        invoice_id = invoice.fields.get("InvoiceId")
        if invoice_id:
            print(f"Invoice Id: {invoice_id.value} has confidence: {invoice_id.confidence}")
        invoice_date = invoice.fields.get("InvoiceDate")
        if invoice_date:
            print(f"Invoice Date: {invoice_date.value} has confidence: {invoice_date.confidence}")
        invoice_total = invoice.fields.get("InvoiceTotal")
        if invoice_total:
            print(f"Invoice Total: {invoice_total.value} has confidence: {invoice_total.confidence}")
        due_date = invoice.fields.get("DueDate")
        if due_date:
            print(f"Due Date: {due_date.value} has confidence: {due_date.confidence}")
        purchase_order = invoice.fields.get("PurchaseOrder")
        if purchase_order:
            print(f"Purchase Order: {purchase_order.value} has confidence: {purchase_order.confidence}")
        billing_address = invoice.fields.get("BillingAddress")
        if billing_address:
            print(f"Billing Address: {billing_address.value} has confidence: {billing_address.confidence}")
        billing_address_recipient = invoice.fields.get("BillingAddressRecipient")
        if billing_address_recipient:
            print(
                f"Billing Address Recipient: {billing_address_recipient.value} has confidence: {billing_address_recipient.confidence}"
            )
        shipping_address = invoice.fields.get("ShippingAddress")
        if shipping_address:
            print(f"Shipping Address: {shipping_address.value} has confidence: {shipping_address.confidence}")
        shipping_address_recipient = invoice.fields.get("ShippingAddressRecipient")
        if shipping_address_recipient:
            print(
                f"Shipping Address Recipient: {shipping_address_recipient.value} has confidence: {shipping_address_recipient.confidence}"
            )
        print("Invoice items:")
        for idx, item in enumerate(invoice.fields.get("Items").value):
            print(f"...Item #{idx + 1}")
            item_description = item.value.get("Description")
            if item_description:
                print(f"......Description: {item_description.value} has confidence: {item_description.confidence}")
            item_quantity = item.value.get("Quantity")
            if item_quantity:
                print(f"......Quantity: {item_quantity.value} has confidence: {item_quantity.confidence}")
            unit = item.value.get("Unit")
            if unit:
                print(f"......Unit: {unit.value} has confidence: {unit.confidence}")
            unit_price = item.value.get("UnitPrice")
            if unit_price:
                unit_price_code = unit_price.value.code if unit_price.value.code else ""
                print(
                    f"......Unit Price: {unit_price.value}{unit_price_code} has confidence: {unit_price.confidence}"
                )
            product_code = item.value.get("ProductCode")
            if product_code:
                print(f"......Product Code: {product_code.value} has confidence: {product_code.confidence}")
            item_date = item.value.get("Date")
            if item_date:
                print(f"......Date: {item_date.value} has confidence: {item_date.confidence}")
            tax = item.value.get("Tax")
            if tax:
                print(f"......Tax: {tax.value} has confidence: {tax.confidence}")
            amount = item.value.get("Amount")
            if amount:
                print(f"......Amount: {amount.value} has confidence: {amount.confidence}")
        subtotal = invoice.fields.get("SubTotal")
        if subtotal:
            print(f"Subtotal: {subtotal.value} has confidence: {subtotal.confidence}")
        total_tax = invoice.fields.get("TotalTax")
        if total_tax:
            print(f"Total Tax: {total_tax.value} has confidence: {total_tax.confidence}")
        previous_unpaid_balance = invoice.fields.get("PreviousUnpaidBalance")
        if previous_unpaid_balance:
            print(
                f"Previous Unpaid Balance: {previous_unpaid_balance.value} has confidence: {previous_unpaid_balance.confidence}"
            )
        amount_due = invoice.fields.get("AmountDue")
        if amount_due:
            print(f"Amount Due: {amount_due.value} has confidence: {amount_due.confidence}")
        service_start_date = invoice.fields.get("ServiceStartDate")
        if service_start_date:
            print(f"Service Start Date: {service_start_date.value} has confidence: {service_start_date.confidence}")
        service_end_date = invoice.fields.get("ServiceEndDate")
        if service_end_date:
            print(f"Service End Date: {service_end_date.value} has confidence: {service_end_date.confidence}")
        service_address = invoice.fields.get("ServiceAddress")
        if service_address:
            print(f"Service Address: {service_address.value} has confidence: {service_address.confidence}")
        service_address_recipient = invoice.fields.get("ServiceAddressRecipient")
        if service_address_recipient:
            print(
                f"Service Address Recipient: {service_address_recipient.value} has confidence: {service_address_recipient.confidence}"
            )
        remittance_address = invoice.fields.get("RemittanceAddress")
        if remittance_address:
            print(f"Remittance Address: {remittance_address.value} has confidence: {remittance_address.confidence}")
        remittance_address_recipient = invoice.fields.get("RemittanceAddressRecipient")
        if remittance_address_recipient:
            print(
                f"Remittance Address Recipient: {remittance_address_recipient.value} has confidence: {remittance_address_recipient.confidence}"
            )

    import os
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.formrecognizer import DocumentAnalysisClient

    endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
    key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]
    model_id = os.getenv("CUSTOM_BUILT_MODEL_ID", custom_model_id)

    document_analysis_client = DocumentAnalysisClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )

    # Make sure your document's type is included in the list of document types the custom model can analyze
    with open(path_to_sample_documents, "rb") as f:
        poller = document_analysis_client.begin_analyze_document(
            model_id=model_id, document=f
        )
    result = poller.result()

    for idx, document in enumerate(result.documents):
        print(f"--------Analyzing document #{idx + 1}--------")
        print(f"Document has type {document.doc_type}")
        print(f"Document has document type confidence {document.confidence}")
        print(f"Document was analyzed with model with ID {result.model_id}")
        for name, field in document.fields.items():
            field_value = field.value if field.value else field.content
            print(
                f"......found field of type '{field.value_type}' with value '{field_value}' and with confidence {field.confidence}"
            )

    # iterate over tables, lines, and selection marks on each page
    for page in result.pages:
        print(f"\nLines found on page {page.page_number}")
        for line in page.lines:
            print(f"...Line '{line.content}'")
        for word in page.words:
            print(f"...Word '{word.content}' has a confidence of {word.confidence}")
        if page.selection_marks:
            print(f"\nSelection marks found on page {page.page_number}")
            for selection_mark in page.selection_marks:
                print(
                    f"...Selection mark is '{selection_mark.state}' and has a confidence of {selection_mark.confidence}"
                )

    for i, table in enumerate(result.tables):
        print(f"\nTable {i + 1} can be found on page:")
        for region in table.bounding_regions:
            print(f"...{region.page_number}")
        for cell in table.cells:
            print(
                f"...Cell[{cell.row_index}][{cell.column_index}] has text '{cell.content}'"
            )
    print("-----------------------------------")

begin_analyze_document_from_url(model_id: str, document_url: str, **kwargs: Any) LROPoller[AnalyzeResult] [source]¶
Analyze field text and semantic values from a given document. The input must be the location (URL) of the document to be analyzed.
Parameters:model_id (str) – A unique model identifier can be passed in as a string. Use this to specify the custom model ID or prebuilt model ID. Prebuilt model IDs supported can be found here: https://aka.ms/azsdk/formrecognizer/models
document_url (str) – The URL of the document to analyze. The input must be a valid, properly encoded (i.e. encode special characters, such as empty spaces), and publicly accessible URL. For service supported file types, see: https://aka.ms/azsdk/formrecognizer/supportedfiles.
Keyword Arguments:pages (str) – Custom page numbers for multi-page documents (PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages="1-3, 5-6". Separate each page number or range with a comma.
locale (str) – Locale hint of the input document. See supported locales here: https://aka.ms/azsdk/formrecognizer/supportedlocales.
features (list[str]) – Document analysis features to enable.
Returns:An instance of an LROPoller. Call result() on the poller object to return an AnalyzeResult.
New in version 2023-07-31: The features keyword argument.
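The document_url parameter must be properly encoded, for example with spaces escaped. One way to prepare such a URL is with urllib.parse.quote from the standard library, as sketched below. encode_document_url is a hypothetical helper, and the example.com URL is a placeholder, not a real document.

```python
from urllib.parse import quote


def encode_document_url(url):
    """Hypothetical helper: percent-encode unsafe characters in a URL.

    URL delimiters are kept as-is, and "%" is treated as safe so that
    escapes already present in the URL are not encoded a second time.
    """
    return quote(url, safe=":/?&=#%")


raw_url = "https://example.com/sample forms/my receipt.png"  # placeholder URL
encoded_url = encode_document_url(raw_url)
print(encoded_url)  # https://example.com/sample%20forms/my%20receipt.png

# Encoding an already-encoded URL leaves it unchanged.
print(encode_document_url(encoded_url) == encoded_url)  # True
```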
Example:
    import os
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.formrecognizer import DocumentAnalysisClient

    endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
    key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

    document_analysis_client = DocumentAnalysisClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )
    url = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/formrecognizer/azure-ai-formrecognizer/tests/sample_forms/receipt/contoso-receipt.png"
    poller = document_analysis_client.begin_analyze_document_from_url(
        "prebuilt-receipt", document_url=url
    )
    receipts = poller.result()

    for idx, receipt in enumerate(receipts.documents):
        print(f"--------Analysis of receipt #{idx + 1}--------")
        print(f"Receipt type: {receipt.doc_type if receipt.doc_type else 'N/A'}")
        merchant_name = receipt.fields.get("MerchantName")
        if merchant_name:
            print(
                f"Merchant Name: {merchant_name.value} has confidence: "
                f"{merchant_name.confidence}"
            )
        transaction_date = receipt.fields.get("TransactionDate")
        if transaction_date:
            print(
                f"Transaction Date: {transaction_date.value} has confidence: "
                f"{transaction_date.confidence}"
            )
        if receipt.fields.get("Items"):
            print("Receipt items:")
            for idx, item in enumerate(receipt.fields.get("Items").value):
                print(f"...Item #{idx + 1}")
                item_description = item.value.get("Description")
                if item_description:
                    print(
                        f"......Item Description: {item_description.value} has confidence: "
                        f"{item_description.confidence}"
                    )
                item_quantity = item.value.get("Quantity")
                if item_quantity:
                    print(
                        f"......Item Quantity: {item_quantity.value} has confidence: "
                        f"{item_quantity.confidence}"
                    )
                item_price = item.value.get("Price")
                if item_price:
                    print(
                        f"......Individual Item Price: {item_price.value} has confidence: "
                        f"{item_price.confidence}"
                    )
                item_total_price = item.value.get("TotalPrice")
                if item_total_price:
                    print(
                        f"......Total Item Price: {item_total_price.value} has confidence: "
                        f"{item_total_price.confidence}"
                    )
        subtotal = receipt.fields.get("Subtotal")
        if subtotal:
            print(f"Subtotal: {subtotal.value} has confidence: {subtotal.confidence}")
        tax = receipt.fields.get("TotalTax")
        if tax:
            print(f"Total tax: {tax.value} has confidence: {tax.confidence}")
        tip = receipt.fields.get("Tip")
        if tip:
            print(f"Tip: {tip.value} has confidence: {tip.confidence}")
        total = receipt.fields.get("Total")
        if total:
            print(f"Total: {total.value} has confidence: {total.confidence}")
        print("--------------------------------------")

begin_classify_document(classifier_id: str, document: bytes | IO[bytes], **kwargs: Any) LROPoller[AnalyzeResult] [source]¶
Classify a document using a document classifier. For more information on how to build a custom classifier model, see https://aka.ms/azsdk/formrecognizer/buildclassifiermodel.
Parameters: Returns:An instance of an LROPoller. Call result() on the poller object to return an AnalyzeResult.
New in version 2023-07-31: The begin_classify_document client method.
Example:
    import os
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.formrecognizer import DocumentAnalysisClient

    endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
    key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]
    classifier_id = os.getenv("CLASSIFIER_ID", classifier_id)

    document_analysis_client = DocumentAnalysisClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )
    with open(path_to_sample_documents, "rb") as f:
        poller = document_analysis_client.begin_classify_document(
            classifier_id, document=f
        )
    result = poller.result()

    print("----Classified documents----")
    for doc in result.documents:
        print(
            f"Found document of type '{doc.doc_type or 'N/A'}' with a confidence of {doc.confidence} contained on "
            f"the following pages: {[region.page_number for region in doc.bounding_regions]}"
        )

begin_classify_document_from_url(classifier_id: str, document_url: str, **kwargs: Any) LROPoller[AnalyzeResult] [source]¶
Classify a given document with a document classifier. For more information on how to build a custom classifier model, see https://aka.ms/azsdk/formrecognizer/buildclassifiermodel. The input must be the location (URL) of the document to be classified.
Parameters:classifier_id (str) – A unique document classifier identifier can be passed in as a string.
document_url (str) – The URL of the document to classify. The input must be a valid, properly encoded (i.e. encode special characters, such as empty spaces), and publicly accessible URL of one of the supported formats: https://aka.ms/azsdk/formrecognizer/supportedfiles.
Returns:An instance of an LROPoller. Call result() on the poller object to return an AnalyzeResult.
New in version 2023-07-31: The begin_classify_document_from_url client method.
Example:
    import os
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.formrecognizer import DocumentAnalysisClient

    endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
    key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]
    classifier_id = os.getenv("CLASSIFIER_ID", classifier_id)

    document_analysis_client = DocumentAnalysisClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )

    url = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/formrecognizer/azure-ai-formrecognizer/tests/sample_forms/forms/IRS-1040.pdf"

    poller = document_analysis_client.begin_classify_document_from_url(
        classifier_id, document_url=url
    )
    result = poller.result()

    print("----Classified documents----")
    for doc in result.documents:
        print(
            f"Found document of type '{doc.doc_type or 'N/A'}' with a confidence of {doc.confidence} contained on "
            f"the following pages: {[region.page_number for region in doc.bounding_regions]}"
        )

close() None [source]¶
Close the DocumentAnalysisClient session.
send_request(request: HttpRequest, *, stream: bool = False, **kwargs) HttpResponse¶
Runs a network request using the client’s existing pipeline.
The request URL can be relative to the base URL. The service API version used for the request is the same as the client’s unless otherwise specified. Overriding the client’s configured API version in relative URL is supported on client with API version 2022-08-31 and later. Overriding in absolute URL supported on client with any API version. This method does not raise if the response is an error; to raise an exception, call raise_for_status() on the returned response object. For more information about how to send custom requests with this method, see https://aka.ms/azsdk/dpcodegen/python/send_request.
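Because this method does not raise on an error response, callers have to opt in to exceptions themselves. The sketch below illustrates that pattern with stand-in classes so it runs without the SDK installed; StandInResponse, StandInClient, and send_checked are hypothetical names, and the stand-in raises RuntimeError where a real response would raise the library's own HTTP error type.

```python
class StandInResponse:
    """Stand-in for the returned response object (illustration only)."""

    def __init__(self, status_code: int) -> None:
        self.status_code = status_code

    def raise_for_status(self) -> None:
        # Raise only for error status codes, mirroring the documented behavior.
        if self.status_code >= 400:
            raise RuntimeError(f"HTTP error {self.status_code}")


class StandInClient:
    """Stand-in for a client exposing send_request (illustration only)."""

    def __init__(self, status_code: int) -> None:
        self._status_code = status_code

    def send_request(self, request, *, stream: bool = False, **kwargs):
        # Never raises by itself, even when the status code is an error.
        return StandInResponse(self._status_code)


def send_checked(client, request, **kwargs):
    """Send a request and turn an error response into an exception."""
    response = client.send_request(request, **kwargs)
    response.raise_for_status()
    return response


ok_response = send_checked(StandInClient(200), request=None)
print(ok_response.status_code)

try:
    send_checked(StandInClient(404), request=None)
except RuntimeError as exc:
    print(f"Request failed: {exc}")
```

With a real client, the same send_checked helper would be called with the client instance and an HttpRequest; only the exception type caught would differ.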
Parameters:request (HttpRequest) – The network request you want to make.
Keyword Arguments:stream (bool) – Whether the response payload will be streamed. Defaults to False.
Returns:The response of your network call. Does not do error handling on your response.
Return type:HttpResponse
class azure.ai.formrecognizer.DocumentAnalysisError(**kwargs: Any)[source]¶DocumentAnalysisError contains the details of the error returned by the service.
classmethod from_dict(data: Dict) DocumentAnalysisError [source]¶Converts a dict in the shape of a DocumentAnalysisError to the model itself.
Parameters:data (dict) – A dictionary in the shape of DocumentAnalysisError.
Returns:DocumentAnalysisError
Return type:DocumentAnalysisError
to_dict() Dict [source]¶Returns a dict representation of DocumentAnalysisError.
Returns:dict
Return type:dict
code: str¶Error code.
details: List[DocumentAnalysisError] | None¶List of detailed errors.
innererror: DocumentAnalysisInnerError | None¶Detailed error.
message: str¶Error message.
target: str | None¶Target of the error.
class azure.ai.formrecognizer.DocumentAnalysisInnerError(**kwargs: Any)[source]¶Inner error details for the DocumentAnalysisError.
classmethod from_dict(data: Dict) DocumentAnalysisInnerError [source]¶Converts a dict in the shape of a DocumentAnalysisInnerError to the model itself.
Parameters:data (dict) – A dictionary in the shape of DocumentAnalysisInnerError.
Returns:DocumentAnalysisInnerError
Return type:DocumentAnalysisInnerError
to_dict() Dict [source]¶Returns a dict representation of DocumentAnalysisInnerError.
Returns:dict
Return type:dict
code: str¶Error code.
innererror: DocumentAnalysisInnerError | None¶Detailed error.
message: str | None¶Error message.
class azure.ai.formrecognizer.DocumentBarcode(**kwargs: Any)[source]¶A barcode object.
classmethod from_dict(data: Dict[str, Any]) DocumentBarcode [source]¶Converts a dict in the shape of a DocumentBarcode to the model itself.
Parameters:data (dict) – A dictionary in the shape of DocumentBarcode.
Returns:DocumentBarcode
Return type:DocumentBarcode
to_dict() Dict[str, Any] [source]¶Returns a dict representation of DocumentBarcode.
Returns:Dict[str, Any]
Return type:Dict[str, Any]
confidence: float¶Confidence of correctly extracting the barcode.
kind: Literal['QRCode', 'PDF417', 'UPCA', 'UPCE', 'Code39', 'Code128', 'EAN8', 'EAN13', 'DataBar', 'Code93', 'Codabar', 'DataBarExpanded', 'ITF', 'MicroQRCode', 'Aztec', 'DataMatrix', 'MaxiCode']¶Barcode kind. Known values are “QRCode”, “PDF417”, “UPCA”, “UPCE”, “Code39”, “Code128”, “EAN8”, “EAN13”, “DataBar”, “Code93”, “Codabar”, “DataBarExpanded”, “ITF”, “MicroQRCode”, “Aztec”, “DataMatrix”, “MaxiCode”.
polygon: Sequence[Point]¶Bounding polygon of the barcode.
span: DocumentSpan¶Location of the barcode in the reading order concatenated content.
value: str¶Barcode value.
class azure.ai.formrecognizer.DocumentClassifierDetails(**kwargs: Any)[source]¶Document classifier information. Includes the doc types that the model can classify.
classmethod from_dict(data: Dict[str, Any]) DocumentClassifierDetails [source]¶Converts a dict in the shape of a DocumentClassifierDetails to the model itself.
Parameters:data (dict) – A dictionary in the shape of DocumentClassifierDetails.
Returns:DocumentClassifierDetails
Return type:DocumentClassifierDetails
to_dict() Dict[str, Any] [source]¶Returns a dict representation of DocumentClassifierDetails.
Returns:Dict[str, Any]
Return type:Dict[str, Any]
api_version: str¶API version used to create this document classifier.
classifier_id: str¶Unique document classifier name.
created_on: datetime¶Date and time (UTC) when the document classifier was created.
description: str | None¶Document classifier description.
doc_types: Mapping[str, ClassifierDocumentTypeDetails]¶List of document types to classify against.
expires_on: datetime | None¶Date and time (UTC) when the document classifier will expire.
class azure.ai.formrecognizer.DocumentField(**kwargs: Any)[source]¶An object representing the content and location of a document field value.
New in version 2023-07-31: The boolean value_type and bool value
classmethod from_dict(data: Dict) DocumentField [source]¶Converts a dict in the shape of a DocumentField to the model itself.
Parameters:data (dict) – A dictionary in the shape of DocumentField.
Returns:DocumentField
Return type:DocumentField
to_dict() Dict [source]¶Returns a dict representation of DocumentField.
Returns:dict
Return type:dict
bounding_regions: List[BoundingRegion] | None¶Bounding regions covering the field.
confidence: float¶The confidence of correctly extracting the field.
content: str | None¶The field’s content.
spans: List[DocumentSpan] | None¶Location of the field in the reading order concatenated content.
value: str | int | float | bool | date | time | CurrencyValue | AddressValue | Dict[str, DocumentField] | List[DocumentField] | None¶The value for the recognized field. Its semantic data type is described by value_type. If the value is extracted from the document, but cannot be normalized to its type, then access the content property for a textual representation of the value.
value_type: str¶The type of value found on DocumentField. Possible types include: “string”, “date”, “time”, “phoneNumber”, “float”, “integer”, “selectionMark”, “countryRegion”, “signature”, “currency”, “address”, “boolean”, “list”, “dictionary”.
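The fallback from value to content described above can be wrapped in a small helper. This is a sketch that relies only on the value and content attributes; the SimpleNamespace objects are stand-ins for real DocumentField instances, and display_value is a hypothetical name.

```python
from types import SimpleNamespace


def display_value(field):
    """Return the normalized value when present, otherwise the raw content."""
    # Compare against None explicitly so that legitimate falsy values
    # (False, 0, empty string) are still returned as values.
    if field.value is not None:
        return field.value
    return field.content


# Stand-ins for DocumentField objects (illustration only).
normalized = SimpleNamespace(value_type="float", value=12.5, content="12.50")
unchecked = SimpleNamespace(value_type="boolean", value=False, content=":unselected:")
not_normalized = SimpleNamespace(value_type="date", value=None, content="early May")

for field in (normalized, unchecked, not_normalized):
    print(f"{field.value_type}: {display_value(field)!r}")
```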
class azure.ai.formrecognizer.DocumentFormula(**kwargs: Any)[source]¶A formula object.
classmethod from_dict(data: Dict[str, Any]) DocumentFormula [source]¶Converts a dict in the shape of a DocumentFormula to the model itself.
Parameters:data (dict) – A dictionary in the shape of DocumentFormula.
Returns:DocumentFormula
Return type:DocumentFormula
to_dict() Dict[str, Any] [source]¶Returns a dict representation of DocumentFormula.
Returns:Dict[str, Any]
Return type:Dict[str, Any]
confidence: float¶Confidence of correctly extracting the formula.
kind: Literal['inline', 'display']¶Formula kind. Known values are “inline”, “display”.
polygon: Sequence[Point]¶Bounding polygon of the formula.
span: DocumentSpan¶Location of the formula in the reading order concatenated content.
value: str¶LaTeX expression describing the formula.
class azure.ai.formrecognizer.DocumentKeyValueElement(**kwargs: Any)[source]¶An object representing the field key or value in a key-value pair.
classmethod from_dict(data: Dict) DocumentKeyValueElement [source]¶Converts a dict in the shape of a DocumentKeyValueElement to the model itself.
Parameters:data (dict) – A dictionary in the shape of DocumentKeyValueElement.
Returns:DocumentKeyValueElement
Return type:DocumentKeyValueElement
to_dict() Dict [source]¶Returns a dict representation of DocumentKeyValueElement.
Returns:dict
Return type:dict
bounding_regions: List[BoundingRegion] | None¶Bounding regions covering the key-value element.
content: str¶Concatenated content of the key-value element in reading order.
spans: List[DocumentSpan]¶Location of the key-value element in the reading order of the concatenated content.
class azure.ai.formrecognizer.DocumentKeyValuePair(**kwargs: Any)[source]¶An object representing a document field with distinct field label (key) and field value (may be empty).
classmethod from_dict(data: Dict) DocumentKeyValuePair [source]¶Converts a dict in the shape of a DocumentKeyValuePair to the model itself.
Parameters:data (dict) – A dictionary in the shape of DocumentKeyValuePair.
Returns:DocumentKeyValuePair
Return type:DocumentKeyValuePair
to_dict() Dict [source]¶Returns a dict representation of DocumentKeyValuePair.
Returns:dict
Return type:dict
confidence: float¶Confidence of correctly extracting the key-value pair.
key: DocumentKeyValueElement¶Field label of the key-value pair.
value: DocumentKeyValueElement | None¶Field value of the key-value pair.
class azure.ai.formrecognizer.DocumentLanguage(**kwargs: Any)[source]¶An object representing the detected language for a given text span.
classmethod from_dict(data: Dict) DocumentLanguage [source]¶Converts a dict in the shape of a DocumentLanguage to the model itself.
Parameters:data (dict) – A dictionary in the shape of DocumentLanguage.
Returns:DocumentLanguage
Return type:DocumentLanguage
to_dict() Dict [source]¶Returns a dict representation of DocumentLanguage.
Returns:dict
Return type:dict
confidence: float¶Confidence of correctly identifying the language.
locale: str¶Detected language code. Value may be an ISO 639-1 language code (ex. “en”, “fr”) or a BCP 47 language tag (ex. “zh-Hans”).
spans: List[DocumentSpan]¶Location of the text elements in the concatenated content that the language applies to.
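Spans point into the concatenated content, so the text a language applies to can be recovered by slicing. The sketch below assumes each span exposes offset and length attributes and that offsets count Python string characters; both are assumptions not stated in this section, and the SimpleNamespace objects are stand-ins for DocumentLanguage and DocumentSpan.

```python
from types import SimpleNamespace


def text_for_spans(content, spans):
    """Slice the concatenated content once per span."""
    return [content[span.offset : span.offset + span.length] for span in spans]


# Stand-in data (illustration only).
content = "Hello world. Bonjour le monde."
languages = [
    SimpleNamespace(locale="en", confidence=0.95,
                    spans=[SimpleNamespace(offset=0, length=12)]),
    SimpleNamespace(locale="fr", confidence=0.90,
                    spans=[SimpleNamespace(offset=13, length=17)]),
]

for language in languages:
    print(language.locale, text_for_spans(content, language.spans))
```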
class azure.ai.formrecognizer.DocumentLine(**kwargs: Any)[source]¶A content line object representing the content found on a single line of the document.
classmethod from_dict(data: Dict) DocumentLine [source]¶Converts a dict in the shape of a DocumentLine to the model itself.
Parameters:data (dict) – A dictionary in the shape of DocumentLine.
Returns:DocumentLine
Return type:DocumentLine
get_words() Iterable[DocumentWord] [source]¶Get the words found in the spans of this DocumentLine.
Returns:iterable[DocumentWord]
Return type:iterable[DocumentWord]
to_dict() Dict [source]¶Returns a dict representation of DocumentLine.
Returns:dict
Return type:dict
content: str¶Concatenated content of the contained elements in reading order.
polygon: Sequence[Point]¶Bounding polygon of the line.
spans: List[DocumentSpan]¶Location of the line in the reading order concatenated content.
class azure.ai.formrecognizer.DocumentModelAdministrationClient(endpoint: str, credential: AzureKeyCredential | TokenCredential, **kwargs: Any)[source]¶DocumentModelAdministrationClient is the Form Recognizer interface to use for building and managing models.
It provides methods for building models and classifiers, as well as methods for viewing and deleting models and classifiers, viewing model and classifier operations, accessing account information, copying models to another Form Recognizer resource, and composing a new model from a collection of existing models.
Note
DocumentModelAdministrationClient should be used with API versions 2022-08-31 and up. To use API versions <=v2.1, instantiate a FormTrainingClient.
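Parameters:endpoint (str) – The endpoint of the Form Recognizer resource.
credential (AzureKeyCredential or TokenCredential) – The credential used to authenticate the client.
Keyword Arguments:api_version (str or DocumentAnalysisApiVersion) – The API version of the service to use for requests. It defaults to the latest service version. Setting to an older version may result in reduced feature compatibility. To use API versions <=v2.1, instantiate a FormTrainingClient.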
New in version 2022-08-31: The DocumentModelAdministrationClient and its client methods.
Example:
    import os

    from azure.core.credentials import AzureKeyCredential
    from azure.ai.formrecognizer import DocumentModelAdministrationClient

    endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
    key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

    document_model_admin_client = DocumentModelAdministrationClient(
        endpoint, AzureKeyCredential(key)
    )

    """DefaultAzureCredential will use the values from these environment
    variables: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET
    """
    import os

    from azure.ai.formrecognizer import DocumentModelAdministrationClient
    from azure.identity import DefaultAzureCredential

    endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
    credential = DefaultAzureCredential()

    document_model_admin_client = DocumentModelAdministrationClient(
        endpoint, credential
    )
begin_build_document_classifier(doc_types: Mapping[str, ClassifierDocumentTypeDetails], *, classifier_id: str | None = None, description: str | None = None, **kwargs: Any) DocumentModelAdministrationLROPoller[DocumentClassifierDetails] [source]¶
Build a document classifier. For more information on how to build and train a custom classifier model, see https://aka.ms/azsdk/formrecognizer/buildclassifiermodel.
Parameters:doc_types (Mapping[str, ClassifierDocumentTypeDetails]) – Mapping of document types to classify against.
Keyword Arguments:classifier_id (str) – Unique document classifier name. If not specified, a classifier ID will be created for you.
description (str) – Document classifier description.
Returns:An instance of a DocumentModelAdministrationLROPoller. Call result() on the poller object to return a DocumentClassifierDetails.
Return type:DocumentModelAdministrationLROPoller[DocumentClassifierDetails]
Raises:
New in version 2023-07-31: The begin_build_document_classifier client method.
Example:
    import os

    from azure.ai.formrecognizer import (
        DocumentModelAdministrationClient,
        ClassifierDocumentTypeDetails,
        BlobSource,
        BlobFileListSource,
    )
    from azure.core.credentials import AzureKeyCredential

    endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
    key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]
    container_sas_url = os.environ["CLASSIFIER_CONTAINER_SAS_URL"]

    document_model_admin_client = DocumentModelAdministrationClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )

    poller = document_model_admin_client.begin_build_document_classifier(
        doc_types={
            "IRS-1040-A": ClassifierDocumentTypeDetails(
                source=BlobSource(
                    container_url=container_sas_url, prefix="IRS-1040-A/train"
                )
            ),
            "IRS-1040-D": ClassifierDocumentTypeDetails(
                source=BlobFileListSource(
                    container_url=container_sas_url, file_list="IRS-1040-D.jsonl"
                )
            ),
        },
        description="IRS document classifier",
    )
    result = poller.result()
    print(f"Classifier ID: {result.classifier_id}")
    print(f"API version used to build the classifier model: {result.api_version}")
    print(f"Classifier description: {result.description}")
    print("Document classes used for training the model:")
    for doc_type, details in result.doc_types.items():
        print(f"Document type: {doc_type}")
        print(f"Container source: {details.source.container_url}\n")
begin_build_document_model(build_mode: str | ModelBuildMode, *, blob_container_url: str, prefix: str | None = None, model_id: str | None = None, description: str | None = None, tags: Mapping[str, str] | None = None, **kwargs: Any) DocumentModelAdministrationLROPoller[DocumentModelDetails] [source]¶
begin_build_document_model(build_mode: str | ModelBuildMode, *, blob_container_url: str, file_list: str, model_id: str | None = None, description: str | None = None, tags: Mapping[str, str] | None = None, **kwargs: Any) DocumentModelAdministrationLROPoller[DocumentModelDetails]
Build a custom document model.
The request must include a blob_container_url keyword parameter that is an externally accessible Azure storage blob container URI (preferably a Shared Access Signature URI). A container URI without SAS is accepted only when the container is public or has a managed identity configured. For more about configuring managed identities to work with Form Recognizer, see https://docs.microsoft.com/azure/applied-ai-services/form-recognizer/managed-identities. Models are built from documents of the following content types: ‘application/pdf’, ‘image/jpeg’, ‘image/png’, ‘image/tiff’, ‘image/bmp’, or ‘image/heif’. Other types of content in the container are ignored.
Parameters:build_mode (str or ModelBuildMode) – The custom model build mode. Possible values include: “template”, “neural”. For more information about build modes, see: https://aka.ms/azsdk/formrecognizer/buildmode.
Keyword Arguments:blob_container_url (str) – An Azure Storage blob container’s SAS URI. A container URI (without SAS) can be used if the container is public or has a managed identity configured. For more information on setting up a training data set, see: https://aka.ms/azsdk/formrecognizer/buildtrainingset.
model_id (str) – A unique ID for your model. If not specified, a model ID will be created for you.
description (str) – An optional description to add to the model.
prefix (str) – A case-sensitive prefix string to filter documents in the blob container URL path. For example, when using an Azure storage blob URI, use the prefix to restrict the build to a subfolder. The prefix should end in ‘/’ to avoid matching other files or folders whose names begin with the same characters.
file_list (str) – Path to a JSONL file within the container specifying a subset of documents for training.
tags (dict[str, str]) – User-defined key-value tag attributes associated with the model.
Returns:An instance of a DocumentModelAdministrationLROPoller. Call result() on the poller object to return a DocumentModelDetails.
Return type:DocumentModelAdministrationLROPoller[DocumentModelDetails]
Raises:
New in version 2023-07-31: The file_list keyword argument.
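The advice that prefix should end in ‘/’ is easiest to see with a small simulation. The sketch below imitates prefix matching on the client side with str.startswith; the blob names and the filter_by_prefix helper are made up for illustration, and the service's own filtering is only assumed to behave like a plain prefix match.

```python
def filter_by_prefix(blob_names, prefix):
    """Keep the blob names that begin with the given case-sensitive prefix."""
    return [name for name in blob_names if name.startswith(prefix)]


# Hypothetical container listing.
blob_names = [
    "invoices/a.pdf",
    "invoices/b.pdf",
    "invoices-archive/old.pdf",
    "receipts/c.pdf",
]

# Without the trailing slash, the sibling folder is matched as well.
print(filter_by_prefix(blob_names, "invoices"))

# With the trailing slash, only the intended folder is matched.
print(filter_by_prefix(blob_names, "invoices/"))
```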
Example:
    import os

    from azure.ai.formrecognizer import (
        DocumentModelAdministrationClient,
        ModelBuildMode,
    )
    from azure.core.credentials import AzureKeyCredential

    endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
    key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]
    container_sas_url = os.environ["CONTAINER_SAS_URL"]

    document_model_admin_client = DocumentModelAdministrationClient(
        endpoint, AzureKeyCredential(key)
    )
    poller = document_model_admin_client.begin_build_document_model(
        ModelBuildMode.TEMPLATE,
        blob_container_url=container_sas_url,
        description="my model description",
    )
    model = poller.result()

    print(f"Model ID: {model.model_id}")
    print(f"Description: {model.description}")
    print(f"Model created on: {model.created_on}")
    print(f"Model expires on: {model.expires_on}")
    print("Doc types the model can recognize:")
    for name, doc_type in model.doc_types.items():
        print(
            f"Doc Type: '{name}' built with '{doc_type.build_mode}' mode which has the following fields:"
        )
        for field_name, field in doc_type.field_schema.items():
            print(
                f"Field: '{field_name}' has type '{field['type']}' and confidence score "
                f"{doc_type.field_confidence[field_name]}"
            )
begin_compose_document_model(component_model_ids: List[str], **kwargs: Any) DocumentModelAdministrationLROPoller[DocumentModelDetails] [source]¶
Creates a composed document model from a collection of existing models.
A composed model allows multiple models to be called with a single model ID. When a document is submitted to be analyzed with a composed model ID, a classification step is first performed to route it to the correct custom model.
Parameters:component_model_ids (list[str]) – List of model IDs to use in the composed model.
Keyword Arguments:model_id (str) – A unique ID for your composed model. If not specified, a model ID will be created for you.
description (str) – An optional description to add to the model.
tags (dict[str, str]) – List of user defined key-value tag attributes associated with the model.
Returns:An instance of a DocumentModelAdministrationLROPoller. Call result() on the poller object to return a DocumentModelDetails.
Return type:DocumentModelAdministrationLROPoller[DocumentModelDetails]
Raises:
Example:
    import os

    from azure.core.credentials import AzureKeyCredential
    from azure.ai.formrecognizer import (
        DocumentModelAdministrationClient,
        ModelBuildMode,
    )

    endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
    key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]
    po_supplies = os.environ["PURCHASE_ORDER_OFFICE_SUPPLIES_SAS_URL"]
    po_equipment = os.environ["PURCHASE_ORDER_OFFICE_EQUIPMENT_SAS_URL"]
    po_furniture = os.environ["PURCHASE_ORDER_OFFICE_FURNITURE_SAS_URL"]
    po_cleaning_supplies = os.environ["PURCHASE_ORDER_OFFICE_CLEANING_SUPPLIES_SAS_URL"]

    document_model_admin_client = DocumentModelAdministrationClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )
    # Start all four builds first so they can run concurrently on the service.
    supplies_poller = document_model_admin_client.begin_build_document_model(
        ModelBuildMode.TEMPLATE,
        blob_container_url=po_supplies,
        description="Purchase order-Office supplies",
    )
    equipment_poller = document_model_admin_client.begin_build_document_model(
        ModelBuildMode.TEMPLATE,
        blob_container_url=po_equipment,
        description="Purchase order-Office Equipment",
    )
    furniture_poller = document_model_admin_client.begin_build_document_model(
        ModelBuildMode.TEMPLATE,
        blob_container_url=po_furniture,
        description="Purchase order-Furniture",
    )
    cleaning_supplies_poller = document_model_admin_client.begin_build_document_model(
        ModelBuildMode.TEMPLATE,
        blob_container_url=po_cleaning_supplies,
        description="Purchase order-Cleaning Supplies",
    )
    supplies_model = supplies_poller.result()
    equipment_model = equipment_poller.result()
    furniture_model = furniture_poller.result()
    cleaning_supplies_model = cleaning_supplies_poller.result()

    purchase_order_models = [
        supplies_model.model_id,
        equipment_model.model_id,
        furniture_model.model_id,
        cleaning_supplies_model.model_id,
    ]

    poller = document_model_admin_client.begin_compose_document_model(
        purchase_order_models, description="Office Supplies Composed Model"
    )
    model = poller.result()

    print("Office Supplies Composed Model Info:")
    print(f"Model ID: {model.model_id}")
    print(f"Description: {model.description}")
    print(f"Model created on: {model.created_on}")
    print(f"Model expires on: {model.expires_on}")
    print("Doc types the model can recognize:")
    for name, doc_type in model.doc_types.items():
        print(f"Doc Type: '{name}' which has the following fields:")
        for field_name, field in doc_type.field_schema.items():
            print(
                f"Field: '{field_name}' has type '{field['type']}' and confidence score "
                f"{doc_type.field_confidence[field_name]}"
            )
begin_copy_document_model_to(model_id: str, target: TargetAuthorization, **kwargs: Any) DocumentModelAdministrationLROPoller[DocumentModelDetails] [source]¶
Copy a document model stored in this resource (the source) to the user specified target Form Recognizer resource.
This should be called with the source Form Recognizer resource (with the model that is intended to be copied). The target parameter should be supplied from the target resource’s output from calling the get_copy_authorization() method.
Parameters:model_id (str) – Model identifier of the model to copy to target resource.
target (TargetAuthorization) – The copy authorization generated from the target resource’s call to get_copy_authorization().
Returns:An instance of a DocumentModelAdministrationLROPoller. Call result() on the poller object to return a DocumentModelDetails.
Return type:DocumentModelAdministrationLROPoller[DocumentModelDetails]
Raises:
Example:
    import os

    from azure.core.credentials import AzureKeyCredential
    from azure.ai.formrecognizer import DocumentModelAdministrationClient

    source_endpoint = os.environ["AZURE_FORM_RECOGNIZER_SOURCE_ENDPOINT"]
    source_key = os.environ["AZURE_FORM_RECOGNIZER_SOURCE_KEY"]
    target_endpoint = os.environ["AZURE_FORM_RECOGNIZER_TARGET_ENDPOINT"]
    target_key = os.environ["AZURE_FORM_RECOGNIZER_TARGET_KEY"]
    # The fallback custom_model_id is expected to be defined by the caller.
    source_model_id = os.getenv("AZURE_SOURCE_MODEL_ID", custom_model_id)

    target_client = DocumentModelAdministrationClient(
        endpoint=target_endpoint, credential=AzureKeyCredential(target_key)
    )

    target = target_client.get_copy_authorization(
        description="model copied from other resource"
    )

    source_client = DocumentModelAdministrationClient(
        endpoint=source_endpoint, credential=AzureKeyCredential(source_key)
    )
    poller = source_client.begin_copy_document_model_to(
        model_id=source_model_id,
        target=target,  # output from target client's call to get_copy_authorization()
    )
    copied_over_model = poller.result()

    print(f"Model ID: {copied_over_model.model_id}")
    print(f"Description: {copied_over_model.description}")
    print(f"Model created on: {copied_over_model.created_on}")
    print(f"Model expires on: {copied_over_model.expires_on}")
    print("Doc types the model can recognize:")
    for name, doc_type in copied_over_model.doc_types.items():
        print(f"Doc Type: '{name}' which has the following fields:")
        for field_name, field in doc_type.field_schema.items():
            print(
                f"Field: '{field_name}' has type '{field['type']}' and confidence score "
                f"{doc_type.field_confidence[field_name]}"
            )
close() None [source]¶
Close the DocumentModelAdministrationClient session.
delete_document_classifier(classifier_id: str, **kwargs: Any) None [source]¶
Delete a document classifier.
Parameters:classifier_id (str) – Classifier identifier.
Returns:None
Return type:None
Raises:HttpResponseError or ResourceNotFoundError –
New in version 2023-07-31: The delete_document_classifier client method.
Example:
    from azure.core.exceptions import ResourceNotFoundError

    # my_classifier is a DocumentClassifierDetails retrieved earlier.
    document_model_admin_client.delete_document_classifier(
        classifier_id=my_classifier.classifier_id
    )

    try:
        document_model_admin_client.get_document_classifier(
            classifier_id=my_classifier.classifier_id
        )
    except ResourceNotFoundError:
        print(f"Successfully deleted classifier with ID {my_classifier.classifier_id}")
delete_document_model(model_id: str, **kwargs: Any) None [source]¶
Delete a custom document model.
Parameters:model_id (str) – Model identifier.
Returns:None
Return type:None
Raises:HttpResponseError or ResourceNotFoundError –
Example:
    from azure.core.exceptions import ResourceNotFoundError

    # my_model is a DocumentModelDetails retrieved earlier.
    document_model_admin_client.delete_document_model(model_id=my_model.model_id)

    try:
        document_model_admin_client.get_document_model(model_id=my_model.model_id)
    except ResourceNotFoundError:
        print(f"Successfully deleted model with ID {my_model.model_id}")
get_copy_authorization(**kwargs: Any) TargetAuthorization [source]¶
Generate authorization for copying a custom model into the target Form Recognizer resource.
This should be called by the target resource (where the model will be copied to) and the output can be passed as the target parameter into begin_copy_document_model_to().
Keyword Arguments:model_id (str) – A unique ID for your copied model. If not specified, a model ID will be created for you.
description (str) – An optional description to add to the model.
tags (dict[str, str]) – List of user defined key-value tag attributes associated with the model.
Returns:A dictionary with values necessary for the copy authorization.
Return type:TargetAuthorization
Raises:
get_document_analysis_client(**kwargs: Any) DocumentAnalysisClient [source]¶Get an instance of a DocumentAnalysisClient from DocumentModelAdministrationClient.
Returns:A DocumentAnalysisClient
Return type:DocumentAnalysisClient
get_document_classifier(classifier_id: str, **kwargs: Any) DocumentClassifierDetails [source]¶Get a document classifier by its ID.
Parameters:classifier_id (str) – Classifier identifier.
Returns:DocumentClassifierDetails
Return type:DocumentClassifierDetails
Raises:HttpResponseError or ResourceNotFoundError –
New in version 2023-07-31: The get_document_classifier client method.
Example:
    my_classifier = document_model_admin_client.get_document_classifier(
        classifier_id=classifier_model.classifier_id
    )
    print(f"\nClassifier ID: {my_classifier.classifier_id}")
    print(f"Description: {my_classifier.description}")
    print(f"Classifier created on: {my_classifier.created_on}")
get_document_model(model_id: str, **kwargs: Any) DocumentModelDetails [source]¶
Get a document model by its ID.
Parameters:model_id (str) – Model identifier.
Returns:DocumentModelDetails
Return type:DocumentModelDetails
Raises:HttpResponseError or ResourceNotFoundError –
Example:
    my_model = document_model_admin_client.get_document_model(model_id=model.model_id)
    print(f"\nModel ID: {my_model.model_id}")
    print(f"Description: {my_model.description}")
    print(f"Model created on: {my_model.created_on}")
    print(f"Model expires on: {my_model.expires_on}")
get_operation(operation_id: str, **kwargs: Any) OperationDetails [source]¶
Get an operation by its ID.
Get an operation associated with the Form Recognizer resource. Note that operation information only persists for 24 hours. If the document model operation was successful, the model can be accessed using the get_document_model() or list_document_models() APIs.
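Since operation information is kept for only 24 hours, a caller may want to check whether a stored operation ID is still worth looking up. The sketch below assumes the window is measured from the operation's creation time, which this section does not state; the helper name and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Retention window taken from the description above.
OPERATION_RETENTION = timedelta(hours=24)


def operation_info_may_be_gone(created_on, now=None):
    """Return True when the operation is older than the retention window.

    Assumes the window starts at created_on (an assumption, see lead-in).
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - created_on >= OPERATION_RETENTION


# Fixed timestamps keep the example deterministic.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
recent = datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)
stale = datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)

print(operation_info_may_be_gone(recent, now))  # False
print(operation_info_may_be_gone(stale, now))   # True
```

When the check returns True, looking the model up by its ID instead of by operation ID is the safer path, as the description suggests.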
Parameters:operation_id (str) – The operation ID.
Returns:OperationDetails
Return type:OperationDetails
Raises:
Example:
    # Get an operation by ID
    if operations:
        print(f"\nGetting operation info by ID: {operations[0].operation_id}")
        operation_info = document_model_admin_client.get_operation(
            operations[0].operation_id
        )
        if operation_info.status == "succeeded":
            print(f"My {operation_info.kind} operation is completed.")
            result = operation_info.result
            if result is not None:
                if operation_info.kind == "documentClassifierBuild":
                    print(f"Classifier ID: {result.classifier_id}")
                else:
                    print(f"Model ID: {result.model_id}")
        elif operation_info.status == "failed":
            print(f"My {operation_info.kind} operation failed.")
            error = operation_info.error
            if error is not None:
                print(f"{error.code}: {error.message}")
        else:
            print(f"My operation status is {operation_info.status}")
    else:
        print("No operations found.")
get_resource_details(**kwargs: Any) ResourceDetails [source]¶
Get information about the models under the Form Recognizer resource.
Returns:Summary of custom models under the resource - model count and limit.
Return type:ResourceDetails
Raises:
Example:
    document_model_admin_client = DocumentModelAdministrationClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )

    account_details = document_model_admin_client.get_resource_details()
    print(
        f"Our resource has {account_details.custom_document_models.count} custom models, "
        f"and we can have at most {account_details.custom_document_models.limit} custom models"
    )
    neural_models = account_details.neural_document_model_quota
    print(
        f"The quota limit for custom neural document models is {neural_models.quota} and the resource has "
        f"used {neural_models.used}. The resource quota will reset on {neural_models.quota_resets_on}"
    )
list_document_classifiers(**kwargs: Any) ItemPaged[DocumentClassifierDetails] [source]¶
List information for each document classifier, including its classifier ID, description, and when it was created.
Returns:Pageable of DocumentClassifierDetails.
Return type:ItemPaged[DocumentClassifierDetails]
Raises:
New in version 2023-07-31: The list_document_classifiers client method.
Example:
    classifiers = document_model_admin_client.list_document_classifiers()

    print("We have the following 'ready' models with IDs and descriptions:")
    for classifier in classifiers:
        print(f"{classifier.classifier_id} | {classifier.description}")
list_document_models(**kwargs: Any) ItemPaged[DocumentModelSummary] [source]¶
List information for each model, including its model ID, description, and when it was created.
Returns:Pageable of DocumentModelSummary.
Return type:ItemPaged[DocumentModelSummary]
Raises:
Example:
    models = document_model_admin_client.list_document_models()

    print("We have the following 'ready' models with IDs and descriptions:")
    for model in models:
        print(f"{model.model_id} | {model.description}")
list_operations(**kwargs: Any) ItemPaged[OperationSummary] [source]¶
List information for each operation.
Lists all operations associated with the Form Recognizer resource. Note that operation information only persists for 24 hours. If a document model operation was successful, the document model can be accessed using the get_document_model() or list_document_models() APIs.
Returns:A pageable of OperationSummary.
Return type:ItemPaged[OperationSummary]
Raises:
Example:
    import os

    from azure.core.credentials import AzureKeyCredential
    from azure.ai.formrecognizer import DocumentModelAdministrationClient

    endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
    key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

    document_model_admin_client = DocumentModelAdministrationClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )

    operations = list(document_model_admin_client.list_operations())

    print("The following document model operations exist under my resource:")
    for operation in operations:
        print(f"\nOperation ID: {operation.operation_id}")
        print(f"Operation kind: {operation.kind}")
        print(f"Operation status: {operation.status}")
        print(f"Operation percent completed: {operation.percent_completed}")
        print(f"Operation created on: {operation.created_on}")
        print(f"Operation last updated on: {operation.last_updated_on}")
        print(
            f"Resource location of successful operation: {operation.resource_location}"
        )
send_request(request: HttpRequest, *, stream: bool = False, **kwargs) HttpResponse¶
Runs a network request using the client’s existing pipeline.
The request URL can be relative to the base URL. The service API version used for the request is the same as the client’s unless otherwise specified. Overriding the client’s configured API version in relative URL is supported on client with API version 2022-08-31 and later. Overriding in absolute URL supported on client with any API version. This method does not raise if the response is an error; to raise an exception, call raise_for_status() on the returned response object. For more information about how to send custom requests with this method, see https://aka.ms/azsdk/dpcodegen/python/send_request.
Parameters:request (HttpRequest) – The network request you want to make.
Keyword Arguments:stream (bool) – Whether the response payload will be streamed. Defaults to False.
Returns:The response of your network call. Does not do error handling on your response.
Return type:HttpResponse
class azure.ai.formrecognizer.DocumentModelAdministrationLROPoller(*args, **kwargs)[source]¶Implements a protocol followed by returned poller objects.
add_done_callback(func: Callable) None [source]¶
continuation_token() str [source]¶
done() bool [source]¶
polling_method() PollingMethod[PollingReturnType_co] [source]¶
remove_done_callback(func: Callable) None [source]¶
result(timeout: int | None = None) PollingReturnType_co [source]¶
status() str [source]¶
wait(timeout: float | None = None) None [source]¶
property details: Mapping[str, Any]¶
class azure.ai.formrecognizer.DocumentModelDetails(**kwargs: Any)[source]¶Document model information. Includes the doc types that the model can analyze.
New in version 2023-07-31: The expires_on property.
classmethod from_dict(data: Dict[str, Any]) DocumentModelDetails [source]¶Converts a dict in the shape of a DocumentModelDetails to the model itself.
Parameters:data (dict) – A dictionary in the shape of DocumentModelDetails.
Returns:DocumentModelDetails
Return type:DocumentModelDetails
to_dict() Dict[str, Any] [source]¶Returns a dict representation of DocumentModelDetails.
Returns:Dict[str, Any]
Return type:Dict[str, Any]
api_version: str | None¶API version used to create this model.
created_on: datetime¶Date and time (UTC) when the model was created.
description: str | None¶A description for the model.
doc_types: Dict[str, DocumentTypeDetails] | None¶Supported document types, including the fields for each document and their types.
expires_on: datetime | None¶Date and time (UTC) when the document model will expire.
model_id: str¶Unique model id.
tags: Dict[str, str] | None¶List of user defined key-value tag attributes associated with the model.
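The to_dict and from_dict pair is typically used to serialize a model and rebuild it later. The sketch below illustrates that round trip with a hypothetical stand-in class, ModelInfoSketch, that carries only a few of the fields listed above. It is not the SDK's implementation, and the dictionary keys are an assumption of this example.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ModelInfoSketch:
    """Hypothetical stand-in, not the SDK's DocumentModelDetails."""
    model_id: str
    description: Optional[str] = None
    tags: Optional[Dict[str, str]] = None

    def to_dict(self) -> Dict[str, Any]:
        # Plain dict of the fields, suitable for json.dumps.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "ModelInfoSketch":
        # Missing optional keys fall back to None.
        return cls(
            model_id=data["model_id"],
            description=data.get("description"),
            tags=data.get("tags"),
        )


original = ModelInfoSketch(model_id="my-model", tags={"team": "billing"})
as_json = json.dumps(original.to_dict())
restored = ModelInfoSketch.from_dict(json.loads(as_json))
```

With the real models, fields such as created_on are datetime values, so a JSON round trip may need a custom encoder.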
class azure.ai.formrecognizer.DocumentModelSummary(**kwargs: Any)[source]¶A summary of document model information including the model ID, its description, and when the model was created.
New in version 2023-07-31: The expires_on property.
classmethod from_dict(data: Dict[str, Any]) DocumentModelSummary [source]¶Converts a dict in the shape of a DocumentModelSummary to the model itself.
Parameters:data (dict) – A dictionary in the shape of DocumentModelSummary.
Returns:DocumentModelSummary
Return type:DocumentModelSummary
to_dict() Dict[str, Any] [source]¶Returns a dict representation of DocumentModelSummary.
Returns:Dict[str, Any]
Return type:Dict[str, Any]
api_version: str | None¶API version used to create this model.
created_on: datetime¶Date and time (UTC) when the model was created.
description: str | None¶A description for the model.
expires_on: datetime | None¶Date and time (UTC) when the document model will expire.
model_id: str¶Unique model id.
tags: Dict[str, str] | None¶List of user defined key-value tag attributes associated with the model.
class azure.ai.formrecognizer.DocumentPage(**kwargs: Any)[source]¶Content and layout elements extracted from a page of the input.
New in version 2023-07-31: The barcodes and formulas properties.
classmethod from_dict(data: Dict) DocumentPage [source]¶Converts a dict in the shape of a DocumentPage to the model itself.
Parameters:data (dict) – A dictionary in the shape of DocumentPage.
Returns:DocumentPage
Return type:DocumentPage
to_dict() Dict [source]¶Returns a dict representation of DocumentPage.
Returns:dict
Return type:dict
angle: float | None¶The general orientation of the content in clockwise direction, measured in degrees between (-180, 180].
barcodes: List[DocumentBarcode]¶Extracted barcodes from the page.
formulas: List[DocumentFormula]¶Extracted formulas from the page.
height: float | None¶The height of the image/PDF in pixels/inches, respectively.
lines: List[DocumentLine]¶Extracted lines from the page, potentially containing both textual and visual elements.
page_number: int¶1-based page number in the input document.
selection_marks: List[DocumentSelectionMark]¶Extracted selection marks from the page.
spans: List[DocumentSpan]¶Location of the page in the reading order concatenated content.
unit: str | None¶The unit used by the width, height, and bounding polygon properties. For images, the unit is “pixel”. For PDF, the unit is “inch”. Possible values include: “pixel”, “inch”.
width: float | None¶The width of the image/PDF in pixels/inches, respectively.
words: List[DocumentWord]¶Extracted words from the page.
class azure.ai.formrecognizer.DocumentParagraph(**kwargs: Any)[source]¶A paragraph object generally consisting of contiguous lines with common alignment and spacing.
New in version 2023-07-31: The formulaBlock role.
classmethod from_dict(data: Dict) DocumentParagraph [source]¶Converts a dict in the shape of a DocumentParagraph to the model itself.
Parameters:data (dict) – A dictionary in the shape of DocumentParagraph.
Returns:DocumentParagraph
Return type:DocumentParagraph
to_dict() Dict [source]¶Returns a dict representation of DocumentParagraph.
Returns:dict
Return type:dict
bounding_regions: List[BoundingRegion] | None¶Bounding regions covering the paragraph.
content: str¶Concatenated content of the paragraph in reading order.
role: str | None¶Semantic role of the paragraph. Known values are: “pageHeader”, “pageFooter”, “pageNumber”, “title”, “sectionHeading”, “footnote”, “formulaBlock”.
spans: List[DocumentSpan]¶Location of the paragraph in the reading order concatenated content.
class azure.ai.formrecognizer.DocumentSelectionMark(**kwargs: Any)[source]¶A selection mark object representing check boxes, radio buttons, and other elements indicating a selection.
classmethod from_dict(data: Dict) DocumentSelectionMark [source]¶Converts a dict in the shape of a DocumentSelectionMark to the model itself.
Parameters:data (dict) – A dictionary in the shape of DocumentSelectionMark.
Returns:DocumentSelectionMark
Return type:DocumentSelectionMark
to_dict() Dict [source]¶Returns a dict representation of DocumentSelectionMark.
Returns:dict
Return type:dict
confidence: float¶Confidence of correctly extracting the selection mark.
polygon: Sequence[Point]¶Bounding polygon of the selection mark.
span: DocumentSpan¶Location of the selection mark in the reading order concatenated content.
state: str¶State of the selection mark. Possible values include: “selected”, “unselected”.
class azure.ai.formrecognizer.DocumentSpan(**kwargs: Any)[source]¶Contiguous region of the content of the property, specified as an offset and length.
classmethod from_dict(data: Dict) DocumentSpan [source]¶Converts a dict in the shape of a DocumentSpan to the model itself.
Parameters:data (dict) – A dictionary in the shape of DocumentSpan.
Returns:DocumentSpan
Return type:DocumentSpan
to_dict() Dict [source]¶Returns a dict representation of DocumentSpan.
Returns:dict
Return type:dict
length: int¶Number of characters in the content represented by the span.
offset: int¶Zero-based index of the content represented by the span.
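An offset and length pair can be turned back into text by slicing the concatenated content. The sketch below shows this with a hypothetical Span stand-in that has the same two fields as DocumentSpan. It assumes the offsets count Python string characters, which may not hold for every service configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Span:
    """Hypothetical stand-in with the same two fields as DocumentSpan."""
    offset: int  # zero-based index into the concatenated content
    length: int  # number of characters covered


def text_for_spans(content: str, spans: List[Span]) -> str:
    """Join the slices of `content` that the spans point at, in order."""
    return "".join(content[s.offset:s.offset + s.length] for s in spans)


content = "Invoice 42 Total: 19.99"
word = text_for_spans(content, [Span(offset=8, length=2)])
joined = text_for_spans(content, [Span(offset=0, length=7), Span(offset=10, length=6)])
```

Elements such as tables or styles carry a list of spans because their text need not be contiguous in the reading order, which is why the helper accepts several spans.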
class azure.ai.formrecognizer.DocumentStyle(**kwargs: Any)[source]¶An object representing observed text styles.
New in version 2023-07-31: The similar_font_family, font_style, font_weight, color, and background_color properties.
classmethod from_dict(data: Dict) DocumentStyle [source]¶Converts a dict in the shape of a DocumentStyle to the model itself.
Parameters:data (dict) – A dictionary in the shape of DocumentStyle.
Returns:DocumentStyle
Return type:DocumentStyle
to_dict() Dict [source]¶Returns a dict representation of DocumentStyle.
Returns:dict
Return type:dict
background_color: str | None¶Background color in #rrggbb hexadecimal format.
color: str | None¶Foreground color in #rrggbb hexadecimal format.
confidence: float¶Confidence of correctly identifying the style.
font_style: str | None¶Font style. Known values are: “normal”, “italic”.
font_weight: str | None¶Font weight. Known values are: “normal”, “bold”.
is_handwritten: bool | None¶Indicates if the content is handwritten.
similar_font_family: str | None¶Visually most similar font from among the set of supported font families, with fallback fonts following CSS convention (ex. ‘Arial, sans-serif’).
spans: List[DocumentSpan]¶Location of the text elements in the concatenated content the style applies to.
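The color and background_color values use the #rrggbb hexadecimal format. A minimal sketch of converting such a string into integer channels follows; hex_to_rgb is a hypothetical helper written for this example and is not part of the SDK.

```python
from typing import Tuple


def hex_to_rgb(color: str) -> Tuple[int, int, int]:
    """Convert a '#rrggbb' string to an (r, g, b) tuple of ints in 0..255."""
    if len(color) != 7 or not color.startswith("#"):
        raise ValueError(f"expected '#rrggbb', got {color!r}")
    # Each channel is two hexadecimal digits.
    red = int(color[1:3], 16)
    green = int(color[3:5], 16)
    blue = int(color[5:7], 16)
    return red, green, blue


orange = hex_to_rgb("#ff8000")
```

Both properties may be None, so callers would check for that before converting.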
class azure.ai.formrecognizer.DocumentTable(**kwargs: Any)[source]¶A table object consisting of table cells arranged in a rectangular layout.
classmethod from_dict(data: Dict) DocumentTable [source]¶Converts a dict in the shape of a DocumentTable to the model itself.
Parameters:data (dict) – A dictionary in the shape of DocumentTable.
Returns:DocumentTable
Return type:DocumentTable
to_dict() Dict [source]¶Returns a dict representation of DocumentTable.
Returns:dict
Return type:dict
bounding_regions: List[BoundingRegion] | None¶Bounding regions covering the table.
cells: List[DocumentTableCell]¶Cells contained within the table.
column_count: int¶Number of columns in the table.
row_count: int¶Number of rows in the table.
spans: List[DocumentSpan]¶Location of the table in the reading order concatenated content.
class azure.ai.formrecognizer.DocumentTableCell(**kwargs: Any)[source]¶An object representing the location and content of a table cell.
classmethod from_dict(data: Dict) DocumentTableCell [source]¶Converts a dict in the shape of a DocumentTableCell to the model itself.
Parameters:data (dict) – A dictionary in the shape of DocumentTableCell.
Returns:DocumentTableCell
Return type:DocumentTableCell
to_dict() Dict [source]¶Returns a dict representation of DocumentTableCell.
Returns:dict
Return type:dict
bounding_regions: List[BoundingRegion] | None¶Bounding regions covering the table cell.
column_index: int¶Column index of the cell.
column_span: int | None¶Number of columns spanned by this cell.
content: str¶Concatenated content of the table cell in reading order.
kind: str | None¶Table cell kind. Possible values include: “content”, “rowHeader”, “columnHeader”, “stubHead”, “description”. Default value: “content”.
row_index: int¶Row index of the cell.
row_span: int | None¶Number of rows spanned by this cell.
spans: List[DocumentSpan]¶Location of the table cell in the reading order concatenated content.
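A table's cells arrive as a flat list, so a common step is to rebuild a two-dimensional grid from row_index, column_index, and the span counts. The sketch below does that with a hypothetical Cell stand-in. It assumes zero-based indices and treats a missing span as 1; neither detail is stated in the descriptions above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    """Hypothetical stand-in carrying the DocumentTableCell fields used here."""
    row_index: int
    column_index: int
    content: str
    row_span: Optional[int] = None
    column_span: Optional[int] = None


def to_grid(row_count: int, column_count: int, cells: List[Cell]) -> List[List[str]]:
    """Place each cell's content at its position; spanned slots repeat the content."""
    grid = [["" for _ in range(column_count)] for _ in range(row_count)]
    for cell in cells:
        # A span of None is treated as 1 (an assumption of this sketch).
        for r in range(cell.row_index, cell.row_index + (cell.row_span or 1)):
            for c in range(cell.column_index, cell.column_index + (cell.column_span or 1)):
                grid[r][c] = cell.content
    return grid


cells = [
    Cell(0, 0, "Item", column_span=2),
    Cell(1, 0, "Pen"),
    Cell(1, 1, "1.50"),
]
grid = to_grid(2, 2, cells)
```

Repeating the content across spanned slots keeps every row the same length, which is convenient when exporting to CSV. Leaving spanned slots empty is an equally reasonable choice.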
class azure.ai.formrecognizer.DocumentTypeDetails(**kwargs: Any)[source]¶DocumentTypeDetails represents a document type that a model can recognize, including its fields and types, and the confidence for those fields.
classmethod from_dict(data: Dict) DocumentTypeDetails [source]¶Converts a dict in the shape of a DocumentTypeDetails to the model itself.
Parameters:data (dict) – A dictionary in the shape of DocumentTypeDetails.
Returns:DocumentTypeDetails
Return type:DocumentTypeDetails
to_dict() Dict [source]¶Returns a dict representation of DocumentTypeDetails.
Returns:dict
Return type:dict
build_mode: str | None¶The build mode used when building the custom model. Possible values include: “template”, “neural”.
description: str | None¶A description for the model.
field_confidence: Dict[str, float] | None¶Estimated confidence for each field.
field_schema: Dict[str, Any]¶Description of the document semantic schema.
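Because field_confidence maps field names to estimated confidences, it can be used to flag fields that may need more training data. The following sketch uses a hypothetical helper and made-up scores; the 0.8 threshold is an arbitrary value chosen for illustration.

```python
from typing import Dict, List, Optional


def low_confidence_fields(
    field_confidence: Optional[Dict[str, float]], threshold: float
) -> List[str]:
    """Return the names of fields whose estimated confidence is below `threshold`."""
    # field_confidence may be None, so fall back to an empty mapping.
    scores = field_confidence or {}
    return sorted(name for name, score in scores.items() if score < threshold)


# Illustrative values only, not output from a real model.
scores = {"InvoiceId": 0.98, "Total": 0.72, "DueDate": 0.55}
flagged = low_confidence_fields(scores, threshold=0.8)
```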
class azure.ai.formrecognizer.DocumentWord(**kwargs: Any)[source]¶A word object consisting of a contiguous sequence of characters. For non-space delimited languages, such as Chinese, Japanese, and Korean, each character is represented as its own word.
classmethod from_dict(data: Dict) DocumentWord [source]¶Converts a dict in the shape of a DocumentWord to the model itself.
Parameters:data (dict) – A dictionary in the shape of DocumentWord.
Returns:DocumentWord
Return type:DocumentWord
to_dict() Dict [source]¶Returns a dict representation of DocumentWord.
Returns:dict
Return type:dict
confidence: float¶Confidence of correctly extracting the word.
content: str¶Text content of the word.
polygon: Sequence[Point]¶Bounding polygon of the word.
span: DocumentSpan¶Location of the word in the reading order concatenated content.
class azure.ai.formrecognizer.FieldData(**kwargs: Any)[source]¶Contains the data for the form field. This includes the text, location of the text on the form, and a collection of the elements that make up the text.
New in version v2.1: FormSelectionMark is added to the types returned in the list of field_elements, support for to_dict and from_dict methods
classmethod from_dict(data: Dict) FieldData [source]¶Converts a dict in the shape of a FieldData to the model itself.
Parameters:data (dict) – A dictionary in the shape of FieldData.
Returns:FieldData
Return type:FieldData
to_dict() Dict [source]¶Returns a dict representation of FieldData.
Returns:dict
Return type:dict
bounding_box: List[Point]¶A list of 4 points representing the quadrilateral bounding box that outlines the text. The points are listed in clockwise order: top-left, top-right, bottom-right, bottom-left. Units are in pixels for images and inches for PDF.
field_elements: List[FormElement | FormWord | FormLine | FormSelectionMark]¶When include_field_elements is set to true, a list of elements constituting this field or value is returned. The list consists of elements such as lines, words, and selection marks.
page_number: int¶The 1-based number of the page in which this content is present.
text: str¶The string representation of the field or value.
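The four bounding_box points describe a quadrilateral, so an axis-aligned width and height can be derived from the minimum and maximum coordinates. The sketch below uses a hypothetical Pt stand-in with x and y fields and made-up coordinates.

```python
from typing import List, NamedTuple, Tuple


class Pt(NamedTuple):
    """Hypothetical stand-in for a point with x and y coordinates."""
    x: float
    y: float


def extent(box: List[Pt]) -> Tuple[float, float]:
    """Return (width, height) of the axis-aligned rectangle enclosing `box`."""
    xs = [p.x for p in box]
    ys = [p.y for p in box]
    return max(xs) - min(xs), max(ys) - min(ys)


# Clockwise order: top-left, top-right, bottom-right, bottom-left.
box = [Pt(1.0, 2.0), Pt(4.0, 2.0), Pt(4.0, 3.5), Pt(1.0, 3.5)]
width, height = extent(box)
```

The result is in the same unit as the input: pixels for images and inches for PDF.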
class azure.ai.formrecognizer.FieldValueType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶Semantic data type of the field value.
New in version v2.1: The selectionMark and countryRegion values
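The string methods listed below suggest that this enum also behaves as a str. The sketch below illustrates that behavior with a hypothetical stand-in, ValueTypeSketch, which copies three of the values from this class. It is not the SDK class itself.

```python
from enum import Enum


class ValueTypeSketch(str, Enum):
    """Hypothetical stand-in mirroring a few FieldValueType values."""
    STRING = "string"
    PHONE_NUMBER = "phoneNumber"
    SELECTION_MARK = "selectionMark"


# Look a member up by its string value.
kind = ValueTypeSketch("phoneNumber")

# A str-based member compares equal to its plain string value.
is_equal = kind == "phoneNumber"

# Inherited str methods operate on the underlying value.
upper = kind.upper()
```

Comparing against plain strings is convenient when the value comes from deserialized JSON rather than from the enum.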
capitalize()¶Return a capitalized version of the string.
More specifically, make the first character have upper case and the rest lower case.
casefold()¶Return a version of the string suitable for caseless comparisons.
center(width, fillchar=' ', /)¶Return a centered string of length width.
Padding is done using the specified fill character (default is a space).
count(sub[, start[, end]]) int¶Return the number of non-overlapping occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.
encode(encoding='utf-8', errors='strict')¶Encode the string using the codec registered for encoding.
encoding – The encoding in which to encode the string.
errors – The error handling scheme to use for encoding errors. The default is ‘strict’ meaning that encoding errors raise a UnicodeEncodeError. Other possible values are ‘ignore’, ‘replace’ and ‘xmlcharrefreplace’ as well as any other name registered with codecs.register_error that can handle UnicodeEncodeErrors.
endswith(suffix[, start[, end]]) bool¶Return True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. suffix can also be a tuple of strings to try.
expandtabs(tabsize=8)¶Return a copy where all tab characters are expanded using spaces.
If tabsize is not given, a tab size of 8 characters is assumed.
find(sub[, start[, end]]) int¶Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
format(*args, **kwargs) str¶Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces (‘{’ and ‘}’).
format_map(mapping) str¶Return a formatted version of S, using substitutions from mapping. The substitutions are identified by braces (‘{’ and ‘}’).
index(sub[, start[, end]]) int¶Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Raises ValueError when the substring is not found.
isalnum()¶Return True if the string is an alpha-numeric string, False otherwise.
A string is alpha-numeric if all characters in the string are alpha-numeric and there is at least one character in the string.
isalpha()¶Return True if the string is an alphabetic string, False otherwise.
A string is alphabetic if all characters in the string are alphabetic and there is at least one character in the string.
isascii()¶Return True if all characters in the string are ASCII, False otherwise.
ASCII characters have code points in the range U+0000-U+007F. Empty string is ASCII too.
isdecimal()¶Return True if the string is a decimal string, False otherwise.
A string is a decimal string if all characters in the string are decimal and there is at least one character in the string.
isdigit()¶Return True if the string is a digit string, False otherwise.
A string is a digit string if all characters in the string are digits and there is at least one character in the string.
isidentifier()¶Return True if the string is a valid Python identifier, False otherwise.
Call keyword.iskeyword(s) to test whether string s is a reserved identifier, such as “def” or “class”.
islower()¶Return True if the string is a lowercase string, False otherwise.
A string is lowercase if all cased characters in the string are lowercase and there is at least one cased character in the string.
isnumeric()¶Return True if the string is a numeric string, False otherwise.
A string is numeric if all characters in the string are numeric and there is at least one character in the string.
isprintable()¶Return True if the string is printable, False otherwise.
A string is printable if all of its characters are considered printable in repr() or if it is empty.
isspace()¶Return True if the string is a whitespace string, False otherwise.
A string is whitespace if all characters in the string are whitespace and there is at least one character in the string.
istitle()¶Return True if the string is a title-cased string, False otherwise.
In a title-cased string, upper- and title-case characters may only follow uncased characters and lowercase characters only cased ones.
isupper()¶Return True if the string is an uppercase string, False otherwise.
A string is uppercase if all cased characters in the string are uppercase and there is at least one cased character in the string.
join(iterable, /)¶Concatenate any number of strings.
The string whose method is called is inserted in between each given string. The result is returned as a new string.
Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'
ljust(width, fillchar=' ', /)¶Return a left-justified string of length width.
Padding is done using the specified fill character (default is a space).
lower()¶Return a copy of the string converted to lowercase.
lstrip(chars=None, /)¶Return a copy of the string with leading whitespace removed.
If chars is given and not None, remove characters in chars instead.
static maketrans()¶Return a translation table usable for str.translate().
If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters to Unicode ordinals, strings or None. Character keys will be then converted to ordinals. If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y. If there is a third argument, it must be a string, whose characters will be mapped to None in the result.
partition(sep, /)¶Partition the string into three parts using the given separator.
This will search for the separator in the string. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.
If the separator is not found, returns a 3-tuple containing the original string and two empty strings.
removeprefix(prefix, /)¶Return a str with the given prefix string removed if present.
If the string starts with the prefix string, return string[len(prefix):]. Otherwise, return a copy of the original string.
removesuffix(suffix, /)¶Return a str with the given suffix string removed if present.
If the string ends with the suffix string and that suffix is not empty, return string[:-len(suffix)]. Otherwise, return a copy of the original string.
replace(old, new, count=-1, /)¶Return a copy with all occurrences of substring old replaced by new.
count – Maximum number of occurrences to replace. -1 (the default value) means replace all occurrences.
If the optional argument count is given, only the first count occurrences are replaced.
rfind(sub[, start[, end]]) int¶Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
rindex(sub[, start[, end]]) int¶Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Raises ValueError when the substring is not found.
rjust(width, fillchar=' ', /)¶Return a right-justified string of length width.
Padding is done using the specified fill character (default is a space).
rpartition(sep, /)¶Partition the string into three parts using the given separator.
This will search for the separator in the string, starting at the end. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.
If the separator is not found, returns a 3-tuple containing two empty strings and the original string.
rsplit(sep=None, maxsplit=-1)¶Return a list of the substrings in the string, using sep as the separator string.
sep – The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit – Maximum number of splits. -1 (the default value) means no limit.
Splitting starts at the end of the string and works to the front.
rstrip(chars=None, /)¶Return a copy of the string with trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
split(sep=None, maxsplit=-1)¶Return a list of the substrings in the string, using sep as the separator string.
sep – The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit – Maximum number of splits. -1 (the default value) means no limit.
Splitting starts at the front of the string and works to the end.
Note, str.split() is mainly useful for data that has been intentionally delimited. With natural text that includes punctuation, consider using the regular expression module.
splitlines(keepends=False)¶Return a list of the lines in the string, breaking at line boundaries.
Line breaks are not included in the resulting list unless keepends is given and true.
startswith(prefix[, start[, end]]) bool¶Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. prefix can also be a tuple of strings to try.
strip(chars=None, /)¶Return a copy of the string with leading and trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
swapcase()¶Convert uppercase characters to lowercase and lowercase characters to uppercase.
title()¶Return a version of the string where each word is titlecased.
More specifically, words start with uppercased characters and all remaining cased characters have lower case.
translate(table, /)¶Replace each character in the string using the given translation table.
table – Translation table, which must be a mapping of Unicode ordinals to Unicode ordinals, strings, or None.
The table must implement lookup/indexing via __getitem__, for instance a dictionary or list. If this operation raises LookupError, the character is left untouched. Characters mapped to None are deleted.
upper()¶Return a copy of the string converted to uppercase.
zfill(width, /)¶Pad a numeric string with zeros on the left, to fill a field of the given width.
The string is never truncated.
COUNTRY_REGION = 'countryRegion'¶
DATE = 'date'¶
DICTIONARY = 'dictionary'¶
FLOAT = 'float'¶
INTEGER = 'integer'¶
LIST = 'list'¶
PHONE_NUMBER = 'phoneNumber'¶
SELECTION_MARK = 'selectionMark'¶
STRING = 'string'¶
TIME = 'time'¶
class azure.ai.formrecognizer.FormContentType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶Content type for upload.
New in version v2.1: Support for image/bmp
capitalize()¶Return a capitalized version of the string.
More specifically, make the first character have upper case and the rest lower case.
casefold()¶Return a version of the string suitable for caseless comparisons.
center(width, fillchar=' ', /)¶Return a centered string of length width.
Padding is done using the specified fill character (default is a space).
count(sub[, start[, end]]) int¶Return the number of non-overlapping occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.
encode(encoding='utf-8', errors='strict')¶Encode the string using the codec registered for encoding.
encoding – The encoding in which to encode the string.
errors – The error handling scheme to use for encoding errors. The default is ‘strict’ meaning that encoding errors raise a UnicodeEncodeError. Other possible values are ‘ignore’, ‘replace’ and ‘xmlcharrefreplace’ as well as any other name registered with codecs.register_error that can handle UnicodeEncodeErrors.
endswith(suffix[, start[, end]]) bool¶Return True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. suffix can also be a tuple of strings to try.
expandtabs(tabsize=8)¶Return a copy where all tab characters are expanded using spaces.
If tabsize is not given, a tab size of 8 characters is assumed.
find(sub[, start[, end]]) int¶Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
format(*args, **kwargs) str¶Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces (‘{’ and ‘}’).
format_map(mapping) str¶Return a formatted version of S, using substitutions from mapping. The substitutions are identified by braces (‘{’ and ‘}’).
index(sub[, start[, end]]) int¶Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Raises ValueError when the substring is not found.
isalnum()¶Return True if the string is an alpha-numeric string, False otherwise.
A string is alpha-numeric if all characters in the string are alpha-numeric and there is at least one character in the string.
isalpha()¶Return True if the string is an alphabetic string, False otherwise.
A string is alphabetic if all characters in the string are alphabetic and there is at least one character in the string.
isascii()¶Return True if all characters in the string are ASCII, False otherwise.
ASCII characters have code points in the range U+0000-U+007F. Empty string is ASCII too.
isdecimal()¶Return True if the string is a decimal string, False otherwise.
A string is a decimal string if all characters in the string are decimal and there is at least one character in the string.
isdigit()¶Return True if the string is a digit string, False otherwise.
A string is a digit string if all characters in the string are digits and there is at least one character in the string.
isidentifier()¶Return True if the string is a valid Python identifier, False otherwise.
Call keyword.iskeyword(s) to test whether string s is a reserved identifier, such as “def” or “class”.
islower()¶Return True if the string is a lowercase string, False otherwise.
A string is lowercase if all cased characters in the string are lowercase and there is at least one cased character in the string.
isnumeric()¶Return True if the string is a numeric string, False otherwise.
A string is numeric if all characters in the string are numeric and there is at least one character in the string.
isprintable()¶Return True if the string is printable, False otherwise.
A string is printable if all of its characters are considered printable in repr() or if it is empty.
isspace()¶Return True if the string is a whitespace string, False otherwise.
A string is whitespace if all characters in the string are whitespace and there is at least one character in the string.
istitle()¶Return True if the string is a title-cased string, False otherwise.
In a title-cased string, upper- and title-case characters may only follow uncased characters and lowercase characters only cased ones.
isupper()¶Return True if the string is an uppercase string, False otherwise.
A string is uppercase if all cased characters in the string are uppercase and there is at least one cased character in the string.
join(iterable, /)¶Concatenate any number of strings.
The string whose method is called is inserted in between each given string. The result is returned as a new string.
Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'
ljust(width, fillchar=' ', /)¶Return a left-justified string of length width.
Padding is done using the specified fill character (default is a space).
lower()¶Return a copy of the string converted to lowercase.
lstrip(chars=None, /)¶Return a copy of the string with leading whitespace removed.
If chars is given and not None, remove characters in chars instead.
static maketrans()¶Return a translation table usable for str.translate().
If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters to Unicode ordinals, strings or None. Character keys will be then converted to ordinals. If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y. If there is a third argument, it must be a string, whose characters will be mapped to None in the result.
partition(sep, /)¶Partition the string into three parts using the given separator.
This will search for the separator in the string. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.
If the separator is not found, returns a 3-tuple containing the original string and two empty strings.
removeprefix(prefix, /)¶Return a str with the given prefix string removed if present.
If the string starts with the prefix string, return string[len(prefix):]. Otherwise, return a copy of the original string.
removesuffix(suffix, /)¶Return a str with the given suffix string removed if present.
If the string ends with the suffix string and that suffix is not empty, return string[:-len(suffix)]. Otherwise, return a copy of the original string.
replace(old, new, count=-1, /)¶Return a copy with all occurrences of substring old replaced by new.
count – Maximum number of occurrences to replace. -1 (the default value) means replace all occurrences.
If the optional argument count is given, only the first count occurrences are replaced.
rfind(sub[, start[, end]]) int¶Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
rindex(sub[, start[, end]]) int¶Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Raises ValueError when the substring is not found.
rjust(width, fillchar=' ', /)¶Return a right-justified string of length width.
Padding is done using the specified fill character (default is a space).
rpartition(sep, /)¶Partition the string into three parts using the given separator.
This will search for the separator in the string, starting at the end. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.
If the separator is not found, returns a 3-tuple containing two empty strings and the original string.
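As a brief illustration of how partition and rpartition differ, the sketch below applies both to plain strings. The sample values are chosen for illustration only and are not taken from the library.

```python
# partition splits at the first separator; rpartition splits at the last one.
content_type = "application/pdf"
print(content_type.partition("/"))    # ('application', '/', 'pdf')

file_name = "scans/2021/card.front.png"
print(file_name.partition("."))       # ('scans/2021/card', '.', 'front.png')
print(file_name.rpartition("."))      # ('scans/2021/card.front', '.', 'png')

# When the separator is missing, the original string lands at opposite ends.
print("image".partition("/"))         # ('image', '', '')
print("image".rpartition("/"))        # ('', '', 'image')
```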
rsplit(sep=None, maxsplit=-1)¶Return a list of the substrings in the string, using sep as the separator string.
sep: The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit: Maximum number of splits. -1 (the default value) means no limit.
Splitting starts at the end of the string and works to the front.
rstrip(chars=None, /)¶Return a copy of the string with trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
split(sep=None, maxsplit=-1)¶Return a list of the substrings in the string, using sep as the separator string.
sep: The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit: Maximum number of splits. -1 (the default value) means no limit.
Splitting starts at the front of the string and works to the end.
Note, str.split() is mainly useful for data that has been intentionally delimited. With natural text that includes punctuation, consider using the regular expression module.
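The difference between the default whitespace splitting and an explicit separator, and between split and rsplit when maxsplit is given, can be seen in a short sketch. The input strings are illustrative assumptions.

```python
# With sep=None, runs of whitespace collapse and empty strings are dropped.
line = "  Contoso   Ltd.\tRedmond \n"
print(line.split())            # ['Contoso', 'Ltd.', 'Redmond']

# With an explicit separator, empty fields are preserved.
record = "a,,b,"
print(record.split(","))       # ['a', '', 'b', '']

# maxsplit limits the number of splits; split works from the front,
# rsplit from the end.
pages = "1-3-5"
print(pages.split("-", 1))     # ['1', '3-5']
print(pages.rsplit("-", 1))    # ['1-3', '5']
```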
splitlines(keepends=False)¶Return a list of the lines in the string, breaking at line boundaries.
Line breaks are not included in the resulting list unless keepends is given and true.
startswith(prefix[, start[, end]]) bool¶Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. prefix can also be a tuple of strings to try.
strip(chars=None, /)¶Return a copy of the string with leading and trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
swapcase()¶Convert uppercase characters to lowercase and lowercase characters to uppercase.
title()¶Return a version of the string where each word is titlecased.
More specifically, words start with uppercased characters and all remaining cased characters have lower case.
translate(table, /)¶Replace each character in the string using the given translation table.
table: Translation table, which must be a mapping of Unicode ordinals to Unicode ordinals, strings, or None.
The table must implement lookup/indexing via __getitem__, for instance a dictionary or list. If this operation raises LookupError, the character is left untouched. Characters mapped to None are deleted.
upper()¶Return a copy of the string converted to uppercase.
zfill(width, /)¶Pad a numeric string with zeros on the left, to fill a field of the given width.
The string is never truncated.
APPLICATION_PDF = 'application/pdf'¶
IMAGE_BMP = 'image/bmp'¶
IMAGE_JPEG = 'image/jpeg'¶
IMAGE_PNG = 'image/png'¶
IMAGE_TIFF = 'image/tiff'¶
class azure.ai.formrecognizer.FormElement(**kwargs: Any)[source]¶Base type which includes properties for a form element.
New in version v2.1: Support for to_dict and from_dict methods
classmethod from_dict(data: Dict) FormElement [source]¶Converts a dict in the shape of a FormElement to the model itself.
Parameters:data (dict) – A dictionary in the shape of FormElement.
Returns:FormElement
Return type:FormElement
to_dict() Dict [source]¶Returns a dict representation of FormElement.
Returns:dict
Return type:dict
bounding_box: List[Point]¶A list of 4 points representing the quadrilateral bounding box that outlines the text. The points are listed in clockwise order: top-left, top-right, bottom-right, bottom-left. Units are in pixels for images and inches for PDF.
kind: str¶The kind of form element. Possible kinds are “word”, “line”, or “selectionMark”, which correspond to a FormWord, FormLine, or FormSelectionMark, respectively.
page_number: int¶The 1-based number of the page in which this content is present.
text: str¶The text content of the element.
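Because bounding_box is a quadrilateral rather than a rectangle, code that needs an axis-aligned extent has to take the minimum and maximum over all four points. The following is a minimal sketch of that idea; Point here is a hypothetical stand-in that only assumes x and y attributes, and bounding_box_extent is an illustrative helper, not part of the SDK.

```python
from collections import namedtuple

# Stand-in for the SDK's Point type; only the x and y attributes are assumed.
Point = namedtuple("Point", "x y")


def bounding_box_extent(bounding_box):
    """Return (left, top, right, bottom) for a four-point bounding box."""
    xs = [p.x for p in bounding_box]
    ys = [p.y for p in bounding_box]
    return min(xs), min(ys), max(xs), max(ys)


# Clockwise from the top-left corner; this sample box is slightly rotated.
box = [Point(1.0, 1.1), Point(4.0, 1.0), Point(4.1, 2.0), Point(1.1, 2.1)]
left, top, right, bottom = bounding_box_extent(box)
print(left, top, right, bottom)  # 1.0 1.0 4.1 2.1
```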
class azure.ai.formrecognizer.FormField(**kwargs: Any)[source]¶Represents a field recognized in an input form.
New in version v2.1: Support for to_dict and from_dict methods
classmethod from_dict(data: Dict) FormField [source]¶Converts a dict in the shape of a FormField to the model itself.
Parameters:data (dict) – A dictionary in the shape of FormField.
Returns:FormField
Return type:FormField
to_dict() Dict [source]¶Returns a dict representation of FormField.
Returns:dict
Return type:dict
confidence: float¶Measures the degree of certainty of the recognition result. Value is between [0.0, 1.0].
label_data: FieldData¶Contains the text, bounding box, and field elements for the field label. Note that this is not returned for forms analyzed by models trained with labels.
name: str¶The unique name of the field or the training-time label if analyzed from a custom model that was trained with labels.
value: str | int | float | date | time | Dict[str, FormField] | List[FormField]¶The value for the recognized field. Its semantic data type is described by value_type. If the value is extracted from the form, but cannot be normalized to its type, then access the value_data.text property for a textual representation of the value.
value_data: FieldData¶Contains the text, bounding box, and field elements for the field value.
value_type: str¶The type of value found on FormField. Described in FieldValueType; possible types include: ‘string’, ‘date’, ‘time’, ‘phoneNumber’, ‘float’, ‘integer’, ‘dictionary’, ‘list’, ‘selectionMark’, or ‘countryRegion’.
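The fallback from value to value_data.text described for FormField can be sketched with plain objects. This is an illustration under stated assumptions: field_text is a hypothetical helper, the SimpleNamespace objects are stand-ins that only mimic the attribute names listed above, and the sketch assumes value is None when normalization fails.

```python
from types import SimpleNamespace


def field_text(field, min_confidence=0.0):
    """Return a printable value for a field, or None if below the threshold.

    Falls back to value_data.text when value is None (assumed here to be
    the case when the value could not be normalized to its type).
    """
    if field.confidence is not None and field.confidence < min_confidence:
        return None
    if field.value is not None:
        return str(field.value)
    if field.value_data is not None:
        return field.value_data.text
    return None


# Hypothetical stand-ins with the same attribute names as FormField.
normalized = SimpleNamespace(
    value=12.5, value_data=SimpleNamespace(text="$12.50"), confidence=0.98
)
raw_only = SimpleNamespace(
    value=None, value_data=SimpleNamespace(text="12th of Mai"), confidence=0.61
)

print(field_text(normalized))                    # 12.5
print(field_text(raw_only))                      # 12th of Mai
print(field_text(raw_only, min_confidence=0.8))  # None
```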
class azure.ai.formrecognizer.FormLine(**kwargs: Any)[source]¶An object representing an extracted line of text.
New in version v2.1: appearance property, support for to_dict and from_dict methods
classmethod from_dict(data: Dict) FormLine [source]¶Converts a dict in the shape of a FormLine to the model itself.
Parameters:data (dict) – A dictionary in the shape of FormLine.
Returns:FormLine
Return type:FormLine
to_dict() Dict [source]¶Returns a dict representation of FormLine.
Returns:dict
Return type:dict
appearance: TextAppearance¶An object representing the appearance of the line.
bounding_box: List[Point]¶A list of 4 points representing the quadrilateral bounding box that outlines the text. The points are listed in clockwise order: top-left, top-right, bottom-right, bottom-left. Units are in pixels for images and inches for PDF.
kind: str¶For FormLine, this is “line”.
page_number: int¶The 1-based number of the page in which this content is present.
text: str¶The text content of the line.
words: List[FormWord]¶A list of the words that make up the line.
class azure.ai.formrecognizer.FormPage(**kwargs: Any)[source]¶Represents a page recognized from the input document. Contains lines, words, selection marks, tables and page metadata.
New in version v2.1: selection_marks property, support for to_dict and from_dict methods
classmethod from_dict(data: Dict) FormPage [source]¶Converts a dict in the shape of a FormPage to the model itself.
Parameters:data (dict) – A dictionary in the shape of FormPage.
Returns:FormPage
Return type:FormPage
to_dict() Dict [source]¶Returns a dict representation of FormPage.
Returns:dict
Return type:dict
height: float¶The height of the image/PDF in pixels/inches, respectively.
lines: List[FormLine]¶When include_field_elements is set to true, a list of recognized text lines is returned. For calls to recognize content, this list is always populated. The maximum number of lines returned is 300 per page. The lines are sorted top to bottom, left to right, although in certain cases proximity is treated with higher priority. As the sorting order depends on the detected text, it may change across images and OCR version updates. Thus, business logic should be built upon the actual line location instead of order. The reading order of lines can be specified by the reading_order keyword argument (Note: reading_order only supported in begin_recognize_content and begin_recognize_content_from_url).
page_number: int¶The 1-based number of the page in which this content is present.
selection_marks: List[FormSelectionMark]¶List of selection marks extracted from the page.
tables: List[FormTable]¶A list of extracted tables contained in a page.
text_angle: float¶The general orientation of the text in clockwise direction, measured in degrees between (-180, 180].
unit: str¶The LengthUnit used by the width, height, and bounding box properties. For images, the unit is “pixel”. For PDF, the unit is “inch”.
width: float¶The width of the image/PDF in pixels/inches, respectively.
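Since the order of lines may change across images and OCR versions, one way to build logic on line location rather than list order is to sort by the top-left corner of each bounding box. The sketch below assumes that approach; Point, Line, box, and sort_by_location are hypothetical stand-ins and helpers, not SDK types.

```python
from collections import namedtuple

# Hypothetical stand-ins; only .text, .bounding_box, .x and .y are assumed.
Point = namedtuple("Point", "x y")
Line = namedtuple("Line", "text bounding_box")


def box(left, top, right, bottom):
    """Build a clockwise four-point box starting at the top-left corner."""
    return [Point(left, top), Point(right, top), Point(right, bottom), Point(left, bottom)]


def sort_by_location(lines):
    """Order lines by the top-left corner of each bounding box (y, then x)."""
    return sorted(lines, key=lambda ln: (ln.bounding_box[0].y, ln.bounding_box[0].x))


lines = [
    Line("Total: 42.00", box(1.0, 5.0, 3.0, 5.3)),
    Line("Invoice", box(1.0, 0.5, 2.0, 0.8)),
    Line("No. 1001", box(4.0, 0.5, 5.0, 0.8)),
]
ordered = [ln.text for ln in sort_by_location(lines)]
print(ordered)  # ['Invoice', 'No. 1001', 'Total: 42.00']
```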
class azure.ai.formrecognizer.FormPageRange(first_page_number: int, last_page_number: int)[source]¶The 1-based page range of the form.
New in version v2.1: Support for to_dict and from_dict methods
Create new instance of FormPageRange(first_page_number, last_page_number)
count(value, /)¶Return number of occurrences of value.
classmethod from_dict(data: Dict) FormPageRange [source]¶Converts a dict in the shape of a FormPageRange to the model itself.
Parameters:data (dict) – A dictionary in the shape of FormPageRange.
Returns:FormPageRange
Return type:FormPageRange
index(value, start=0, stop=9223372036854775807, /)¶Return first index of value.
Raises ValueError if the value is not present.
to_dict() Dict [source]¶Returns a dict representation of FormPageRange.
Returns:dict
Return type:dict
first_page_number: int¶The first page number of the form.
last_page_number: int¶The last page number of the form.
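The count and index methods listed for FormPageRange are the ones every tuple provides, which suggests the class behaves like a named tuple. The sketch below uses a hypothetical PageRange stand-in with the same two field names to show unpacking and an inclusive page count; it does not import the SDK.

```python
from collections import namedtuple

# Hypothetical stand-in with the same field names as FormPageRange.
PageRange = namedtuple("PageRange", "first_page_number last_page_number")

page_range = PageRange(first_page_number=2, last_page_number=4)

first, last = page_range           # tuple unpacking
page_count = last - first + 1      # the range is 1-based and inclusive
print(page_count)                  # 3

# Tuple methods operate on the stored values, not on the pages in between.
print(page_range.index(4))         # 1
print(page_range.count(2))         # 1
```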
class azure.ai.formrecognizer.FormRecognizerApiVersion(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶Form Recognizer API versions supported by FormRecognizerClient and FormTrainingClient.
capitalize()¶Return a capitalized version of the string.
More specifically, make the first character have upper case and the rest lower case.
casefold()¶Return a version of the string suitable for caseless comparisons.
center(width, fillchar=' ', /)¶Return a centered string of length width.
Padding is done using the specified fill character (default is a space).
count(sub[, start[, end]]) int¶Return the number of non-overlapping occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.
encode(encoding='utf-8', errors='strict')¶Encode the string using the codec registered for encoding.
encoding: The encoding in which to encode the string.
errors: The error handling scheme to use for encoding errors. The default is ‘strict’ meaning that encoding errors raise a UnicodeEncodeError. Other possible values are ‘ignore’, ‘replace’ and ‘xmlcharrefreplace’ as well as any other name registered with codecs.register_error that can handle UnicodeEncodeErrors.
endswith(suffix[, start[, end]]) bool¶Return True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. suffix can also be a tuple of strings to try.
expandtabs(tabsize=8)¶Return a copy where all tab characters are expanded using spaces.
If tabsize is not given, a tab size of 8 characters is assumed.
find(sub[, start[, end]]) int¶Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
format(*args, **kwargs) str¶Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces (‘{’ and ‘}’).
format_map(mapping) str¶Return a formatted version of S, using substitutions from mapping. The substitutions are identified by braces (‘{’ and ‘}’).
index(sub[, start[, end]]) int¶Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Raises ValueError when the substring is not found.
isalnum()¶Return True if the string is an alpha-numeric string, False otherwise.
A string is alpha-numeric if all characters in the string are alpha-numeric and there is at least one character in the string.
isalpha()¶Return True if the string is an alphabetic string, False otherwise.
A string is alphabetic if all characters in the string are alphabetic and there is at least one character in the string.
isascii()¶Return True if all characters in the string are ASCII, False otherwise.
ASCII characters have code points in the range U+0000-U+007F. Empty string is ASCII too.
isdecimal()¶Return True if the string is a decimal string, False otherwise.
A string is a decimal string if all characters in the string are decimal and there is at least one character in the string.
isdigit()¶Return True if the string is a digit string, False otherwise.
A string is a digit string if all characters in the string are digits and there is at least one character in the string.
isidentifier()¶Return True if the string is a valid Python identifier, False otherwise.
Call keyword.iskeyword(s) to test whether string s is a reserved identifier, such as “def” or “class”.
islower()¶Return True if the string is a lowercase string, False otherwise.
A string is lowercase if all cased characters in the string are lowercase and there is at least one cased character in the string.
isnumeric()¶Return True if the string is a numeric string, False otherwise.
A string is numeric if all characters in the string are numeric and there is at least one character in the string.
isprintable()¶Return True if the string is printable, False otherwise.
A string is printable if all of its characters are considered printable in repr() or if it is empty.
isspace()¶Return True if the string is a whitespace string, False otherwise.
A string is whitespace if all characters in the string are whitespace and there is at least one character in the string.
istitle()¶Return True if the string is a title-cased string, False otherwise.
In a title-cased string, upper- and title-case characters may only follow uncased characters and lowercase characters only cased ones.
isupper()¶Return True if the string is an uppercase string, False otherwise.
A string is uppercase if all cased characters in the string are uppercase and there is at least one cased character in the string.
join(iterable, /)¶Concatenate any number of strings.
The string whose method is called is inserted in between each given string. The result is returned as a new string.
Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'
ljust(width, fillchar=' ', /)¶Return a left-justified string of length width.
Padding is done using the specified fill character (default is a space).
lower()¶Return a copy of the string converted to lowercase.
lstrip(chars=None, /)¶Return a copy of the string with leading whitespace removed.
If chars is given and not None, remove characters in chars instead.
static maketrans()¶Return a translation table usable for str.translate().
If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters to Unicode ordinals, strings or None. Character keys will be then converted to ordinals. If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y. If there is a third argument, it must be a string, whose characters will be mapped to None in the result.
partition(sep, /)¶Partition the string into three parts using the given separator.
This will search for the separator in the string. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.
If the separator is not found, returns a 3-tuple containing the original string and two empty strings.
removeprefix(prefix, /)¶Return a str with the given prefix string removed if present.
If the string starts with the prefix string, return string[len(prefix):]. Otherwise, return a copy of the original string.
removesuffix(suffix, /)¶Return a str with the given suffix string removed if present.
If the string ends with the suffix string and that suffix is not empty, return string[:-len(suffix)]. Otherwise, return a copy of the original string.
replace(old, new, count=-1, /)¶Return a copy with all occurrences of substring old replaced by new.
count: Maximum number of occurrences to replace. -1 (the default value) means replace all occurrences.
If the optional argument count is given, only the first count occurrences are replaced.
rfind(sub[, start[, end]]) int¶Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
rindex(sub[, start[, end]]) int¶Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Raises ValueError when the substring is not found.
rjust(width, fillchar=' ', /)¶Return a right-justified string of length width.
Padding is done using the specified fill character (default is a space).
rpartition(sep, /)¶Partition the string into three parts using the given separator.
This will search for the separator in the string, starting at the end. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.
If the separator is not found, returns a 3-tuple containing two empty strings and the original string.
rsplit(sep=None, maxsplit=-1)¶Return a list of the substrings in the string, using sep as the separator string.
sep: The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit: Maximum number of splits. -1 (the default value) means no limit.
Splitting starts at the end of the string and works to the front.
rstrip(chars=None, /)¶Return a copy of the string with trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
split(sep=None, maxsplit=-1)¶Return a list of the substrings in the string, using sep as the separator string.
sep: The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit: Maximum number of splits. -1 (the default value) means no limit.
Splitting starts at the front of the string and works to the end.
Note, str.split() is mainly useful for data that has been intentionally delimited. With natural text that includes punctuation, consider using the regular expression module.
splitlines(keepends=False)¶Return a list of the lines in the string, breaking at line boundaries.
Line breaks are not included in the resulting list unless keepends is given and true.
startswith(prefix[, start[, end]]) bool¶Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. prefix can also be a tuple of strings to try.
strip(chars=None, /)¶Return a copy of the string with leading and trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
swapcase()¶Convert uppercase characters to lowercase and lowercase characters to uppercase.
title()¶Return a version of the string where each word is titlecased.
More specifically, words start with uppercased characters and all remaining cased characters have lower case.
translate(table, /)¶Replace each character in the string using the given translation table.
table: Translation table, which must be a mapping of Unicode ordinals to Unicode ordinals, strings, or None.
The table must implement lookup/indexing via __getitem__, for instance a dictionary or list. If this operation raises LookupError, the character is left untouched. Characters mapped to None are deleted.
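A translation table is usually built with str.maketrans and then passed to translate. The short sketch below shows the two-string-plus-deletions form and the dictionary form; the sample strings are illustrative only.

```python
# Two equal-length strings map characters position by position; the third
# argument lists characters to delete.
phone_table = str.maketrans("()", "[]", "-")
print("(425) 555-0100".translate(phone_table))   # [425] 5550100

# A dict may map single characters to strings or to None (delete).
text_table = str.maketrans({"&": "and", "\t": None})
print("R&D\tLab".translate(text_table))          # RandDLab
```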
upper()¶Return a copy of the string converted to uppercase.
zfill(width, /)¶Pad a numeric string with zeros on the left, to fill a field of the given width.
The string is never truncated.
V2_0 = '2.0'¶
V2_1 = '2.1'¶This is the default version.
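The long list of string methods above appears because the enum members are also strings. As a rough illustration of that pattern, the sketch below defines a hypothetical ApiVersion class with the same two values; it is a stand-in for demonstration and not the SDK's own class.

```python
from enum import Enum


class ApiVersion(str, Enum):
    """Hypothetical stand-in mirroring the two values listed above."""

    V2_0 = "2.0"
    V2_1 = "2.1"


# Members compare equal to their plain string values.
print(ApiVersion.V2_1 == "2.1")                  # True

# Looking up by value returns the existing member.
print(ApiVersion("2.0") is ApiVersion.V2_0)      # True

# Inherited str methods work directly on a member.
print(ApiVersion.V2_1.partition("."))            # ('2', '.', '1')
```

A design consequence of this pattern is that a plain string such as "2.1" can generally be supplied wherever the enum member is accepted, which matches the api_version keyword being documented as str or FormRecognizerApiVersion.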
class azure.ai.formrecognizer.FormRecognizerClient(endpoint: str, credential: AzureKeyCredential | TokenCredential, **kwargs: Any)[source]¶FormRecognizerClient extracts information from forms and images into structured data. It is the interface to use for analyzing with prebuilt models (receipts, business cards, invoices, identity documents), recognizing content/layout from forms, and analyzing custom forms from trained models. It provides different methods based on inputs from a URL and inputs from a stream.
Note
FormRecognizerClient should be used with API versions <=v2.1. To use API versions 2022-08-31 and up, instantiate a DocumentAnalysisClient.
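Parameters:endpoint (str) – The Form Recognizer service endpoint (protocol and hostname).
credential (AzureKeyCredential or TokenCredential) – The credential used to authenticate the client: an AzureKeyCredential for an API key, or a token credential such as one from azure.identity.
Keyword Arguments:api_version (str or FormRecognizerApiVersion) – The API version of the service to use for requests. It defaults to API version v2.1. Setting to an older version may result in reduced feature compatibility. To use the latest supported API version and features, instantiate a DocumentAnalysisClient instead.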
Example:
import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import FormRecognizerClient

endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

# Authenticate with an API key.
form_recognizer_client = FormRecognizerClient(endpoint, AzureKeyCredential(key))

"""DefaultAzureCredential will use the values from these environment
variables: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET
"""
import os

from azure.ai.formrecognizer import FormRecognizerClient
from azure.identity import DefaultAzureCredential

endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
credential = DefaultAzureCredential()

# Authenticate with a token credential.
form_recognizer_client = FormRecognizerClient(endpoint, credential)

begin_recognize_business_cards(business_card: bytes | IO[bytes], **kwargs: Any) LROPoller[List[RecognizedForm]] [source]¶
Extract field text and semantic values from a given business card. The input document must be of one of the supported content types - ‘application/pdf’, ‘image/jpeg’, ‘image/png’, ‘image/tiff’ or ‘image/bmp’.
See fields found on a business card here: https://aka.ms/formrecognizer/businesscardfields
Parameters:business_card (bytes or IO[bytes]) – JPEG, PNG, PDF, TIFF, or BMP type file stream or bytes.
Keyword Arguments:locale (str) – Locale of the business card. Supported locales include: en-US, en-AU, en-CA, en-GB, and en-IN.
include_field_elements (bool) – Whether or not to include all lines per page and field elements such as lines, words, and selection marks for each form field.
pages (list[str]) – Custom page numbers for multi-page documents (PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages=["1-3", "5-6"]. Separate each page number or range with a comma.
content_type (str or FormContentType) – Content-type of the body sent to the API. Content-type is auto-detected, but can be overridden by passing this keyword argument. For options, see FormContentType.
continuation_token (str) – A continuation token to restart a poller from a saved state.
Returns:An instance of an LROPoller. Call result() on the poller object to return a list[RecognizedForm].
Return type:LROPoller[list[RecognizedForm]]
Raises:HttpResponseError
New in version v2.1: The begin_recognize_business_cards client method
Example:
import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import FormRecognizerClient

endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

form_recognizer_client = FormRecognizerClient(
    endpoint=endpoint, credential=AzureKeyCredential(key)
)
# path_to_sample_forms is the path to a local business card file.
with open(path_to_sample_forms, "rb") as f:
    poller = form_recognizer_client.begin_recognize_business_cards(business_card=f, locale="en-US")
business_cards = poller.result()

for idx, business_card in enumerate(business_cards):
    print("--------Recognizing business card #{}--------".format(idx+1))
    # Each field is optional, so check for it before reading its values.
    contact_names = business_card.fields.get("ContactNames")
    if contact_names:
        for contact_name in contact_names.value:
            print("Contact First Name: {} has confidence: {}".format(
                contact_name.value["FirstName"].value, contact_name.value["FirstName"].confidence
            ))
            print("Contact Last Name: {} has confidence: {}".format(
                contact_name.value["LastName"].value, contact_name.value["LastName"].confidence
            ))
    company_names = business_card.fields.get("CompanyNames")
    if company_names:
        for company_name in company_names.value:
            print("Company Name: {} has confidence: {}".format(company_name.value, company_name.confidence))
    departments = business_card.fields.get("Departments")
    if departments:
        for department in departments.value:
            print("Department: {} has confidence: {}".format(department.value, department.confidence))
    job_titles = business_card.fields.get("JobTitles")
    if job_titles:
        for job_title in job_titles.value:
            print("Job Title: {} has confidence: {}".format(job_title.value, job_title.confidence))
    emails = business_card.fields.get("Emails")
    if emails:
        for email in emails.value:
            print("Email: {} has confidence: {}".format(email.value, email.confidence))
    websites = business_card.fields.get("Websites")
    if websites:
        for website in websites.value:
            print("Website: {} has confidence: {}".format(website.value, website.confidence))
    addresses = business_card.fields.get("Addresses")
    if addresses:
        for address in addresses.value:
            print("Address: {} has confidence: {}".format(address.value, address.confidence))
    mobile_phones = business_card.fields.get("MobilePhones")
    if mobile_phones:
        for phone in mobile_phones.value:
            print("Mobile phone number: {} has confidence: {}".format(phone.value, phone.confidence))
    faxes = business_card.fields.get("Faxes")
    if faxes:
        for fax in faxes.value:
            print("Fax number: {} has confidence: {}".format(fax.value, fax.confidence))
    work_phones = business_card.fields.get("WorkPhones")
    if work_phones:
        for work_phone in work_phones.value:
            print("Work phone number: {} has confidence: {}".format(work_phone.value, work_phone.confidence))
    other_phones = business_card.fields.get("OtherPhones")
    if other_phones:
        for other_phone in other_phones.value:
            print("Other phone number: {} has confidence: {}".format(other_phone.value, other_phone.confidence))

begin_recognize_business_cards_from_url(business_card_url: str, **kwargs: Any) LROPoller[List[RecognizedForm]] [source]¶
Extract field text and semantic values from a given business card. The input document must be the location (URL) of the card to be analyzed.
See fields found on a business card here: https://aka.ms/formrecognizer/businesscardfields
Parameters:business_card_url (str) – The URL of the business card to analyze. The input must be a valid, encoded URL of one of the supported formats: JPEG, PNG, PDF, TIFF, or BMP.
Keyword Arguments:locale (str) – Locale of the business card. Supported locales include: en-US, en-AU, en-CA, en-GB, and en-IN.
include_field_elements (bool) – Whether or not to include all lines per page and field elements such as lines, words, and selection marks for each form field.
pages (list[str]) – Custom page numbers for multi-page documents (PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages=["1-3", "5-6"]. Separate each page number or range with a comma.
continuation_token (str) – A continuation token to restart a poller from a saved state.
Returns:An instance of an LROPoller. Call result() on the poller object to return a list[RecognizedForm].
Return type:LROPoller[list[RecognizedForm]]
Raises:HttpResponseError
New in version v2.1: The begin_recognize_business_cards_from_url client method
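The pages keyword argument takes strings, each holding either a single page number or a hyphenated range. When the pages of interest are available as integers, they can be collapsed into that form first. The helper below is a minimal sketch of that conversion; to_page_ranges is a hypothetical name and is not part of the SDK.

```python
def to_page_ranges(page_numbers):
    """Collapse 1-based page numbers into strings such as "1-3" and "5"."""

    def fmt(start, end):
        return str(start) if start == end else "{}-{}".format(start, end)

    ranges = []
    start = prev = None
    for page in sorted(set(page_numbers)):
        if start is None:
            start = prev = page          # first page seen
        elif page == prev + 1:
            prev = page                  # extend the current run
        else:
            ranges.append(fmt(start, prev))
            start = prev = page          # begin a new run
    if start is not None:
        ranges.append(fmt(start, prev))
    return ranges


pages = to_page_ranges([1, 2, 3, 5, 6, 9])
print(pages)  # ['1-3', '5-6', '9']
```

The resulting list could then be passed as pages=pages to one of the recognize methods that accept this keyword argument.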
begin_recognize_content(form: bytes | IO[bytes], **kwargs: Any) LROPoller[List[FormPage]] [source]¶Extract text and content/layout information from a given document. The input document must be of one of the supported content types - ‘application/pdf’, ‘image/jpeg’, ‘image/png’, ‘image/tiff’ or ‘image/bmp’.
Parameters:form (bytes or IO[bytes]) – JPEG, PNG, PDF, TIFF, or BMP type file stream or bytes.
Keyword Arguments:pages (list[str]) – Custom page numbers for multi-page documents (PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages=["1-3", "5-6"]. Separate each page number or range with a comma.
language (str) – The BCP-47 language code of the text in the document. See supported language codes here: https://docs.microsoft.com/azure/cognitive-services/form-recognizer/language-support. Content supports auto language identification and multilanguage documents, so only provide a language code if you would like to force the document to be processed as that specific language.
reading_order (str) – Reading order algorithm to sort the text lines returned. Supported reading orders include: basic (default), natural. Set ‘basic’ to sort lines left to right and top to bottom, although in some cases proximity is treated with higher priority. Set ‘natural’ to sort lines by using positional information to keep nearby lines together.
content_type (str or FormContentType) – Content-type of the body sent to the API. Content-type is auto-detected, but can be overridden by passing this keyword argument. For options, see FormContentType.
continuation_token (str) – A continuation token to restart a poller from a saved state.
Returns:An instance of an LROPoller. Call result() on the poller object to return a list[FormPage].
New in version v2.1: The pages, language and reading_order keyword arguments and support for image/bmp content
Example:
import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import FormRecognizerClient


def format_bounding_box(bounding_box):
    # Helper assumed by this example; it is not part of the SDK.
    if not bounding_box:
        return "N/A"
    return ", ".join(["[{}, {}]".format(p.x, p.y) for p in bounding_box])


endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

form_recognizer_client = FormRecognizerClient(endpoint=endpoint, credential=AzureKeyCredential(key))
# path_to_sample_forms is the path to a local form file.
with open(path_to_sample_forms, "rb") as f:
    poller = form_recognizer_client.begin_recognize_content(form=f)
form_pages = poller.result()

for idx, content in enumerate(form_pages):
    print("----Recognizing content from page #{}----".format(idx+1))
    print("Page has width: {} and height: {}, measured with unit: {}".format(
        content.width,
        content.height,
        content.unit
    ))
    for table_idx, table in enumerate(content.tables):
        print("Table # {} has {} rows and {} columns".format(table_idx, table.row_count, table.column_count))
        print("Table # {} location on page: {}".format(table_idx, format_bounding_box(table.bounding_box)))
        for cell in table.cells:
            print("...Cell[{}][{}] has text '{}' within bounding box '{}'".format(
                cell.row_index,
                cell.column_index,
                cell.text,
                format_bounding_box(cell.bounding_box)
            ))

    for line_idx, line in enumerate(content.lines):
        print("Line # {} has word count '{}' and text '{}' within bounding box '{}'".format(
            line_idx,
            len(line.words),
            line.text,
            format_bounding_box(line.bounding_box)
        ))
        if line.appearance:
            if line.appearance.style_name == "handwriting" and line.appearance.style_confidence > 0.8:
                print("Text line '{}' is handwritten and might be a signature.".format(line.text))
        for word in line.words:
            print("...Word '{}' has a confidence of {}".format(word.text, word.confidence))

    for selection_mark in content.selection_marks:
        print("Selection mark is '{}' within bounding box '{}' and has a confidence of {}".format(
            selection_mark.state,
            format_bounding_box(selection_mark.bounding_box),
            selection_mark.confidence
        ))
    print("----------------------------------------")

begin_recognize_content_from_url(form_url: str, **kwargs: Any) LROPoller[List[FormPage]] [source]¶
Extract text and layout information from a given document. The input document must be the location (URL) of the document to be analyzed.
Parameters:form_url (str) – The URL of the form to analyze. The input must be a valid, encoded URL of one of the supported formats: JPEG, PNG, PDF, TIFF, or BMP.
Keyword Arguments:pages (list[str]) – Custom page numbers for multi-page documents (PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages=["1-3", "5-6"]. Separate each page number or range with a comma.
language (str) – The BCP-47 language code of the text in the document. See supported language codes here: https://docs.microsoft.com/azure/cognitive-services/form-recognizer/language-support. Content supports auto language identification and multilanguage documents, so only provide a language code if you would like to force the document to be processed as that specific language.
reading_order (str) – Reading order algorithm to sort the text lines returned. Supported reading orders include: basic (default), natural. Set ‘basic’ to sort lines left to right and top to bottom, although in some cases proximity is treated with higher priority. Set ‘natural’ to sort lines by using positional information to keep nearby lines together.
continuation_token (str) – A continuation token to restart a poller from a saved state.
Returns:An instance of an LROPoller. Call result() on the poller object to return a list[FormPage].
New in version v2.1: The pages, language and reading_order keyword arguments and support for image/bmp content
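The pages keyword argument takes strings such as "1-3" rather than integers. As a minimal sketch, assuming you start from a list of 1-based page numbers, the helper below collapses them into that string form. The name to_page_ranges is hypothetical and is not part of the SDK.

```python
from typing import Iterable, List


def to_page_ranges(page_numbers: Iterable[int]) -> List[str]:
    """Collapse 1-based page numbers into "N" / "N-M" strings (hypothetical helper)."""
    ordered = sorted(set(page_numbers))
    if any(n < 1 for n in ordered):
        raise ValueError("page numbers are 1-based")
    ranges: List[str] = []
    start = prev = None
    for n in ordered:
        if start is None:
            start = prev = n          # open the first run
        elif n == prev + 1:
            prev = n                  # extend the current run
        else:
            # close the current run and open a new one
            ranges.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = n
    if start is not None:
        ranges.append(str(start) if start == prev else f"{start}-{prev}")
    return ranges


pages = to_page_ranges([1, 2, 3, 5, 6, 9])
print(pages)  # ['1-3', '5-6', '9']
```

The resulting list could then be passed as pages=pages to any of the recognize methods that accept the pages keyword argument.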
begin_recognize_custom_forms(model_id: str, form: bytes | IO[bytes], **kwargs: Any) LROPoller[List[RecognizedForm]] [source]¶Analyze a custom form with a model trained with or without labels. The form to analyze should be of the same type as the forms that were used to train the model. The input document must be of one of the supported content types - ‘application/pdf’, ‘image/jpeg’, ‘image/png’, ‘image/tiff’, or ‘image/bmp’.
Parameters:model_id (str) – Custom model identifier.
form (bytes or IO[bytes]) – JPEG, PNG, PDF, TIFF, or BMP type file stream or bytes.
Keyword Arguments:include_field_elements (bool) – Whether or not to include all lines per page and field elements such as lines, words, and selection marks for each form field.
content_type (str or FormContentType) – Content-type of the body sent to the API. Content-type is auto-detected, but can be overridden by passing this keyword argument. For options, see FormContentType.
pages (list[str]) – Custom page numbers for multi-page documents (PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages=["1-3", "5-6"]. Separate each page number or range with a comma.
continuation_token (str) – A continuation token to restart a poller from a saved state.
Returns:An instance of an LROPoller. Call result() on the poller object to return a list[RecognizedForm].
Return type:LROPoller[list[RecognizedForm]]
Example:
import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import FormRecognizerClient

# path_to_sample_forms and custom_model_id are not defined in this snippet;
# they come from the surrounding sample.
endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]
model_id = os.getenv("CUSTOM_TRAINED_MODEL_ID", custom_model_id)

form_recognizer_client = FormRecognizerClient(
    endpoint=endpoint, credential=AzureKeyCredential(key)
)

# Make sure your form's type is included in the list of form types the custom model can recognize
with open(path_to_sample_forms, "rb") as f:
    poller = form_recognizer_client.begin_recognize_custom_forms(
        model_id=model_id, form=f, include_field_elements=True
    )
forms = poller.result()

for idx, form in enumerate(forms):
    print("--------Recognizing Form #{}--------".format(idx+1))
    print("Form has type {}".format(form.form_type))
    print("Form has form type confidence {}".format(form.form_type_confidence))
    print("Form was analyzed with model with ID {}".format(form.model_id))
    for name, field in form.fields.items():
        # each field is of type FormField
        # label_data is populated if you are using a model trained without labels,
        # since the service needs to make predictions for labels if not explicitly given to it.
        if field.label_data:
            print("...Field '{}' has label '{}' with a confidence score of {}".format(
                name,
                field.label_data.text,
                field.confidence
            ))

        print("...Label '{}' has value '{}' with a confidence score of {}".format(
            field.label_data.text if field.label_data else name, field.value, field.confidence
        ))

    # iterate over tables, lines, and selection marks on each page
    for page in form.pages:
        for i, table in enumerate(page.tables):
            print("\nTable {} on page {}".format(i+1, table.page_number))
            for cell in table.cells:
                print("...Cell[{}][{}] has text '{}' with confidence {}".format(
                    cell.row_index, cell.column_index, cell.text, cell.confidence
                ))
        print("\nLines found on page {}".format(page.page_number))
        for line in page.lines:
            print("...Line '{}' is made up of the following words: ".format(line.text))
            for word in line.words:
                print("......Word '{}' has a confidence of {}".format(
                    word.text,
                    word.confidence
                ))
        if page.selection_marks:
            print("\nSelection marks found on page {}".format(page.page_number))
            for selection_mark in page.selection_marks:
                print("......Selection mark is '{}' and has a confidence of {}".format(
                    selection_mark.state,
                    selection_mark.confidence
                ))

    print("-----------------------------------")

begin_recognize_custom_forms_from_url(model_id: str, form_url: str, **kwargs: Any) LROPoller[List[RecognizedForm]] [source]¶
Analyze a custom form with a model trained with or without labels. The form to analyze should be of the same type as the forms that were used to train the model. The input document must be the location (URL) of the document to be analyzed.
Parameters:model_id (str) – Custom model identifier.
form_url (str) – The URL of the form to analyze. The input must be a valid, encoded URL of one of the supported formats: JPEG, PNG, PDF, TIFF, or BMP.
Keyword Arguments:include_field_elements (bool) – Whether or not to include all lines per page and field elements such as lines, words, and selection marks for each form field.
pages (list[str]) – Custom page numbers for multi-page documents (PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages=["1-3", "5-6"]. Separate each page number or range with a comma.
continuation_token (str) – A continuation token to restart a poller from a saved state.
Returns:An instance of an LROPoller. Call result() on the poller object to return a list[RecognizedForm].
Return type:LROPoller[list[RecognizedForm]]
begin_recognize_identity_documents(identity_document: bytes | IO[bytes], **kwargs: Any) LROPoller[List[RecognizedForm]] [source]¶Extract field text and semantic values from a given identity document. The input document must be of one of the supported content types - ‘application/pdf’, ‘image/jpeg’, ‘image/png’, ‘image/tiff’ or ‘image/bmp’.
See fields found on an identity document here: https://aka.ms/formrecognizer/iddocumentfields
Parameters:identity_document (bytes or IO[bytes]) – JPEG, PNG, PDF, TIFF, or BMP type file stream or bytes.
Keyword Arguments:include_field_elements (bool) – Whether or not to include all lines per page and field elements such as lines, words, and selection marks for each form field.
content_type (str or FormContentType) – Content-type of the body sent to the API. Content-type is auto-detected, but can be overridden by passing this keyword argument. For options, see FormContentType.
continuation_token (str) – A continuation token to restart a poller from a saved state.
pages (list[str]) – Custom page numbers for multi-page documents (PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages=["1-3", "5-6"]. Separate each page number or range with a comma.
Returns:An instance of an LROPoller. Call result() on the poller object to return a list[RecognizedForm].
Return type:LROPoller[list[RecognizedForm]]
New in version v2.1: The begin_recognize_identity_documents client method
Example:
import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import FormRecognizerClient

# path_to_sample_forms is not defined in this snippet; it comes from the surrounding sample.
endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

form_recognizer_client = FormRecognizerClient(
    endpoint=endpoint, credential=AzureKeyCredential(key)
)
with open(path_to_sample_forms, "rb") as f:
    poller = form_recognizer_client.begin_recognize_identity_documents(identity_document=f)
id_documents = poller.result()

for idx, id_document in enumerate(id_documents):
    print("--------Recognizing ID document #{}--------".format(idx+1))
    first_name = id_document.fields.get("FirstName")
    if first_name:
        print("First Name: {} has confidence: {}".format(first_name.value, first_name.confidence))
    last_name = id_document.fields.get("LastName")
    if last_name:
        print("Last Name: {} has confidence: {}".format(last_name.value, last_name.confidence))
    document_number = id_document.fields.get("DocumentNumber")
    if document_number:
        print("Document Number: {} has confidence: {}".format(document_number.value, document_number.confidence))
    dob = id_document.fields.get("DateOfBirth")
    if dob:
        print("Date of Birth: {} has confidence: {}".format(dob.value, dob.confidence))
    doe = id_document.fields.get("DateOfExpiration")
    if doe:
        print("Date of Expiration: {} has confidence: {}".format(doe.value, doe.confidence))
    sex = id_document.fields.get("Sex")
    if sex:
        print("Sex: {} has confidence: {}".format(sex.value, sex.confidence))
    address = id_document.fields.get("Address")
    if address:
        print("Address: {} has confidence: {}".format(address.value, address.confidence))
    country_region = id_document.fields.get("CountryRegion")
    if country_region:
        print("Country/Region: {} has confidence: {}".format(country_region.value, country_region.confidence))
    region = id_document.fields.get("Region")
    if region:
        print("Region: {} has confidence: {}".format(region.value, region.confidence))

begin_recognize_identity_documents_from_url(identity_document_url: str, **kwargs: Any) LROPoller[List[RecognizedForm]] [source]¶
Extract field text and semantic values from a given identity document. The input document must be the location (URL) of the identity document to be analyzed.
See fields found on an identity document here: https://aka.ms/formrecognizer/iddocumentfields
Parameters:identity_document_url (str) – The URL of the identity document to analyze. The input must be a valid, encoded URL of one of the supported formats: JPEG, PNG, PDF, TIFF, or BMP.
Keyword Arguments:include_field_elements (bool) – Whether or not to include all lines per page and field elements such as lines, words, and selection marks for each form field.
continuation_token (str) – A continuation token to restart a poller from a saved state.
pages (list[str]) – Custom page numbers for multi-page documents (PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages=["1-3", "5-6"]. Separate each page number or range with a comma.
Returns:An instance of an LROPoller. Call result() on the poller object to return a list[RecognizedForm].
Return type:LROPoller[list[RecognizedForm]]
New in version v2.1: The begin_recognize_identity_documents_from_url client method
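Each RecognizedForm returned by the recognize methods exposes a fields dictionary whose entries carry a value and a confidence. As a rough sketch of how a caller might keep only the fields it trusts, the snippet below filters on a confidence threshold. The helper name confident_fields, the 0.8 default, and the SimpleNamespace stand-ins for real field objects are all assumptions made for illustration; they are not part of the SDK.

```python
from types import SimpleNamespace


def confident_fields(fields, threshold=0.8):
    """Return {name: value} for fields whose confidence meets the threshold (hypothetical helper)."""
    return {
        name: field.value
        for name, field in fields.items()
        if field is not None
        and field.confidence is not None
        and field.confidence >= threshold
    }


# Stand-in for id_document.fields so the sketch runs without calling the service.
sample_fields = {
    "FirstName": SimpleNamespace(value="Ada", confidence=0.95),
    "Sex": SimpleNamespace(value="F", confidence=0.40),
    "Region": None,
}

print(confident_fields(sample_fields))  # {'FirstName': 'Ada'}
```

With a real result, the same function could be called as confident_fields(id_document.fields). The right threshold depends on the document type and how costly a wrong value is.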
begin_recognize_invoices(invoice: bytes | IO[bytes], **kwargs: Any) LROPoller[List[RecognizedForm]] [source]¶Extract field text and semantic values from a given invoice. The input document must be of one of the supported content types - ‘application/pdf’, ‘image/jpeg’, ‘image/png’, ‘image/tiff’ or ‘image/bmp’.
See fields found on an invoice here: https://aka.ms/formrecognizer/invoicefields
Parameters:invoice (bytes or IO[bytes]) – JPEG, PNG, PDF, TIFF, or BMP type file stream or bytes.
Keyword Arguments:locale (str) – Locale of the invoice. Supported locales include: en-US
include_field_elements (bool) – Whether or not to include all lines per page and field elements such as lines, words, and selection marks for each form field.
pages (list[str]) – Custom page numbers for multi-page documents (PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages=["1-3", "5-6"]. Separate each page number or range with a comma.
content_type (str or FormContentType) – Content-type of the body sent to the API. Content-type is auto-detected, but can be overridden by passing this keyword argument. For options, see FormContentType.
continuation_token (str) – A continuation token to restart a poller from a saved state.
Returns:An instance of an LROPoller. Call result() on the poller object to return a list[RecognizedForm].
Return type:LROPoller[list[RecognizedForm]]
New in version v2.1: The begin_recognize_invoices client method
Example:
import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import FormRecognizerClient

# path_to_sample_forms is not defined in this snippet; it comes from the surrounding sample.
endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

form_recognizer_client = FormRecognizerClient(
    endpoint=endpoint, credential=AzureKeyCredential(key)
)
with open(path_to_sample_forms, "rb") as f:
    poller = form_recognizer_client.begin_recognize_invoices(invoice=f, locale="en-US")
invoices = poller.result()

for idx, invoice in enumerate(invoices):
    print("--------Recognizing invoice #{}--------".format(idx+1))
    vendor_name = invoice.fields.get("VendorName")
    if vendor_name:
        print("Vendor Name: {} has confidence: {}".format(vendor_name.value, vendor_name.confidence))
    vendor_address = invoice.fields.get("VendorAddress")
    if vendor_address:
        print("Vendor Address: {} has confidence: {}".format(vendor_address.value, vendor_address.confidence))
    vendor_address_recipient = invoice.fields.get("VendorAddressRecipient")
    if vendor_address_recipient:
        print("Vendor Address Recipient: {} has confidence: {}".format(vendor_address_recipient.value, vendor_address_recipient.confidence))
    customer_name = invoice.fields.get("CustomerName")
    if customer_name:
        print("Customer Name: {} has confidence: {}".format(customer_name.value, customer_name.confidence))
    customer_id = invoice.fields.get("CustomerId")
    if customer_id:
        print("Customer Id: {} has confidence: {}".format(customer_id.value, customer_id.confidence))
    customer_address = invoice.fields.get("CustomerAddress")
    if customer_address:
        print("Customer Address: {} has confidence: {}".format(customer_address.value, customer_address.confidence))
    customer_address_recipient = invoice.fields.get("CustomerAddressRecipient")
    if customer_address_recipient:
        print("Customer Address Recipient: {} has confidence: {}".format(customer_address_recipient.value, customer_address_recipient.confidence))
    invoice_id = invoice.fields.get("InvoiceId")
    if invoice_id:
        print("Invoice Id: {} has confidence: {}".format(invoice_id.value, invoice_id.confidence))
    invoice_date = invoice.fields.get("InvoiceDate")
    if invoice_date:
        print("Invoice Date: {} has confidence: {}".format(invoice_date.value, invoice_date.confidence))
    invoice_total = invoice.fields.get("InvoiceTotal")
    if invoice_total:
        print("Invoice Total: {} has confidence: {}".format(invoice_total.value, invoice_total.confidence))
    due_date = invoice.fields.get("DueDate")
    if due_date:
        print("Due Date: {} has confidence: {}".format(due_date.value, due_date.confidence))
    purchase_order = invoice.fields.get("PurchaseOrder")
    if purchase_order:
        print("Purchase Order: {} has confidence: {}".format(purchase_order.value, purchase_order.confidence))
    billing_address = invoice.fields.get("BillingAddress")
    if billing_address:
        print("Billing Address: {} has confidence: {}".format(billing_address.value, billing_address.confidence))
    billing_address_recipient = invoice.fields.get("BillingAddressRecipient")
    if billing_address_recipient:
        print("Billing Address Recipient: {} has confidence: {}".format(billing_address_recipient.value, billing_address_recipient.confidence))
    shipping_address = invoice.fields.get("ShippingAddress")
    if shipping_address:
        print("Shipping Address: {} has confidence: {}".format(shipping_address.value, shipping_address.confidence))
    shipping_address_recipient = invoice.fields.get("ShippingAddressRecipient")
    if shipping_address_recipient:
        print("Shipping Address Recipient: {} has confidence: {}".format(shipping_address_recipient.value, shipping_address_recipient.confidence))
    # Guard against invoices with no "Items" field, as the receipt examples do.
    if invoice.fields.get("Items"):
        print("Invoice items:")
        for idx, item in enumerate(invoice.fields.get("Items").value):
            print("...Item #{}".format(idx+1))
            item_description = item.value.get("Description")
            if item_description:
                print("......Description: {} has confidence: {}".format(item_description.value, item_description.confidence))
            item_quantity = item.value.get("Quantity")
            if item_quantity:
                print("......Quantity: {} has confidence: {}".format(item_quantity.value, item_quantity.confidence))
            unit = item.value.get("Unit")
            if unit:
                print("......Unit: {} has confidence: {}".format(unit.value, unit.confidence))
            unit_price = item.value.get("UnitPrice")
            if unit_price:
                print("......Unit Price: {} has confidence: {}".format(unit_price.value, unit_price.confidence))
            product_code = item.value.get("ProductCode")
            if product_code:
                print("......Product Code: {} has confidence: {}".format(product_code.value, product_code.confidence))
            item_date = item.value.get("Date")
            if item_date:
                print("......Date: {} has confidence: {}".format(item_date.value, item_date.confidence))
            tax = item.value.get("Tax")
            if tax:
                print("......Tax: {} has confidence: {}".format(tax.value, tax.confidence))
            amount = item.value.get("Amount")
            if amount:
                print("......Amount: {} has confidence: {}".format(amount.value, amount.confidence))
    subtotal = invoice.fields.get("SubTotal")
    if subtotal:
        print("Subtotal: {} has confidence: {}".format(subtotal.value, subtotal.confidence))
    total_tax = invoice.fields.get("TotalTax")
    if total_tax:
        print("Total Tax: {} has confidence: {}".format(total_tax.value, total_tax.confidence))
    previous_unpaid_balance = invoice.fields.get("PreviousUnpaidBalance")
    if previous_unpaid_balance:
        print("Previous Unpaid Balance: {} has confidence: {}".format(previous_unpaid_balance.value, previous_unpaid_balance.confidence))
    amount_due = invoice.fields.get("AmountDue")
    if amount_due:
        print("Amount Due: {} has confidence: {}".format(amount_due.value, amount_due.confidence))
    service_start_date = invoice.fields.get("ServiceStartDate")
    if service_start_date:
        print("Service Start Date: {} has confidence: {}".format(service_start_date.value, service_start_date.confidence))
    service_end_date = invoice.fields.get("ServiceEndDate")
    if service_end_date:
        print("Service End Date: {} has confidence: {}".format(service_end_date.value, service_end_date.confidence))
    service_address = invoice.fields.get("ServiceAddress")
    if service_address:
        print("Service Address: {} has confidence: {}".format(service_address.value, service_address.confidence))
    service_address_recipient = invoice.fields.get("ServiceAddressRecipient")
    if service_address_recipient:
        print("Service Address Recipient: {} has confidence: {}".format(service_address_recipient.value, service_address_recipient.confidence))
    remittance_address = invoice.fields.get("RemittanceAddress")
    if remittance_address:
        print("Remittance Address: {} has confidence: {}".format(remittance_address.value, remittance_address.confidence))
    remittance_address_recipient = invoice.fields.get("RemittanceAddressRecipient")
    if remittance_address_recipient:
        print("Remittance Address Recipient: {} has confidence: {}".format(remittance_address_recipient.value, remittance_address_recipient.confidence))

begin_recognize_invoices_from_url(invoice_url: str, **kwargs: Any) LROPoller[List[RecognizedForm]] [source]¶
Extract field text and semantic values from a given invoice. The input document must be the location (URL) of the invoice to be analyzed.
See fields found on an invoice here: https://aka.ms/formrecognizer/invoicefields
Parameters:invoice_url (str) – The URL of the invoice to analyze. The input must be a valid, encoded URL of one of the supported formats: JPEG, PNG, PDF, TIFF, or BMP.
Keyword Arguments:locale (str) – Locale of the invoice. Supported locales include: en-US
include_field_elements (bool) – Whether or not to include all lines per page and field elements such as lines, words, and selection marks for each form field.
pages (list[str]) – Custom page numbers for multi-page documents (PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages=["1-3", "5-6"]. Separate each page number or range with a comma.
continuation_token (str) – A continuation token to restart a poller from a saved state.
Returns:An instance of an LROPoller. Call result() on the poller object to return a list[RecognizedForm].
Return type:LROPoller[list[RecognizedForm]]
New in version v2.1: The begin_recognize_invoices_from_url client method
begin_recognize_receipts(receipt: bytes | IO[bytes], **kwargs: Any) LROPoller[List[RecognizedForm]] [source]¶Extract field text and semantic values from a given sales receipt. The input document must be of one of the supported content types - ‘application/pdf’, ‘image/jpeg’, ‘image/png’, ‘image/tiff’ or ‘image/bmp’.
See fields found on a receipt here: https://aka.ms/formrecognizer/receiptfields
Parameters:receipt (bytes or IO[bytes]) – JPEG, PNG, PDF, TIFF, or BMP type file stream or bytes.
Keyword Arguments:include_field_elements (bool) – Whether or not to include all lines per page and field elements such as lines, words, and selection marks for each form field.
content_type (str or FormContentType) – Content-type of the body sent to the API. Content-type is auto-detected, but can be overridden by passing this keyword argument. For options, see FormContentType.
continuation_token (str) – A continuation token to restart a poller from a saved state.
locale (str) – Locale of the receipt. Supported locales include: en-US, en-AU, en-CA, en-GB, and en-IN.
pages (list[str]) – Custom page numbers for multi-page documents (PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages=["1-3", "5-6"]. Separate each page number or range with a comma.
Returns:An instance of an LROPoller. Call result() on the poller object to return a list[RecognizedForm].
Return type:LROPoller[list[RecognizedForm]]
New in version v2.1: The locale and pages keyword arguments and support for image/bmp content
Example:
import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import FormRecognizerClient

# path_to_sample_forms is not defined in this snippet; it comes from the surrounding sample.
endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

form_recognizer_client = FormRecognizerClient(
    endpoint=endpoint, credential=AzureKeyCredential(key)
)
with open(path_to_sample_forms, "rb") as f:
    poller = form_recognizer_client.begin_recognize_receipts(receipt=f, locale="en-US")
receipts = poller.result()

for idx, receipt in enumerate(receipts):
    print("--------Recognizing receipt #{}--------".format(idx+1))
    receipt_type = receipt.fields.get("ReceiptType")
    if receipt_type:
        print("Receipt Type: {} has confidence: {}".format(receipt_type.value, receipt_type.confidence))
    merchant_name = receipt.fields.get("MerchantName")
    if merchant_name:
        print("Merchant Name: {} has confidence: {}".format(merchant_name.value, merchant_name.confidence))
    transaction_date = receipt.fields.get("TransactionDate")
    if transaction_date:
        print("Transaction Date: {} has confidence: {}".format(transaction_date.value, transaction_date.confidence))
    if receipt.fields.get("Items"):
        print("Receipt items:")
        for idx, item in enumerate(receipt.fields.get("Items").value):
            print("...Item #{}".format(idx+1))
            item_name = item.value.get("Name")
            if item_name:
                print("......Item Name: {} has confidence: {}".format(item_name.value, item_name.confidence))
            item_quantity = item.value.get("Quantity")
            if item_quantity:
                print("......Item Quantity: {} has confidence: {}".format(item_quantity.value, item_quantity.confidence))
            item_price = item.value.get("Price")
            if item_price:
                print("......Individual Item Price: {} has confidence: {}".format(item_price.value, item_price.confidence))
            item_total_price = item.value.get("TotalPrice")
            if item_total_price:
                print("......Total Item Price: {} has confidence: {}".format(item_total_price.value, item_total_price.confidence))
    subtotal = receipt.fields.get("Subtotal")
    if subtotal:
        print("Subtotal: {} has confidence: {}".format(subtotal.value, subtotal.confidence))
    tax = receipt.fields.get("Tax")
    if tax:
        print("Tax: {} has confidence: {}".format(tax.value, tax.confidence))
    tip = receipt.fields.get("Tip")
    if tip:
        print("Tip: {} has confidence: {}".format(tip.value, tip.confidence))
    total = receipt.fields.get("Total")
    if total:
        print("Total: {} has confidence: {}".format(total.value, total.confidence))
    print("--------------------------------------")

begin_recognize_receipts_from_url(receipt_url: str, **kwargs: Any) LROPoller[List[RecognizedForm]] [source]¶
Extract field text and semantic values from a given sales receipt. The input document must be the location (URL) of the receipt to be analyzed.
See fields found on a receipt here: https://aka.ms/formrecognizer/receiptfields
Parameters:receipt_url (str) – The URL of the receipt to analyze. The input must be a valid, encoded URL of one of the supported formats: JPEG, PNG, PDF, TIFF, or BMP.
Keyword Arguments:include_field_elements (bool) – Whether or not to include all lines per page and field elements such as lines, words, and selection marks for each form field.
continuation_token (str) – A continuation token to restart a poller from a saved state.
locale (str) – Locale of the receipt. Supported locales include: en-US, en-AU, en-CA, en-GB, and en-IN.
pages (list[str]) – Custom page numbers for multi-page documents (PDF/TIFF). Input the page numbers and/or ranges of pages you want to get in the result. For a range of pages, use a hyphen, like pages=["1-3", "5-6"]. Separate each page number or range with a comma.
Returns:An instance of an LROPoller. Call result() on the poller object to return a list[RecognizedForm].
Return type:LROPoller[list[RecognizedForm]]
New in version v2.1: The locale and pages keyword arguments and support for image/bmp content
Example:
import os

from azure.core.credentials import AzureKeyCredential
from azure.ai.formrecognizer import FormRecognizerClient

endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

form_recognizer_client = FormRecognizerClient(
    endpoint=endpoint, credential=AzureKeyCredential(key)
)
url = "https://raw.githubusercontent.com/Azure/azure-sdk-for-python/main/sdk/formrecognizer/azure-ai-formrecognizer/tests/sample_forms/receipt/contoso-receipt.png"
poller = form_recognizer_client.begin_recognize_receipts_from_url(receipt_url=url)
receipts = poller.result()

for idx, receipt in enumerate(receipts):
    print("--------Recognizing receipt #{}--------".format(idx+1))
    receipt_type = receipt.fields.get("ReceiptType")
    if receipt_type:
        print("Receipt Type: {} has confidence: {}".format(receipt_type.value, receipt_type.confidence))
    merchant_name = receipt.fields.get("MerchantName")
    if merchant_name:
        print("Merchant Name: {} has confidence: {}".format(merchant_name.value, merchant_name.confidence))
    transaction_date = receipt.fields.get("TransactionDate")
    if transaction_date:
        print("Transaction Date: {} has confidence: {}".format(transaction_date.value, transaction_date.confidence))
    if receipt.fields.get("Items"):
        print("Receipt items:")
        for idx, item in enumerate(receipt.fields.get("Items").value):
            print("...Item #{}".format(idx+1))
            item_name = item.value.get("Name")
            if item_name:
                print("......Item Name: {} has confidence: {}".format(item_name.value, item_name.confidence))
            item_quantity = item.value.get("Quantity")
            if item_quantity:
                print("......Item Quantity: {} has confidence: {}".format(item_quantity.value, item_quantity.confidence))
            item_price = item.value.get("Price")
            if item_price:
                print("......Individual Item Price: {} has confidence: {}".format(item_price.value, item_price.confidence))
            item_total_price = item.value.get("TotalPrice")
            if item_total_price:
                print("......Total Item Price: {} has confidence: {}".format(item_total_price.value, item_total_price.confidence))
    subtotal = receipt.fields.get("Subtotal")
    if subtotal:
        print("Subtotal: {} has confidence: {}".format(subtotal.value, subtotal.confidence))
    tax = receipt.fields.get("Tax")
    if tax:
        print("Tax: {} has confidence: {}".format(tax.value, tax.confidence))
    tip = receipt.fields.get("Tip")
    if tip:
        print("Tip: {} has confidence: {}".format(tip.value, tip.confidence))
    total = receipt.fields.get("Total")
    if total:
        print("Total: {} has confidence: {}".format(total.value, total.confidence))
    print("--------------------------------------")

close() None [source]¶
Close the FormRecognizerClient session.
send_request(request: HttpRequest, *, stream: bool = False, **kwargs) HttpResponse [source]¶Runs a network request using the client’s existing pipeline.
The request URL can be relative to the base URL. The service API version used for the request is the same as the client’s unless otherwise specified. Overriding the client’s configured API version in a relative URL is supported on clients with API version 2022-08-31 and later. Overriding it in an absolute URL is supported on clients with any API version. This method does not raise if the response is an error; to raise an exception, call raise_for_status() on the returned response object. For more information about how to send custom requests with this method, see https://aka.ms/azsdk/dpcodegen/python/send_request.
Parameters:request (HttpRequest) – The network request you want to make.
Keyword Arguments:stream (bool) – Whether the response payload will be streamed. Defaults to False.
Returns:The response of your network call. Does not do error handling on your response.
class azure.ai.formrecognizer.FormRecognizerError(**kwargs: Any)[source]¶Represents an error that occurred while training.
New in version v2.1: Support for to_dict and from_dict methods
classmethod from_dict(data: Dict) FormRecognizerError [source]¶Converts a dict in the shape of a FormRecognizerError to the model itself.
Parameters:data (dict) – A dictionary in the shape of FormRecognizerError.
Returns:FormRecognizerError
to_dict() Dict [source]¶Returns a dict representation of FormRecognizerError.
Returns:dict
code: str¶Error code.
message: str¶Error message.
class azure.ai.formrecognizer.FormSelectionMark(**kwargs: Any)[source]¶Information about the extracted selection mark.
New in version v2.1: Support for to_dict and from_dict methods
classmethod from_dict(data: Dict) FormSelectionMark [source]¶Converts a dict in the shape of a FormSelectionMark to the model itself.
Parameters:data (dict) – A dictionary in the shape of FormSelectionMark.
Returns:FormSelectionMark
to_dict() Dict [source]¶Returns a dict representation of FormSelectionMark.
Returns:dict
bounding_box: List[Point]¶A list of 4 points representing the quadrilateral bounding box that outlines the text. The points are listed in clockwise order: top-left, top-right, bottom-right, bottom-left. Units are in pixels for images and inches for PDF.
confidence: float¶Measures the degree of certainty of the recognition result. Value is between [0.0, 1.0].
kind: str¶For FormSelectionMark, this is “selectionMark”.
page_number: int¶The 1-based number of the page in which this content is present.
state: str¶State of the selection mark. Possible values include: “selected”, “unselected”.
text: str¶The text content - not returned for FormSelectionMark.
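The examples in this section call a format_bounding_box helper that is not defined in the snippets themselves. Since bounding_box is documented as a list of four points in clockwise order, one plausible version of that helper is sketched below. The Point namedtuple here is a local stand-in with x and y attributes so the sketch runs without the SDK, and the output format is an assumption.

```python
from collections import namedtuple

# Local stand-in for the SDK's Point type (assumed to expose x and y).
Point = namedtuple("Point", "x y")


def format_bounding_box(bounding_box):
    """Render a list of points as "[x, y], [x, y], ..." or "N/A" when absent."""
    if not bounding_box:
        return "N/A"
    return ", ".join("[{}, {}]".format(p.x, p.y) for p in bounding_box)


# Clockwise order: top-left, top-right, bottom-right, bottom-left.
box = [Point(0.0, 0.0), Point(2.0, 0.0), Point(2.0, 1.0), Point(0.0, 1.0)]
print(format_bounding_box(box))  # [0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 1.0]
```

Returning "N/A" for a missing box keeps the print statements in the examples from failing when an element has no location data.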
class azure.ai.formrecognizer.FormTable(**kwargs: Any)[source]¶Information about the extracted table contained on a page.
New in version v2.1: The bounding_box property, support for to_dict and from_dict methods
classmethod from_dict(data: Dict) FormTable [source]¶Converts a dict in the shape of a FormTable to the model itself.
Parameters:data (dict) – A dictionary in the shape of FormTable.
Returns:FormTable
to_dict() Dict [source]¶Returns a dict representation of FormTable.
Returns:dict
bounding_box: List[Point]¶A list of 4 points representing the quadrilateral bounding box that outlines the table. The points are listed in clockwise order: top-left, top-right, bottom-right, bottom-left. Units are in pixels for images and inches for PDF.
cells: List[FormTableCell]¶List of cells contained in the table.
column_count: int¶Number of columns in table.
page_number: int¶The 1-based number of the page in which this table is present.
row_count: int¶Number of rows in table.
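Because a table is returned as a flat list of cells plus row and column counts, a common first step is to rebuild a two-dimensional grid. The sketch below shows one way to do that. `Cell` and `Table` are hypothetical stand-ins carrying only the attributes used here, and the code assumes that row_index and column_index are zero-based, which this reference does not state explicitly.

```python
from collections import namedtuple

# Hypothetical stand-ins with only the attributes used here; with the SDK
# installed, pass a real FormTable instead.
Cell = namedtuple("Cell", ["row_index", "column_index", "text"])
Table = namedtuple("Table", ["row_count", "column_count", "cells"])


def table_to_grid(table):
    """Place each cell's text at [row_index][column_index]; gaps stay empty."""
    grid = [[""] * table.column_count for _ in range(table.row_count)]
    for cell in table.cells:
        # Assumption: indices are zero-based.
        grid[cell.row_index][cell.column_index] = cell.text
    return grid


table = Table(
    row_count=2,
    column_count=2,
    cells=[Cell(0, 0, "Item"), Cell(0, 1, "Qty"), Cell(1, 0, "Pen")],
)
grid = table_to_grid(table)
for row in grid:
    print(row)
```

Positions with no recognized cell are left as empty strings, so every row has the same length.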
class azure.ai.formrecognizer.FormTableCell(**kwargs: Any)[source]¶Represents a cell contained in a table recognized from the input document.
New in version v2.1: FormSelectionMark is added to the types returned in the list of field_elements, support for to_dict and from_dict methods
classmethod from_dict(data: Dict) FormTableCell [source]¶Converts a dict in the shape of a FormTableCell to the model itself.
Parameters:data (dict) – A dictionary in the shape of FormTableCell.
Returns:FormTableCell
Return type:FormTableCell
to_dict() Dict [source]¶Returns a dict representation of FormTableCell.
Returns:dict
Return type:dict
bounding_box: List[Point]¶A list of 4 points representing the quadrilateral bounding box that outlines the text. The points are listed in clockwise order: top-left, top-right, bottom-right, bottom-left. Units are in pixels for images and inches for PDF.
column_index: int¶Column index of the cell.
column_span: int¶Number of columns spanned by this cell.
confidence: float¶Measures the degree of certainty of the recognition result. Value is between [0.0, 1.0].
field_elements: List[FormElement | FormWord | FormLine | FormSelectionMark]¶When include_field_elements is set to true, a list of elements constituting this cell is returned. The list consists of elements such as lines, words, and selection marks. For calls to begin_recognize_content(), this list is always populated.
is_footer: bool¶Whether the current cell is a footer cell.
is_header: bool¶Whether the current cell is a header cell.
page_number: int¶The 1-based number of the page in which this content is present.
row_index: int¶Row index of the cell.
row_span: int¶Number of rows spanned by this cell.
text: str¶Text content of the cell.
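The is_header, column_index, and column_span attributes can be combined to derive one label per column, even when a header cell spans several columns. This is a sketch under stated assumptions rather than SDK behavior: `Cell` is a hypothetical stand-in, column indices are assumed to be zero-based, and column_span is assumed to be at least 1.

```python
from collections import namedtuple

# Hypothetical stand-in with only the attributes used here; with the SDK
# installed these would be FormTableCell instances.
Cell = namedtuple("Cell", ["text", "column_index", "column_span", "is_header"])


def header_labels(cells, column_count):
    """Return one header label per column, repeating labels across spanned columns."""
    labels = [None] * column_count
    for cell in cells:
        if not cell.is_header:
            continue
        # A header spanning N columns labels each of those N columns.
        for col in range(cell.column_index, cell.column_index + cell.column_span):
            labels[col] = cell.text
    return labels


cells = [
    Cell("Name", 0, 1, True),
    Cell("Address", 1, 2, True),
    Cell("Ann", 0, 1, False),
]
labels = header_labels(cells, column_count=3)
print(labels)
```

Columns without any header cell keep the value None, which makes missing headers easy to detect.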
class azure.ai.formrecognizer.FormTrainingClient(endpoint: str, credential: AzureKeyCredential | TokenCredential, **kwargs: Any)[source]¶FormTrainingClient is the Form Recognizer interface to use for creating and managing custom models. It provides methods for training models on the forms you provide, as well as methods for viewing and deleting models, accessing account properties, copying models to another Form Recognizer resource, and composing models from a collection of existing models trained with labels.
Note
FormTrainingClient should be used with API versions <=v2.1. To use API versions 2022-08-31 and up, instantiate a DocumentModelAdministrationClient.
Keyword Arguments:api_version (str or FormRecognizerApiVersion) – The API version of the service to use for requests. It defaults to API version v2.1. Setting to an older version may result in reduced feature compatibility. To use the latest supported API version and features, instantiate a DocumentModelAdministrationClient instead.
Example:
    import os
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.formrecognizer import FormTrainingClient

    endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
    key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]

    # Authenticate with an API key.
    form_training_client = FormTrainingClient(endpoint, AzureKeyCredential(key))

    """DefaultAzureCredential will use the values from these environment variables:
    AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET
    """
    import os
    from azure.ai.formrecognizer import FormTrainingClient
    from azure.identity import DefaultAzureCredential

    endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
    credential = DefaultAzureCredential()

    # Authenticate with a token credential instead of an API key.
    form_training_client = FormTrainingClient(endpoint, credential)
begin_copy_model(model_id: str, target: Dict[str, str | int], **kwargs: Any) LROPoller[CustomFormModelInfo] [source]¶
Copy a custom model stored in this resource (the source) to the user-specified target Form Recognizer resource. This should be called with the source Form Recognizer resource (with the model that is intended to be copied). The target parameter should be supplied from the target resource’s output from calling the get_copy_authorization() method.
Parameters:model_id (str) – Model identifier of the model to copy to target resource.
target (Dict[str, Union[str, int]]) – The copy authorization generated from the target resource’s call to get_copy_authorization().
Keyword Arguments:continuation_token (str) – A continuation token to restart a poller from a saved state.
Returns:An instance of an LROPoller. Call result() on the poller object to return a CustomFormModelInfo.
Return type:LROPoller[CustomFormModelInfo]
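The copy involves two resources: the target issues an authorization, and the source performs the copy with it. The sketch below wraps both steps in one helper. The helper name `copy_model` is hypothetical, and the stand-in objects exist only so the flow can be exercised without a service call; real code would pass two FormTrainingClient instances.

```python
from types import SimpleNamespace


def copy_model(source_client, target_client, model_id, resource_id, resource_region):
    """Authorize on the target resource, then start the copy from the source."""
    # Step 1: the target resource issues the copy authorization.
    target = target_client.get_copy_authorization(
        resource_id=resource_id, resource_region=resource_region
    )
    # Step 2: the source resource, which holds the model, performs the copy.
    poller = source_client.begin_copy_model(model_id=model_id, target=target)
    return poller.result()


# Dry run with hypothetical stand-ins; real code would pass two
# FormTrainingClient instances instead.
fake_target = SimpleNamespace(
    get_copy_authorization=lambda resource_id, resource_region: {"modelId": "copy-1"}
)
fake_source = SimpleNamespace(
    begin_copy_model=lambda model_id, target: SimpleNamespace(
        result=lambda: SimpleNamespace(model_id=target["modelId"], status="ready")
    )
)
copied = copy_model(fake_source, fake_target, "src-1", "resource-id", "westus2")
print(copied.model_id, copied.status)
```

Keeping both calls in one function makes it harder to pass an authorization from the wrong resource by accident.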
Example:
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.formrecognizer import FormTrainingClient

    # source_endpoint, source_key, source_model_id, and target are defined elsewhere.
    source_client = FormTrainingClient(endpoint=source_endpoint, credential=AzureKeyCredential(source_key))

    poller = source_client.begin_copy_model(
        model_id=source_model_id,
        target=target  # output from target client's call to get_copy_authorization()
    )
    copied_over_model = poller.result()

    print("Model ID: {}".format(copied_over_model.model_id))
    print("Status: {}".format(copied_over_model.status))
begin_create_composed_model(model_ids: List[str], **kwargs: Any) LROPoller[CustomFormModel] [source]¶
Creates a composed model from a collection of existing models that were trained with labels.
A composed model allows multiple models to be called with a single model ID. When a document is submitted to be analyzed with a composed model ID, a classification step is first performed to route it to the correct custom model.
Parameters:model_ids (list[str]) – List of model IDs to use in the composed model.
Keyword Arguments:model_name (str) – An optional, user-defined name to associate with your model.
continuation_token (str) – A continuation token to restart a poller from a saved state.
Returns:An instance of an LROPoller. Call result() on the poller object to return a CustomFormModel.
Return type:LROPoller[CustomFormModel]
New in version v2.1: The begin_create_composed_model client method
Example:
    import os
    from azure.core.credentials import AzureKeyCredential
    from azure.ai.formrecognizer import FormTrainingClient

    endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
    key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]
    po_supplies = os.environ['PURCHASE_ORDER_OFFICE_SUPPLIES_SAS_URL_V2']
    po_equipment = os.environ['PURCHASE_ORDER_OFFICE_EQUIPMENT_SAS_URL_V2']
    po_furniture = os.environ['PURCHASE_ORDER_OFFICE_FURNITURE_SAS_URL_V2']
    po_cleaning_supplies = os.environ['PURCHASE_ORDER_OFFICE_CLEANING_SUPPLIES_SAS_URL_V2']

    form_training_client = FormTrainingClient(endpoint=endpoint, credential=AzureKeyCredential(key))

    # Train one labeled model per purchase order type.
    supplies_poller = form_training_client.begin_training(
        po_supplies, use_training_labels=True, model_name="Purchase order - Office supplies"
    )
    equipment_poller = form_training_client.begin_training(
        po_equipment, use_training_labels=True, model_name="Purchase order - Office Equipment"
    )
    furniture_poller = form_training_client.begin_training(
        po_furniture, use_training_labels=True, model_name="Purchase order - Furniture"
    )
    cleaning_supplies_poller = form_training_client.begin_training(
        po_cleaning_supplies, use_training_labels=True, model_name="Purchase order - Cleaning Supplies"
    )
    supplies_model = supplies_poller.result()
    equipment_model = equipment_poller.result()
    furniture_model = furniture_poller.result()
    cleaning_supplies_model = cleaning_supplies_poller.result()

    models_trained_with_labels = [
        supplies_model.model_id,
        equipment_model.model_id,
        furniture_model.model_id,
        cleaning_supplies_model.model_id
    ]

    # Compose the four labeled models into a single model ID.
    poller = form_training_client.begin_create_composed_model(
        models_trained_with_labels, model_name="Office Supplies Composed Model"
    )
    model = poller.result()

    print("Office Supplies Composed Model Info:")
    print("Model ID: {}".format(model.model_id))
    print("Model name: {}".format(model.model_name))
    print("Is this a composed model?: {}".format(model.properties.is_composed_model))
    print("Status: {}".format(model.status))
    print("Composed model creation started on: {}".format(model.training_started_on))
    print("Creation completed on: {}".format(model.training_completed_on))
begin_training(training_files_url: str, use_training_labels: bool, **kwargs: Any) LROPoller[CustomFormModel] [source]¶
Create and train a custom model. The request must include a training_files_url parameter that is an externally accessible Azure storage blob container URI (preferably a Shared Access Signature URI). A container URI without SAS is accepted only when the container is public or has a managed identity configured. For more about configuring managed identities to work with Form Recognizer, see https://docs.microsoft.com/azure/applied-ai-services/form-recognizer/managed-identities. Models are trained using documents of the following content types: ‘application/pdf’, ‘image/jpeg’, ‘image/png’, ‘image/tiff’, or ‘image/bmp’. Other types of content in the container are ignored.
Parameters:training_files_url (str) – An Azure Storage blob container’s SAS URI. A container URI (without SAS) can be used if the container is public or has a managed identity configured. For more information on setting up a training data set, see: https://aka.ms/azsdk/formrecognizer/buildtrainingset.
use_training_labels (bool) – Whether to train with labels or not. Corresponding labeled files must exist in the blob container if set to True.
Keyword Arguments:prefix (str) – A case-sensitive prefix string to filter documents in the source path for training. For example, when using an Azure storage blob URI, use the prefix to restrict sub folders for training.
include_subfolders (bool) – A flag to indicate if subfolders within the set of prefix folders will also need to be included when searching for content to be preprocessed. Not supported if training with labels.
model_name (str) – An optional, user-defined name to associate with your model.
continuation_token (str) – A continuation token to restart a poller from a saved state.
Returns:An instance of an LROPoller. Call result() on the poller object to return a CustomFormModel.
Return type:LROPoller[CustomFormModel]
Raises:HttpResponseError – Note that if the training fails, the exception is raised, but a model with an “invalid” status is still created. You can delete this model by calling delete_model().
New in version v2.1: The model_name keyword argument
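Since a failed training run still leaves behind a model with an “invalid” status, it can be useful to sweep those models up afterwards. The following is one possible sketch, not an SDK feature: the helper name `delete_invalid_models` is hypothetical, it relies only on list_custom_models() and delete_model() as documented on this page, and the stand-in client exists only so the logic can be run without a service call.

```python
from types import SimpleNamespace


def delete_invalid_models(client):
    """Delete every model whose status is "invalid" and return the deleted IDs."""
    # Collect the IDs first so the listing is not modified while iterating.
    invalid_ids = [
        info.model_id
        for info in client.list_custom_models()
        if info.status == "invalid"
    ]
    for model_id in invalid_ids:
        client.delete_model(model_id=model_id)
    return invalid_ids


# Dry run with a hypothetical stand-in; real code would pass a
# FormTrainingClient instead.
deleted_calls = []
fake_client = SimpleNamespace(
    list_custom_models=lambda: iter([
        SimpleNamespace(model_id="a", status="ready"),
        SimpleNamespace(model_id="b", status="invalid"),
    ]),
    delete_model=lambda model_id: deleted_calls.append(model_id),
)
removed = delete_invalid_models(fake_client)
print(removed)
```

Deleting invalid models also frees capacity against the account's custom model limit.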
Example:
    import os
    from azure.ai.formrecognizer import FormTrainingClient
    from azure.core.credentials import AzureKeyCredential

    endpoint = os.environ["AZURE_FORM_RECOGNIZER_ENDPOINT"]
    key = os.environ["AZURE_FORM_RECOGNIZER_KEY"]
    container_sas_url = os.environ["CONTAINER_SAS_URL_V2"]

    form_training_client = FormTrainingClient(endpoint, AzureKeyCredential(key))
    poller = form_training_client.begin_training(container_sas_url, use_training_labels=False)
    model = poller.result()

    # Custom model information
    print("Model ID: {}".format(model.model_id))
    print("Status: {}".format(model.status))
    print("Model name: {}".format(model.model_name))
    print("Training started on: {}".format(model.training_started_on))
    print("Training completed on: {}".format(model.training_completed_on))

    print("Recognized fields:")
    # Looping through the submodels, which contains the fields they were trained on
    for submodel in model.submodels:
        print("...The submodel has form type '{}'".format(submodel.form_type))
        for name, field in submodel.fields.items():
            print("...The model found field '{}' to have label '{}'".format(
                name, field.label
            ))
close() None [source]¶
Close the FormTrainingClient session.
delete_model(model_id: str, **kwargs: Any) None [source]¶
Mark model for deletion. Model artifacts will be permanently removed within a predetermined period.
Parameters:model_id (str) – Model identifier.
Return type:None
Raises:HttpResponseError or ResourceNotFoundError
Example:
    from azure.core.exceptions import ResourceNotFoundError

    # form_training_client and custom_model are defined elsewhere.
    form_training_client.delete_model(model_id=custom_model.model_id)

    try:
        form_training_client.get_custom_model(model_id=custom_model.model_id)
    except ResourceNotFoundError:
        print("Successfully deleted model with id {}".format(custom_model.model_id))
get_account_properties(**kwargs: Any) AccountProperties [source]¶
Get information about the models on the form recognizer account.
Returns:Summary of models on account - custom model count, custom model limit.
Return type:AccountProperties
Example:
    form_training_client = FormTrainingClient(endpoint=endpoint, credential=AzureKeyCredential(key))
    # First, we see how many custom models we have, and what our limit is
    account_properties = form_training_client.get_account_properties()
    print("Our account has {} custom models, and we can have at most {} custom models\n".format(
        account_properties.custom_model_count, account_properties.custom_model_limit
    ))
get_copy_authorization(resource_id: str, resource_region: str, **kwargs: Any) Dict[str, str | int] [source]¶
Generate authorization for copying a custom model into the target Form Recognizer resource. This should be called by the target resource (where the model will be copied to) and the output can be passed as the target parameter into begin_copy_model().
Returns:A dictionary with values for the copy authorization - “modelId”, “accessToken”, “resourceId”, “resourceRegion”, and “expirationDateTimeTicks”.
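Because the authorization carries an expiration value, a caller may want to check it before starting a copy. The sketch below assumes that “expirationDateTimeTicks” is a count of seconds since 1970-01-01 UTC; this reference does not state the unit, so verify it against the service documentation before relying on it. The helper name `authorization_expired` and the sample values are hypothetical.

```python
from datetime import datetime, timezone


def authorization_expired(target, now=None):
    """Return True if the copy authorization's expiration time has passed."""
    now = now or datetime.now(timezone.utc)
    # Assumption: the value is seconds since the Unix epoch, in UTC.
    expires_at = datetime.fromtimestamp(
        target["expirationDateTimeTicks"], tz=timezone.utc
    )
    return now >= expires_at


# Hypothetical authorization that expires on 2023-11-14 (UTC).
sample = {"modelId": "copy-1", "expirationDateTimeTicks": 1_700_000_000}
before = datetime(2023, 1, 1, tzinfo=timezone.utc)
after = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(authorization_expired(sample, now=before))  # False
print(authorization_expired(sample, now=after))   # True
```

Passing `now` explicitly keeps the check deterministic in tests; omitting it compares against the current time.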
Return type:Dict[str, Union[str, int]]
Example:
    target_client = FormTrainingClient(endpoint=target_endpoint, credential=AzureKeyCredential(target_key))

    target = target_client.get_copy_authorization(
        resource_region=target_region,
        resource_id=target_resource_id
    )
    # model ID that target client will use to access the model once copy is complete
    print("Model ID: {}".format(target["modelId"]))
get_custom_model(model_id: str, **kwargs: Any) CustomFormModel [source]¶
Get a description of a custom model, including the types of forms it can recognize, and the fields it will extract for each form type.
Parameters:model_id (str) – Model identifier.
Returns:CustomFormModel
Return type:CustomFormModel
Raises:HttpResponseError or ResourceNotFoundError
Example:
    custom_model = form_training_client.get_custom_model(model_id=model.model_id)
    print("\nModel ID: {}".format(custom_model.model_id))
    print("Status: {}".format(custom_model.status))
    print("Model name: {}".format(custom_model.model_name))
    print("Is this a composed model?: {}".format(custom_model.properties.is_composed_model))
    print("Training started on: {}".format(custom_model.training_started_on))
    print("Training completed on: {}".format(custom_model.training_completed_on))
get_form_recognizer_client(**kwargs: Any) FormRecognizerClient [source]¶
Get an instance of a FormRecognizerClient from FormTrainingClient.
Returns:A FormRecognizerClient
Return type:FormRecognizerClient
list_custom_models(**kwargs: Any) ItemPaged[CustomFormModelInfo] [source]¶List information for each model, including model id, model status, and when it was created and last modified.
Returns:ItemPaged[CustomFormModelInfo]
Example:
    custom_models = form_training_client.list_custom_models()

    print("We have models with the following IDs:")
    for model_info in custom_models:
        print(model_info.model_id)
send_request(request: HttpRequest, *, stream: bool = False, **kwargs) HttpResponse¶
Runs a network request using the client’s existing pipeline.
The request URL can be relative to the base URL. The service API version used for the request is the same as the client’s unless otherwise specified. Overriding the client’s configured API version in a relative URL is supported on clients with API version 2022-08-31 and later. Overriding in an absolute URL is supported on clients with any API version. This method does not raise if the response is an error; to raise an exception, call raise_for_status() on the returned response object. For more information about how to send custom requests with this method, see https://aka.ms/azsdk/dpcodegen/python/send_request.
Parameters:request (HttpRequest) – The network request you want to make.
Keyword Arguments:stream (bool) – Whether the response payload will be streamed. Defaults to False.
Returns:The response of your network call. Does not do error handling on your response.
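Because send_request() does not raise on error responses, callers who want exceptions need to call raise_for_status() themselves. The sketch below wraps the two calls in one helper. The helper name `send_checked` is hypothetical, and the fake client and response are stand-ins so the pattern can be run offline; the stand-in raises RuntimeError, whereas a real response raises its own error type.

```python
def send_checked(client, request):
    """Send a request through the client's pipeline and raise on an error status."""
    response = client.send_request(request)
    # send_request() itself does not raise for error responses.
    response.raise_for_status()
    return response


class _FakeResponse:
    """Hypothetical stand-in for the response object."""

    def __init__(self, status_code):
        self.status_code = status_code

    def raise_for_status(self):
        if self.status_code >= 400:
            raise RuntimeError("HTTP {}".format(self.status_code))


class _FakeClient:
    """Hypothetical stand-in for a client that exposes send_request()."""

    def __init__(self, status_code):
        self._status_code = status_code

    def send_request(self, request):
        return _FakeResponse(self._status_code)


ok = send_checked(_FakeClient(200), request="GET custom/models")
try:
    send_checked(_FakeClient(404), request="GET custom/models/missing")
    raised = False
except RuntimeError:
    raised = True
print(ok.status_code, raised)
```

With a real client, the request argument would be an HttpRequest object rather than the placeholder strings used here.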
Return type:HttpResponse
class azure.ai.formrecognizer.FormWord(**kwargs: Any)[source]¶Represents a word recognized from the input document.
New in version v2.1: Support for to_dict and from_dict methods
classmethod from_dict(data: Dict) FormWord [source]¶Converts a dict in the shape of a FormWord to the model itself.
Parameters:data (dict) – A dictionary in the shape of FormWord.
Returns:FormWord
Return type:FormWord
to_dict() Dict [source]¶Returns a dict representation of FormWord.
Returns:dict
Return type:dict
bounding_box: List[Point]¶A list of 4 points representing the quadrilateral bounding box that outlines the text. The points are listed in clockwise order: top-left, top-right, bottom-right, bottom-left. Units are in pixels for images and inches for PDF.
confidence: float¶Measures the degree of certainty of the recognition result. Value is between [0.0, 1.0].
kind: str¶For FormWord, this is “word”.
page_number: int¶The 1-based number of the page in which this content is present.
text: str¶The text content of the word.
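The bounding_box is a quadrilateral, which may be slightly rotated, so code that needs a simple rectangle (for cropping or highlighting) usually reduces it to axis-aligned bounds. The following is a minimal sketch: `Point` is a stand-in with x and y attributes, and the helper name `axis_aligned_bounds` is hypothetical.

```python
from collections import namedtuple

# Stand-in with x and y attributes; with the SDK installed these would be
# the Point objects found in bounding_box.
Point = namedtuple("Point", ["x", "y"])


def axis_aligned_bounds(bounding_box):
    """Return (left, top, right, bottom) enclosing all four corner points."""
    xs = [p.x for p in bounding_box]
    ys = [p.y for p in bounding_box]
    return min(xs), min(ys), max(xs), max(ys)


# A slightly skewed quadrilateral: top-left, top-right, bottom-right, bottom-left.
box = [Point(1.0, 2.0), Point(4.0, 2.5), Point(4.0, 5.0), Point(1.0, 4.5)]
left, top, right, bottom = axis_aligned_bounds(box)
print(left, top, right, bottom)
```

The result is in the same unit as the input: pixels for images and inches for PDF.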
class azure.ai.formrecognizer.LengthUnit(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶The unit used by the width, height and bounding box properties. For images, the unit is “pixel”. For PDF, the unit is “inch”.
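The string methods listed for this class appear because the enum's members are also strings. The sketch below illustrates that behavior with a stand-in class, `LengthUnitSketch`, which is an assumption built from the two documented values and is not the SDK's own definition.

```python
from enum import Enum


class LengthUnitSketch(str, Enum):
    """Stand-in mirroring the two documented values; not the SDK class."""

    PIXEL = "pixel"
    INCH = "inch"


def unit_for(content_kind):
    """Pick the unit following the rule above: images use pixels, PDFs use inches."""
    return LengthUnitSketch.INCH if content_kind == "pdf" else LengthUnitSketch.PIXEL


unit = unit_for("pdf")
# Members compare equal to plain strings and inherit str methods.
print(unit == "inch")
print(unit.upper())
print(LengthUnitSketch("pixel") is LengthUnitSketch.PIXEL)
```

Comparing against plain strings such as "inch" works because each member is itself a str instance.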
capitalize()¶Return a capitalized version of the string.
More specifically, make the first character have upper case and the rest lower case.
casefold()¶Return a version of the string suitable for caseless comparisons.
center(width, fillchar=' ', /)¶Return a centered string of length width.
Padding is done using the specified fill character (default is a space).
count(sub[, start[, end]]) int¶Return the number of non-overlapping occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.
encode(encoding='utf-8', errors='strict')¶Encode the string using the codec registered for encoding.
encoding – The encoding in which to encode the string.
errors – The error handling scheme to use for encoding errors. The default is ‘strict’ meaning that encoding errors raise a UnicodeEncodeError. Other possible values are ‘ignore’, ‘replace’ and ‘xmlcharrefreplace’ as well as any other name registered with codecs.register_error that can handle UnicodeEncodeErrors.
endswith(suffix[, start[, end]]) bool¶Return True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. suffix can also be a tuple of strings to try.
expandtabs(tabsize=8)¶Return a copy where all tab characters are expanded using spaces.
If tabsize is not given, a tab size of 8 characters is assumed.
find(sub[, start[, end]]) int¶Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
format(*args, **kwargs) str¶Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces (‘{’ and ‘}’).
format_map(mapping) str¶Return a formatted version of S, using substitutions from mapping. The substitutions are identified by braces (‘{’ and ‘}’).
index(sub[, start[, end]]) int¶Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Raises ValueError when the substring is not found.
isalnum()¶Return True if the string is an alpha-numeric string, False otherwise.
A string is alpha-numeric if all characters in the string are alpha-numeric and there is at least one character in the string.
isalpha()¶Return True if the string is an alphabetic string, False otherwise.
A string is alphabetic if all characters in the string are alphabetic and there is at least one character in the string.
isascii()¶Return True if all characters in the string are ASCII, False otherwise.
ASCII characters have code points in the range U+0000-U+007F. Empty string is ASCII too.
isdecimal()¶Return True if the string is a decimal string, False otherwise.
A string is a decimal string if all characters in the string are decimal and there is at least one character in the string.
isdigit()¶Return True if the string is a digit string, False otherwise.
A string is a digit string if all characters in the string are digits and there is at least one character in the string.
isidentifier()¶Return True if the string is a valid Python identifier, False otherwise.
Call keyword.iskeyword(s) to test whether string s is a reserved identifier, such as “def” or “class”.
islower()¶Return True if the string is a lowercase string, False otherwise.
A string is lowercase if all cased characters in the string are lowercase and there is at least one cased character in the string.
isnumeric()¶Return True if the string is a numeric string, False otherwise.
A string is numeric if all characters in the string are numeric and there is at least one character in the string.
isprintable()¶Return True if the string is printable, False otherwise.
A string is printable if all of its characters are considered printable in repr() or if it is empty.
isspace()¶Return True if the string is a whitespace string, False otherwise.
A string is whitespace if all characters in the string are whitespace and there is at least one character in the string.
istitle()¶Return True if the string is a title-cased string, False otherwise.
In a title-cased string, upper- and title-case characters may only follow uncased characters and lowercase characters only cased ones.
isupper()¶Return True if the string is an uppercase string, False otherwise.
A string is uppercase if all cased characters in the string are uppercase and there is at least one cased character in the string.
join(iterable, /)¶Concatenate any number of strings.
The string whose method is called is inserted in between each given string. The result is returned as a new string.
Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'
ljust(width, fillchar=' ', /)¶Return a left-justified string of length width.
Padding is done using the specified fill character (default is a space).
lower()¶Return a copy of the string converted to lowercase.
lstrip(chars=None, /)¶Return a copy of the string with leading whitespace removed.
If chars is given and not None, remove characters in chars instead.
static maketrans()¶Return a translation table usable for str.translate().
If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters to Unicode ordinals, strings or None. Character keys will be then converted to ordinals. If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y. If there is a third argument, it must be a string, whose characters will be mapped to None in the result.
partition(sep, /)¶Partition the string into three parts using the given separator.
This will search for the separator in the string. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.
If the separator is not found, returns a 3-tuple containing the original string and two empty strings.
removeprefix(prefix, /)¶Return a str with the given prefix string removed if present.
If the string starts with the prefix string, return string[len(prefix):]. Otherwise, return a copy of the original string.
removesuffix(suffix, /)¶Return a str with the given suffix string removed if present.
If the string ends with the suffix string and that suffix is not empty, return string[:-len(suffix)]. Otherwise, return a copy of the original string.
replace(old, new, count=-1, /)¶Return a copy with all occurrences of substring old replaced by new.
count – Maximum number of occurrences to replace. -1 (the default value) means replace all occurrences.
If the optional argument count is given, only the first count occurrences are replaced.
rfind(sub[, start[, end]]) int¶Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
rindex(sub[, start[, end]]) int¶Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Raises ValueError when the substring is not found.
rjust(width, fillchar=' ', /)¶Return a right-justified string of length width.
Padding is done using the specified fill character (default is a space).
rpartition(sep, /)¶Partition the string into three parts using the given separator.
This will search for the separator in the string, starting at the end. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.
If the separator is not found, returns a 3-tuple containing two empty strings and the original string.
rsplit(sep=None, maxsplit=-1)¶Return a list of the substrings in the string, using sep as the separator string.
sep – The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit – Maximum number of splits. -1 (the default value) means no limit.
Splitting starts at the end of the string and works to the front.
rstrip(chars=None, /)¶Return a copy of the string with trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
split(sep=None, maxsplit=-1)¶Return a list of the substrings in the string, using sep as the separator string.
sep – The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit – Maximum number of splits. -1 (the default value) means no limit.
Splitting starts at the front of the string and works to the end.
Note, str.split() is mainly useful for data that has been intentionally delimited. With natural text that includes punctuation, consider using the regular expression module.
splitlines(keepends=False)¶Return a list of the lines in the string, breaking at line boundaries.
Line breaks are not included in the resulting list unless keepends is given and true.
startswith(prefix[, start[, end]]) bool¶Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. prefix can also be a tuple of strings to try.
strip(chars=None, /)¶Return a copy of the string with leading and trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
swapcase()¶Convert uppercase characters to lowercase and lowercase characters to uppercase.
title()¶Return a version of the string where each word is titlecased.
More specifically, words start with uppercased characters and all remaining cased characters have lower case.
translate(table, /)¶Replace each character in the string using the given translation table.
table – Translation table, which must be a mapping of Unicode ordinals to Unicode ordinals, strings, or None.
The table must implement lookup/indexing via __getitem__, for instance a dictionary or list. If this operation raises LookupError, the character is left untouched. Characters mapped to None are deleted.
upper()¶Return a copy of the string converted to uppercase.
zfill(width, /)¶Pad a numeric string with zeros on the left, to fill a field of the given width.
The string is never truncated.
INCH = 'inch'¶
PIXEL = 'pixel'¶
class azure.ai.formrecognizer.ModelBuildMode(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶The mode used when building custom models.
For more information, see https://aka.ms/azsdk/formrecognizer/buildmode.
capitalize()¶Return a capitalized version of the string.
More specifically, make the first character have upper case and the rest lower case.
casefold()¶Return a version of the string suitable for caseless comparisons.
center(width, fillchar=' ', /)¶Return a centered string of length width.
Padding is done using the specified fill character (default is a space).
count(sub[, start[, end]]) int¶Return the number of non-overlapping occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.
encode(encoding='utf-8', errors='strict')¶Encode the string using the codec registered for encoding.
encoding – The encoding in which to encode the string.
errors – The error handling scheme to use for encoding errors. The default is ‘strict’ meaning that encoding errors raise a UnicodeEncodeError. Other possible values are ‘ignore’, ‘replace’ and ‘xmlcharrefreplace’ as well as any other name registered with codecs.register_error that can handle UnicodeEncodeErrors.
endswith(suffix[, start[, end]]) bool¶Return True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. suffix can also be a tuple of strings to try.
expandtabs(tabsize=8)¶Return a copy where all tab characters are expanded using spaces.
If tabsize is not given, a tab size of 8 characters is assumed.
find(sub[, start[, end]]) int¶Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
format(*args, **kwargs) str¶Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces (‘{’ and ‘}’).
format_map(mapping) str¶Return a formatted version of S, using substitutions from mapping. The substitutions are identified by braces (‘{’ and ‘}’).
index(sub[, start[, end]]) int¶Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Raises ValueError when the substring is not found.
isalnum()¶Return True if the string is an alpha-numeric string, False otherwise.
A string is alpha-numeric if all characters in the string are alpha-numeric and there is at least one character in the string.
isalpha()¶Return True if the string is an alphabetic string, False otherwise.
A string is alphabetic if all characters in the string are alphabetic and there is at least one character in the string.
isascii()¶Return True if all characters in the string are ASCII, False otherwise.
ASCII characters have code points in the range U+0000-U+007F. Empty string is ASCII too.
isdecimal()¶Return True if the string is a decimal string, False otherwise.
A string is a decimal string if all characters in the string are decimal and there is at least one character in the string.
isdigit()¶Return True if the string is a digit string, False otherwise.
A string is a digit string if all characters in the string are digits and there is at least one character in the string.
isidentifier()¶Return True if the string is a valid Python identifier, False otherwise.
Call keyword.iskeyword(s) to test whether string s is a reserved identifier, such as “def” or “class”.
islower()¶Return True if the string is a lowercase string, False otherwise.
A string is lowercase if all cased characters in the string are lowercase and there is at least one cased character in the string.
isnumeric()¶Return True if the string is a numeric string, False otherwise.
A string is numeric if all characters in the string are numeric and there is at least one character in the string.
isprintable()¶Return True if the string is printable, False otherwise.
A string is printable if all of its characters are considered printable in repr() or if it is empty.
isspace()¶Return True if the string is a whitespace string, False otherwise.
A string is whitespace if all characters in the string are whitespace and there is at least one character in the string.
istitle()¶Return True if the string is a title-cased string, False otherwise.
In a title-cased string, upper- and title-case characters may only follow uncased characters and lowercase characters only cased ones.
isupper()¶Return True if the string is an uppercase string, False otherwise.
A string is uppercase if all cased characters in the string are uppercase and there is at least one cased character in the string.
join(iterable, /)¶Concatenate any number of strings.
The string whose method is called is inserted in between each given string. The result is returned as a new string.
Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'
ljust(width, fillchar=' ', /)¶Return a left-justified string of length width.
Padding is done using the specified fill character (default is a space).
lower()¶Return a copy of the string converted to lowercase.
lstrip(chars=None, /)¶Return a copy of the string with leading whitespace removed.
If chars is given and not None, remove characters in chars instead.
static maketrans()¶Return a translation table usable for str.translate().
If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters to Unicode ordinals, strings or None. Character keys will be then converted to ordinals. If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y. If there is a third argument, it must be a string, whose characters will be mapped to None in the result.
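The three call forms described above can be sketched as follows. The sample strings are made up for illustration; only `str.maketrans` and `str.translate` from the standard library are used.

```python
# Two arguments: characters in the first string map to the characters at
# the same positions in the second string.
swap = str.maketrans("abc", "xyz")
swapped = "aabbcc".translate(swap)  # 'xxyyzz'

# Three arguments: every character in the third string maps to None,
# so translate() deletes it.
drop_digits = str.maketrans("", "", "0123456789")
without_digits = "inv-2023-0042".translate(drop_digits)  # 'inv--'

# One argument: a dict; character keys are converted to ordinals.
table = str.maketrans({"-": " ", "_": None})
cleaned = "model_id-v2".translate(table)  # 'modelid v2'

print(swapped, without_digits, cleaned)
```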
partition(sep, /)¶Partition the string into three parts using the given separator.
This will search for the separator in the string. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.
If the separator is not found, returns a 3-tuple containing the original string and two empty strings.
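A short illustration of both outcomes. The `parse_tag` helper and the sample strings are hypothetical; the point is that the middle element of the tuple tells you whether the separator was found.

```python
tag = "env=prod=eu"

# Only the first occurrence of the separator is used.
key, sep, value = tag.partition("=")  # 'env', '=', 'prod=eu'

# Separator absent: the original string followed by two empty strings.
not_found = "env".partition("=")  # ('env', '', '')


def parse_tag(text):
    """Split 'key=value' once; return (text, None) when '=' is missing."""
    key, sep, value = text.partition("=")
    return (key, value) if sep else (text, None)


print(parse_tag(tag), parse_tag("env"))
```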
removeprefix(prefix, /)¶Return a str with the given prefix string removed if present.
If the string starts with the prefix string, return string[len(prefix):]. Otherwise, return a copy of the original string.
removesuffix(suffix, /)¶Return a str with the given suffix string removed if present.
If the string ends with the suffix string and that suffix is not empty, return string[:-len(suffix)]. Otherwise, return a copy of the original string.
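A brief sketch of the behaviour described above, using made-up strings. Both methods require Python 3.9 or later. The last line contrasts them with `rstrip`, which removes a set of characters rather than a literal suffix.

```python
name = "prebuilt-invoice"

without_prefix = name.removeprefix("prebuilt-")  # 'invoice'
unchanged = name.removeprefix("custom-")         # prefix absent: 'prebuilt-invoice'

without_suffix = "setup.pdf".removesuffix(".pdf")  # 'setup'
empty_suffix = "setup.pdf".removesuffix("")        # empty suffix: 'setup.pdf'

# rstrip treats its argument as a set of characters, so it also eats the
# trailing 'p' of 'setup'.
stripped = "setup.pdf".rstrip(".pdf")  # 'setu'

print(without_prefix, unchanged, without_suffix, empty_suffix, stripped)
```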
replace(old, new, count=-1, /)¶Return a copy with all occurrences of substring old replaced by new.
count – Maximum number of occurrences to replace. -1 (the default value) means replace all occurrences.
If the optional argument count is given, only the first count occurrences are replaced.
rfind(sub[, start[, end]]) int¶Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
rindex(sub[, start[, end]]) int¶Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Raises ValueError when the substring is not found.
rjust(width, fillchar=' ', /)¶Return a right-justified string of length width.
Padding is done using the specified fill character (default is a space).
rpartition(sep, /)¶Partition the string into three parts using the given separator.
This will search for the separator in the string, starting at the end. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.
If the separator is not found, returns a 3-tuple containing two empty strings and the original string.
rsplit(sep=None, maxsplit=-1)¶Return a list of the substrings in the string, using sep as the separator string.
sep – The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit – Maximum number of splits. -1 (the default value) means no limit.
Splitting starts at the end of the string and works to the front.
rstrip(chars=None, /)¶Return a copy of the string with trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
split(sep=None, maxsplit=-1)¶Return a list of the substrings in the string, using sep as the separator string.
sep – The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit – Maximum number of splits. -1 (the default value) means no limit.
Splitting starts at the front of the string and works to the end.
Note, str.split() is mainly useful for data that has been intentionally delimited. With natural text that includes punctuation, consider using the regular expression module.
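The difference between an explicit separator, the whitespace default, and the direction of `maxsplit` can be seen in a few lines. The sample strings are illustrative only.

```python
# Explicit separator: empty strings between adjacent separators are kept.
with_sep = "a,b,,c".split(",")  # ['a', 'b', '', 'c']

# sep=None: runs of whitespace are one separator and empty strings are dropped.
on_whitespace = "  a  b \t c\n".split()  # ['a', 'b', 'c']

# maxsplit=1 from the front (split) and from the end (rsplit).
version = "2023-07-31-preview"
from_front = version.split("-", 1)   # ['2023', '07-31-preview']
from_end = version.rsplit("-", 1)    # ['2023-07-31', 'preview']

print(with_sep, on_whitespace, from_front, from_end)
```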
splitlines(keepends=False)¶Return a list of the lines in the string, breaking at line boundaries.
Line breaks are not included in the resulting list unless keepends is given and true.
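A small example of the `keepends` flag, with a comparison against `split("\n")` for a string that ends in a line break. The input text is made up.

```python
text = "first\nsecond\r\nthird"

lines = text.splitlines()                     # ['first', 'second', 'third']
with_ends = text.splitlines(keepends=True)    # ['first\n', 'second\r\n', 'third']

# A trailing line break does not produce an extra empty item with
# splitlines(), but it does with split("\n").
trailing_splitlines = "first\n".splitlines()  # ['first']
trailing_split = "first\n".split("\n")        # ['first', '']

print(lines, with_ends, trailing_splitlines, trailing_split)
```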
startswith(prefix[, start[, end]]) bool¶Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. prefix can also be a tuple of strings to try.
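The tuple form and the optional positions might be used like this. The string is one of the operation kinds listed elsewhere in this reference; the prefixes are chosen for illustration.

```python
kind = "documentModelBuild"

# A tuple of prefixes: True if any one of them matches.
is_document_op = kind.startswith(("documentModel", "documentClassifier"))  # True

# start: begin the test at index 8, just past 'document'.
model_at_8 = kind.startswith("Model", 8)  # True

# end: only kind[0:8] is compared, which is too short to hold the prefix.
cut_short = kind.startswith("documentModel", 0, 8)  # False

print(is_document_op, model_at_8, cut_short)
```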
strip(chars=None, /)¶Return a copy of the string with leading and trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
swapcase()¶Convert uppercase characters to lowercase and lowercase characters to uppercase.
title()¶Return a version of the string where each word is titlecased.
More specifically, words start with uppercased characters and all remaining cased characters have lower case.
translate(table, /)¶Replace each character in the string using the given translation table.
table – Translation table, which must be a mapping of Unicode ordinals to Unicode ordinals, strings, or None.
The table must implement lookup/indexing via __getitem__, for instance a dictionary or list. If this operation raises LookupError, the character is left untouched. Characters mapped to None are deleted.
upper()¶Return a copy of the string converted to uppercase.
zfill(width, /)¶Pad a numeric string with zeros on the left, to fill a field of the given width.
The string is never truncated.
NEURAL = 'neural'¶
TEMPLATE = 'template'¶
class azure.ai.formrecognizer.OperationDetails(**kwargs: Any)[source]¶OperationDetails consists of information about the model operation, including the result or error of the operation if it has completed.
Note that operation information only persists for 24 hours. If the operation was successful, the model can also be accessed using the get_document_model(), list_document_models(), get_document_classifier(), and list_document_classifiers() APIs.
New in version 2023-07-31: The documentClassifierBuild kind and DocumentClassifierDetails result.
classmethod from_dict(data: Dict) OperationDetails [source]¶Converts a dict in the shape of an OperationDetails to the model itself.
Parameters:data (dict) – A dictionary in the shape of OperationDetails.
Returns:OperationDetails
Return type:OperationDetails
to_dict() Dict [source]¶Returns a dict representation of OperationDetails.
Returns:dict
Return type:dict
api_version: str | None¶API version used to create this operation.
created_on: datetime¶Date and time (UTC) when the operation was created.
error: DocumentAnalysisError | None¶Encountered error, includes the error code, message, and details for why the operation failed.
kind: str¶Type of operation. Possible values include: “documentModelBuild”, “documentModelCompose”, “documentModelCopyTo”, “documentClassifierBuild”.
last_updated_on: datetime¶Date and time (UTC) when the operation was last updated.
operation_id: str¶Operation ID.
percent_completed: int | None¶Operation progress (0-100).
resource_location: str¶URL of the resource targeted by this operation.
result: DocumentModelDetails | DocumentClassifierDetails | None¶Operation result upon success. Returns a DocumentModelDetails or DocumentClassifierDetails which contains all the information about the model.
status: str¶Operation status. Possible values include: “notStarted”, “running”, “failed”, “succeeded”, “canceled”.
tags: Dict[str, str] | None¶User-defined key-value tag attributes associated with the model.
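As a rough sketch of how the attributes above fit together, the helper below builds a one-line summary from any object that exposes them. `describe_operation` is a hypothetical helper, not part of the SDK. The `SimpleNamespace` objects are stand-ins for OperationDetails so that the snippet runs without a service call, and the `code` and `message` attribute names on the error object are assumptions based on the description of the error property.

```python
from types import SimpleNamespace


def describe_operation(op):
    """Summarize an OperationDetails-like object in one line (hypothetical helper)."""
    prefix = f"{op.kind} {op.operation_id}"
    if op.status == "succeeded":
        return f"{prefix}: succeeded"
    if op.status == "failed" and op.error is not None:
        # Attribute names on the error object are assumed for this sketch.
        return f"{prefix}: failed ({op.error.code}: {op.error.message})"
    progress = op.percent_completed if op.percent_completed is not None else 0
    return f"{prefix}: {op.status}, {progress}% complete"


# Stand-in objects with made-up values.
running = SimpleNamespace(
    kind="documentModelBuild", operation_id="op-1", status="running",
    percent_completed=40, error=None,
)
failed = SimpleNamespace(
    kind="documentModelCompose", operation_id="op-2", status="failed",
    percent_completed=None,
    error=SimpleNamespace(code="InvalidRequest", message="Bad input"),
)

print(describe_operation(running))
print(describe_operation(failed))
```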
class azure.ai.formrecognizer.OperationSummary(**kwargs: Any)[source]¶Model operation information, including the kind and status of the operation, when it was created, and more.
Note that operation information only persists for 24 hours. If the operation was successful, the model can be accessed using the get_document_model(), list_document_models(), get_document_classifier(), and list_document_classifiers() APIs. To find out why an operation failed, use get_operation() and provide the operation_id.
New in version 2023-07-31: The documentClassifierBuild kind.
classmethod from_dict(data: Dict) OperationSummary [source]¶Converts a dict in the shape of an OperationSummary to the model itself.
Parameters:data (dict) – A dictionary in the shape of OperationSummary.
Returns:OperationSummary
Return type:OperationSummary
to_dict() Dict [source]¶Returns a dict representation of OperationSummary.
Returns:dict
Return type:dict
api_version: str | None¶API version used to create this operation.
created_on: datetime¶Date and time (UTC) when the operation was created.
kind: str¶Type of operation. Possible values include: “documentModelBuild”, “documentModelCompose”, “documentModelCopyTo”, “documentClassifierBuild”.
last_updated_on: datetime¶Date and time (UTC) when the operation was last updated.
operation_id: str¶Operation ID.
percent_completed: int | None¶Operation progress (0-100).
resource_location: str¶URL of the resource targeted by this operation.
status: str¶Operation status. Possible values include: “notStarted”, “running”, “failed”, “succeeded”, “canceled”.
tags: Dict[str, str] | None¶User-defined key-value tag attributes associated with the model.
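Because a summary carries no error information, a typical pattern is to collect the IDs of failed operations and look each one up separately. The sketch below shows only the collection step. Both helpers are hypothetical, and the `SimpleNamespace` items are stand-ins for OperationSummary objects with made-up values.

```python
from collections import Counter
from types import SimpleNamespace


def failed_operation_ids(operations):
    """Return operation_id for every summary whose status is 'failed'."""
    return [op.operation_id for op in operations if op.status == "failed"]


def count_by_status(operations):
    """Count summaries per status value."""
    return Counter(op.status for op in operations)


# Stand-in summaries; a real list would come from the service.
operations = [
    SimpleNamespace(operation_id="op-1", kind="documentModelBuild", status="succeeded"),
    SimpleNamespace(operation_id="op-2", kind="documentModelBuild", status="failed"),
    SimpleNamespace(operation_id="op-3", kind="documentClassifierBuild", status="running"),
    SimpleNamespace(operation_id="op-4", kind="documentModelCopyTo", status="failed"),
]

print(failed_operation_ids(operations))
print(count_by_status(operations))
```

Each returned ID could then be passed to get_operation() to retrieve the error for that operation.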
class azure.ai.formrecognizer.Point(x: float, y: float)[source]¶The x, y coordinate of a point on a bounding box or polygon.
New in version v2.1: Support for to_dict and from_dict methods
Create new instance of Point(x, y)
count(value, /)¶Return number of occurrences of value.
classmethod from_dict(data: Dict) Point [source]¶Converts a dict in the shape of a Point to the model itself.
Parameters:data (dict) – A dictionary in the shape of Point.
Returns:Point
Return type:Point
index(value, start=0, stop=9223372036854775807, /)¶Return first index of value.
Raises ValueError if the value is not present.
to_dict() Dict [source]¶Returns a dict representation of Point.
Returns:dict
Return type:dict
x: float¶x-coordinate
y: float¶y-coordinate
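The count() and index() methods listed above are tuple methods, which suggests Point behaves like a named tuple. Under that assumption, the sketch below uses a local `namedtuple` stand-in with the same field names to compute the axis-aligned bounds of a polygon. The `bounds` helper and the coordinates are hypothetical.

```python
from collections import namedtuple

# Local stand-in with the same field names; not the SDK class itself.
Point = namedtuple("Point", ["x", "y"])


def bounds(points):
    """Return (min_x, min_y, max_x, max_y) for a sequence of points."""
    xs = [p.x for p in points]
    ys = [p.y for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


polygon = [Point(1.0, 2.0), Point(4.0, 2.5), Point(3.5, 6.0), Point(0.5, 5.0)]

# Named tuples unpack like ordinary tuples.
first_x, first_y = polygon[0]

print(bounds(polygon), first_x, first_y)
```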
class azure.ai.formrecognizer.QuotaDetails(**kwargs: Any)[source]¶Quota used, limit, and next reset date/time.
classmethod from_dict(data: Dict) QuotaDetails [source]¶Converts a dict in the shape of a QuotaDetails to the model itself.
Parameters:data (dict) – A dictionary in the shape of QuotaDetails.
Returns:QuotaDetails
Return type:QuotaDetails
to_dict() Dict[str, Any] [source]¶Returns a dict representation of QuotaDetails.
Returns:Dict[str, Any]
Return type:Dict[str, Any]
quota: int¶Resource quota limit.
quota_resets_on: datetime¶Date/time when the resource quota usage will be reset.
used: int¶Amount of the resource quota used.
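The three properties combine naturally into a remaining-quota check. The helpers below are hypothetical, and the `SimpleNamespace` object is a stand-in for QuotaDetails with made-up numbers and a timezone-aware reset time.

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace


def remaining_quota(details):
    """Quota still available, never negative."""
    return max(details.quota - details.used, 0)


def time_until_reset(details, now):
    """Time left before usage resets; both datetimes must be timezone-aware."""
    return details.quota_resets_on - now


# Stand-in with made-up values.
details = SimpleNamespace(
    quota=20,
    used=17,
    quota_resets_on=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
now = datetime(2023, 12, 31, 12, tzinfo=timezone.utc)

print(remaining_quota(details), time_until_reset(details, now))
```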
class azure.ai.formrecognizer.RecognizedForm(**kwargs: Any)[source]¶Represents a form that has been recognized by a trained or prebuilt model. The fields property contains the form fields that were extracted from the form. Tables, text lines/words, and selection marks are extracted per page and found in the pages property.
New in version v2.1: The form_type_confidence and model_id properties, support for to_dict and from_dict methods
classmethod from_dict(data: Dict) RecognizedForm [source]¶Converts a dict in the shape of a RecognizedForm to the model itself.
Parameters:data (dict) – A dictionary in the shape of RecognizedForm.
Returns:RecognizedForm
Return type:RecognizedForm
to_dict() Dict [source]¶Returns a dict representation of RecognizedForm.
Returns:dict
Return type:dict
fields: Dict[str, FormField]¶A dictionary of the fields found on the form. The fields dictionary keys are the name of the field. For models trained with labels, this is the training-time label of the field. For models trained without labels, a unique name is generated for each field.
form_type: str¶The type of form the model identified the submitted form to be.
form_type_confidence: int¶Confidence of the type of form the model identified the submitted form to be.
model_id: str¶Model identifier of model used to analyze form if not using a prebuilt model.
page_range: FormPageRange¶The first and last page number of the input form.
pages: List[FormPage]¶A list of pages recognized from the input document. Contains lines, words, selection marks, tables and page metadata.
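Since fields is a dictionary keyed by field name, filtering extracted values by confidence is a short comprehension. The sketch below is illustrative only: `confident_values` is a hypothetical helper, the `SimpleNamespace` objects are stand-ins for RecognizedForm and FormField, and the `value` and `confidence` attribute names on each field are assumptions not stated in this section.

```python
from types import SimpleNamespace


def confident_values(form, threshold=0.8):
    """Map field name to value for fields at or above the confidence threshold."""
    return {
        name: field.value
        for name, field in form.fields.items()
        if field.confidence is not None and field.confidence >= threshold
    }


# Stand-in form with made-up fields.
form = SimpleNamespace(
    fields={
        "Total": SimpleNamespace(value=42.5, confidence=0.97),
        "Vendor": SimpleNamespace(value="Contoso", confidence=0.55),
        "Date": SimpleNamespace(value="2023-07-31", confidence=None),
    },
)

print(confident_values(form))
```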
class azure.ai.formrecognizer.ResourceDetails(**kwargs: Any)[source]¶Details regarding the Form Recognizer resource.
New in version 2023-07-31: The neural_document_model_quota property.
classmethod from_dict(data: Dict) ResourceDetails [source]¶Converts a dict in the shape of a ResourceDetails to the model itself.
Parameters:data (dict) – A dictionary in the shape of ResourceDetails.
Returns:ResourceDetails
Return type:ResourceDetails
to_dict() Dict [source]¶Returns a dict representation of ResourceDetails.
Returns:dict
Return type:dict
custom_document_models: CustomDocumentModelsDetails¶Details regarding the custom models under the Form Recognizer resource.
neural_document_model_quota: QuotaDetails | None¶Quota details regarding the custom neural document model builds under the Form Recognizer resource.
class azure.ai.formrecognizer.TextAppearance(**kwargs: Any)[source]¶An object representing the appearance of the text line.
New in version v2.1: Support for to_dict and from_dict methods
classmethod from_dict(data: Dict) TextAppearance [source]¶Converts a dict in the shape of a TextAppearance to the model itself.
Parameters:data (dict) – A dictionary in the shape of TextAppearance.
Returns:TextAppearance
Return type:TextAppearance
to_dict() Dict [source]¶Returns a dict representation of TextAppearance.
Returns:dict
Return type:dict
style_confidence: float¶The confidence of text line style.
style_name: str¶The text line style name. Possible values include: “other”, “handwriting”.
class azure.ai.formrecognizer.TrainingDocumentInfo(**kwargs: Any)[source]¶Report for an individual document used for training a custom model.
New in version v2.1: The model_id property, support for to_dict and from_dict methods
classmethod from_dict(data: Dict) TrainingDocumentInfo [source]¶Converts a dict in the shape of a TrainingDocumentInfo to the model itself.
Parameters:data (dict) – A dictionary in the shape of TrainingDocumentInfo.
Returns:TrainingDocumentInfo
Return type:TrainingDocumentInfo
to_dict() Dict [source]¶Returns a dict representation of TrainingDocumentInfo.
Returns:dict
Return type:dict
errors: List[FormRecognizerError]¶List of any errors for document.
model_id: str¶The model ID that used the document to train.
name: str¶The name of the document.
page_count: int¶Total number of pages trained.
status: str¶The TrainingStatus of the training operation. Possible values include: ‘succeeded’, ‘partiallySucceeded’, ‘failed’.
Status of the training operation.
capitalize()¶Return a capitalized version of the string.
More specifically, make the first character have upper case and the rest lower case.
casefold()¶Return a version of the string suitable for caseless comparisons.
center(width, fillchar=' ', /)¶Return a centered string of length width.
Padding is done using the specified fill character (default is a space).
count(sub[, start[, end]]) int¶Return the number of non-overlapping occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.
encode(encoding='utf-8', errors='strict')¶Encode the string using the codec registered for encoding.
encoding – The encoding in which to encode the string.
errors – The error handling scheme to use for encoding errors. The default is ‘strict’ meaning that encoding errors raise a UnicodeEncodeError. Other possible values are ‘ignore’, ‘replace’ and ‘xmlcharrefreplace’ as well as any other name registered with codecs.register_error that can handle UnicodeEncodeErrors.
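The effect of each error handler named above can be compared on a single non-ASCII character. The sample text is arbitrary.

```python
text = "caf\u00e9"  # 'café'; U+00E9 has no ASCII representation

utf8_bytes = text.encode()                                  # b'caf\xc3\xa9'
ignored = text.encode("ascii", errors="ignore")             # b'caf'
replaced = text.encode("ascii", errors="replace")           # b'caf?'
xml_refs = text.encode("ascii", errors="xmlcharrefreplace")  # b'caf&#233;'

# The default 'strict' handler raises instead of producing output.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    strict_error = exc

print(utf8_bytes, ignored, replaced, xml_refs, type(strict_error).__name__)
```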
endswith(suffix[, start[, end]]) bool¶Return True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. suffix can also be a tuple of strings to try.
expandtabs(tabsize=8)¶Return a copy where all tab characters are expanded using spaces.
If tabsize is not given, a tab size of 8 characters is assumed.
find(sub[, start[, end]]) int¶Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
format(*args, **kwargs) str¶Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces (‘{’ and ‘}’).
format_map(mapping) str¶Return a formatted version of S, using substitutions from mapping. The substitutions are identified by braces (‘{’ and ‘}’).
index(sub[, start[, end]]) int¶Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Raises ValueError when the substring is not found.
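find() and index() return the same position on success and differ only in how they report a miss, as this short comparison shows. The string is one of the operation kinds listed in this reference.

```python
kind = "documentModelBuild"

found_at = kind.find("Model")        # 8
same_index = kind.index("Model")     # 8
missing = kind.find("Classifier")    # -1

# index() raises where find() returns -1.
try:
    kind.index("Classifier")
except ValueError:
    index_raised = True
else:
    index_raised = False

# -1 is also a valid index (the last character), so test for it explicitly
# before slicing with the result of find().
print(found_at, same_index, missing, index_raised)
```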
isalnum()¶Return True if the string is an alpha-numeric string, False otherwise.
A string is alpha-numeric if all characters in the string are alpha-numeric and there is at least one character in the string.
isalpha()¶Return True if the string is an alphabetic string, False otherwise.
A string is alphabetic if all characters in the string are alphabetic and there is at least one character in the string.
isascii()¶Return True if all characters in the string are ASCII, False otherwise.
ASCII characters have code points in the range U+0000-U+007F. Empty string is ASCII too.
isdecimal()¶Return True if the string is a decimal string, False otherwise.
A string is a decimal string if all characters in the string are decimal and there is at least one character in the string.
isdigit()¶Return True if the string is a digit string, False otherwise.
A string is a digit string if all characters in the string are digits and there is at least one character in the string.
isidentifier()¶Return True if the string is a valid Python identifier, False otherwise.
Call keyword.iskeyword(s) to test whether string s is a reserved identifier, such as “def” or “class”.
islower()¶Return True if the string is a lowercase string, False otherwise.
A string is lowercase if all cased characters in the string are lowercase and there is at least one cased character in the string.
isnumeric()¶Return True if the string is a numeric string, False otherwise.
A string is numeric if all characters in the string are numeric and there is at least one character in the string.
isprintable()¶Return True if the string is printable, False otherwise.
A string is printable if all of its characters are considered printable in repr() or if it is empty.
isspace()¶Return True if the string is a whitespace string, False otherwise.
A string is whitespace if all characters in the string are whitespace and there is at least one character in the string.
istitle()¶Return True if the string is a title-cased string, False otherwise.
In a title-cased string, upper- and title-case characters may only follow uncased characters and lowercase characters only cased ones.
isupper()¶Return True if the string is an uppercase string, False otherwise.
A string is uppercase if all cased characters in the string are uppercase and there is at least one cased character in the string.
join(iterable, /)¶Concatenate any number of strings.
The string whose method is called is inserted in between each given string. The result is returned as a new string.
Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'
ljust(width, fillchar=' ', /)¶Return a left-justified string of length width.
Padding is done using the specified fill character (default is a space).
lower()¶Return a copy of the string converted to lowercase.
lstrip(chars=None, /)¶Return a copy of the string with leading whitespace removed.
If chars is given and not None, remove characters in chars instead.
static maketrans()¶Return a translation table usable for str.translate().
If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters to Unicode ordinals, strings or None. Character keys will be then converted to ordinals. If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y. If there is a third argument, it must be a string, whose characters will be mapped to None in the result.
partition(sep, /)¶Partition the string into three parts using the given separator.
This will search for the separator in the string. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.
If the separator is not found, returns a 3-tuple containing the original string and two empty strings.
removeprefix(prefix, /)¶Return a str with the given prefix string removed if present.
If the string starts with the prefix string, return string[len(prefix):]. Otherwise, return a copy of the original string.
removesuffix(suffix, /)¶Return a str with the given suffix string removed if present.
If the string ends with the suffix string and that suffix is not empty, return string[:-len(suffix)]. Otherwise, return a copy of the original string.
replace(old, new, count=-1, /)¶Return a copy with all occurrences of substring old replaced by new.
count – Maximum number of occurrences to replace. -1 (the default value) means replace all occurrences.
If the optional argument count is given, only the first count occurrences are replaced.
rfind(sub[, start[, end]]) int¶Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
rindex(sub[, start[, end]]) int¶Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Raises ValueError when the substring is not found.
rjust(width, fillchar=' ', /)¶Return a right-justified string of length width.
Padding is done using the specified fill character (default is a space).
rpartition(sep, /)¶Partition the string into three parts using the given separator.
This will search for the separator in the string, starting at the end. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.
If the separator is not found, returns a 3-tuple containing two empty strings and the original string.
rsplit(sep=None, maxsplit=-1)¶Return a list of the substrings in the string, using sep as the separator string.
sep – The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit – Maximum number of splits. -1 (the default value) means no limit.
Splitting starts at the end of the string and works to the front.
rstrip(chars=None, /)¶Return a copy of the string with trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
split(sep=None, maxsplit=-1)¶Return a list of the substrings in the string, using sep as the separator string.
sep – The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
maxsplit – Maximum number of splits. -1 (the default value) means no limit.
Splitting starts at the front of the string and works to the end.
Note, str.split() is mainly useful for data that has been intentionally delimited. With natural text that includes punctuation, consider using the regular expression module.
splitlines(keepends=False)¶Return a list of the lines in the string, breaking at line boundaries.
Line breaks are not included in the resulting list unless keepends is given and true.
startswith(prefix[, start[, end]]) bool¶Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. prefix can also be a tuple of strings to try.
strip(chars=None, /)¶Return a copy of the string with leading and trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
swapcase()¶Convert uppercase characters to lowercase and lowercase characters to uppercase.
title()¶Return a version of the string where each word is titlecased.
More specifically, words start with uppercased characters and all remaining cased characters have lower case.
translate(table, /)¶Replace each character in the string using the given translation table.
table – Translation table, which must be a mapping of Unicode ordinals to Unicode ordinals, strings, or None.
The table must implement lookup/indexing via __getitem__, for instance a dictionary or list. If this operation raises LookupError, the character is left untouched. Characters mapped to None are deleted.
upper()¶Return a copy of the string converted to uppercase.
zfill(width, /)¶Pad a numeric string with zeros on the left, to fill a field of the given width.
The string is never truncated.
FAILED = 'failed'¶
PARTIALLY_SUCCEEDED = 'partiallySucceeded'¶
SUCCEEDED = 'succeeded'¶
Subpackages¶