lambeq.text2diagram¶
- exception lambeq.text2diagram.BobcatParseError(sentence: str)[source]¶
Bases: Exception
- add_note()¶
Exception.add_note(note) – add a note to the exception
- args¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- class lambeq.text2diagram.BobcatParser(model_name_or_path: str = 'bert', root_cats: Iterable[str] | None = None, device: int = -1, cache_dir: StrPathT | None = None, force_download: bool = False, verbose: str = 'progress', **kwargs: Any)[source]¶
Bases: CCGParser
CCG parser using Bobcat as the backend.
- __init__(model_name_or_path: str = 'bert', root_cats: Iterable[str] | None = None, device: int = -1, cache_dir: StrPathT | None = None, force_download: bool = False, verbose: str = 'progress', **kwargs: Any) None [source]¶
Instantiate a BobcatParser.
- Parameters:
- model_name_or_path : str, default: ‘bert’
Either the path to a directory containing a Bobcat model, or the name of a pre-trained model. By default, the “bert” model is used. See also: BobcatParser.available_models()
- root_cats : iterable of str, optional
A list of the categories allowed at the root of the parse tree.
- device : int, default: -1
The GPU device ID on which to run the model, if positive. If negative (the default), run on the CPU.
- cache_dir : str or os.PathLike, optional
The directory in which a downloaded pre-trained model should be cached, instead of the standard cache ($XDG_CACHE_HOME or ~/.cache).
- force_download : bool, default: False
Force the model to be downloaded, even if it is already available locally.
- verbose : str, default: ‘progress’
See VerbosityLevel for options.
- **kwargs : dict, optional
Additional keyword arguments to be passed to the underlying parsers (see Other Parameters). By default, they are set to the values in the pipeline_config.json file in the model directory.
- Other Parameters:
- Tagger parameters:
- batch_size : int, optional
The number of sentences per batch.
- tag_top_k : int, optional
The maximum number of tags to keep. If 0, keep all tags.
- tag_prob_threshold : float, optional
The probability multiplier used for the threshold to keep tags.
- tag_prob_threshold_strategy : {‘relative’, ‘absolute’}
If “relative”, the probability threshold is relative to the highest scoring tag. Otherwise, the probability is an absolute threshold.
- span_top_k : int, optional
The maximum number of entries to keep per span. If 0, keep all entries.
- span_prob_threshold : float, optional
The probability multiplier used for the threshold to keep entries for a span.
- span_prob_threshold_strategy : {‘relative’, ‘absolute’}
If “relative”, the probability threshold is relative to the highest scoring entry. Otherwise, the probability is an absolute threshold.
- Chart parser parameters:
- eisner_normal_form : bool, default: True
Whether to use Eisner normal form.
- max_parse_trees : int, optional
A safety limit on the number of parse trees that can be generated per parse before automatically failing.
- beam_size : int, optional
The beam size to use in the chart cells.
- input_tag_score_weight : float, optional
A scaling multiplier applied to the log-probabilities of the input tags. This means that a weight of 0 causes all of the input tags to have the same score.
- missing_cat_score : float, optional
The default score for a category that is generated but not part of the grammar.
- missing_span_score : float, optional
The default score for a category that is part of the grammar but has no score, due to being below the threshold kept by the tagger.
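The threshold and weighting parameters above can be illustrated with a small, library-independent sketch. It is not Bobcat's implementation: the helper names `filter_tags` and `weighted_scores` and the sample probabilities are hypothetical, and the sketch assumes that a "relative" threshold means the multiplier times the best probability.

```python
import math


def filter_tags(probs, top_k=0, threshold=0.0, strategy='relative'):
    """Keep tags that clear the threshold, then cap the result at top_k.

    Hypothetical helper: mirrors the documented meaning of
    tag_top_k, tag_prob_threshold and tag_prob_threshold_strategy.
    """
    if strategy not in ('relative', 'absolute'):
        raise ValueError(f'unknown strategy: {strategy!r}')
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return []
    # 'relative' scales the cut-off by the best probability;
    # 'absolute' uses the threshold value directly.
    best = ranked[0][1]
    cutoff = threshold * best if strategy == 'relative' else threshold
    kept = [(tag, p) for tag, p in ranked if p >= cutoff]
    # top_k == 0 means "keep everything that survived the threshold".
    return kept if top_k == 0 else kept[:top_k]


def weighted_scores(kept, weight):
    """Scale the log-probability of each kept tag by `weight`."""
    return {tag: weight * math.log(p) for tag, p in kept}


# Made-up tag distribution for a single word.
probs = {'N': 0.4, 'NP': 0.35, 'S/NP': 0.15, 'N/N': 0.1}

relative = filter_tags(probs, threshold=0.3, strategy='relative')
absolute = filter_tags(probs, threshold=0.3, strategy='absolute')

print([tag for tag, _ in relative])  # cut-off is 0.3 * 0.4
print([tag for tag, _ in absolute])  # cut-off is 0.3
print(weighted_scores(relative, 0.0))  # every tag gets the same score
```

With the relative strategy the cut-off moves with the best tag, so more candidates survive when the tagger is unsure; with a weight of 0 the tagger's preferences no longer influence the scores.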
- sentence2diagram(sentence: str | List[str], tokenised: bool = False, planar: bool = False, collapse_noun_phrases: bool = True, suppress_exceptions: bool = False) Diagram | None ¶
Parse a sentence into a lambeq diagram.
- Parameters:
- sentence : str or list of str
The sentence to be parsed.
- tokenised : bool, default: False
Whether the sentence has been passed as a list of tokens.
- planar : bool, default: False
Force diagrams to be planar when they contain crossed composition.
- collapse_noun_phrases : bool, default: True
If set, then before converting the tree to a diagram, all noun phrase types in the tree are changed into nouns. This includes sub-types, e.g. S/NP becomes S/N.
- suppress_exceptions : bool, default: False
Whether to suppress exceptions. If True, a sentence that fails to parse returns None instead of raising an exception.
- Returns:
- lambeq.backend.grammar.Diagram or None
The parsed diagram, or None on failure.
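The effect of collapse_noun_phrases on category names can be pictured with a toy string rewrite. This is only a sketch: lambeq operates on CCGType objects rather than strings, and the helper `collapse_np` is hypothetical.

```python
import re


def collapse_np(category: str) -> str:
    """Rewrite every NP atom in a CCG category string as N (toy version)."""
    # \b keeps the match to whole atoms, so slashes and brackets
    # around the atom are left untouched.
    return re.sub(r'\bNP\b', 'N', category)


print(collapse_np('S/NP'))         # S/N
print(collapse_np(r'(S\NP)/NP'))   # (S\N)/N
```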
- sentence2tree(sentence: str | List[str], tokenised: bool = False, suppress_exceptions: bool = False) CCGTree | None ¶
Parse a sentence into a CCGTree.
- Parameters:
- sentence : str or list of str
The sentence to be parsed, passed either as a string or as a list of tokens.
- suppress_exceptions : bool, default: False
Whether to suppress exceptions. If True, a sentence that fails to parse returns None instead of raising an exception.
- tokenised : bool, default: False
Whether the sentence has been passed as a list of tokens.
- Returns:
- CCGTree or None
The parsed tree, or None on failure.
- sentences2diagrams(sentences: List[str] | List[List[str]], tokenised: bool = False, planar: bool = False, collapse_noun_phrases: bool = True, suppress_exceptions: bool = False, verbose: str | None = None) list[Diagram | None] ¶
Parse multiple sentences into a list of lambeq diagrams.
- Parameters:
- sentences : list of str, or list of list of str
The sentences to be parsed.
- tokenised : bool, default: False
Whether each sentence has been passed as a list of tokens.
- planar : bool, default: False
Force diagrams to be planar when they contain crossed composition.
- collapse_noun_phrases : bool, default: True
If set, then before converting each tree to a diagram, any noun phrase types in the tree are changed into nouns. This includes sub-types, e.g. S/NP becomes S/N.
- suppress_exceptions : bool, default: False
Whether to suppress exceptions. If True, a sentence that fails to parse produces a None entry in the returned list instead of raising an exception.
- verbose : str, optional
See VerbosityLevel for options. Not all parsers implement all three levels of progress reporting; see the respective documentation for each parser. If set, takes priority over the verbose attribute of the parser.
- Returns:
- list of lambeq.backend.grammar.Diagram or None
The parsed diagrams. May contain None if exceptions are suppressed.
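The calling convention described above (a None entry per failed sentence, and a per-call verbose value that overrides the parser attribute) can be sketched without loading a model. `ToyParser` and its whitespace "parse" are hypothetical stand-ins, not part of lambeq.

```python
class ToyParser:
    """Hypothetical stand-in that mimics the documented calling convention."""

    def __init__(self, verbose='suppress'):
        self.verbose = verbose

    def _parse_one(self, sentence):
        # Toy rule: a blank sentence cannot be parsed.
        if not sentence.strip():
            raise ValueError('cannot parse an empty sentence')
        return sentence.split()

    def parse_many(self, sentences, suppress_exceptions=False, verbose=None):
        # A per-call value takes priority over the parser attribute.
        level = self.verbose if verbose is None else verbose
        results = []
        for sentence in sentences:
            try:
                results.append(self._parse_one(sentence))
            except ValueError:
                if not suppress_exceptions:
                    raise
                # Keep positions aligned with the input list.
                results.append(None)
        if level != 'suppress':
            print(f'parsed {len(results)} sentences')
        return results


parser = ToyParser()
results = parser.parse_many(['John walks', '   '], suppress_exceptions=True)

# Callers usually filter out the failures before further processing.
parsed = [r for r in results if r is not None]
print(results)
print(parsed)
```

Because failed entries stay in place as None, the output can still be zipped with the input sentences or their labels.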
- sentences2trees(sentences: List[str] | List[List[str]], tokenised: bool = False, suppress_exceptions: bool = False, verbose: str | None = None) list[CCGTree] | None [source]¶
Parse multiple sentences into a list of CCGTrees.
- Parameters:
- sentences : list of str, or list of list of str
The sentences to be parsed, passed either as strings or as lists of tokens.
- suppress_exceptions : bool, default: False
Whether to suppress exceptions. If True, a sentence that fails to parse produces a None entry in the returned list instead of raising an exception.
- tokenised : bool, default: False
Whether each sentence has been passed as a list of tokens.
- verbose : str, optional
See VerbosityLevel for options. If set, takes priority over the verbose attribute of the parser.
- Returns:
- list of CCGTree or None
The parsed trees. May contain None if exceptions are suppressed.
- exception lambeq.text2diagram.CCGBankParseError(sentence: str = '', message: str = '')[source]¶
Bases: Exception
Error raised if parsing fails in CCGBank.
- add_note()¶
Exception.add_note(note) – add a note to the exception
- args¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- class lambeq.text2diagram.CCGBankParser(root: StrPathT, verbose: str = 'suppress')[source]¶
Bases: CCGParser
A parser for CCGBank trees.
- __init__(root: StrPathT, verbose: str = 'suppress') None [source]¶
Initialise a CCGBank parser.
- Parameters:
- root : str or os.PathLike
Path to the root of the corpus. The sections must be located in <root>/data/AUTO.
- verbose : str, default: ‘suppress’
See VerbosityLevel for options.
- section2diagrams(section_id: int, planar: bool = False, suppress_exceptions: bool = False, verbose: str | None = None) dict[str, Diagram | None] [source]¶
Parse a CCGBank section into diagrams.
- Parameters:
- section_id : int
The section to parse.
- planar : bool, default: False
Force diagrams to be planar when they contain crossed composition.
- suppress_exceptions : bool, default: False
Stop exceptions from being raised, instead returning None for a diagram.
- verbose : str, optional
See VerbosityLevel for options. If set, takes priority over the verbose attribute of the parser.
- Returns:
- diagrams : dict
A dictionary of diagrams labelled by their ID in CCGBank. If a diagram fails to draw and exceptions are suppressed, that entry is replaced by None.
- Raises:
- CCGBankParseError
If parsing fails and exceptions are not suppressed.
- section2diagrams_gen(section_id: int, planar: bool = False, suppress_exceptions: bool = False, verbose: str | None = None) Iterator[tuple[str, Diagram | None]] [source]¶
Parse a CCGBank section into diagrams, given as a generator.
The generator only reads data when it is accessed, providing the user with control over the reading process.
- Parameters:
- section_id : int
The section to parse.
- planar : bool, default: False
Force diagrams to be planar when they contain crossed composition.
- suppress_exceptions : bool, default: False
Stop exceptions from being raised, instead returning None for a diagram.
- verbose : str, optional
See VerbosityLevel for options. If set, takes priority over the verbose attribute of the parser.
- Yields:
- ID, diagram : tuple of str and Diagram
ID in CCGBank and the corresponding diagram. If a diagram fails to draw and exceptions are suppressed, that entry is replaced by None.
- Raises:
- CCGBankParseError
If parsing fails and exceptions are not suppressed.
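The difference between the dictionary-returning methods and their generator counterparts can be sketched with toy data. Everything here is a hypothetical stand-in: `TOY_SECTION`, `toy_parse`, `toy_section_gen` and `toy_section_dict` only imitate the documented behaviour, and the IDs merely follow the general shape of CCGBank identifiers.

```python
from typing import Dict, Iterator, Optional, Tuple

# Hypothetical stand-in for one section: ID -> raw derivation text.
TOY_SECTION = {
    'wsj_0001.1': 'ok-derivation-a',
    'wsj_0001.2': '',                 # an entry that fails to parse
    'wsj_0002.1': 'ok-derivation-b',
}


def toy_parse(raw: str) -> str:
    """Pretend parser: upper-cases the text, rejects empty input."""
    if not raw:
        raise ValueError('empty derivation')
    return raw.upper()


def toy_section_gen(
    section: Dict[str, str],
    suppress_exceptions: bool = False,
) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (ID, result) pairs lazily, one entry per request."""
    for entry_id, raw in section.items():
        try:
            yield entry_id, toy_parse(raw)
        except ValueError:
            if not suppress_exceptions:
                raise
            yield entry_id, None


def toy_section_dict(section, suppress_exceptions=False):
    """Eager counterpart: consume the generator into a dictionary."""
    return dict(toy_section_gen(section, suppress_exceptions))


gen = toy_section_gen(TOY_SECTION, suppress_exceptions=True)
first = next(gen)  # only the first entry has been processed so far
trees = toy_section_dict(TOY_SECTION, suppress_exceptions=True)
print(first)
print(trees)
```

The generator form lets a caller stop early or process a large section entry by entry, while the dictionary form does all the work up front.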
- section2trees(section_id: int, suppress_exceptions: bool = False, verbose: str | None = None) dict[str, CCGTree | None] [source]¶
Parse a CCGBank section into trees.
- Parameters:
- section_id : int
The section to parse.
- suppress_exceptions : bool, default: False
Stop exceptions from being raised, instead returning None for a tree.
- verbose : str, optional
See VerbosityLevel for options. If set, takes priority over the verbose attribute of the parser.
- Returns:
- trees : dict
A dictionary of trees labelled by their ID in CCGBank. If a tree fails to parse and exceptions are suppressed, that entry is None.
- Raises:
- CCGBankParseError
If parsing fails and exceptions are not suppressed.
- section2trees_gen(section_id: int, suppress_exceptions: bool = False, verbose: str | None = None) Iterator[tuple[str, CCGTree | None]] [source]¶
Parse a CCGBank section into trees, given as a generator.
The generator only reads data when it is accessed, providing the user with control over the reading process.
- Parameters:
- section_id : int
The section to parse.
- suppress_exceptions : bool, default: False
Stop exceptions from being raised, instead returning None for a tree.
- verbose : str, optional
See VerbosityLevel for options. If set, takes priority over the verbose attribute of the parser.
- Yields:
- ID, tree : tuple of str and CCGTree
ID in CCGBank and the corresponding tree. If a tree fails to parse and exceptions are suppressed, that entry is None.
- Raises:
- CCGBankParseError
If parsing fails and exceptions are not suppressed.
- sentence2diagram(sentence: str | List[str], tokenised: bool = False, planar: bool = False, collapse_noun_phrases: bool = True, suppress_exceptions: bool = False) Diagram | None ¶
Parse a sentence into a lambeq diagram.
- Parameters:
- sentence : str or list of str
The sentence to be parsed.
- tokenised : bool, default: False
Whether the sentence has been passed as a list of tokens.
- planar : bool, default: False
Force diagrams to be planar when they contain crossed composition.
- collapse_noun_phrases : bool, default: True
If set, then before converting the tree to a diagram, all noun phrase types in the tree are changed into nouns. This includes sub-types, e.g. S/NP becomes S/N.
- suppress_exceptions : bool, default: False
Whether to suppress exceptions. If True, a sentence that fails to parse returns None instead of raising an exception.
- Returns:
- lambeq.backend.grammar.Diagram or None
The parsed diagram, or None on failure.
- sentence2tree(sentence: str | List[str], tokenised: bool = False, suppress_exceptions: bool = False) CCGTree | None ¶
Parse a sentence into a CCGTree.
- Parameters:
- sentence : str or list of str
The sentence to be parsed, passed either as a string or as a list of tokens.
- suppress_exceptions : bool, default: False
Whether to suppress exceptions. If True, a sentence that fails to parse returns None instead of raising an exception.
- tokenised : bool, default: False
Whether the sentence has been passed as a list of tokens.
- Returns:
- CCGTree or None
The parsed tree, or None on failure.
- sentences2diagrams(sentences: List[str] | List[List[str]], tokenised: bool = False, planar: bool = False, collapse_noun_phrases: bool = True, suppress_exceptions: bool = False, verbose: str | None = None) list[Diagram | None] ¶
Parse multiple sentences into a list of lambeq diagrams.
- Parameters:
- sentences : list of str, or list of list of str
The sentences to be parsed.
- tokenised : bool, default: False
Whether each sentence has been passed as a list of tokens.
- planar : bool, default: False
Force diagrams to be planar when they contain crossed composition.
- collapse_noun_phrases : bool, default: True
If set, then before converting each tree to a diagram, any noun phrase types in the tree are changed into nouns. This includes sub-types, e.g. S/NP becomes S/N.
- suppress_exceptions : bool, default: False
Whether to suppress exceptions. If True, a sentence that fails to parse produces a None entry in the returned list instead of raising an exception.
- verbose : str, optional
See VerbosityLevel for options. Not all parsers implement all three levels of progress reporting; see the respective documentation for each parser. If set, takes priority over the verbose attribute of the parser.
- Returns:
- list of lambeq.backend.grammar.Diagram or None
The parsed diagrams. May contain None if exceptions are suppressed.
- sentences2trees(sentences: List[str] | List[List[str]], tokenised: bool = False, suppress_exceptions: bool = False, verbose: str | None = None) list[CCGTree | None] [source]¶
Parse a CCGBank sentence derivation into a CCGTree.
The sentence must be in the format outlined in the CCGBank manual section D.2 and not just a list of words.
- Parameters:
- sentences : list of str
List of sentences to parse.
- suppress_exceptions : bool, default: False
Stop exceptions from being raised, instead returning None for a tree.
- tokenised : bool, default: False
Whether the sentence has been passed as a list of tokens. For CCGBankParser, it should be kept False.
- verbose : str, optional
See VerbosityLevel for options. If set, takes priority over the verbose attribute of the parser.
- Returns:
- trees : list of CCGTree
A list of trees. If a tree fails to parse and exceptions are suppressed, that entry is None.
- Raises:
- CCGBankParseError
If parsing fails and exceptions are not suppressed.
- ValueError
If tokenised flag is True (not valid for CCGBankParser).
- class lambeq.text2diagram.CCGParser(root_cats: Iterable[str] | None = None, verbose: str = 'suppress')[source]¶
Bases: Reader
Base class for CCG parsers.
- abstract __init__(root_cats: Iterable[str] | None = None, verbose: str = 'suppress') None [source]¶
Initialise the CCG parser.
- sentence2diagram(sentence: str | List[str], tokenised: bool = False, planar: bool = False, collapse_noun_phrases: bool = True, suppress_exceptions: bool = False) Diagram | None [source]¶
Parse a sentence into a lambeq diagram.
- Parameters:
- sentence : str or list of str
The sentence to be parsed.
- tokenised : bool, default: False
Whether the sentence has been passed as a list of tokens.
- planar : bool, default: False
Force diagrams to be planar when they contain crossed composition.
- collapse_noun_phrases : bool, default: True
If set, then before converting the tree to a diagram, all noun phrase types in the tree are changed into nouns. This includes sub-types, e.g. S/NP becomes S/N.
- suppress_exceptions : bool, default: False
Whether to suppress exceptions. If True, a sentence that fails to parse returns None instead of raising an exception.
- Returns:
- lambeq.backend.grammar.Diagram or None
The parsed diagram, or None on failure.
- sentence2tree(sentence: str | List[str], tokenised: bool = False, suppress_exceptions: bool = False) CCGTree | None [source]¶
Parse a sentence into a CCGTree.
- Parameters:
- sentence : str or list of str
The sentence to be parsed, passed either as a string or as a list of tokens.
- suppress_exceptions : bool, default: False
Whether to suppress exceptions. If True, a sentence that fails to parse returns None instead of raising an exception.
- tokenised : bool, default: False
Whether the sentence has been passed as a list of tokens.
- Returns:
- CCGTree or None
The parsed tree, or None on failure.
- sentences2diagrams(sentences: List[str] | List[List[str]], tokenised: bool = False, planar: bool = False, collapse_noun_phrases: bool = True, suppress_exceptions: bool = False, verbose: str | None = None) list[Diagram | None] [source]¶
Parse multiple sentences into a list of lambeq diagrams.
- Parameters:
- sentences : list of str, or list of list of str
The sentences to be parsed.
- tokenised : bool, default: False
Whether each sentence has been passed as a list of tokens.
- planar : bool, default: False
Force diagrams to be planar when they contain crossed composition.
- collapse_noun_phrases : bool, default: True
If set, then before converting each tree to a diagram, any noun phrase types in the tree are changed into nouns. This includes sub-types, e.g. S/NP becomes S/N.
- suppress_exceptions : bool, default: False
Whether to suppress exceptions. If True, a sentence that fails to parse produces a None entry in the returned list instead of raising an exception.
- verbose : str, optional
See VerbosityLevel for options. Not all parsers implement all three levels of progress reporting; see the respective documentation for each parser. If set, takes priority over the verbose attribute of the parser.
- Returns:
- list of lambeq.backend.grammar.Diagram or None
The parsed diagrams. May contain None if exceptions are suppressed.
- abstract sentences2trees(sentences: List[str] | List[List[str]], tokenised: bool = False, suppress_exceptions: bool = False, verbose: str | None = None) list[CCGTree | None] [source]¶
Parse multiple sentences into a list of CCGTrees.
- Parameters:
- sentences : list of str, or list of list of str
The sentences to be parsed, passed either as strings or as lists of tokens.
- suppress_exceptions : bool, default: False
Whether to suppress exceptions. If True, a sentence that fails to parse produces a None entry in the returned list instead of raising an exception.
- tokenised : bool, default: False
Whether each sentence has been passed as a list of tokens.
- verbose : str, optional
See VerbosityLevel for options. Not all parsers implement all three levels of progress reporting; see the respective documentation for each parser. If set, takes priority over the verbose attribute of the parser.
- Returns:
- list of CCGTree or None
The parsed trees. May contain None if exceptions are suppressed.
- class lambeq.text2diagram.CCGRule(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Bases: str, Enum
An enumeration of the available CCG rules.
- BACKWARD_APPLICATION = 'BA'¶
- BACKWARD_COMPOSITION = 'BC'¶
- BACKWARD_CROSSED_COMPOSITION = 'BX'¶
- BACKWARD_TYPE_RAISING = 'BTR'¶
- CONJUNCTION = 'CONJ'¶
- FORWARD_APPLICATION = 'FA'¶
- FORWARD_COMPOSITION = 'FC'¶
- FORWARD_CROSSED_COMPOSITION = 'FX'¶
- FORWARD_TYPE_RAISING = 'FTR'¶
- GENERALIZED_BACKWARD_COMPOSITION = 'GBC'¶
- GENERALIZED_BACKWARD_CROSSED_COMPOSITION = 'GBX'¶
- GENERALIZED_FORWARD_COMPOSITION = 'GFC'¶
- GENERALIZED_FORWARD_CROSSED_COMPOSITION = 'GFX'¶
- LEXICAL = 'L'¶
- REMOVE_PUNCTUATION_LEFT = 'LP'¶
- REMOVE_PUNCTUATION_RIGHT = 'RP'¶
- UNARY = 'U'¶
- UNKNOWN = 'UNK'¶
- __call__(dom: Sequence[CCGType], cod: CCGType | None = None) Diagram [source]¶
Call self as a function.
- __init__(*args, **kwds)¶
- apply(dom: Sequence[CCGType], cod: CCGType | None = None) Diagram [source]¶
Produce a lambeq diagram for this rule.
This is primarily used by CCG trees that have been resolved. This means, for example, that diagrams cannot be produced for the conjunction rule, since they are rewritten when resolved.
- Parameters:
- dom : list of CCGType
The domain of the diagram.
- cod : CCGType, optional
The codomain of the diagram. This is only used for type-raising rules.
- Returns:
- lambeq.backend.grammar.Diagram
The resulting diagram.
- Raises:
- CCGRuleUseError
If a diagram cannot be produced.
- capitalize(/)¶
Return a capitalized version of the string.
More specifically, make the first character have upper case and the rest lower case.
- casefold(/)¶
Return a version of the string suitable for caseless comparisons.
- center(width, fillchar=' ', /)¶
Return a centered string of length width.
Padding is done using the specified fill character (default is a space).
- check_match(left: CCGType, right: CCGType) None [source]¶
Raise an exception if the two arguments do not match.
- count(sub[, start[, end]]) int ¶
Return the number of non-overlapping occurrences of substring sub in string S[start:end]. Optional arguments start and end are interpreted as in slice notation.
- encode(/, encoding='utf-8', errors='strict')¶
Encode the string using the codec registered for encoding.
- encoding
The encoding in which to encode the string.
- errors
The error handling scheme to use for encoding errors. The default is ‘strict’ meaning that encoding errors raise a UnicodeEncodeError. Other possible values are ‘ignore’, ‘replace’ and ‘xmlcharrefreplace’ as well as any other name registered with codecs.register_error that can handle UnicodeEncodeErrors.
- endswith(suffix[, start[, end]]) bool ¶
Return True if S ends with the specified suffix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. suffix can also be a tuple of strings to try.
- expandtabs(/, tabsize=8)¶
Return a copy where all tab characters are expanded using spaces.
If tabsize is not given, a tab size of 8 characters is assumed.
- find(sub[, start[, end]]) int ¶
Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
- format(*args, **kwargs) str ¶
Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces (‘{’ and ‘}’).
- format_map(mapping) str ¶
Return a formatted version of S, using substitutions from mapping. The substitutions are identified by braces (‘{’ and ‘}’).
- index(sub[, start[, end]]) int ¶
Return the lowest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Raises ValueError when the substring is not found.
- classmethod infer_rule(dom: Sequence[CCGType], cod: CCGType) CCGRule [source]¶
Infer the CCG rule that admits the given domain and codomain.
Return CCGRule.UNKNOWN if no other rule matches.
- Parameters:
- dom : list of CCGType
The domain of the rule.
- cod : CCGType
The codomain of the rule.
- Returns:
- CCGRule
A CCG rule that admits the required domain and codomain.
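The idea of inferring a rule from a domain and codomain can be sketched for the two application rules only. This is a toy that compares plain category strings with atomic arguments; `toy_infer_rule` is hypothetical and does not reflect how lambeq matches CCGType objects or handles the other rules.

```python
def toy_infer_rule(dom, cod):
    """Return 'FA', 'BA' or 'UNK' for a two-element domain of category strings."""
    if len(dom) == 2:
        left, right = dom
        # Forward application:  X/Y  Y  =>  X
        if left == f'{cod}/{right}':
            return 'FA'
        # Backward application:  Y  X\Y  =>  X
        if right == f'{cod}\\{left}':
            return 'BA'
    # Fall back to the "unknown" symbol when nothing matches.
    return 'UNK'


print(toy_infer_rule(['S/NP', 'NP'], 'S'))    # FA
print(toy_infer_rule(['NP', 'S\\NP'], 'S'))   # BA
print(toy_infer_rule(['NP', 'NP'], 'S'))      # UNK
```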
- isalnum(/)¶
Return True if the string is an alpha-numeric string, False otherwise.
A string is alpha-numeric if all characters in the string are alpha-numeric and there is at least one character in the string.
- isalpha(/)¶
Return True if the string is an alphabetic string, False otherwise.
A string is alphabetic if all characters in the string are alphabetic and there is at least one character in the string.
- isascii(/)¶
Return True if all characters in the string are ASCII, False otherwise.
ASCII characters have code points in the range U+0000-U+007F. Empty string is ASCII too.
- isdecimal(/)¶
Return True if the string is a decimal string, False otherwise.
A string is a decimal string if all characters in the string are decimal and there is at least one character in the string.
- isdigit(/)¶
Return True if the string is a digit string, False otherwise.
A string is a digit string if all characters in the string are digits and there is at least one character in the string.
- isidentifier(/)¶
Return True if the string is a valid Python identifier, False otherwise.
Call keyword.iskeyword(s) to test whether string s is a reserved identifier, such as “def” or “class”.
- islower(/)¶
Return True if the string is a lowercase string, False otherwise.
A string is lowercase if all cased characters in the string are lowercase and there is at least one cased character in the string.
- isnumeric(/)¶
Return True if the string is a numeric string, False otherwise.
A string is numeric if all characters in the string are numeric and there is at least one character in the string.
- isprintable(/)¶
Return True if the string is printable, False otherwise.
A string is printable if all of its characters are considered printable in repr() or if it is empty.
- isspace(/)¶
Return True if the string is a whitespace string, False otherwise.
A string is whitespace if all characters in the string are whitespace and there is at least one character in the string.
- istitle(/)¶
Return True if the string is a title-cased string, False otherwise.
In a title-cased string, upper- and title-case characters may only follow uncased characters and lowercase characters only cased ones.
- isupper(/)¶
Return True if the string is an uppercase string, False otherwise.
A string is uppercase if all cased characters in the string are uppercase and there is at least one cased character in the string.
- join(iterable, /)¶
Concatenate any number of strings.
The string whose method is called is inserted in between each given string. The result is returned as a new string.
Example: '.'.join(['ab', 'pq', 'rs']) -> 'ab.pq.rs'
- ljust(width, fillchar=' ', /)¶
Return a left-justified string of length width.
Padding is done using the specified fill character (default is a space).
- lower(/)¶
Return a copy of the string converted to lowercase.
- lstrip(chars=None, /)¶
Return a copy of the string with leading whitespace removed.
If chars is given and not None, remove characters in chars instead.
- static maketrans(x, y=<unrepresentable>, z=<unrepresentable>, /)¶
Return a translation table usable for str.translate().
If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters to Unicode ordinals, strings or None. Character keys will be then converted to ordinals. If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y. If there is a third argument, it must be a string, whose characters will be mapped to None in the result.
- partition(sep, /)¶
Partition the string into three parts using the given separator.
This will search for the separator in the string. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.
If the separator is not found, returns a 3-tuple containing the original string and two empty strings.
- removeprefix(prefix, /)¶
Return a str with the given prefix string removed if present.
If the string starts with the prefix string, return string[len(prefix):]. Otherwise, return a copy of the original string.
- removesuffix(suffix, /)¶
Return a str with the given suffix string removed if present.
If the string ends with the suffix string and that suffix is not empty, return string[:-len(suffix)]. Otherwise, return a copy of the original string.
- replace(old, new, count=-1, /)¶
Return a copy with all occurrences of substring old replaced by new.
- count
Maximum number of occurrences to replace. -1 (the default value) means replace all occurrences.
If the optional argument count is given, only the first count occurrences are replaced.
- resolve(dom: Sequence[CCGType], cod: CCGType) tuple[CCGType, ...] [source]¶
Perform type resolution on this rule use.
This is used to propagate any type changes that have occurred in the codomain to the domain, such that applying this rule to the rewritten domain produces the provided codomain, while remaining as compatible as possible with the provided domain.
- Parameters:
- dom : list of CCGType
The original domain of this rule use.
- cod : CCGType
The required codomain of this rule use.
- Returns:
- tuple of CCGType
The rewritten domain.
- rfind(sub[, start[, end]]) int ¶
Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Return -1 on failure.
- rindex(sub[, start[, end]]) int ¶
Return the highest index in S where substring sub is found, such that sub is contained within S[start:end]. Optional arguments start and end are interpreted as in slice notation.
Raises ValueError when the substring is not found.
- rjust(width, fillchar=' ', /)¶
Return a right-justified string of length width.
Padding is done using the specified fill character (default is a space).
- rpartition(sep, /)¶
Partition the string into three parts using the given separator.
This will search for the separator in the string, starting at the end. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it.
If the separator is not found, returns a 3-tuple containing two empty strings and the original string.
- rsplit(/, sep=None, maxsplit=-1)¶
Return a list of the substrings in the string, using sep as the separator string.
- sep
The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
- maxsplit
Maximum number of splits. -1 (the default value) means no limit.
Splitting starts at the end of the string and works to the front.
- rstrip(chars=None, /)¶
Return a copy of the string with trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
- split(/, sep=None, maxsplit=-1)¶
Return a list of the substrings in the string, using sep as the separator string.
- sep
The separator used to split the string.
When set to None (the default value), will split on any whitespace character (including \n \r \t \f and spaces) and will discard empty strings from the result.
- maxsplit
Maximum number of splits. -1 (the default value) means no limit.
Splitting starts at the front of the string and works to the end.
Note, str.split() is mainly useful for data that has been intentionally delimited. With natural text that includes punctuation, consider using the regular expression module.
- splitlines(/, keepends=False)¶
Return a list of the lines in the string, breaking at line boundaries.
Line breaks are not included in the resulting list unless keepends is given and true.
- startswith(prefix[, start[, end]]) bool ¶
Return True if S starts with the specified prefix, False otherwise. With optional start, test S beginning at that position. With optional end, stop comparing S at that position. prefix can also be a tuple of strings to try.
- strip(chars=None, /)¶
Return a copy of the string with leading and trailing whitespace removed.
If chars is given and not None, remove characters in chars instead.
- swapcase(/)¶
Convert uppercase characters to lowercase and lowercase characters to uppercase.
- property symbol: str¶
The standard CCG symbol for the rule.
- title(/)¶
Return a version of the string where each word is titlecased.
More specifically, words start with uppercased characters and all remaining cased characters have lower case.
- translate(table, /)¶
Replace each character in the string using the given translation table.
- table
Translation table, which must be a mapping of Unicode ordinals to Unicode ordinals, strings, or None.
The table must implement lookup/indexing via __getitem__, for instance a dictionary or list. If this operation raises LookupError, the character is left untouched. Characters mapped to None are deleted.
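As a short illustration of the table semantics described above, the sketch below builds the same translation table twice: once as a plain dictionary keyed by Unicode ordinals, and once with str.maketrans. The input string is arbitrary.

```python
# Keys are Unicode ordinals; values may be a string, an ordinal, or None.
table = {
    ord("a"): "A",       # map to a string
    ord("b"): None,      # None deletes the character
    ord("c"): ord("C"),  # map to another ordinal
}
result = "abcabc-d".translate(table)

# str.maketrans builds an equivalent table: "ac" -> "AC", and "b" is deleted.
made = str.maketrans("ac", "AC", "b")
same = "abcabc-d".translate(made)

print(result, same)
```

Characters without an entry in the table, such as "-" and "d" here, are left untouched.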
- upper(/)¶
Return a copy of the string converted to uppercase.
- zfill(width, /)¶
Pad a numeric string with zeros on the left, to fill a field of the given width.
The string is never truncated.
- exception lambeq.text2diagram.CCGRuleUseError(rule: CCGRule, message: str)[source]¶
Bases:
Exception
Error raised when a CCGRule is applied incorrectly.
- add_note()¶
Exception.add_note(note) – add a note to the exception
- args¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- class lambeq.text2diagram.CCGTree(text: str | None = None, *, rule: CCGRule | str = CCGRule.UNKNOWN, biclosed_type: CCGType, children: Iterable[CCGTree] | None = None, metadata: dict[Any, Any] | None = None)[source]¶
Bases:
object
Derivation tree for a CCG.
This provides a standard derivation interface between the parser and the rest of the model.
- __init__(text: str | None = None, *, rule: CCGRule | str = CCGRule.UNKNOWN, biclosed_type: CCGType, children: Iterable[CCGTree] | None = None, metadata: dict[Any, Any] | None = None) None [source]¶
Initialise a CCG tree.
- Parameters:
- textstr, optional
The word or phrase associated to the whole tree. If None, it is inferred from its children.
- ruleCCGRule, default: CCGRule.UNKNOWN
The final CCGRule used in the derivation.
- biclosed_typeCCGType
The type associated to the derived phrase.
- childrenlist of CCGTree, optional
A list of subtrees. The types of these subtrees can be combined with the rule to produce the output type. A leaf node has an empty list of children.
- metadatadict, optional
A dictionary of miscellaneous data.
- collapse_noun_phrases() CCGTree [source]¶
Change noun phrase types into noun types.
This includes sub-types, e.g. S/NP becomes S/N.
- deriv(word_spacing: int = 2, use_slashes: bool = True, use_ascii: bool = False, vertical: bool = False) str [source]¶
Produce a string representation of the tree.
- Parameters:
- word_spacingint, default: 2
The minimum number of spaces between the words of the diagram. Only used for horizontal diagrams.
- use_slashes: bool, default: True
Whether to use slashes in the CCG types instead of arrows. Automatically set to True when use_ascii is True.
- use_ascii: bool, default: False
Whether to draw using ASCII characters only.
- vertical: bool, default: False
Whether to create a vertical tree representation, instead of the standard horizontal one.
- Returns:
- str
A string that contains the graphical representation of the CCG tree.
- classmethod from_json(data: None) None [source]¶
- classmethod from_json(data: Dict[str, Any] | str) CCGTree
Create a CCGTree from a JSON representation.
A JSON representation of a derivation contains the following fields:
- text: str or None
The word or phrase associated to the whole tree. If None, it is inferred from its children.
- rule: CCGRule
The final CCGRule used in the derivation.
- type: CCGType
The type associated to the derived phrase.
- children: list or None
A list of JSON subtrees. The types of these subtrees can be combined with the rule to produce the output type. A leaf node has an empty list of children.
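To make the field layout concrete, here is a minimal sketch of a JSON derivation for a two-word sentence, built with the standard library only. It does not call lambeq. The rule codes ("L", "BA") and the type strings are assumptions chosen for illustration, and infer_text is a hypothetical helper that mimics the documented rule that a None text is inferred from the children (joining with a space is also an assumption).

```python
import json

REQUIRED_FIELDS = ("text", "rule", "type", "children")


def infer_text(node):
    """Return the node's text, inferring it from the children if it is None."""
    missing = [field for field in REQUIRED_FIELDS if field not in node]
    if missing:
        raise KeyError(f"missing fields: {missing}")
    if node["text"] is not None:
        return node["text"]
    # `children` may be None or an empty list for a leaf node.
    children = node["children"] or []
    return " ".join(infer_text(child) for child in children)


# Hypothetical derivation: two leaves combined into a sentence.
tree = {
    "text": None,  # inferred from the children
    "rule": "BA",  # assumed rule code
    "type": "S",
    "children": [
        {"text": "Alice", "rule": "L", "type": "NP", "children": []},
        {"text": "runs", "rule": "L", "type": r"S\NP", "children": []},
    ],
}

serialised = json.dumps(tree)       # the str form accepted alongside a dict
restored = json.loads(serialised)   # round-trips to the same nested dict
print(infer_text(restored))
```

Either the dictionary or its serialised string matches the `Dict[str, Any] | str` input shape in the signature above.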
- property text: str¶
The word or phrase associated to the tree.
- to_diagram(planar: bool = False, collapse_noun_phrases: bool = True) Diagram [source]¶
Convert tree to a DisCoCat diagram.
- Parameters:
- planarbool, default: False
Force the diagram to be planar. This only affects trees using cross composition.
- without_trivial_unary_rules() CCGTree [source]¶
Create a new CCGTree from the current tree, with all trivial unary rules (i.e. rules that map X to X) removed.
This might happen because there is no exact correspondence between CCG types and pregroup types, e.g. both CCG types NP and N are mapped to the same pregroup type n.
- Returns:
lambeq.text2diagram.CCGTree
A new tree free of trivial unary rules.
- class lambeq.text2diagram.CCGType(name: str | None = None, result: CCGType | None = None, direction: str | None = None, argument: CCGType | None = None)[source]¶
Bases:
object
A type in the Combinatory Categorial Grammar (CCG).
- Attributes:
- name: str
The name of an atomic CCG type.
- result: CCGType
The result of a complex CCG type.
- direction: ‘/’ or ‘\’
The direction of a complex CCG type.
- argument: CCGType
The argument of a complex CCG type.
- is_emptybool
Whether the CCG type is the empty type.
- is_atomicbool
Whether the CCG type is an atomic type.
- is_complexbool
Whether the CCG type is a complex type.
- is_overbool
Whether the argument of a complex CCG type appears on the right, i.e. X/Y.
- is_underbool
Whether the argument of a complex CCG type appears on the left, i.e. X\Y.
- CONJ_TAG: ClassVar[str] = '[conj]'¶
- __init__(name: str | None = None, result: CCGType | None = None, direction: str | None = None, argument: CCGType | None = None) None [source]¶
Initialise a CCG type.
- Parameters:
- namestr, optional
(Atomic types only) The name of an atomic CCG type.
- resultCCGType, optional
(Complex types only) The result of a complex CCG type.
- direction{ ‘/’, ‘\’ }, optional
(Complex types only) The direction of a complex CCG type.
- argumentCCGType, optional
(Complex types only) The argument of a complex CCG type.
- property argument: CCGType¶
The argument of a complex CCG type.
Raises an error if called on a non-complex CCG type.
- property direction: str¶
The direction of a complex CCG type.
Raises an error if called on a non-complex CCG type.
- is_atomic: bool¶
- is_complex: bool¶
- property is_conjoinable: bool¶
Whether the CCG type can be used to conjoin words.
- is_empty: bool¶
- is_over: bool¶
- is_under: bool¶
- property left: CCGType¶
The left-hand side (diagrammatically) of a complex CCG type.
Raises an error if called on a non-complex CCG type.
- property name: str¶
The name of an atomic CCG type.
Raises an error if called on a non-atomic CCG type.
- classmethod parse(cat: str, map_atomic: Callable[[str], str] | None = None) CCGType [source]¶
Parse a CCG category string into a CCGType.
The string should follow the following grammar:
atomic_cat = { <any character except "(", ")", "/", "\"> }
op = "/" | "\"
bracket_cat = atomic_cat | "(" bracket_cat [ op bracket_cat ] ")"
cat = bracket_cat [ op bracket_cat ] [ "[conj]" ]
- Parameters:
- map_atomic: callable, optional
If provided, this function is called on the atomic type names in the original string, and should return their name in the output CCGType. This can be used to fix any inconsistencies in capitalisation or unify types, such as noun and noun phrase types.
- Returns:
- CCGType
The parsed category as a CCGType.
- Raises:
- CCGParseError
If parsing fails.
Notes
Conjunctions follow the CCGBank convention of:
 x   and  y
 C  conj  C
  \    \ /
   \ C[conj]
    \ /
     C
thus C[conj] is equivalent to C\C.
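The category grammar accepted by parse() can be read as a small recursive-descent parser. The sketch below is a standalone illustration of that grammar, not lambeq's implementation: parse_category and its helpers are hypothetical names, complex categories are returned as nested (result, op, argument) tuples rather than CCGType objects, and atomic names are assumed to be non-empty.

```python
CONJ_TAG = "[conj]"
OPS = "/\\"
RESERVED = "()/\\"


def parse_category(cat):
    """Parse a category string; return (structure, has_conj_tag)."""
    # Strip the optional trailing tag first, since "[" and "]" are
    # otherwise legal characters in an atomic name such as S[dcl].
    conj = cat.endswith(CONJ_TAG)
    if conj:
        cat = cat[: -len(CONJ_TAG)]
    node, pos = _parse_pair(cat, 0)
    if pos != len(cat):
        raise ValueError(f"unexpected character at position {pos} in {cat!r}")
    return node, conj


def _parse_pair(s, pos):
    """bracket_cat [ op bracket_cat ]"""
    left, pos = _parse_bracket(s, pos)
    if pos < len(s) and s[pos] in OPS:
        op = s[pos]
        right, pos = _parse_bracket(s, pos + 1)
        return (left, op, right), pos
    return left, pos


def _parse_bracket(s, pos):
    """atomic_cat | "(" bracket_cat [ op bracket_cat ] ")" """
    if pos < len(s) and s[pos] == "(":
        node, pos = _parse_pair(s, pos + 1)
        if pos >= len(s) or s[pos] != ")":
            raise ValueError(f"missing ')' in {s!r}")
        return node, pos + 1
    start = pos
    while pos < len(s) and s[pos] not in RESERVED:
        pos += 1
    if pos == start:
        raise ValueError(f"expected an atomic category at position {pos} in {s!r}")
    return s[start:pos], pos


print(parse_category(r"(S\NP)/NP"))
print(parse_category("NP[conj]"))
```

Because the grammar allows only one operator per bracket level, a string such as S/NP/NP is rejected and must be written with explicit brackets.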
- replace(original: CCGType, replacement: CCGType) CCGType [source]¶
Replace all occurrences of a sub-type with a different type.
- replace_result(original: CCGType, replacement: CCGType, direction: str = '|') tuple[CCGType, CCGType | None] [source]¶
Replace the innermost category result with a new category.
This performs a lenient replacement operation. This means that it will attempt to replace the specified result category original with replacement, but if original cannot be found, the innermost result category will be replaced (still by replacement). This makes it suitable for cases where type resolution has occurred, so that type rewrites can propagate. This method returns the new category, alongside which category has been replaced. direction can be used to specify a particular structure that must be satisfied by the replacement operation. If this is not satisfied, then no replacement takes place, and the returned replaced result category is None.
- Parameters:
- originalCCGType
The category that should be replaced.
- replacementCCGType
The replacement for the new category.
- directionstr
Used to check the operations in the category. Consists of either 1 or 2 characters, each being one of ‘/’, ‘\’, ‘|’. If 2 characters, the first checks the innermost operation, and the second checks the rest. If only 1 character, it is used for all checks.
- Returns:
- CCGType
The new category. If replacement fails, this is set to the original category.
- CCGType or None
The replaced result category. If replacement fails, this is set to None.
Notes
This function is mainly used for substituting inner types in generalised versions of CCG rules. (See infer_rule().)
Examples
>>> a, b, c, x, y = map(CCGType, 'abcxy')
Example 1: b >> c in a >> (b >> c) is matched and replaced with x.
>>> new, replaced = (a >> (b >> c)).replace_result(b >> c, x)
>>> print(new, replaced)
x\a c\b
Example 2: x cannot be matched, so the innermost category c is replaced instead.
>>> new, replaced = (a >> (b >> c)).replace_result(x, x << y)
>>> print(new, replaced)
((x/y)\b)\a c
Example 3: if not all operators are <<, then nothing is replaced.
>>> new, replaced = (a >> (c << b)).replace_result(x, y, '/')
>>> print(new, replaced)
(c/b)\a None
Example 4: the innermost use of << is on c and b, so the target c is replaced with y.
>>> new, replaced = (a >> (c << b)).replace_result(x, y, '/|')
>>> print(new, replaced)
(y/b)\a c
Example 5: the innermost use of >> is on a and (c << b), so its target (c << b) is replaced by y.
>>> new, replaced = (a >> (c << b)).replace_result(x, y, r'\|')
>>> print(new, replaced)
y\a c/b
- property result: CCGType¶
The result of a complex CCG type.
Raises an error if called on a non-complex CCG type.
- property right: CCGType¶
The right-hand side (diagrammatically) of a complex CCG type.
Raises an error if called on a non-complex CCG type.
- split(base: CCGType) tuple[grammar.Ty, grammar.Ty, grammar.Ty] [source]¶
Isolate the inner type of a CCG type, in lambeq.
For example, if the input is T = (X\Y)/Z, the lambeq type would be Y.r @ X @ Z.l so:
>>> T = CCGType.parse(r'(X\Y)/Z')
>>> left, mid, right = T.split(CCGType('X'))
>>> print(left, mid, right, sep=' + ')
Y.r + X + Z.l
>>> left, mid, right = T.split(CCGType.parse(r'X\Y'))
>>> print(left, mid, right, sep=' + ')
Ty() + Y.r @ X + Z.l
- to_grammar(Ty: type | None = None) grammar.Ty | Any [source]¶
Turn the CCG type into a lambeq grammar type.
- exception lambeq.text2diagram.DepCCGParseError(sentence: str)[source]¶
Bases:
Exception
- add_note()¶
Exception.add_note(note) – add a note to the exception
- args¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- class lambeq.text2diagram.DepCCGParser(*, lang: str = 'en', model: str | None = None, use_model_unary_rules: bool = False, annotator: str = 'janome', tokenize: bool | None = None, device: int = -1, root_cats: Iterable[str] | None = None, verbose: str = 'progress', **kwargs: Any)[source]¶
Bases:
CCGParser
CCG parser using depccg as the backend.
- __init__(*, lang: str = 'en', model: str | None = None, use_model_unary_rules: bool = False, annotator: str = 'janome', tokenize: bool | None = None, device: int = -1, root_cats: Iterable[str] | None = None, verbose: str = 'progress', **kwargs: Any) None [source]¶
Instantiate a parser based on depccg.
- Parameters:
- lang{ ‘en’, ‘ja’ }
The language to use: ‘en’ for English, ‘ja’ for Japanese.
- modelstr, optional
The name of the model variant to use, if any. depccg only has English model variants, namely ‘elmo’, ‘rebank’ and ‘elmo_rebank’.
- use_model_unary_rulesbool, default: False
Use the unary rules supplied by the model instead of the ones by lambeq.
- annotatorstr, default: ‘janome’
The annotator to use, if any. depccg supports ‘candc’ and ‘spacy’ for English, and ‘janome’ and ‘jigg’ for Japanese. By default, no annotator is used for English, and ‘janome’ is used for Japanese.
- tokenizebool, optional
Whether to tokenise the input when annotating. This option should only be specified when using the ‘spacy’ annotator.
- deviceint, optional
The ID of the GPU to use. By default, uses the CPU.
- root_catsiterable of str, optional
A list of categories allowed at the root of the parse. By default, the English categories are:
S[dcl]
S[wq]
S[q]
S[qem]
NP
- and the Japanese categories are:
NP[case=nc,mod=nm,fin=f]
NP[case=nc,mod=nm,fin=t]
S[mod=nm,form=attr,fin=t]
S[mod=nm,form=base,fin=f]
S[mod=nm,form=base,fin=t]
S[mod=nm,form=cont,fin=f]
S[mod=nm,form=cont,fin=t]
S[mod=nm,form=da,fin=f]
S[mod=nm,form=da,fin=t]
S[mod=nm,form=hyp,fin=t]
S[mod=nm,form=imp,fin=f]
S[mod=nm,form=imp,fin=t]
S[mod=nm,form=r,fin=t]
S[mod=nm,form=s,fin=t]
S[mod=nm,form=stem,fin=f]
S[mod=nm,form=stem,fin=t]
- verbosestr, default: ‘progress’,
Controls the command-line output of the parser. Only ‘progress’ option is available for this parser.
- **kwargsdict, optional
Optional arguments passed to depccg.
- sentence2diagram(sentence: SentenceType, tokenised: bool = False, planar: bool = False, collapse_noun_phrases: bool = True, suppress_exceptions: bool = False) Diagram | None [source]¶
Parse a sentence into a lambeq diagram.
- Parameters:
- sentencestr, list[str]
The sentence to be parsed, passed either as a string, or as a list of tokens.
- tokenisedbool, default: False
Whether the sentence has been passed as a list of tokens.
- collapse_noun_phrasesbool, default: True
If set, then before converting each tree to a diagram, all noun phrase types in the tree are changed into nouns. This includes sub-types, e.g. S/NP becomes S/N.
- suppress_exceptionsbool, default: False
Whether to suppress exceptions. If True, then if the sentence fails to parse, instead of raising an exception, returns None.
- Returns:
- lambeq.backend.grammar.Diagram or None
The parsed diagram, or None on failure.
- Raises:
- ValueError
If tokenised does not match with the input type.
- sentence2tree(sentence: str | List[str], tokenised: bool = False, suppress_exceptions: bool = False) CCGTree | None [source]¶
Parse a sentence into a CCGTree.
- Parameters:
- sentencestr, list[str]
The sentence to be parsed, passed either as a string, or as a list of tokens.
- suppress_exceptionsbool, default: False
Whether to suppress exceptions. If True, then if the sentence fails to parse, instead of raising an exception, returns None.
- tokenisedbool, default: False
Whether the sentence has been passed as a list of tokens.
- Returns:
- CCGTree or None
The parsed tree, or None on failure.
- Raises:
- ValueError
If tokenised does not match with the input type.
- sentences2diagrams(sentences: List[str] | List[List[str]], tokenised: bool = False, planar: bool = False, collapse_noun_phrases: bool = True, suppress_exceptions: bool = False, verbose: str | None = None) list[Diagram | None] ¶
Parse multiple sentences into a list of lambeq diagrams.
- Parameters:
- sentenceslist of str, or list of list of str
The sentences to be parsed.
- tokenisedbool, default: False
Whether each sentence has been passed as a list of tokens.
- planarbool, default: False
Force diagrams to be planar when they contain crossed composition.
- collapse_noun_phrasesbool, default: True
If set, then before converting each tree to a diagram, any noun phrase types in the tree are changed into nouns. This includes sub-types, e.g. S/NP becomes S/N.
- suppress_exceptionsbool, default: False
Whether to suppress exceptions. If True, then if a sentence fails to parse, instead of raising an exception, its return entry is None.
- verbosestr, optional
See VerbosityLevel for options. Not all parsers implement all three levels of progress reporting, see the respective documentation for each parser. If set, takes priority over the verbose attribute of the parser.
- Returns:
- list of lambeq.backend.grammar.Diagram or None
The parsed diagrams. May contain None if exceptions are suppressed.
- sentences2trees(sentences: List[str] | List[List[str]], tokenised: bool = False, suppress_exceptions: bool = False, verbose: str | None = None) list[CCGTree | None] [source]¶
Parse multiple sentences into a list of CCGTrees.
- Parameters:
- sentenceslist of str, or list of list of str
The sentences to be parsed, passed either as strings or as lists of tokens.
- suppress_exceptionsbool, default: False
Whether to suppress exceptions. If True, then if a sentence fails to parse, instead of raising an exception, its return entry is None.
- tokenisedbool, default: False
Whether each sentence has been passed as a list of tokens.
- verbosestr, optional
Controls the form of progress tracking. If set, takes priority over the verbose attribute of the parser. This class only supports the ‘progress’ verbosity level (a progress bar).
- Returns:
- list of CCGTree or None
The parsed trees. May contain None if exceptions are suppressed.
- Raises:
- ValueError
If tokenised does not match with the input type, or if verbosity is set to an unsupported value.
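The interaction between tokenised and suppress_exceptions in the batch methods can be summarised with a small standalone sketch. It does not call lambeq or depccg: parse_many, check_tokenised and toy_parse are hypothetical names, and checking every input before any parsing starts is an assumption about ordering rather than documented behaviour.

```python
def check_tokenised(sentence, tokenised):
    """Raise ValueError if the input type disagrees with the tokenised flag."""
    is_token_list = not isinstance(sentence, str)
    if tokenised != is_token_list:
        raise ValueError("`tokenised` does not match the input type")


def parse_many(sentences, parse_one, tokenised=False, suppress_exceptions=False):
    """Apply parse_one to each sentence, keeping one result slot per input."""
    for sentence in sentences:
        check_tokenised(sentence, tokenised)
    results = []
    for sentence in sentences:
        try:
            results.append(parse_one(sentence))
        except Exception:
            if not suppress_exceptions:
                raise
            # A failed parse keeps its position in the output as None.
            results.append(None)
    return results


def toy_parse(sentence):
    """Hypothetical stand-in for a parser: fails on one-word sentences."""
    if len(sentence.split()) < 2:
        raise RuntimeError(f"cannot parse {sentence!r}")
    return ("TREE", sentence)


results = parse_many(["Alice runs", "Hmm"], toy_parse, suppress_exceptions=True)
print(results)
```

Keeping a None placeholder for each failure means the output list stays aligned with the input list, which is convenient when labels are stored in a parallel list.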
- class lambeq.text2diagram.LinearReader(combining_diagram: Diagram, word_type: Ty = Ty(s), start_box: Diagram = Id(Ty()))[source]¶
Bases:
Reader
A reader that combines words linearly using a stair diagram.
- __init__(combining_diagram: Diagram, word_type: Ty = Ty(s), start_box: Diagram = Id(Ty())) None [source]¶
Initialise a linear reader.
- Parameters:
- combining_diagramDiagram
The diagram that is used to combine two word boxes. It is continuously applied on the left-most wires until a single output wire remains.
- word_typeTy, default: core.types.AtomicType.SENTENCE
The type of each word box. By default, it uses the sentence type from core.types.AtomicType.
- start_boxDiagram, default: Id()
The start box used as a sentinel value for combining. By default, the empty diagram is used.
- sentence2diagram(sentence: str | List[str], tokenised: bool = False) Diagram [source]¶
Parse a sentence into a lambeq diagram.
If tokenised is True, sentence is treated as a list of tokens; otherwise it is split into tokens by whitespace. This method creates a box for each token, and combines them linearly.
- Parameters:
- sentencestr or list of str
The input sentence, passed either as a string or as a list of tokens.
- tokenisedbool, default: False
Set to True if the sentence is passed as a list of tokens instead of a single string. If set to False, words are split by whitespace.
- Raises:
- ValueError
If sentence does not match tokenised flag, or if an invalid mode or parser is passed to the initialiser.
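The linear combination performed by this reader is essentially a left fold over the tokens. The sketch below shows that shape with nested tuples in place of diagrams. It is a simplified standalone model, not lambeq code: linear_read and stair are hypothetical names, and the "START" and "STAIR" labels are placeholders for the start box and the combining diagram.

```python
from functools import reduce


def linear_read(sentence, combine, start, tokenised=False):
    """Fold the tokens of a sentence from left to right with `combine`."""
    if tokenised and isinstance(sentence, str):
        raise ValueError("tokenised=True expects a list of tokens")
    if not tokenised and not isinstance(sentence, str):
        raise ValueError("tokenised=False expects a string")
    words = sentence if tokenised else sentence.split()
    # The accumulated result is always the left input of the next combination.
    return reduce(combine, words, start)


def stair(accumulated, word):
    """Placeholder for the combining diagram applied to two inputs."""
    return ("STAIR", accumulated, word)


shape = linear_read("Alice loves Bob", stair, start="START")
print(shape)
```

The nesting grows only on the left argument, which is what gives the resulting diagram its staircase shape.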
- class lambeq.text2diagram.Reader[source]¶
Bases:
ABC
Base class for readers and parsers.
- class lambeq.text2diagram.TreeReader(ccg_parser: CCGParser | Callable[[], CCGParser] = BobcatParser, mode: TreeReaderMode = TreeReaderMode.NO_TYPE, word_type: Ty = Ty(s))[source]¶
Bases:
Reader
A reader that combines words according to a parse tree.
- __init__(ccg_parser: CCGParser | Callable[[], CCGParser] = BobcatParser, mode: TreeReaderMode = TreeReaderMode.NO_TYPE, word_type: Ty = Ty(s)) None [source]¶
Initialise a tree reader.
- Parameters:
- ccg_parserCCGParser or callable, default: BobcatParser
A CCGParser object or a function that returns it. The parse tree produced by the parser is used to generate the tree diagram.
- modeTreeReaderMode, default: TreeReaderMode.NO_TYPE
Determines what boxes are used to combine the tree. See TreeReaderMode for options.
- word_typeTy, default: core.types.AtomicType.SENTENCE
The type of each word box. By default, it uses the sentence type from core.types.AtomicType.
- sentence2diagram(sentence: str | List[str], tokenised: bool = False, collapse_noun_phrases: bool = True, suppress_exceptions: bool = False) Diagram | None [source]¶
Parse a sentence into a lambeq diagram.
This produces a tree-shaped diagram based on the output of the CCG parser.
- Parameters:
- sentencestr or list of str
The sentence to be parsed.
- tokenisedbool, default: False
Whether the sentence has been passed as a list of tokens.
- collapse_noun_phrasesbool, default: True
If set, then before converting each tree to a diagram, any noun phrase types in the tree are changed into nouns. This includes sub-types, e.g. S/NP becomes S/N.
- suppress_exceptionsbool, default: False
Whether to suppress exceptions. If True, then if the sentence fails to parse, instead of raising an exception, returns None.
- Returns:
- lambeq.backend.grammar.Diagram or None
The parsed diagram, or None on failure.
- sentences2diagrams(sentences: List[str] | List[List[str]], tokenised: bool = False) list[Diagram | None] ¶
Parse multiple sentences into a list of lambeq diagrams.
- static tree2diagram(tree: CCGTree, mode: TreeReaderMode = TreeReaderMode.NO_TYPE, word_type: Ty = Ty(s), suppress_exceptions: bool = False) Diagram | None [source]¶
Convert a CCGTree into a Diagram.
This produces a tree-shaped diagram based on the output of the CCG parser.
- Parameters:
- treeCCGTree
The CCG tree to be converted.
- modeTreeReaderMode, default: TreeReaderMode.NO_TYPE
Determines what boxes are used to combine the tree. See TreeReaderMode for options.
- word_typeTy, default: core.types.AtomicType.SENTENCE
The type of each word box. By default, it uses the sentence type from core.types.AtomicType.
- suppress_exceptionsbool, default: False
Whether to suppress exceptions. If True, then if the conversion fails, instead of raising an exception, returns None.
- Returns:
- lambeq.backend.grammar.Diagram or None
The parsed diagram, or None on failure.
- class lambeq.text2diagram.TreeReaderMode(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]¶
Bases:
Enum
An enumeration for TreeReader.
The words in the tree diagram can be combined using 4 modes:
- NO_TYPE¶
The ‘no type’ mode names every rule box UNIBOX.
- RULE_ONLY¶
The ‘rule name’ mode names every rule box based on the name of the original CCG rule. For example, for the forward application rule FA(N << N), the rule box will be named FA.
- RULE_TYPE¶
The ‘rule type’ mode names every rule box based on the name and type of the original CCG rule. For example, for the forward application rule FA(N << N), the rule box will be named FA(N << N).
- HEIGHT¶
The ‘height’ mode names every rule box based on the tree height of its subtree. For example, a rule box directly combining two words will be named layer_1.
- HEIGHT = 3¶
- NO_TYPE = 0¶
- RULE_ONLY = 1¶
- RULE_TYPE = 2¶
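The naming scheme for the modes can be captured in a few lines. The sketch below uses a stand-in enum with the same member names and values as listed above. It does not import lambeq, and box_name is a hypothetical helper whose arguments (rule code, type string, subtree height) are assumed to be supplied by the caller.

```python
from enum import Enum


class Mode(Enum):
    """Stand-in for TreeReaderMode with the same members and values."""
    NO_TYPE = 0
    RULE_ONLY = 1
    RULE_TYPE = 2
    HEIGHT = 3


def box_name(mode, rule, type_str, height):
    """Return the rule box name that each mode would produce."""
    if mode is Mode.NO_TYPE:
        return "UNIBOX"
    if mode is Mode.RULE_ONLY:
        return rule
    if mode is Mode.RULE_TYPE:
        return f"{rule}({type_str})"
    return f"layer_{height}"


# Forward application on N << N, directly combining two words (height 1).
names = {mode.name: box_name(mode, "FA", "N << N", 1) for mode in Mode}
print(names)
```

The modes differ only in how many distinct box names they produce: NO_TYPE shares one box across the whole tree, while RULE_TYPE gives a separate box to every rule and type combination.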
- exception lambeq.text2diagram.WebParseError(sentence: str)[source]¶
Bases:
OSError
- add_note()¶
Exception.add_note(note) – add a note to the exception
- args¶
- characters_written¶
- errno¶
POSIX exception code
- filename¶
exception filename
- filename2¶
second exception filename
- strerror¶
exception strerror
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- class lambeq.text2diagram.WebParser(parser: str = 'depccg', verbose: str = 'suppress')[source]¶
Bases:
CCGParser
Wrapper that allows passing parser queries to an online service.
- __init__(parser: str = 'depccg', verbose: str = 'suppress') None [source]¶
Initialise a web parser.
- Parameters:
- parserstr, optional
The web parser to use. By default, this is the depccg parser.
- verbosestr, default: ‘suppress’,
See VerbosityLevel for options.
- sentence2diagram(sentence: str | List[str], tokenised: bool = False, planar: bool = False, collapse_noun_phrases: bool = True, suppress_exceptions: bool = False) Diagram | None ¶
Parse a sentence into a lambeq diagram.
- Parameters:
- sentencestr or list of str
The sentence to be parsed.
- tokenisedbool, default: False
Whether the sentence has been passed as a list of tokens.
- planarbool, default: False
Force diagrams to be planar when they contain crossed composition.
- collapse_noun_phrasesbool, default: True
If set, then before converting the tree to a diagram, all noun phrase types in the tree are changed into nouns. This includes sub-types, e.g. S/NP becomes S/N.
- suppress_exceptionsbool, default: False
Whether to suppress exceptions. If True, then if the sentence fails to parse, instead of raising an exception, returns None.
- Returns:
- lambeq.backend.grammar.Diagram or None
The parsed diagram, or None on failure.
- sentence2tree(sentence: str | List[str], tokenised: bool = False, suppress_exceptions: bool = False) CCGTree | None ¶
Parse a sentence into a CCGTree.
- Parameters:
- sentencestr, list[str]
The sentence to be parsed, passed either as a string, or as a list of tokens.
- suppress_exceptionsbool, default: False
Whether to suppress exceptions. If True, then if the sentence fails to parse, instead of raising an exception, returns None.
- tokenisedbool, default: False
Whether the sentence has been passed as a list of tokens.
- Returns:
- CCGTree or None
The parsed tree, or None on failure.
- sentences2diagrams(sentences: List[str] | List[List[str]], tokenised: bool = False, planar: bool = False, collapse_noun_phrases: bool = True, suppress_exceptions: bool = False, verbose: str | None = None) list[Diagram | None] ¶
Parse multiple sentences into a list of lambeq diagrams.
- Parameters:
- sentenceslist of str, or list of list of str
The sentences to be parsed.
- tokenisedbool, default: False
Whether each sentence has been passed as a list of tokens.
- planarbool, default: False
Force diagrams to be planar when they contain crossed composition.
- collapse_noun_phrasesbool, default: True
If set, then before converting each tree to a diagram, any noun phrase types in the tree are changed into nouns. This includes sub-types, e.g. S/NP becomes S/N.
- suppress_exceptionsbool, default: False
Whether to suppress exceptions. If True, then if a sentence fails to parse, instead of raising an exception, its return entry is None.
- verbosestr, optional
See VerbosityLevel for options. Not all parsers implement all three levels of progress reporting, see the respective documentation for each parser. If set, takes priority over the verbose attribute of the parser.
- Returns:
- list of lambeq.backend.grammar.Diagram or None
The parsed diagrams. May contain None if exceptions are suppressed.
- sentences2trees(sentences: List[str] | List[List[str]], tokenised: bool = False, suppress_exceptions: bool = False, verbose: str | None = None) list[CCGTree | None] [source]¶
Parse multiple sentences into a list of CCGTrees.
- Parameters:
- sentenceslist of str, or list of list of str
The sentences to be parsed.
- suppress_exceptionsbool, default: False
Whether to suppress exceptions. If True, then if a sentence fails to parse, instead of raising an exception, its return entry is None.
- verbosestr, optional
See VerbosityLevel for options. If set, it takes priority over the verbose attribute of the parser.
- Returns:
- list of CCGTree or None
The parsed trees. May contain None if exceptions are suppressed.
- Raises:
- URLError
If the service URL is not well formed.
- ValueError
If a sentence is blank or type of the sentence does not match tokenised flag.
- WebParseError
If the parser fails to obtain a parse tree from the server.
- lambeq.text2diagram.cups_reader = <lambeq.text2diagram.linear_reader.LinearReader object>¶
A reader that combines words linearly using a stair diagram.
- lambeq.text2diagram.spiders_reader = <lambeq.text2diagram.spiders_reader.SpidersReader object>¶
A reader that combines words using a spider.
- lambeq.text2diagram.stairs_reader = <lambeq.text2diagram.linear_reader.LinearReader object>¶
A reader that combines words linearly using a stair diagram.
- lambeq.text2diagram.word_sequence_reader = <lambeq.text2diagram.linear_reader.LinearReader object>¶
A reader that combines words linearly using a stair diagram.
- lambeq.text2diagram.bag_of_words_reader = <lambeq.text2diagram.spiders_reader.SpidersReader object>¶
A reader that combines words using a spider.