lambeq.experimental¶
lambeq.experimental.discocirc¶
- class lambeq.experimental.discocirc.CoreferenceResolver[source]¶
Bases:
ABC
Class implementing coreference resolution.
- dict_from_corefs(corefs: list[list[list[int]]]) dict[tuple[int, int], tuple[int, int]] [source]¶
Convert coreferences into a dict mapping each coreference to its first instance.
- Parameters:
- corefsCorefDataT
Coreferences as returned by tokenise_and_coref
- Returns:
- dict[tuple[int, int], tuple[int, int]]
Maps pairs of (sent index, tok index) to their first occurring coreference
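The mapping described above can be illustrated with a small standalone sketch. It is not lambeq's implementation: the function name `dict_from_corefs_sketch` is hypothetical, and it assumes that each entity in `corefs` holds one list per sentence, containing the token indices of that entity's mentions in that sentence.

```python
from __future__ import annotations


def dict_from_corefs_sketch(
    corefs: list[list[list[int]]],
) -> dict[tuple[int, int], tuple[int, int]]:
    """Map every (sent index, tok index) mention to its entity's first mention.

    Assumption (not confirmed by the documentation): corefs[e][s] lists the
    token indices in sentence s that refer to entity e.
    """
    mapping: dict[tuple[int, int], tuple[int, int]] = {}
    for entity in corefs:
        first: tuple[int, int] | None = None
        for sent_idx, tok_indices in enumerate(entity):
            for tok_idx in tok_indices:
                if first is None:
                    # The earliest mention becomes the representative.
                    first = (sent_idx, tok_idx)
                mapping[(sent_idx, tok_idx)] = first
    return mapping


# Two sentences; entity 0 is mentioned at token 0 of both sentences,
# entity 1 only at token 2 of the first sentence.
example = dict_from_corefs_sketch([[[0], [0]], [[2], []]])
```

Under these assumptions, the mention in the second sentence is mapped back to position `(0, 0)`, while first mentions map to themselves.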
- abstract tokenise_and_coref(text: str) tuple[list[list[str]], list[list[list[int]]]] [source]¶
Tokenise text and return its coreferences.
Given a text consisting of possibly multiple sentences, return the text split into sentences and tokens. Additionally, return coreference information indicating which tokens correspond to the same entity.
- Parameters:
- textstr
The text to tokenise.
- Returns:
- TokenisedTextT
Each sentence in text as a list of tokens
- CorefDataT
Coreference information provided as a list for each coreferenced entity, consisting of a span for each sentence in text.
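To make the shape of the two return values concrete, here is a toy stand-in that produces data in the same layout. It is only a sketch: `toy_tokenise_and_coref` is a hypothetical name, the sentence splitting is naive, and "coreference" is approximated by treating repeated words as the same entity, which is far weaker than what a real resolver does. It also assumes each per-sentence span is a list of token indices.

```python
import re


def toy_tokenise_and_coref(text: str):
    """Return (sentences, corefs) in the layout described above.

    Toy heuristic only: tokens that are the same word (case-insensitive)
    are grouped as one entity. Real resolvers also link pronouns, etc.
    """
    # Naive sentence split on ., ! and ?, then whitespace tokenisation.
    sentences = [s.split() for s in re.split(r'[.!?]', text) if s.strip()]

    # Record every position at which each lower-cased token occurs.
    positions: dict[str, list[tuple[int, int]]] = {}
    for sent_idx, sent in enumerate(sentences):
        for tok_idx, tok in enumerate(sent):
            positions.setdefault(tok.lower(), []).append((sent_idx, tok_idx))

    corefs = []
    for mentions in positions.values():
        if len(mentions) < 2:
            continue  # a single mention has nothing to corefer with
        # One (possibly empty) list of token indices per sentence.
        entity = [[] for _ in sentences]
        for sent_idx, tok_idx in mentions:
            entity[sent_idx].append(tok_idx)
        corefs.append(entity)
    return sentences, corefs


toy_sentences, toy_corefs = toy_tokenise_and_coref('Alice sleeps. Bob sees Alice.')
```

Here `toy_corefs` contains a single entity with one index list per sentence, which is the structure `dict_from_corefs` consumes.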
- class lambeq.experimental.discocirc.DisCoCircReader(parser: ~lambeq.text2diagram.model_based_reader.base.ModelBasedReader | ~collections.abc.Callable[[], ~lambeq.text2diagram.model_based_reader.base.ModelBasedReader] = <class 'lambeq.text2diagram.model_based_reader.bobcat_parser.BobcatParser'>, coref_resolver: ~lambeq.experimental.discocirc.coref_resolver.CoreferenceResolver | ~collections.abc.Callable[[], ~lambeq.experimental.discocirc.coref_resolver.CoreferenceResolver] = <class 'lambeq.experimental.discocirc.coref_resolver.MaverickCoreferenceResolver'>)[source]¶
Bases:
Reader
A reader that converts text to a DisCoCirc diagram.
- __init__(parser: ~lambeq.text2diagram.model_based_reader.base.ModelBasedReader | ~collections.abc.Callable[[], ~lambeq.text2diagram.model_based_reader.base.ModelBasedReader] = <class 'lambeq.text2diagram.model_based_reader.bobcat_parser.BobcatParser'>, coref_resolver: ~lambeq.experimental.discocirc.coref_resolver.CoreferenceResolver | ~collections.abc.Callable[[], ~lambeq.experimental.discocirc.coref_resolver.CoreferenceResolver] = <class 'lambeq.experimental.discocirc.coref_resolver.MaverickCoreferenceResolver'>) None [source]¶
- sentence2diagram(sentence: str | List[str], tokenised: bool = False) Diagram | None [source]¶
Parse a sentence into a lambeq diagram.
- sentences2diagrams(sentences: List[str] | List[List[str]], tokenised: bool = False) list[Diagram | None] [source]¶
Parse multiple sentences into a list of lambeq diagrams.
- text2circuit(text: str, sandwich: bool = False, break_cycles: bool = True, pruned_nouns: Iterable[str] = (), min_noun_freq: int = 1, rewrite_rules: Iterable[TreeRewriteRule | str] | None = ('determiner', 'auxiliary'), foliated_frame_labels: bool = True) Diagram [source]¶
Return the DisCoCirc diagram for a given text.
- Parameters:
- textstr
A single string that contains one or multiple sentences.
- sandwichbool, default: False
If False, higher-order boxes are represented using Frames. If True, they are represented as sandwiches, with one box placed between each subdiagram of the higher-order box.
- break_cyclesbool, default: True
Whether to break any cycles present in the pregroup tree.
- pruned_nounsiterable of strings, default: ()
If any of the nouns in this list are present in the diagram, the corresponding state and wire are removed from the diagram.
- min_noun_freq: int, default: 1
Minimum number of times a noun needs to be referenced to appear in the circuit.
- rewrite_ruleslist of TreeRewriteRule or str
List of rewrite rules to apply to the pregroup tree before conversion to a circuit.
- foliated_frame_labelsbool, default: True
Only relevant when sandwich is True. If True, frame labels are given numbered suffixes; if False, all sandwich layers share the same label.
- Returns:
- Diagram
A DisCoCirc diagram for the given text.
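The effect of `pruned_nouns` and `min_noun_freq` can be sketched independently of lambeq. The helper below is hypothetical (`surviving_nouns` is not part of the library) and works on a plain list of noun mentions; how lambeq actually counts references and in what order it applies the two filters is an assumption here.

```python
from collections import Counter
from collections.abc import Iterable


def surviving_nouns(
    noun_mentions: Iterable[str],
    pruned_nouns: Iterable[str] = (),
    min_noun_freq: int = 1,
) -> list[str]:
    """Return the nouns that would keep a wire under the two filters.

    Assumption: a noun is dropped if it is listed in pruned_nouns or if it
    is mentioned fewer than min_noun_freq times.
    """
    counts = Counter(noun_mentions)  # preserves first-seen order
    pruned = set(pruned_nouns)
    return [
        noun
        for noun, count in counts.items()
        if noun not in pruned and count >= min_noun_freq
    ]


mentions = ['Alice', 'Bob', 'Alice', 'cat']
kept_default = surviving_nouns(mentions)
kept_filtered = surviving_nouns(mentions, pruned_nouns=['cat'], min_noun_freq=2)
```

With the defaults every noun is kept; with `min_noun_freq=2` only nouns mentioned at least twice survive, and anything in `pruned_nouns` is removed regardless of frequency.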
- class lambeq.experimental.discocirc.MaverickCoreferenceResolver(hf_name_or_path: str = 'sapienzanlp/maverick-mes-ontonotes', device: int | str | device = 'cpu')[source]¶
Bases:
CoreferenceResolver
Coreference resolution and tokenisation based on Maverick (https://github.com/sapienzanlp/maverick-coref).
- __init__(hf_name_or_path: str = 'sapienzanlp/maverick-mes-ontonotes', device: int | str | device = 'cpu')[source]¶
- dict_from_corefs(corefs: list[list[list[int]]]) dict[tuple[int, int], tuple[int, int]] ¶
Convert coreferences into a dict mapping each coreference to its first instance.
- Parameters:
- corefsCorefDataT
Coreferences as returned by tokenise_and_coref
- Returns:
- dict[tuple[int, int], tuple[int, int]]
Maps pairs of (sent index, tok index) to their first occurring coreference
- tokenise_and_coref(text: str) tuple[list[list[str]], list[list[list[int]]]] [source]¶
Tokenise text and return its coreferences.
Given a text consisting of possibly multiple sentences, return the text split into sentences and tokens. Additionally, return coreference information indicating which tokens correspond to the same entity.
- Parameters:
- textstr
The text to tokenise.
- Returns:
- TokenisedTextT
Each sentence in text as a list of tokens
- CorefDataT
Coreference information provided as a list for each coreferenced entity, consisting of a span for each sentence in text.
- class lambeq.experimental.discocirc.SpacyCoreferenceResolver[source]¶
Bases:
CoreferenceResolver
Coreference resolution and tokenisation based on spaCy.
- dict_from_corefs(corefs: list[list[list[int]]]) dict[tuple[int, int], tuple[int, int]] ¶
Convert coreferences into a dict mapping each coreference to its first instance.
- Parameters:
- corefsCorefDataT
Coreferences as returned by tokenise_and_coref
- Returns:
- dict[tuple[int, int], tuple[int, int]]
Maps pairs of (sent index, tok index) to their first occurring coreference
- tokenise_and_coref(text: str) tuple[list[list[str]], list[list[list[int]]]] [source]¶
Tokenise text and return its coreferences.
Given a text consisting of possibly multiple sentences, return the text split into sentences and tokens. Additionally, return coreference information indicating which tokens correspond to the same entity.
- Parameters:
- textstr
The text to tokenise.
- Returns:
- TokenisedTextT
Each sentence in text as a list of tokens
- CorefDataT
Coreference information provided as a list for each coreferenced entity, consisting of a span for each sentence in text.
- class lambeq.experimental.discocirc.TreeRewriteRule(match_type=False, match_words=None, max_depth=None, word_join='merge')[source]¶
Bases:
object
General rewrite rule that merges tree nodes based on optional conditions.
- class lambeq.experimental.discocirc.TreeRewriter(rules: Iterable[TreeRewriteRule | str] | None = None)[source]¶
Bases:
object
Class that rewrites a pregroup tree.
Comes with a set of default rules.
- __init__(rules: Iterable[TreeRewriteRule | str] | None = None) None [source]¶
Initialise a rewriter.
- add_rules(*rules: TreeRewriteRule | str) None [source]¶
Add rules to this rewriter.