lambeq.experimental

lambeq.experimental.discocirc

class lambeq.experimental.discocirc.CoreferenceResolver[source]

Bases: ABC

Class implementing coreference resolution.

dict_from_corefs(corefs: list[list[list[int]]]) → dict[tuple[int, int], tuple[int, int]][source]

Convert coreferences into a dict mapping each coreference to its first instance.

Parameters:
corefs : CorefDataT

Coreferences, as returned by tokenise_and_coref.

Returns:
dict[tuple[int, int], tuple[int, int]]

Maps each (sentence index, token index) pair to the first occurrence of its coreferenced entity.
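The mapping this method produces can be illustrated with a minimal pure-Python sketch (an illustration of the behaviour, not the library's actual implementation), assuming corefs[entity][sent] holds the token indices in that sentence which refer to the entity:

```python
# Illustrative sketch of the dict_from_corefs mapping (not lambeq's
# actual implementation). corefs[entity][sent] is assumed to hold the
# token indices in that sentence which refer to the entity.
def dict_from_corefs_sketch(corefs):
    mapping = {}
    for entity in corefs:
        first = None  # first (sent index, tok index) seen for this entity
        for sent_idx, tok_idxs in enumerate(entity):
            for tok_idx in tok_idxs:
                if first is None:
                    first = (sent_idx, tok_idx)
                mapping[(sent_idx, tok_idx)] = first
    return mapping

# "Alice slept. She woke.": token 0 of sentence 1 ("She") corefers
# with token 0 of sentence 0 ("Alice").
print(dict_from_corefs_sketch([[[0], [0]]]))
# {(0, 0): (0, 0), (1, 0): (0, 0)}
```

Every mention, including the first, maps to the first occurrence, so look-ups are uniform for all tokens of an entity.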

abstract tokenise_and_coref(text: str) → tuple[list[list[str]], list[list[list[int]]]][source]

Tokenise text and return its coreferences.

Given a text consisting of possibly multiple sentences, return the text split into sentences and tokens. Additionally, return coreference information indicating which tokens refer to the same entity.

Parameters:
text : str

The text to tokenise.

Returns:
TokenisedTextT

Each sentence in the text, as a list of tokens.

CorefDataT

Coreference information, provided as one list per coreferenced entity, consisting of a span for each sentence in the text.
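For a two-sentence text, the two return values might look like the following (a hypothetical illustration of the data shapes, not real parser output):

```python
# Hypothetical return values of tokenise_and_coref for
# "Alice slept. She woke." (illustrative shapes only).
tokenised = [['Alice', 'slept', '.'],   # sentence 0
             ['She', 'woke', '.']]      # sentence 1

# One list per coreferenced entity; each inner list gives the token
# indices in the corresponding sentence that refer to that entity.
corefs = [[[0],    # "Alice" in sentence 0
           [0]]]   # "She" in sentence 1

# (sentence index, token index) pairs identify tokens across the text:
# entity 0's mention in sentence 1 is token 0, i.e. "She".
print(tokenised[1][corefs[0][1][0]])
# She
```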

class lambeq.experimental.discocirc.DisCoCircReader(parser: ~lambeq.text2diagram.model_based_reader.base.ModelBasedReader | ~collections.abc.Callable[[], ~lambeq.text2diagram.model_based_reader.base.ModelBasedReader] = <class 'lambeq.text2diagram.model_based_reader.bobcat_parser.BobcatParser'>, coref_resolver: ~lambeq.experimental.discocirc.coref_resolver.CoreferenceResolver | ~collections.abc.Callable[[], ~lambeq.experimental.discocirc.coref_resolver.CoreferenceResolver] = <class 'lambeq.experimental.discocirc.coref_resolver.MaverickCoreferenceResolver'>)[source]

Bases: Reader

A reader that converts text to a DisCoCirc diagram.

__init__(parser: ~lambeq.text2diagram.model_based_reader.base.ModelBasedReader | ~collections.abc.Callable[[], ~lambeq.text2diagram.model_based_reader.base.ModelBasedReader] = <class 'lambeq.text2diagram.model_based_reader.bobcat_parser.BobcatParser'>, coref_resolver: ~lambeq.experimental.discocirc.coref_resolver.CoreferenceResolver | ~collections.abc.Callable[[], ~lambeq.experimental.discocirc.coref_resolver.CoreferenceResolver] = <class 'lambeq.experimental.discocirc.coref_resolver.MaverickCoreferenceResolver'>) → None[source]
sentence2diagram(sentence: str | List[str], tokenised: bool = False) → Diagram | None[source]

Parse a sentence into a lambeq diagram.

sentences2diagrams(sentences: List[str] | List[List[str]], tokenised: bool = False) → list[Diagram | None][source]

Parse multiple sentences into a list of lambeq diagrams.

text2circuit(text: str, sandwich: bool = False, break_cycles: bool = True, pruned_nouns: Iterable[str] = (), min_noun_freq: int = 1, rewrite_rules: Iterable[TreeRewriteRule | str] | None = ('determiner', 'auxiliary'), foliated_frame_labels: bool = True) → Diagram[source]

Return the DisCoCirc diagram for a given text.

Parameters:
text : str

A single string that contains one or multiple sentences.

sandwich : bool, default: False

If False, higher-order boxes are represented using Frames; if True, each higher-order box is expanded into a sandwich, with one box inserted between each subdiagram of the higher-order box.

break_cycles : bool, default: True

Whether to break any cycles present in the pregroup tree.

pruned_nouns : iterable of str, default: ()

If any of the nouns in this list are present in the diagram, the corresponding states and wires are removed from the diagram.

min_noun_freq : int, default: 1

Minimum number of times a noun needs to be referenced to appear in the circuit.

rewrite_rules : list of TreeRewriteRule or str, default: ('determiner', 'auxiliary')

List of rewrite rules to apply to the pregroup tree before conversion to a circuit.

foliated_frame_labels : bool, default: True

Only relevant when sandwich is True. If True, frames are labelled with numbered suffixes to distinguish the sandwich layers; if False, all sandwich layers share the same label.

Returns:
Diagram

A DisCoCirc diagram for the given text.
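A typical call looks like the following (a sketch assuming lambeq is installed and that the default BobcatParser and Maverick coreference models can be downloaded on first use; the input sentence is arbitrary):

```python
from lambeq.experimental.discocirc import DisCoCircReader

# Uses BobcatParser and MaverickCoreferenceResolver by default;
# both models are fetched on first use.
reader = DisCoCircReader()

circuit = reader.text2circuit(
    'Alice met Bob. She greeted him.',
    sandwich=False,                              # represent higher-order boxes as Frames
    rewrite_rules=('determiner', 'auxiliary'),   # the default tree rewrites
)
circuit.draw()
```

Because the two sentences corefer ("She" refers to "Alice", "him" to "Bob"), the resulting circuit threads a single noun wire per entity through both sentences.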

class lambeq.experimental.discocirc.MaverickCoreferenceResolver(hf_name_or_path: str = 'sapienzanlp/maverick-mes-ontonotes', device: int | str | device = 'cpu')[source]

Bases: CoreferenceResolver

Coreference resolution and tokenisation based on Maverick (https://github.com/sapienzanlp/maverick-coref).

__init__(hf_name_or_path: str = 'sapienzanlp/maverick-mes-ontonotes', device: int | str | device = 'cpu')[source]
dict_from_corefs(corefs: list[list[list[int]]]) → dict[tuple[int, int], tuple[int, int]]

Convert coreferences into a dict mapping each coreference to its first instance.

Parameters:
corefs : CorefDataT

Coreferences, as returned by tokenise_and_coref.

Returns:
dict[tuple[int, int], tuple[int, int]]

Maps each (sentence index, token index) pair to the first occurrence of its coreferenced entity.

tokenise_and_coref(text: str) → tuple[list[list[str]], list[list[list[int]]]][source]

Tokenise text and return its coreferences.

Given a text consisting of possibly multiple sentences, return the text split into sentences and tokens. Additionally, return coreference information indicating which tokens refer to the same entity.

Parameters:
text : str

The text to tokenise.

Returns:
TokenisedTextT

Each sentence in the text, as a list of tokens.

CorefDataT

Coreference information, provided as one list per coreferenced entity, consisting of a span for each sentence in the text.

class lambeq.experimental.discocirc.SpacyCoreferenceResolver[source]

Bases: CoreferenceResolver

Coreference resolution and tokenisation based on spaCy.

__init__()[source]
dict_from_corefs(corefs: list[list[list[int]]]) → dict[tuple[int, int], tuple[int, int]]

Convert coreferences into a dict mapping each coreference to its first instance.

Parameters:
corefs : CorefDataT

Coreferences, as returned by tokenise_and_coref.

Returns:
dict[tuple[int, int], tuple[int, int]]

Maps each (sentence index, token index) pair to the first occurrence of its coreferenced entity.

tokenise_and_coref(text: str) → tuple[list[list[str]], list[list[list[int]]]][source]

Tokenise text and return its coreferences.

Given a text consisting of possibly multiple sentences, return the text split into sentences and tokens. Additionally, return coreference information indicating which tokens refer to the same entity.

Parameters:
text : str

The text to tokenise.

Returns:
TokenisedTextT

Each sentence in the text, as a list of tokens.

CorefDataT

Coreference information, provided as one list per coreferenced entity, consisting of a span for each sentence in the text.

class lambeq.experimental.discocirc.TreeRewriteRule(match_type=False, match_words=None, max_depth=None, word_join='merge')[source]

Bases: object

General rewrite rule that merges tree nodes based on optional conditions.

__init__(match_type=False, match_words=None, max_depth=None, word_join='merge')[source]

Instantiate a general rewrite rule.

edit_tree(node)[source]
rewrite(node)[source]
class lambeq.experimental.discocirc.TreeRewriter(rules: Iterable[TreeRewriteRule | str] | None = None)[source]

Bases: object

Class that rewrites a pregroup tree.

Comes with a set of default rules.

__call__(node)[source]

Apply the rewrite rules to the given tree.

__init__(rules: Iterable[TreeRewriteRule | str] | None = None) None[source]

Initialise a rewriter.

add_rules(*rules: TreeRewriteRule | str) None[source]

Add rules to this rewriter.
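Putting these pieces together, a TreeRewriter might be configured as follows (a sketch assuming lambeq is installed; the rule names 'determiner' and 'auxiliary' are taken from text2circuit's defaults, while the match_words rule is a hypothetical example of the TreeRewriteRule keyword arguments):

```python
from lambeq.experimental.discocirc import TreeRewriteRule, TreeRewriter

# String names refer to the rewriter's predefined default rules
# (the same names accepted by DisCoCircReader.text2circuit).
rewriter = TreeRewriter(['determiner', 'auxiliary'])

# Add a custom rule; the keyword arguments mirror TreeRewriteRule.__init__.
# This hypothetical rule merges 'not' into its parent node.
rewriter.add_rules(TreeRewriteRule(match_words=['not'], word_join='merge'))

# Apply the rules to a pregroup tree via __call__:
# rewritten_tree = rewriter(tree)   # tree: a pregroup tree node
```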