tarexp.experiments module#
TAR inherits both the large topic-to-topic variability of IR tasks, and the strong dependence on initial conditions and random seeds of active learning processes. Multiple collections, topics, and runs are necessary to reliably demonstrate that one approach dominates another.
We support spawning multiple runs in parallel, both across machines on a network and across multiple processes on appropriate hardware.
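For example, a small grid of runs might be configured and dispatched as follows. This is a sketch adapted from the package quick-start: the dataset ds is assumed to be constructed elsewhere, and the exact component names and arguments may differ from your setup.

    import ir_measures
    from sklearn.linear_model import LogisticRegression

    import tarexp
    from tarexp import component

    # combine a ranker, labeler, sampler, and stopping rule into one component set
    setting = component.combine(
        component.SklearnRanker(LogisticRegression, solver="liblinear"),
        component.PerfectLabeler(),
        component.RelevanceSampler(),
        component.FixedRoundStoppingRule(max_round=20),
    )()

    exp = tarexp.TARExperiment(
        "./my_tar_exp",
        random_seed=123,
        max_round_exec=20,
        metrics=[ir_measures.RPrec],
        tasks=ds,                  # a dataset or TaskFeeder constructed elsewhere
        components=setting,
        batch_size=[100, 200],     # 2 batch sizes x 2 repeats = 4 runs
        repeat_tasks=2,
    )

    # dispatch this machine's share of the runs: node 0 of 2, two processes
    results = exp.run(n_processes=2, n_nodes=2, node_id=0,
                      resume=True, dump_frequency=10)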
The .run() method dispatches the TAR tasks with the given runtime settings. In the example above, the experiments run on the first of the two machines with two processes; invoking the same command with node_id=1 on the second machine results in all four tasks being run simultaneously. The .run() method returns the per-round metric values of all the experiment tasks executed on the node.
- class tarexp.experiments.Action(should_save: bool = False, should_stop: bool = False)[source]#
Bases:
object
Action controller for feedback functions in experiments.
Feedback functions receive an Action instance whose attributes can be modified to suggest that the experiment-running function perform certain operations.
- should_save: bool = False#
Suggest that the workflow save a checkpoint at this round.
- should_stop: bool = False#
Suggest that the workflow stop after this round.
- class tarexp.experiments.Experiment(output_path: Path | str, random_seed: int = None, metrics: List[ir_measures.measures.Measure | OptimisticCost | str] = None, saved_score_limit: int = -1, saved_checkpoint_limit: int = 2, max_round_exec: int = -1)[source]#
Bases:
object
Experiment is implemented as a Python dataclass. It contains the information for experiments but not the states, which are determined by the checkpoint of each individual run on disk. The attributes (the properties of the class) are designed to be static configurations of the experiments; runtime configurations are passed as arguments to the run() method.
All experiments inheriting this class should implement the static method exec(), which describes how a run should execute, and generateSettings(), which generates a list of experiment settings that will be yielded as TAR runs. The configuration is passed as the first argument (or keyed as setting) to this static method.
All downstream classes should also register any additional arguments through registerExecArguments() to ensure these configurations are passed into the exec() static method.
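A minimal sketch of such a subclass is shown below. The attribute learning_rates, the __post_init__ hook point, and the body of exec() are illustrative assumptions, not part of the tarexp API.

    from dataclasses import dataclass
    from pathlib import Path
    from typing import List

    from tarexp.experiments import Experiment

    @dataclass
    class MyExperiment(Experiment):
        learning_rates: List[float] = None  # hypothetical attribute to experiment with

        def __post_init__(self):
            post_init = getattr(super(), "__post_init__", None)
            if post_init is not None:  # run the base class hook if it defines one
                post_init()
            # make static configurations available to the exec() static method
            self.registerExecArguments({"max_round_exec": self.max_round_exec})

        def generateSettings(self):
            # each yielded dictionary becomes one TAR run
            return [{"learning_rate": lr} for lr in self.learning_rates]

        @staticmethod
        def exec(setting: dict, run_path: Path, max_round_exec: int = -1, **kwargs):
            # describe how a single run executes; return the per-round metric values
            ...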
- output_path: Path | str#
Output directory of the experiment.
- random_seed: int = None#
Random seed used in the experiment and all experiment runs.
- metrics: List[ir_measures.measures.Measure | OptimisticCost | str] = None#
Evaluation metrics that will be calculated and stored at each round of the runs.
- saved_score_limit: int = -1#
Number of rounds of document scores each workflow will store. A negative value indicates no limit.
- saved_checkpoint_limit: int = 2#
Maximum number of checkpoints stored on disk. Older checkpoints are deleted silently.
- max_round_exec: int = -1#
Maximum number of rounds each run is allowed to execute.
- on(event: str, func: callable)[source]#
Register a callback function tied to a certain event. The list of events available for each experiment is stored in available_events.
- Parameters:
event – Name of the event on which the callback function will be invoked.
func – Callback function that takes an Action instance and the tarexp.workflow.Workflow instance as arguments.
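A sketch of registering a feedback callback follows; the event name "saved" and the workflow attribute n_rounds are assumptions for illustration.

    from tarexp.experiments import Action

    def save_and_stop(action: Action, workflow):
        if workflow.n_rounds >= 20:         # n_rounds is an assumed attribute
            action.should_save = True       # suggest saving a checkpoint this round
            action.should_stop = True       # suggest stopping after this round

    print(exp.available_events)  # inspect which events this experiment exposes
    exp.on("saved", save_and_stop)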
- registerExecArguments(kwargs: Dict[str, Any])[source]#
Register static experiment arguments.
- Parameters:
kwargs – Dictionary of arguments mapping argument names to argument values.
- property available_events#
List of available events of this experiment.
- property savable_fields#
List of attributes that would be saved in the experiment directory.
- run(disable_tqdm=False, n_processes=1, n_nodes=1, node_id=0, resume=False, dump_frequency=10, **runtime_kwargs)[source]#
Execute all experiment runs.
This method natively supports multiprocessing on a single machine and parallel processing across multiple machines. When running on multiple machines, however, the user needs to invoke this method manually with the proper n_nodes and node_id values so that each machine executes its own share of the experiments.
The arguments of this method are designed to be runtime configurations, including ones that will eventually be passed to the underlying workflow instances.
- Parameters:
disable_tqdm – Whether to disable the progress bar. Default is False (the bar is shown).
n_processes – Number of processes to use on this machine.
n_nodes – Number of machines running this set of experiments in total.
node_id – The index of this machine among all machines, starting from 0. This value should be < n_nodes.
resume – Whether to resume from existing runs. Default is False. This value is passed to the workflow, which is expected to respect it.
dump_frequency – The frequency at which the workflow saves a checkpoint on disk.
**runtime_kwargs – Other runtime arguments to pass to the workflow instance.
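For instance, splitting a set of runs across two machines might look like the following sketch, assuming the Experiment instance exp from earlier:

    # on the first machine (node 0 of 2)
    exp.run(n_processes=2, n_nodes=2, node_id=0, resume=True, dump_frequency=10)
    # on the second machine (node 1 of 2)
    exp.run(n_processes=2, n_nodes=2, node_id=1, resume=True, dump_frequency=10)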
- generateSettings() iter_with_length | list [source]#
Generate a list of experiment settings.
Any experiment class that inherits this class should implement its own version, since each experiment defines its own set of attributes that can be experimented with (or more complex ways of generating sensible combinations). The rest of the attributes should be static.
- static exec(setting: dict, run_path: Path, **kwargs) List[Dict[MeasureKey, int | float]] [source]#
Execute a TAR run for the experiment.
Any experiment class that inherits this class should implement its own version of this method. This method describes how an experiment run of the workflow is executed, including when the metrics are calculated and when the callback functions are invoked.
Note
This method should not be confused with the tarexp.workflow.Workflow.step() method: exec focuses on the experiment side and should not modify the behavior of a workflow.
- class tarexp.experiments.TARExperiment(output_path: Path | str, random_seed: int = None, metrics: List[ir_measures.measures.Measure | OptimisticCost | str] = None, saved_score_limit: int = -1, saved_checkpoint_limit: int = 2, max_round_exec: int = -1, tasks: Dataset | List[Dataset] | TaskFeeder = None, components: Component | List[Component] | iter_with_length = None, workflow: Workflow = <class 'tarexp.workflow.OnePhaseTARWorkflow'>, batch_size: int | List[int] = 200, control_set_size: int | List[int] = 0, workflow_kwargs: dict = None, repeat_tasks: int = 1, seed_docs_construction: Tuple[int, int] = (1, 0))[source]#
Bases:
Experiment
An experiment that executes a TAR workflow.
This experiment class is designed to be compatible with any workflow that inherits the tarexp.workflow.Workflow class.
Please refer to Experiment for properties inherited from it.
Important
Any attribute marked as experimentable contributes to generating the combinations of the experiment runs. For example, if 2 tasks, 3 sets of components, 3 batch sizes, and 10 control set sizes are provided, generateSettings() will yield 2x3x3x10 = 180 runs.
- tasks: Dataset | List[Dataset] | TaskFeeder = None#
Experimentable (A list of) dataset instances or a tarexp.dataset.TaskFeeder instance. Each task is considered an independent TAR review project, consisting of a collection and a set of gold labels.
- components: Component | List[Component] | iter_with_length = None#
Experimentable (A list of) combined components for experiments.
- workflow[source]#
Workflow that will be used for the experiment.
alias of
OnePhaseTARWorkflow
- batch_size: int | List[int] = 200#
Experimentable (A list of) batch sizes for the TAR workflow.
- control_set_size: int | List[int] = 0#
Experimentable (A list of) control set sizes for the TAR workflow.
- workflow_kwargs: dict = None#
Experimentable Other experiment arguments. If a value in the dictionary is a list, it is considered an experimentable argument and also contributes to the combinations.
- repeat_tasks: int = 1#
Experimentable Number of times each task will be replicated with different random seeds.
- seed_docs_construction: Tuple[int, int] = (1, 0)#
Number of positive and negative seed documents that will be generated for each run.
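As a sketch of how experimentable attributes multiply into runs (the dataset and component objects are assumed to exist elsewhere):

    exp = tarexp.TARExperiment(
        "./grid_exp",
        tasks=[ds_a, ds_b],                   # 2 tasks
        components=[comp_x, comp_y, comp_z],  # 3 component sets
        batch_size=[100, 200],                # 2 batch sizes
        repeat_tasks=2,                       # 2 random seeds per combination
    )
    # generateSettings() yields 2 x 3 x 2 x 2 = 24 runs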
- property available_events#
List of available events of this experiment.
- generateSettings()[source]#
Generate an iterator that yields experiment settings in dictionaries.
In this experiment, tasks, components, repeat_tasks, batch_size, control_set_size, and any other experimentable values in workflow_kwargs contribute to the combinations.
- static exec(exp_setting: dict, run_path: Path, workflow_cls: Workflow, resume: bool, metrics: list, dump_frequency: int, callbacks: dict, **static_setting) List[Dict[MeasureKey, int | float]] [source]#
Execute an experiment run using the workflow specified in workflow. This method is used internally by the dispatcher invoked by tarexp.experiments.Experiment.run().
- class tarexp.experiments.StoppingExperimentOnReplay(output_path: Path | str, random_seed: int = None, metrics: List[ir_measures.measures.Measure | OptimisticCost | str] = None, saved_score_limit: int = -1, saved_checkpoint_limit: int = 2, max_round_exec: int = -1, saved_exp_path: Path | str = None, tasks: Dataset | List[Dataset] | TaskFeeder = None, replay: WorkflowReplay = None, stopping_rules: List[StoppingRule] = None, exp_early_stopping: bool = True)[source]#
Bases:
Experiment
Stopping rule experiments on existing TAR runs using tarexp.workflow.WorkflowReplay.
This experiment invokes the stopping rules at each replay round to test whether each rule suggests stopping. All runs in the provided TAR experiment directory will be tested.
Important
Since stopping rules are only tested on the replays, any rule that intervenes in the TAR process (such as changing the sampling of the documents for estimating progress) cannot be tested with this experiment. Users should execute individual TAR runs using TARExperiment to test such stopping rules.
- saved_exp_path: Path | str = None#
Path to the directory of the TAR runs.
- tasks: Dataset | List[Dataset] | TaskFeeder = None#
List of dataset instances or a TaskFeeder that provides the datasets for the replay experiments.
- replay: WorkflowReplay = None#
The replay workflow class that corresponds to the workflow used to generate the TAR runs provided in
saved_exp_path
.
- stopping_rules: List[StoppingRule] = None#
List of stopping rules that will be tested on the replay workflow.
- exp_early_stopping: bool = True#
Whether to stop the replay early once all stopping rules tested have already suggested stopping.
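A sketch of configuring a replay experiment follows. The replay class name OnePhaseTARWorkflowReplay and the stopping rule instances are assumptions; substitute the ones matching how the saved runs were generated.

    from tarexp.experiments import StoppingExperimentOnReplay
    from tarexp.workflow import OnePhaseTARWorkflowReplay  # assumed replay class

    replay_exp = StoppingExperimentOnReplay(
        "./stopping_exp",
        saved_exp_path="./my_tar_exp",     # directory containing the finished TAR runs
        tasks=task_feeder,                 # same tasks used to generate those runs
        replay=OnePhaseTARWorkflowReplay,  # replay class matching the original workflow
        stopping_rules=[rule_a, rule_b],   # stopping rule instances under test
    )
    replay_exp.run(n_processes=2)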
- property available_events#
List of available events of this experiment.
- generateSettings()[source]#
Generate a list of experiment settings.
Any experiment class that inherits this class should implement its own version, since each experiment defines its own set of attributes that can be experimented with (or more complex ways of generating sensible combinations). The rest of the attributes should be static.
- exec(run_path: Path, replay_cls: WorkflowReplay, stopping_rules: List[StoppingRule], metrics: list, dump_frequency: int, exp_early_stopping: bool, callbacks: dict, **kwargs)[source]#
Execute an experiment run using the replay workflow specified in replay_cls. This method is used internally by the dispatcher invoked by tarexp.experiments.Experiment.run().
Note
Since the stopping rule runs are designed to be fast, resuming is not supported in this experiment.