Examples
========

Please visit our Google Colab Demo to check out the full example |Colab|.

.. |Colab| raw:: html

Running Workflow
----------------

The following snippet is an example of creating a ``dataset`` instance for ``TARexp``.
For ``scikit-learn`` rankers, the dataset is basically a sparse ``scipy`` matrix holding the vectorized documents,
plus a list or an array of binary labels with one label per row of the matrix.

.. code-block:: python

    from sklearn import datasets
    import pandas as pd
    import tarexp

    # Load the vectorized RCV1 collection and its per-category relevance labels
    rcv1 = datasets.fetch_rcv1()
    X = rcv1['data']
    rel_info = pd.DataFrame(rcv1['target'].todense().astype(bool),
                            columns=rcv1['target_names'])
    ds = tarexp.SparseVectorDataset.from_sparse(X)

The following snippet defines a set of components to use for a workflow:

.. code-block:: python

    from sklearn.linear_model import LogisticRegression
    from tarexp import component

    # Combine a ranker, a labeler, a sampler, and a stopping rule into one setting
    setting = component.combine(component.SklearnRanker(LogisticRegression, solver='liblinear'),
                                component.PerfectLabeler(),
                                component.RelevanceSampler(),
                                component.FixedRoundStoppingRule(max_round=20))()

To declare a workflow, pass your dataset, setting, and other parameters to the workflow class.

.. code-block:: python

    workflow = tarexp.OnePhaseTARWorkflow(
        ds.set_label(rel_info['GPRO']),
        setting,
        seed_doc=[1023],
        batch_size=200,
        random_seed=123
    )

Finally, you can execute the workflow by running it as an iterator.
We also support everything from `ir-measures `__ as evaluation metrics.

.. code-block:: python

    import ir_measures

    recording_metrics = [ir_measures.RPrec,
                         tarexp.OptimisticCost(target_recall=0.8, cost_structure=(25,5,5,1))]

    # Each iteration runs one round of the workflow and yields its ledger
    for ledger in workflow:
        print("Round {}: found {} positives in total".format(ledger.n_rounds, ledger.n_pos_annotated))
        print("metric:", workflow.getMetrics(recording_metrics))

Besides standard IR evaluation metrics, we also implement ``OptimisticCost`` as a cost-based evaluation metric in ``TARexp``.
Please refer to `this paper `__.

.. code-block:: python

    # `df` is not defined in this snippet; it is assumed to be a DataFrame of
    # recorded run results with a 'dataset' index level.
    tarexp.helper.cost_dynamic(
        df.loc[:, 'GOBIT', :].groupby(level='dataset'),
        recall_targets=[0.8],
        cost_structures=[(1,1,1,1), (10, 10, 1, 1), (25, 5, 5, 1)],
        with_hatches=True
    )

.. image:: ../../examples/cost-dynamic-1.png
   :alt: Cost Dynamic Graph with 3 Cost Structures
   :class: with-shadow

Alternatively, you can create this graph from the command line:

.. code-block:: bash

    python -m tarexp.helper.plotting \
        --runs GPRO=./my_tar_exp/GPRO.61b1f31a0a29de634939db77c0dde383/ \
               GOBIT=./my_tar_exp/GOBIT.ae86e0b37809cb139dfa1f4cf914fb9b/ \
        --cost_structures 1-1-1-1 25-5-5-1 --y_thousands --with_hatches

.. image:: ../../examples/cost-dynamic-2.png
   :alt: Cost Dynamic Graph with 2 Runs and 2 Cost Structures
   :class: with-shadow
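
The command line above writes each cost structure as dash-separated numbers (``25-5-5-1``), while the Python call passes four-element tuples (``(25, 5, 5, 1)``).
The sketch below converts one notation into the other, assuming the two forms describe the same four values.
``parse_cost_structure`` is a hypothetical helper written for illustration and is not part of ``TARexp``.

.. code-block:: python

    def parse_cost_structure(text):
        """Turn a dash-separated string such as '25-5-5-1' into a tuple of ints."""
        parts = text.split("-")
        if len(parts) != 4:
            raise ValueError(f"expected four dash-separated values, got {text!r}")
        return tuple(int(p) for p in parts)

    # Values as they would appear after --cost_structures on the command line
    cli_values = ["1-1-1-1", "25-5-5-1"]
    cost_structures = [parse_cost_structure(v) for v in cli_values]
    print(cost_structures)  # [(1, 1, 1, 1), (25, 5, 5, 1)]

The resulting list has the same shape as the ``cost_structures`` argument shown in the Python example, so it could be passed there when the same settings need to be reused in both places.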