Examples
========
Please visit our Google Colab Demo to check out the full example |Colab|.
.. |Colab| raw:: html
Running Workflow
----------------
The following snippet creates a ``dataset`` instance for ``TARexp``.
For ``scikit-learn`` rankers, a dataset is basically a sparse ``scipy`` matrix holding the vectorized documents,
plus a list or array of binary labels with one entry per row of the matrix.
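
.. code-block:: python

    import pandas as pd
    from sklearn import datasets

    import tarexp

    # Download (or load from the local cache) the vectorized RCV1 collection.
    rcv1 = datasets.fetch_rcv1()
    X = rcv1['data']  # sparse document-term matrix
    # One boolean column per RCV1 category.
    rel_info = pd.DataFrame(rcv1['target'].todense().astype(bool), columns=rcv1['target_names'])
    ds = tarexp.SparseVectorDataset.from_sparse(X)
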
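
To make the expected shape of the inputs concrete, here is a minimal sketch that builds a toy sparse matrix and a matching label array with ``numpy`` and ``scipy`` only. The four documents and their labels are made up for illustration; nothing here calls ``TARexp`` itself.

.. code-block:: python

    import numpy as np
    from scipy import sparse

    # Toy stand-in for a vectorized collection: 4 documents, 5 features.
    X_toy = sparse.csr_matrix(np.array([
        [1, 0, 0, 2, 0],
        [0, 3, 0, 0, 0],
        [0, 0, 0, 0, 4],
        [5, 0, 6, 0, 0],
    ]))

    # One binary relevance label per document, i.e. per row of X_toy.
    labels_toy = np.array([True, False, False, True])

    # The label array must line up with the rows of the matrix.
    assert X_toy.shape[0] == len(labels_toy)
    n_relevant = int(labels_toy.sum())

The only structural requirement illustrated here is the one stated above: the number of labels equals the number of rows in the matrix.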
The following snippet defines the set of components to use in a workflow:
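
.. code-block:: python

    from sklearn.linear_model import LogisticRegression
    from tarexp import component

    # Combine a ranker, a labeler, a sampler, and a stopping rule into one setting.
    setting = component.combine(component.SklearnRanker(LogisticRegression, solver='liblinear'),
                                component.PerfectLabeler(),
                                component.RelevanceSampler(),
                                component.FixedRoundStoppingRule(max_round=20))()
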
To declare a workflow, pass your dataset, setting, and other parameters to the workflow class.

.. code-block:: python

    workflow = tarexp.OnePhaseTARWorkflow(
        ds.set_label(rel_info['GPRO']),
        setting,
        seed_doc=[1023],
        batch_size=200,
        random_seed=123
    )

Finally, execute the workflow by running it as an iterator.
Everything from `ir-measures `__ is also supported as an evaluation metric.

.. code-block:: python

    import ir_measures

    recording_metrics = [ir_measures.RPrec, tarexp.OptimisticCost(target_recall=0.8, cost_structure=(25,5,5,1))]
    for ledger in workflow:
        # Each iteration yields the ledger for the round that just finished.
        print("Round {}: found {} positives in total".format(ledger.n_rounds, ledger.n_pos_annotated))
        print("metric:", workflow.getMetrics(recording_metrics))

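
The "workflow as an iterator" pattern can be illustrated with a toy class built on Python's iterator protocol. This is only a sketch: ``ToyWorkflow`` and ``ToyLedger`` are hypothetical names, the documents are reviewed in a fixed order rather than ranked, and none of this reflects how ``TARexp`` is implemented internally.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass
    class ToyLedger:
        """Hypothetical stand-in for the ledger yielded each round."""
        n_rounds: int = 0
        n_pos_annotated: int = 0


    class ToyWorkflow:
        """Toy iterator: reviews one batch per step, stops after max_round."""

        def __init__(self, labels, batch_size, max_round):
            self.labels = list(labels)
            self.batch_size = batch_size
            self.max_round = max_round
            self.ledger = ToyLedger()

        def __iter__(self):
            return self

        def __next__(self):
            start = self.ledger.n_rounds * self.batch_size
            # Stop on the fixed round limit or when no documents remain.
            if self.ledger.n_rounds >= self.max_round or start >= len(self.labels):
                raise StopIteration
            batch = self.labels[start:start + self.batch_size]
            self.ledger.n_rounds += 1
            self.ledger.n_pos_annotated += sum(batch)
            return self.ledger


    toy = ToyWorkflow(labels=[1, 0, 0, 1, 1, 0, 0, 0], batch_size=2, max_round=3)
    history = [(ledger.n_rounds, ledger.n_pos_annotated) for ledger in toy]

Because the stopping condition lives inside ``__next__``, the caller only needs a plain ``for`` loop, which mirrors the usage shown above.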
Besides standard IR evaluation metrics, ``TARexp`` also implements ``OptimisticCost``, a cost-based evaluation metric.
Please refer to `this paper `__ for details.
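
.. code-block:: python

    # `df` is not defined in this snippet: it is a results DataFrame
    # whose MultiIndex includes a 'dataset' level.
    tarexp.helper.cost_dynamic(
        df.loc[:, 'GOBIT', :].groupby(level='dataset'),
        recall_targets=[0.8], cost_structures=[(1,1,1,1), (10, 10, 1, 1), (25, 5, 5, 1)],
        with_hatches=True
    )
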

.. image:: ../../examples/cost-dynamic-1.png
    :alt: Cost Dynamic Graph with 3 Cost Structures
    :class: with-shadow

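
To give a feel for what a cost structure such as ``(25, 5, 5, 1)`` expresses, the sketch below weights document counts by four unit costs. It assumes the four entries are, in order, the unit costs of reviewing a first-phase positive, a first-phase negative, a second-phase positive, and a second-phase negative document; that reading is an assumption here, so check the referenced paper for the exact definition. ``total_cost`` is a hypothetical helper, not part of ``TARexp``, and the counts are invented.

.. code-block:: python

    def total_cost(cost_structure, phase1_pos, phase1_neg, phase2_pos=0, phase2_neg=0):
        """Weight reviewed-document counts by their unit costs (assumed ordering)."""
        a_pos, a_neg, b_pos, b_neg = cost_structure
        return (a_pos * phase1_pos + a_neg * phase1_neg
                + b_pos * phase2_pos + b_neg * phase2_neg)


    # Invented review counts, used only to compare two cost structures.
    counts = dict(phase1_pos=100, phase1_neg=300, phase2_pos=20, phase2_neg=80)

    uniform_cost = total_cost((1, 1, 1, 1), **counts)
    skewed_cost = total_cost((25, 5, 5, 1), **counts)

With a uniform structure the cost is simply the number of documents reviewed, while a skewed structure makes first-phase review dominate the total.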
Alternatively, you can create this graph from the command line:

.. code-block:: bash

    python -m tarexp.helper.plotting \
        --runs GPRO=./my_tar_exp/GPRO.61b1f31a0a29de634939db77c0dde383/ \
               GOBIT=./my_tar_exp/GOBIT.ae86e0b37809cb139dfa1f4cf914fb9b/ \
        --cost_structures 1-1-1-1 25-5-5-1 --y_thousands --with_hatches


.. image:: ../../examples/cost-dynamic-2.png
    :alt: Cost Dynamic Graph with 2 Runs and 2 Cost Structures
    :class: with-shadow
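
On the command line, cost structures are written as dash-separated strings such as ``25-5-5-1``, whereas the Python API takes tuples such as ``(25, 5, 5, 1)``. The sketch below shows one way to convert between the two forms. ``parse_cost_structure`` is a hypothetical helper written for illustration and is not necessarily how ``tarexp.helper.plotting`` parses its arguments.

.. code-block:: python

    def parse_cost_structure(text):
        """Turn a string like '25-5-5-1' into a 4-tuple of unit costs."""
        parts = text.split('-')
        if len(parts) != 4:
            raise ValueError("expected four dash-separated costs, got %r" % text)
        return tuple(float(part) for part in parts)


    cli_values = ["1-1-1-1", "25-5-5-1"]
    cost_structures = [parse_cost_structure(value) for value in cli_values]

The resulting tuples have the same form as the ``cost_structures`` argument used in the Python example.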