Evaluator

class composer.Evaluator(*, label, dataloader, metric_names=None, subset_num_batches=None, eval_interval=None, device_eval_microbatch_size=None)

A wrapper that pairs a dataloader with the metrics that apply to its specific dataset.

For example, a CrossEntropyLoss metric for NLP models.

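>>> from composer import Evaluator, Trainer
>>> # model, optimizer, train_dataloader, and eval_dataloader are assumed to be defined already.
>>> eval_evaluator = Evaluator(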
...     label='myEvaluator',
...     dataloader=eval_dataloader,
...     metric_names=['MulticlassAccuracy']
... )
>>> trainer = Trainer(
...     model=model,
...     train_dataloader=train_dataloader,
...     eval_dataloader=eval_evaluator,
...     optimizers=optimizer,
...     max_duration='1ep',
... )

Parameters

label (str) – Name of the Evaluator.
dataloader (DataSpec | Iterable | dict[str, Any]) – Iterable that yields batches, a DataSpec for evaluation, or a dict of DataSpec kwargs.
metric_names (list[str], optional) –
The metric names to compute. Each entry may be a regex string (e.g. “MulticlassAccuracy”, “f1” to match “BinaryF1Score”, or “Top-.” to match “Top-1”, “Top-2”, etc.). Each regex string is matched against the keys of the dictionary returned by model.get_metrics(), and every matching metric is evaluated.

Defaults to None, which means that all metrics returned by model.get_metrics() are used.
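The matching behaviour described for metric_names can be sketched in plain Python. This is a minimal illustration, not Composer's implementation: the helper select_metric_names and the list of available keys are hypothetical, and the matching rule (case-insensitive search anywhere in the name) is an assumption inferred from the “f1” matching “BinaryF1Score” example.

```python
import re


def select_metric_names(available, metric_names=None):
    """Return the names in `available` matched by any pattern in `metric_names`.

    Assumption: each pattern is searched case-insensitively anywhere in the name.
    """
    if metric_names is None:
        # Nothing specified: every available metric is used.
        return list(available)
    return [
        name for name in available
        if any(re.search(pattern, name, re.IGNORECASE) for pattern in metric_names)
    ]


# Hypothetical keys, standing in for the dictionary returned by model.get_metrics().
available = ['MulticlassAccuracy', 'BinaryF1Score', 'Top-1', 'Top-2', 'CrossEntropyLoss']

print(select_metric_names(available, ['f1']))     # ['BinaryF1Score']
print(select_metric_names(available, ['Top-.']))  # ['Top-1', 'Top-2']
print(select_metric_names(available))             # all five names
```

Because the entries are regexes, a short pattern can select more metrics than intended; a full name such as “MulticlassAccuracy” is the least ambiguous choice.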
subset_num_batches (int, optional) – The maximum number of batches to use for each evaluation. Defaults to None, which means that the eval_subset_num_batches parameter from the Trainer will be used. Set to -1 to evaluate the entire dataloader.
eval_interval (Time | int | str | (State, Event) -> bool, optional) –
An integer (interpreted as a number of epochs), a time string (e.g. 1ep or 10ba), a Time object, or a callable. Defaults to None, which means that the eval_interval parameter from the Trainer will be used.

If an integer (in epochs), Time string, or Time instance, the evaluator will be run with this frequency. Time strings or Time instances must have units of TimeUnit.BATCH or TimeUnit.EPOCH.

Set to 0 to disable evaluation.

If a callable, it should take two arguments (State, Event) and return a bool representing whether the evaluator should be invoked. The event will be either Event.BATCH_END or Event.EPOCH_END.

When eval_interval is specified, the evaluator(s) also run at Event.FIT_END if the interval does not evenly divide the training duration.
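As a worked example of the interval rules above, the following sketch lists the points at which evaluation would run. It is a simplified model under stated assumptions, not Composer's scheduling code: the helper eval_points is hypothetical, and both the interval and the training duration are expressed in batches only.

```python
def eval_points(total_batches, interval_batches):
    """Batch counts at which evaluation runs for a fit of `total_batches` batches.

    Simplified model: interval and duration are both measured in batches.
    """
    if interval_batches == 0:
        # An interval of 0 disables evaluation entirely.
        return []
    points = list(range(interval_batches, total_batches + 1, interval_batches))
    if total_batches % interval_batches != 0:
        # The interval does not evenly divide the duration,
        # so one more evaluation runs at the end of training.
        points.append(total_batches)
    return points


print(eval_points(10, 4))  # [4, 8, 10]
print(eval_points(12, 4))  # [4, 8, 12]
print(eval_points(10, 0))  # []
```

With a 10-batch run and an interval of 4 batches, the final evaluation at batch 10 corresponds to the extra run at Event.FIT_END.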
device_eval_microbatch_size (str | int | float, optional) – The number of samples in each microbatch during evaluation. If set to 'auto', device_eval_microbatch_size is decreased dynamically whenever a microbatch is too large to fit in GPU memory. If None, device_eval_microbatch_size is set to the per-rank batch size. (default: None)
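To illustrate what a microbatch size means in practice, the sketch below splits one per-rank batch into microbatches. It is only an illustration: split_into_microbatches is a hypothetical helper, the batch is modeled as a plain list rather than tensors, and the 'auto' mode is not modeled.

```python
def split_into_microbatches(batch, microbatch_size=None):
    """Split a per-rank batch (a list of samples) into consecutive microbatches."""
    if not batch:
        return []
    if microbatch_size is None:
        # None: the microbatch size equals the per-rank batch size,
        # so the whole batch is processed in one piece.
        microbatch_size = len(batch)
    return [batch[i:i + microbatch_size] for i in range(0, len(batch), microbatch_size)]


per_rank_batch = list(range(10))  # ten hypothetical samples

print(split_into_microbatches(per_rank_batch, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
print(split_into_microbatches(per_rank_batch))     # one microbatch holding all ten samples
```

A smaller microbatch size lowers peak memory use per forward pass at the cost of more passes over each batch.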