Evaluator
- class composer.Evaluator(*, label, dataloader, metric_names=None, subset_num_batches=None, eval_interval=None, device_eval_microbatch_size=None)[source]
A wrapper for a dataloader to include metrics that apply to a specific dataset. For example, a CrossEntropyLoss metric for NLP models.

>>> eval_evaluator = Evaluator(
...     label='myEvaluator',
...     dataloader=eval_dataloader,
...     metric_names=['MulticlassAccuracy']
... )
>>> trainer = Trainer(
...     model=model,
...     train_dataloader=train_dataloader,
...     eval_dataloader=eval_evaluator,
...     optimizers=optimizer,
...     max_duration='1ep',
... )
- Parameters
label (str) – Name of the Evaluator.

dataloader (DataSpec | Iterable | dict[str, Any]) – Iterable that yields batches, a DataSpec for evaluation, or a dict of DataSpec kwargs (see the DataSpec sketch after this list).

metric_names – The list of metric names to compute. Each value in this list can be a regex string (e.g. 'MulticlassAccuracy', 'f1' for 'BinaryF1Score', 'Top-.' for 'Top-1', 'Top-2', etc.). Each regex string will be matched against the keys of the dictionary returned by model.get_metrics(), and all matching metrics will be evaluated (see the regex sketch after this list). If left blank (the default), all metrics returned by model.get_metrics() will be used.

subset_num_batches (int, optional) – The maximum number of batches to use for each evaluation. Defaults to None, which means the eval_subset_num_batches parameter from the Trainer will be used. Set to -1 to evaluate the entire dataloader.

eval_interval (Time | int | str | (State, Event) -> bool, optional) – An integer (interpreted as epochs), a string (e.g. '1ep' or '10ba'), a Time object, or a callable. Defaults to None, which means the eval_interval parameter from the Trainer will be used.

  If an integer (in epochs), a Time string, or a Time instance, the evaluator will run with this frequency. Time strings and Time instances must have units of TimeUnit.BATCH or TimeUnit.EPOCH. Set to 0 to disable evaluation.

  If a callable, it should take two arguments (State, Event) and return a bool indicating whether the evaluator should be invoked. The event will be either Event.BATCH_END or Event.EPOCH_END (see the callable sketch after this list).

  When eval_interval is specified, the evaluator(s) are also run at Event.FIT_END if the interval does not evenly divide the training duration.

device_eval_microbatch_size (str | int | float, optional) – The number of samples to use for each microbatch when evaluating. If set to 'auto', device_eval_microbatch_size is decreased dynamically when a microbatch is too large for the GPU. If None, device_eval_microbatch_size is set to the per-rank batch size (see the smoke-test sketch after this list). (default: None)
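As a minimal sketch of the dataloader options, assuming an existing PyTorch DataLoader named eval_dataloader whose batches are (inputs, targets) tuples (the sample-counting hook below is optional and purely illustrative):

>>> from composer.core import DataSpec
>>> eval_spec = DataSpec(
...     dataloader=eval_dataloader,
...     # Optional hook telling Composer how to count samples in a batch.
...     get_num_samples_in_batch=lambda batch: batch[0].shape[0],
... )
>>> evaluator = Evaluator(label='specEvaluator', dataloader=eval_spec)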
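To illustrate the regex matching of metric_names, suppose (hypothetically; actual keys depend on the model) that model.get_metrics() returns a dict with keys 'MulticlassAccuracy', 'Top-1', and 'Top-5'. The pattern 'Top-.' then selects the two Top-k metrics and skips 'MulticlassAccuracy':

>>> topk_evaluator = Evaluator(
...     label='topkEvaluator',
...     dataloader=eval_dataloader,
...     # 'Top-.' is a regex: it matches 'Top-1' and 'Top-5',
...     # but not 'MulticlassAccuracy'.
...     metric_names=['Top-.'],
... )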
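For the callable form of eval_interval, here is a sketch that evaluates at batches 1, 2, 4, 8, and so on (the power-of-two schedule is arbitrary and only for illustration):

>>> from composer.core import Event, State
>>> def power_of_two_batches(state: State, event: Event) -> bool:
...     # Only consider batch boundaries; ignore epoch-end events.
...     if event != Event.BATCH_END:
...         return False
...     batch = state.timestamp.batch.value
...     # True exactly when the batch count is a power of two.
...     return batch > 0 and (batch & (batch - 1)) == 0
...
>>> evaluator = Evaluator(
...     label='sparseEvaluator',
...     dataloader=eval_dataloader,
...     eval_interval=power_of_two_batches,
... )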
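Finally, a quick smoke-test configuration combining subset_num_batches and device_eval_microbatch_size (the cap of 10 batches is arbitrary):

>>> smoke_evaluator = Evaluator(
...     label='smokeEvaluator',
...     dataloader=eval_dataloader,
...     # Stop after 10 batches per evaluation pass.
...     subset_num_batches=10,
...     # 'auto' shrinks the microbatch size if it overflows GPU memory.
...     device_eval_microbatch_size='auto',
... )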