Evaluation#
To track training progress, validation datasets can be provided to the Composer Trainer through the eval_dataloader parameter. The trainer will compute evaluation metrics on the evaluation dataset at a frequency specified by the Trainer parameter eval_interval.
from composer import Trainer
trainer = Trainer(
    ...,
    eval_dataloader=my_eval_dataloader,
    eval_interval="1ep",  # Default is every epoch
)
The metrics should be provided by ComposerModel.get_metrics().
For more information, see the "Metrics" section in ComposerModel.
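To make this concrete, here is a minimal sketch of a model that provides metrics this way. It follows the pseudocode below (a get_metrics(train=...) hook returning an object with update/compute); the metric choice, num_classes, and the exact signature and return type are assumptions that can differ between Composer versions, so check the ComposerModel reference for yours.

import torch
import torchmetrics
from composer.models import ComposerModel

class MyClassifier(ComposerModel):
    """Minimal sketch of a model that exposes evaluation metrics.

    The metric and num_classes are illustrative assumptions; the get_metrics
    signature and return type may differ across Composer versions.
    """

    def __init__(self, module: torch.nn.Module, num_classes: int = 10):
        super().__init__()
        self.module = module
        # A MetricCollection supports .update() and .compute(), matching the
        # pseudocode shown below.
        self.val_metrics = torchmetrics.MetricCollection(
            [torchmetrics.classification.MulticlassAccuracy(num_classes=num_classes)]
        )

    def forward(self, batch):
        inputs, _ = batch
        return self.module(inputs)

    def loss(self, outputs, batch):
        _, targets = batch
        return torch.nn.functional.cross_entropy(outputs, targets)

    def get_metrics(self, train=False):
        # The trainer retrieves evaluation metrics from here; only the
        # evaluation metrics are defined in this sketch.
        return self.val_metrics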
To provide a deeper intuition, here's pseudocode for the evaluation logic that occurs every eval_interval:
metrics = model.get_metrics(train=False)

for batch in eval_dataloader:
    outputs, targets = model.eval_forward(batch)
    metrics.update(outputs, targets)  # implements the torchmetrics interface

metrics.compute()
1. The trainer iterates over eval_dataloader and passes each batch to the model's ComposerModel.eval_forward() method.
2. The outputs of model.eval_forward are used to update the metrics (a torchmetrics.Metric returned by ComposerModel.get_metrics).
3. Finally, the metrics over the whole validation dataset are computed.
Note that the tuple returned by ComposerModel.eval_forward() provides the positional arguments to metric.update. Please keep this in mind when using custom models and/or metrics.
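To illustrate the unpacking (a standalone sketch; the tensor shapes and the torchmetrics metric are assumptions, not Composer APIs):

import torch
import torchmetrics

# Stand-ins for what eval_forward(batch) might return: (outputs, targets).
logits = torch.randn(8, 3)           # model outputs for 8 examples, 3 classes
targets = torch.randint(0, 3, (8,))  # ground-truth class labels

metric = torchmetrics.classification.MulticlassAccuracy(num_classes=3)

# The tuple is unpacked positionally, i.e. metric.update(*eval_forward(batch))
# reduces to the call below, so the element order must match what the metric expects.
metric.update(logits, targets)
print(metric.compute())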
Multiple Datasets#
If there are multiple validation datasets that may have different metrics,
use Evaluator
to specify each pair of dataloader and metrics.
This class is just a container for a few attributes:
- label: a user-specified name for the evaluator.
- dataloader: a PyTorch DataLoader or our DataSpec. See DataLoaders for more details.
- metric_names: a list of the names of the metrics to track.
For example, GLUE tasks for language models can be specified as follows:
from composer.core import Evaluator
from composer.models.nlp_metrics import BinaryF1Score
glue_mrpc_task = Evaluator(
    label='glue_mrpc',
    dataloader=mrpc_dataloader,
    metric_names=['BinaryF1Score', 'Accuracy']
)

glue_mnli_task = Evaluator(
    label='glue_mnli',
    dataloader=mnli_dataloader,
    metric_names=['Accuracy']
)

trainer = Trainer(
    ...,
    eval_dataloader=[glue_mrpc_task, glue_mnli_task],
    ...
)
Note that metric_names must be a subset of the metrics provided by the model in ComposerModel.get_metrics().
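As a hedged illustration of that constraint (the mapping below is hypothetical, and how a model actually exposes named metrics depends on the Composer version), the names requested by an Evaluator have to come from the names the model exposes:

import torchmetrics

# Hypothetical evaluation metrics a model might expose, keyed by name.
model_eval_metrics = {
    'BinaryF1Score': torchmetrics.classification.BinaryF1Score(),
    'Accuracy': torchmetrics.classification.BinaryAccuracy(),
}

# Every entry in an Evaluator's metric_names must appear among those names.
assert {'BinaryF1Score', 'Accuracy'} <= set(model_eval_metrics)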