📊 Evaluation#

To track training progress, validation datasets can be provided to the Composer Trainer through the eval_dataloader parameter. The trainer will compute evaluation metrics on the validation data at a frequency specified by the Trainer parameter eval_interval.

from composer import Trainer

trainer = Trainer(
    ...,
    eval_interval="1ep",  # Default is every epoch
)


The metrics should be provided by ComposerModel.metrics().
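To make the contract concrete, here is a minimal, self-contained sketch of how a trainer consumes a model's metrics during evaluation. `MyModel` and `SimpleAccuracy` are illustrative stand-ins (not the Composer API): the metric follows the torchmetrics-style pattern of accumulating with update() and aggregating with compute().

```python
class SimpleAccuracy:
    """Torchmetrics-style metric stand-in: update() accumulates, compute() aggregates."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, preds, targets):
        self.correct += sum(p == t for p, t in zip(preds, targets))
        self.total += len(targets)

    def compute(self):
        return self.correct / self.total


class MyModel:
    """Stand-in for a ComposerModel that exposes its evaluation metrics."""

    def metrics(self, train: bool = False):
        return SimpleAccuracy()


# During evaluation, the trainer updates the metric batch by batch ...
metric = MyModel().metrics(train=False)
metric.update(preds=[1, 0, 1, 1], targets=[1, 0, 0, 1])
metric.update(preds=[0, 1], targets=[0, 1])

# ... and computes the aggregate value at each eval_interval boundary.
print(metric.compute())  # 5 correct out of 6 predictions
```

The key point is that the model owns the metric definitions, while the trainer owns the update/compute loop.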

Multiple Datasets#

If there are multiple validation datasets that may have different metrics, use Evaluator to specify each pairing of dataloader and metrics. This class is a simple container holding a label, a dataloader, and the metrics to compute on that dataloader.

For example, the GLUE tasks for language models can be specified as in the following example:

from composer.core import Evaluator
from torchmetrics import Accuracy, MetricCollection
from composer.models.nlp_metrics import BinaryF1Score

glue_mrpc_task = Evaluator(
    label='glue_mrpc',
    dataloader=mrpc_dataloader,
    metrics=MetricCollection([BinaryF1Score(), Accuracy()])
)

glue_mnli_task = Evaluator(
    label='glue_mnli',
    dataloader=mnli_dataloader,
    metrics=Accuracy()
)

trainer = Trainer(
    ...,
    eval_dataloader=[glue_mrpc_task, glue_mnli_task],
)


In this case, the metrics from ComposerModel.metrics() will be ignored since they are explicitly provided above.
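The label-based separation above can be sketched with a minimal, self-contained example. The `Evaluator` stand-in and the evaluation loop here are illustrative (not Composer internals); they show why each evaluator's metrics stay distinct: results are keyed by the evaluator's label.

```python
class Evaluator:
    """Illustrative stand-in: a container for a label, dataloader, and metrics."""

    def __init__(self, label, dataloader, metrics):
        self.label = label
        self.dataloader = dataloader
        self.metrics = metrics


def run_eval(evaluators):
    """Compute a toy accuracy per evaluator, keyed by its label."""
    results = {}
    for ev in evaluators:
        correct = total = 0
        for preds, targets in ev.dataloader:
            correct += sum(p == t for p, t in zip(preds, targets))
            total += len(targets)
        # Metrics are reported under the evaluator's label, e.g. "glue_mrpc/Accuracy",
        # so results from different datasets never collide.
        results[f"{ev.label}/Accuracy"] = correct / total
    return results


# Each toy "dataloader" yields (preds, targets) batches.
mrpc = Evaluator("glue_mrpc", [([1, 0], [1, 1])], metrics="Accuracy")
mnli = Evaluator("glue_mnli", [([0, 1], [0, 1])], metrics="Accuracy")

results = run_eval([mrpc, mnli])
print(results)  # {'glue_mrpc/Accuracy': 0.5, 'glue_mnli/Accuracy': 1.0}
```

Because each dataloader is paired with its own metrics, adding another validation dataset is just a matter of constructing one more evaluator.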