InContextLearningCodeEvalAccuracy
- class composer.metrics.InContextLearningCodeEvalAccuracy(dist_sync_on_step=False)
Computes accuracy for In-context learning (ICL) code evaluation tasks.
ICL code eval tasks consist of some number of example code eval tasks (referred to as the "context"), followed by a test task where the model must complete the code; we term this code completion a "continuation".
In each case, the model constructs a given number of continuations (termed pass@K for K continuations), and each continuation is run against a set of test cases. The model is considered correct if at least one of the proposed continuations passes all the test cases.
Runs on AWS Lambdas by default.
- Adds metric state variables:
correct (float): The number of instances where the predictions passed all the test cases.
total (float): The number of total instances that were predicted.
- Parameters
dist_sync_on_step (bool, optional) – Synchronize metric state across processes at each forward() before returning the value at the step. Default: False.
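The correct/total state described above can be sketched as a minimal pure-Python accumulator; this is an illustration of the bookkeeping only, not composer's implementation (the class name PassAtKAccuracy and its method signatures are hypothetical):

```python
class PassAtKAccuracy:
    """Hypothetical sketch of the correct/total state described above."""

    def __init__(self):
        self.correct = 0.0
        self.total = 0.0

    def update(self, any_generation_passed):
        # One boolean per evaluated instance: True if at least one of its
        # K proposed continuations passed all the test cases.
        self.correct += sum(any_generation_passed)
        self.total += len(any_generation_passed)

    def compute(self):
        # Fraction of instances with at least one passing continuation.
        return self.correct / self.total
```

In the real metric, these states would be registered with the torchmetrics machinery so that `dist_sync_on_step` can sum them across processes.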
- estimator(n, c, k)
Computes the pass@k metric.
Given the number of generated samples, n, the number of correct samples, c, and the k of interest, this function calculates pass@k as 1 - comb(n - c, k) / comb(n, k), per the definition of pass@k in the HumanEval paper (https://arxiv.org/abs/2107.03374) and its associated implementation: https://github.com/openai/human-eval.
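The formula above can be written as a short standalone function (the name pass_at_k is illustrative, not composer's API):

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    # pass@k = 1 - C(n - c, k) / C(n, k): one minus the probability
    # that k samples drawn without replacement are all incorrect.
    if n - c < k:
        # Fewer than k incorrect samples exist, so any draw of k
        # samples must include at least one correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with n = 2 generations of which c = 1 is correct, pass@1 is 0.5: a single draw picks the correct sample half the time.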
- update(batch, outputs, labels)
Updates the pass@k accuracy of code generation.
Given a batch of prompts, test cases, and code generations, evaluates the code generations against the test cases and adds the batch's pass@k accuracy to the running totals.
- Parameters
batch (Dict[str, Any]) – A batch of data produced by the InContextLearningCodeEvalDataset, containing the prompts, test cases, and entry points. This must be a dictionary with the following entries:
{ 'prompts': List[str], 'test_inputs': List[List[str]], 'test_outputs': List[List[str]], 'entry_points': List[str], 'languages': List[str], 'generation_kwargs': Dict[str, Any] }
outputs (List[str]) – A list of code generations in the format of HF generate with beam search, which means the list is ordered as [prompt 1 gen 1, prompt 1 gen 2, prompt 2 gen 1, prompt 2 gen 2].
labels (List[str]) – A list of the correct code generations, for compatibility with existing HF generate functionalities. This is not used.
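The flattened ordering of outputs (all generations for prompt 1, then all for prompt 2, and so on) can be regrouped per prompt with a small helper; group_generations is a hypothetical illustration of that layout, not part of the metric's API:

```python
def group_generations(outputs, generations_per_sample):
    # Regroup the flat list [prompt 1 gen 1, prompt 1 gen 2,
    # prompt 2 gen 1, prompt 2 gen 2, ...] into one sublist per prompt.
    k = generations_per_sample
    assert len(outputs) % k == 0, 'outputs must hold k generations per prompt'
    return [outputs[i:i + k] for i in range(0, len(outputs), k)]
```

Each resulting sublist holds the k candidate continuations for one prompt, which is the unit scored by pass@k.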