EvalClient

class composer.utils.EvalClient

Abstract class for implementing eval clients, such as LambdaEvalClient.

close()

Close the eval client.

invoke(payload)

Send a batch of dictionary payloads to the client.

For code generation, the payload is a list of lists of lists of JSON objects. The outer list is grouped by prompt. For each prompt, the model generates a set number of possible continuations, which we term generation beams, so the list for each prompt is grouped by generation beam. In the final tier of nesting, the JSON objects for each generation are grouped by test case. The evaluation client is agnostic to the list structure: it iterates over the JSON payload for each test case and converts each one to a boolean independently, preserving the list shape. The JSON for each test case contains the following attributes:

{
    'code': <code to be evaluated>,
    'input': <test input>,
    'output': <test output>,
    'entry_point': <entry point>,
    'language': <language>,
}
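As a rough illustration of the nesting described above, the following sketch builds a payload for one prompt with two generation beams and two test cases per beam. The helper `make_test_case`, the entry point `add`, and the string encoding of the inputs and outputs are assumptions made for this example; only the five attribute names come from the description above.

```python
from typing import Dict, List


def make_test_case(code: str, test_input: str, test_output: str) -> Dict[str, str]:
    """Build one test-case JSON (hypothetical helper, not part of composer)."""
    return {
        'code': code,
        'input': test_input,
        'output': test_output,
        'entry_point': 'add',   # assumed entry point for this example
        'language': 'python',
    }


# Two made-up generations (beams) for a single prompt.
generations = [
    'def add(a, b):\n    return a + b',
    'def add(a, b):\n    return a - b',
]

# Made-up test cases as (input, expected output) strings.
test_cases = [('(1, 2)', '3'), ('(0, 0)', '0')]

# Outer list: prompts. Middle list: generation beams. Inner list: test cases.
payload: List[List[List[Dict[str, str]]]] = [
    [
        [make_test_case(code, test_in, test_out) for test_in, test_out in test_cases]
        for code in generations
    ]
]

shape = (len(payload), len(payload[0]), len(payload[0][0]))
print(shape)
```

Each innermost dictionary carries its own copy of the generated code, so every test case can be evaluated independently of its position in the list.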

The payload is formatted as [[[request]]] so that the client can batch requests. The outermost list is grouped by prompt, the middle list holds the beam generations of a given prompt, and the innermost list holds the test cases for each generation.

Parameters

payload – the contents of the batched HTTPS request to the client, organized by prompt, beam generation, and test case.

Returns

A nested list of booleans with the same shape as the payload, indicating whether each test case passed or failed.
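The shape-preserving behavior of invoke can be sketched with a toy client that runs each test case locally. This is only an illustration under stated assumptions: `EvalClientSketch` is a stand-in base class rather than the composer class itself, `LocalToyEvalClient` is hypothetical, and the encoding of `input` as a Python tuple literal and `output` as a string is invented for the example. Running generated code with `exec` in the current process is unsafe outside a sandbox.

```python
import ast
from abc import ABC
from typing import Dict, List

TestCase = Dict[str, str]


class EvalClientSketch(ABC):
    """Stand-in for the interface described above (not the composer class)."""

    def invoke(self, payload: List[List[List[TestCase]]]) -> List[List[List[bool]]]:
        raise NotImplementedError

    def close(self) -> None:
        pass


class LocalToyEvalClient(EvalClientSketch):
    """Hypothetical client that evaluates Python test cases in-process."""

    def _run_test_case(self, test_case: TestCase) -> bool:
        namespace: dict = {}
        try:
            # Define the generated function, then call the entry point.
            exec(test_case['code'], namespace)
            func = namespace[test_case['entry_point']]
            # Assumption: 'input' is a tuple literal of positional arguments.
            args = ast.literal_eval(test_case['input'])
            return str(func(*args)) == test_case['output']
        except Exception:
            # Any error while defining or calling the code counts as a failure.
            return False

    def invoke(self, payload: List[List[List[TestCase]]]) -> List[List[List[bool]]]:
        # Convert each test case to a boolean independently, keeping the nesting.
        return [
            [[self._run_test_case(tc) for tc in beam] for beam in prompt]
            for prompt in payload
        ]


def case(code: str, test_input: str, test_output: str) -> TestCase:
    return {'code': code, 'input': test_input, 'output': test_output,
            'entry_point': 'add', 'language': 'python'}


good = 'def add(a, b):\n    return a + b'
bad = 'def add(a, b):\n    return a - b'
payload = [[
    [case(good, '(1, 2)', '3'), case(good, '(0, 0)', '0')],
    [case(bad, '(1, 2)', '3'), case(bad, '(0, 0)', '0')],
]]

client = LocalToyEvalClient()
results = client.invoke(payload)
client.close()
print(results)
```

The result has one boolean per test case and mirrors the prompt, beam, and test-case nesting of the input, so a caller can aggregate pass rates per beam or per prompt without tracking indices separately.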