CodeLLM Evaluator
Easily evaluate CodeLLMs with fast inference settings
Overview
CodeLLM Evaluator provides fast and efficient evaluation on code generation tasks. Inspired by lm-evaluation-harness and bigcode-eval-harness, we designed the framework to support multiple use cases and to make it easy to add new metrics and custom tasks.
Features:
Implementations of the HumanEval and MBPP benchmarks for code LLMs.
Support for models loaded via transformers and DeepSpeed.
Support for evaluating adapters (e.g. LoRA) supported by Hugging Face’s PEFT library.
Support for distributed inference with native transformers, or fast inference with the vLLM backend.
Easy support for custom prompts, tasks and metrics.
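HumanEval and MBPP results are commonly reported as pass@k. This README does not say which estimator code-eval uses, so the following is only a sketch of the widely used unbiased estimator from the Codex paper (Chen et al., 2021): with n samples per problem, of which c pass the unit tests, pass@k = 1 - C(n - c, k) / C(n, k). The function names `pass_at_k` and `mean_pass_at_k` are hypothetical and not part of code-eval.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of generated samples, c: number of samples that passed, k: budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # C(n - c, k) / C(n, k) is the probability that a random size-k subset of the
    # n samples contains no passing sample; math.comb returns 0 when k > n - c.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over a benchmark, given one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With three samples per problem and per-problem pass counts of 1, 0 and 3, `mean_pass_at_k([(3, 1), (3, 0), (3, 3)], k=1)` averages 1/3, 0 and 1. Using the combinatorial form instead of simply checking the first k samples reduces variance, because every size-k subset of the n samples is accounted for.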
Setup
Install the code-eval package from the GitHub repository via pip:
$ git clone https://github.com/FSoft-AI4Code/code-llm-evaluator.git
$ cd code-llm-evaluator
$ pip install -e .
Quick-start
To evaluate a supported task in Python, load code_eval.Evaluator() to generate completions and compute evaluation metrics in a single run.
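from code_eval import Evaluator
from code_eval.task import HumanEval

# Load the HumanEval benchmark and wrap it in an evaluator
task = HumanEval()
evaluator = Evaluator(task=task)

# Sample 3 completions per problem at temperature 0.9, with batch size 16
output = evaluator.generate(num_return_sequences=3,
                            batch_size=16,
                            temperature=0.9)
# Score the generated completions
result = evaluator.evaluate(output)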
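evaluator.evaluate(output) scores the generations, but this README does not describe how. HumanEval-style harnesses typically concatenate the prompt, the model's completion and the problem's unit tests, run the resulting program in a separate process with a timeout, and count the sample as correct if it exits cleanly. The sketch below illustrates that idea, assuming HumanEval's data layout (a `check(candidate)` test function plus an `entry_point` function name). `passes_tests` is a hypothetical helper, not code-eval's API, and it does not sandbox the generated code, so run it only in an isolated environment.

```python
import os
import subprocess
import sys
import tempfile


def passes_tests(prompt: str, completion: str, test: str,
                 entry_point: str, timeout: float = 5.0) -> bool:
    """Return True if prompt + completion passes the HumanEval-style `check` function."""
    # Assemble one program: signature/docstring, generated body, tests, then the call.
    program = f"{prompt}{completion}\n\n{test}\n\ncheck({entry_point})\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "candidate.py")
        with open(path, "w", encoding="utf-8") as f:
            f.write(program)
        try:
            # Run in a fresh interpreter; subprocess.run kills the child on timeout.
            result = subprocess.run([sys.executable, path],
                                    capture_output=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return False
    # A failed assertion or any uncaught exception yields a non-zero exit code.
    return result.returncode == 0
```

Counting, for each problem, how many of the num_return_sequences samples pass this check gives the per-problem pass count that pass@k estimators take as input.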
CLI Usage
Inference with Transformers
Load a model and generate answers using native transformers by passing a local model path or a Hugging Face Hub name to --model_name. transformers is the default backend, but you can also select it explicitly with --backend hf:
$ code-eval --model_name microsoft/phi-1 \
--task humaneval \
--batch_size 8 \
--backend hf
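When sweeping several models, tasks or backends, it can be convenient to drive the CLI from Python. A minimal sketch, assuming the code-eval entry point installed above is on your PATH; it uses only the flags documented in this README, and `build_code_eval_cmd` is a hypothetical helper name. Passing an argument list to subprocess also avoids shell line-continuation mistakes such as a stray trailing backslash.

```python
from typing import Optional


def build_code_eval_cmd(model_name: str, task: str, batch_size: int,
                        backend: str = "hf",
                        peft_model: Optional[str] = None) -> list[str]:
    """Assemble a code-eval command line from the flags documented in this README."""
    cmd = ["code-eval", "--model_name", model_name]
    if peft_model is not None:
        # LoRA adapters are applied on top of the base model given by --model_name.
        cmd += ["--peft_model", peft_model]
    cmd += ["--task", task,
            "--batch_size", str(batch_size),
            "--backend", backend]
    return cmd
```

For example, `subprocess.run(build_code_eval_cmd("microsoft/phi-1", "humaneval", 8, backend="vllm"), check=True)` would run the HumanEval evaluation of microsoft/phi-1 on the vLLM backend and raise if the command fails.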
Tip
Load LoRA adapters by adding the --peft_model argument. --model_name must point to the full base model.
$ code-eval --model_name microsoft/phi-1 \
--peft_model <adapters-name> \
--task humaneval \
--batch_size 8 \
--backend hf
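The requirement that --model_name point to the full base model follows from how LoRA (Hu et al., 2021) works: the base weights stay frozen and the adapter learns only a low-rank update. For a frozen weight matrix W_0, the adapted layer computes:

```latex
h = W_0 x + \Delta W x = W_0 x + \frac{\alpha}{r} B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\;
B \in \mathbb{R}^{d \times r},\;
A \in \mathbb{R}^{r \times k},\;
r \ll \min(d, k)
```

An adapter checkpoint therefore essentially holds only A and B, so h cannot be computed without W_0 from the base model. For example, with d = k = 2048 and r = 8, the adapter stores r(d + k) = 8 × 4096 = 32,768 parameters for that layer, versus d·k = 4,194,304 in W_0, a 128× reduction.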
Inference with vLLM engine
We recommend using the vLLM engine for fast inference. vLLM supports tensor parallelism, data parallelism, or a combination of both; refer to the vLLM documentation for more details.
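Tensor parallelism splits each layer's weight matrices across GPUs so that one model spans several devices, while data parallelism runs a full model replica per GPU and splits the prompts between them. The README does not describe how code-eval or vLLM distributes work, so the sketch below only illustrates the data-parallel idea: hypothetical helpers `shard` and `run_data_parallel` assign prompts round-robin to replicas and restore the original order afterwards.

```python
from typing import Callable, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def shard(items: list[T], num_workers: int) -> list[list[tuple[int, T]]]:
    """Assign items round-robin to workers, remembering each item's original index."""
    if num_workers < 1:
        raise ValueError("num_workers must be >= 1")
    shards: list[list[tuple[int, T]]] = [[] for _ in range(num_workers)]
    for i, item in enumerate(items):
        shards[i % num_workers].append((i, item))
    return shards


def run_data_parallel(items: list[T], num_workers: int,
                      worker_fn: Callable[[list[T]], list[R]]) -> list[R]:
    """Run worker_fn on each shard and merge the results back into input order.

    Shards are processed sequentially here; in a real setup each shard would be
    sent to a separate model replica (e.g. one process per GPU).
    """
    merged: dict[int, R] = {}
    for worker_shard in shard(items, num_workers):
        indices = [i for i, _ in worker_shard]
        outputs = worker_fn([item for _, item in worker_shard])
        for i, out in zip(indices, outputs):
            merged[i] = out
    return [merged[i] for i in range(len(items))]
```

Round-robin assignment keeps shard sizes within one item of each other; restoring the original order matters because each completion must be scored against the tests of its own problem.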
To use code-eval with the vLLM engine, first install vLLM by following its installation documentation.
Note
You can install vLLM using pip:
$ pip install vllm
For a model supported by vLLM (see the vLLM list of supported models), run:
$ code-eval --model_name microsoft/phi-1 \
--task humaneval \
--batch_size 8 \
--backend vllm
Tip
LoRA adapters can be loaded with the same syntax:
$ code-eval --model_name microsoft/phi-1 \
--peft_model <adapters-name> \
--task humaneval \
--batch_size 8 \
--backend vllm
Cite as
@misc{code-eval,
author = {Dung Nguyen Manh},
title = {A framework for easily evaluation code generation model},
month = 3,
year = 2024,
publisher = {github},
version = {v0.0.1},
url = {https://github.com/FSoft-AI4Code/code-llm-evaluator}
}