Models are ranked by their function-level (Unit Test) score. BLEU, Edit Dist., and Exact Match are text-level scores; Key-value Exact and Key-value Wildcard are YAML-aware scores; Unit Test is the function-level score. A size of `?` means the parameter count is undisclosed.

| Rank | Model | Size | Open Source | BLEU | Edit Dist. | Exact Match | Key-value Exact | Key-value Wildcard | Unit Test |
|---|:---|---|---|---|---|---|---|---|---|
| 1 | GPT-4 Turbo | ? | N | 0.649 | 0.551 | 0.099 | 0.208 | 0.667 | 0.561 |
| 2 | GPT-4 | ? | N | 0.629 | 0.538 | 0.092 | 0.198 | 0.641 | 0.515 |
| 3 | GPT-3.5 | ? | N | 0.612 | 0.511 | 0.075 | 0.154 | 0.601 | 0.412 |
| 4 | PaLM-2-bison | ? | N | 0.537 | 0.432 | 0.040 | 0.092 | 0.506 | 0.322 |
| 5 | Llama-2-70b-chat | 70B | Y | 0.355 | 0.305 | 0.000 | 0.020 | 0.276 | 0.085 |
| 6 | Llama-2-13b-chat | 13B | Y | 0.341 | 0.298 | 0.000 | 0.016 | 0.265 | 0.067 |
| 7 | Wizardcoder-34b-v1.0 | 34B | Y | 0.238 | 0.247 | 0.007 | 0.013 | 0.230 | 0.056 |
| 8 | Llama-2-7b-chat | 7B | Y | 0.289 | 0.231 | 0.000 | 0.009 | 0.177 | 0.027 |
| 9 | Wizardcoder-15b-v1.0 | 15B | Y | 0.217 | 0.255 | 0.002 | 0.002 | 0.226 | 0.026 |
| 10 | Llama-7b | 7B | Y | 0.106 | 0.058 | 0.004 | 0.005 | 0.069 | 0.023 |
| 11 | Llama-13b-lora | 13B | Y | 0.101 | 0.054 | 0.001 | 0.003 | 0.065 | 0.021 |
| 12 | Codellama-7b-instruct | 7B | Y | 0.154 | 0.174 | 0.001 | 0.001 | 0.124 | 0.015 |
| 13 | Codellama-13b-instruct | 13B | Y | 0.179 | 0.206 | 0.002 | 0.002 | 0.142 | 0.012 |
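For intuition about the YAML-aware columns, the following is a minimal sketch of how a key-value exact-match metric could be computed: both documents are parsed, flattened into key-path/value pairs, and the score is the fraction of reference pairs reproduced exactly in the candidate. The flattening scheme and the scoring formula here are assumptions for illustration, not the benchmark's actual implementation.

```python
import yaml


def flatten(node, prefix=()):
    """Flatten parsed YAML into a {key-path: scalar value} mapping."""
    if isinstance(node, dict):
        pairs = {}
        for key, value in node.items():
            pairs.update(flatten(value, prefix + (str(key),)))
        return pairs
    if isinstance(node, list):
        pairs = {}
        for i, value in enumerate(node):
            pairs.update(flatten(value, prefix + (str(i),)))
        return pairs
    return {prefix: node}


def key_value_exact(reference_yaml, candidate_yaml):
    """Assumed metric: fraction of reference key-value pairs matched exactly."""
    ref = flatten(yaml.safe_load(reference_yaml))
    cand = flatten(yaml.safe_load(candidate_yaml))
    if not ref:
        return 0.0
    hits = sum(1 for path, value in ref.items() if cand.get(path) == value)
    return hits / len(ref)


# One of the two reference pairs matches exactly, so the score is 0.5.
ref = "apiVersion: v1\nkind: Pod\n"
cand = "apiVersion: v1\nkind: Deployment\n"
print(key_value_exact(ref, cand))  # 0.5
```

Under this reading, a wildcard variant would relax the value comparison (e.g., treat designated fields such as generated names as matching anything), which is consistent with the Key-value Wildcard column scoring higher than Key-value Exact for every model.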