We present CloudEval-YAML, a practical benchmark for cloud configuration generation. CloudEval-YAML tackles the diversity of cloud-native applications by focusing on YAML, the de facto configuration standard of numerous cloud-native tools. We built the benchmark with practicality in mind: the dataset consists of hand-written problems with unit tests targeting practical scenarios. To further reflect real-world usage, we augment the dataset by rephrasing questions in concise, abbreviated, and bilingual forms. The dataset contains 1011 problems and took more than 1200 human hours to complete. To keep evaluation practical as well, we built a scalable evaluation platform for CloudEval-YAML that achieves a 20x speedup over a single machine. To the best of our knowledge, the CloudEval-YAML dataset is the first hand-written dataset targeting cloud-native applications. We present an in-depth evaluation of 12 LLMs, leading to a deeper understanding of both the problems and the models, as well as effective methods to improve task performance and reduce cost.
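For illustration only, below is a minimal sketch of the kind of task the benchmark targets (this prompt and manifest are our own example, not an actual dataset entry): a natural-language question such as "Create a Deployment named nginx-deployment that runs 2 replicas of nginx:1.25 and exposes port 80," for which the model must generate a YAML manifest that is then checked functionally (e.g., applied to a cluster and verified by a unit-test script).

# Hypothetical example of a generated Kubernetes manifest for the prompt above
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.25
          ports:
            - containerPort: 80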
If you find our work useful, please cite:
@article{xu2023cloudeval,
  author  = {Yifei Xu and Yuning Chen and Xumiao Zhang and Xianshang Lin and Pan Hu and Yunfei Ma and Songwu Lu and Wan Du and Z. Morley Mao and Ennan Zhai and Dennis Cai},
  title   = {CloudEval-YAML: A Practical Benchmark for Cloud Native YAML Configuration Generation},
  journal = {Proceedings of The Seventh Annual Conference on Machine Learning and Systems, 2024, Santa Clara},
  year    = {2024}
}