OpenAI’s SWE-Lancer benchmark is here!
A dataset of 1,400+ freelance software engineering tasks sourced from Upwork, collectively worth $1M, for evaluating how well advanced AI language models handle real-world work.
Learn more on #InfoQ https://bit.ly/4iwLhrY