The AppWorld dataset is designed for "Computer Use Agents": agents that combine multimodal reasoning, control over applications through simulated or API-driven inputs, memory management, and autonomy in executing multistep flows. Such agents interact adaptively with systems by performing actions, updating files, navigating menus, and generating responses, which lets them automate tasks across applications based on user instructions.