Company Introduction
At Mindrift, innovation meets opportunity. We believe in using the power of collective human intelligence to ethically shape the future of AI.
Responsibilities
- Create structured test cases that simulate complex human workflows
- Define gold-standard behavior and scoring logic to evaluate agent actions
- Analyze agent logs, failure modes, and decision paths
- Work with code repositories and test frameworks to validate your scenarios
- Iterate on prompts, instructions, and test cases to improve clarity and difficulty
- Ensure that scenarios are production-ready, easy to run, and reusable
Requirements
- Bachelor's and/or Master's degree in Computer Science, Software Engineering, Data Science / Data Analytics, Artificial Intelligence / Machine Learning, Computational Linguistics / Natural Language Processing (NLP), Information Systems, or a related field
- Background in QA, software testing, data analysis, or NLP annotation
- Good understanding of test design principles (e.g., reproducibility, coverage, edge cases)
- Strong written communication skills in English
- Comfortable with structured formats like JSON/YAML for scenario description
- Ability to define expected agent behaviors (gold paths) and scoring logic
- Basic experience with Python and JavaScript
- Curious and open to working with AI-generated content, agent logs, and prompt-based behavior
Nice to Have
- Experience in writing manual or automated test cases
- Familiarity with LLM capabilities and typical failure modes
- Understanding of scoring metrics (precision, recall, coverage, reward functions)
Benefits
- Get paid for your expertise, with rates that can go up to $50/hour depending on your skills, experience, and project needs
- Take part in a flexible, remote, freelance project that fits around your primary professional or academic commitments
- Participate in an advanced AI project and gain valuable experience to enhance your portfolio
- Influence how future AI models understand and communicate in your field of expertise