🚀 AI Evaluation Data Scientist - 9-Month Contract - LLM / GenAI

European Tech Recruit

  • 📍 Location: Barcelona
  • 📅 Posted: Oct 29, 2025

Overview

Hybrid model (2-3 days per week onsite) in Barcelona or Madrid. We’re seeking an AI Evaluation Data Scientist to join our client on an initial 9-month fixed-term contract.

What you’ll do

  • Lead evaluation strategies for AI systems, turning customer workflows and business objectives into measurable success metrics.
  • Shape system design from a data and evaluation perspective, influencing retrieval, orchestration, tool integration, and memory for real-world problem solving.
  • Create multi-step, task-based evaluations reflecting performance of components in cloud and edge scenarios.
  • Develop robust frameworks that assess reasoning, factual accuracy, reliability, and end-user success beyond standard benchmarks.
  • Build reproducible evaluation pipelines with datasets, scenarios, test suites, versioned assets, and automated runs.
  • Curate and generate high-quality evaluation datasets, including synthetic and adversarial examples.
  • Implement automated scoring aligned with human feedback, ensuring fairness and robustness.
  • Perform deep error analyses, identify failure modes, and provide actionable insights for system improvement.
  • Collaborate with ML teams to drive continuous improvements in data, prompts, tool usage, model training, and system performance.
  • Monitor operational metrics such as latency, cost, and reliability to align evaluations with production standards.

What you’ll need

  • Master’s or PhD in Computer Science, Machine Learning, Data Science, Physics, Engineering, or a closely related discipline, with relevant professional experience.
  • At least 3 years of applied industry experience in data science or machine learning roles (5+ years for senior level), with a track record of building and deploying AI/ML solutions in production.
  • Strong expertise in assessing the performance of machine learning models, ideally including large language models, retrieval-augmented generation workflows, or agent-based systems.
  • Demonstrated ability to design evaluation strategies that extend beyond static benchmarks, addressing real-world success criteria, reasoning capability, and model robustness.
  • Practical experience creating and refining datasets for both training and evaluation, including the use of synthetic data generation techniques.
  • Hands-on work with agent-driven approaches (task planning, tool integration, reasoning pipelines), retrieval-based architectures (retrievers, vector databases, reranking components), and orchestration frameworks such as LangGraph or LlamaIndex.
  • Strong analytical mindset and problem-solving ability, with experience handling ambiguity and developing pragmatic solutions for open-ended technical or business challenges.
  • Solid software engineering foundation with proficiency in Python, Docker, Git, and experience building reliable, modular, and scalable ML systems.
  • Familiarity with widely used ML and data toolkits and libraries (e.g., PyTorch, HuggingFace, LangGraph, LlamaIndex, Pandas).
  • Experience working with cloud services, preferably AWS.

Employment and eligibility

In accordance with local employment laws, applicants must have current, valid authorisation to work in Spain at the time of application. We are unable to sponsor employment visas for this role. Applications from individuals without existing work authorisation for Spain cannot be considered.

How to apply

If this sounds interesting and you’d like to learn more, please apply via the link below or email your CV. By applying to this role, you understand that we may collect your personal data and store and process it on our systems. For more information, please see our Privacy Notice.

Job details

  • Seniority level: Mid-Senior level
  • Employment type: Contract
  • Job function: Engineering, Information Technology, and Other
  • Industries: Software Development, Computer Hardware Manufacturing, Technology, Information and Media
