🚀 Applied Scientist Intern: Multimodal Conversational AI
Microsoft Corporation
- 📍 Location: Cambridge
- 📅 Posted: Oct 20, 2025
Microsoft Teams is the hub for teamwork that integrates all the people, content, and tools your team needs to be more engaged and effective. It is core to Microsoft's modern work, modern life and modern education value proposition. We are reinventing the way people communicate and work together across the globe.
We are looking to hire a PhD (or published MSc) candidate for a **12-week internship** (ideally from February 2026) to join CMD Labs – an applied science team within Microsoft Teams – to work on the next generation of live conversational voice agents. Specifically, we are interested in making live conversations with AI – especially in multi-party scenarios – more natural and human-like by leveraging real-time multimodal signals, e.g. from transcript, voice and video.
The intern will be fully onboarded onto our current science and production code base and be expected to investigate, propose, implement and test new algorithms and approaches in this area – solving problems of direct relevance to product. The intern will also be expected to present results internally at the end of the position and write up the work for publication in a leading academic AI conference (e.g. ICML, NeurIPS, ACL, CVPR, ICCV).
You will partner with research, product and engineering teams to invent and deliver the future for Microsoft Teams, Microsoft Copilot and other AI products.
This role is based in **Cambridge (United Kingdom)**.
Our culture is inclusive and collaborative; our team members come from diverse backgrounds, are respectful to one another and achieve impact by building on each other's strengths and skills. We focus our energy on AI projects that are likely to have high impact on our products and bring high value to our customers. Our team has a strong sense of bias for action and accountability and provides its members with many opportunities for learning and career growth.
**Responsibilities**
Conduct experiments, create and validate metrics, and develop candidate algorithms to improve live voice conversation experiences with AI agents by leveraging real-time multimodal data.
Collaborate closely with CMD Labs researchers and engineers to leverage existing assets and datasets, and ensure results can be integrated back into the product.
Embody Microsoft culture and values.
**Qualifications**
**Required**
Currently enrolled in a PhD program (or published candidate in MSc program) in Computer Science, Electrical or Computer Engineering, Statistics, or a related field.
Practical experience in training or fine-tuning transformer models or LLMs, e.g., using text, audio and/or images.
Practical Python coding experience leveraging PyTorch, TensorFlow or a similar framework.
Excellent analytical, coding, communication, and collaborative skills.
**Preferred**
Field of research and publications directly related to multimodal AI, e.g., computer vision and audio modelling – with an emphasis on live / real-time applications.
Experience in model quantization, pruning or distillation.
Experience working in the domain of live speech processing and conversational AI.
Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.