Machine Learning Engineer (Advanced Speech & TTS) – Build the Future of Arabic Voice AI
Job Description
At Wittify.ai, we are on a mission to build the most advanced Arabic conversational AI, and voice is at the heart of this mission. We are developing human-like, multilingual TTS systems that redefine naturalness, expressiveness, and adaptability for real-world business and consumer applications.
We are looking for a Machine Learning Engineer (Advanced Speech & TTS) to lead the development of next-generation TTS models and voice cloning systems that set new benchmarks in the industry.
Role Overview
As a Machine Learning Engineer (Advanced Speech & TTS), you will design, train, and optimize cutting-edge TTS models using diffusion-based architectures, neural vocoders, and voice cloning techniques. You will collaborate with our speech scientists and AI engineers to build production-ready models that deliver ultra-realistic, controllable, and deployable Arabic speech synthesis.
What You’ll Do
Next-Generation TTS Model Development
Design, implement, and train state-of-the-art TTS models, including:
Diffusion-based architectures for ultra-natural speech
Non-autoregressive models for real-time performance
Multi-speaker TTS and zero-shot voice cloning
Neural Vocoder Integration
Develop and optimize high-fidelity neural vocoders to achieve studio-quality audio with low-latency streaming capabilities
Voice Cloning & Adaptation
Build speaker adaptation and voice cloning pipelines enabling rapid cloning of new voices with minimal data (few-shot or zero-shot approaches)
Implement flexible voice style transfer, emotion conditioning, and controllable prosody for human-like speech generation
Data & Pipeline Engineering
Create scalable pipelines for audio-text alignment, phoneme extraction, and dataset augmentation across multiple Arabic dialects
Utilize forced alignment tools such as the Montreal Forced Aligner (MFA) and phoneme-based modeling to enhance synthesis accuracy
Evaluation & Benchmarking
Conduct rigorous objective and subjective evaluations, including MOS (Mean Opinion Score) testing, speaker similarity scoring, and intelligibility assessments
Benchmark models against industry-leading TTS systems to continuously improve performance
Deployment & Optimization
Optimize models for production using ONNX, TensorRT, quantization, and pruning to achieve real-time inference on cloud and edge devices
Research & Innovation
Stay updated with the latest TTS, voice cloning, and speech synthesis research (NeurIPS, ICASSP, Interspeech, arXiv) and integrate breakthroughs into production systems
Required Qualifications
Education & Experience
Master’s or PhD in Machine Learning, Speech Processing, Signal Processing, or related fields
3+ years of hands-on experience building and deploying advanced neural TTS or voice cloning systems
Technical Expertise
Deep proficiency in Python and PyTorch (preferred) or TensorFlow
Proven experience with:
Diffusion-based TTS models
Multi-speaker TTS and voice cloning (speaker embedding, speaker adaptation, zero-shot cloning)
Neural vocoders and deployment optimization for real-time performance
Strong understanding of DSP fundamentals and neural audio modeling
Nice to Have
Experience with multilingual or low-resource TTS adaptation, especially Arabic dialects
Contributions to open-source TTS or speech synthesis projects
Familiarity with LLM-integrated TTS approaches and context-aware speech synthesis
Why Join Us
Lead the development of Arabic-first voice AI systems at the frontier of speech synthesis research
Work in a collaborative environment with world-class AI engineers, linguists, and product teams
Join a mission-driven startup shaping the future of human-computer voice interactions
Enjoy flexible work arrangements, competitive compensation, and opportunities for rapid growth
Apply now and be part of building the future of Arabic voice AI.
Together, let’s wit-tify the future.