Machine Learning Engineer (Advanced Speech & TTS) – Build the Future of Arabic Voice AI
Job Description
At Wittify.ai, we are on a mission to build the most advanced Arabic conversational AI, and voice is at the heart of this mission. We are developing human-like, multilingual TTS systems that redefine naturalness, expressiveness, and adaptability for real-world business and consumer applications.
We are looking for a Machine Learning Engineer (Advanced Speech & TTS) to lead the development of next-generation TTS models and voice cloning systems that set new benchmarks in the industry.
Role Overview
As a Machine Learning Engineer (Advanced Speech & TTS), you will design, train, and optimize cutting-edge TTS models using diffusion-based architectures, neural vocoders, and voice cloning techniques. You will collaborate with our speech scientists and AI engineers to build production-ready models that deliver ultra-realistic, controllable, and deployable Arabic speech synthesis.
What You’ll Do
Next-Generation TTS Model Development
Design, implement, and train state-of-the-art TTS models, including:
Diffusion-based architectures for ultra-natural speech
Non-autoregressive models for real-time performance
Multi-speaker TTS and zero-shot voice cloning
Neural Vocoder Integration
Develop and optimize high-fidelity neural vocoders to achieve studio-quality audio with low-latency streaming capabilities
Voice Cloning & Adaptation
Build speaker adaptation and voice cloning pipelines enabling rapid cloning of new voices with minimal data (few-shot or zero-shot approaches)
Implement flexible voice style transfer, emotion conditioning, and controllable prosody for human-like speech generation
Data & Pipeline Engineering
Create scalable pipelines for audio-text alignment, phoneme extraction, and dataset augmentation across multiple Arabic dialects
Utilize forced alignment tools such as the Montreal Forced Aligner (MFA) and phoneme-based modeling to enhance synthesis accuracy
Evaluation & Benchmarking
Conduct rigorous objective and subjective evaluations, including MOS (Mean Opinion Score) testing, speaker similarity scoring, and intelligibility assessments
Benchmark models against industry-leading TTS systems to continuously improve performance
Deployment & Optimization
Optimize models for production using ONNX, TensorRT, quantization, and pruning to achieve real-time inference on cloud and edge devices
Research & Innovation
Stay updated with the latest TTS, voice cloning, and speech synthesis research (NeurIPS, ICASSP, Interspeech, arXiv) and integrate breakthroughs into production systems
Required Qualifications
Education & Experience
Master’s or PhD in Machine Learning, Speech Processing, Signal Processing, or related fields
3+ years of hands-on experience building and deploying advanced neural TTS or voice cloning systems
Technical Expertise
Deep proficiency in Python and PyTorch (preferred) or TensorFlow
Proven experience with:
Diffusion-based TTS models
Multi-speaker TTS and voice cloning (speaker embedding, speaker adaptation, zero-shot cloning)
Neural vocoders and deployment optimization for real-time performance
Strong understanding of DSP fundamentals and neural audio modeling
Nice to Have
Experience with multilingual or low-resource TTS adaptation, especially Arabic dialects
Contributions to open-source TTS or speech synthesis projects
Familiarity with LLM-integrated TTS approaches and context-aware speech synthesis
Why Join Us
Lead the development of Arabic-first voice AI systems at the frontier of speech synthesis research
Work in a collaborative environment with world-class AI engineers, linguists, and product teams
Join a mission-driven startup shaping the future of human-computer voice interactions
Enjoy flexible work arrangements, competitive compensation, and opportunities for rapid growth
Apply now and be part of building the future of Arabic voice AI.
Together, let’s wit-tify the future.