Microsoft’s VALL-E 2 AI Mimics Human Speech Perfectly, But Public Release Delayed Over Security Concerns
![](http://n.sinaimg.cn/spider20240711/275/w1200h675/20240711/eb4b-6b07bfbf44feeb46221484b4b474906b.jpg)
- Microsoft’s latest AI text-to-speech program, VALL-E 2, can mimic a speaker’s voice with remarkable accuracy from just three seconds of audio input.
- VALL-E 2 surpasses previous models in naturalness and speaker similarity, achieving ‘human parity’ in speech quality.
- The program showcases impressive capabilities in reproducing speech nuances.
- Two new techniques underpin the model: Repetition Aware Sampling, which makes decoding more robust, and Grouped Code Modeling, which speeds up inference.
- Potential applications include education, translation, and journalism.
- Concerns about misuse, particularly voice spoofing and impersonation, have delayed its public release.
- Regulators are raising concerns about control and data privacy in Microsoft’s AI implementation.
- Experts recommend precautions such as verbal passwords to address security risks.
- VALL-E 2 remains a cutting-edge research project, with its public release on hold until these concerns are addressed.
Summary based on 4 sources