“speech”

Papers · GitHub · news related to this keyword, gathered in one place.

Papers 10

Semantic Scholar · Speech & Audio · 349 citations
CosyVoice 2, built on large language models, for scalable streaming speech synthesis
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models
OpenAlex · ML Methods · 148 citations
A latent-variable recurrent neural network for generating sequential data
A Recurrent Latent Variable Model for Sequential Data
Semantic Scholar · NLP & LLM · 983 citations
Uses generative AI to automatically convert patient-doctor conversations into SOAP/BIRP clinical notes
Intelligent Clinical Documentation: Harnessing Generative AI for Patient-Centric Clinical Note Generation
Semantic Scholar · 72 citations
Fanar: An Arabic-Centric Multimodal Generative AI Platform
arXiv · 0 citations
Cross-Modal Masking for Robust Silent Speech Synthesis Using sEMG and Lipreading
arXiv · 0 citations
OpenBibleTTS: Large-Scale Speech Resources and TTS Models for Low-Resource Languages
arXiv · 0 citations
MeCo: One-Step MeanFlow-based Corrector for Multi-Channel Speech Separation
arXiv · 0 citations
TRADE: Transducer-Augmented Decoder for Speech LLM
arXiv · 0 citations
Whisper Hallucination Detection and Mitigation via Hidden Representation Steering and Sparse AutoEncoders
arXiv · 0 citations
The Lipreading Gap: Do VSR Models Perceive Visual Speech Like Human Lipreaders?

GitHub 1

Speech & Audio · Python · ★ 17.5K
Industrial speech recognition toolkit, 170x faster, with support for speaker diarization, emotion detection, and streaming
modelscope/FunASR

News 1

deepmind · ▲ 0
Fluid, natural voice translation with Gemini 3.5 Live Translate