Awesome Speaker Recognition


Table of contents

Overview

This document collects classic speaker recognition resources (papers, datasets, open-source software tools, and so on) for later review and use.

This document will be updated continuously over the long term.

Publications

Time Paper Paper Note Abstract
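2019 X-vector DNN Refinement with Full-length Recordings for Speaker Recognition Paper note ETDNN + AMSoftmax + full-length recording refinement, scored with cosine metrics; achieves a ~28% relative EER reduction on the SITW dataset.
  Multi-Task Learning with High-Order Statistics for X-vector based Text-Independent Speaker Verification Paper note Multi-task learning encodes high-order statistics of the input signal into the speaker embedding. Experiments show that the method slightly improves text-independent speaker recognition performance.
  An End-to-End Text-independent Speaker Verification Framework with a Keyword Adversarial Network Paper note For speaker verification in the text-independent setting, the authors introduce an ASR adversarial network so that the SE (speaker embedding) network produces better text-independent embeddings. Experiments show that the method substantially improves text-independent speaker recognition performance.
2018 Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification Paper note Building on x-vector (see the related note), the authors replace the average pooling layer with a self-attention layer. Results show that self-attention brings some improvement across different utterance durations.
  X-Vectors: Robust DNN Embeddings for Speaker Recognition Paper note Uses a TDNN to extract embeddings and PLDA as the scoring back end, giving a large improvement over i-vector + PLDA. The authors also propose data augmentation to improve performance further.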
2017 TRISTOUNET: TRIPLET LOSS FOR SPEAKER TURN EMBEDDING    
  Deep Speaker: an End-to-End Neural Speaker Embedding System Paper note  
  End-to-End Text-Independent Speaker Verification with Triplet Loss on Short Utterances    
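2016 DEEP NEURAL NETWORK-BASED SPEAKER EMBEDDINGS FOR END-TO-END SPEAKER VERIFICATION Paper note Traditional speaker recognition systems generally use i-vectors as the front end and PLDA as the scoring back end. In this paper the authors propose an end-to-end speaker verification scheme that uses a DNN to extract front-end features while jointly training the back-end scoring parameters, giving a large improvement over i-vectors. Because the two methods are somewhat complementary, score fusion improves performance further.
2015 Time delay deep neural network-based universal background models for speaker recognition Paper note Given the strong performance of TDNNs in ASR, the authors use a TDNN in place of the GMM in the DNN/i-vector scheme to compute posteriors and obtain sufficient statistics, achieving a 50% EER reduction. At the same computational cost, the EER reduction is 20%.
  Locally-Connected and Convolutional Neural Networks for Small Footprint Speaker Recognition Paper note A further optimization of the d-vector approach, mainly targeting model size. Instead of full connections, the authors use locally-connected or convolutional connections between the input layer and the first hidden layer. With the model size reduced by 70%, performance stays essentially unchanged; at the same model size, EER drops by 8%.
  End-to-End Text-Dependent Speaker Verification Paper note A further optimization of d-vector that integrates the scoring step into the network and refines the network architecture, yielding an effective, highly accurate, easy-to-maintain, small-footprint speaker verification system.
2014 DEEP NEURAL NETWORKS FOR SMALL FOOTPRINT TEXT-DEPENDENT SPEAKER VERIFICATION Paper note DNN + cosine distance for speaker verification. The network is trained with speakers as classification targets, and the hidden layers are treated as a feature extractor: the outputs of the last hidden layer are average-pooled into a d-vector, and cosine distance gives the score/confidence. Experiments show results comparable to i-vector + PLDA, with better robustness in noisy conditions. Score fusion of the two methods reduces EER by 14% in clean conditions and 25% in noisy conditions.
  NOVEL SCHEME FOR SPEAKER RECOGNITION USING A PHONETICALLY-AWARE DEEP NEURAL NETWORK Paper note The authors propose a DNN/i-vector scheme for speaker recognition that outperforms the standard i-vector approach.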
  Unifying Probabilistic Linear Discriminant Analysis Variants in Biometric Authentication    
  From single to multiple enrollment i-vectors: Practical PLDA scoring variants for speaker verification    
2012 A Small Footprint i-Vector Extractor    
  ROBUST SPEAKER RECOGNITION BASED ON LATENT VARIABLE MODELS    
2011 Analysis of I-vector Length Normalization in Speaker Recognition Systems    
2010 Front-End Factor Analysis for Speaker Verification    
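Before 2010 Bottleneck Features for Speaker Recognition Paper note Borrows bottleneck features from speech recognition and uses them to train a GMM-UBM. Experiments show performance close to the GMM-UBM baseline. Using a Teacher-Student-like scheme, with the baseline system as the teacher for training the bottleneck network (the paper does not state clearly whether the baseline is GMM-UBM or i-vector), EER drops by 14% in the same-microphone condition and 18% in the different-microphone condition.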
  Bottleneck Features for Speaker Recognition    
  A Straightforward and Efficient Implementation of the Factor Analysis Model for Speaker Verification    
  Probabilistic Linear Discriminant Analysis for Inferences About Identity    
  Probabilistic Linear Discriminant Analysis    
  Joint Factor Analysis versus Eigenchannels in Speaker Recognition    
  Joint Factor Analysis of Speaker and Session Variability: Theory and Algorithms    
  A Study of Inter-Speaker Variability in Speaker Verification    
  Eigenvoice Modeling With Sparse Training Data    
  Support Vector Machines using GMM Supervectors for Speaker Verification    
  Speaker Verification Using Adapted Gaussian Mixture Models Paper note The authors propose a speaker verification system built on GMM-UBM and report good results.
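
Several notes above mention average pooling into a d-vector, cosine scoring, and EER reductions. The following is a minimal sketch of those three pieces in plain Python, assuming frame-level network outputs are already available as lists of floats. The function names and toy numbers are illustrative assumptions and are not taken from any of the listed papers or toolkits.

```python
import math


def average_pool(frames):
    """Average frame-level vectors (equal length) into one utterance-level vector."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[i] for f in frames) / n for i in range(dim)]


def cosine_score(a, b):
    """Cosine similarity between two embeddings; higher means more likely same speaker."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def equal_error_rate(target_scores, nontarget_scores):
    """Sweep thresholds over the observed scores and return the error rate
    at the point where false-accept and false-reject rates are closest."""
    best_gap, eer = None, None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        # A trial is accepted when its score is >= the threshold.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction, e.g. 0.28 for a 28% relative drop."""
    return (baseline_eer - new_eer) / baseline_eer


# Toy usage with made-up numbers (hypothetical, for illustration only).
enroll = average_pool([[1.0, 0.0], [3.0, 2.0]])
test = average_pool([[2.0, 1.0], [2.0, 1.0]])
print("cosine score:", cosine_score(enroll, test))
print("EER:", equal_error_rate([0.9, 0.8, 0.4], [0.5, 0.3, 0.1]))
```

Real systems typically compute EER with an interpolated ROC or DET curve rather than this coarse threshold sweep, so the sketch is meant only to make the terminology in the table concrete.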

Software

Datasets

Other learning materials

Products