Awesome Speaker Diarization



Overview

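This document collects classic papers, datasets, open-source software tools and other resources related to speaker diarization, for later review and use.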

This document will be updated continuously.

Publications

Year Paper Notes Summary
2017 Speaker Diarization using Deep Recurrent Convolutional Neural Networks for Speaker Embeddings Notes Uses an RCNN to extract speaker embeddings; DER drops by 30% relative to the baseline.
  Speaker Change Detection in Broadcast TV using Bidirectional Long Short-Term Memory Networks Notes Treats SCD (speaker change detection) as a sequence labeling task solved with a Bi-LSTM network; clearly outperforms traditional methods such as BIC.
  pyannote.metrics: a toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems   A toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems
  Speaker Diarization Using Convolutional Neural Network for Statistics Accumulation Refinement Notes A CNN performs SCD (speaker change detection), and its output speaker change probability is applied to the statistics accumulated for i-vector extraction, yielding more accurate i-vectors; experiments show a DER reduction of about 16%.
  SPEAKER DIARIZATION USING DEEP NEURAL NETWORK EMBEDDINGS Notes JHU's follow-up to [1][2] (Kaldi recipe): replaces i-vectors with DNN-based embeddings.
  SPEAKER DIARIZATION WITH LSTM Notes Google's speaker diarization approach: d-vectors + spectral clustering.
2016 A Speaker Diarization System for Studying Peer-Led Team Learning Groups Notes An unusual scenario (PLTL: peer-led team learning, i.e. student group-study meetings in which every student carries a personal recorder). For each channel, SAD removes non-speech, a GMM with a BIC-like criterion performs segmentation, Hausdorff distance clusters the segments into two classes, and energy separates the primary speaker from the others; the results from all channels are then combined and refined to produce the final output.
2015 DIARIZATION RESEGMENTATION IN THE FACTOR ANALYSIS SUBSPACE Notes Builds on SPEAKER DIARIZATION WITH PLDA I-VECTOR SCORING AND UNSUPERVISED CALIBRATION and adds VB (Variational Bayes) resegmentation to further improve performance.
2014 A Study of the Cosine Distance-Based Mean Shift for Telephone Speech Diarization Notes Recipe: i-vector ==> BCCN (between-class covariance normalization) ==> LN (length normalization) ==> PCA ==> mean-shift-based clustering ==> resegmentation
  SPEAKER DIARIZATION WITH PLDA I-VECTOR SCORING AND UNSUPERVISED CALIBRATION Notes Kaldi recipe: i-vector + PLDA
  Artificial neural network features for speaker diarization Notes Trains a binary ANN (input: two segments; output: same/different speaker), then uses the ANN bottleneck features and MFCCs separately as segment features to build two sets of GMMs; their emission probabilities are combined by a weighted sum for HMM-based segmentation, and a modified BIC criterion is used for clustering.
2013 Unsupervised methods for speaker diarization: An integrated and iterative approach    
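2011 PLDA-based Clustering for Speaker Diarization of Broadcast Streams Notes GMM-based SAD, BIC-based segmentation, and i-vector + PLDA for clustering.
  SPEAKER DIARIZATION OF MEETINGS BASED ON SPEAKER ROLE N-GRAM MODELS Notes Borrows the idea of n-gram language models: the speaking order of meeting participants is modeled with n-grams and incorporated into the clustering process, which noticeably improves the clustering results.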
before 2010 Speaker Diarization for Meeting Room Audio Notes 1. Meeting scenario 2. TDOA (Time Difference of Arrival) features 3. BIC for clustering
  Stream-based speaker segmentation using speaker factors and eigenvoices Notes Stream-based speaker diarization, using eigenvoice vectors + GMM/HMM for segmentation and clustering.
  The LIA-EURECOM RT'09 Speaker Diarization System Notes A typical top-down (divisive) clustering approach to diarization based on GMM-HMM.
  E-HMM approach for learning and adapting sound models for speaker indexing    
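  The ICSI RT07s Speaker Diarization System Notes A typical bottom-up (agglomerative) clustering approach to diarization based on GMM-HMM.
  An Overview of Automatic Speaker Diarization Systems Notes A 2006 survey of speaker diarization that analyzes the approaches of that time in considerable detail and compares their performance in the RT evaluations.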
  Robust Speaker Diarization for meetings    
  A Spectral Clustering Approach to Speaker Diarization Notes BIC for segmentation, GMM modeling + KL distance for an initial clustering, spectral clustering for a second clustering pass, then cross-EM refinement.
  Improved speaker segmentation and segments clustering using the Bayesian information criterion Notes The classic paper on using BIC (Bayesian Information Criterion) for segmentation and clustering.
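
Several entries above (the classic BIC paper, BIC-based segmentation for broadcast streams and meetings) rely on the ΔBIC test for speaker change detection. The sketch below is a minimal illustration, assuming full-covariance Gaussians over frame-level features such as MFCCs; the function names and the `lam` / `margin` parameters are hypothetical choices for this example, not code from any of the listed papers.

```python
import numpy as np


def delta_bic(x: np.ndarray, t: int, lam: float = 1.0) -> float:
    """Delta-BIC for splitting features x (N frames x d dims) at frame t.

    Positive values favor two separate Gaussians, i.e. a speaker change at t.
    """
    n, d = x.shape

    def logdet_cov(seg: np.ndarray) -> float:
        # Maximum-likelihood full covariance (bias=True divides by N)
        cov = np.atleast_2d(np.cov(seg, rowvar=False, bias=True))
        _, logdet = np.linalg.slogdet(cov)
        return logdet

    # Model-complexity penalty: d mean parameters + d(d+1)/2 covariance parameters
    penalty = 0.5 * (d + 0.5 * d * (d + 1)) * np.log(n)
    return (0.5 * n * logdet_cov(x)
            - 0.5 * t * logdet_cov(x[:t])
            - 0.5 * (n - t) * logdet_cov(x[t:])
            - lam * penalty)


def detect_change(x: np.ndarray, lam: float = 1.0, margin: int = 20):
    """Return the frame with the largest positive Delta-BIC, or None if none is positive.

    margin keeps enough frames on each side for a non-singular covariance (assumed value).
    """
    best_t, best_score = None, 0.0
    for t in range(margin, len(x) - margin):
        score = delta_bic(x, t, lam)
        if score > best_score:
            best_t, best_score = t, score
    return best_t
```

Real systems slide and grow this window over a long recording and tune `lam`; a single global search like `detect_change` only finds the most likely change point in one window.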
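
The cosine-distance mean shift recipe listed for 2014 clusters length-normalized i-vectors without fixing the number of speakers in advance. The following is a simplified sketch, assuming a flat kernel whose radius is measured in cosine distance (1 - cos); the `bandwidth` and `merge_tol` values and the function name are illustrative assumptions, and the BCCN/PCA steps of the recipe are omitted.

```python
import numpy as np


def cosine_mean_shift(x: np.ndarray, bandwidth: float = 0.3,
                      n_iter: int = 50, merge_tol: float = 0.05) -> np.ndarray:
    """Flat-kernel mean shift with cosine distance; returns one label per row of x."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # length normalization
    modes = x.copy()  # every point starts as its own mode
    for _ in range(n_iter):
        sim = modes @ x.T                    # cosine similarity: modes vs. data points
        window = (1.0 - sim) <= bandwidth    # flat kernel: points within the radius
        shifted = window.astype(float) @ x   # sum of the neighbors of each mode
        # Project the mean back onto the unit sphere (mean direction)
        modes = shifted / np.linalg.norm(shifted, axis=1, keepdims=True)

    # Modes that converged to (almost) the same direction form one cluster
    labels = np.empty(len(x), dtype=int)
    centers = []
    for i, m in enumerate(modes):
        for c, center in enumerate(centers):
            if 1.0 - m @ center <= merge_tol:
                labels[i] = c
                break
        else:
            centers.append(m)
            labels[i] = len(centers) - 1
    return labels
```

Because each mode moves to the mean direction of its neighbors, the number of clusters falls out of the bandwidth rather than being set explicitly, which is the property that makes mean shift attractive for telephone and meeting data with an unknown number of speakers.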
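
For the d-vector + spectral clustering entry, the core idea is to build a cosine affinity matrix over segment embeddings, estimate the number of speakers from the eigengap, and cluster the leading eigenvectors. The sketch below is a simplified, Ng-Jordan-Weiss-style version under stated assumptions: it omits the affinity refinement operations described in the Google paper, and the helper names and `max_speakers` parameter are hypothetical.

```python
import numpy as np


def kmeans(x: np.ndarray, k: int, n_iter: int = 100) -> np.ndarray:
    """Plain k-means with deterministic farthest-point initialization."""
    centers = [x[0]]
    for _ in range(1, k):
        dist = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = ((x[:, None, :] - centers[None]) ** 2).sum(-1).argmin(axis=1)
        new = np.array([x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_cluster(emb: np.ndarray, max_speakers: int = 8) -> np.ndarray:
    """Cluster segment embeddings (e.g. d-vectors); the speaker count comes from the eigengap."""
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    affinity = np.maximum(e @ e.T, 0.0)  # cosine similarity, negatives clipped (simplification)
    deg = affinity.sum(axis=1)
    norm_aff = affinity / np.sqrt(np.outer(deg, deg))  # D^-1/2 A D^-1/2
    eigvals, eigvecs = np.linalg.eigh(norm_aff)         # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # descending order
    # Eigengap heuristic: the largest drop between consecutive eigenvalues
    m = min(max_speakers, len(eigvals) - 1)
    k = int(np.argmax(eigvals[:m] - eigvals[1:m + 1])) + 1
    spec = eigvecs[:, :k]
    spec = spec / np.linalg.norm(spec, axis=1, keepdims=True)  # row normalization
    return kmeans(spec, k)
```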
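
Several entries (PLDA-based clustering, the Kaldi i-vector + PLDA recipe) score pairs of i-vectors with PLDA. As a rough illustration only, the sketch below uses the textbook two-covariance PLDA model, x = mu + y + e with speaker variable y ~ N(0, B) and residual e ~ N(0, W), and assumes mu, B and W are already known; it is not Kaldi's implementation, which works in a diagonalized space and estimates the parameters by EM.

```python
import numpy as np


def gaussian_logpdf(x: np.ndarray, mean: np.ndarray, cov: np.ndarray) -> float:
    """Log density of a multivariate Gaussian."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (len(x) * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))


def plda_llr(x1: np.ndarray, x2: np.ndarray, mu: np.ndarray,
             B: np.ndarray, W: np.ndarray) -> float:
    """Log-likelihood ratio: same speaker vs. different speakers (two-covariance PLDA)."""
    T = B + W                                   # total covariance of one i-vector
    joint = np.concatenate([x1, x2])
    mean = np.concatenate([mu, mu])
    zero = np.zeros_like(B)
    cov_same = np.block([[T, B], [B, T]])        # shared y => cross-covariance B
    cov_diff = np.block([[T, zero], [zero, T]])  # independent y => no cross-covariance
    return gaussian_logpdf(joint, mean, cov_same) - gaussian_logpdf(joint, mean, cov_diff)
```

A positive score favors the same-speaker hypothesis; in clustering, such scores (after calibration) serve as the similarity between segments or clusters.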

Software

Speaker Diarization

Tools Language Description
VB diarization Python Based on a Bayesian hidden Markov model
pyannote-metrics Python A toolkit for reproducible evaluation, diagnostic, and error analysis of speaker diarization systems
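
pyannote.metrics (for example its `DiarizationErrorRate` class) is the tool to use for real evaluations. To show what DER actually measures, here is a minimal frame-based sketch written from scratch, not pyannote's API: segments are assumed to be `(start, end, speaker)` tuples in seconds, the optimal one-to-one speaker mapping is found by brute force, and collars and other evaluation options are ignored. The function names are hypothetical.

```python
from itertools import permutations


def frame_labels(segments, n_frames, step):
    """Convert [(start, end, speaker), ...] into a per-frame set of active speakers."""
    frames = [set() for _ in range(n_frames)]
    for start, end, speaker in segments:
        first = int(round(start / step))
        last = min(n_frames, int(round(end / step)))
        for i in range(first, last):
            frames[i].add(speaker)
    return frames


def diarization_error_rate(reference, hypothesis, step=0.01):
    """(missed speech + false alarm + speaker confusion) / total reference speech."""
    n_frames = int(round(max(end for _, end, _ in reference + hypothesis) / step))
    ref = frame_labels(reference, n_frames, step)
    hyp = frame_labels(hypothesis, n_frames, step)
    ref_speakers = sorted({s for _, _, s in reference})
    hyp_speakers = sorted({s for _, _, s in hypothesis})
    total_ref = sum(len(r) for r in ref)

    best = None
    # Each hypothesis speaker maps to a distinct reference speaker or to None (unmatched)
    candidates = ref_speakers + [None] * len(hyp_speakers)
    for perm in permutations(candidates, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, perm))
        errors = 0
        for r, h in zip(ref, hyp):
            correct = len({mapping[s] for s in h} & r)
            # Per frame: miss + false alarm + confusion == max(|ref|, |hyp|) - correct
            errors += max(len(r), len(h)) - correct
        if best is None or errors < best:
            best = errors
    return best / total_ref
```

For instance, a system that labels a two-speaker, 10-second recording as a single speaker gets DER = 0.5: half of the reference speech is attributed to the wrong speaker. Brute-force mapping is only practical for a handful of speakers; real toolkits solve the mapping as an assignment problem.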

Datasets

Other learning materials

Products