[GitHub Share] Voice Interaction and NLP Resources

2021-02-14 語音雜談

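Introduction: This article collects links to papers, corpora, code, projects, tutorials, and other resources related to voice interaction and NLP. Reading time is about 10 minutes.

These speech resources were shared by the GitHub user mxer. Original address: https://github.com/mxer/awesome-speech#1.2

The table of contents follows. If an item interests you, copy its link to jump to it: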

1.page

Xingyu Na

Language Processing and Pattern Recognition in University of Aachen

Fernando de la Calle Silos

2.open source library/toolbox/code

HTK

Py2HTK

parallel-htk

HTK_C_MATLAB_tools

Kaldi:

Kaldi official documentation (Chinese version)

Kaldi models

Corpus Phonetics Tutorial

py-kaldi-asr

https://github.com/pykaldi/pykaldi

https://github.com/gooofy/py-kaldi-asr

https://github.com/UFAL-DSG/pykaldi

https://github.com/janchorowski/kaldi-python

Dan's DNN implementation:

pytorch-kaldi

kaldi-lstm

kaldi-ctc

keras-kaldi

python wrapper for kaldi-online-decoder

Kaldi+PDNN

tfkaldi

Kaldi_CNTK_AMI

kaldi-io-for-python

kaldi-pyio

kaldi-tree-conv

kaldi-ivector

kaldi-yesno-tutorial

Kaldi nnet3 tutorial

Josh Meyer's Website

Adapting your own Language Model for Kaldi

Some Kaldi Notes

http://jrmeyer.github.io/asr/2016/02/01/Kaldi-notes.html

http://sentiment-mining.blogspot.com/

http://pages.jh.edu/~echodro1/tutorial/kaldi/

kaldi_tutorial

Online decoder for Kaldi NNET2 and GMM speech recognition models with Python bindings

ResNet-Kaldi-Tensorflow-ASR

Kaldi ASR: Extending the ASpIRE model

FastCGI support for Kaldi ASR

alignUsingKaldi

kaldi-readers-for-tensorflow

kaldi-iot

lattice-info

lattice-char-to-word

lattice-word-length-distribution

kaldi-lattice-word-index

kaldi-decoders

lattice-remove-ctc-blank

kaldi-lattice-search

htk2kaldi

parallel-kaldi

Building an online Chinese speech recognition system with Kaldi

kaldi-docker

CSLT-Sparse-DNN-Toolkit

featxtra

Sphinx

https://cmusphinx.github.io/

https://github.com/cmusphinx

https://github.com/cmusphinx/pocketsphinx

OpenFst

http://www.openfst.org/twiki/bin/view/FST/WebHome

https://github.com/UFAL-DSG/openfst

https://github.com/benob/openfst-utils

https://github.com/vchahun/pyfst

MIT Spoken Language Systems

Julius

Bavieca

Simon

SIDEKIT

SRILM

https://www.sri.com/engage/products-solutions/sri-language-modeling-toolkit

http://www.speech.sri.com/projects/srilm/

https://github.com/nuance1979/srilm-python

https://github.com/njsmith/pysrilm

awd-lstm-lm

ISIP

MIT Finite-State Transducer (FST) Toolkit

MIT Language Modeling (MITLM) Toolkit

OpenGrm

RNNLM

http://www.fit.vutbr.cz/~imikolov/rnnlm/

https://github.com/IntelLabs/rnnlm

https://github.com/glecorve/rnnlm2wfst

faster-rnnlm

CUED-RNNLM Toolkit

Using RNNLM rescoring a sentence in Chinese ASR system

KenLM

rwthlm

word-rnn-tensorflow

tensorlm

SpeechRecognition

SpeechPy

Aalto

google-cloud-speech

apiai

https://pypi.org/project/apiai/

wit

Nabu

asr-study

dejavu

uSpeech

Juicer

PMLS

dragonfly

SPTK

https://github.com/r9y9/SPTK

https://github.com/sp-nitech/SPTK

http://sp-tk.sourceforge.net/

pysptk

RWTH ASR

Palaver

Praat

Speech Recognition Grammar Specification

Automatic_Speech_Recognition

speech-to-text-wavenet

tensorflow-speech-recognition

tensorflow_end2end_speech_recognition

tensorflow_speech_recognition_demo

AVSR-Deep-Speech

TTS and ASR

CTC + Tensorflow Example for ASR

tensorflow-ctc-speech-recognition

speechT

end2endASR

NADU

DTW (Dynamic Time Warping) python module

Various scripts and tools for speech recognition model building

A deep-learning-based speech recognition system: Chinese speech recognition implemented with CNN, LSTM, and CTC

tacotron_asr

ASR_Keras

Kaggle Tensorflow Speech Recognition Challenge

Speech recognition script for Asterisk that uses Google's speech engine

Libraries and scripts for manipulating and handling ASR output/n-bests/etc

Some scripts and commands for working with ASR

PySpeechGrammar

Python module for evaluating ASR hypotheses

edit-distance

3.dataset

VoxForge

http://www.voxforge.org/home

http://www.voxforge.org/zh

http://www.repository.voxforge1.org/downloads/SpeechCorpus/Trunk/Audio/Main/16kHz_16bit/

ASR Audio Data Links

The CMU Pronouncing Dictionary

TIMIT

https://catalog.ldc.upenn.edu/LDC93S1

https://github.com/syhw/timit_tools

https://github.com/philipperemy/timit

GlobalPhone Language Models

1 Billion Word Language Model Benchmark

DaCiDian-Develop

AISHELL

CC-CEDICT

https://www.mdbg.net/chinese/dictionary?page=cc-cedict

TED-LIUM

open-asr-lexicon

4.Tutorial

University of Edinburgh ASR 2017-18

stanford CS224s

NYU asr12

Speech Recognition with Neural Networks

page

CSTR-Edinburgh

open source library/toolbox

WORLD

HTS

http://hts.sp.nitech.ac.jp/

http://hts-engine.sourceforge.net/

https://github.com/shamidreza/HTS-demo_CMU-ARCTIC-SLT-Formant

https://github.com/MattShannon/HTS-demo_CMU-ARCTIC-SLT-STRAIGHT-AR-decision-tree

Tacotron

https://github.com/Kyubyong/tacotron

https://github.com/Kyubyong/expressive_tacotron

https://github.com/keithito/tacotron

https://github.com/GSByeon/multi-speaker-tacotron-tensorflow

https://github.com/r9y9/tacotron_pytorch

https://github.com/soobinseo/Tacotron-pytorch

Tacotron2

https://github.com/NVIDIA/tacotron2

https://github.com/riverphoenix/tacotron2

https://github.com/A-Jacobson/tacotron2

https://github.com/selap91/Tacotron2

https://github.com/LGizkde/Tacotron2_Tao_Shujie

https://github.com/rlawns1016/Tacotron2

https://github.com/CapstoneInha/Tacotron2-rehearsal

Merlin

mozilla TTS

Flite

Speect

Festival

eSpeak

nnmnkwii

Ossian

gTTS

gnuspeech

supercollider

sc3-plugins

Neural_Network_Voices

pggan-pytorch

cainteoir-engine

loop

TTS and ASR

musa_tts

marytts(JAVA)

1.open source library/toolbox

Alize

speaker-recognition-py3

openVP

2.Gender recognition by voice and speech analysis

page

NTU

Tsung-Hsien Wen

open source library/toolbox

PyDial

alex

ROS voice interaction system

A Chinese voice interaction system built on the ROS framework

1.Speech Processing

madmom

pydub

kapre: Keras Audio Preprocessors

BTK

EspNet

Signal-Processing

pyroomacoustics

librosa

REAPER

MSD_split_for_tagging

VOICEBOX

liquid-dsp

ffts

mir_eval

aupyom

Pitch Detection

TFTB

maracas

SRMRpy

ssp

iss

asr_preprocessing

asrt

Audio superresolution using NN

RNN training for noise reduction in robust asr

RNN for audio noise reduction

muda

Efficient sample rate conversion in python

Smarc audio rate converter

Python scripts to compute f0s of a wave file

2.Audio I/O

PortAudio

audiolab

pytorch audio

Digital Speech Decoder

audioread

audacity.py

3.Sound Source Separation

HARK

Deep RNN for Source Separation

nussl

DNN for Music Source Separation in Tensorflow

Alexey Ozerov

University of Surrey CVSSP

source separation using CNN

4.Feature Extraction

openSMILE

veles.sound_feature_extraction

vamp-plugin-sdk

Yaafe

py_bank

AuditoryFilterbanks

python_speech_features

VAD

https://github.com/jtkim-kaist/VAD

https://github.com/jtkim-kaist/VAD_DNN

https://github.com/marsbroshok/VAD-python

https://github.com/shiweixingcn/vad

https://github.com/fedden/RenderMan

rVAD

Aurora 2 VAD

Israel Cohen

Python interface to the WebRTC Voice Activity Detector

1.code/tool/data

cmusphinx

julius-speech

OpenSLR

List of speech recognition software

KTH

VERBIO

timeview

Speech at CMU Web Page

CMU Robust Speech Group

Speech Software at CMU

Aalto Speech Research

CMU Festvox Project

CSTR

Xiph

Brno University of Technology Speech Processing Group

SoX

STRAIGHT

Idiap Research Institute

Transcriber

Amirsina Torfi

The Speech Recognition Virtual Kitchen

Sparse Representation & Dictionary Learning Algorithms with Applications in Denoising, Separation, Localisation and Tracking

Audacity

beetbox

CAQE

UCL Speech Filing System

Ryuichi Yamamoto

Kyubyong Park

Hideyuki Tachibana

Colin Raffel

Paul Dixon

smacpy

c4dm

Matt Shannon

Keunwoo Choi

ADASP

uchicago Speech and Language @ TTIC

justin salamon

COLEA

openAUDIO

Praat

librosa

Essentia

timmahrt

Lefteris Zafiris

audio-to-audio and audio-to-midi alignment

DNN based hotword and wake word detection toolkit

free-spoken-digit-dataset

Chinese Linguistic Data Consortium (CLDC)

Institute of Formal and Applied Linguistics – Dialogue Systems Group

https://github.com/UFAL-DSG

https://github.com/edobashira/speech-language-processing

https://github.com/andabi?tab=repositories

https://code.soundsoftware.ac.uk/projects

2.tutorial

DL for Computer Vision, Speech, and Language

NTU (National Taiwan University) Introduction to Digital Speech Processing

IISc Speech Information Processing

paper

https://arxiv.org/search/?query=speech&searchtype=all&source=header

https://www.isca-speech.org/iscaweb/index.php/archive/online-archive

https://www.aclweb.org/anthology/

https://github.com/zzw922cn/awesome-speech-recognition-speech-synthesis-papers

state of the art and recent results (bibliography) on speech recognition

Dan Povey

cmusphinx

CMU Language Technologies Institute

CMU SPEECH@SV

Mitsubishi Electric Research Laboratories

MIT Spoken Language Systems

Brno University of Technology Speech Processing Group

IISc

uchicago Speech and Language @ TTIC

RWTH Aachen University

TOKUDA and NANKAKU LABORATORY

Institute of Formal and Applied Linguistics – Dialogue Systems Group

Ohio State University speech separation

LEAP Laboratory

Hainan Xu

Mark Gales

Karen Livescu

Shubham Toshniwal

Adrien Ycart

Ron Weiss

Yajie Miao

Scott T Wisdom

Alan W Black

Amirsina Torfi

Liang Lu

Zhizheng WU

justin salamon

Keith Vertanen

Aviv Gabbay

Mehryar Mohri

Jonathan LE ROUX

Suyoun Kim

DeepSound

Lei Xie

These NLP resources were shared by the GitHub user msgi. Original address: https://github.com/msgi/nlp-journey

The table of contents follows. If an item interests you, copy its link to jump to it:

https://github.com/msgi/nlp-journey/blob/master/docs/tools.md

https://github.com/msgi/nlp-journey/blob/master/docs/alg.md

https://github.com/msgi/nlp-journey/blob/master/docs/basic.md

https://github.com/msgi/nlp-journey/blob/master/docs/fq.md

https://github.com/msgi/nlp-journey/blob/master/docs/notes.md

https://pan.baidu.com/share/init?surl=sE_20nHCfej6f9yRaisz7Q

http://www.ituring.com.cn/book/1605

https://www.deeplearningbook.org/

http://neuralnetworksanddeeplearning.com/

https://nndl.github.io/

http://web.stanford.edu/~jurafsky/slp3/ed3book.pdf

http://cs224d.stanford.edu/

1.Algorithm models and optimization

http://www.bioinf.jku.at/publications/older/2604.pdf

https://arxiv.org/pdf/1207.0580.pdf

https://arxiv.org/pdf/1512.03385.pdf

https://arxiv.org/pdf/1502.03167.pdf

2.Survey papers

https://arxiv.org/pdf/1812.08951.pdf

https://arxiv.org/pdf/1803.07133.pdf

3.Language models

https://www.researchgate.net/publication/221618573_A_Neural_Probabilistic_Language_Model

https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf

4.Text augmentation

https://arxiv.org/pdf/1901.11196.pdf

5.Text pre-training

https://arxiv.org/pdf/1301.3781.pdf

https://arxiv.org/pdf/1405.4053.pdf

Paper: https://arxiv.org/pdf/1607.04606.pdf

Explanation: https://www.sohu.com/a/114464910_465975

https://nlp.stanford.edu/projects/glove/

https://arxiv.org/pdf/1802.05365.pdf

https://arxiv.org/pdf/1810.04805.pdf

https://arxiv.org/pdf/1906.08101.pdf

https://arxiv.org/pdf/1906.08237.pdf

6.Text classification

https://arxiv.org/pdf/1510.03820.pdf

https://arxiv.org/pdf/1408.5882.pdf

https://www.aclweb.org/anthology/P16-2034

7.Text generation

https://arxiv.org/pdf/1805.06553.pdf

https://arxiv.org/pdf/1609.05473.pdf

https://arxiv.org/pdf/1605.05396.pdf

8.Text similarity

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.723.6492&rep=rep1&type=pdf

https://www.aclweb.org/anthology/W16-1617

9.Short text matching

http://papers.nips.cc/paper/5019-a-deep-architecture-for-matching-short-texts.pdf

10.Question answering

https://arxiv.org/pdf/1801.08290.pdf

https://arxiv.org/pdf/1812.08989.pdf

https://arxiv.org/pdf/1702.01932.pdf

https://arxiv.org/pdf/1512.01337v1.pdf

https://arxiv.org/abs/1612.01627

https://arxiv.org/pdf/1806.09102.pdf

https://www.aclweb.org/anthology/P18-1103

11.Machine translation

https://arxiv.org/pdf/1406.1078v3.pdf

https://arxiv.org/pdf/1706.03762.pdf

https://arxiv.org/pdf/1901.02860.pdf

12.Automatic summarization

https://arxiv.org/pdf/1704.04368.pdf

13.Event extraction

https://pdfs.semanticscholar.org/ca70/480f908ec60438e91a914c1075b9954e7834.pdf

14.Recommender systems

https://arxiv.org/pdf/1905.06874.pdf

Must-read blog posts

https://jalammar.github.io/illustrated-transformer/

http://www.wildml.com/2016/01/attention-and-memory-in-deep-learning-and-nlp/

https://www.countbayesie.com/blog/2017/5/9/kullback-leibler-divergence-explained

https://blog.keras.io/building-autoencoders-in-keras.html

https://nlpoverview.com/

https://github.com/msgi/nlp-journey

https://www.cnblogs.com/rucwxb/p/10277217.html

https://zhuanlan.zhihu.com/p/49271699

https://zhuanlan.zhihu.com/p/70257427

https://mp.weixin.qq.com/s?__biz=MzI4MDYzNzg4Mw==&mid=2247488287&idx=2&sn=aa7b045337940886d5a7767f95ab0128&chksm=ebb42bcbdcc3a2ddcfb73fb77bead9655d6608b1a951a8b429fb2c38d56ca92289e97e6decd1&mpshare=1&scene=24&srcid=0930GzGGm3m7uZfJyblgWV3k&key=5b1b221b044835abb8ce952ed69e6acdfe5f30700caa3c560c8fe663354916c6753858e4dbbf1b4d1c2eded3876c67c0983d3d51324c321458405b0cacec9103640c28a7a5c068729172703bf23c0348&ascene=14&uin=Mjk3NzQ2NDczMQ%3D%3D&devicetype=Windows+10&version=62060833&lang=zh_CN&pass_ticket=uQSzwn38HjOIK%2BZwFf5AXCp%2Fk0QiE7budc%2Bl5t1yBFtOXA%2BPvSaFwqUWEwEmyZEd

https://github.com/msgi/nlp-journey/tree/master/nlp/embedding

fasttext(skipgram+cbow)

gensim(word2vec)

https://github.com/msgi/nlp-journey/blob/master/nlp/similarity

bilstm+crf

https://github.com/CyberZHG/keras-gpt-2

https://github.com/jiangxinyang227/textClassifier

https://github.com/Lsdefine/attention-is-all-you-need-keras

https://github.com/miroozyx/BERT_with_keras

https://github.com/CyberZHG/keras-bert

https://github.com/iliaschalkidis/ELMo-keras

https://github.com/tyo-yo/SeqGAN

http://www.52nlp.cn/

https://kexue.fm/category/Big-Data

https://www.cnblogs.com/pinard/

https://tobiaslee.top/

https://github.com/msgi/nlp-journey

https://www.jiqizhixin.com/

https://colah.github.io/

https://zhpmatrix.github.io/

http://www.wildml.com/

http://www.shuang0420.com/

https://www.zybuluo.com/hanbingtao/note/433855

https://www.aclweb.org/portal/

https://www.emnlp-ijcnlp2019.org/

https://www.sheffield.ac.uk/dcs/research/groups/nlp/iccl/index#tab00

https://nips.cc/

https://www.aaai.org/

https://www.ijcai.org/

https://icml.cc/
