Natural Language Processing and Computational Linguistics
To get the most out of this book
Follow the listed steps and commands to prepare the system environment:
Python:
Most, if not all, operating systems come with Python installed. It is already available out of the box on Ubuntu 14.04 onwards and on macOS, and can easily be installed on Windows.
If not, please follow the official wiki documentation: https://wiki.python.org/moin/BeginnersGuide/Download
This is a good time to start migrating all of your code to Python 3.6 (http://python3statement.org/). By 2020, many scientific computing packages (such as NumPy) will have dropped support for Python 2.
spaCy:
pip install spacy
Gensim:
pip install gensim
Keras:
pip install keras
scikit-learn:
pip install scikit-learn
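Once the packages are installed, a quick way to confirm the environment is ready is to import each library and print its version. The following is a minimal sanity-check sketch; it assumes the pip installations above completed without errors:
# Sanity check: import each package and print its version.
# This assumes the pip installations above completed successfully.
import sys

import spacy
import gensim
import keras
import sklearn

print("Python      :", sys.version.split()[0])
print("spaCy       :", spacy.__version__)
print("Gensim      :", gensim.__version__)
print("Keras       :", keras.__version__)
print("scikit-learn:", sklearn.__version__)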
Word2Vec, Doc2Vec, and Gensim
We have previously talked about vectors throughout the book – they are used to understand and represent our textual data in a mathematical form, and all of the machine learning methods we use rely on these representations. We will be taking this one step further and using machine learning techniques to generate vector representations of words that better encapsulate the meaning of a word. This technique is generally referred to as word embeddings, and Word2Vec and Doc2Vec are two popular variations of it. A minimal training sketch follows the topic list below.
Word2Vec
Doc2Vec
Other word embeddings
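As a taste of what is ahead, here is a minimal sketch of training a Word2Vec model with Gensim on a toy corpus. The corpus, variable names, and parameter values are purely illustrative, and the parameter names assume Gensim 4.x (older releases use size and iter instead of vector_size and epochs):
from gensim.models import Word2Vec

# A toy corpus: each document is a list of tokens. A real corpus would be
# far larger and preprocessed (tokenized, lowercased, and so on).
sentences = [
    ["natural", "language", "processing", "with", "python"],
    ["word", "embeddings", "capture", "word", "meaning"],
    ["gensim", "trains", "word2vec", "models", "on", "raw", "text"],
    ["vectors", "represent", "words", "in", "a", "semantic", "space"],
]

# Train a small Word2Vec model: vector_size is the embedding dimension,
# window the context size, and min_count the frequency cut-off for words.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50)

# Look up the learned vector for a word and its nearest neighbours.
print(model.wv["word"].shape)                 # (50,)
print(model.wv.most_similar("word", topn=3))
On such a tiny corpus the nearest neighbours are essentially noise; the point is only to show the shape of the training and lookup API.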
Topic Models
Until now, we have dealt with computational linguistics algorithms and spaCy, and we have seen how to use these algorithms to annotate our data, as well as to understand sentence structure. While these algorithms helped us understand the finer details of our text, we still didn't get a big picture of our data – what kinds of words appear more often than others in our corpus? Can we group our data or find underlying themes? We will attempt to answer these questions and more in this chapter. The following are the topics we will cover in this chapter (a short Gensim sketch follows the list):
What are topic models?
Topic models in Gensim
Topic models in scikit-learn
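To make the idea concrete, the following is a minimal sketch of a topic model built with Gensim's LdaModel on a toy set of tokenized documents; the documents, the number of topics, and all parameter values are illustrative assumptions rather than recommendations:
from gensim import corpora
from gensim.models import LdaModel

# Toy tokenized documents; a real corpus would be much larger and would be
# preprocessed with spaCy (lowercasing, stop-word removal, lemmatization).
documents = [
    ["cat", "dog", "pet", "animal", "fur"],
    ["dog", "puppy", "pet", "bark"],
    ["python", "code", "program", "software"],
    ["software", "bug", "code", "computer"],
]

# Map each unique token to an integer id, then turn every document into a
# bag of words: a list of (token_id, token_count) pairs.
dictionary = corpora.Dictionary(documents)
corpus = [dictionary.doc2bow(doc) for doc in documents]

# Fit an LDA model that tries to explain the corpus with two latent topics.
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
               passes=20, random_state=1)

# Inspect the top words of each discovered topic.
for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)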
References
[1] Latent Semantic Analysis: https://en.wikipedia.org/wiki/Latent_semantic_analysis#Latent_semantic_indexing
[2] Gensim: https://radimrehurek.com/gensim/
[3] Latent Dirichlet Allocation: http://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf
[4] Introduction to LDA: http://blog.echen.me/2011/08/22/introduction-to-latent-dirichlet-allocation/
[5] Explanation of LDA: https://www.quora.com/What-is-a-good-explanation-of-Latent-Dirichlet-Allocation
[6] Probabilistic Topic Models: http://www.cs.columbia.edu/~blei/papers/Blei2012.pdf
[7] Ju ............