Word2vec

Word2vec is a family of related algorithms used to produce word embeddings: dense vectors of real numbers that represent the meanings of words. Word2vec models are trained on a large corpus of text and learn to place words that occur in similar contexts close together in the vector space, so that the geometry of the embeddings captures semantic relationships between words. The two standard training architectures are continuous bag-of-words (CBOW), which predicts a word from its surrounding context, and skip-gram, which predicts the context from the word.
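As a concrete illustration, a small skip-gram model can be trained in a few lines. The sketch below uses the gensim library; the library choice, toy corpus, and parameter values are illustrative assumptions rather than anything prescribed by this entry.

from gensim.models import Word2Vec

# Toy corpus: each document is a list of lowercase word tokens.
corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["dogs", "and", "cats", "are", "animals"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=100,   # dimensionality of the embedding vectors
    window=5,          # context window size
    min_count=1,       # keep every word, even rare ones (toy data)
    sg=1,              # 1 = skip-gram, 0 = CBOW
    workers=1,
)

# Each word now maps to a 100-dimensional real-valued vector.
print(model.wv["king"].shape)      # (100,)
# Words that appear in similar contexts end up with similar vectors.
print(model.wv.most_similar("king"))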

While Word2vec is primarily designed to learn one embedding per whole word, several approaches and modifications let it handle subword-tokenized text. These methods either incorporate subword information into the model or simply train it on pre-tokenized data, so that each token receives its own vector. Related techniques built around subwords, such as FastText (which augments Word2vec with character n-gram vectors) and embeddings trained over Byte Pair Encoding (BPE) tokens, can represent both whole words and their constituent parts.
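The subword idea is easiest to see with FastText, where a word's vector is assembled from character n-gram vectors, so even unseen words get an embedding. The sketch below again uses gensim with a made-up corpus and parameters, purely as an illustrative assumption.

from gensim.models import FastText

corpus = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
]

model = FastText(
    sentences=corpus,
    vector_size=50,
    window=3,
    min_count=1,
    min_n=3,           # smallest character n-gram
    max_n=5,           # largest character n-gram
)

# Because vectors are built from character n-grams, a word that never
# appeared in training ("kingly") still receives an embedding, composed
# from n-grams it shares with "king" and "kingdom".
print(model.wv["kingly"].shape)    # (50,)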

See Also: Tokenization, Vector Space, Vector Database, Word level embedding