Build a Large Language Model (From Scratch) Reading Notes
Chapter 1

GPT is an autoregressive model: decoder-style models such as GPT generate text by predicting one word at a time. That is, given the sequence of words generated so far, the model predicts the most likely next word, appends that word to the sequence, and then makes the next prediction, repeating the cycle (a sketch of this decoding loop appears after these notes). Because each prediction depends on the model's own previous outputs, such models are called autoregressive models.

Chapter 2

Word2Vec is a neural network architecture trained to generate word embeddings by predicting a word's context given the target word, or vice versa. The main idea behind Word2Vec is that words appearing in similar contexts tend to have similar meanings. LLMs, by contrast, commonly produce their own embeddings, which are part of the input layer and are updated during training. The advantage of optimizing the embeddings as part of LLM training, rather than using Word2Vec, is that the embeddings are optimized for the specific task and data at hand (see the second sketch below). ...
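For the Chapter 1 note, here is a minimal PyTorch sketch of the autoregressive decoding loop. The `ToyModel` class and the prompt token ids are made-up stand-ins so the snippet runs on its own, and greedy `argmax` decoding is just one choice (real GPT systems often sample from the predicted distribution); the part being illustrated is the `generate` loop, which feeds each predicted token back in as input.

```python
import torch

VOCAB_SIZE = 10

class ToyModel(torch.nn.Module):
    """Hypothetical stand-in for a GPT-style decoder:
    embeds token ids and projects them to next-token logits."""
    def __init__(self, vocab_size=VOCAB_SIZE, dim=16):
        super().__init__()
        self.embed = torch.nn.Embedding(vocab_size, dim)
        self.head = torch.nn.Linear(dim, vocab_size)

    def forward(self, ids):                  # ids: (batch, seq_len)
        return self.head(self.embed(ids))    # logits: (batch, seq_len, vocab)

def generate(model, ids, max_new_tokens):
    """Autoregressive loop: predict one token, append it, repeat."""
    for _ in range(max_new_tokens):
        logits = model(ids)                                      # logits for every position
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick at the last position
        ids = torch.cat([ids, next_id], dim=1)                   # grow the sequence with the new token
    return ids

model = ToyModel()
prompt = torch.tensor([[1, 2, 3]])           # hypothetical token ids
print(generate(model, prompt, max_new_tokens=5))
```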
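And for the Chapter 2 note, a minimal sketch of the trainable embedding layer an LLM learns jointly with the rest of the network, again assuming PyTorch. The vocabulary and embedding sizes happen to be GPT-2's and the token ids are made up, both chosen only for illustration; the point is that, unlike precomputed Word2Vec vectors, these weights receive gradients and adapt to the task and data during training.

```python
import torch

vocab_size, embed_dim = 50257, 768           # GPT-2-like sizes, for illustration

# Part of the model's input layer: a lookup table of trainable vectors.
token_embedding = torch.nn.Embedding(vocab_size, embed_dim)

token_ids = torch.tensor([[15496, 11, 995]])  # hypothetical token ids
vectors = token_embedding(token_ids)          # shape (1, 3, 768), differentiable
print(vectors.shape)

# Because the layer is an ordinary trainable module, the optimizer updates
# its weights together with every other LLM parameter during training.
optimizer = torch.optim.AdamW(token_embedding.parameters(), lr=1e-4)
```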