I loved the article's clarity. It helped me wrap my head around how each development in NLP built on the problems of the one before it, as a sequential history.

One question: you cite the difficulty of transfer learning with the LSTM as a core flaw, emphasizing that differences in text style make pretrained embeddings less useful. However, I think that problem was largely alleviated by ULMFiT, in which we fine-tune a Wikipedia-pretrained language model on the corpus we are working with. Consequently, we can get really good embeddings even before we train the model on the intended task (see the sketch below). Would you agree that the LSTM's drawbacks lie much more in its limited attention span than in its capability for transfer learning? I would love your take on this.
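For concreteness, here is a minimal sketch of that ULMFiT recipe using the fastai library. The IMDB dataset, epoch counts, and encoder name are just placeholders for illustration, not something from the article:

```python
from fastai.text.all import *

# Illustration corpus: IMDB reviews (any folder of text files would do).
path = untar_data(URLs.IMDB)

# Stage 1: fine-tune the Wikipedia-pretrained AWD-LSTM language model
# on our own corpus, so the representations adapt to its style.
dls_lm = TextDataLoaders.from_folder(path, is_lm=True, valid_pct=0.1)
learn_lm = language_model_learner(dls_lm, AWD_LSTM)
learn_lm.fine_tune(3)
learn_lm.save_encoder('finetuned_encoder')

# Stage 2: reuse the fine-tuned encoder for the downstream task, so we
# start from corpus-adapted embeddings before any task-specific training.
dls_clf = TextDataLoaders.from_folder(path, valid='test', text_vocab=dls_lm.vocab)
learn_clf = text_classifier_learner(dls_clf, AWD_LSTM)
learn_clf.load_encoder('finetuned_encoder')
learn_clf.fine_tune(3)
```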


Adam Mehdi

Thinking about AI & epistemology. Researching CV & ML as published Assistant Researcher. Studying CS @ Columbia Engineering.