Here I’ll post all the papers, articles, books, courses, YouTube videos, reels, and TikToks that I’ve consumed and consider important to remember.

All my posts will have this one common reference/bibliography section.

Papers (Code “PA”)

  1. On Computable Numbers (the original Turing machine paper) Turing, A. M. (1937). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, s2-42(1), 230-265. https://doi.org/10.1112/plms/s2-42.1.230
  2. A Logical Calculus (McCulloch-Pitts neuron) McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5(4), 115-133. https://doi.org/10.1007/BF02478259
  3. Long Short-Term Memory (LSTM original) Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780. https://doi.org/10.1162/neco.1997.9.8.1735
  4. Learning Phrase Representations using RNN Encoder-Decoder Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1724-1734). Association for Computational Linguistics. https://doi.org/10.3115/v1/D14-1179
  5. Neural Machine Translation (Attention Mechanism) Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015. arXiv:1409.0473. https://arxiv.org/abs/1409.0473
  6. Sequence to Sequence Learning Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (Vol. 27, pp. 3104-3112). Curran Associates, Inc. https://arxiv.org/abs/1409.3215
  7. Generative Adversarial Networks Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems (Vol. 27, pp. 2672-2680). Curran Associates, Inc. https://arxiv.org/abs/1406.2661
  8. A Decomposable Attention Model Parikh, A., Täckström, O., Das, D., & Uszkoreit, J. (2016). A decomposable attention model for natural language inference. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 2249-2255). Association for Computational Linguistics. https://doi.org/10.18653/v1/D16-1244
  9. Attention is All You Need (Transformers) Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (Vol. 30, pp. 6000-6010). Curran Associates, Inc. https://arxiv.org/abs/1706.03762
  10. LSTM: A Search Space Odyssey Greff, K., Srivastava, R. K., Koutník, J., Steunebrink, B. R., & Schmidhuber, J. (2017). LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10), 2222-2232. https://doi.org/10.1109/TNNLS.2016.2582924
  11. From Turing to Transformers Cheok, A. D., & Zhang, E. Y. (2023). From Turing to transformers: A comprehensive review and tutorial on the evolution and applications of generative transformer models. Sci, 5(4), 46. https://doi.org/10.3390/sci5040046
  12. Hierarchical Reasoning Model Wang, G., et al. (2025). Hierarchical reasoning model. arXiv:2506.21734. https://arxiv.org/abs/2506.21734

Books (Code “BO”)

  1. Roger Penrose – The Emperor’s New Mind
  2. Roger Penrose – Shadows of the Mind
  3. Aldous Huxley – The Doors of Perception
  4. J. Storrs Hall, PhD – Beyond AI
  5. Erwin Schrödinger – Mind and Matter
  6. Nassim Nicholas Taleb – The Black Swan

Articles (Code “AR”)

  1. Andrej Karpathy – The Unreasonable Effectiveness of Recurrent Neural Networks (2015). https://karpathy.github.io/2015/05/21/rnn-effectiveness/
  2. Christopher Olah – Understanding LSTM Networks (2015). https://colah.github.io/posts/2015-08-Understanding-LSTMs/

Courses (Code “CO”)

  1. Hugging Face Agent Course
  2. Hugging Face MCP Course
