Here I’ll post all the papers, articles, books, courses, YouTube videos, Reels, and TikToks that I’ve consumed and consider important enough to remember.
All my posts will share this one common reference/bibliography section.
— Papers (Code “PA”) —
- Turing Paper (1936): Turing, A. M. (1937). On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, s2-42(1), 230-265. https://doi.org/10.1112/plms/s2-42.1.230
- A Logical Calculus (McCulloch & Pitts): McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5(4), 115-133. https://doi.org/10.1007/BF02478259
- Long Short-Term Memory (LSTM original): Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780. https://doi.org/10.1162/neco.1997.9.8.1735
- Learning Phrase Representations using RNN Encoder-Decoder: Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., & Bengio, Y. (2014). Learning phrase representations using RNN encoder-decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 1724-1734). Association for Computational Linguistics. https://doi.org/10.3115/v1/D14-1179
- Neural Machine Translation (Attention Mechanism): Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015. arXiv:1409.0473. https://arxiv.org/abs/1409.0473
- Sequence to Sequence Learning: Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (Vol. 27, pp. 3104-3112). Curran Associates, Inc. https://arxiv.org/abs/1409.3215
- Generative Adversarial Networks: Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., & Bengio, Y. (2014). Generative adversarial nets. In Advances in Neural Information Processing Systems (Vol. 27, pp. 2672-2680). Curran Associates, Inc. https://arxiv.org/abs/1406.2661
- A Decomposable Attention Model: Parikh, A., Täckström, O., Das, D., & Uszkoreit, J. (2016). A decomposable attention model for natural language inference. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (pp. 2249-2255). Association for Computational Linguistics. https://doi.org/10.18653/v1/D16-1244
- Attention Is All You Need (Transformers): Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (Vol. 30, pp. 6000-6010). Curran Associates, Inc. https://arxiv.org/abs/1706.03762
- LSTM: A Search Space Odyssey: Greff, K., Srivastava, R. K., Koutník, J., Steunebrink, B. R., & Schmidhuber, J. (2017). LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 28(10), 2222-2232. https://doi.org/10.1109/TNNLS.2016.2582924
- From Turing to Transformers: Cheok, A. D., & Zhang, E. Y. (2023). From Turing to transformers: A comprehensive review and tutorial on the evolution and applications of generative transformer models. Sci, 5(4), 46. https://doi.org/10.3390/sci5040046
- Hierarchical Reasoning Model: HRM Team. (2025). Hierarchical reasoning model. arXiv:2506.21734. https://arxiv.org/abs/2506.21734
— Books (Code “BO”) —
- Roger Penrose – The Emperor’s New Mind
- Roger Penrose – Shadows of the Mind
- Aldous Huxley – The Doors of Perception
- J. Storrs Hall, PhD – Beyond AI
- Erwin Schrödinger – Mind and Matter
- Nassim Nicholas Taleb – The Black Swan
— Articles (Code “AR”) —
— Courses (Code “CO”) —
