Department of Applied Mathematics, Charles University, Czech Republic
Abstract
Deep neural networks have evolved from heuristic pattern recognition tools into mathematically grounded systems whose theoretical understanding spans approximation theory, geometry, dynamical systems, and optimal control. This article develops a unified theoretical framework that interprets classical and modern neural architectures through the lenses of geometric separability, statistical learning, and dynamical systems theory. Drawing exclusively from foundational and contemporary contributions in machine learning, control theory, and high-dimensional geometry, we provide a comprehensive synthesis of the evolution from perceptrons and support vector machines to residual networks and neural ordinary differential equations.
The study begins by examining early geometric formulations of classification, including linear threshold units and shattering properties, and situates these within modern capacity analysis. We then analyze multilayer feedforward networks in terms of universal approximation and storage capacity, addressing both width and depth considerations. Special attention is devoted to the power of depth and residual connections, emphasizing the reinterpretation of deep networks as discretized dynamical systems.
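To fix ideas (the notation here is ours and purely illustrative), a residual block that updates a hidden state by
\[
x_{k+1} = x_k + h\, f(x_k, \theta_k)
\]
is one explicit Euler step of size \(h\) for the continuous-time system \(\dot{x}(t) = f(x(t), \theta(t))\), so that depth plays the role of integration time and a trained residual network approximates a flow map.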
A central contribution of this work is an integrative exploration of neural ordinary differential equations and their mean-field optimal control formulations. We explain how continuous-depth models unify discrete architectures and reveal new insights into controllability, interpolation, and long-time behavior. The mean-field perspective connects parameter learning with population-level dynamics, clarifying the role of measure-theoretic interpolation and turnpike phenomena in training trajectories.
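In this continuous-depth picture, and again only as a schematic sketch with illustrative notation, supervised training can be phrased as a mean-field optimal control problem of the form
\[
\min_{\theta(\cdot)} \;\; \mathbb{E}_{(x_0, y) \sim \mu_0}\!\left[ \ell\big(\Phi(x(T)), y\big) \right] + \int_0^T R\big(\theta(t)\big)\, dt
\qquad \text{subject to} \qquad \dot{x}(t) = f\big(x(t), \theta(t)\big), \quad x(0) = x_0,
\]
where \(\mu_0\) denotes the data distribution, \(\Phi\) a readout map, \(\ell\) a loss, and \(R\) a regularizer; the single control \(\theta(\cdot)\) is shared across the whole population of trajectories, which is what ties parameter learning to population-level dynamics.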
We further investigate geometric and topological perspectives, including manifold learning and invertible architectures, demonstrating how controllability conditions determine expressive power in neural ODE frameworks. Stability considerations and identity-preserving structures are analyzed to explain empirical success in deep training regimes. Optimization landscape properties, stochastic gradient methods, and automatic differentiation are contextualized within this broader dynamical view.
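A representative stability condition in this setting, stated informally and with illustrative notation, asks that the vector field have no strongly expanding directions along trajectories,
\[
\operatorname{Re}\, \lambda_i\!\left( \frac{\partial f}{\partial x}\big(x(t), \theta(t)\big) \right) \le 0 \quad \text{for all } t \text{ and all } i,
\]
which motivates identity-preserving and antisymmetric weight parameterizations that keep the forward dynamics, and hence training, well behaved over large depths.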
Finally, the article synthesizes classical statistical learning theory with modern transformer-based dynamics and cluster formation in self-attention systems, positioning deep learning as a theory of measure evolution under learned flows. By integrating insights from approximation theory, control, geometry, and statistical learning, this work provides a coherent theoretical narrative that clarifies both the mathematical foundations and the emerging research directions of deep learning as a dynamical systems discipline.
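As a schematic illustration of this measure-evolution viewpoint (the notation is ours), a sequence of tokens \(x_1, \dots, x_n\) propagated through stacked self-attention layers can be modeled as interacting particles,
\[
\dot{x}_i(t) = \sum_{j=1}^{n} \frac{e^{\langle Q x_i(t),\, K x_j(t)\rangle}}{\sum_{m=1}^{n} e^{\langle Q x_i(t),\, K x_m(t)\rangle}}\, V x_j(t), \qquad i = 1, \dots, n,
\]
whose empirical measure evolves under the learned flow and, under suitable conditions, concentrates into a small number of clusters.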
Keywords
Deep learning, neural ordinary differential equations, universal approximation