COMPARATIVE ANALYSIS OF PREDICTION ALGORITHMS ON TABULAR DATA: LINEAR REGRESSION, RANDOM FOREST, GRADIENT BOOSTING, AND MLP NEURAL NETWORKS
Abstract
This work presents a comparative analysis of four machine learning algorithms applied to a supervised regression problem on real tabular data: Linear Regression, Random Forest, Gradient Boosting, and a Multilayer Perceptron (MLP) neural network. The study follows a controlled methodological protocol: preprocessing with strict isolation between data partitions, so that no information from held-out data leaks into training, K-Fold cross-validation, and feature scaling applied differently per model. The models are evaluated by the coefficient of determination (R²) and the mean squared error (MSE). On this dataset, the non-linear methods clearly outperform the linear baseline: Random Forest and MLP achieved R² ≈ 0.66, compared to R² ≈ 0.18 for Linear Regression. The findings provide empirical support for algorithm selection in regression tasks and motivate future work on hyperparameter optimization and on extending the comparison to multiple datasets.
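The protocol summarized above can be illustrated with a minimal sketch in scikit-learn. It is not the study's actual code. The synthetic dataset, the choice of K = 5, all hyperparameters, and the decision to scale features only for Linear Regression and the MLP (leaving the tree ensembles unscaled) are assumptions made for illustration. Placing the scaler inside a pipeline means it is re-fit on the training portion of each fold, which is one common way to keep partitions isolated.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): the study's real dataset is not reproduced here.
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Per-model scaling (assumption): scale-sensitive models get a StandardScaler,
# tree ensembles receive the raw features. Because the scaler lives inside the
# pipeline, it is fit on each training fold only and never sees the validation fold.
models = {
    "linear_regression": make_pipeline(StandardScaler(), LinearRegression()),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
    "mlp": make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=1000, random_state=0),
    ),
}

# K-Fold cross-validation; K = 5 is an illustrative choice.
cv = KFold(n_splits=5, shuffle=True, random_state=0)


def evaluate(models, X, y, cv):
    """Return mean cross-validated R2 and MSE for each model."""
    results = {}
    for name, model in models.items():
        scores = cross_validate(
            model, X, y, cv=cv, scoring=("r2", "neg_mean_squared_error")
        )
        results[name] = {
            "r2": scores["test_r2"].mean(),
            # scikit-learn reports negated MSE so that higher is better; flip the sign back.
            "mse": -scores["test_neg_mean_squared_error"].mean(),
        }
    return results


results = evaluate(models, X, y, cv)
for name, metrics in results.items():
    print(f"{name}: R2={metrics['r2']:.3f}  MSE={metrics['mse']:.1f}")
```

Because the stand-in data is generated from a linear model, the ranking it produces will not match the results reported here; the sketch only shows how the folds, the scaling, and the two metrics fit together.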
