COMPARATIVE ANALYSIS OF PREDICTION ALGORITHMS ON TABULAR DATA: LINEAR REGRESSION, RANDOM FOREST, GRADIENT BOOSTING, AND MLP NEURAL NETWORKS

Abstract

This work presents a comparative analysis of four machine learning algorithms applied to a supervised regression problem on real tabular data: Linear Regression, Random Forest, Gradient Boosting, and a Multilayer Perceptron (MLP) neural network. The study follows a structured methodological protocol that includes preprocessing with strict isolation between data partitions (preprocessing parameters are learned from training data only), K-Fold cross-validation, and model-specific feature scaling. The models are evaluated using the coefficient of determination (R²) and the mean squared error (MSE). The results show that the non-linear methods outperform the linear baseline: Random Forest and MLP achieved R² ≈ 0.66, compared with R² ≈ 0.18 for Linear Regression. These findings offer empirical guidance for algorithm selection in regression tasks and suggest directions for future work, including hyperparameter optimization and extension to multiple datasets.
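The core of the evaluation protocol (K-Fold cross-validation, scaling fitted only on the training partition of each fold, and scoring by R² and MSE) can be illustrated with a minimal, dependency-free sketch. It is not the authors' code: it assumes a single numeric feature, uses closed-form ordinary least squares as the only model, and runs on hypothetical synthetic data. All helper names (`k_fold_indices`, `fit_scaler`, `cross_validate`, etc.) are illustrative. Random Forest, Gradient Boosting, and the MLP are not reproduced here; in scikit-learn the same isolation is usually obtained by placing a scaler inside a `Pipeline` evaluated with `cross_validate`. Which models received scaling in the study is not stated in the abstract; leaving tree models unscaled while standardizing linear models and MLPs is a common choice and is only an assumption here.

```python
import random
from statistics import mean, pstdev


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices once, then deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_scaler(xs):
    """Learn standardization parameters from the training partition only."""
    sd = pstdev(xs)
    return mean(xs), (sd if sd > 0 else 1.0)  # guard against a constant feature


def apply_scaler(xs, params):
    mu, sd = params
    return [(x - mu) / sd for x in xs]


def fit_ols(xs, ys):
    """Closed-form least squares for one feature: y_hat = a + b * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def mse(y_true, y_pred):
    return mean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    my = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - my) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def cross_validate(xs, ys, k=5, scale=True, seed=0):
    """K-Fold CV in which every preprocessing step is fitted inside the fold."""
    folds = k_fold_indices(len(xs), k, seed)
    r2s, mses = [], []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        x_tr, y_tr = [xs[j] for j in train_idx], [ys[j] for j in train_idx]
        x_te, y_te = [xs[j] for j in test_idx], [ys[j] for j in test_idx]
        if scale:
            params = fit_scaler(x_tr)          # statistics from training rows only
            x_tr = apply_scaler(x_tr, params)
            x_te = apply_scaler(x_te, params)  # test rows reuse training statistics
        a, b = fit_ols(x_tr, y_tr)
        y_hat = [a + b * x for x in x_te]
        r2s.append(r2(y_te, y_hat))
        mses.append(mse(y_te, y_hat))
    return mean(r2s), mean(mses)


# Hypothetical synthetic data (not the article's dataset): y = 3 + 2x + noise.
rng = random.Random(42)
X = [rng.uniform(0, 10) for _ in range(200)]
Y = [3 + 2 * x + rng.gauss(0, 1) for x in X]

r2_scaled, mse_scaled = cross_validate(X, Y, scale=True)
r2_raw, mse_raw = cross_validate(X, Y, scale=False)
print(f"scaled: R2={r2_scaled:.3f} MSE={mse_scaled:.3f}")
print(f"raw:    R2={r2_raw:.3f} MSE={mse_raw:.3f}")
```

Both runs report the same R² and MSE (up to floating-point rounding) because least squares with an intercept is invariant to an affine rescaling of the feature. This is one reason scaling can be decided per model: it leaves OLS predictions and tree splits unchanged but affects gradient-trained models such as MLPs.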

How to Cite

Preuss, E., Argenta, I. P., Franciscatto, R., & Pertile, S. D. L. (2026). COMPARATIVE ANALYSIS OF PREDICTION ALGORITHMS ON TABULAR DATA: LINEAR REGRESSION, RANDOM FOREST, GRADIENT BOOSTING, AND MLP NEURAL NETWORKS. RECIMA21 - Revista Científica Multidisciplinar - ISSN 2675-6218, 7(7), e778276. https://doi.org/10.47820/recima21.v7i7.8276