Abstract:
The surge in high-frequency financial data has transformed market forecasting and driven a shift toward machine learning (ML). This study compares the performance of Linear Regression (LR), Random Forest (RF), and XGBoost in predicting the closing prices of leading securities listed on the Pakistan Stock Exchange (PSX). Using a ten-year dataset (2014–2024), the investigation covers the KSE-100 benchmark index and high-volatility stocks including OGDCL, Lucky Cement, and Fauji Fertilizer. The models were built with Python-based frameworks and evaluated using Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and the Coefficient of Determination (R^2). The empirical findings reveal a notable "simplicity paradox" in the PSX environment. Linear Regression achieved high predictive accuracy, yielding R^2 values above 0.99 for most of the securities examined. In sharp contrast, the ensemble methods, Random Forest and XGBoost, proved highly susceptible to overfitting, particularly on the KSE-100 Index, where they produced negative R^2 values (-0.01 and -0.02, respectively); a negative R^2 means a model fits the evaluation data worse than a constant prediction equal to the mean of the observed values. This indicates that such models struggle to distinguish signal from noise in frontier markets characterized by non-linear economic shocks. The study concludes that, for short-term forecasting in the Pakistani context, parsimonious linear models offer superior interpretability and structural stability. These results give risk managers and investors practical guidance and support the use of objective, data-driven tools to navigate emerging-market volatility.
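For readers unfamiliar with the three evaluation metrics, the sketch below shows how they can be computed. It is a minimal illustration and not the study's own code: the function names and the price values are assumptions invented for the example and are not PSX data. It also shows how R^2 turns negative when a model's squared error exceeds that of predicting the mean of the observed values.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average size of the prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: penalizes large errors more heavily than MAE."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def r2(y_true, y_pred):
    """Coefficient of Determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical closing prices, invented for illustration only (not PSX data).
actual = [100.0, 102.0, 104.0, 106.0, 108.0]

# Predictions that track the series closely.
close_fit = [100.5, 101.5, 104.5, 105.5, 108.5]

# A constant prediction that sits below the evaluation range.
flat_fit = [98.0, 98.0, 98.0, 98.0, 98.0]

for name, pred in [("close_fit", close_fit), ("flat_fit", flat_fit)]:
    print(name, mae(actual, pred), rmse(actual, pred), r2(actual, pred))
```

In this toy case the close-tracking predictions give an R^2 near 1, while the constant prediction gives a negative R^2 because its residual sum of squares is larger than the total sum of squares around the mean.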