This study proposes and evaluates a novel hybrid ensemble model that combines AdaBoost and Gradient Boosting for short-term electricity consumption forecasting. The model is designed to address the challenges posed by nonlinear load fluctuations influenced by meteorological and operational factors, which often lead to reduced forecasting accuracy, grid instability, and inefficient resource utilization. To enhance prediction performance, the dataset undergoes comprehensive preprocessing, including removal of missing target values, median imputation of feature gaps, and standardization for linear and SVR models. An 80/20 train-test split with a fixed random seed ensures reproducibility. Baseline models—Linear Regression, SVR, Random Forest, Gradient Boosting, and AdaBoost—alongside hybrid configurations such as Gradient Boosting + Random Forest and a two-stage voting ensemble, are developed using the scikit-learn framework. The proposed hybrid model integrates AdaBoost and Gradient Boosting within a VotingRegressor architecture, with manually tuned ensemble weights ranging from 0.2 to 0.8 to optimize the R² score. Experimental results indicate that the hybrid AdaBoost + Gradient Boosting model achieves the best overall performance (R² = 0.153, RMSE = 61.888, Accuracy = 77.34%), outperforming all other models. The study’s key contributions include an effective weight-tuning strategy for ensemble learning, empirical validation through quantitative and visual analyses, and practical guidelines for deploying hybrid ensemble models in real-world energy forecasting systems.
PROPOSED METHOD
This research adopts an experimental quantitative methodology to investigate the effectiveness of a hybrid ensemble model combining AdaBoost and Gradient Boosting for short-term energy consumption forecasting. The methodological workflow comprises four main stages: (1) data preprocessing, (2) dataset partitioning, (3) model development—including both baseline and hybrid models—and (4) model evaluation using standard performance metrics. Each stage is designed to ensure reproducibility, robustness, and fair comparison across models.
The proposed method introduces a systematically designed hybrid ensemble framework that integrates AdaBoost and Gradient Boosting within a weighted VotingRegressor. The model aims to optimize short-term energy consumption forecasting by capturing both nonlinear interactions and difficult-to-predict fluctuations through adaptive ensemble learning. The approach comprises four main components: dataset partitioning, preprocessing, base model initialization, and hybrid model construction with manual ensemble weight tuning.
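As a hedged illustration of the framework described above, the sketch below assembles AdaBoost and Gradient Boosting inside a weighted VotingRegressor and manually scans ensemble weights over [0.2, 0.8] to maximize R². The synthetic data, base-learner hyperparameters, and 0.1 weight step are assumptions for the sketch, not details taken from the paper.

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor, GradientBoostingRegressor, VotingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the energy dataset (real features/target not shown here).
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
y = X[:, 0] ** 2 + np.sin(X[:, 1]) + rng.normal(scale=0.5, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

best_r2, best_w = -np.inf, None
# Manual grid over the AdaBoost weight in [0.2, 0.8]; GB gets the complement.
for w_ada in np.arange(0.2, 0.81, 0.1):
    w_gb = 1.0 - w_ada
    hybrid = VotingRegressor(
        estimators=[("ada", AdaBoostRegressor(random_state=42)),
                    ("gb", GradientBoostingRegressor(random_state=42))],
        weights=[w_ada, w_gb],
    )
    hybrid.fit(X_train, y_train)
    r2 = r2_score(y_test, hybrid.predict(X_test))
    if r2 > best_r2:
        best_r2, best_w = r2, (w_ada, w_gb)
```

The weighted prediction is simply the weight-proportional average of the two base learners' outputs, which is why the two weights are constrained to sum to one here.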
A. Dataset Partitioning
Let D = {(xi, yi) | i = 1, 2, …, n} represent the original dataset, where xi denotes the feature vector and yi is the target energy consumption. The dataset is randomly split into training and testing subsets using an 80:20 ratio, with a fixed random_state = 42 applied to ensure reproducibility across experiments. The training set is used exclusively for model learning, while the test set is reserved for out-of-sample evaluation.
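The partitioning step can be sketched in a few lines with scikit-learn; the toy arrays below stand in for the actual feature matrix and target, which are not reproduced here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for D = {(x_i, y_i)}; the real feature set is not shown here.
X = np.random.rand(100, 4)
y = np.random.rand(100)

# 80:20 split with the fixed seed used throughout the experiments.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```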
B. Preprocessing Pipeline
Prior to model training, data preprocessing is performed to enhance model robustness and stability:
- Target Cleansing: All rows with missing values in the target variable y are removed to eliminate label noise.
- Feature Imputation: Missing values in input features are imputed using the median of each respective column. Median imputation is chosen for its resilience against skewness and outliers.
- Feature Standardization: For models sensitive to feature scale—namely, Linear Regression and SVR—feature values are standardized using the z-score formula in (1), rescaling each feature to zero mean and unit variance.
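The three preprocessing steps above can be sketched as follows. The column names and toy values are illustrative assumptions; only the operations (target-row removal, median imputation, z-score standardization) come from the pipeline described here.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Small frame with gaps, standing in for the real data (column names are hypothetical).
df = pd.DataFrame({
    "temperature": [21.0, np.nan, 19.5, 23.1],
    "load_lag1": [310.0, 295.0, np.nan, 330.0],
    "y": [300.0, np.nan, 288.0, 315.0],
})

# 1) Target cleansing: drop rows with a missing target value.
df = df.dropna(subset=["y"])

# 2) Feature imputation: fill remaining feature gaps with each column's median.
features = ["temperature", "load_lag1"]
df[features] = SimpleImputer(strategy="median").fit_transform(df[features])

# 3) z-score standardization for scale-sensitive models (Linear Regression, SVR).
X_scaled = StandardScaler().fit_transform(df[features])
```

Tree-based models (Random Forest, Gradient Boosting, AdaBoost) are scale-invariant, which is why step 3 is applied only to the linear and SVR models.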
RESULTS AND DISCUSSION
A. Comparative Model Performance
The results are summarized in Table II, which shows that the hybrid AdaBoost + Gradient Boosting ensemble consistently outperforms all other models, achieving the highest R² score (0.153), the lowest RMSE (61.888), and a competitive accuracy level of 77.34%. This performance suggests that the hybrid approach successfully captures the nonlinear and volatile nature of short-term energy consumption patterns, particularly due to the complementary strengths of AdaBoost’s adaptive weighting and Gradient Boosting’s sequential error correction.
The three top-performing models are all hybrid ensembles, reaffirming the hypothesis that multi-algorithmic integration enhances forecasting capability in nonlinear time series data. In contrast, all linear models (Linear Regression, Lasso, Ridge, ElasticNet) exhibit negative R² scores, reflecting their poor fit to the complex fluctuation patterns inherent in energy consumption data.
B. Model Comparisons
Figure 3 presents a comparative bar chart of the R² scores across all evaluated models. The figure clearly illustrates the performance hierarchy, with the Hybrid AdaBoost + Gradient Boosting ensemble achieving the highest R² value (0.153), thereby outperforming all other models in terms of variance explanation. This is followed closely by the Voting GB + (AdaBoost + GB) ensemble and the GB + RF hybrid, both registering identical R² scores (0.134). The fourth-best performer is the standalone Gradient Boosting Regressor, which, although not hybridized, maintains a competitive R² of 0.083. In stark contrast, all linear models—including Linear Regression, Ridge, Lasso, and ElasticNet—yield negative R² scores, indicating that these models perform worse than a naive mean predictor. The bar chart thereby reinforces the central claim of this study: hybrid ensemble methods significantly improve predictive accuracy and model generalization in short-term energy forecasting tasks, especially in the presence of nonlinear load fluctuations.