IMPACT OF SAMPLING TECHNIQUES ON EXPLORATORY LANDSCAPE ANALYSIS
DOI:
https://doi.org/10.18522/2311-3103-2026-1-%25p

Keywords:
Evolutionary algorithms, parameter tuning, landscape analysis

Abstract
Exploratory Landscape Analysis (ELA) features are numerical descriptors of a problem's fitness landscape, often used to recommend suitable algorithm parameters. This study investigates the impact of sampling methods and sample size on the approximation of ELA features and on the performance of machine learning models built upon them. The research demonstrates that these feature approximations are not absolute characteristics of the landscape but depend significantly on the method used to generate the sample points. While increasing the sample size reduces the variance of feature estimates, the choice of sampling strategy itself introduces substantial bias, leading to statistically different feature values across methods such as the Mersenne Twister generator, Latin Hypercube Sampling (LHS), and Faure sequences.

The core experiment involved predicting the parameters of the tunable W-model problem using regression models trained on ELA features. The results showed that models trained and tested on data from the same sampling method performed best, highlighting the limited transferability of models across sampling techniques. Notably, Faure quasirandom sequences consistently yielded the lowest regression error, outperforming common methods such as uniform random sampling and LHS. Furthermore, cross-sampling validation revealed that models, especially those trained on Faure sequences, suffered a significant performance drop when tested on data from any other method, confirming that the sampling strategy imparts a specific "fingerprint" on the feature data.

In conclusion, the findings challenge the default use of common sampling methods in ELA. The accuracy of machine learning models for algorithm selection and configuration is highly sensitive to the sampling strategy employed for feature extraction, so consistency between the sampling methods used during model training and application is crucial. The superior performance of Faure sequences suggests that low-discrepancy sequences are a promising avenue for future research in making ELA-based models more robust and accurate.
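The contrast between the three sampling strategies can be sketched with SciPy's quasi-Monte Carlo module. This is an illustrative sketch, not the paper's experimental code: SciPy ships no Faure generator, so the Halton sequence stands in here as a representative low-discrepancy sequence, and the centered L2 discrepancy serves as a simple measure of how evenly each strategy fills the search space.

```python
import numpy as np
from scipy.stats import qmc

n, d = 256, 2  # sample size and dimensionality (illustrative values)

# Mersenne Twister pseudorandom sampling (NumPy's MT19937 bit generator).
mt = np.random.Generator(np.random.MT19937(42)).random((n, d))

# Latin Hypercube Sampling: stratifies each coordinate axis.
lhs = qmc.LatinHypercube(d=d, seed=42).random(n)

# Low-discrepancy stand-in for Faure: a scrambled Halton sequence.
halton = qmc.Halton(d=d, seed=42).random(n)

# Centered L2 discrepancy: lower values mean the points cover [0, 1]^d
# more evenly, which is the property that separates these strategies.
for name, pts in [("MT19937", mt), ("LHS", lhs), ("Halton", halton)]:
    print(f"{name:8s} centered discrepancy = {qmc.discrepancy(pts):.5f}")
```

Because each strategy distributes points differently, ELA features computed from these samples inherit a strategy-specific bias, which is the "fingerprint" effect described in the abstract.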
References
1. Pikalov M.V., Pis'merov A.M. Nastroyka parametrov geneticheskogo algoritma pri pomoshchi analiza landshafta funktsii prisposoblennosti i mashinnogo obucheniya [Tuning genetic algorithm parameters using fitness landscape analysis and machine learning], Izvestiya YuFU. Tekhnicheskie nauki [Izvestiya SFedU. Engineering Sciences], 2024, No. 2 (238), pp. 221-228.
2. Lobo F.J., Lima C.F., Michalewicz Z. (ed.). Parameter setting in evolutionary algorithms. Springer Science & Business Media, 2007, Vol. 54.
3. Mersmann O. et al. Exploratory landscape analysis, Proceedings of the 13th annual conference on Genetic and evolutionary computation, 2011, pp. 829-836.
4. Ochoa G., Malan K. Recent advances in fitness landscape analysis, Proceedings of the genetic and evolutionary computation conference companion, 2019, pp. 1077-1094.
5. Kerschke P., Trautmann H. Automated algorithm selection on continuous black-box problems by combining exploratory landscape analysis and machine learning, Evolutionary Computation, 2019, Vol. 27, No. 1, pp. 99-127.
6. Kerschke P., Trautmann H. The R-Package FLACCO for exploratory landscape analysis with applications to multi-objective optimization problems, 2016 IEEE Congress on Evolutionary Computation (CEC). IEEE, 2016, pp. 5262-5269.
7. Mersmann O., Preuss M., Trautmann H. Benchmarking evolutionary algorithms: Towards exploratory landscape analysis, 2010.
8. Huang C., Li Y., Yao X. A survey of automatic parameter tuning methods for metaheuristics, IEEE transactions on evolutionary computation, 2019, Vol. 24, No. 2, pp. 201-216.
9. Weise T., Wu Z. Difficult features of combinatorial optimization problems and the tunable w-model benchmark problem for simulating them, Proceedings of the Genetic and Evolutionary Computation Conference Companion, 2018, pp. 1769-1776.
10. McDonald G.C. Ridge regression, Wiley Interdisciplinary Reviews: Computational Statistics, 2009, Vol. 1, No. 1, pp. 93-100.
11. Friedman J.H. Greedy function approximation: a gradient boosting machine, Annals of statistics, 2001, pp. 1189-1232.
12. Breiman L. Random forests, Machine learning, 2001, Vol. 45, No. 1, pp. 5-32.
13. Matsumoto M., Nishimura T. Mersenne twister: a 623-dimensionally equidistributed uniform pseudorandom number generator, ACM Transactions on Modeling and Computer Simulation (TOMACS), 1998, Vol. 8, No. 1, pp. 3-30.
14. Marsaglia G. Random numbers fall mainly in the planes, Proceedings of the National Academy of Sciences, 1968, Vol. 61, No. 1, pp. 25-28.
15. Helton J.C., Davis F.J. Latin hypercube sampling and the propagation of uncertainty in analyses of complex systems, Reliability Engineering & System Safety, 2003, Vol. 81, No. 1, pp. 23-69.
16. Huntington D.E., Lyrintzis C.S. Improvements to and limitations of Latin hypercube sampling, Probabilistic engineering mechanics, 1998, Vol. 13, No. 4, pp. 245-253.
17. Faure H., Lemieux C. Generalized Halton sequences in 2008: A comparative study, ACM Transactions on Modeling and Computer Simulation (TOMACS), 2009, Vol. 19, No. 4, pp. 1-31.
18. Wang X., Hickernell F.J. Randomized halton sequences, Mathematical and Computer Modelling, 2000, Vol. 32, No. 7-8, pp. 887-899.
19. Faure H., Lemieux C. Generalized Halton sequences in 2008: A comparative study, ACM Transactions on Modeling and Computer Simulation (TOMACS), 2009, Vol. 19, No. 4, pp. 1-31.
20. Pitzer E., Affenzeller M. A comprehensive survey on fitness landscape analysis, Recent advances in intelligent engineering systems, 2012, pp. 161-191.