NTRS - NASA Technical Reports Server

Bayesian Model Selection for Reducing Bloat and Overfitting in Genetic Programming for Symbolic Regression
When performing symbolic regression using genetic programming, overfitting and bloat can negatively impact generalizability and interpretability of the resulting equations as well as increase computation times. A Bayesian fitness metric is introduced and its impact on bloat and overfitting during population evolution is studied and compared to common alternatives in the literature. The proposed approach was found to be more robust to noise and data sparsity in numerical experiments, guiding evolution to a level of complexity appropriate to the dataset. Further evolution of the population resulted not in overfitting or bloat, but rather in slight simplifications in model form. The ability to identify an equation of complexity appropriate to the scale of noise in the training data was also demonstrated. In general, the Bayesian model selection algorithm was shown to be an effective means of regularization which resulted in less bloat and overfitting when any amount of noise was present in the training data.
Document ID
20220009748
Acquisition Source
Langley Research Center
Document Type
Poster
Authors
Geoffrey F. Bomarito
(Langley Research Center Hampton, Virginia, United States)
Patrick E. Leser
(Langley Research Center Hampton, Virginia, United States)
Nolan C.M. Strauss
(University of Utah Salt Lake City, Utah, United States)
Karl M. Garbrecht
(University of Utah Salt Lake City, Utah, United States)
Jacob D. Hochhalter
(University of Utah Salt Lake City, Utah, United States)
Date Acquired
June 23, 2022
Subject Category
Mathematical And Computer Sciences (General)
Meeting Information
Meeting: Genetic and Evolutionary Computation Conference (GECCO)
Location: Boston, MA
Country: US
Start Date: July 9, 2022
End Date: July 13, 2022
Sponsors: Association for Computing Machinery Special Interest Group on Genetic and Evolutionary Computation (SIGEVO)
Funding Number(s)
WBS: 981698.03.04.23.55
CONTRACT_GRANT: NNX13AJ46A
CONTRACT_GRANT: 80LARC17C0003
Distribution Limits
Public
Copyright
Use by or on behalf of the US Gov. Permitted.
Technical Review
Single Expert
Keywords
Uncertainty quantification
symbolic regression
Bayesian model selection