A Robust Machine Learning Schema for Developing, Maintaining, and Disseminating Machine Learning Models

Brandon L. Hearley; Steven M. Arnold; Joshua Stuckner

Recent advances in machine learning (ML) algorithms have enabled predictive models that can improve decision making, reduce computational cost, and increase efficiency in a variety of fields. As an organization begins to develop and implement such models, it must properly store the data used to train, validate, and test them, the model parameters, and the models' intended use cases and limitations, so that the models are fully traceable and used correctly. In the context of predicting material behavior, two developments have driven the need for data-driven surrogate models of physics-based simulation tools built with ML techniques: advances in computationally intensive, physics-based modeling of material behavior at various length scales, and the emergence of Integrated Computational Materials Engineering (ICME). Surrogate models can predict material behavior accurately at a fraction of the cost of their physics-based counterparts. This makes multiscale simulations of real-world applications practical and, in turn, enables the design of fit-for-purpose materials for a reasonable computational investment. However, training such models requires extensive data, so effective data management is necessary to realize the full potential that ML offers for material design and ICME.

This paper proposes a generalized, robust schema within the Granta MI Platform that allows organizations to store the real (experimental) and virtual (simulation) data used to train ML models. It also stores the parameters and architectures that define those models. The schema supports various types of data inputs and outputs that can be used to predict material behavior, including single-point values, time-series data, and images. It also follows the outlined best practices for effective data management. An effective schema for ML data and models offers several benefits:

- It can prevent the unnecessary re-creation of real or virtual training data and surrogate models.
- It can reduce the time needed to create new models similar to existing ones by providing a starting point for hyperparameter selection.
- It can minimize the resources devoted to verification and validation (V&V) and certification of models.
- Through full traceability of both the data and the ML models, it can help ensure that neither is misused.

It also gives organizations access to previously developed models that can be used in the design of new materials, advancing the overall goals of ICME.

Document ID

20220017137

Acquisition Source

Glenn Research Center

Document Type

Technical Memorandum (TM)

Date Acquired

November 14, 2022

Publication Date

December 1, 2022

Distribution Limits

Public

Work of the US Gov. Public Use Permitted.

Technical Review

Single Expert

Available Downloads

TM-20220017137.pdf (STI)