Automated Knowledge Discovery From Simulators

Burl, Michael; DeCoste, Dennis; Mazzoni, Dominic; Scharenbroich, Lucas; Enke, Brian; Merline, William

The auto‑search feature has been disabled based on user feedback. Enter a search term/phrase and click “Search” to begin.

Back to Results

Automated Knowledge Discovery From Simulators

A computational method, SimLearn, has been devised to facilitate efficient knowledge discovery from simulators. Simulators are complex computer programs used in science and engineering to model diverse phenomena such as fluid flow, gravitational interactions, coupled mechanical systems, and nuclear, chemical, and biological processes. SimLearn uses active-learning techniques to efficiently address the "landscape characterization problem." In particular, SimLearn tries to determine which regions in "input space" lead to a given output from the simulator, where "input space" refers to an abstraction of all the variables going into the simulator, e.g., initial conditions, parameters, and interaction equations. Landscape characterization can be viewed as an attempt to invert the forward mapping of the simulator and recover the inputs that produce a particular output. Given that a single simulation run can take days or weeks to complete even on a large computing cluster, SimLearn attempts to reduce costs by reducing the number of simulations needed to effect discoveries. Unlike conventional data-mining methods that are applied to static predefined datasets, SimLearn involves an iterative process in which a most informative dataset is constructed dynamically by using the simulator as an oracle. On each iteration, the algorithm models the knowledge it has gained through previous simulation trials and then chooses which simulation trials to run next. Running these trials through the simulator produces new data in the form of input-output pairs. The overall process is embodied in an algorithm that combines support vector machines (SVMs) with active learning. SVMs use learning from examples (the examples are the input-output pairs generated by running the simulator) and a principle called maximum margin to derive predictors that generalize well to new inputs. In SimLearn, the SVM plays the role of modeling the knowledge that has been gained through previous simulation trials. Active learning is used to determine which new input points would be most informative if their output were known. The selected input points are run through the simulator to generate new information that can be used to refine the SVM. The process is then repeated. SimLearn carefully balances exploration (semi-randomly searching around the input space) versus exploitation (using the current state of knowledge to conduct a tightly focused search). During each iteration, SimLearn uses not one, but an ensemble of SVMs. Each SVM in the ensemble is characterized by different hyper-parameters that control various aspects of the learned predictor - for example, whether the predictor is constrained to be very smooth (nearby points in input space lead to similar output predictions) or whether the predictor is allowed to be "bumpy." The various SVMs will have different preferences about which input points they would like to run through the simulator next. SimLearn includes a formal mechanism for balancing the ensemble SVM preferences so that a single choice can be made for the next set of trials.

Document ID

20100002826

Acquisition Source

Jet Propulsion Laboratory

Document Type

Other - NASA Tech Brief

External Source(s)

http://www.embeddedtechmag.com/component/content/article/2035

Authors

Date Acquired

August 25, 2013

Publication Date

July 1, 2007

Publication Information

Publication: NASA Tech Briefs, July 2007

Subject Category

Report/Patent Number

Distribution Limits

Public

Public Use Permitted.

Available Downloads

Name

Type

20100002826.pdf

STI

No Preview Available

NTRS

NTRS - NASA Technical Reports Server

Available Downloads

Related Records