Aviation-Specific Large Language Model Fine-Tuning and LLM-as-a-Judge Evaluation

This paper presents a scalable approach to improve domain-specific understanding by developing an aviation-focused large language model (LLM). While recent LLM advancements, such as supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF), have improved general conversational abilities, these methods are costly and difficult to scale in aviation due to limited labeled data. To address these challenges, we applied a self-supervised fine-tuning approach to generate responses to aviation-related queries without requiring human-labeled question-answer pairs. During fine-tuning, we incorporated the LLM-as-a-judge framework to automatically identify the optimal combination of training parameters. Additionally, to evaluate model performance, we leveraged the same LLM-as-a-judge framework, using a larger LLM to assess the responses generated by both the base and fine-tuned models on aviation-specific questions. This methodology provides a scalable, automated, and high-quality solution for domain-specific language modeling in aviation.
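As a rough illustration of the LLM-as-a-judge evaluation step described above, the Python sketch below shows how a larger judge model might be prompted to compare responses from the base and fine-tuned models on an aviation question. It is a minimal sketch under stated assumptions: the model names and the query_model() helper are hypothetical stand-ins, not code from the paper.

# A minimal sketch of LLM-as-a-judge pairwise evaluation. The model names
# ("base-llm", "aviation-tuned-llm", "larger-judge-llm") and the query_model()
# helper are illustrative placeholders, not the paper's implementation.

def query_model(model_name: str, prompt: str) -> str:
    """Stub for an LLM call; replace with a real local or hosted inference call."""
    # Canned reply so the sketch runs end to end without any model access.
    return "B" if "grading" in prompt else f"[{model_name} response to: {prompt[:40]}...]"

JUDGE_PROMPT = (
    "You are grading answers to an aviation question.\n"
    "Question: {question}\n"
    "Answer A: {answer_a}\n"
    "Answer B: {answer_b}\n"
    "Reply with only 'A' or 'B' for the more accurate and complete answer."
)

def judge_pair(question: str, base_model: str, tuned_model: str, judge_model: str) -> str:
    """Ask a larger judge model which of two candidate answers is better."""
    answer_a = query_model(base_model, question)
    answer_b = query_model(tuned_model, question)
    verdict = query_model(judge_model, JUDGE_PROMPT.format(
        question=question, answer_a=answer_a, answer_b=answer_b))
    return verdict.strip()

if __name__ == "__main__":
    questions = ["What information does a NOTAM communicate to flight crews?"]
    wins = sum(
        judge_pair(q, "base-llm", "aviation-tuned-llm", "larger-judge-llm") == "B"
        for q in questions
    )
    print(f"Fine-tuned model preferred on {wins}/{len(questions)} questions")

The same pairwise-comparison pattern could also be run across candidate checkpoints to select among training parameter combinations, as the abstract describes.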
Document ID
20250005989
Acquisition Source
Ames Research Center
Document Type
Conference Paper
Authors
Kathleen Ge
(Ames Research Center, Mountain View, United States)
William J Coupe
(Ames Research Center, Mountain View, United States)
Date Acquired
June 6, 2025
Subject Category
Air Transportation and Safety
Meeting Information
Meeting: AIAA AVIATION Forum
Location: Las Vegas, NV
Country: US
Start Date: July 21, 2025
End Date: July 25, 2025
Sponsors: American Institute of Aeronautics and Astronautics
Funding Number(s)
PROJECT: 629660
Distribution Limits
Public
Copyright
Work of the US Gov. Public Use Permitted.
Technical Review
NASA Technical Management
Keywords
Self-Supervised Learning
LLM-as-a-Judge
Large Language Model