A Systems Theoretic Perspective of the Outer Alignment Problem

EasyChair Preprint no. 11431

13 pages

Date: December 1, 2023


The problem of ensuring that an artificial intelligence (AI) system's objectives and actions match the intended outcome is known as the alignment problem. Alignment can be divided into inner alignment, which relates to how well a system accomplishes its function, and outer alignment, which relates to the alignment of a system with the human values and preferences that underlie the system's purpose and goals. Outer alignment is a challenging and important problem for AI systems, especially as they become more complex and advanced in their interactions with their environment, which increases the possibility of emergent behavior that the system's stakeholders neither anticipate nor desire. This paper seeks to bridge the gap between the AI and systems engineering fields by demonstrating why outer alignment is a systems engineering problem and by articulating it using systems theory. Building on past work in systems theory, it formally defines inner alignment, outer alignment, and emergent behavior. A motivating example, a personalized AI companion based on GPT-4 or another Extremely Large Language Model (ELLM) product, is presented to showcase misaligned emergent behavior. The paper also discusses the sources and types of undesired emergence that could arise from such a product and proposes a possible framework for creating an aligned AI system: a human-on-the-loop system that leverages a preference specification language.

Keyphrases: AI ethics, alignment, emergence, outer alignment, perverse incentives, scientific foundations of systems engineering, SE4AI, systems theory, theory of systems engineering

