On Finding Principles for Building Agents with Their Own Goals and Purposes

The rapid advancement of large language models (LLMs) has once again validated the bitter lesson:

General methods that leverage computation ultimately outperform alternatives—by a large margin.

This superiority becomes increasingly evident as more computation becomes available. General methods scale effectively with more data, computation, and memory, whereas alternative approaches often struggle due to their reliance on limited human knowledge. This phenomenon, often referred to as the scaling law, underscores why today's LLMs can capture remarkably sophisticated patterns in language—perhaps even approaching the level of human linguistic organization.

Similarly, I believe reinforcement learning (RL) agents could also discover powerful patterns simply by predicting the next state from their observation sequences, provided the observation stream contains sufficiently rich and complex structure.
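
To make this concrete, here is a minimal sketch (my own illustration, not a proposal from this post) of an agent component trained purely to predict the next observation in its stream. The recurrent architecture, tensor shapes, and hyperparameters are illustrative assumptions, and the random tensors merely stand in for a real observation stream.

```python
# Minimal sketch of learning by next-observation prediction.
# All architecture and hyperparameter choices are illustrative assumptions.
import torch
import torch.nn as nn

class NextObsPredictor(nn.Module):
    """Predicts the next observation from the history of observations."""

    def __init__(self, obs_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.rnn = nn.GRU(obs_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, obs_dim)

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim); returns a next-observation
        # prediction at every time step.
        hidden, _ = self.rnn(obs_seq)
        return self.head(hidden)

obs_dim = 16
model = NextObsPredictor(obs_dim)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Self-supervised training on the observation stream: no rewards, no labels.
for step in range(1000):
    # Stand-in for a real observation stream; replace with the agent's sensors.
    obs_seq = torch.randn(8, 50, obs_dim)     # (batch, time, obs_dim)
    pred = model(obs_seq[:, :-1])             # predict o_2..o_T from o_1..o_{T-1}
    loss = loss_fn(pred, obs_seq[:, 1:])      # target is the next observation
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
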
This raises a fundamental question:
What are we missing in building intelligent agents, specifically agents that develop their own goals and purposes?

Are Current Function Approximators Sufficient?

Rich has argued against deep learning-based function approximators on the grounds that they lose their ability to keep learning after extended training, a loss of plasticity highlighted in his recent work. However, I believe these are technical issues that can be resolved with further research, if they haven't been already. Until stronger evidence suggests otherwise, I see no fundamental reason to discard current deep learning approaches as viable building blocks for intelligent agents.
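
To make the concern concrete, here is a toy diagnostic of my own (not the setup used in that work): a single small network is trained online on a long sequence of unrelated regression tasks, and we track how well it can still fit each new one. If plasticity degrades, the end-of-task error tends to drift upward for later tasks; every architecture and hyperparameter choice here is an illustrative assumption, and a toy run like this may or may not show the effect clearly.

```python
# Toy diagnostic of plasticity loss: train one network online on a sequence
# of unrelated tasks and watch whether its end-of-task error drifts upward.
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for task in range(20):
    w = torch.randn(10, 1)          # each task: a fresh random linear target
    recent = []
    for step in range(500):
        x = torch.randn(32, 10)
        y = x @ w
        loss = loss_fn(net(x), y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        recent.append(loss.item())
    # Average loss over the last 100 steps: a crude measure of how well the
    # network could still adapt to this task.
    print(f"task {task:2d}  end-of-task loss: {sum(recent[-100:]) / 100:.4f}")
```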

Where Should Rewards Come From?

A crucial question in developing autonomous agents is: What determines their rewards? Let’s examine where rewards currently originate:

  1. In RL research and applications – Rewards are manually designed by researchers and practitioners to guide agents toward predefined goals.
  2. In LLM training – Rewards are often derived from human-labeled data or heuristic metrics, shaping the agent’s ability to generate useful responses (a broadly applicable capability in an abstract space).
  3. In natural intelligence (e.g., animals, including humans) – Rewards emerge from a combination of genetic predispositions and life experiences, both shaped by millions of years of evolution.

Given these sources, what are the possible pathways for designing agent rewards?

  1. Human-defined rewards – These agents would primarily function as tools, following human instructions and fulfilling human goals.
  2. Evolutionarily selected rewards – Agents could develop a more diverse set of behaviors, though selection pressures would likely still favor alignment with human interests.
  3. Experience-driven reward mechanisms – A simple yet effective mechanism could allow agents to develop different goals based on their individual experiences (one possible form is sketched after this list).
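
As one possible instantiation of such a mechanism (my speculation, not a proposal from this post), an agent could reward itself in proportion to the prediction error of its own world model, so that two agents with different experience histories come to value different situations. The sketch below reuses the NextObsPredictor class from the earlier sketch; the squared-error form and the scale parameter are illustrative assumptions.

```python
# One speculative instantiation of an experience-driven reward: the agent
# rewards itself for observations its own world model still predicts poorly.
import torch

def intrinsic_reward(model, obs_history: torch.Tensor,
                     next_obs: torch.Tensor, scale: float = 1.0) -> float:
    """Reward = scaled prediction error of the agent's own world model.

    model       -- a (partially) trained NextObsPredictor (see earlier sketch)
    obs_history -- (time, obs_dim) tensor of the agent's recent observations
    next_obs    -- (obs_dim,) tensor, the observation that actually arrived
    """
    with torch.no_grad():
        # Prediction for the step after the last observation in the history.
        pred = model(obs_history.unsqueeze(0))[0, -1]
        error = torch.mean((pred - next_obs) ** 2).item()
    # Agents with different histories have different models, hence
    # different rewards and, over time, different goals.
    return scale * error
```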

Of these possibilities, I believe progress will most likely come through the first approach: human-defined rewards. In other words, the first agents around us will be those that help us achieve our goals. What would this mean for the goal of creating agents with their own diverse and autonomous goals? I see it as a net positive outcome: such progress would showcase the full potential of reinforcement learning agents and ultimately drive advances in other types of agents.

Finally, there are many other missing ingredients for building intelligent agents with independent goals and purposes. I haven't covered them here, but I may explore these topics in the future.
