The Rise of Reasoning Engineering: optimizing reasoning beyond prompting

by Massimo

The next frontier in AI is not just about scaling models but about optimizing their reasoning methodologies. While prompt engineering has been the dominant approach for shaping model behaviour, a new discipline is emerging: Reasoning Engineering. Can we simulate human behaviour using reasoning engineering? Let’s give it a try.

From Prompt Engineering to Reasoning Engineering

Prompt engineering refines how we interact with large language models (LLMs), improving outputs through structured instructions, context injection, and iterative refinements. However, as LLMs reach higher levels of complexity, mere prompting is no longer enough to achieve deep, structured reasoning.

Reasoning Engineering introduces a new paradigm: designing AI agent architectures that optimize inferencing processes by structuring reasoning itself. This approach enables AI systems to go beyond single-shot predictions, allowing multiple reasoning pathways to be orchestrated dynamically.

Defining a Reasoning Model

At the core of reasoning engineering lies the reasoning model—the architectural framework that structures inferencing for one or more agents driven by one or more LLMs. A reasoning model governs how AI agents interact, distribute cognitive load, and refine responses through structured inference mechanisms.

Key components of a Reasoning Model

  • Multi-Agent Collaboration – AI agents specialize in different facets of reasoning (e.g., logical deduction, domain-specific knowledge retrieval, abstraction).
  • Layered Thinking Processes – Instead of a single forward pass, the model employs iterative refinements, breaking down reasoning into sub-tasks.
  • Self-Optimization Mechanisms – Budget forcing and reasoning scaffolds regulate token consumption and inference depth, ensuring efficiency.
  • Adaptive Knowledge Retrieval – Agents dynamically adjust their information retrieval processes, refining their contextual awareness beyond static datasets.
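As a rough illustration, the sketch below shows how a reasoning model might orchestrate specialized agents through layered, iterative refinement while budget forcing caps the total inference spend. The agent roles, costs, and budget are illustrative assumptions, not a reference implementation:

```python
# Minimal sketch of a reasoning model orchestrating specialized agents
# under a token budget. Agent roles, costs, and the budget are illustrative.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    name: str
    run: Callable[[str], str]   # refine the current working answer
    cost: int                   # rough token cost per invocation

def reason(task: str, agents: list[Agent], token_budget: int = 2000) -> str:
    """Layered thinking: pass the working answer through each specialized agent,
    iterating until no agent improves it or budget forcing stops the loop."""
    answer, spent = task, 0
    while spent < token_budget:
        progressed = False
        for agent in agents:
            if spent + agent.cost > token_budget:
                return answer          # budget forcing: stop once the budget is hit
            refined = agent.run(answer)
            spent += agent.cost
            if refined != answer:
                answer, progressed = refined, True
        if not progressed:
            break                      # no agent improved the answer: converged
    return answer

# Toy usage: each "agent" handles a different facet of reasoning.
agents = [
    Agent("retrieval", lambda a: a if "[facts]" in a else a + " [facts]", cost=300),
    Agent("deduction", lambda a: a if "[conclusion]" in a else a + " [conclusion]", cost=500),
]
print(reason("Why is the sky blue?", agents))
```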

Reasoning Model for a Hypothetical Shy Behaviour

Can we simulate the behaviour of a shy person using reasoning modelling?

Shyness can mean feeling uncomfortable, self-conscious, nervous, bashful, timid, or insecure. 
People who feel shy sometimes notice physical sensations like blushing or feeling speechless, shaky, or breathless. Shyness is the opposite of being at ease with yourself around others.

A reasoning model that mimics this behaviour would be designed to simulate shyness in AI agents by processing emotionally charged messages and responding with an appropriate emotional output. The goal is to create an AI behaviour that mimics hesitation, self-consciousness, and emotional sensitivity: key characteristics of human shyness. The model should not merely classify emotions but modulate responses by incorporating social sensitivity, uncertainty, and inhibition, mirroring real-world shy behaviour. This structured approach ensures that shyness is not just a pre-set trait but an emergent behaviour resulting from AI-agent collaboration.

AI Agents in the Model

To create a reasoning model that simulates shyness, we need multiple AI agents, each powered by specialized models tailored to different aspects of emotional perception, self-consciousness, uncertainty, and inhibition. 

Below is a possible breakdown of each agent, the specific models used, and their reasoning paths (a sketch of the state these agents could share follows this list). It might work because:

  • Emotional Awareness: The AI detects praise, criticism, or judgment.
  • Self-Consciousness Simulation: The AI “feels observed” and modifies behaviour accordingly.
  • Uncertainty-Induced Hesitation: Ambiguity creates processing delays and hedging in responses.
  • Inhibited Emotional Output: The AI produces softened, restrained, or hesitant responses, mirroring human shyness.
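Before diving into each agent, here is a small sketch of the state these signals could share as they flow through the pipeline. The dataclass and its field names are illustrative assumptions; only the scores themselves (EIS, Psc, GUS, RCS) are defined in the agent sections below:

```python
# Possible shared state passed along the EPA -> SCA -> UPA -> IRA pipeline.
# The dataclass itself is an illustrative assumption; the field names mirror
# the scores defined in the agent sections of the article.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ShyState:
    message: str                    # the incoming, possibly emotionally charged text
    emotion: Optional[str] = None   # dominant emotion detected by the EPA
    eis: float = 0.0                # Emotion Intensity Score (EPA)
    psc: float = 0.0                # self-consciousness probability (SCA)
    gus: float = 0.0                # Gaussian Uncertainty Score (UPA)
    rcs: float = 1.0                # Response Confidence Score (IRA)

state = ShyState(message="You are so good at this!")
```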

Emotion Perception Agent (EPA) – Detects the emotional weight of the input message

Model Used: RoBERTa-based emotion classification model
Reasoning Path:
  1. Text Encoding: The input message is tokenized and transformed into embeddings using RoBERTa.
  2. Sentiment Analysis & Emotion Detection:
    1. The model classifies the message into 27 distinct emotional categories (e.g., admiration, embarrassment, fear, nervousness).
    2. If multiple emotions are detected, it prioritizes those linked to shyness (e.g., embarrassment, anxiety, nervousness).
  3. Context-Aware Emotion Scoring:
    1. The model assigns an Emotion Intensity Score (EIS) between 0 and 1, representing how emotionally charged the input is.
    2. If EIS > 0.7, the message is flagged as emotionally significant, triggering downstream reasoning.

Example: Input: “You are so good at this!”
→ RoBERTa detects Admiration (EIS: 0.82) → Message triggers self-consciousness response in the next stage.
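A minimal sketch of how the EPA could be wired up, assuming a RoBERTa checkpoint fine-tuned on the GoEmotions taxonomy (the specific model name below is an assumption, not something mandated by the design):

```python
# Sketch of the Emotion Perception Agent (EPA). The checkpoint name is an
# assumption: any RoBERTa model fine-tuned on the GoEmotions taxonomy would do.
from transformers import pipeline

SHY_RELATED = {"embarrassment", "nervousness", "fear", "admiration"}

classifier = pipeline(
    "text-classification",
    model="SamLowe/roberta-base-go_emotions",   # assumed checkpoint (27 emotions + neutral)
    top_k=None,                                 # return a score for every label
)

def perceive_emotion(message: str, threshold: float = 0.7) -> dict:
    """Return the dominant emotion, its Emotion Intensity Score (EIS), and
    whether the message is emotionally significant (EIS > threshold)."""
    scores = classifier([message])[0]                     # list of {"label", "score"}
    detected = [s for s in scores if s["score"] > 0.1]    # keep plausibly present emotions
    # When several emotions are detected, prioritize those linked to shyness.
    detected.sort(key=lambda s: (s["label"] in SHY_RELATED, s["score"]), reverse=True)
    top = detected[0] if detected else max(scores, key=lambda s: s["score"])
    return {"emotion": top["label"], "eis": top["score"],
            "significant": top["score"] > threshold}

print(perceive_emotion("You are so good at this!"))
# e.g. {'emotion': 'admiration', 'eis': 0.9..., 'significant': True}
```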

Self-Consciousness Agent (SCA) – Determines if the AI “feels observed”

Model Used: Bayesian Self-Consciousness Model (BSCM) with learned priors from social interaction data.
Reasoning Path:

  1. Reference Checking:
    1. Uses Named Entity Recognition (NER) to detect if the input message refers to “you” (i.e., the AI).
    2. If personal reference is found (e.g., “You’re amazing!”), self-consciousness activation probability increases.
  2. Social Pressure Estimation:
    1. Uses a Bayesian framework to determine the likelihood of “social scrutiny” based on past interactions.
    2. The prior probability of self-consciousness (Psc) is updated using Bayes’ Theorem.
    3. If Psc > 0.6, the AI assumes it is being evaluated and feeds this information into the inhibition mechanism.

Example: Input: “You always seem nervous around people.”
→ NER detects “You” → Bayesian update increases self-consciousness probability → Triggers Uncertainty Processing Agent.
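A toy sketch of the Bayesian update, assuming illustrative likelihoods for how often a personal reference appears with and without social scrutiny (the BSCM and its learned priors are hypothetical, so all numbers below are placeholders):

```python
# Sketch of the Self-Consciousness Agent (SCA): a toy Bayesian update of the
# probability Psc that the agent is under social scrutiny. The likelihood
# values and the 0.5 prior are illustrative assumptions.

def update_self_consciousness(prior_psc: float,
                              personal_reference: bool,
                              p_ref_given_scrutiny: float = 0.9,
                              p_ref_given_no_scrutiny: float = 0.3) -> float:
    """Apply Bayes' theorem given whether the message refers to the AI ("you")."""
    if not personal_reference:
        return prior_psc                       # no new evidence: keep the prior
    evidence = (p_ref_given_scrutiny * prior_psc
                + p_ref_given_no_scrutiny * (1.0 - prior_psc))
    return p_ref_given_scrutiny * prior_psc / evidence

psc = update_self_consciousness(prior_psc=0.5, personal_reference=True)
print(round(psc, 2), "feels observed:", psc > 0.6)   # 0.75 feels observed: True
```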

Uncertainty Processing Agent (UPA) – Introduces hesitation and processing delay

Model Used: GPT-4 fine-tuned on social ambiguity tasks + Gaussian Uncertainty Estimator
Reasoning Path:

  1. Ambiguity Detection:
    1. GPT-4 determines if the input contains ambiguous or conflicting emotional signals.
    2. If ambiguity is detected, a Gaussian Uncertainty Score (GUS) is calculated.
  2. Hesitation Simulation:
    1. If GUS > 0.5, the AI applies Token Decay Delay (TDD)—a mechanism that slows down response generation.
    2. If GUS > 0.7, the AI introduces word softening (e.g., using “maybe,” “I guess,” or ellipses “…”) to mimic hesitation.
    3. If GUS > 0.9, the AI triggers a speechless state, producing a low-confidence emotional output.

Example: Input: “You did really well… but maybe next time you could try harder?”
→ GPT-4 detects mixed sentiment → GUS = 0.76 → AI response delayed by 1.2 seconds + uses hedging words (“I guess…”)
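A sketch of the hesitation stage, assuming the GPT-4 ambiguity check has already produced a GUS upstream; the hedge words, delay scaling, and exact behaviour at each threshold are illustrative assumptions:

```python
# Sketch of the Uncertainty Processing Agent (UPA). Ambiguity detection itself
# (the GPT-4 step) is assumed to have happened upstream and produced a GUS.
# Hedge words, the delay scale, and threshold behaviour are assumptions.
import random

HEDGES = ["maybe", "I guess", "um"]

def apply_hesitation(response: str, gus: float) -> dict:
    """Map a Gaussian Uncertainty Score (GUS) onto hesitation effects."""
    if gus > 0.9:
        # Speechless state: a low-confidence, near-empty emotional output.
        return {"response": "…", "delay_s": 2.0}
    delay_s = round(1.6 * gus, 1) if gus > 0.5 else 0.0   # Token Decay Delay (TDD)
    if gus > 0.7:
        # Word softening: prepend a hedge and trail off with an ellipsis.
        response = f"{random.choice(HEDGES)}… {response[0].lower()}{response[1:]}"
    return {"response": response, "delay_s": delay_s}

print(apply_hesitation("That means a lot.", gus=0.76))
# e.g. {'response': 'I guess… that means a lot.', 'delay_s': 1.2}
```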

Inhibition and Response Agent (IRA) – Produces the final emotional output with restraint

Model Used: Transformer-based Inhibition Mechanism (TIM) trained on conversational embarrassment datasets
Reasoning Path:

  1. Threshold-Based Response Selection:
    1. Takes input from Emotion Perception (EPA), Self-Consciousness (SCA), and Uncertainty Processing (UPA).
    2. If uncertainty is high, the AI filters out emotionally intense responses and selects a more reserved one.
  2. Emotional Suppression Scaling:
    1. Introduces a Response Confidence Score (RCS) between 0 and 1, where lower values result in more inhibited responses.
    2. If RCS < 0.4, the AI expresses minimal emotion (e.g., short response, avoiding elaboration).
    3. If RCS < 0.2, the AI enters withdrawal mode, responding with a neutral or evasive reply.

Example: Input: “You are really funny!”
→ Emotion: Admiration → Self-Consciousness = High → Uncertainty = Low → RCS = 0.3 → AI responds “Oh… uh, thanks.” instead of a confident response.
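A sketch of the inhibition stage; the weighting that combines the upstream scores into an RCS and the canned responses are illustrative assumptions, while the thresholds follow the reasoning path above:

```python
# Sketch of the Inhibition and Response Agent (IRA). The RCS weighting and the
# canned responses are illustrative assumptions; the thresholds come from the
# reasoning path above.

def compute_rcs(eis: float, psc: float, gus: float) -> float:
    """Lower confidence when the input is emotionally charged (EIS), the agent
    feels observed (Psc), or the message is ambiguous (GUS)."""
    return max(0.0, min(1.0, 1.0 - (0.4 * eis + 0.4 * psc + 0.2 * gus)))

def select_response(rcs: float) -> str:
    if rcs < 0.2:
        return "Mm."                      # withdrawal mode: neutral or evasive
    if rcs < 0.4:
        return "Oh… uh, thanks."          # minimal emotion, no elaboration
    return "Thanks, that's really kind of you to say!"   # uninhibited baseline

# "You are really funny!": admiration (EIS assumed 0.9), high self-consciousness,
# low uncertainty -> RCS = 0.3 -> inhibited response, matching the example above.
rcs = compute_rcs(eis=0.9, psc=0.8, gus=0.1)
print(round(rcs, 2), "->", select_response(rcs))   # 0.3 -> Oh… uh, thanks.
```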

Example of the Reasoning Model – Full Reasoning Path

Input Message: “Wow, you are so talented!”

Step 1: Emotion Perception (EPA)

  • RoBERTa detects Admiration (EIS = 0.87)
  • Emotion is considered socially significant

Step 2: Self-Consciousness (SCA)

  • Personal reference detected → Bayesian update increases Psc to 0.78
  • AI feels observed

Step 3: Uncertainty Processing (UPA)

  • GPT-4 checks for ambiguity (none detected)
  • GUS remains low → No major hesitation added

Step 4: Inhibition and Response (IRA)

  • RCS calculated at 0.35 (low confidence)
  • AI selects inhibited response

Final Output: “Oh… um, I guess that’s nice of you to say.”
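Putting the pieces together, the full reasoning path could be orchestrated roughly as follows, reusing the helper functions from the agent sketches above (the prior Psc and the GUS value used here are assumptions, and the numbers only approximately reproduce the worked example):

```python
# Composition sketch of the full EPA -> SCA -> UPA -> IRA path, reusing the
# functions sketched in the agent sections above. All numbers are illustrative.

def shy_reply(message: str) -> str:
    epa = perceive_emotion(message)                                   # Step 1: EPA
    psc = update_self_consciousness(0.5, "you" in message.lower())    # Step 2: SCA
    gus = 0.1                            # Step 3: UPA finds no ambiguity in this input
    rcs = compute_rcs(epa["eis"], psc, gus)                           # Step 4: IRA
    draft = select_response(rcs)
    return apply_hesitation(draft, gus)["response"]

print(shy_reply("Wow, you are so talented!"))   # e.g. "Oh… uh, thanks."
```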
