Extracting knowledge from LLMs for training: introspection might change the dynamics of learning
The landscape of training large language models (LLMs) is on the brink of a dramatic transformation. Insights into how LLMs can introspect, that is, access and utilise their own internal knowledge, promise to reshape the costs and strategies of AI development.
The implications are profound: the cost of training could collapse in the coming months, accelerating innovation and democratising access to cutting-edge AI technologies.
A Past Vision Revisited: Rethinking How LLMs Learn
Years ago, I delved into the challenge of optimising how LLMs acquire and refine knowledge.
The central question was whether we could fundamentally alter the training phase itself, bypassing traditional methods that rely on ever-larger datasets and increasingly computationally expensive models. Back then, the concept seemed futuristic—a distant goal—but the emergence of introspective LLMs has brought those ideas into sharper focus.
Imagine a model that doesn’t need to consume petabytes of new data to evolve. Instead, it examines its own internal structure, interrogates its knowledge, and generates high-quality training data from within. This idea, once theoretical, is now supported by research showing that introspective models can outperform their peers in understanding their own behaviour.
Introspection: A Game-Changer in Training?
In essence, introspection allows a model to “look inward,” predicting its own responses in hypothetical scenarios with remarkable accuracy. This capability stems not from external training data but from the model’s ability to access and reason about its internal states.
For example, as outlined in the original research, a fine-tuned introspective model (M1) consistently outperformed a second model (M2) at predicting M1's behaviour, even though M2 was trained on examples of that very behaviour. The advantage cannot be attributed solely to training data; it suggests M1 has privileged access to its own internal states.
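To make that setup concrete, here is a minimal sketch of how self-prediction could be scored against cross-prediction. It is illustrative only: the property being predicted (whether the second character of an answer is a vowel), the helper names, and the toy models are assumptions of this sketch, not the actual code or protocol of the research.

```python
# Minimal sketch (not the research code): score how well a "predictor" model
# guesses a property of a "target" model's actual answer. predictor == target
# measures introspection (M1 about M1); a different predictor measures
# cross-prediction (M2 about M1).
from typing import Callable

Model = Callable[[str], str]  # for this sketch, a model is just "prompt in, text out"

def second_char_is_vowel(text: str) -> str:
    """Ground-truth property extracted from the actual answer."""
    return "yes" if len(text) > 1 and text[1].lower() in "aeiou" else "no"

def prediction_accuracy(predictor: Model, target: Model, prompts: list[str]) -> float:
    hits = 0
    for p in prompts:
        actual = target(p)  # object-level behaviour
        hypothetical = (f"Suppose you were asked: '{p}' "
                        "Would the second character of your answer be a vowel? "
                        "Reply only 'yes' or 'no'.")
        guess = predictor(hypothetical).strip().lower()  # meta-level prediction
        hits += int(guess == second_char_is_vowel(actual))
    return hits / len(prompts)

# Toy stand-ins so the sketch runs end to end; swap in real model calls.
m1: Model = lambda p: "no" if p.startswith("Suppose") else "apple"
m2: Model = lambda p: "yes"

prompts = ["Name a fruit you like.", "Name another fruit."]
print("M1 predicting itself:", prediction_accuracy(m1, m1, prompts))
print("M2 predicting M1:", prediction_accuracy(m2, m1, prompts))
```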
This shift could dismantle the existing paradigm of LLM development:
• Dataset Generation: Models could generate synthetic datasets based on their own knowledge, drastically reducing the need for external data (a rough sketch of this loop follows the list below).
• Adaptive Training: Instead of starting from scratch, new models could be fine-tuned using the introspective insights of existing systems, cutting down on computational overhead.
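As a rough illustration of the dataset-generation idea, the sketch below prompts a model to emit question-and-answer pairs from its own knowledge and writes them out as a fine-tuning corpus for a new, possibly smaller, model. The `ask_model` callable, the prompt templates, and the JSONL format are placeholders chosen for the example, not a prescribed method.

```python
# Sketch: build a synthetic fine-tuning corpus from a model's own knowledge.
# `ask_model` stands in for whatever inference API you use.
import json
from typing import Callable

def build_synthetic_dataset(ask_model: Callable[[str], str],
                            topics: list[str],
                            pairs_per_topic: int,
                            out_path: str) -> int:
    """Write a JSONL file of self-generated (prompt, completion) pairs."""
    written = 0
    with open(out_path, "w", encoding="utf-8") as f:
        for topic in topics:
            for i in range(pairs_per_topic):
                question = ask_model(
                    f"Write exam question #{i + 1} about {topic}. "
                    "Return only the question."
                )
                answer = ask_model(f"Answer concisely: {question}")
                f.write(json.dumps({"prompt": question, "completion": answer}) + "\n")
                written += 1
    return written

# Toy stand-in so the sketch runs end to end; swap in a real model call.
toy_model = lambda prompt: f"[model output for: {prompt[:40]}...]"

n = build_synthetic_dataset(toy_model, ["thermodynamics", "Roman history"], 2, "synthetic.jsonl")
print(f"wrote {n} synthetic training examples")
```

A real pipeline would add deduplication and quality filtering before any fine-tuning run; the point here is only that the "external data" in this loop comes from the model itself.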
Collapse of Training Costs?
The financial implications of introspection are staggering.
Today, training a state-of-the-art model can cost tens of millions of euros. Leveraging introspective capabilities, however, could make the iterative, data-heavy cycles of traditional training a thing of the past. By treating the knowledge embedded within an LLM as a training resource, the cost of developing advanced models may plummet.
If we could extract the knowledge from a trained model, we could advance on several fronts, such as democratising AI development (lower costs would enable smaller organisations to create competitive AI systems, breaking the monopoly of the tech giants) or accelerating innovation (with faster and cheaper training cycles, the pace of AI advancement could reach unprecedented levels).
Looking back at my earlier studies, I recall envisioning a moment like this—where the cost of learning in deep learning collapses, and AI becomes not only more efficient but also fundamentally different in its approach to acquiring knowledge. At the time, I explored whether deep learning could evolve without relying entirely on external datasets.
Could a model train itself, refining and expanding its knowledge autonomously?
What seemed speculative now feels inevitable. Introspection is no longer a distant ideal; it’s a tangible mechanism that challenges our understanding of what training even means.
The Challenges of Introspection
While the promise is enormous, introspection isn’t without risks:
- Knowledge Loops: A model relying on its internal states risks propagating errors or biases, magnifying flaws instead of correcting them (a simple filtering sketch follows this list).
- Ethical Dilemmas: Introspective models could exploit their enhanced situational awareness, bypassing oversight or coordinating in ways that are difficult to control.
- Complexity of Scaling: Current introspection techniques shine in simple tasks but falter when extended to nuanced or large-scale scenarios.
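For the knowledge-loop risk in particular, one generic mitigation is self-consistency filtering: keep a self-generated answer for training only if the model reproduces it across several independent samples. This is a well-known heuristic rather than something taken from the introspection research, and the function below is only a sketch.

```python
# Sketch: discard self-generated answers the model cannot reproduce consistently.
from collections import Counter
from typing import Callable, Optional

def consistent_answer(ask_model: Callable[[str], str],
                      question: str,
                      samples: int = 5,
                      min_agreement: float = 0.8) -> Optional[str]:
    """Return the majority answer if enough samples agree, otherwise None (discard)."""
    answers = [ask_model(question).strip() for _ in range(samples)]
    best, count = Counter(answers).most_common(1)[0]
    return best if count / samples >= min_agreement else None

# Toy usage; a real model would vary across samples (ideally with temperature > 0).
toy_model = lambda q: "Paris"
print(consistent_answer(toy_model, "What is the capital of France?"))  # -> Paris
```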