Connecting the Dots: How AI Can Make Sense of the Real World

Archetype AI Team
  • 4 mins
  • December 12, 2024

Unlike machines, humans have a kind of sixth sense — a unique ability to instantly connect and interpret scattered signals and turn them into a cohesive narrative. Yet our smart devices, despite access to billions of sensors, haven't been able to match this basic human ability. By fusing simple sensor data with contextual awareness, Archetype AI's large behavior model can achieve contextual understanding of events in the physical world — much like humans do. Read on to learn how Newton "learns" to make sense of complex situations without relying on cameras and to deliver better experiences for people.

Learning from Human Intelligence

Humans have an extraordinary ability to derive meaning and behaviors from scattered cues and incomplete information. For example, seeing someone biking with children on a weekday morning naturally leads us to infer that they are likely heading to school. Similarly, the sound of broken glass followed by footsteps immediately suggests the possibility of an accident or an intruder, depending on the time and place where the event occurs.

Our brains can seamlessly and effortlessly connect disparate sensory cues and contextual information and make quick, intuitive — often subconscious — decisions.

This capability to "connect the dots" is central to human intelligence. It emerges from our brain's ability to effortlessly fuse scattered details, such as perceptual cues (e.g., seeing kids on bikes) and high-level contextual knowledge (e.g., weekday mornings typically involve school runs), into a cohesive understanding of the physical world in the present moment. Importantly, this capability allows us to predict, make decisions, and plan our next steps even when confronted with specific situations that we have never seen before.

Using data from just two sensors — a microphone that detects a few predefined sounds and a radar that captures human presence — Newton can integrate spatial and temporal context to analyze sequences of events and generate rich, open-ended interpretations of what is happening.

Working with Infineon, a global semiconductor manufacturer and leader in sensing and IoT, we are exploring how such powerful human-like functions can be developed and deployed in real-world applications using generative physical AI models like Newton. These models seamlessly integrate real-time events captured by simple, ubiquitous sensors — such as radars, microphones, proximity sensors, and environmental sensors — with high-level contextual information to generate rich and detailed interpretations of real-world behaviors. Importantly, this is achieved without requiring developers to explicitly define such interpretations or relying on complex, expensive, and privacy-invasive sensors like cameras.

Understanding the Sensor Fusion Challenge

The real world — buildings, appliances, vehicles, factory floors, and electrical grids — runs on sensor data. Hundreds of billions of sensors operate around us, capturing various aspects of physical reality. While interpreting a single sensor signal in isolation is relatively straightforward, fusing data from a multitude of sensors distributed across a physical space into a single actionable interpretation remains a significant challenge.

Today, data from each sensor is interpreted in isolation, missing the dynamic spatial patterns of behavior that span a whole environment. Fusing sensor data is key to unlocking the full potential of smart environments.

The challenge lies in the exponential growth of possible interpretations as the number of sensors, locations, and event histories increases. Even for very simple systems, the number of potential interpretations rapidly becomes overwhelming, far beyond what humans can handle. For example, the system of two binary sensors shown in the figure below would generate 1,536* possible scenarios. Adding just one more binary sensor skyrockets the number of interpretable contexts to 24,576!

Previous events matter in physical systems. Two binary signals—like "presence and absence" or "alarm and no alarm"—can have completely different meanings depending on the sequence of earlier events, even with just four events in the history.

This happens because most real-world processes are non-Markovian, meaning they depend not only on current observations but also on past events: identical events can mean entirely different things based on what happened before, as shown in the figure above. Interpreting sensor data therefore requires combining it with its history, which greatly increases the number of possible interpretations. As a result, manually programming such systems to be robust and comprehensive has been practically impossible, until now.
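To make the combinatorics concrete, here is a quick back-of-the-envelope check of the numbers above. It is a minimal sketch assuming, as in the footnote, two deployment locations, three times of day, and a history of four events.

```python
# Back-of-the-envelope count of the distinct scenarios a rule-based system
# would have to cover. Assumptions (from the footnote): 2 locations,
# 3 times of day, and a history of 4 events.

def num_scenarios(num_binary_sensors: int,
                  history_length: int = 4,
                  locations: int = 2,
                  times_of_day: int = 3) -> int:
    states_per_event = 2 ** num_binary_sensors             # joint sensor readings at one moment
    event_histories = states_per_event ** history_length   # ordered histories of events
    return event_histories * locations * times_of_day

print(num_scenarios(2))  # 1,536 scenarios with two binary sensors
print(num_scenarios(3))  # 24,576 scenarios after adding one more binary sensor
```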

Unlocking Sensor and Context Fusion with AI

Generative physical AI models such as Newton are able to overcome these challenges for the first time, unlocking a boundless range of applications. We explored Newton's ability to interpret real-world context and human activities by combining radar and microphone data. In our demos, Newton powers a home assistant in a kitchen setting, helping a user through their morning routine in one scenario and, in another, keeping residents safe when the smoke alarm goes off.

The model can infer open-ended interpretations directly from sensor data across timelines of arbitrary length. By combining multiple simple, privacy-friendly sensors into a larger network, it can generate detailed descriptions of dynamic scenes and contexts.

When fused with additional contextual data — such as location, time, day of the week, weather, news, or user preferences — Newton can provide personalized and relevant recommendations or services. This capability makes it possible to go beyond basic sensor interpretations, offering meaningful insights tailored to the needs of individual users or organizations.

When Newton is provided with time-of-day context, it can recognize, for example, that a nighttime alarm calls for a different interpretation than a daytime alarm. In this video, when someone leaves the kitchen without turning off the alarm, Newton suggests notifying other residents.
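As a rough illustration of what such context fusion might look like from an application's point of view, the sketch below bundles a short sensor-event history with contextual metadata for the kitchen alarm scenario. The field names and structure are illustrative assumptions for this post, not Newton's actual interface.

```python
import json
from datetime import datetime

# Hypothetical payload an application might assemble before querying a
# physical AI model. All field names here are assumptions, not Newton's API.
observation = {
    "context": {
        "location": "kitchen",
        "timestamp": datetime(2024, 12, 12, 2, 14, 0).isoformat(),  # nighttime
        "day_of_week": "Thursday",
    },
    # Order matters: the same readings mean different things after different histories.
    "events": [
        {"sensor": "microphone", "reading": "smoke_alarm", "time": "02:13:55"},
        {"sensor": "radar", "reading": "presence", "time": "02:14:02"},
        {"sensor": "radar", "reading": "absence", "time": "02:14:40"},
    ],
    "question": "What is happening, and what should the home assistant do?",
}

print(json.dumps(observation, indent=2))
# A model fusing these signals with the nighttime context might conclude that
# a resident left the kitchen while the alarm was still sounding and recommend
# notifying the other residents.
```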

Newton's ability to seamlessly fuse large amounts of sensor data and contextual information at scale opens the door to a wide range of exciting applications. Here are a few examples:

  • Smart Homes: Detect human activities in a privacy-respecting way without relying on cameras, and deliver personalized services such as security, safety, wellness, and entertainment.
  • Automotive: Monitor driver behavior, such as signs of drowsiness or health emergencies. Understand in-car context to provide passengers with mobility-related services tailored to their needs.
  • Manufacturing: Improve safety, efficiency, and adherence to best practices by integrating outputs from equipment, occupancy, and environmental sensors. This approach can scale from individual machines to entire factory floors and facilities.

These examples illustrate Newton's potential to revolutionize industries by leveraging sensor data to create intelligent, context-aware solutions.

Looking forward, given Newton's fundamental capability to interpret physical event data, a key question arises: can Newton also perform "next event prediction," similar to how LLMs perform "next word prediction"? Can Newton predict the future evolution of the physical world based on current and past observations?
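As a loose, purely illustrative analogy (not a description of Newton's implementation), next event prediction can be framed the same way next word prediction is: given an observed history of events, estimate which event is most likely to come next. The toy sketch below does this with simple frequency counts over hypothetical event logs.

```python
from collections import Counter, defaultdict

# Hypothetical logged event sequences, e.g. from the kitchen scenario above.
sequences = [
    ("presence", "smoke_alarm", "presence", "alarm_off"),
    ("presence", "smoke_alarm", "absence", "notify_residents"),
    ("presence", "smoke_alarm", "absence", "notify_residents"),
]

# Condition on the full preceding history rather than just the last event,
# since (as noted earlier) physical processes are rarely Markovian.
next_event = defaultdict(Counter)
for seq in sequences:
    for i in range(1, len(seq)):
        next_event[seq[:i]][seq[i]] += 1

def predict(history: tuple) -> str:
    """Return the most frequent next event after this history (toy estimate)."""
    return next_event[history].most_common(1)[0][0]

print(predict(("presence", "smoke_alarm", "absence")))  # -> 'notify_residents'
```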

This project was completed in collaboration with Infineon, a global semiconductor manufacturer and leader in sensing and IoT. To learn more about the partnership, check out how Infineon and Archetype AI are unlocking the future of AI-powered sensors.

*We assume here that the system is deployed in two locations (e.g., kitchen and living room) and that three times of day are provided to Newton as additional context: morning, midday, and night.
