Unlike machines, humans have a sixth sense: a unique ability to instantly connect and interpret scattered signals and turn them into a cohesive narrative. Yet our smart devices, despite access to billions of sensors, haven't been able to match this human ability. By fusing simple sensor data with contextual awareness, Archetype AI's large behavior model can achieve a contextual understanding of events in the physical world, much like humans do. Read on to learn how Newton "learns" to make sense of complex situations and deliver better experiences for people, all without relying on cameras.
Learning from Human Intelligence
Humans have an extraordinary ability to derive meaning from scattered cues and incomplete information. For example, seeing someone biking with children on a weekday morning naturally leads us to infer that they are likely heading to school. Similarly, the sound of breaking glass followed by footsteps immediately suggests an accident or an intruder, depending on the time and place where the event occurs.
This capability to "connect the dots" is central to human intelligence. It emerges from our brain's ability to effortlessly fuse scattered details, such as perceptual cues (e.g., seeing kids on bikes) and high-level contextual knowledge (e.g., weekday mornings typically involve school runs), into a cohesive understanding of the physical world in the present moment. Importantly, this capability allows us to predict, make decisions, and plan our next steps even when confronted with specific situations that we have never seen before.
Using data from just two sensors, a microphone that detects a few predefined sounds and a radar that captures human presence, Newton can integrate spatial and temporal context to analyze sequences of events and generate open-ended interpretations of reality.
Working with Infineon, a global semiconductor manufacturer and leader in sensing and IoT, we are exploring how such powerful human-like functions can be developed and deployed in real-world applications using generative physical AI models like Newton. These models seamlessly integrate real-time events captured by simple, ubiquitous sensors — such as radars, microphones, proximity sensors, and environmental sensors — with high-level contextual information to generate rich and detailed interpretations of real-world behaviors. Importantly, this is achieved without requiring developers to explicitly define such interpretations or relying on complex, expensive, and privacy-invasive sensors like cameras.
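To make the idea concrete, here is a minimal sketch of how fused sensor and context data might look before it reaches a generative model. The event schema, field names, and JSON layout are our own illustrative assumptions; the article does not describe Newton's actual input format.

```python
import json

# Illustrative only: the article does not specify Newton's input schema,
# so this event format is an assumption, not the real API.
sensor_events = [
    {"t": "2024-06-03T07:42:10", "sensor": "radar",      "reading": "person_present"},
    {"t": "2024-06-03T07:44:02", "sensor": "microphone", "reading": "smoke_alarm_on"},
]

# High-level context fused with the raw events.
context = {"location": "kitchen", "time_of_day": "morning", "day_of_week": "Monday"}

# A generative physical AI model can condition on the whole fused timeline
# at once, rather than interpreting each sensor stream in isolation.
model_input = json.dumps({"events": sensor_events, "context": context}, indent=2)
print(model_input)
```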
Understanding the Sensor Fusion Challenge
The real world — buildings, appliances, vehicles, factory floors, and electrical grids — runs on sensor data. Hundreds of billions of sensors operate around us, capturing various aspects of physical reality. While interpreting a single sensor signal in isolation is relatively straightforward, fusing data from a multitude of sensors distributed across a physical space into a single actionable interpretation remains a significant challenge.
Today, sensor signals are typically interpreted in isolation, missing dynamic spatial patterns of behavior. Fusing sensor data is key to unlocking the full potential of smart environments.
The challenge lies in the exponential growth of possible interpretations as the number of sensors, locations, and event histories increases. Even for very simple systems, the number of potential interpretations rapidly becomes overwhelming, far beyond what humans can handle. For example, the system of two binary sensors shown in the figure below would generate 1,536* possible scenarios. Adding just one more binary sensor skyrockets the number of interpretable contexts to 24,576!
This happens because most real-world processes are non-Markovian: they depend not only on current observations but also on past events, so identical events can mean entirely different things based on what happened before, as shown in the figure above. Interpreting sensor data therefore requires combining it with its history, which greatly multiplies the number of possible interpretations. In short, manually programming such systems to be robust and comprehensive has been practically impossible until now.
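The quoted figures can be reproduced by one plausible counting, sketched below. The exact enumeration is defined by the figure, so the factors here (a binary state and a location per sensor event, a two-step history, and the times of day and locations as global context) should be read as assumptions that happen to be consistent with the totals above.

```python
# One counting consistent with the article's totals; the individual factors
# are assumptions, since the exact enumeration is defined by the figure.
def num_interpretations(sensors: int,
                        states: int = 2,        # binary sensors
                        locations: int = 2,     # e.g., kitchen and living room
                        times_of_day: int = 3,  # morning, midday, night
                        history: int = 2) -> int:
    per_event = states * locations                # what fired, and where
    sequences = per_event ** (sensors * history)  # events across the history window
    return locations * times_of_day * sequences   # times the global context

print(num_interpretations(sensors=2))  # 1536
print(num_interpretations(sensors=3))  # 24576
```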
Unlocking Sensor and Context Fusion with AI
Generative physical AI models, such as Newton, are able to overcome these challenges for the first time, unlocking a boundless range of applications. We explored Newton's ability to interpret real-world context and human activities by combining radar and microphone data. In our demo scenarios, Newton powers a home assistant in a kitchen setting, helping a user through their morning routine in one scenario and keeping residents safe when the smoke alarm goes off in another.
The model can infer an unlimited number of interpretations directly from sensor data across timelines of any length. By combining multiple simple, privacy-friendly sensors into a larger network, it can generate detailed descriptions of dynamic scenes and contexts.
When fused with additional contextual data — such as location, time, day of the week, weather, news, or user preferences — Newton can provide personalized and relevant recommendations or services. This capability makes it possible to go beyond basic sensor interpretations, offering meaningful insights tailored to the needs of individual users or organizations.
When Newton is provided with time-of-day context, it can recognize, for example, that a nighttime alarm needs a different interpretation than a daytime alarm. In this video, when someone leaves the kitchen without turning off the alarm, Newton suggests notifying other residents.
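As a hypothetical illustration of this kind of context conditioning (the helper below is our own sketch, not Newton's API), the same event stream can be serialized together with time-of-day context, so a generative model's interpretation, and its suggested action, can differ between night and day.

```python
# Hypothetical sketch, not Newton's real API: the same events are paired
# with different time-of-day context before reaching a generative model.
def build_prompt(events: list[str], time_of_day: str) -> str:
    lines = [f"[context] time_of_day={time_of_day}"]
    lines += [f"[event] {e}" for e in events]
    lines.append("[task] interpret the situation and suggest an action")
    return "\n".join(lines)

events = ["microphone: smoke_alarm_on", "radar: person_left_kitchen"]

# Daytime: the user likely stepped away from cooking; nighttime: residents
# may be asleep, so the interpretation should escalate and notify everyone.
print(build_prompt(events, "morning"))
print(build_prompt(events, "night"))
```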
Newton's ability to seamlessly fuse large amounts of sensor data and contextual information at scale opens the door to a wide range of exciting applications. Here are a few examples:
- Smart Homes: Detect human activities in a privacy-respecting way without relying on cameras, and deliver personalized services such as security, safety, wellness, and entertainment.
- Automotive: Monitor driver behavior, such as signs of drowsiness or health emergencies. Understand in-car context to provide passengers with mobility-related services tailored to their needs.
- Manufacturing: Improve safety, efficiency, and adherence to best practices by integrating outputs from equipment, occupancy, and environmental sensors. This approach can scale from individual machines to entire factory floors and facilities.
These examples illustrate Newton's potential to revolutionize industries by leveraging sensor data to create intelligent, context-aware solutions.
Looking forward, given Newton's fundamental capability to interpret physical event data, a key question arises: can Newton also perform "next event prediction," much as LLMs perform "next word prediction"? Can Newton predict the future evolution of the physical world from current and past observations?
This project was completed in collaboration with Infineon. To learn more about the partnership, check out how Infineon and Archetype AI are unlocking the future of AI-powered sensors.
*We assume here that the system is deployed in two locations (e.g., kitchen and living room) and that three times of day are provided to Newton as additional context: morning, midday, and night.