Asilla has, to date, developed proprietary models in pose estimation and behavior recognition, accumulated training data, and built up extensive knowledge of false positive and false negative patterns through real-world operational experience.
By integrating these assets into a VLM, the company aims to achieve detection accuracy at a practical operational level — something that general-purpose vision-language models struggle to deliver.
As an initial step, Asilla will work to reduce false alarms and missed detections in its AI security system "AI Security asilla," with a longer-term vision of building an AI platform capable of structurally understanding events in physical spaces.
In locations frequented by large numbers of unspecified individuals — such as commercial facilities, railway stations, and public spaces — the flow and behavior of people constantly change, and unpredictable incidents can occur at any time. As the need grows both to prevent accidents and trouble before they happen and to operate efficiently without relying on excessive staffing increases, expectations for video AI continue to rise year after year.
In recent years, the emergence of general-purpose VLMs (Vision-Language Models) has driven rapid advances in AI technologies capable of understanding video and language in an integrated manner. However, while general-purpose VLMs can handle a wide variety of tasks, several challenges remain in domains such as security sites, where high precision and rapid response are essential.
First, their specialized understanding of human behavioral patterns and signs of danger is shallow, limiting their judgment accuracy in security contexts.
Second, they are not optimized for facility-specific conditions such as camera placement environments, lighting conditions, and operational rules.
Third, feedback data identifying what constitutes a false alarm and what constitutes a missed detection is indispensable for raising AI accuracy to practical operational levels, yet conventional AI systems have not accumulated such data.
Asilla has embarked on a product implementation that integrates its proprietary technological assets into a VLM as a unique approach to addressing these limitations of conventional AI.
The core of differentiation in Asilla's proprietary VLM lies in two technological assets that conventional AI systems do not possess.
Since its founding, Asilla has focused exclusively on research and development of human pose estimation technology and behavior recognition AI. The company's proprietary models — capable of classifying diverse behavioral patterns such as falls, intrusions, suspicious behavior, wandering, and crowding with high accuracy — along with the large-scale training datasets built throughout that development process, form a domain-specific knowledge base that general-purpose VLMs do not possess.
When a VLM interprets video as "meaning," this behavior recognition knowledge base supplements pose-level motion information, enabling more accurate estimation of behavioral intent and degree of danger that would be difficult to determine through general visual information analysis alone.
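The fusion described above can be illustrated with a minimal sketch. All names, thresholds, and the toy pose rule below are hypothetical stand-ins, not Asilla's actual design: a pose-based classifier produces a coarse behavior cue, which is then injected into the VLM's input as domain context alongside the visual scene description.

```python
# Hypothetical sketch: pose keypoints -> coarse behavior label,
# which is then fed to a VLM as domain-specific context.
# Keypoint names and the 0.1 threshold are illustrative only.

def classify_behavior(keypoints):
    """Toy rule over normalized (x, y) keypoints: if the head is
    at roughly the same height as the hips, flag a possible fall."""
    head_y = keypoints["head"][1]
    hip_y = keypoints["hip"][1]
    return "possible_fall" if abs(head_y - hip_y) < 0.1 else "upright"

def build_vlm_prompt(behavior_label, scene_desc):
    """Fuse the behavior-recognition output with the scene so the
    VLM reasons over both pose-level motion and visual context."""
    return (f"Scene: {scene_desc}\n"
            f"Pose-based behavior cue: {behavior_label}\n"
            "Assess the danger level and describe the situation.")

label = classify_behavior({"head": (0.5, 0.52), "hip": (0.4, 0.50)})
prompt = build_vlm_prompt(label, "a person on the floor near an escalator")
```

The point of the design is that the VLM never judges from raw pixels alone; the pose-derived cue narrows its interpretation toward security-relevant readings of the scene.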
"AI Security asilla" is deployed across a variety of facilities, including commercial complexes, railway stations, and office buildings, and has accumulated vast amounts of feedback data in the process. Understanding under what circumstances false alarms (False Positives) occur and what types of events result in missed detections (False Negatives) — this real-world operational data is the most valuable asset in improving AI accuracy.
While general-purpose VLMs are capable of broad video understanding, they lack the knowledge of "what is typically misidentified in a security context." By incorporating this knowledge of false positive and false negative patterns into the VLM's learning and inference processes, Asilla aims to achieve the level of precision required for AI that is truly usable in the field.
A VLM (Vision-Language Model) is an AI model capable of understanding video and natural language in an integrated manner. Whereas conventional AI relied on object detection, posture assessment, and threshold-based anomaly detection, Asilla's VLM — fused with the company's proprietary technological assets — enables contextual semantic understanding such as: "there is a possibility this person is about to fall," "unusual lingering behavior," "behavior that may contain dangerous intent," and "a situation where crowding is beginning to develop."
By fusing the behavior recognition technology it has cultivated over the years with VLM, Asilla aims to evolve from "AI that detects movement" to "AI that understands situations."
As the first phase of this initiative, Asilla will work to suppress false alarms (False Positives) and missed detections (False Negatives) in its existing product, the AI security system "AI Security asilla."
In field operations, the quality of detection alerts is just as important as their quantity. When false alarms occur frequently, unnecessary verification workloads increase, potentially delaying judgment on truly critical events. Missed detections, on the other hand, result in failing to detect events that should have been caught — a serious risk from a safety assurance standpoint.
By combining detection results with the proprietary VLM, those results can be contextually supplemented and re-evaluated, enabling judgment that accounts not only for "what has happened" but also "what is in the process of happening." This will suppress unnecessary alerts while reducing missed detections, thereby improving the reliability of notifications. As a result, Asilla aims to simultaneously achieve greater efficiency in on-site verification work and enhanced reliability in safety response.
Asilla's vision extends beyond merely reducing false alarms and missed detections. By building on a VLM foundation, the company will evolve video from a simple record of events into data that captures "situations" and "the meaning of behavior."
Looking ahead, Asilla aims to realize more advanced video utilization, including sophisticated extraction of early warning signs of danger, automatic summarization of behaviors and events, natural language-based video search, and structured comprehension of facility-wide events.
This will support a shift from a "reactive operational model" — responding after accidents and incidents have already occurred — to a "proactive operational model" that identifies early signs and prepares for them before they escalate. Asilla will continue its efforts to build an AI platform that understands events in physical spaces and supports the safety and sustainability of social infrastructure.

AI Security asilla is a system that uses AI to analyze footage from existing security cameras 24 hours a day, 365 days a year, instantly detecting abnormal behaviors such as violence, falls, and intrusions, as well as attention-warranting behaviors such as wandering, crowding, and signs of physical distress.
Amid the increasingly severe shortage of security personnel, the system detects irregularities that are easily missed through human monitoring, and immediately notifies security guards and facility managers. Since it can leverage existing cameras without additional equipment investment, it is a next-generation security solution that enables high levels of safety to be maintained even with limited personnel.
Representative: Tsuyoshi Onoe, Representative Director, CEO & COO
Headquarters: 1-4-2 Nakamachi, Machida, Tokyo
Business Description: Development and provision of various products and solutions based on behavior recognition AI
Official Website: https://en.asilla.com/