NEC Develops AI Technology for Digitalizing Work Tasks

NEC Develops AI Technology for Digitalizing Work Tasks Without Pre-Training, Using Video From Multiple Cameras Covering Wide-Area Worksites

Contributing to worksite optimization through the recognition of work tasks based solely on text input

NEC announced that it has developed AI technology that recognizes and digitalizes the tasks of workers without pre-training, using on-site video from wide-area worksites covered by multiple cameras, such as distribution warehouses, factories, and construction sites.

As this newly developed technology can be installed immediately at industrial workplaces, it will contribute to the visualization of entire worksites, which had previously not been possible, thereby improving productivity, optimizing the allocation of human resources, and streamlining workflows. NEC intends to commercialize this technology by fiscal year 2026.

While labor shortages are intensifying at distribution warehouses, factories, construction sites, and other industrial workplaces, many processes reliant on manual labor remain. As a result, there is growing demand to optimize the allocation of personnel and work processes by visualizing working conditions, so that the limited labor force can be used more effectively. Although technology for recognizing work tasks from video already exists, recognizing specific tasks at a given workplace has required a great deal of time and effort for preparation, including collecting on-site video data and training AI models.

Moreover, to digitalize work tasks over an entire wide-area worksite using multiple cameras, workers must be identified across video from all cameras, and the task recognition results must be consolidated for each worker. With conventional technology, however, it has been challenging to accurately distinguish workers wearing identical uniforms and to continue identifying the same person without error across multiple cameras.

The features of NEC’s newly developed technology are as follows.

1. Can be installed immediately since it is capable of recognizing work tasks in video from text input alone

Utilizing a vision language model (VLM), NEC has developed AI technology capable of recognizing a wide variety of work tasks without the need for pre-training on video data. Recognition simply requires text input describing each work task, such as "retrieving packages from a shelf" for picking tasks or "pushing a cart to transport items" for cart transporting tasks.

In the past, work task recognition required collecting and annotating video data and then training an AI model, which could take several weeks to complete. In addition, identifying the relevant objects that workers interact with or operate has conventionally proven difficult, so recognizing tasks from video at industrial sites where various objects are intermingled has been a challenge.

This newly developed technology first (A) uses the VLM in advance to analyze and extract features from the text input describing each work task. When analyzing video, the technology (B) identifies the relevant objects that a worker interacts with or operates, using a proprietary AI model for capturing relationships between people and objects, and then uses the VLM to extract features from images containing the worker and the identified objects. By comparing and matching the features extracted in (A) and (B), work tasks can be recognized from text input alone.
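The comparison of features from (A) and (B) can be illustrated with a minimal sketch. It is not NEC's implementation: the announcement does not say how features are compared, so cosine similarity is assumed here, and the feature vectors, the `recognize_task` helper, and the `threshold` value are hypothetical placeholders standing in for real VLM outputs.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recognize_task(image_feature, text_features, threshold=0.5):
    """Return (description, score) for the best-matching task description.

    Returns (None, score) when even the best match falls below the
    threshold, i.e. the worker is doing none of the described tasks.
    The threshold is an assumed tuning parameter.
    """
    best_label, best_score = None, -1.0
    for label, feature in text_features.items():
        score = cosine_similarity(image_feature, feature)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return None, best_score
    return best_label, best_score


# (A) Text features, computed once in advance. Placeholder values only:
# a real system would obtain these from the VLM's text encoder.
TEXT_FEATURES = {
    "retrieving packages from a shelf": [0.9, 0.1, 0.0],
    "pushing a cart to transport items": [0.1, 0.9, 0.1],
}

# (B) Feature of an image region containing the worker and the objects
# the worker interacts with. Placeholder values only.
crop_feature = [0.8, 0.2, 0.1]

label, score = recognize_task(crop_feature, TEXT_FEATURES)
print(label, round(score, 3))
```

Under this assumption, supporting a new task only requires adding one more text description and its feature vector; no video needs to be collected or annotated.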

2. Contributes to the optimization of wide area worksites through digitalization of worker movement and work tasks

The utilization of multiple cameras contributes to the optimization of on-site work by identifying workers moving around the entire site without relying on clothing or other visual characteristics, and digitalizing the work tasks of each worker over an extended period of time across the entire worksite.

This technology estimates the location of each worker in a digital twin space (three-dimensional coordinates) from the location of the worker in each camera image (two-dimensional coordinates). By measuring how close these estimated locations and movement patterns are on the digital twin, it can identify the same person across multiple cameras with a high degree of accuracy. Moreover, since the camera parameters (i.e., camera position and orientation) necessary to estimate the locations of workers are estimated automatically, the time and effort required for on-site installation can be minimized.
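The idea of identifying a person by location rather than appearance can be sketched as follows. This is a simplified illustration under stated assumptions, not NEC's method: each camera is assumed to have a known 3x3 homography mapping image pixels to a shared ground plane (a two-dimensional stand-in for the three-dimensional digital twin), matching uses proximity at a single instant only and ignores movement patterns, and all names and numbers are hypothetical.

```python
import math


def pixel_to_ground(homography, pixel):
    """Map an image pixel (u, v) to ground-plane coordinates (x, y).

    `homography` is a 3x3 matrix given as nested lists. It is assumed
    to be known here; the announcement says camera parameters are
    estimated automatically but does not describe how.
    """
    u, v = pixel
    x = homography[0][0] * u + homography[0][1] * v + homography[0][2]
    y = homography[1][0] * u + homography[1][1] * v + homography[1][2]
    w = homography[2][0] * u + homography[2][1] * v + homography[2][2]
    return (x / w, y / w)


def match_across_cameras(points_a, points_b, max_distance=0.5):
    """Greedily pair detections from two cameras by ground-plane distance.

    Returns (index_a, index_b) pairs, closest first. Pairs farther
    apart than max_distance (meters, an assumed threshold) are treated
    as different people.
    """
    candidates = sorted(
        (math.dist(pa, pb), i, j)
        for i, pa in enumerate(points_a)
        for j, pb in enumerate(points_b)
    )
    used_a, used_b, matches = set(), set(), []
    for distance, i, j in candidates:
        if distance > max_distance:
            break
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        matches.append((i, j))
    return matches


# Hypothetical homographies: 1 pixel = 1 cm, camera B shifted 2 m along x.
H_A = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 1.0]]
H_B = [[0.01, 0.0, 2.0], [0.0, 0.01, 0.0], [0.0, 0.0, 1.0]]

# Foot positions (in pixels) of the workers detected by each camera.
detections_a = [(300, 100), (500, 400)]
detections_b = [(305, 398), (98, 102)]

ground_a = [pixel_to_ground(H_A, p) for p in detections_a]
ground_b = [pixel_to_ground(H_B, p) for p in detections_b]

matches = match_across_cameras(ground_a, ground_b)
print(matches)
```

Because the matching relies only on where each person stands, two workers in identical uniforms are still told apart as long as their positions differ by more than the distance threshold.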

Source: NEC media announcement
