
Sourcing video data for AI training: overcoming the challenges of scale, safety and representation


As AI advances, its potential benefits to video security are undeniable. The market for AI-powered video analytics is predicted to grow from $32 billion in 2025 to over $133 billion by 2030, driven by wide-ranging applications across cities, retail, logistics and manufacturing, residential developments and transport, where it can increase operational efficiency, improve energy efficiency, deliver marketing and sales insights, and much more.

To unlock these benefits, however, organisations must err on the side of responsible use. Crucially, this extends to the data used to power and train AI models.

Understanding data transparency and quality

As a recent Amnesty International report highlighted, there needs to be transparency around the use of AI-powered video surveillance.

However, to achieve that transparency, it's vital to go back a stage and closely consider the quality and origin of the data an AI model has been trained and fine-tuned on. As AI-enabled video is rolled out, developers mustn't fall into the same trap as their colleagues working on the large language models (LLMs) that power generative AI, namely the challenge of sourcing enough data for model training. LLM developers now find themselves paying ever-increasing amounts for large datasets as supply tightens, working with data that lacks privacy safeguards, or facing litigation from rights holders. Bias from unrepresentative data has also tainted some AI implementations, with knock-on impacts on people's trust in AI and the insights it delivers.

Building accurate video AI models

To mitigate these challenges before they spread, it is crucial to consider how to gain access to high-quality, responsibly sourced visual data for training AI video models. The datasets used to train AI models need to be representative and diverse to ensure accuracy and fairness, and legally sourced to respect data owners' IP rights. None of this is simple to achieve, especially with sensors such as cameras that can capture large amounts of personal or confidential information.

One solution to this challenge is Project Hafnia, a platform developed by Milestone Systems in partnership with NVIDIA, leveraging NVIDIA NeMo Curator and AI models. Project Hafnia enables data generators to share and utilise their data, and allows developers to access traceable, regulatory-compliant annotated video data that can then be used to train AI models.
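
What "traceable and regulatory-compliant" might look like in practice can be sketched as a provenance record attached to every shared clip. The sketch below is illustrative only; the field names are assumptions, not Project Hafnia's actual schema.

```python
# Hypothetical provenance record for a shared video clip. Field names
# are illustrative assumptions, not Project Hafnia's real data model.
from dataclasses import dataclass, field

@dataclass
class ClipRecord:
    clip_id: str        # stable identifier, for traceability
    source_org: str     # the data generator who shared the clip
    licence: str        # terms under which training use is permitted
    anonymised: bool    # personal identifiers removed or replaced
    annotations: dict = field(default_factory=dict)  # curated labels

    def usable_for_training(self) -> bool:
        """A clip is eligible only once anonymised and licensed."""
        return self.anonymised and self.licence != ""
```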

One of the first data generators on the platform is the American city of Dubuque, Iowa. Along with AI analytics company Vaidio, Milestone built a collaborative visual language model that transformed Dubuque's raw video, via anonymization and curation, into powerful training material, improving AI accuracy from 80% to over 95%. This leap has enabled smarter traffic management, quicker emergency responses, and stronger public safety, all achieved responsibly and without massive infrastructure overhauls.

With Milestone's recent acquisition of brighter AI, a company specializing in anonymization solutions, a further layer of data privacy has been added to Project Hafnia. brighter AI's technology automatically detects personal identifiers, such as faces and license plates, and generates synthetic replacements.
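
At its simplest, this detect-and-replace pattern looks like the minimal sketch below, assuming OpenCV is available. brighter AI's production technology generates photorealistic synthetic faces; here a heavy blur stands in for the generative replacement step so the example stays self-contained.

```python
# Minimal detect-and-replace anonymisation sketch using OpenCV.
# A Gaussian blur substitutes for true synthetic replacement, which
# would require a trained generative model.
import cv2

def anonymise_frame(frame):
    """Detect faces in a BGR frame and obscure each region in place."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    for (x, y, w, h) in cascade.detectMultiScale(grey, 1.1, 5):
        roi = frame[y:y + h, x:x + w]
        # Stand-in for generative replacement: the blur removes identity
        # while preserving scene context for downstream training.
        frame[y:y + h, x:x + w] = cv2.GaussianBlur(roi, (51, 51), 0)
    return frame
```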

Consolidating and curating data from multiple data generators is one way for developers to obtain enough visual data to develop accurate AI models that detect events such as vandalism, vehicle accidents, and traffic flow.

Synthetic data for hard-to-gather data sets

Another solution comes in the form of synthetic data: artificially generated or augmented datasets that simulate or generalise real-world conditions. Using synthetic data, AI developers can train models on vast amounts of diverse and representative information while mitigating the ethical and legal concerns surrounding privacy and consent.
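
As a rough illustration of the augmentation end of this spectrum, the sketch below expands a small pool of real frames into a larger, more varied one. It assumes frames are NumPy arrays and uses only basic transforms; real synthetic-data pipelines typically rely on 3D simulation or generative models.

```python
# Illustrative sketch: expanding a dataset of video frames (H x W x C
# uint8 NumPy arrays) with simple random perturbations. Basic flips and
# brightness jitter only hint at what full synthetic generation does.
import numpy as np

rng = np.random.default_rng(seed=42)

def augment(frame: np.ndarray) -> np.ndarray:
    """Return a randomly perturbed copy of a frame."""
    out = frame.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1]            # horizontal flip
    gain = rng.uniform(0.7, 1.3)      # brightness jitter
    return np.clip(out.astype(np.float32) * gain, 0, 255).astype(np.uint8)

def expand(frames, copies_per_frame=10):
    """Expand a small set of real frames into a larger training pool."""
    return [augment(f) for f in frames for _ in range(copies_per_frame)]
```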

For example, at Aalborg Harbour in Denmark, training an AI model to detect individuals falling into the harbour was not possible with real footage, given the danger this would pose to human volunteers. The dataset also needed to include a diversity of human actors, such as wheelchair users, and dummies couldn't capture the full complexity either. The best solution, therefore, was synthetic data that could expand the training dataset with diverse falling scenarios while avoiding safety and ethics concerns. The AI model developed through this process shows promising results in alerting rescue teams if and when a person falls into the harbour, increasing the chances of survival by minimising response times and reducing cold-water exposure.

Unlocking the potential of AI in video

AI holds great promise for our cities, buildings, and individual safety. Yet this promise can only be realised with AI models that fully capture the complexities of our built environments and human behaviour. Video analytics developers should explore their options when building a comprehensive dataset for AI model training. New, responsible options are emerging, from data consolidated across many data generators to synthetic data generation. It's just a matter of knowing where to look.
