Forbes - Why 85% Of Your AI Models May Fail

By Jameel Francis, Forbes Councils Member for Forbes Technology Council

Jameel Francis is the CEO of Kore Technologies, a data automation software firm that helps organizations drive growth and profitability.

Artificial intelligence (AI) has captured the attention of the world. It promises to transform healthcare, spur economic growth, and supercharge innovation. However, AI is still in its infancy. Over time, it could revolutionize major industries and capital markets, much as the Industrial Revolution propelled urbanization, capitalism, innovation, and corporate structures starting in the 1760s.

For supply chain organizations—particularly those in distribution, manufacturing, logistics, and MRO industries—AI is expected to help expand the circular economy, enhance product delivery, improve profit margins, and reduce market risks.

Yet as mighty as AI is predicted to become, achieving the expected benefits will be impossible without high-quality data. According to Gartner (via VentureBeat), 85% of all AI models and projects fail because of poor data quality or a lack of relevant data.

AI Models Fail With Flawed Data Inputs

Data quality refers to data conditions based on factors such as accuracy, consistency, completeness, timeliness, relevance, uniqueness, and integrity.

If data quality elements are not maintained when training AI models, organizations will not realize the full potential of AI. As Troy Demmer, co-founder of Gecko Robotics, explained in his testimony before the U.S. House of Representatives Committee on Homeland Security in 2024: "AI applications are only as good as the data they are trained on. Trustworthy AI requires trustworthy data inputs." According to Demmer, even the most sophisticated AI models relying on flawed data will restrict America’s ability to adequately manage and sustain its critical infrastructure.

Poor data quality impacts not only national security but also organizations’ bottom lines. For instance, according to a 2021 Gartner report, poor data quality costs organizations an average of $12.9 million per year. For supply chain organizations specifically, the increased operational costs tied to flawed data can show up as excess inventory, delivery delays, stockouts, or added fuel costs.

Some common AI model failures involving inadequate data include:

• Overfitting: When an AI model fits its training data too closely and fails to generalize to new, unseen data.

• Edge-case neglect: When scenarios that occur infrequently are underrepresented in training data and overlooked by the model, leading to inferior performance and critical mistakes.

• Correlation dependency: When an AI model learns superficial correlations rather than meaningful relationships, leading to incorrect assumptions and unreliable outcomes.

• Data bias: When AI models are trained on incomplete or unrepresentative data and produce results that put certain groups at a disadvantage.

• Underfitting: When an AI model is too simple to capture the underlying patterns in the training data, so it performs poorly even on the data it was trained on.

• Data drift: When the statistical properties of production data change over time, degrading the performance of a model trained on older data.
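To make the last of these failure modes concrete, the sketch below shows one common way to monitor for data drift: comparing the distribution of a single feature at training time against its distribution in production using a two-sample Kolmogorov-Smirnov test. The feature (order lead times), the sample data, and the significance threshold are illustrative assumptions rather than anything prescribed in this article.

```python
# Minimal sketch: flagging data drift with a two-sample Kolmogorov-Smirnov test.
# The feature name, sample data, and threshold are illustrative assumptions.
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(train_col: np.ndarray, live_col: np.ndarray, alpha: float = 0.05) -> bool:
    """Return True if the live data's distribution differs significantly
    from the training data's distribution for a single numeric feature."""
    statistic, p_value = ks_2samp(train_col, live_col)
    return p_value < alpha

# Example: lead times (in days) seen at training time versus in production.
rng = np.random.default_rng(42)
train_lead_times = rng.normal(loc=5.0, scale=1.0, size=10_000)   # historical data
live_lead_times = rng.normal(loc=7.5, scale=1.5, size=2_000)     # shifted by a supply disruption

if detect_drift(train_lead_times, live_lead_times):
    print("Data drift detected: consider retraining the model.")
```

In practice, a check like this would run on a schedule for each important feature, with alerts feeding into a retraining or data-review process.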

Building A Robust Foundation For AI Models

Data Integration

Digital transformation has brought previously unconnected systems online, dramatically increasing global data production. According to Statista, worldwide data creation is projected to grow to more than 180 zettabytes by the end of 2025. With so much data available, possessing quality data starts with having a complete picture of the information generated by your organization.

Data growth in recent years is even more pronounced for supply chain organizations, which are typically slower to adopt modern technologies. Data integration makes it possible to connect supply chain operators’ disparate data sources, such as enterprise resource planning (ERP), transportation management system (TMS), and warehouse management system (WMS) databases. Data integration aggregates various information sources, systems, and formats before cleaning and transforming the data into a unified view.

There are several methods of data integration, including:

• Middleware-based integration: Bridges real-time data from diverse technologies, databases and tools.

• Extract, transform and load (ETL): Extracts large volumes of data from various sources, transforms it into a consistent format, and loads it into a single storage environment for analytics.

• Extract, load and transform (ELT): Loads data into the storage environment in its original format rather than transforming it on entry, which enables faster loading times; transformation happens later, as needed.

• Point-to-point (P2P) integration: Leverages custom code to directly connect two disparate systems.

• Cloud-based integration: Enables near real-time data exchange between cloud-based applications and on-premises systems, or among multiple cloud-based applications.

The data integration method you select depends on factors such as your hosting environment, business and technical requirements, and budget. Whichever method you choose, the ability to integrate disparate data sources gives your organization the unified, quality dataset required to develop and leverage AI models and tools.
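As a rough illustration of the ETL pattern described above, the following Python sketch extracts records from two hypothetical supply chain sources, transforms them into a unified view, and loads the result into a single store for analytics. The file names, columns, and SQLite target are assumptions chosen for brevity, not a recommendation of specific tools.

```python
# Minimal ETL sketch: consolidating hypothetical ERP and WMS extracts into one
# analytics table. File names, columns, and the SQLite target are assumptions
# made for illustration only.
import sqlite3
import pandas as pd

# Extract: pull raw records from two disparate sources.
erp_orders = pd.read_csv("erp_orders.csv")       # e.g., order_id, sku, qty, order_date
wms_stock = pd.read_csv("wms_inventory.csv")     # e.g., sku, on_hand_qty, warehouse

# Transform: clean and unify before loading (the "T" happens before the "L").
erp_orders["order_date"] = pd.to_datetime(erp_orders["order_date"], errors="coerce")
erp_orders = erp_orders.dropna(subset=["order_date", "sku"]).drop_duplicates()
merged = erp_orders.merge(wms_stock, on="sku", how="left")

# Load: stage the unified view in a single storage environment for analytics.
with sqlite3.connect("supply_chain_warehouse.db") as conn:
    merged.to_sql("orders_with_inventory", conn, if_exists="replace", index=False)
```

An ELT variant would simply load the raw extracts first and defer the cleaning and merging steps to the storage environment itself.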

Data Quality Management (DQM)

In addition to leveraging a data integration method, your organization must take a comprehensive approach to ensure AI models use data of the highest quality. This is where data quality management (DQM) comes into play. According to the SAS Institute, DQM helps by aggregating organizational culture, policies, technology, and data to produce results that are accurate and useful.

The first step in implementing DQM in your team is establishing data governance that covers responsibilities, standards, and roles. Your firm’s data governance should account for the types of data handled and the regions where information is stored or processed. For example, U.S. health-related data should account for Health Insurance Portability and Accountability Act (HIPAA) guidelines, whereas organizations in Europe should consider General Data Protection Regulation (GDPR) rules.

Second, your team must create a culture that prioritizes producing quality data. This starts with your organization’s leadership and extends to the data analysts managing data systems. If quality data is not viewed as mission-critical in your organization, issues such as incomplete and outdated data will become the norm. Finally, your organization should adopt technology that assists with data cleansing, validation, quality monitoring, and issue resolution.
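As one sketch of what such technology might automate, the example below computes simple metrics for three of the data quality dimensions named earlier (completeness, uniqueness, and timeliness) on a tabular extract. The column names, file, and thresholds are illustrative assumptions, not part of any particular DQM product.

```python
# Minimal sketch of automated data quality checks along three dimensions
# named earlier: completeness, uniqueness, and timeliness.
# Column names, thresholds, and the input file are illustrative assumptions.
import pandas as pd

def run_quality_checks(df: pd.DataFrame, key_column: str, date_column: str,
                       max_age_days: int = 30) -> dict:
    """Return simple pass/fail metrics a DQM dashboard could monitor."""
    completeness = 1.0 - df.isna().mean().mean()            # share of non-null cells
    uniqueness = 1.0 - df[key_column].duplicated().mean()   # share of unique keys
    newest = pd.to_datetime(df[date_column], errors="coerce").max()
    timely = (pd.Timestamp.now() - newest).days <= max_age_days
    return {
        "completeness": round(completeness, 3),
        "uniqueness": round(uniqueness, 3),
        "timely": bool(timely),
    }

# Example usage against a hypothetical inventory extract.
inventory = pd.read_csv("wms_inventory.csv")
print(run_quality_checks(inventory, key_column="sku", date_column="last_counted"))
```

Metrics like these could feed a DQM dashboard or trigger issue-resolution workflows when they fall below agreed thresholds.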

Conclusion

Having quality data is a requirement for developing accurate AI models and tools. As such, your organization must ensure that the data selected to train AI models is accurate, complete, and up to date. There are two main ways to lay the foundation for high data quality: 1) implementing data integration methods, and 2) creating a holistic DQM program. Enabling and maintaining data quality may seem like an unnecessary delay when your organization wants to go to market quickly with AI; however, skipping this step will be costly: 85% of your AI models may fail.

Read on Forbes: Why 85% Of Your AI Models May Fail
