
Does the quality of your data impact your generative AI model's insights?

The world of banking and financial services is abuzz with the potential of generative AI. It's a game-changer that can revolutionize customer service, boost revenue, and streamline operations. But with great power comes great responsibility, and a whole lot of regulatory challenges. We need a legal framework that balances market safety, consumer protection, and access to finance. Plus, we can't ignore the elephant in the room: trust. Customers need to have faith in AI's abilities before they can embrace it fully. That's why AI regulation must be proactive, not reactive. By carefully weighing the risks and benefits of generative AI in the financial sector, we can ensure it unlocks its full potential.

Data quality is essential for generative AI to learn from data and produce accurate insights. Poor data quality can lead to inaccurate predictions and flawed decisions, resulting in financial losses. It is no accident that, according to data researchers, data scientists spend 80% of their time preparing and organizing data. Poor data quality can also erode customer satisfaction, since inaccurate predictions lead to wrong decisions being made, and in turn customer loyalty, as customers may be unhappy with the product or service they receive. Guarding against these outcomes requires systematic quality control and verification of data.

Did you know several guidelines are in place for managing data quality? From the Data Quality Act of the US to the ISO 8000 series of the International Organization for Standardization and even the Big Data Quality Verification Standard of the United Nations, plenty of measures are at play. And that's not all - many companies have their own customized quality certifications, too.

Practitioners and researchers have shown that the accuracy and completeness of data affect data science models based on classification and regression. Compared with other data stages such as consumption, financial institutions are increasingly focusing on data collection and management, which makes the dimensions of data quality more crucial than ever. Among the many factors driving this trend are recent changes in government policy regarding data privacy and governance, such as GDPR in Europe. In addition to regulatory drivers, this focus on data collection is motivated by the changing needs of customers, the growth of digital channels, and the expansion of diverse products such as buy-now-pay-later. The dimensions of quality that a data office has to prioritize for data collection are as follows:

  1. Accuracy: How well does data reflect reality, like a phone number from a customer?

  2. Completeness: Can complete data be processed for a specific purpose, such as "housing expense" for a loan?

  • Column completeness – Is the complete “phone number” available?

  • Group completeness – Are all attributes of “address” available?

  • Fill rate – Is there a complete fill rate in storage to process all customers?

In the financial services industry, the term "coverage" refers to the inclusion of all relevant data for specific use cases. For instance, a lending firm may have different customer segments and associated sub-products. Including all the transactions that describe customers and the products they are associated with is crucial to avoid biased or inaccurate machine learning results. Although collecting all relevant data from different sub-entities, point-of-sale systems, and partners can be challenging, it is a recognized challenge that must be addressed.

  3. Coverage: Is there an adequate population of data for consumption? Does data cover all datasets that provide context for a use case?
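The dimensions above translate naturally into automated checks run at collection time. The sketch below illustrates one way to do this over a simple customer table; the field names (`phone`, `address`, `segment`), the phone-number format, the thresholds, and the sample records are all illustrative assumptions, not any firm's actual schema or implementation.

```python
import re

# Illustrative customer records; the fields and formats are assumptions.
customers = [
    {"phone": "+1-555-0100",
     "address": {"street": "1 Main St", "city": "Springfield", "zip": "01101"},
     "segment": "retail"},
    {"phone": None,
     "address": {"street": "9 Oak Ave", "city": "Shelby", "zip": None},
     "segment": "retail"},
    {"phone": "555-0199",
     "address": {"street": "4 Elm Rd", "city": "Ogden", "zip": "84401"},
     "segment": "bnpl"},
]

def column_completeness(records, field):
    """Share of records where a single attribute is present (column completeness)."""
    filled = sum(1 for r in records if r.get(field) is not None)
    return filled / len(records)

def group_completeness(records, group_field, attributes):
    """Share of records where every attribute of a group (e.g. address) is present."""
    complete = sum(
        1 for r in records
        if all(r.get(group_field, {}).get(a) is not None for a in attributes)
    )
    return complete / len(records)

def accuracy_phone(records, pattern=r"^\+\d{1,3}-\d{3}-\d{4}$"):
    """Share of non-null phone numbers matching an assumed canonical format."""
    phones = [r["phone"] for r in records if r.get("phone")]
    return sum(1 for p in phones if re.match(pattern, p)) / len(phones)

def coverage(records, field, expected_values):
    """Do the records cover every expected segment for the use case?"""
    return set(expected_values) <= {r.get(field) for r in records}

print(column_completeness(customers, "phone"))                         # 2 of 3 filled
print(group_completeness(customers, "address", ["street", "city", "zip"]))
print(accuracy_phone(customers))                                       # 1 of 2 valid
print(coverage(customers, "segment", ["retail", "bnpl"]))              # True
```

In practice such checks would run as part of a data pipeline's quality gate, with thresholds agreed between the data office and the model owners rather than hard-coded.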

Ultimately, a comprehensive data collection strategy with the right data quality checks is necessary to ensure the success of any generative AI model, or any artificial intelligence model more broadly.

This article was written by Tejasvi Addagada.


