Large language models are disrupting data operations, as well as how data teams are structured and work together.
I (Elizabeth Press from D3M Labs) spoke with Leonid Nekhymchuk (Leo), CEO and Co-Founder of Datuum.ai, about how large language models will transform data operations. Datuum uses AI to connect data sources with target models and automate mapping, making data integration less time-consuming and less expensive.
My discussion with Leo in the upcoming video focuses on how LLMs help with data integration.
This article is an overview of some of the topics Leo and I covered, as well as my own perspective.
How will large language models impact data operations?
Leo identified a couple of high-impact use cases, which he walks through in the video.
Data integration stands out as one of the most formidable challenges in the data ecosystem. It requires a hard-to-find combination of managerial and technical expertise, which makes the skill set scarce and the work costly to do well. The process itself is intricate and prone to issues such as pipeline disruptions, system malfunctions, and faulty code, all of which can compromise data quality and erode trust in the data.
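To make the mapping step of data integration more concrete, here is a minimal sketch of suggesting which source column belongs to which target field. It uses plain string similarity from Python's standard library as a stand-in. It is not Datuum's method and it does not call an LLM; the function names, the example column names, and the 0.6 threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    # Lowercase and drop separators so "Cust_Name" and "cust name" compare equal.
    return "".join(ch for ch in name.lower() if ch.isalnum())


def suggest_mapping(source_columns, target_fields, threshold=0.6):
    """Suggest a target field for each source column (hypothetical helper).

    Returns a dict of source column -> best-matching target field,
    or None when no candidate reaches the similarity threshold.
    """
    mapping = {}
    for col in source_columns:
        best_field, best_score = None, 0.0
        for field in target_fields:
            score = SequenceMatcher(None, normalize(col), normalize(field)).ratio()
            if score > best_score:
                best_field, best_score = field, score
        # Leave low-confidence columns unmapped so a person can review them.
        mapping[col] = best_field if best_score >= threshold else None
    return mapping


if __name__ == "__main__":
    source = ["Cust_Name", "e-mail", "zzz"]
    target = ["customer_name", "email", "signup_date"]
    for col, field in suggest_mapping(source, target).items():
        print(col, "->", field)
```

Columns that fall below the threshold stay unmapped on purpose: in a sketch like this, the ambiguous cases are exactly the ones a human (or a more capable model) should look at.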
I asked ChatGPT about the impact of large language models on data operations. The use cases it suggested, beyond data integration, included:
Analytical tasks such as data summarization and exploration. Many BI tools are integrating ChatGPT functionalities that enable natural language queries that result in visualizations, bypassing SQL.
Anonymization. Large language models can improve privacy by replacing personally identifiable information (PII) with synthetic, privacy-preserving placeholders.
Security and threat monitoring. Large language models can help identify threats and anomalies in the data.
Training and documentation. Data teams are often underwater with tasks. Documentation is one enabler of smooth operations that often falls victim to de-prioritization in favor of more “urgent” tasks. Large language models will help us automate that! Sphinx and Confluence are two of my favorites.
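As an illustration of the anonymization use case in the list above, here is a minimal sketch of replacing PII with placeholders. It uses simple regular expressions rather than a large language model, and the patterns, placeholder tokens, and function name are assumptions for the example. Real PII detection has to cover far more than email addresses and phone numbers.

```python
import re

# Deliberately simple patterns (assumptions for this sketch, not a complete PII detector).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_pii(text):
    """Replace email addresses and phone-like numbers with placeholder tokens."""
    text = EMAIL_RE.sub("<EMAIL>", text)
    text = PHONE_RE.sub("<PHONE>", text)
    return text


if __name__ == "__main__":
    print(mask_pii("Contact jane.doe@example.com or +1 555-123-4567."))
```

Text without a match passes through unchanged, so the function can be applied to whole columns of free text before the data is shared.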
Will large language models kill jobs in data?
This blog shall contain no spoilers! Watch the video coming out on November 2nd.
Job profiles and team composition will change. When simple data tasks are done more quickly, the focus will shift to generating insights.
To all the people asking me about junior jobs:
Leo offered some advice about upskilling on LLMs and pointed out where adoption in enterprises still has gaps.
Watch the conversation between Leo and Elizabeth on YouTube