The data lakehouse sits somewhere between several concepts, most notably the data lake and the data warehouse. Databricks seemed to be getting positive buzz, and they appear to have a reasonably good open source solution for MLOps, which I would like to learn more about.
Data mesh and data fabric were two terms that came up often, usually in conjunction with data democratization. BARC and Denodo both gave good definitions and drew understandable distinctions between the two. A data fabric is a data architecture that sits on top of heterogeneous data assets and enables enterprise-level data management. A data mesh is the combination of processes and organizational structures that give the business direct access to the data. There is an O’Reilly book and a technical paper by Zhamak Dehghani that I will look into to learn more about this subject.
Data quality is the foundation of everything. I have hired a data quality engineer for almost every team I have built, so I was very happy to hear Dr. Sahar Changuel explain her definition of data quality and describe a data quality project.
Data cataloging, governance, behavioral intelligence about analytics users, and documentation are also among my favorite topics, because without them data teams can quickly fall into dysfunction. Ataccama and Alation gave some good presentations on the topic.
Data strategy is something that every company wants, few companies have, and even fewer have implemented. It is also one of those terms whose meaning shifts with each person who talks about it. Walid Mehanna from Merck gave a succinct presentation about his data strategy and a high-level overview of how he structures its implementation. The presentation was very helpful for my upcoming new challenge.
Low-code/no-code is a topic that has recently won me over. Alteryx gave a project demo. Although I am not convinced that a non-data person could productively use their tool, I am on board with the idea that code no longer has to be a bottleneck for building pipelines. Analysts can focus on delivering impactful data stories rather than on writing clean, compatible code.
Data culture and data education were discussed as a means of raising the numerical and technical literacy of organizations. Data education and cultural initiatives are also a way for data teams to create impact and gain influence. In my opinion, a higher level of data education is necessary if business users are to use a tool such as Alteryx, because even without code, such tools require knowledge of data concepts and principles.
Data products were also heavily discussed, including at the fireside chat (Kamingespräch): what they are, how best to create them, and the fact that data, and especially artificial intelligence, is often built into the end products delivered to customers.
Data democratization had its own panel as well as many mentions throughout the conference. My personal observation is that when people talk about data democratization, they are often actually describing data anarchy. Just as in Germany, the USA and other democracies, freedom comes through governance structures and the rule of law (and finding the right level of it), rather than through the absolute freedom to do whatever we want, whenever we want.
This list is far from exhaustive, but it captures the impressions closest to my own work.
There were also some great panels, for example, about digital sovereignty and the future of mobility. And of course a party. The Data Festival was lots of fun. I learned tons and met some inspiring people.