
Jeremy Rotter 5/22/19

2024 Data + AI Summit Recap

Yellow Shark Labs recap of the 2024 Databricks Data + AI Summit.

by Jeremy

Yellow Shark Labs was in San Francisco last week at Databricks’ Data + AI Summit! Here’s our recap of the conference.

Training

I kicked off my week on Monday with deep-dive training sessions on Delta Live Tables and Unity Catalog and learned a ton about ETL, data quality, and data governance in the Databricks ecosystem.

The combination of these two topics drove home the fact that Databricks is a platform, not just a tool. If you build ETL pipelines with a tool, you have ETL pipelines. If you want monitoring or quality checks or governance, you almost have to start from scratch and build those separately. If you build ETL within a platform, it creates a foundation for those deeper layers, allowing you to more easily iterate and grow maturity in your data systems.

Hackathon

Mark chose a more hands-on approach for Monday, entering Databricks’ GenAI hackathon, where he teamed up with Aman Sehgal and Anvesh Bethu to compete against 55 other teams in the 6-hour, 220+ person event.

Using a Databricks Notebook, the team focused on an open-source prescription claims dataset from the Databricks Marketplace, building an effective medication list for the represented patients. They then enriched this dataset using the RxNorm terminology to ensure standardized medication naming.

Next, leveraging Databricks Model Serving and their GenAI SDK, Mark and team sourced unstructured Drug-Drug Interaction (DDI) information and converted it into knowledge triples: perpetrator drug, impact, and victim drug. This structured interaction data was then applied to each patient’s effective medication list.
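To picture the matching step, here is a minimal Python sketch of applying interaction triples to per-patient medication lists. It is not the team's hackathon code. The `InteractionTriple` class, the `find_interactions` function, and the placeholder drug names are all assumptions made up for illustration, and the sketch assumes the triples have already been extracted from the unstructured text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InteractionTriple:
    """One knowledge triple (hypothetical shape): perpetrator drug, impact, victim drug."""
    perpetrator: str
    impact: str
    victim: str


def find_interactions(med_lists, triples):
    """Return {patient_id: [triples]} where both drugs of a triple
    appear on that patient's medication list (case-insensitive match)."""
    hits = {}
    for patient_id, meds in med_lists.items():
        med_set = {m.lower() for m in meds}
        matches = [
            t for t in triples
            if t.perpetrator.lower() in med_set and t.victim.lower() in med_set
        ]
        if matches:
            hits[patient_id] = matches
    return hits


# Placeholder data only; these are not real drugs or real interactions.
triples = [InteractionTriple("drug_a", "raises blood levels of", "drug_b")]
med_lists = {
    "patient_1": ["Drug_A", "drug_b", "drug_c"],
    "patient_2": ["drug_a", "drug_c"],
}

for patient_id, found in find_interactions(med_lists, triples).items():
    for t in found:
        print(f"{patient_id}: {t.perpetrator} {t.impact} {t.victim}")
```

In this toy data only `patient_1` is flagged, because that is the only list containing both drugs of the triple. A real pipeline would match on standardized identifiers (such as the RxNorm-normalized names mentioned above) rather than raw strings.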

Finally, with the identified interactions, they used Model Serving to generate clear, actionable statements for healthcare providers and patients. This allowed them to highlight potential risks and suggest alternatives or precautions in a scalable manner.

While Mark has worked with similar concepts before, this project was unique in that it utilized technology capable of scaling these processes and integrating diverse data sources effectively. This kind of solution could make a tangible difference in healthcare, for patients, providers, and payers.

The judges certainly agreed, as “Medication Regimen Optimization Using LLMs” was awarded the $10,000 first prize! Mark became a minor celebrity at the conference, being recognized for his exploits, and later appearing with his team on the “Live From the Lakehouse” show on the conference expo floor.

Breakout Sessions

The conference’s 500+ breakout sessions began on Tuesday. Between us, Mark and I attended over 30. With topics spanning data architecture, data processing, analytics, data quality, data governance, process, and AI, AI, AI, there was plenty to choose from. Simon Whiteley’s lightning talk on the Medallion architecture was definitely a favorite. (Here’s a similar presentation.)

While we both took in a heavy dose of Databricks fundamentals, I ultimately gravitated toward DevOps and (to no one’s surprise) quality topics, like developer tooling, test methodologies, and test tooling. Mark leaned toward architecture and generative AI topics.

Keynote

Wednesday’s keynote brought the announcement that all Databricks services will soon be available serverless. Potential customers have raised concerns with us about the resource and personnel costs of managing Databricks infrastructure, so we think this is going to be a game changer in lowering the barrier to entry and the total cost of ownership for the Databricks ecosystem.

Also at the keynote, the live demo of the Mosaic AI Gateway by “Cookie Lady” Kasey Uhlenhuth was another reminder of how quickly generative AI is evolving and of the amazing use cases Databricks is helping to enable. Shutterstock also jumped out here as a company that is really drawing energy from the “flywheel” of unstructured data at its disposal.

Vendors

Drawn in by the swag and excitement, we spent a good bit of time talking to vendors in the conference expo hall. Soda and dbt were known entities coming in and are still Yellow Shark favorites. O’Reilly was there too, and stopping by their booth was like visiting an old friend. One of my highlights of the conference was getting a copy of “Fundamentals of Data Engineering,” a Yellow Shark staple, signed by co-author Matt Housley.

As for new discoveries, I was excited about Datafold, as their data validation product parallels some of my current efforts. And Yellow Shark loves Prophecy! Their AI-powered ETL copilot tooling offers clients two intriguing benefits: it accelerates pipeline development, and it de-skills pipeline management and maintenance, reducing the total cost to build and own production data systems.

Final Thoughts

All in all, it was a great week for us. By Thursday, I felt like my brain was full – it was an intensive and productive four days of learning and connecting, all while fully immersed in the world of data and artificial intelligence.

Special thanks to Databricks for their help and encouragement in getting us there. We’re already looking forward to next year!