AI is going to alter the world. Will it lead to a golden age, human extinction, or just one more technical advance? History tells us that the winners in the race will be those who best merge human capabilities with the power of AI. We’ll look at how this is already playing out in analytics and open source software. Some of the changes will surprise you.
https://us.airmeet.com/e/69f1f9b0-2f11-11ef-82f4-1d5f1667121e
Apache Superset has long led the charge in open-source BI, and now it’s getting ready to truly take over the BI world. Learn all about Superset’s new extensions architecture, which will let users and developers extend and improve the product’s capabilities more rapidly while simplifying life for both developers and maintainers.
Discover the transformative power of streaming analytics through groundbreaking case studies from some of the most innovative companies in the world. Explore how Uber, Razorpay, and Stripe leverage next-gen streaming architectures to power real-time decision-making, improve user experiences, and drive operational excellence. These case studies offer a rare glimpse into the advanced technologies and strategies behind these leading-edge systems, showcasing real-world applications of streaming analytics that are as inspiring as they are practical.
Learn how these organisations tackle the complexities of real-time data processing to unlock speed, scale, and insight—and leave with actionable ideas to jumpstart your own journey into the world of streaming analytics. We will share reference architectures and the resulting technical and business outcomes.
As the Lakehouse paradigm rises in popularity, so does the risk of being locked into a single vendor’s ecosystem. But what if you could have all the benefits of a unified architecture—without giving up control?
In this session, we introduce Lakekeeper, an open-source Apache Iceberg catalog that makes it possible to build Lakehouse architectures that are truly portable: across clouds, compute engines, and storage layers.
This talk speaks directly to data professionals looking to stay ahead of the curve by exploring:
• How open formats like Iceberg and open catalogs like Lakekeeper can break down silos
• How to build cloud-neutral, compute-agnostic analytics pipelines
• Why metadata and catalogs are the new control plane for governance and orchestration
• Real-world examples of Lakehouse deployments using Trino, Spark, and DuckDB
• A vision for future-proofed architectures built entirely on open standards
Whether you’re modernizing your stack or starting fresh, this session offers practical insight and new ideas for remaining flexible, scalable, and free from lock-in, all while staying fully open source.
This session explores how open-source analytics technologies are transforming the public sector through the lens of Electronic Income Verification (EIV) systems—platforms that process over 850,000 real-time verifications daily, integrate 40+ data sources, and maintain 99.95% uptime to support equitable, efficient public benefit delivery.
We’ll dive into the open-source stack behind these systems: event streaming with Apache Kafka, data orchestration with Airflow, analytics with Apache Superset and DuckDB, and ML-powered fraud detection using tools like scikit-learn and Hugging Face NLP. You’ll learn how public agencies are building scalable, secure, and cost-effective solutions by leveraging community-driven technologies and standards.
Topics include building modular data pipelines, real-time dashboards, anomaly detection, and managing governance and compliance (GDPR, HIPAA) in open environments. The talk will also highlight DevOps practices such as infrastructure as code (IaC), GitOps, and monitoring with Prometheus and Grafana to maintain visibility, security, and auditability in high-trust systems.
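The abstract mentions anomaly detection without naming a method. As an illustration only, the sketch below assumes a rolling z-score over hourly verification counts from a single data source, a common baseline check; it is not the scikit-learn fraud models the talk describes, and the function name `rolling_anomalies` and its defaults are hypothetical.

```python
from statistics import mean, stdev


def rolling_anomalies(series: list[float], window: int = 24, threshold: float = 3.0) -> list[int]:
    """Return the indices of points that deviate sharply from the trailing window.

    Each point is compared with the mean and sample standard deviation of the
    `window` points before it; it is flagged when its absolute z-score exceeds
    `threshold`. `window` must be at least 2 so a standard deviation exists.
    """
    flagged = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu = mean(baseline)
        sigma = stdev(baseline)
        # A perfectly flat baseline has no spread, so skip it rather than divide by zero.
        if sigma > 0 and abs(series[i] - mu) / sigma > threshold:
            flagged.append(i)
    return flagged


# Hypothetical hourly verification counts: a steady day, then a sudden spike.
hourly = [100, 104] * 12 + [150]
print(rolling_anomalies(hourly))  # → [24]
```

Comparing each point only against the window before it keeps the spike itself from inflating the baseline it is judged against.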
Ideal for data engineers, open-source practitioners, and civic tech innovators, this session offers a real-world case study on how open analytics infrastructure can power large-scale, high-impact digital public services.
In this talk, I will explain how Apache Doris, a real-time analytical database, is extending from customer-facing business scenarios to agent-facing ones.
I will cover the technical details behind high-concurrency, low-latency query analytics, as well as capabilities supporting AI scenarios such as hybrid search, agent observability, and collaboration between the Doris MCP Server and large language models (LLMs).
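The abstract does not say how Doris merges keyword and vector results in hybrid search. One widely used approach is Reciprocal Rank Fusion (RRF), sketched below in plain Python with the conventional constant k = 60; this illustrates the general technique, not Doris internals, and the document ids are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists into one using Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. The constant k damps the influence of top ranks.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


keyword_hits = ["doc1", "doc2", "doc3"]  # e.g. from a full-text search
vector_hits = ["doc3", "doc1", "doc4"]   # e.g. from a nearest-neighbour vector search
print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# → ['doc1', 'doc3', 'doc2', 'doc4']
```

RRF uses only ranks, so it sidesteps the problem that full-text relevance scores and vector distances live on incomparable scales.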
This will help the audience understand how Doris empowers enterprises to perform real-time data exploration in the AI era.
This presentation showcases an open source healthcare analytics platform that reduced ICU transfers by 20% through real-time patient risk prediction. Built entirely with open source technologies, the system demonstrates how healthcare organizations can leverage community-driven tools to achieve clinical impact without vendor lock-in.
The architecture combines Apache Kafka for real-time EMR streaming, Apache Spark for ML model training, and PostgreSQL with TimescaleDB for time-series clinical data. Docker containerization ensures reproducible deployments across environments, while Kubernetes orchestrates auto-scaling during patient admission surges. The ML pipeline uses scikit-learn and XGBoost models trained on anonymized historical cohorts, with MLflow tracking experiments and model versioning.
Key open source components include: Apache Airflow for workflow orchestration, Grafana for clinical dashboards, and Apache Superset for analytics visualization. The platform implements FHIR standards through HAPI FHIR server, ensuring interoperability with existing hospital systems.
Critical lessons learned include: designing privacy-preserving analytics with differential privacy libraries, implementing federated learning across hospital networks, and maintaining sub-second latency for critical alerts using Redis caching. The session covers practical deployment strategies, cost optimization techniques, and governance frameworks for open source healthcare analytics.
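The abstract mentions differential privacy libraries without naming them or their mechanisms. The sketch below assumes the textbook Laplace mechanism applied to an aggregate count; the function name `laplace_count` and the example query are hypothetical, and a real deployment would rely on a vetted library, since naive floating-point noise sampling like this is known to leak information.

```python
import random


def laplace_count(true_count: int, epsilon: float, rng: random.Random,
                  sensitivity: float = 1.0) -> float:
    """Release a count under epsilon-differential privacy via the Laplace mechanism.

    Noise is drawn from Laplace(0, sensitivity / epsilon). For a simple count,
    sensitivity is 1: adding or removing one patient changes it by at most 1.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    # The difference of two independent exponentials with mean `scale`
    # follows a Laplace distribution with location 0 and that scale.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return true_count + noise


rng = random.Random(7)
# Hypothetical query: number of high-risk patients in a ward this shift.
print(round(laplace_count(12, epsilon=0.5, rng=rng), 2))
```

Smaller epsilon means a larger noise scale and stronger privacy, at the cost of less accurate released counts.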
Attendees will learn to build production-ready healthcare analytics platforms using exclusively open source tools, complete with code examples, architecture patterns, and regulatory compliance strategies that deliver measurable patient outcomes.
What happens when databases don’t just store data—but help analyze it intelligently? This talk explores emerging trends in AI-powered database intelligence, from schema optimization and real-time query tuning to transforming unstructured content using techniques like sentiment analysis and entity recognition. We’ll also dive into the future of self-driving databases and multi-modal analytics that integrate text, images, and more. Attendees will leave with a forward-looking view of how AI is reshaping database engines into context-aware, insight-generating platforms—laying the foundation for the next generation of open-source analytics systems.
Apache Iceberg™ advanced significantly from Table Format V1 to Table Format V2, introducing features like position deletes, advanced metrics, and cleaner metadata abstractions. But with Table Format V3 on the horizon, Iceberg users have even more to look forward to.
In this session, we’ll explore some of the exciting new user-facing features that Iceberg V3 is about to introduce and see how they’ll make working with open data formats easier than ever! We’ll start with a high-level overview of the new functionality in V3, then dive deep into some of the most impactful features. You’ll learn what Variant types have to offer your semi-structured data, how Row Lineage can enhance CDC capabilities, and more.
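To make the Row Lineage idea concrete: the V3 spec introduces per-row lineage fields (`_row_id` and `_last_updated_sequence_number`) so a reader can tell which rows changed after a given commit. The sketch below is a simplified in-memory model of that idea for CDC, not the Iceberg or PyIceberg API; the `Row` class and `classify_changes` helper are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Row:
    row_id: int            # models V3's _row_id, which stays stable across updates
    last_updated_seq: int  # models _last_updated_sequence_number
    data: dict = field(default_factory=dict)


def classify_changes(rows: list[Row], since_seq: int, known_ids: set[int]) -> dict[str, list[int]]:
    """Derive inserts, updates, and deletes since a consumer's last sync.

    `rows` is the current table state, `since_seq` the last sequence number the
    consumer processed, and `known_ids` the row ids it had seen at that point.
    """
    current_ids = {r.row_id for r in rows}
    changed = [r for r in rows if r.last_updated_seq > since_seq]
    return {
        "inserted": sorted(r.row_id for r in changed if r.row_id not in known_ids),
        "updated": sorted(r.row_id for r in changed if r.row_id in known_ids),
        # Deleted rows no longer exist, so they are found by comparing ids.
        "deleted": sorted(known_ids - current_ids),
    }


table = [
    Row(1, 3, {"sku": "A", "qty": 5}),
    Row(2, 5, {"sku": "B", "qty": 9}),  # updated after the last sync
    Row(4, 6, {"sku": "D", "qty": 1}),  # newly inserted
]
print(classify_changes(table, since_seq=4, known_ids={1, 2, 3}))
# → {'inserted': [4], 'updated': [2], 'deleted': [3]}
```

The key property is that an update keeps the row's id while bumping its sequence number, which is what lets a consumer tell an update apart from a delete followed by an insert.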
The community has come together to build yet another great release of the Iceberg spec, so attend and learn about all of the changes coming and how you can take advantage of them in your teams.
As enterprises grapple with an explosion of data and increasing pressure to make rapid, informed decisions, traditional Business Intelligence (BI) tools are reaching their limits. Static dashboards and complex query interfaces often exclude non-technical users, creating friction between data and action. Enter AI-native analytics—a transformative approach that integrates natural language interfaces (NLIs) with scalable machine learning (ML) to deliver intelligent, conversational decision systems.
This keynote explores how organizations can reimagine their analytics infrastructure by embedding AI into the very fabric of user interaction. Drawing on real-world implementations and cutting-edge research, we’ll unpack the architectural foundations needed to operationalize NLIs at scale—spanning natural language understanding (NLU), data context alignment, governance, and high-performance compute. We’ll address the core challenges of conversational systems in the enterprise: query ambiguity, semantic grounding, explainability, and scalability under dynamic workloads.
With AI-driven interfaces, business users can shift from passively consuming reports to actively engaging with data through dialogue—unlocking faster insight discovery and empowering decision-making at all levels. Attendees will leave with a strategic framework for building next-generation analytics platforms that are intelligent, adaptive, and truly human-centric.
The retail sector’s shift toward omnichannel fulfillment and instant availability demands has exposed critical limitations in traditional batch-processed inventory systems. This presentation demonstrates how open source Event-Driven Architecture (EDA) tools are transforming retail inventory analytics, enabling continuous real-time processing that delivers superior accuracy, automated insights, and scalable supply chain responsiveness.
Open source analytics platforms are proving their value in retail operations, with adopters experiencing up to 30% fewer stockouts and 15% improved inventory accuracy compared to proprietary batch systems. Apache Kafka serves as the foundational streaming platform, processing millions of inventory events per second during peak periods, while Apache Flink provides the real-time analytics engine for instant inventory calculations and decision automation. These open source solutions dramatically improve customer experience while reducing operational costs and optimizing stock levels.
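The abstract describes Flink computing inventory positions continuously from a Kafka event stream. As a hedged sketch of that per-event logic only, the code below folds a stream of hypothetical inventory events into per-SKU stock and raises a low-stock alert when a threshold is crossed. In Flink this state would typically live in keyed state partitioned by SKU; here a plain dictionary stands in for it, and the event types and names are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class InventoryEvent:
    sku: str
    kind: str  # hypothetical event types: "receipt", "sale", or "return"
    qty: int


# How each event type moves on-hand stock.
DELTA = {"receipt": 1, "sale": -1, "return": 1}


def process(events: list[InventoryEvent], reorder_point: int) -> tuple[dict[str, int], list[tuple[str, int]]]:
    """Apply events in order, tracking per-SKU stock and low-stock alerts.

    An alert fires each time a SKU's stock drops from above `reorder_point`
    to at or below it, so it does not re-fire on every sale below the line.
    """
    stock: dict[str, int] = defaultdict(int)
    alerts: list[tuple[str, int]] = []
    for event in events:
        before = stock[event.sku]
        stock[event.sku] = before + DELTA[event.kind] * event.qty
        if before > reorder_point >= stock[event.sku]:
            alerts.append((event.sku, stock[event.sku]))
    return dict(stock), alerts


events = [
    InventoryEvent("A", "receipt", 10),
    InventoryEvent("A", "sale", 4),
    InventoryEvent("A", "sale", 3),
    InventoryEvent("A", "return", 1),
    InventoryEvent("A", "sale", 2),
    InventoryEvent("B", "receipt", 5),
]
print(process(events, reorder_point=3))
# → ({'A': 2, 'B': 5}, [('A', 3), ('A', 2)])
```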
The integration of open source machine learning frameworks with EDA further enhances analytical capabilities, enabling demand forecasting with 20% greater accuracy than traditional methods. By combining Python-based ML libraries, real-time streaming analytics, and automated inventory algorithms, retailers can align their strategies with live demand signals, reducing waste and improving profitability through data-driven decision making.
Transitioning to open source event-driven inventory analytics presents implementation challenges including legacy system integration, data quality assurance, and organizational change management. However, the open source ecosystem provides cost-effective, flexible solutions for overcoming these obstacles while maintaining full control over analytical processes and data.
This session will explore proven open source architectures, demonstrate measurable impacts on retail supply chains through live analytics dashboards, and outline how open source event-driven systems enable more intelligent, sustainable retail operations. Attendees will gain practical insights into implementing these solutions and leveraging the broader open source analytics ecosystem for competitive advantage.
Retrieval-Augmented Generation (RAG) is transforming analytics applications — but implementing it often means managing multiple systems: OLTP, vector DBs, and orchestration tools.
In this session, we’ll show how OceanBase simplifies this stack by supporting both structured and vector data natively, enabling developers to build real-time RAG pipelines using just one open-source database.
We’ll walk through a working demo that combines OceanBase with OpenAI and popular Python frameworks like LangChain, demonstrating how to perform vector search and retrieval directly using SQL.
Unlike traditional setups that require combining a relational database and a separate vector database, OceanBase handles both transactional and semantic search in a single engine — with consistency, availability, and simplicity.
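In a single engine, the retrieval step of a RAG pipeline can combine an ordinary relational predicate with a vector-similarity ranking in one query. The abstract does not show OceanBase’s SQL syntax, so rather than guess at it, the sketch below mirrors that query’s logic in plain Python: filter on a structured column, then order by cosine similarity and keep the top k. The toy two-dimensional embeddings and function names are hypothetical; in the demo, embeddings would come from OpenAI models.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / norm


def filtered_vector_search(rows: list[dict], query_vec: list[float], category: str, k: int = 2) -> list[str]:
    """Apply a relational predicate, then rank the survivors by vector similarity."""
    candidates = [r for r in rows if r["category"] == category]
    candidates.sort(key=lambda r: cosine(r["embedding"], query_vec), reverse=True)
    return [r["id"] for r in candidates[:k]]


docs = [
    {"id": "faq-1", "category": "faq", "embedding": [1.0, 0.0]},
    {"id": "faq-2", "category": "faq", "embedding": [0.7, 0.7]},
    {"id": "blog-1", "category": "blog", "embedding": [1.0, 0.0]},
    {"id": "faq-3", "category": "faq", "embedding": [0.0, 1.0]},
]
print(filtered_vector_search(docs, [1.0, 0.1], category="faq"))
# → ['faq-1', 'faq-2']
```

Note that `blog-1` is excluded by the structured filter even though its embedding matches the query best; keeping both steps in one engine is what avoids a round trip between a relational store and a separate vector store.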
As AI transforms how we build, scale, and interact with software, one thing is becoming clear: open source isn’t just keeping up—it’s leading. In this keynote, Max Beauchemin, creator of Apache Superset and Apache Airflow, unpacks why open source is uniquely positioned to dominate in the age of AI. From training data to developer velocity, open projects have structural advantages that proprietary vendors simply can’t replicate.
We’ll explore how LLMs “know” open source deeply, how AI-native workflows amplify OSS contributions, and why communities—not corporations—are becoming the new centers of gravity for software innovation. Whether you’re a maintainer, contributor, or startup founder, this talk will reframe how you think about OSS and help you ride the wave instead of getting swamped by it.
The future of software is AI-native and open by default. Let’s talk about what that really means—and how.
Commercial Open Source Software (COSS) is a burgeoning business sector. This talk will focus on how advances in LLMs and AI will drive demand for COSS.
Distributed systems are full of surprises—and ClickHouse is no exception. In this talk, Shivji (Nutanix) and Anurag (Incerto) share real-world war room stories from designing, testing, and running ClickHouse across both cloud and on-prem environments, where debugging production issues often felt more like unraveling plot twists in a thriller than routine operations.
We’ll walk through some of the toughest incidents we’ve faced: what broke, what we thought was wrong, what actually was wrong, and how we got to the root cause. Along the way, we’ll introduce a practical framework for tackling such issues—combining human intuition, AI assistance, and the messy negotiations that often define real-world problem-solving.
Because in production, solutions aren’t always about perfect optimizations—they’re about trade-offs, context, and sometimes, knowing when to compromise.
Join us for a behind-the-scenes look into ClickHouse in the wild—lessons learned, patterns observed, and a few fun battle scars.
Recently, several open source companies attracted a lot of attention by announcing license changes. Not surprisingly, these shifts sparked backlash from open source enthusiasts, prompting some to create community-driven forks under open source foundations.
Now there is growing skepticism toward open source projects backed by a single company, with many arguing that open source projects should be run by neutral foundations to prevent future bait-and-switch tactics. But is foundation backing really the answer?
Drawing on over a decade of experience in both open source foundations and companies, Fatih and Ray will compare foundation-backed and company-backed projects across key areas such as governance, roadmap planning, community, and funding. They’ll explore real-world examples of successful—and not-so-successful—projects in both models.
Finally, Fatih and Ray will discuss why funding models should be just one of several factors in assessing the long-term viability of open source projects. They’ll offer a holistic approach for evaluating open source projects, helping developers and decision-makers make informed choices about which projects to adopt, support, or contribute to.
After years of pushing ClickHouse to its outer limits in real-world observability workloads, we’ve learned a lot, sometimes the hard way, about getting the most out of your analytics system. But before you dive into inverted indexes, object storage, and terabyte-scale performance tuning, it’s critical to get the basics right.
This talk starts at the beginning, walking through the fundamentals that make ClickHouse such a powerful engine for analytical workloads: the performance advantages of columnar storage, how its architecture supports horizontal scaling, and why it’s ideal for high-throughput, low-latency queries. We’ll share how our observability platform ingests and queries billions of logs, traces, and metrics using these core principles (and yours can too!).
Then we’ll dive into the deep end, covering some advanced and novel techniques we’ve developed over time. You’ll learn how we use custom inverted indexes for rare-event querying, materialized views for real-time aggregations, secondary index tuning, and cost-efficient object storage integration. Along the way, we’ll highlight performance optimization strategies grounded in real-world data scale.
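The abstract does not describe how the custom inverted indexes are built. The general idea behind any inverted index is that each term maps to a postings list of the rows containing it, so a query for a rare term touches only those rows instead of scanning every log line. The sketch below illustrates that idea in memory; it is not ClickHouse’s implementation, and the tokenizer and function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(line: str) -> list[str]:
    """Lower-case a log line and split it into alphanumeric/underscore tokens."""
    return re.findall(r"[a-z0-9_]+", line.lower())


def build_index(lines: list[str]) -> dict[str, set[int]]:
    """Map each token to the set of row numbers (its postings) containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for row_id, line in enumerate(lines):
        for token in tokenize(line):
            index[token].add(row_id)
    return index


def search(index: dict[str, set[int]], *terms: str) -> list[int]:
    """Return the rows containing every term by intersecting postings lists."""
    postings = [index.get(term.lower(), set()) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


logs = [
    "INFO request ok",
    "ERROR disk_full on node7",
    "INFO request ok",
    "WARN slow request node7",
]
idx = build_index(logs)
print(search(idx, "node7"))           # → [1, 3]
print(search(idx, "error", "node7"))  # → [1]
```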
Financial analytics platforms face unprecedented data challenges, processing millions of transactions while delivering real-time insights to traders, risk managers, and compliance teams. Open source distributed databases have emerged as the backbone of modern FinTech analytics, enabling organizations to scale cost-effectively while maintaining full control over their data architecture.
This presentation explores how open source technologies like Apache Cassandra, PostgreSQL with Citus, and ClickHouse power financial analytics through strategic sharding, replication, and consistency models. We’ll examine how leading financial institutions leverage these tools to handle 65,000+ transactions per second during peak trading periods while maintaining sub-millisecond query performance for real-time risk assessment and fraud detection.
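Hash-based sharding is the basic idea behind distributing a table by a key, as Citus does with a distribution column: all rows sharing a key, such as one account’s transactions, land on the same shard, so per-account queries stay local. The sketch below is a deliberately simplified hash-modulo scheme, not Citus’s actual shard-range mapping; `shard_for` and the account ids are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a distribution key (e.g. an account id) to a shard number.

    SHA-256 is used instead of Python's built-in hash(), which is salted per
    process for strings and would route the same key differently across runs.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


counts = [0] * 4
for i in range(10_000):
    counts[shard_for(f"acct-{i}", 4)] += 1
print(counts)  # roughly 2,500 keys per shard
```

One trade-off of plain modulo: changing `num_shards` remaps most keys, which is why systems typically fix a large shard count up front or use consistent hashing.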
Key topics include horizontal scaling strategies for multi-terabyte financial datasets, implementing cross-region replication for regulatory compliance, and balancing consistency requirements between transactional accuracy and analytical responsiveness. Through real-world case studies, attendees will discover how open source database architectures enable sophisticated financial analytics—from high-frequency trading algorithms to regulatory reporting pipelines—while reducing infrastructure costs and avoiding vendor lock-in.
Whether you’re building trading platforms, risk management systems, or compliance dashboards, this talk provides practical insights into architecting scalable, fault-tolerant analytics infrastructure using proven open source technologies that power today’s most demanding financial applications.
Observability (O11y) is the practice of collecting, analyzing, and acting upon system telemetry to ensure optimal performance and reliability. It is a real-time use case that every tech-driven organization faces. However, the observability landscape is changing rapidly, driven by several factors:
• The adoption of OpenTelemetry (OTel) at the agent and collection layers
• The rise of disaggregated observability stacks, combining best-of-breed solutions for each layer, built on open source technologies
• The evolution of “Observability 2.0”, which combines all types of telemetry into a single common data model
• The advent of AI within observability systems and, conversely, the need for observability of AI systems
This talk will bring attendees up to speed on the rapidly changing observability landscape, and how data streaming, stream processing, and real-time analytics technologies play critical roles in emerging disaggregated observability stacks.
I was fed up with FAQ search that spits back junk or nothing at all. TSIA says over sixty percent of support tickets could be solved by our own docs, so why isn’t that happening? In this talk I’ll show how I built a simple semantic search app with just three hundred lines of Python. I’ll demo it live answering three differently worded questions, all in under a hundred milliseconds, using TiDB Open-Source and Amazon Bedrock embeddings. At the end, I’ll provide the steps you need so you can clone the repo and run it on your docs tonight.
How we built an open source observability stack that can track every frame of our game.
https://github.com/madesroches/micromegas/
When every frame, lasting 1/60th of a second, can record thousands of events, traditional time series databases just won’t do.
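A quick back-of-the-envelope calculation shows the scale. Assuming, for illustration, 1,000 events per frame (the low end of “thousands”) at 60 frames per second, a single game process emits:

```python
FRAMES_PER_SECOND = 60
EVENTS_PER_FRAME = 1_000  # assumption: the low end of "thousands of events"


def events_in(seconds: float) -> int:
    """Events produced by one process recording every frame for `seconds`."""
    return int(FRAMES_PER_SECOND * EVENTS_PER_FRAME * seconds)


frame_budget_ms = 1000 / FRAMES_PER_SECOND
print(f"frame budget: {frame_budget_ms:.2f} ms")  # → frame budget: 16.67 ms
print(f"per second:   {events_in(1):,}")          # → per second:   60,000
print(f"per hour:     {events_in(3600):,}")       # → per hour:     216,000,000
```

Multiplied across players and play sessions, this quickly reaches billions of events, all recorded within a frame budget of about 16.67 ms.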
Magenta Telekom ingests many terabytes of new data every day, and every downstream consumer wants it immediately. The real bottleneck turned out not to be hardware but humans wrestling with hidden, hard-wired dependencies in hundreds of heterogeneous pipelines and sometimes tool silos.
Our fix was to treat every data asset as a node in a data-dependency graph and every transformation as an edge. Ingestion, transformation, AI, and BI are all part of the same executable graph. By using suitable abstractions and dependency injection, we empower less technical people to contribute business logic that can be operationalized efficiently.
This talk covers:
Operational challenges are handled by the abstractions and analysts only focus on the business logic.
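The abstract does not name Magenta Telekom’s tooling, so the following is only a minimal sketch of the pattern it describes, assuming Python functions as transformations: each data asset is a node, each function parameter names an upstream asset (an edge), and a small runner resolves the graph and injects the inputs, which is the dependency-injection part. `run_graph` and the example assets are hypothetical; `graphlib` is in the Python standard library from 3.9.

```python
import inspect
from graphlib import TopologicalSorter
from typing import Any, Callable


def run_graph(assets: dict[str, Callable[..., Any]]) -> dict[str, Any]:
    """Materialize every asset in dependency order.

    Each asset is a plain function whose parameter names are the upstream
    assets it needs; the runner injects those values, so authors write only
    business logic. A dependency cycle raises graphlib.CycleError.
    """
    deps = {name: set(inspect.signature(fn).parameters) for name, fn in assets.items()}
    missing = set().union(*deps.values()) - set(assets)
    if missing:
        raise KeyError(f"undefined upstream assets: {sorted(missing)}")
    results: dict[str, Any] = {}
    for name in TopologicalSorter(deps).static_order():
        results[name] = assets[name](**{p: results[p] for p in deps[name]})
    return results


# Hypothetical assets: nodes are data assets, parameter names are the edges.
assets = {
    "raw_usage": lambda: [3, 5, 7],
    "clean_usage": lambda raw_usage: [x for x in raw_usage if x > 3],
    "usage_total": lambda clean_usage: sum(clean_usage),
    "report": lambda usage_total, raw_usage: f"{usage_total} of {sum(raw_usage)}",
}
print(run_graph(assets)["report"])  # → 12 of 15
```

Because dependencies are declared only through parameter names, an analyst can add a node without touching scheduling or wiring code; the runner derives execution order from the graph.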