Fit for What Use?

Data doesn't need to be perfect to be useful. It needs to be fit for a specific purpose. A framework for evaluating data quality against the use case that actually matters.

Measuring Fitness: Finite Resource Allocation

How do you measure whether data is good enough to guide the distribution of critical resources? An applied case study in defining and quantifying data quality for a specific, consequential use case.

Measuring Fitness: Surge Prediction

Building a model to predict COVID-19 deaths is hard enough. Building one that accounts for how the underlying data changes over time is harder — and significantly more honest.

Quality in Context: Producing America's COVID-19 Data

How America's COVID-19 data was actually made — the fragmented public health infrastructure, political decisions, and reporting failures that shaped what we knew and when we knew it.

Rising from the Dead

Tracking California's COVID-19 death data daily — watching numbers disappear, reappear, and rewrite history — and what it reveals about the state of modern data.

The Data Revolution's Blind Spot

We have built sophisticated ways to use data — machine learning, generative AI, real-time models. We have not built the methods to know whether that data is worth using.

Visualizing the Problem

Standard line graphs hide how data changes over time. Shifted line plots, filtered heat maps, lag plots, bifrost plots, and impact plots — a visualization toolkit for data that revises itself.

What Data Quality Demands of Us

Data doesn't need to be perfect. But its imperfections must be understood. A closing argument for building a practical, urgent field of data quality research.

AI in Industry: Gen AI X Summit at ODSC West

A talk on the real-world application of generative AI in industry: what's working, what's hype, and where the field is actually headed. Delivered at ODSC West's Gen AI X Summit, 2024.

What Leaders Actually Need to Know About AI Right Now

The boardroom conversation about AI has outpaced most leaders' ability to evaluate what they're hearing. Here's how to ask better questions — and what the answers should sound like.

Practical Data Quality for Modern Data and Modern Uses

The full dissertation submitted to Cornell University's Field of Statistics, 2023. A rigorous treatment of data quality applied to America's COVID-19 data — the problems, the metrics, and a framework for thinking about fitness for use.

Women Data Leaders Panel Discussion

A panel conversation on leadership, career trajectories, and what it takes to advance equity in data science and analytics. Data Leaders USA, 2023.

Breaking into Data

A practical, candid guide to breaking into data science — what hiring managers actually look for, how to build a portfolio that stands out, and how to navigate the transition from wherever you're starting.

Essential Data Literacy

What every professional — technical or not — actually needs to understand about data in order to make better decisions, ask better questions, and hold data-driven claims to a higher standard.

Why & When: Cross-Validation in Practice

Data scientists run cross-validation constantly — but many do it without understanding why. Here's the full reasoning, from first principles to production.

The Impact Hypothesis: The Missing Link in AI Projects

Data science teams spend months building models that technically work — and fail to move the business. The culprit is almost always the same: an unstated assumption between output and outcome.

The Revolution Will Be Data-Driven

A keynote for nonprofit and civic sector leaders on how the data revolution is reshaping civil society: the opportunities, the risks, and what organizations need to do to use data without losing sight of the people behind it.