Writing — Kerstin Frailey

Jun 7, 2026

Compare, Contrast, and Evolve: A Data Quality Literature Review

A survey of how data quality has been defined, measured, and debated across decades of research — and why the field still lacks the practical, quantitative foundation modern AI demands.

Jun 7, 2026

Fit for What Use?

Data doesn't need to be perfect to be useful. It needs to be fit for a specific purpose. A framework for evaluating data quality against the use case that actually matters.

Jun 7, 2026

Measuring Fitness: Finite Resource Allocation

How do you measure whether data is good enough to guide the distribution of critical resources? An applied case study in defining and quantifying data quality for a specific, consequential use case.

Jun 7, 2026

Measuring Fitness: Surge Prediction

Building a model to predict COVID-19 deaths is hard enough. Building one that accounts for how the underlying data changes over time is harder — and significantly more honest.

Jun 7, 2026

Quality in Context: Producing America's COVID-19 Data

How America's COVID-19 data was actually made — the fragmented public health infrastructure, political decisions, and reporting failures that shaped what we knew and when we knew it.

Jun 7, 2026

Rising from the Dead

Tracking California's COVID-19 death data daily — watching numbers disappear, reappear, and rewrite history — and what it reveals about the state of modern data.

Jun 7, 2026

The Data Revolution's Blind Spot

We have built sophisticated ways to use data — machine learning, generative AI, real-time models. We have not built the methods to know whether that data is worth using.

Jun 7, 2026

Visualizing the Problem

Standard line graphs hide how data changes over time. Shifted line plots, filtered heat maps, lag plots, bifrost plots, and impact plots — a visualization toolkit for data that revises itself.

Jun 7, 2026

What Data Quality Demands of Us

Data doesn't need to be perfect. But its imperfections must be understood. A closing argument for building a practical, urgent field of data quality research.

Jan 1, 2025

Building an AI-Powered Course Kick-Starter: Agents That Research, Draft, and Evaluate ↗

A practical walkthrough of building a multi-agent AI system that researches a topic, drafts course content, and evaluates its own output — with real lessons from building it.

Oct 29, 2024

AI in Industry: Gen AI X Summit at ODSC West ↗

A talk on the real-world application of generative AI in industry: what's working, what's hype, and where the field is actually headed. Delivered at ODSC West's Gen AI X Summit, 2024.

Jan 15, 2024

What Leaders Actually Need to Know About AI Right Now

The boardroom conversation about AI has outpaced most leaders' ability to evaluate what they're hearing. Here's how to ask better questions — and what the answers should sound like.

Dec 1, 2023

Practical Data Quality for Modern Data and Modern Uses ↗

The full dissertation, submitted to Cornell University's Field of Statistics in 2023. A rigorous treatment of data quality applied to America's COVID-19 data: the problems, the metrics, and a framework for thinking about fitness for use.

Oct 1, 2023

Less Data Needed: Why Data Selection Is Critical to the Future of AI in Education ↗

The future of AI in education isn't more data — it's better data. A talk on why data selection, not data volume, is the critical design decision for effective AI in learning contexts.

Jun 1, 2023

Women Data Leaders Panel Discussion

A panel conversation on leadership, career trajectories, and what it takes to advance equity in data science and analytics. Data Leaders USA, 2023.

May 11, 2023

The Stuff They Didn't Teach You in Data Science Class ↗

Tips and tricks for every level of practical data science — from the talk at ODSC East 2023 that resonated far beyond the conference room.

Oct 25, 2022

jupyckage: Turn Any Jupyter Notebook Into a Python Package in One Line ↗

The pain of sharing code across notebooks is real. jupyckage solves it — one command, proper package structure, no boilerplate.

Jun 1, 2022

Breaking into Data ↗

A practical, candid guide to breaking into data science — what hiring managers actually look for, how to build a portfolio that stands out, and how to navigate the transition from wherever you're starting.

Feb 8, 2021

On AI ROI: The Questions You Need to Be Asking ↗

Most organizations treat AI ROI as a measurement problem. It's actually a strategic one. The questions you ask before you build determine everything that comes after.

Feb 8, 2021

Chasing Impact: On Building Technical Projects That Actually Matter ↗

A keynote from the GET Cities Kickoff Summit on what it takes to build technology that moves the world rather than just moving fast.

Sep 1, 2019

Essential Data Literacy

What every professional — technical or not — actually needs to understand about data in order to make better decisions, ask better questions, and hold data-driven claims to a higher standard.

Jul 19, 2019

Why & When: Cross Validation in Practice ↗

Data scientists run cross-validation constantly — but many do it without understanding why. Here's the full reasoning, from first principles to production.

Jun 1, 2019

Rabbit Holes, Red Herrings, and Rewards: Managing Curiosity in Data Science ↗

Curiosity is a data scientist's greatest asset and most dangerous liability. How to follow interesting threads without losing weeks — and how to know the difference between a distraction and a discovery.

Apr 1, 2019

Building an Effective Data Science Project Portfolio ↗

What actually makes a data science portfolio stand out: project selection, storytelling, and demonstrating judgment and impact rather than technical execution alone.

Mar 22, 2019

The Impact Hypothesis: The Missing Link in AI Projects

Data science teams spend months building models that technically work — and fail to move the business. The culprit is almost always the same: an unstated assumption between output and outcome.

Mar 8, 2019

Talk the Talk: Data Science Jargon for the Non-Data Scientist

AI, machine learning, models, features — the words get thrown around in every boardroom. Here's what they actually mean, in plain language, with no condescension.

Nov 1, 2016

The Revolution Will Be Data-Driven

A keynote for nonprofit and civic sector leaders on how the data revolution was reshaping civil society — the opportunities, the risks, and what organizations need to do to use data without losing sight of the people behind it.

Jun 1, 2016

In Context: Finding Ourselves and Each Other through Data

A talk on the human dimension of data work — how data can illuminate community, identity, and connection, and what it means to use data in service of people rather than in spite of them.