Garbage in, garbage out. It’s a phrase every business person knows, and for good reason: even the most sophisticated decisions are only as good as the data behind them. If the data’s no good, everything else wobbles.
I recently took a deep dive into the USDA’s National Weekly Hemp Report — the official document meant to track prices and import figures for the U.S. hemp industry. It’s not a complicated dataset: nine columns, about 100 rows per week. Easy enough, right? But after extracting the PDFs, loading them into a SQL database, and running a few basic queries, I found errors that should never have made it out the door.
I’m no stranger to mistakes — I’ve built a career cleaning up messy data — but what started as a standard dashboard project quickly turned into an exercise in spotting red flags. Within two hours, I had uncovered broken formulas, sloppy naming, unexplained jumps in year-to-date numbers, and even the occasional country swap. This is the USDA. The bar should be higher than “mostly right, most of the time.”
Errors big and small
Take the June 19, 2024 report: Australia suddenly became Austria in the Hemp Twine section, with no change to the running totals. The switch stuck for the rest of the year. Or consider the Sept. 11 report, in which the Netherlands vanished entirely, setting off a chain reaction of incorrect totals across four other countries, then quietly reappeared the following week with no explanation given.
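A simple week-over-week comparison would catch both kinds of error described above. The following Python sketch assumes each weekly section has already been parsed into a mapping from country name to year-to-date value. The `Report` alias and the function names are hypothetical and not part of any USDA tooling. The sketch flags a likely relabel only when a vanished country's total reappears unchanged under a new name, so it would miss a swap that coincides with new imports.

```python
# Hypothetical shape: one weekly report section (e.g. Hemp Twine) mapped as
# country name -> year-to-date value. Real report fields may differ.
Report = dict[str, float]


def country_changes(prev: Report, curr: Report) -> tuple[set[str], set[str]]:
    """Return (countries that vanished, countries that newly appeared) between two weeks."""
    return set(prev) - set(curr), set(curr) - set(prev)


def flag_possible_swaps(prev: Report, curr: Report) -> list[tuple[str, str]]:
    """Pair a vanished country with a newly appeared one when the YTD total carried
    over unchanged, which suggests a relabel rather than a real change in trade."""
    vanished, appeared = country_changes(prev, curr)
    return [
        (old, new)
        for old in sorted(vanished)
        for new in sorted(appeared)
        if prev[old] == curr[new]
    ]
```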
And then there’s April 17, the full-blown anomaly. Year-to-date values jumped without reason, bore no relation to weekly values, and then reverted the next week as if April 17 had never happened. These aren’t isolated typos. They reveal a pattern of weak data hygiene and zero public accountability. When stakeholders point out errors, the USDA often doesn’t respond. Cornell University, which archives the reports, told me it can’t verify the correct values; it just stores what the USDA sends. That’s not oversight; that’s passing the buck.
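The April 17 anomaly is the kind of error a basic reconciliation check brings to the surface. Assuming each week's year-to-date figure should equal the previous week's figure plus that week's value (within a single calendar year), this Python sketch flags every row that breaks that identity. `WeeklyRow`, `ytd_mismatches`, and the `tol` rounding allowance are illustrative assumptions, not fields or tools from the USDA report.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class WeeklyRow:
    # Hypothetical row shape: one country in one section of one weekly report.
    week: date
    country: str
    weekly_value: float
    ytd_value: float


def ytd_mismatches(rows: list[WeeklyRow], tol: float = 1.0) -> list[tuple[date, str, float]]:
    """Flag rows whose YTD differs from (previous YTD + this week's value) by more than tol.

    Assumes all rows fall within one calendar year. Each country's first row is
    taken as the starting point and is not itself checked. tol is an assumed
    allowance for rounding in the published figures."""
    flagged: list[tuple[date, str, float]] = []
    last_ytd: dict[str, float] = {}
    for row in sorted(rows, key=lambda r: (r.country, r.week)):
        if row.country in last_ytd:
            expected = last_ytd[row.country] + row.weekly_value
            gap = row.ytd_value - expected
            if abs(gap) > tol:
                flagged.append((row.week, row.country, gap))
        last_ytd[row.country] = row.ytd_value
    return flagged
```

A one-week spike that reverts, as on April 17, shows up as two consecutive flags with opposite signs: one for the jump and one for the return. A country that drops out for a week, as the Netherlands did, is compared against its last published total, so any imports recorded in the missing week (above the tolerance) also produce a flag.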
Read more at Hemp Today