By Personal Finance Tools Team

Elon Musk Says Use Grok for Taxes. Experts Say Don't. Here's What AI Can and Can't Do


On March 5, 2026 (five weeks before the April 15 deadline), Elon Musk posted on X: “Grok can help with your taxes.”

It got a lot of engagement. It was also the kind of advice that could cost you real money.

Tax professionals pushed back immediately. The New York Times ran tests. TaxCalcBench ran more rigorous benchmarks. The results weren’t good for AI chatbots, at least not as full tax prep tools. But the question is more nuanced than “AI bad, accountant good.” There are specific things AI handles well, specific things it gets wrong, and a clear line between where you can safely use it and where you can’t.

With IRS refunds up 10.6% in early 2026 and more money on the table than ever, this is the year you want to get that line right.

The Honest Summary

AI Tool | Best Use | Should You File With It?
Grok | Quick questions about deductions, understanding IRS language | No
ChatGPT | Explaining concepts, checking your math | No
Claude | Understanding complex scenarios, document drafting | No
TurboTax / FreeTaxUSA | Actually filing your return accurately | Yes

AI chatbots are research tools. Not tax software. The distinction matters.

What Musk Actually Said (and What He Meant)

The post was brief: “Grok can help with your taxes.” No caveats. No “consult a professional.” No “for simple questions only.”

To be fair, xAI’s Grok is genuinely good at explaining tax concepts in plain English. Ask it what the standard deduction is, how a Roth IRA works, or what qualifies as a home office deduction, and you’ll get a clear, accurate answer most of the time. For that use case (education and exploration) it’s a useful tool.

But there’s a gap between “help with your taxes” and “prepare your taxes.” Musk’s phrasing blurred that line in a way that sent a lot of people toward AI chatbots during the most consequential financial filing they do all year.

The NYT Test: $2,000+ Average Errors

The New York Times ran chatbots through real-world tax scenarios (W-2 income, freelance income, investment gains, deductions) and compared the outputs to what a tax professional would calculate. Across Grok, ChatGPT-4, and Claude, the average refund miscalculation was over $2,000.

That’s not a rounding error. That’s missing a deduction you were owed, or miscalculating income thresholds for credits you qualified for, or getting the phase-out math wrong on something like the child tax credit.

The errors weren’t random. They clustered around a few problem areas:

Phase-outs and income thresholds. Tax credits and deductions often reduce as income rises. The exact numbers change year to year, and the interactions between multiple phase-outs on the same return are genuinely complicated. AI models got these wrong consistently.

Self-employment and gig income. Schedule C is where AI falls apart. Determining which expenses are deductible, calculating self-employment tax correctly, and handling the qualified business income deduction all depend on applying detailed rules to your specific facts, and chatbots tend to hallucinate those details or oversimplify them.

New deductions they haven’t been trained on. The deductions created by the One Big Beautiful Bill (OBBB) for tips, overtime, and seniors 65 and older took effect for 2025 tax year returns. They are new enough that most AI models either don’t know about them or apply them incorrectly. If you’re a tipped employee, that gap could mean missing a new deduction worth thousands of dollars.
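To make the phase-out problem concrete, here is a minimal Python sketch of a step-style phase-out, where a credit shrinks by a fixed amount for each $1,000 (or fraction of $1,000) of income over a threshold. The function name and every dollar figure below are illustrative assumptions, not current-year IRS numbers. The point is only that a stale threshold quietly changes the answer.

```python
def phased_out_credit(base_credit, income, threshold, reduction_per_step, step=1000):
    """Reduce a credit by a fixed amount per step of income over a threshold.

    All parameters are hypothetical inputs supplied by the caller; nothing
    here encodes real current-year tax figures.
    """
    excess = income - threshold
    if excess <= 0:
        return base_credit  # at or below the threshold: no reduction
    # Round partial steps up: any fraction of a step counts as a full step.
    steps = -(-excess // step)
    return max(0, base_credit - steps * reduction_per_step)


# Illustrative numbers only: a 2,000 credit, a 200,000 threshold, 50 lost per 1,000 over.
print(phased_out_credit(2000, 203_500, 200_000, 50))  # 3,500 over -> 4 steps -> 1800

# Same taxpayer, but with a stale (lower) threshold the credit shrinks further.
print(phased_out_credit(2000, 203_500, 190_000, 50))  # 13,500 over -> 14 steps -> 1300
```

The two calls differ only in the threshold, which is exactly the kind of year-specific figure a chatbot may get wrong without saying so.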

TaxCalcBench Results: Most Models Under 50%

TaxCalcBench, which runs standardized tax scenarios through AI models and scores accuracy against professional calculations, found that most leading models fail to crack 50% accuracy on a full return.

Think about that. On a complete tax return, most leading models get the final numbers wrong more often than they get them right.

The benchmark covers scenarios across income types, filing statuses, and deduction combinations. Simple W-2-only returns score higher, sometimes 70-80% accuracy. Add a 1099, some investment income, and a deduction or two, and accuracy drops fast.

That’s the pattern: AI is roughly competent on vanilla returns and unreliable on anything with complexity.

What AI Gets Right

Before writing off chatbots entirely: there are real use cases where they save time and add value during tax season.

Understanding IRS notices. If you get a CP2000 notice, a letter about an audit, or a form you’ve never seen before, a good chatbot will explain what it means in plain language. This is genuinely useful. IRS language is dense. “We have information indicating you may not have reported all income” reads like a threat. A chatbot can translate it calmly.

Checking whether something is deductible. “Can I deduct my home office if I’m an employee?” “Is my side hustle income subject to self-employment tax?” These conceptual questions have established answers, and AI gets them right most of the time. You still need to verify, but it’s a fast way to know if something is worth pursuing.

Finding forms you didn’t know existed. First time with freelance income? AI can tell you about Schedule C, SE, and quarterly estimated payments in a way that saves you hours of IRS.gov archaeology.

Explaining instructions you can’t parse. Every IRS form has instructions. They’re written for tax professionals, not people doing this once a year. Paste a confusing section into a chatbot and ask what it means. The explanation is usually good.

Pre-filling a conversation with a tax pro. If you’re heading to an accountant, AI can help you organize your questions, understand what documents you need, and identify the topics worth flagging. You walk in prepared instead of confused.

The pattern: AI works well for education, orientation, and translating complexity. It breaks down when it’s calculating your actual numbers.

What AI Gets Wrong

Calculating your actual refund or liability. This is the core problem. Tax calculation requires applying the exact current-year rules to your specific numbers. AI models are trained on historical data, may have outdated figures, and don’t have access to your full financial picture. The NYT’s $2,000+ average error wasn’t hypothetical. It’s what happens when people treat a language model as a calculator.

Applying the right year’s numbers. Standard deduction amounts, contribution limits, phase-out thresholds: these change every year. A model that was trained through mid-2025 may apply 2024 figures to a 2025 return without flagging the discrepancy.

State taxes. Most chatbots are shaky on state-level rules. State conformity with federal law varies significantly. California doesn’t follow the OBBB tips deduction. New York has its own standard deduction. If you live in a high-tax state, the gap between federal and state treatment can be substantial, and AI reliably underestimates how different they are.

Anything that requires looking at your documents. AI chatbots can’t see your W-2, your 1099s, your brokerage statements. They’re working from what you describe, and you will leave something out. Tax software that imports documents catches things you forget to mention.
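The wrong-year problem described above is one that purpose-built software can guard against explicitly. Here is a minimal Python sketch, assuming a lookup table keyed by tax year: if the requested year is missing, it raises an error rather than silently reusing another year's figure. The table and function names are hypothetical. Only the 2024 single-filer standard deduction ($14,600) is filled in, as a known example, and any real table should be checked against current IRS publications.

```python
# Figures keyed by tax year. Only 2024 is filled in here on purpose;
# a real table would be populated from current IRS publications.
STANDARD_DEDUCTION_SINGLE = {
    2024: 14_600,
}


def standard_deduction_single(tax_year, table=STANDARD_DEDUCTION_SINGLE):
    """Return the figure for tax_year, or raise instead of guessing."""
    if tax_year not in table:
        raise LookupError(
            f"No standard deduction on file for {tax_year}; "
            "refusing to fall back to another year's figure."
        )
    return table[tax_year]


print(standard_deduction_single(2024))  # 14600

try:
    standard_deduction_single(2025)
except LookupError as err:
    print(err)  # explains that no 2025 figure is available
```

A language model has no equivalent of that error path: when it lacks the current figure, it tends to produce the most familiar one.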

Only 37% of Filers Trust AI Over a Tax Pro

A 2026 survey found that only 37% of Americans trust AI over a tax professional for tax prep, down from 43% in 2025. That’s a meaningful drop in a single year, and it tracks with a year that included high-profile errors and more coverage of AI hallucinations in consequential contexts.

The remaining 63% either trust humans more or aren’t sure. Given what the benchmarks show, that skepticism is reasonable.

But there’s a third option that the survey framing misses: trust dedicated tax software over both. TurboTax, FreeTaxUSA, TaxSlayer, H&R Block: these tools are built specifically to calculate tax returns. They’re tested against current-year rules. They import your documents. They run error checks. They aren’t general-purpose chatbots trying to answer tax questions; they’re purpose-built filing engines.

General AI chatbots aren’t competing with tax software. They’re competing with asking your coworker or Googling a confusing question. At that task, they’re often quite good.

The Right Way to Use AI This Tax Season

Here’s a workflow that actually makes sense:

Use AI for research and questions. Before you open your tax software, use a chatbot to understand what you’re dealing with. What income types do you have? What deductions might apply? What forms will you need? This orientation saves time and surfaces things you might have missed.

Use dedicated tax software to file. Once you know what you’re doing, open FreeTaxUSA, TurboTax, H&R Block, or whatever tool fits your situation. These are the appropriate tools for the actual calculation and filing. Our guide to IRS Direct File alternatives covers free options if you want to minimize cost.

Use AI to understand what software produces. If your tax software generates a form you don’t recognize or a number that surprises you, a chatbot can explain it. “My Schedule SE shows $X in self-employment tax and I don’t understand why.” That’s a question AI handles well.

Don’t mix the two roles. The error pattern happens when people start in a chatbot, get a number they believe, and then file based on it. Chatbots are not filing tools. They cannot file your return. Any number they give you is an estimate at best and a hallucination at worst.
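For the Schedule SE question in the workflow above, the arithmetic being explained looks roughly like the Python sketch below. It rests on simplifying assumptions: net profit is below the Social Security wage base, the Additional Medicare Tax does not apply, and the rounding is simpler than on the actual form. The function and constant names are hypothetical. The rates are the standard 12.4% Social Security and 2.9% Medicare portions applied to 92.35% of net profit.

```python
from decimal import Decimal, ROUND_HALF_UP

NET_EARNINGS_FACTOR = Decimal("0.9235")  # share of net profit subject to SE tax
SOCIAL_SECURITY_RATE = Decimal("0.124")
MEDICARE_RATE = Decimal("0.029")


def estimate_se_tax(net_profit):
    """Rough self-employment tax, assuming profit is below the Social Security wage base."""
    taxable = Decimal(str(net_profit)) * NET_EARNINGS_FACTOR
    tax = taxable * (SOCIAL_SECURITY_RATE + MEDICARE_RATE)
    # Round to cents; the real form's rounding rules are not modeled here.
    return tax.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_se_tax(50_000))  # 7064.78
```

Treat this as a way to understand the figure your tax software produced, not as a replacement for it.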

Which Chatbot Is Best for Tax Questions

If you’re going to use one, here’s how they stack up for the education use case:

Grok: Decent for quick factual questions. The most likely to be up to date given xAI’s training pipeline, which is useful for new legislation like the OBBB deductions. But it’s also the most confident in its wrong answers. That’s a dangerous combination when you’re making financial decisions.

ChatGPT-4: Strong on explanations and conceptual clarity. Better at acknowledging uncertainty than Grok. Will sometimes tell you “verify this with a tax professional,” which is appropriate. Accuracy is roughly on par with Claude.

Claude: The most likely to hedge, qualify, and point you toward authoritative sources. Some people find this annoying; for tax questions, it’s the right instinct. Claude will explain a concept but is less likely to give you a specific dollar figure and tell you to trust it.

None of them should be your filing tool. But for the education tasks above, any of them will help more than hurt, as long as you verify before you act.

What to Do Right Now

April 15 is five weeks out. IRS refunds are up 10.6% in early 2026, which means more money flows to people who file accurately. The refund gap between a correct return and one with a $2,000 chatbot error is real.

If you haven’t started: pick a filing tool based on your situation. For simple returns, FreeTaxUSA is $0 federal and handles the new OBBB deductions well. If you want a step-by-step experience, TaxSlayer’s interview flow is the most user-friendly for first-timers. If you have complex income, our TurboTax vs. H&R Block comparison breaks down which handles each scenario better.

If you have specific questions while you’re filing, use a chatbot for clarification. Just don’t let it calculate your refund and don’t file based on a number it gave you.

Musk wasn’t entirely wrong: Grok can help with your taxes. It just can’t do your taxes. That’s the distinction that will determine whether this filing season costs you money or saves it.


FAQ

Is it safe to use ChatGPT to do my taxes? Not as your primary filing method. ChatGPT can answer questions, explain deductions, and help you understand forms, but it makes calculation errors, may have outdated figures, and can’t import your actual documents. Use dedicated tax software (FreeTaxUSA, TurboTax, H&R Block) to file. Use AI chatbots to understand what you’re doing.

How accurate is Grok for tax questions? For conceptual questions (what a deduction is, how a credit works, what form you need), Grok is reasonably accurate. For calculating your actual refund or tax liability, TaxCalcBench found most AI models fall below 50% accuracy on full returns. Grok isn’t an exception.

Why did the NYT find $2,000+ refund errors? The errors clustered around phase-outs and income thresholds (which change yearly), self-employment income, and new deductions like the OBBB changes. AI models applied incorrect figures, missed interactions between deductions, and in some cases generated plausible-sounding but wrong numbers.

What should I actually use to file my taxes? Dedicated tax software built for the current year: FreeTaxUSA (best free option), TurboTax, H&R Block, or TaxAct. The IRS Free File program covers free options if your income qualifies. These tools are purpose-built for calculation accuracy in a way chatbots aren’t.

Can AI at least help me find deductions I’m missing? Yes, this is one of the better use cases. A chatbot can help you think through deduction categories you might qualify for based on your situation. But you should still verify through IRS.gov or your tax software before claiming anything. “AI said I can deduct this” is not a position you want to defend to the IRS.


Based on publicly reported testing results from the New York Times, TaxCalcBench benchmarks, and xAI’s Grok documentation as of March 2026. Tax rules based on IRS guidance for the 2025 tax year. This is informational content, not personalized tax advice.