It was a Bengaluru morning like any other — clouds hanging low, the comforting aroma of filter coffee, and Apple’s WWDC streaming in the background. The phrase Apple Intelligence lit up the screen — a perfect blend of marketing brilliance and technical aspiration. I’ll admit, I leaned forward.
After a decade of debugging system bottlenecks and watching AI gradually seep from server racks into silicon, this felt like a moment. M-series chips. Unified memory. Local LLMs. It had finally arrived.
Or had it?
A few months later, the rose-tinted optimism had dulled. Siri hadn’t become any wiser, and the local LLMs felt more like college interns trying to impersonate a CEO. As I wrote in “The Apple Intelligence Saga”, the dream was real, but the execution struggled.
And that’s what this blog unpacks: not just why the edge is promising, but also why it’s still disappointing.
Modern SoCs like Apple M-series, Snapdragon X Elite, AMD Phoenix, and Intel Meteor Lake are marvels: tightly integrated, multi-core, and packed with AI accelerators. But power doesn’t always mean practicality.
Pouring petrol through a straw won’t win a race.
Even Apple’s M3 Pro, with 150+ GB/s of memory bandwidth, pales in comparison to data-center GPUs like NVIDIA’s H100 (~3 TB/s). LLMs are bandwidth-hungry monsters.
Low arithmetic intensity in LLM layers means even 100 TOPS NPUs can sit idle — starving for data.
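A back-of-envelope roofline makes this concrete. Token-by-token decoding of a dense model streams the entire weight set from memory for every generated token, so bandwidth sets a hard ceiling on tokens/sec. A minimal sketch (figures illustrative):

```python
# Roofline sketch: decode throughput is capped by
# (memory bandwidth) / (bytes of weights streamed per token).

def max_tokens_per_sec(params_billions: float, bits_per_weight: int,
                       bandwidth_gb_s: float) -> float:
    bytes_per_token = params_billions * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token

# A 7B model in INT4 weighs in around 3.5 GB:
print(f"M3 Pro (150 GB/s):  ~{max_tokens_per_sec(7, 4, 150):.0f} tok/s ceiling")
print(f"H100 (3,000 GB/s): ~{max_tokens_per_sec(7, 4, 3000):.0f} tok/s ceiling")
```

That ~40 tok/s ceiling for a 7B INT4 model at 150 GB/s lines up neatly with the llama.cpp numbers in the benchmark table later in this post.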
Edge devices are built for bursts, not endurance. Sustained inference on a thin MacBook? Your tokens/sec graph will look like Bangalore traffic on a Monday morning.
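You can watch the throttling happen with a simple windowed throughput probe. In this sketch, `generate_one_token` is a hypothetical stand-in for whatever your runtime’s single-token decode call is:

```python
import time

def sustained_throughput(generate_one_token, n_tokens=2048, window=64):
    """Record tokens/sec per window; a downward drift suggests thermal throttling."""
    rates = []
    t0 = time.perf_counter()
    for i in range(1, n_tokens + 1):
        generate_one_token()
        if i % window == 0:
            t1 = time.perf_counter()
            rates.append(window / (t1 - t0))  # tokens/sec over this window
            t0 = t1
    return rates
```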
Quantization is essential — but risky.
| Format | Memory Use | Accuracy | Notes |
|---|---|---|---|
| FP16 | 50% of FP32 | Near-lossless | Ideal if supported |
| INT8 | 25% of FP32 | Mild drop | Needs calibration |
| INT4 | ~12.5% of FP32 | Noticeable loss | Use GPTQ or QLoRA |
| INT3 | ~9% of FP32 | High risk | Often “robotic” responses |
From FLAC to MP3 to ringtone — there’s a breaking point.
INT4 gets 7B models on 8 GB RAM, but coherence suffers without careful tuning.
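To see where the precision goes, here’s a toy sketch of symmetric per-tensor INT4 quantization. Real schemes like GPTQ use per-group scales and calibration data, so treat this as an illustration, not a production recipe:

```python
import numpy as np

def quantize_int4(w: np.ndarray):
    # Map weights to integers in [-8, 7] with one shared scale (per-tensor).
    scale = np.abs(w).max() / 7.0
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)  # toy weight vector
q, s = quantize_int4(w)
print(f"mean abs round-trip error: {np.abs(w - dequantize(q, s)).mean():.4f}")
```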
Portable, but it lacks optimized kernels; on Apple M1/M2, performance is often slower than the CPU fallback.
Fastest option on Apple, if you use llama.cpp with Metal (a minimal usage sketch follows below). Avoid PyTorch on MPS unless you enjoy waiting.
A great idea (runs on all GPUs) but not yet optimized for LLMs. Coverage for complex ops remains limited.
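For reference, a minimal sketch of that Apple fast path via the llama-cpp-python bindings. The model path is hypothetical; any local GGUF file works, and a Metal-enabled build offloads layers to the GPU:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="./models/llama-2-7b.Q4_K_M.gguf",  # hypothetical local GGUF file
    n_gpu_layers=-1,  # offload all layers; uses Metal on Apple Silicon builds
    n_ctx=2048,
)
out = llm("Explain memory bandwidth in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```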
Google Maps might show a straight line. Bannerghatta Road says otherwise.
| Platform | Tokens/sec (7B, Q4/Q6) |
|---|---|
| Apple M2 | 20–40 (llama.cpp) |
| Intel Meteor Lake | ~10–30 (DirectML, est.) |
| AMD Phoenix | ~20 (FP16, ONNX) |
| Snapdragon X Elite | TBD (claimed 13B support) |
Apple leads due to software-hardware synergy. Others have potential, but runtime gaps hold them back.
7B models don’t have the depth or nuance we expect from ChatGPT-level experiences — especially after quantization.
You don’t need GPT-4 for grocery lists. But don’t expect a 7B model to write your wedding vows.
Until we get well-tuned 13B+ models on-device or hybrid cloud-edge solutions, quality will remain mid-tier.
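One plausible shape for that hybrid: route short, low-stakes prompts to the local model and escalate the rest. A sketch with hypothetical `local_llm` and `cloud_llm` callables, using prompt length as a crude stand-in for a real complexity classifier:

```python
def route(prompt: str, local_llm, cloud_llm, local_word_budget: int = 200) -> str:
    # Crude heuristic: short prompts stay on-device, long ones go to the cloud.
    if len(prompt.split()) <= local_word_budget:
        return local_llm(prompt)
    return cloud_llm(prompt)
```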
Apple promised an on-device, contextual AI revolution.
What we got fell short: even Apple hit the glass ceiling of limited RAM, tight thermal budgets, and immature runtimes.
We need more mature runtimes, more memory bandwidth, and better quantization. And we need honest framing: let’s stop calling 7B INT4 a “GPT-4 competitor.” It’s not. Not yet.
Edge AI feels a lot like Bengaluru’s road network: intelligent intentions, brilliant engineers, and… bottlenecks at every corner.
We’ve got great engines — M3 Max, X Elite, Phoenix — but our road (runtime maturity, bandwidth, quantization quality) is full of potholes.
“Kya karein, traffic toh hai.” (“What to do, there’s traffic.”) But that doesn’t mean we stop building.
There’s progress. Tools like llama.cpp, MLX, ONNX, and Core ML are improving. Next-gen chips (M4, Strix, Arrow Lake) will lift ceilings further.
And maybe, just maybe, the future of AI won’t live in cloud clusters — it’ll run right from your laptop bag.
While this article focused on LLMs, we should appreciate that DNN inference for image, video, and speech processing has progressed significantly on edge devices; we can’t discard its significance just because client/edge systems are still “not LLM ready.”
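As a contrast to the LLM struggles above, classic DNN inference on the edge is already routine. A hedged sketch with ONNX Runtime; the model file is hypothetical, and a dummy tensor stands in for a real image:

```python
import numpy as np
import onnxruntime as ort  # pip install onnxruntime

sess = ort.InferenceSession("mobilenetv2.onnx",  # hypothetical local model file
                            providers=["CPUExecutionProvider"])
x = np.random.rand(1, 3, 224, 224).astype(np.float32)  # dummy "image" tensor
input_name = sess.get_inputs()[0].name
logits = sess.run(None, {input_name: x})[0]
print("top class index:", int(np.argmax(logits)))
```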
Have you tried running LLMs on your Mac, PC, or ARM laptop?
Drop your experiences, benchmarks, pain points, or clever hacks in the comments, or tag @bhargavachary, and let’s build this road together.
#EdgeAI #LocalLLM #MSeries #AMD #Intel #SnapdragonXElite #Quantization #llamacpp #Inference #SystemEngineering #MachineLearning