AI belongs at the edge

When I refer to “AI” in this article, I specifically mean inference: running a trained model to generate outputs. Training and fine-tuning will likely continue to happen in large datacenters. What is changing is that running inference locally is becoming increasingly affordable and practical for regular users and small organizations.

Most homes have a room, or at least a corner, dedicated to the less glamorous side of daily life. It’s where the vacuum cleaner, cleaning products, tools, and miscellaneous items end up. We rarely think about this space, yet it quietly supports how we live. In the coming years, this space, which I will call the technical room, is likely to undergo a quiet but meaningful transformation. Instead of functioning mainly as storage, it will increasingly house a small rack with networking equipment and compute nodes, the infrastructure that powers personal AI assistants, self-hosted services, media libraries, and private data. What was once a forgotten storage space may become one of the most important rooms in the house.

To understand why this shift is happening, it helps to look at how AI models themselves are evolving. The performance gap between the top frontier models has narrowed significantly. These models are trained on largely the same data, the public internet, and in many benchmarks they differ by less than 0.3 percentage points. This convergence means there is no longer a single clearly superior model for most tasks (source: https://rogo.ai/news/introducing-the-big-finance-benchmark).

In practice, the majority of everyday prompts (80-90% in my opinion) do not require real-time internet access. Tasks such as proofreading, writing assistance, mathematical reasoning, physics or chemistry explanations, and logical analysis rely on knowledge that has remained stable for years. Only a smaller portion of queries genuinely need fresh information, such as current weather, news, or live scores. In those cases, a locally running agent can make a targeted request to an online service and retrieve only the necessary data. This reality makes the default approach of sending nearly every prompt to the cloud increasingly difficult to justify for most users.
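The routing idea above can be sketched in a few lines. This is only an illustration, not how any particular assistant works: the keyword pattern and the handler names `local` and `local+fetch` are assumptions made for the example, and a real agent would more likely let the model itself decide when to call an online tool.

```python
import re

# Hypothetical pattern: topics whose answers go stale quickly.
# A real agent would likely use the model itself to make this decision.
FRESH_DATA_PATTERN = re.compile(
    r"\b(weather|forecast|news|headlines|scores?)\b", re.IGNORECASE
)


def needs_live_data(prompt: str) -> bool:
    """Return True when the prompt appears to ask for fresh information."""
    return FRESH_DATA_PATTERN.search(prompt) is not None


def route(prompt: str) -> str:
    """Pick a handler name (both names are made up for this sketch).

    'local'        -> answer entirely with the local model
    'local+fetch'  -> local model plus one targeted online request
    """
    return "local+fetch" if needs_live_data(prompt) else "local"


if __name__ == "__main__":
    for p in ("Proofread this paragraph for me.", "What is the weather in Oslo?"):
        print(f"{route(p):12s} <- {p}")
```

The point of the design is that the default path never leaves the house; only the small share of prompts that need fresh data triggers a narrow outbound request.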

At the same time, relying on cloud-based AI carries real risks. Google Cloud has accidentally deleted accounts containing massive amounts of data due to internal errors. In other cases, automated systems have wrongly flagged users for serious violations, resulting in permanent loss of access to years of photos, emails, and documents with little effective recourse. Similar problems have appeared with Microsoft’s Copilot. A recent bug caused the system to summarize confidential emails, and the company’s own terms of service describe Copilot as being “for entertainment purposes only”, a disclaimer that contrasts sharply with how aggressively the product is being marketed. These incidents highlight a clear imbalance: when your data and AI processing live entirely on someone else’s infrastructure, you remain at the mercy of their systems, policies, and mistakes.

This is why a growing number of people are turning toward self-hosting and local infrastructure. There is a clear trend among tech-savvy users, privacy-conscious individuals, and homelab enthusiasts toward keeping files, photos, and services under their own control. Unlike cloud subscriptions, which act as recurring rent, self-hosting involves an upfront investment in hardware plus an ongoing electricity cost. For anyone with a sizable library of files and photos, self-hosting typically becomes cheaper than the equivalent cloud subscription within one to three years. More importantly, your data remains under your control, with no risk of sudden policy changes, account restrictions, or having it used to train someone else’s models.
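To make the break-even reasoning concrete, here is a minimal sketch. The dollar figures in the example call are hypothetical placeholders, not measurements; substitute your own hardware price, subscription fee, and electricity bill.

```python
import math
from typing import Optional


def break_even_months(
    hardware_cost: float,
    cloud_monthly: float,
    electricity_monthly: float,
) -> Optional[int]:
    """Months until the one-time hardware cost is paid back by savings.

    Returns None when self-hosting never breaks even, i.e. when the
    electricity bill is at least as large as the cloud subscription.
    """
    monthly_saving = cloud_monthly - electricity_monthly
    if monthly_saving <= 0:
        return None
    return math.ceil(hardware_cost / monthly_saving)


if __name__ == "__main__":
    # Hypothetical figures: $600 of hardware, a $30/month subscription,
    # and $5/month of extra electricity.
    print(break_even_months(600, 30, 5))
```

With these placeholder numbers the hardware pays for itself after 24 months, which sits inside the one-to-three-year range mentioned above.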

Open-source software has reached a point where it offers strong feature parity with commercial alternatives in file storage, photo management, and media serving. At the same time, consumer and prosumer networking has improved significantly. While technically inclined users once had to build their own routers using Linux or pfSense, solutions like Ubiquiti now offer enterprise-grade features, such as advanced routing, VLANs, and centralized management, at reasonable prices and without subscriptions. These more capable and secure local networks are becoming a key enabler for practical AI inference at the edge.

Running AI locally is also becoming more accessible. Configuration complexity is decreasing, dedicated AI accelerators (TinyAI) are becoming more affordable, and an increasing number of models are being released with open weights. As a result, even people with limited technical experience can now set up a private AI inference node over a single weekend, often with help from AI-assisted tools.

The case for local inference becomes even stronger with the rise of physical AI. As robots and intelligent devices gain access to our homes and personal spaces, they will continuously collect camera, microphone, and sensor data from our most private environments. Processing this information in the cloud would expose highly sensitive personal details to third parties. Running inference locally keeps that data under direct control and significantly reduces the privacy risks that come with physical systems operating inside the home.

I have experienced this tradeoff firsthand with my Reachy Mini. For the robot to be properly aware of its environment, it needs to process video and audio from its cameras and microphones in real time. Running this through a cloud service would cost between $10 and $60 per month, along with hundreds of milliseconds of latency. Running inference locally reduces the cost to less than a dollar per month, essentially just the additional electricity, while delivering much lower latency.
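The electricity figure is easy to sanity-check for your own setup. The sketch below assumes an extra power draw of 5 W and a price of $0.20 per kWh; both numbers are illustrative assumptions, not measurements from the Reachy Mini setup.

```python
def monthly_electricity_cost(
    extra_watts: float,
    price_per_kwh: float,
    hours_per_day: float = 24.0,
    days: int = 30,
) -> float:
    """Cost of running an extra load continuously for one month."""
    kwh = extra_watts * hours_per_day * days / 1000.0  # watt-hours -> kWh
    return kwh * price_per_kwh


if __name__ == "__main__":
    # Assumed values: 5 W of additional draw, $0.20 per kWh.
    cost = monthly_electricity_cost(extra_watts=5, price_per_kwh=0.20)
    print(f"${cost:.2f} per month")
```

Under these assumptions the load uses 3.6 kWh per month, which stays below a dollar; a node that draws considerably more power would cost proportionally more.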

The same logic applies at larger scale. Someone who wants to index emails, photos, and documents, or process old scanned files through OCR and query them intelligently, faces a choice. Uploading everything to a major cloud provider requires moving large volumes of data over the internet, accepting the provider’s terms, committing to ongoing subscriptions, and accepting the risk of data exposure. Indexing and querying the same data locally eliminates data transfer costs, removes recurring fees, reduces latency, and keeps full ownership of the information.

A rough cost comparison for indexing one million PDFs illustrates the difference. On the cloud side, initial indexing and embedding can cost between $7,000 and $15,000, with monthly storage and search charges adding another $450 to $1,000. Running the same workload locally requires only electricity, typically in the range of tens of dollars per month, with hardware representing a one-time investment that can be used for years.
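A short calculation shows how these figures add up over the first year. The cloud numbers are the ranges quoted above; the local hardware price ($3,000) and electricity cost ($30 per month) are assumptions chosen only to illustrate the comparison.

```python
def cloud_total(initial: float, monthly: float, months: int) -> float:
    """Initial indexing/embedding cost plus recurring storage and search fees."""
    return initial + monthly * months


def local_total(hardware: float, electricity_monthly: float, months: int) -> float:
    """One-time hardware purchase plus electricity."""
    return hardware + electricity_monthly * months


MONTHS = 12

# Low and high ends of the cloud ranges quoted in the text.
cloud_low = cloud_total(7_000, 450, MONTHS)
cloud_high = cloud_total(15_000, 1_000, MONTHS)

# Assumed local figures (not from the article): $3,000 hardware, $30/month power.
local = local_total(hardware=3_000, electricity_monthly=30, months=MONTHS)

if __name__ == "__main__":
    print(f"cloud, first year: ${cloud_low:,.0f} to ${cloud_high:,.0f}")
    print(f"local, first year: ${local:,.0f}")
```

The gap widens in later years, because the cloud side keeps paying the monthly fee while the local side pays only for electricity.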

The technical room is already beginning to change in many homes. What used to be a space for storing cleaning supplies and tools is gradually turning into a small but purposeful infrastructure hub. As local networks become more capable, AI inference becomes more accessible, and physical AI enters our homes, this transformation will likely accelerate. For those who choose to build it, the benefits are tangible: greater control over personal data, lower long-term costs, reduced latency, and infrastructure that serves the household rather than the other way around.

The infrastructure is becoming available. The question is no longer whether this shift will happen, but how many people will decide to build their own version of it.

Featured image generated with Grok Imagine
