The device streaming your favorite show at night could be quietly working a second job: routing web-scraping traffic for AI companies while you sleep. That’s the finding of a research write-up by Include Security, surfaced and widely discussed on Hacker News this week. The report digs into how Bright Data, a data-collection firm, turns ordinary home devices into exit nodes for harvesting AI training data.
What stands out here is the mechanism. Bright Data markets what it calls one of the world’s largest residential proxy networks, with figures ranging from 150 million to 400 million home IP addresses. The supply comes from an SDK, a small piece of software embedded in consumer apps. With user consent, that SDK turns a phone or smart TV into a relay. A paying Bright Data customer routes scraping traffic through your home connection, so the request lands on a target website looking like it came from a normal Comcast or T-Mobile subscriber.
Why this is happening now
AI models run on scraped web content. Pre-training, retrieval, agent grounding, and search all need fresh data pulled from the open internet. The problem is that the modern web fights back: bot-mitigation services such as Cloudflare, DataDome, and HUMAN throttle or block requests coming from known datacenter IPs.
Residential proxies are the workaround. Traffic from a real home IP slips past those defenses because it looks like a real person browsing. Brian Krebs reported in October 2025 that a glut of proxies is fueling large-scale data harvesting tied to various AI projects. The FBI issued a formal advisory earlier this year. Academic research going back to 2019 shows these networks are overwhelmingly misused.
Most coverage so far has focused on the illegal supply: botnets, trojanized apps, pre-infected IoT hardware. The Include Security research points at the legal side, which has gotten far less scrutiny. That’s the part worth paying attention to.
Why the TV is the perfect proxy
A smart TV is close to an ideal residential proxy, and the comparison with a phone makes it obvious:
- Always on. A TV never hits 1% battery. It sits in standby 24/7.
- Always connected. High-speed WiFi, no cellular data caps to trip over.
- Unattended. Nobody’s watching the screen at 3 a.m.
- No oversight. Phones have mobile device management and security tooling. TVs have almost none.
- Weak consent surface. You agree by clicking through text with a remote’s arrow keys.
That last point matters. The report flags a Roku app called Petflix, documented by The Verge, whose opt-in screen says the app will “occasionally” use your device’s resources and IP address. But the SDK’s publicly queryable config sets a default monthly WiFi budget of 200 GB. “Occasionally” and 200 gigabytes are not the same thing.
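To put the 200 GB figure in perspective, here is a back-of-the-envelope calculation. It is only a sketch: it assumes decimal gigabytes and a 30-day month, and it treats the budget as fully used, which the config alone does not prove. The function name is made up for illustration.

```python
# Back-of-the-envelope: what a 200 GB monthly budget means if fully used.
# Assumptions (not from the report): decimal gigabytes, a 30-day month,
# and that the whole budget is actually consumed.

BYTES_PER_GB = 10**9
SECONDS_PER_DAY = 24 * 60 * 60


def budget_breakdown(monthly_gb: float, days: int = 30) -> dict:
    """Convert a monthly data budget into per-day volume and average bitrate."""
    total_bits = monthly_gb * BYTES_PER_GB * 8
    gb_per_day = monthly_gb / days
    avg_mbit_per_s = total_bits / (days * SECONDS_PER_DAY) / 10**6
    return {"gb_per_day": gb_per_day, "avg_mbit_per_s": avg_mbit_per_s}


if __name__ == "__main__":
    result = budget_breakdown(200)
    print(f"{result['gb_per_day']:.1f} GB per day")
    print(f"{result['avg_mbit_per_s']:.2f} Mbit/s sustained, around the clock")
```

Fully used, the budget comes to about 6.7 GB a day, or roughly 0.62 Mbit/s sustained every hour of the month. The budget is a ceiling rather than a measurement, but it is a long way from “occasionally.”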
What the research actually proves
Bright Data exposes a partner manifest through an unauthenticated public endpoint that anyone can fetch. The researcher is careful here: being listed doesn’t prove a given app currently ships the SDK in production. Per-app verification is still required. What it does prove is that at least three connected-TV entities monetized their users’ devices as proxy exit nodes, and that the roster sits in the open for anyone to read.
What to do about it
A few practical moves for readers and businesses:
- Audit your free apps. “Free with fewer ads” often means your bandwidth is the product. Read the opt-in text, even on a remote.
- Watch your router. Unexplained upstream traffic from a TV is a signal worth checking.
- For companies: if your security model assumes that traffic from residential IPs comes from trustworthy humans, that assumption is breaking. Bot defense built only around datacenter IP blocking is now a step behind.
- For AI builders: know where your scraped data comes from. “Consent SDK” sourcing carries reputational and regulatory risk as scrutiny grows.
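The “watch your router” advice can be sketched as a simple check over per-device upload totals. This assumes your router can export how many bytes each device sent upstream over some period; the record format, the device names, and the 1 GB-per-day threshold are all hypothetical and would need tuning for a real household.

```python
from dataclasses import dataclass


@dataclass
class DeviceUsage:
    """Hypothetical per-device record exported from a router."""
    name: str
    kind: str          # e.g. "tv", "phone", "laptop"
    upload_bytes: int  # bytes sent upstream during the period
    days: int          # length of the measurement period in days


def flag_heavy_uploaders(devices, max_gb_per_day=1.0, kinds=("tv",)):
    """Return (name, GB/day) for watched device kinds that upload more than expected.

    A TV mostly downloads video, so sustained upload is the unusual signal.
    The default threshold is an illustrative guess, not a measured baseline.
    """
    flagged = []
    for device in devices:
        if device.kind not in kinds:
            continue
        gb_per_day = device.upload_bytes / 1e9 / device.days
        if gb_per_day > max_gb_per_day:
            flagged.append((device.name, round(gb_per_day, 2)))
    return flagged


if __name__ == "__main__":
    sample = [
        DeviceUsage("living-room-tv", "tv", 90_000_000_000, 30),
        DeviceUsage("bedroom-tv", "tv", 3_000_000_000, 30),
        DeviceUsage("work-laptop", "laptop", 300_000_000_000, 30),
    ]
    for name, rate in flag_heavy_uploaders(sample):
        print(f"{name}: {rate} GB/day upstream, worth a closer look")
```

A flag here is a prompt to investigate, not proof of a proxy SDK: cloud backups, casting, and software updates can also push upload traffic.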
The bigger shift is this: the cost of AI’s data hunger is leaking into ordinary living rooms, one idle device at a time. Expect regulators and privacy advocates to circle this legal gray zone next. The full technical breakdown, including 30 days of instrumented traffic, is worth reading at the original source.