Inspiration
Large language models are rapidly becoming more powerful, but running them locally still requires expensive hardware, large amounts of VRAM, and technical setup that many users cannot access. We were struck by the growing gap between AI capability and hardware accessibility, and by the fact that unused computing resources already exist in homes, dorms, and small labs. We asked: what if users could securely pool local hardware to run larger models collaboratively, while still maintaining privacy and local control?
What it does
Homestack is a distributed local AI infrastructure platform that allows users to:

• Share and distribute compute and VRAM across trusted local devices
• Run larger language models locally without requiring a single high-end GPU
• Provide OpenAI-compatible endpoints powered by distributed inference
• Maintain privacy by keeping models and data within a trusted local network

Homestack essentially acts as a "local AI cluster orchestrator," allowing users to scale their personal AI workloads using resources they already own.
How we built it
Homestack runs entirely on the local network, either on a single machine or across a secure subnet. The master machine runs the create-session shell script, which starts a Vite server that manages a global state containing the master's IP address and every worker that has connected to the session. Workers join by running the read_sesh_file shell script, which reads the .sesh file created with the session and launches a browser pointed at the appropriate route, passing the worker's IP address as a query parameter so the Vite server can update its global state. Once the master starts the session, it chooses a model. Models run through vLLM, using Ray for distributed inference across the LAN/subnet, and clients then make HTTP requests to the Vite server using the OpenAI completions API.
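To make the inference path concrete, here is a minimal sketch of the vLLM-plus-Ray side, assuming the workers have already joined a Ray cluster; the model name and parallelism degree are illustrative placeholders, not Homestack's actual configuration:

```python
from vllm import LLM, SamplingParams

# Assumes each worker has already joined the Ray cluster, e.g. via
# `ray start --address=<master-ip>:6379` on the worker machines.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # whichever model the master selects
    tensor_parallel_size=2,                    # shard weights across two GPUs/nodes
    distributed_executor_backend="ray",        # execute across the Ray cluster
)
out = llm.generate(["Explain tensor parallelism."], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```

On the client side, any machine on the subnet can then talk to the cluster through the OpenAI completions API. The address, key, and model name below are again placeholders; we assume the master forwards requests to a vLLM OpenAI-compatible server:

```python
from openai import OpenAI

# Illustrative endpoint; vLLM accepts any API key unless one is configured.
client = OpenAI(base_url="http://192.168.1.10:8000/v1", api_key="homestack-local")

resp = client.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    prompt="Summarize what Homestack does in one sentence.",
    max_tokens=64,
)
print(resp.choices[0].text)
```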
Challenges we ran into
One of the biggest challenges was balancing performance with network overhead. Distributing model execution across devices introduces latency and synchronization complexity, especially when managing large context windows or streaming token generation.
Another challenge was handling heterogeneous hardware environments. Different devices have varying GPU architectures, memory speeds, and compute capabilities. Designing a scheduling system that fairly distributes workloads while maximizing throughput required significant experimentation.
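To illustrate the idea in miniature (this is a toy sketch, not our production scheduler; the device names and VRAM figures are made up), one fair-but-throughput-aware policy is to hand out model shards in proportion to each device's available VRAM:

```python
# Toy proportional scheduler: assign model shards to devices by VRAM share.
# Device names and capacities are hypothetical, for illustration only.
devices = {"desktop-4090": 24, "laptop-3060": 6, "mini-pc": 8}  # VRAM in GiB

def assign_shards(num_shards: int, vram: dict[str, int]) -> dict[str, int]:
    total = sum(vram.values())
    # Start from the floor of each device's proportional share...
    shares = {d: (num_shards * v) // total for d, v in vram.items()}
    # ...then give any leftover shards to the devices with the most VRAM.
    leftover = num_shards - sum(shares.values())
    for d in sorted(vram, key=vram.get, reverse=True)[:leftover]:
        shares[d] += 1
    return shares

print(assign_shards(32, devices))
# -> {'desktop-4090': 21, 'laptop-3060': 5, 'mini-pc': 6}
```

A real scheduler additionally has to weigh GPU architecture, memory speed, and network latency, which is where most of the experimentation went.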
Security was also a major concern. Since Homestack allows multiple devices to collaborate, we had to design trust boundaries and authentication methods to prevent unauthorized resource access.
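As one example of the kind of mechanism we explored (a hedged sketch, not Homestack's actual scheme), the master can write a shared secret into the session file and verify an HMAC tag on each worker's join request:

```python
import hashlib
import hmac
import secrets

# Hypothetical shared-secret handshake; the real auth flow may differ.
SESSION_SECRET = secrets.token_bytes(32)  # written into the session file by the master

def sign_join(worker_ip: str) -> str:
    # Worker side: tag its join request with an HMAC over its IP.
    return hmac.new(SESSION_SECRET, worker_ip.encode(), hashlib.sha256).hexdigest()

def verify_join(worker_ip: str, tag: str) -> bool:
    # Master side: constant-time comparison prevents timing attacks on the tag.
    return hmac.compare_digest(sign_join(worker_ip), tag)

tag = sign_join("192.168.1.42")
assert verify_join("192.168.1.42", tag)
```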
Accomplishments that we're proud of
We successfully demonstrated distributed inference across multiple local devices, enabling larger models to run without requiring enterprise-grade hardware.
We built a system that integrates local AI agents with distributed inference while maintaining user privacy.
We created a flexible architecture that allows Homestack to support multiple model runtimes and agent frameworks.
Most importantly, we showed that collaborative local compute can meaningfully lower the barrier to running advanced AI models.
What we learned
We learned that distributed AI systems are as much about orchestration as they are about raw compute. Efficient scheduling, memory management, and network optimization are critical to real-world performance.
We also learned that usability is essential. Even technically powerful systems need intuitive deployment and configuration to gain adoption.
Finally, we gained a deeper understanding of how agent frameworks interact with model latency and context limitations, which influenced how we designed Homestack’s execution pipeline.
What's next for Homestack
Our next goals include:
• Expanding support for multi-node GPU parallelism
• Adding dynamic model sharding and adaptive load balancing
• Building a graphical dashboard for cluster monitoring
• Improving security through encrypted node communication and role-based permissions
• Supporting additional inference backends such as SGLang
• Exploring federated learning and distributed fine-tuning capabilities
Long-term, we envision Homestack becoming a decentralized AI infrastructure layer that empowers individuals and small teams to run powerful AI systems without relying on centralized cloud providers.