SIIMPAF: Building a Self-Hosted AI Infrastructure for Real-World Applications
Introduction
Over the past year, I've been developing something I call SIIMPAF: Synthetic Intelligence Interactive Matrix Personal Adaptive Familiar. The name is admittedly a mouthful, but it captures what I'm trying to build: an AI system that works like the familiar of folklore, a helpful companion that assists with daily tasks, learns your preferences, and grows more useful over time.
SIIMPAF is a self-hosted AI infrastructure that handles document processing, vector-based semantic search, and intelligent content analysis. It runs entirely on consumer hardware using open-source tools, which matters to me both philosophically and practically.
Why Self-Hosted?
My work spans multiple organizations - I serve as Full and Fractional CITO at PracticingMusician.com and ClimbHigh.AI, and I'm involved with RPGResearch.com, RPG.LLC, Dev2Dev.net, NeuroRPG.com, and several other projects. Each of these has different data sensitivity requirements, different use cases, and different budgets.
Cloud AI services present challenges:
- Cost scaling: API calls add up quickly across multiple organizations
- Data privacy: Some projects involve sensitive educational or therapeutic data
- Availability: Internet dependencies create single points of failure
- Customization: Commercial APIs limit fine-tuning and specialization
Building self-hosted infrastructure addresses these concerns while creating a foundation that can serve all my projects.
The Technical Foundation
SIIMPAF runs on a Python backend with FastAPI providing the REST API layer. The core components include:
Language Models: I use Ollama for local LLM inference, with models such as Mistral and Llama 2 plus specialized variants like dolphin-mistral for different use cases. For higher-throughput scenarios, vLLM serves OpenHermes-2.5-Mistral-7B.
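To make the local inference step concrete, here is a minimal sketch of calling an Ollama server from Python using only the standard library. It assumes Ollama is running on its default local port (11434) and that the "mistral" model has already been pulled. The function names are illustrative and are not taken from the SIIMPAF codebase.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> dict:
    """Build the JSON body for a non-streaming Ollama generate call."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str, url: str = OLLAMA_URL, timeout: float = 120.0) -> str:
    """Send the prompt to a running Ollama server and return the generated text."""
    body = json.dumps(build_generate_request(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # With "stream": False the server replies with one JSON object
        # whose "response" field holds the full completion.
        return json.loads(response.read())["response"]


# Example call (needs a running server and a pulled model):
# print(generate("mistral", "Explain what a vector database does in one sentence."))
```

Switching to a different local model, such as dolphin-mistral, only means changing the model argument. That is part of what makes it practical to keep several specialized variants side by side.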
Vector Search: Qdrant handles vector similarity search, enabling RAG (Retrieval-Augmented Generation) for context-aware responses. This is critical for maintaining coherent conversations and grounding responses in actual project documentation.
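The retrieval idea behind RAG can be sketched in a few lines of plain Python. Note the assumptions: this uses an in-memory cosine-similarity ranking as a stand-in for Qdrant rather than Qdrant's own API, the three-dimensional vectors are toy values instead of real embeddings, and the names Chunk, retrieve, and build_prompt are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    vector: list[float]  # toy embedding; a real system would use an embedding model


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vector: list[float], chunks: list[Chunk], top_k: int = 2) -> list[Chunk]:
    """Return the top_k chunks most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vector, c.vector), reverse=True)
    return ranked[:top_k]


def build_prompt(question: str, context_chunks: list[Chunk]) -> str:
    """Ground the LLM prompt in the retrieved text."""
    context = "\n".join(f"- {c.text}" for c in context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical documentation snippets with made-up vectors.
CHUNKS = [
    Chunk("Qdrant stores the document embeddings.", [0.9, 0.1, 0.0]),
    Chunk("Coqui TTS generates speech.", [0.0, 0.2, 0.9]),
    Chunk("Ray schedules GPU tasks.", [0.1, 0.9, 0.1]),
]

top = retrieve([1.0, 0.0, 0.0], CHUNKS, top_k=2)
prompt = build_prompt("Where are embeddings stored?", top)
```

In a real deployment the ranking step would be a query against a Qdrant collection, and the resulting prompt would go to the local LLM.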
Text-to-Speech: Coqui TTS provides neural voice synthesis with voice cloning capabilities. Edge TTS serves as a faster alternative for less demanding scenarios.
Animation Pipeline: This is where things get interesting. SIIMPAF includes avatar animation capabilities using SadTalker for facial animation, EMAGE for body gesture generation, and PantoMatrix for combined rendering. This enables animated AI avatars that can lip-sync to generated speech.
Distributed Computing with DGPUNET
Running these workloads on a single machine quickly hits hardware limits. The animation pipeline alone can consume 20+ GB of VRAM. Add LLM inference, TTS, and image generation, and the combined requirement exceeds the VRAM available on most consumer GPUs.
This led to DGPUNET - a distributed GPU network using Ray clustering. By pooling resources from multiple machines (currently 5 systems with a combined 92 GB of VRAM), SIIMPAF can handle workloads that would otherwise require expensive enterprise hardware.
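A small plain-Python sketch shows why pooling helps. It is not how Ray schedules work internally; it is a first-fit-decreasing placement used only as an illustration. The per-node VRAM split and every workload size except the animation pipeline's 20 GB are hypothetical, chosen so the nodes total 92 GB.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    vram_gb: int
    used_gb: int = 0
    assigned: list[str] = field(default_factory=list)

    @property
    def free_gb(self) -> int:
        return self.vram_gb - self.used_gb


def place(workloads: dict[str, int], nodes: list[Node]) -> dict[str, str]:
    """First-fit decreasing: largest workload first, each on the first node with room."""
    placement: dict[str, str] = {}
    for name, need_gb in sorted(workloads.items(), key=lambda kv: kv[1], reverse=True):
        target = next((n for n in nodes if n.free_gb >= need_gb), None)
        if target is None:
            raise RuntimeError(f"no node has {need_gb} GB free for {name}")
        target.used_gb += need_gb
        target.assigned.append(name)
        placement[name] = target.name
    return placement


# Hypothetical split of the 92 GB pool across 5 machines.
NODES = [
    Node("node-a", 24),
    Node("node-b", 24),
    Node("node-c", 16),
    Node("node-d", 16),
    Node("node-e", 12),
]

# Animation size is from the article (20+ GB); the rest are illustrative guesses.
WORKLOADS = {"animation": 20, "llm": 8, "tts": 4, "image_generation": 12}

placement = place(WORKLOADS, NODES)
total_needed = sum(WORKLOADS.values())  # 44 GB, more than any single 24 GB card
```

With these example numbers, the four workloads need 44 GB in total. No single 24 GB card can hold that, but the workloads fit comfortably once they are spread across the pool.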
Real-World Applications
SIIMPAF isn't just a technical exercise. It's designed to support actual applications:
RPEPTFS (Role-playing Enhanced Pitch Training Feedback Simulator): An AI-powered platform where entrepreneurs practice investor pitches with realistic AI-generated investor NPCs. Each investor has a distinct personality powered by fine-tuned LLMs, and the system generates animated avatar responses. This directly supports my work with ClimbHigh.AI and helps founders prepare for real investor meetings.
Educational Assistants: For PracticingMusician.com and ClimbHigh.AI, SIIMPAF provides the foundation for intelligent tutoring systems that can maintain context across learning sessions.
Therapeutic Gaming NPCs: My work at RPGResearch.com and RPG.LLC involves using role-playing games therapeutically. SIIMPAF enables AI-powered NPCs that can participate in therapeutic game sessions with appropriate context awareness.
Document Analysis: For Dev2Dev.net and other technical projects, the RAG pipeline enables intelligent analysis of large documentation sets.
Current Status and Limitations
I want to be realistic about where SIIMPAF stands. It's currently in pre-alpha (v0.0.38). The core functionality works, but there's significant work remaining:
- The animation pipeline needs optimization for real-time performance
- Fine-tuning workflows need better documentation
- Multi-user support is limited
- The web interface is functional but basic
This isn't a product ready for general release. It's a working system that serves my specific needs while I continue development.
What's Next
Near-term priorities include:
- Improving animation rendering performance
- Better integration with existing LMS platforms
- Enhanced voice cloning for character-specific voices
- Expanded documentation and deployment guides
Learn More
If you're interested in self-hosted AI infrastructure, I've put together a project page with technical details:
Project Page: https://www.siimpaf.com
The page includes information about the technology stack, related articles documenting the development journey, and links to related projects like DGPUNET and AILCPH.
About the Author
Hawke Robinson, "The Grandfather of Therapeutic Gaming," serves as Full and Fractional CITO at PracticingMusician.com and ClimbHigh.AI. He is also involved with RPGResearch.com, RPG.LLC, Dev2Dev.net, NeuroRPG.com, and numerous other projects spanning therapeutic gaming, educational technology, and AI development.
