Tether’s QVAC division announced on March 17, 2026, the launch of the world’s first cross-platform LoRA fine-tuning framework for Microsoft’s BitNet models (1-bit LLMs), enabling billion-parameter AI training and inference on consumer GPUs and smartphones.
The framework, integrated into QVAC Fabric, reduces memory and compute requirements sufficiently to fine-tune models up to 13 billion parameters on devices including the iPhone 16, Galaxy S25, and Pixel 9, with 125M-parameter models trainable in approximately 10 minutes on mobile hardware.
The release marks a significant step in Tether’s strategic pivot from stablecoin issuer to broader infrastructure provider, challenging the centralized AI development model dominated by cloud providers and specialized NVIDIA hardware.
The QVAC Fabric framework enables LoRA (Low-Rank Adaptation) fine-tuning and inference acceleration across heterogeneous consumer hardware (a sketch of the LoRA technique follows this list), including:
Desktop GPUs: AMD, Intel, and NVIDIA
Apple ecosystem: Apple Silicon M-series chips and A-series Bionic mobile GPUs
Mobile GPUs: Adreno (Qualcomm), Arm Mali, and others
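The article does not show QVAC Fabric's programming interface, so as a reference for what LoRA adaptation itself involves, here is a minimal PyTorch sketch of a LoRA-augmented linear layer. The class name and hyperparameters (r, alpha) are illustrative, not part of the QVAC API:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (illustrative sketch).

    Output = W x + (alpha / r) * B(A x), where W stays frozen (in the BitNet
    setting it would be the 1.58-bit base weight) and only the small factors
    A and B are trained.
    """
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # base weights stay frozen
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # A projects down to rank r; B projects back up. B starts at zero so
        # the adapter initially leaves the base model's behavior unchanged.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
```

Because only A and B receive gradients, gradient and optimizer-state memory scales with r·(in + out) per layer rather than in·out, which is the main reason fine-tuning can fit on a phone alongside quantized base weights.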
This broad compatibility eliminates the previous requirement for enterprise-grade NVIDIA systems or cloud infrastructure, which has concentrated AI development among organizations with specialized hardware budgets.
Tether’s engineering team demonstrated successful fine-tuning on flagship smartphones with the following results:
125M-parameter models: Fine-tuning on a Samsung Galaxy S25 (Adreno GPU) completes in approximately 10 minutes for a biomedical dataset of ~300 documents (~18k tokens)
1B-parameter models: Fine-tuning on the same biomedical dataset completes in 1 hour 18 minutes on the Galaxy S25 and 1 hour 45 minutes on the iPhone 16
Maximum capacity: Models up to 13 billion parameters were successfully fine-tuned on iPhone 16, pushing edge device capabilities far beyond typical sub-3B parameter demonstrations
BitNet inference on mobile GPUs shows substantial acceleration compared to CPU baselines:
Speed improvement: GPU inference runs between 2 and 11 times faster than CPU baselines across tested devices
Practical implication: Mobile GPUs can now support workloads that previously required expensive, specialized hardware or data centers
Benchmarks demonstrate significant memory savings compared to conventional models:
BitNet-1B (TQ1_0): up to 77.8% less VRAM than Gemma-3-1B (16-bit)
vs. Qwen3-0.6B: 65.6% less VRAM than the 16-bit model
These reductions apply across both inference and LoRA fine-tuning workloads, creating meaningful memory headroom for larger models and personalization workflows on hardware previously considered insufficient.
Within the same memory budget, the framework can fine-tune models roughly twice as large on edge devices as Q4-quantized non-BitNet models, reflecting the memory efficiency of the BitNet architecture; a back-of-the-envelope estimate follows.
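A weight-only estimate illustrates where these numbers come from. The bits-per-weight figures below are assumptions (roughly 4.5 bits for Q4-style formats and about 1.69 bits for TQ1_0's packed ternary encoding, per common llama.cpp conventions); measured end-to-end VRAM savings such as 77.8% are smaller than weight-only arithmetic would suggest because activations, KV cache, and runtime buffers are not compressed:

```python
def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Weight-only memory estimate in GB (ignores activations, KV cache, buffers)."""
    return n_params * bits_per_weight / 8 / 1e9

n = 1e9  # a 1B-parameter model
for label, bpw in [("16-bit", 16.0), ("Q4 (~4.5 bpw)", 4.5), ("BitNet TQ1_0 (~1.69 bpw)", 1.69)]:
    print(f"{label:25s} {weight_gb(n, bpw):.2f} GB")
# 16-bit                    2.00 GB
# Q4 (~4.5 bpw)             0.56 GB
# BitNet TQ1_0 (~1.69 bpw)  0.21 GB
# Weights alone are ~2.7x smaller than Q4; with uncompressed runtime overheads
# added back, the practical headroom lands near the ~2x the article reports.
```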
Tether CEO Paolo Ardoino framed the release within a broader vision of accessible AI: “Intelligence will be a key determining factor in the future of society. When training large language models depends on centralized infrastructure, innovation becomes stagnant, the ecosystem becomes fragile, and societal equilibrium is put at risk. By enabling meaningful large-model training on consumer hardware, including smartphones, Tether’s QVAC is proving that advanced AI can be decentralized, inclusive, and empowering for everyone.”
The efficiency gains make federated learning achievable, allowing fine-tuned updates to be trained and shared across distributed devices while keeping sensitive user data local. This reduces dependence on centralized infrastructure while enabling collaborative model improvement.
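The article does not describe Tether's federated mechanism, so as a hedged illustration of the general idea, here is a minimal FedAvg-style sketch that merges LoRA adapter tensors from several devices; the function and its arguments are hypothetical:

```python
import torch

def merge_lora_adapters(adapter_states, weights=None):
    """FedAvg-style weighted average of LoRA adapter tensors (illustrative sketch).

    adapter_states: one dict per device mapping parameter name -> tensor.
    Only these small adapter matrices leave the device; raw data stays local.
    weights: optional per-device weighting, e.g. local dataset sizes.
    """
    n = len(adapter_states)
    if weights is None:
        weights = [1.0] * n
    total = float(sum(weights))

    merged = {}
    for name in adapter_states[0]:
        merged[name] = sum(w / total * state[name]
                           for w, state in zip(weights, adapter_states))
    return merged

# Example with two devices and one adapter matrix each:
a = {"layer0.lora_A": torch.ones(2, 2)}
b = {"layer0.lora_A": torch.zeros(2, 2)}
print(merge_lora_adapters([a, b])["layer0.lora_A"])  # 0.5 everywhere
```

Each round, devices fine-tune locally and upload only their adapters (a few MB each); the merged adapter is then redistributed, so the shared model improves without any device transmitting its underlying data.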
By reducing reliance on cloud providers, the framework enables users to keep sensitive data local to their devices during fine-tuning, addressing privacy concerns associated with transmitting data to centralized servers.
Tether’s release directly challenges the centralized AI development model dominated by hyperscalers and cloud providers. By enabling meaningful AI work on consumer hardware, the company positions itself as an infrastructure player in the edge AI stack, independent of traditional cloud jurisdictions.
The framework, including the paper, adapters, benchmarks, and cross-platform binaries, is available on Hugging Face. This open-source approach aims to establish QVAC as a default path for independent developers and small labs to deploy AI on consumer hardware, building cultural and technical relevance outside traditional regulatory frameworks.
The release continues Tether’s expansion beyond stablecoin issuance into critical digital infrastructure, following previous QVAC initiatives including the 41-billion-token Genesis I dataset and local AI Workbench. The company has signaled continued investment in decentralized AI infrastructure over “coming weeks, months, and years.”
Full technical documentation, including performance benchmarks, implementation details, and cross-platform binaries, is available through the Hugging Face blog: “LoRA Fine-Tuning BitNet b1.58 LLMs on Heterogeneous Edge GPUs via QVAC Fabric.”
Tether describes its mission as advancing freedom, transparency, and innovation through technology, enabling direct peer-to-peer information exchange without unnecessary intermediaries. The company aims to replace centralized models with decentralized infrastructure designed for privacy, efficiency, and resilience.
The QVAC Fabric BitNet LoRA framework supports consumer GPUs from AMD, Intel, and NVIDIA; Apple's ecosystem, including M-series Apple Silicon and A-series Bionic mobile GPUs; and mobile GPUs including Adreno (Qualcomm), Arm Mali, and others. This enables AI fine-tuning on laptops, desktops, and flagship smartphones without specialized enterprise hardware.
According to Tether’s benchmarks, GPU-based inference on flagship mobile devices runs between 2 and 11 times faster than CPU baselines. Memory usage drops by up to 77.8% compared to conventional models, enabling larger models to run within the same hardware constraints.
Fine-tuning a 13-billion-parameter model on a smartphone represents a step change from typical on-device AI demonstrations, which usually revolve around sub-3B parameter models or offload heavier workloads to the cloud. This capability suggests a future where serious model personalization and domain-specific adaptation can occur locally, without shipping user data to centralized servers.