# On-Device LLMs & Myelin

## Myelin — The Edge Model

**Myelin** is Nerve Protocol's on-device model, engineered specifically for edge environments such as phones, laptops, and on-device agents. Among the 4B-class models compared below, it is the only one that processes audio **and** video natively alongside text and image. It has a 128K context window and ships under an unrestricted Apache 2.0 license.

### Blistering On-Device Speed

The defining trait of Myelin is execution speed. A highly optimized **Per-Layer Embedding architecture** reduces the active memory footprint, which lets Myelin keep time-to-first-token (TTFT) low and sustain high generation throughput. It outruns Phi-4 multimodal and vastly outpaces Qwen3.5-4B on identical GPU hardware.

This processing efficiency means Myelin easily handles complex, multi-turn, real-time edge applications — such as live voice conversation tracking and fluid video feed parsing — without thermal throttling or exhausting local device memory.
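
The two speed metrics named above, TTFT and generation throughput, can be measured on any streaming token source. The sketch below is a minimal, hedged illustration: `measure_stream` and `fake_stream` are hypothetical names, and `fake_stream` merely stands in for whatever streaming interface a local runtime exposes, which this page does not document.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_stream(
    tokens: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, int]:
    """Return (ttft_seconds, tokens_per_second, token_count) for a token stream.

    TTFT is the delay between starting to consume the stream and receiving
    the first token. Throughput covers the decode phase only, that is, the
    tokens that arrive after the first one.
    """
    start = clock()
    first = None
    last = start
    count = 0
    for _ in tokens:
        last = clock()
        if first is None:
            first = last
        count += 1
    if first is None:
        raise ValueError("stream produced no tokens")
    ttft = first - start
    decode_time = last - first
    # A single-token stream has no decode interval, so report 0.0.
    tps = (count - 1) / decode_time if decode_time > 0 else 0.0
    return ttft, tps, count


def fake_stream(n: int, delay: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for a local model's streaming generate call."""
    for i in range(n):
        time.sleep(delay)
        yield f"tok{i}"


if __name__ == "__main__":
    ttft, tps, n = measure_stream(fake_stream(20))
    print(f"TTFT={ttft * 1000:.1f} ms, throughput={tps:.1f} tok/s over {n} tokens")
```

Passing the clock as a parameter keeps the measurement logic testable without real delays. When comparing models, the same prompt, hardware, and output length should be used for every run.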

### Aligned for Safe Deployment

Unlike Soma (the uncensored flagship), Myelin is meticulously aligned for safe, reliable, and predictable enterprise and consumer deployment. It features robust guardrails that prevent harmful outputs while preserving sharp logical capabilities, making it immediately ready for consumer-facing applications and strict corporate compliance standards.

### Feature Comparison

| Feature              | Myelin     | Qwen3.5-4B | Llama 3.2 3B    | Phi-4 multimodal |
| -------------------- | ---------- | ---------- | --------------- | ---------------- |
| **Effective params** | **4B**     | \~4B       | 3B              | \~4B             |
| **Native audio**     | **Yes**    | No         | No              | Yes              |
| **Native video**     | **Yes**    | No         | No              | No               |
| **Context window**   | 128K       | 256K       | 128K            | 128K             |
| **License**          | Apache 2.0 | Apache 2.0 | Llama community | MIT              |

### Benchmark Performance

Myelin holds the edge over its closest same-class rival, Qwen3.5-4B, on all four benchmarks below, which cover general knowledge, graduate-level reasoning, live coding, and vision.

| Benchmark             | Myelin    | Qwen3.5-4B |
| --------------------- | --------- | ---------- |
| **MMLU Pro**          | **80.5%** | 79.1%      |
| **GPQA Diamond**      | **78.2%** | 76.2%      |
| **LiveCodeBench v6**  | **61.4%** | 55.8%      |
| **MMMU Pro (vision)** | **69.8%** | 66.3%      |
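
To make the size of each lead explicit, the margins can be computed directly from the table. The snippet below only restates the published scores as percentage-point differences; the names `SCORES` and `margins` are illustrative.

```python
# Scores copied from the benchmark table: (Myelin, Qwen3.5-4B), in percent.
SCORES = {
    "MMLU Pro": (80.5, 79.1),
    "GPQA Diamond": (78.2, 76.2),
    "LiveCodeBench v6": (61.4, 55.8),
    "MMMU Pro (vision)": (69.8, 66.3),
}


def margins(scores):
    """Return Myelin's lead on each benchmark, in percentage points."""
    return {name: round(a - b, 1) for name, (a, b) in scores.items()}


for name, delta in margins(SCORES).items():
    print(f"{name}: {delta:+.1f} pp")
# MMLU Pro: +1.4 pp
# GPQA Diamond: +2.0 pp
# LiveCodeBench v6: +5.6 pp
# MMMU Pro (vision): +3.5 pp
```

The widest gap is on LiveCodeBench v6 and the narrowest on MMLU Pro.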

Myelin redefines what is possible on mobile hardware — dominating native media processing, executing at blistering speeds, and outscoring the competition on core reasoning. It is hyper-optimized for the rich, multimodal inputs required by next-generation on-device applications and can run fully offline.

***

**On-device Large Language Models (LLMs)** are self-contained models that run directly on operator hardware — smartphones, laptops, or private cloud instances — without routing inference through a third-party provider. Within Nerve Protocol, on-device models are downloaded, stored, and served through the application’s TEE layer, ensuring queries and outputs remain inside the operator’s cryptographic boundary.

Your on-device LLM evolves into your **Personal AI** by consuming data retrieved by [Secure Data Connectors](/overview/data-integrators.md) and stored locally in your trusted location. It fine-tunes on your personal knowledge graph without ever sending weights or training data to an external server.

#### **Technical advantages over cloud-hosted inference:**

* **Zero inference-side data exposure:** During local inference, queries never leave the device or enclave. A cloud provider cannot log, train on, or monetize your prompts because it never sees them.
* **Low-latency, offline-capable execution:** Myelin (4B) runs entirely on local CPU/GPU with no network round-trip required. Response time is bounded by hardware, not API quotas or regional routing.
* **Continuous private personalization:** The model fine-tunes incrementally on your encrypted knowledge graph during idle periods. Gradient updates stay local; only the improved weights persist — raw training data is never retained after each update cycle.
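
Whether a 4B-parameter model fits on a given device depends largely on the precision of its weights. The arithmetic below is a rough, weight-only estimate. The formats listed are assumptions, since this page does not say which precisions Myelin ships in, and the figures exclude the KV cache, activations, and any savings from the Per-Layer Embedding architecture.

```python
PARAMS = 4_000_000_000  # 4B parameters, as stated for Myelin

# Bytes per parameter for common weight formats. These formats are assumed
# for illustration; the page does not list Myelin's supported precisions.
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_footprint_gb(params: int, bytes_per_param: float) -> float:
    """Weight-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


for fmt, width in BYTES_PER_PARAM.items():
    print(f"{fmt}: {weight_footprint_gb(PARAMS, width):.1f} GB")
# fp16: 8.0 GB
# int8: 4.0 GB
# int4: 2.0 GB
```

Actual memory use is higher once the context is populated, particularly for long prompts approaching the 128K window.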

***

### **Functionality of On-Device LLMs**

On-Device LLMs leverage local computational resources to perform tasks without relying on cloud-based servers. Within the **Nerve Protocol** ecosystem, these models are securely downloaded, stored, and managed through the Nerve Protocol application, ensuring that all data remains private and under the user's control. The operational process includes:

* **Storage:** Compressed models are stored on local storage devices (SSD/HDD) or within a private cloud, accessible via the Nerve Protocol app.
* **Local Processing:** AI computations are executed directly on the device's CPU/GPU, keeping sensitive operations confined to the local environment.
* **Hybrid Model:** For more complex tasks, local processing can be augmented with optional cloud resources, with the user retaining full control over data sharing.
* **Local Personalization:** Learning and model updates occur on-device during periods of inactivity, enabling continuous adaptation while maintaining data privacy.
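
The storage step above assumes the app can trust a model file once it has been downloaded. One common way to check this is to compare the file's SHA-256 digest against a published value. This is offered as an illustrative technique, not as Nerve Protocol's documented mechanism, and the function names and the idea of a published digest are assumptions.

```python
import hashlib
import hmac
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large model files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_file(path: Path, expected_hex: str) -> bool:
    """Return True only if the file's digest matches the expected hex string."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())


if __name__ == "__main__":
    # Demo with a throwaway file standing in for a downloaded model.
    with tempfile.TemporaryDirectory() as tmp:
        model_path = Path(tmp) / "model.bin"
        model_path.write_bytes(b"not a real model")
        expected = hashlib.sha256(b"not a real model").hexdigest()
        print("verified:", verify_model_file(model_path, expected))
        print("tampered:", verify_model_file(model_path, "0" * 64))
```

A digest check detects corruption or substitution only if the expected value itself comes from a trusted channel.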

***

### **Key Features of On-Device LLMs**

The integration of On-Device LLMs within Nerve Protocol offers several significant benefits:

#### **Contextual Personalization**

A pivotal feature of on-device LLMs is the continuously updated **Personal Index**, derived from the user's daily interactions, history, and preferences. This index may encompass emails, documents, browsing patterns, purchasing habits, and other data the user opts to include. The LLM utilizes these data embeddings to generate highly personalized responses, seamlessly adapting to the user's evolving needs. Over time, the model refines these embeddings, learning unique nuances such as communication style, task priorities, and domain interests.
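
Retrieval over the Personal Index can be pictured as a nearest-neighbor search over embeddings. The sketch below ranks items by cosine similarity against a query vector. It is a toy under stated assumptions: the three-dimensional vectors, the item IDs, and the `top_k` helper are invented for illustration, and the page does not specify how the index is actually stored or searched.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    index: Dict[str, Sequence[float]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k index entries most similar to the query, best first."""
    scored = [(item_id, cosine(query, vec)) for item_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical Personal Index with made-up 3-dimensional embeddings.
PERSONAL_INDEX = {
    "email:flight-confirmation": [0.9, 0.1, 0.0],
    "doc:tax-summary": [0.0, 1.0, 0.1],
    "note:gym-schedule": [0.1, 0.0, 1.0],
}

if __name__ == "__main__":
    query_vec = [1.0, 0.2, 0.0]  # stands in for an embedded user question
    for item_id, score in top_k(query_vec, PERSONAL_INDEX):
        print(f"{item_id}: {score:.3f}")
```

The retrieved items would then be supplied to the local model as context for its response.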

#### **Hybrid Computing Architecture**

While on-device LLMs handle sensitive operations locally, they can also leverage cloud-based models or specialized agents for tasks requiring more intensive computation or domain-specific expertise. For instance, the local model may process initial user queries and context retrieval, then securely transmit anonymized or restricted data to a remote agent for specialized analysis or the generation of complex outputs. The user maintains full control over when and how these external interactions occur and which data, if any, are shared.
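
The control flow described above can be sketched as a consent gate followed by redaction. Everything in this example is hypothetical: `SharingPolicy`, `redact`, and `prepare_remote_payload` are illustrative names, and regex-based masking is a simple stand-in that does not amount to real anonymization.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Simple patterns for illustration only; they will miss many real-world formats.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text: str) -> str:
    """Mask email addresses and phone-like numbers before text leaves the device."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)


@dataclass
class SharingPolicy:
    """User-controlled settings; remote sharing is off unless explicitly enabled."""
    allow_remote: bool = False
    redact_before_send: bool = True


def prepare_remote_payload(prompt: str, policy: SharingPolicy) -> Optional[str]:
    """Return the text that may be sent to a remote agent, or None to stay local."""
    if not policy.allow_remote:
        return None
    return redact(prompt) if policy.redact_before_send else prompt


if __name__ == "__main__":
    prompt = "Mail jane.doe@example.com or call +1 415 555 0100"
    print(prepare_remote_payload(prompt, SharingPolicy()))
    print(prepare_remote_payload(prompt, SharingPolicy(allow_remote=True)))
```

Defaulting `allow_remote` to `False` mirrors the principle that nothing is shared unless the user opts in.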

#### **Secure Data Handling and Trusted Execution**

On-device LLMs can employ secure hardware enclaves or **Trusted Execution Environments (TEEs)** to further protect user data. A TEE isolates memory and computation from the rest of the system, preventing unauthorized access by other processes. Combined with encryption and secure model-serving techniques, this isolation is designed to keep the user's data and embeddings protected even if an attacker gains access to the device.

#### **Continuous Learning and Adaptation**

With the model residing locally, it can be updated or fine-tuned iteratively as the operator's usage patterns evolve. As Nerve Protocol users continuously retrieve data and update their Personal Indexes, the on-device model progressively improves. This incremental learning loop allows the Personal AI to refine its internal representations with minimal latency, resulting in a continuously adapting assistant capable of handling nuanced domain tasks — from financial analysis and scheduling to document preparation and compliance review. The operator retains full control over the model weights, the prompts, and all outputs. No training data, gradient updates, or fine-tuned weights leave the device.
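
The shape of this incremental loop can be shown with a toy model. The sketch below is not LLM fine-tuning: a two-parameter linear model stands in for the real network, and `TinyModel`, `LocalTrainer`, and the idle flag are hypothetical. It illustrates only the pattern described above, in which updates run while the device is idle, the weights persist, and the buffered examples are discarded after each cycle.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TinyModel:
    """Toy stand-in for a local model: y = w * x + b."""
    w: float = 0.0
    b: float = 0.0

    def predict(self, x: float) -> float:
        return self.w * x + self.b


@dataclass
class LocalTrainer:
    model: TinyModel
    lr: float = 0.05
    pending: List[Tuple[float, float]] = field(default_factory=list)

    def observe(self, x: float, y: float) -> None:
        """Buffer a new local example until the next idle update."""
        self.pending.append((x, y))

    def run_idle_update(self, device_is_idle: bool) -> int:
        """Apply one SGD pass over buffered examples, then discard them.

        Returns the number of examples consumed (0 if the device is busy).
        """
        if not device_is_idle or not self.pending:
            return 0
        for x, y in self.pending:
            err = self.model.predict(x) - y  # gradient of 0.5 * err**2
            self.model.w -= self.lr * err * x
            self.model.b -= self.lr * err
        used = len(self.pending)
        self.pending.clear()  # only the updated weights persist
        return used


if __name__ == "__main__":
    trainer = LocalTrainer(TinyModel())
    for _ in range(300):  # 300 simulated idle periods
        for x in (0.0, 1.0, 2.0, 3.0):
            trainer.observe(x, 2.0 * x + 1.0)  # data follows y = 2x + 1
        trainer.run_idle_update(device_is_idle=True)
    print(f"w={trainer.model.w:.3f}, b={trainer.model.b:.3f}")
```

In this toy, the learned weights approach `w = 2` and `b = 1` while the example buffer is empty after every cycle.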

