Technology for Private AI on Apple Devices

Private AI, Powered by Apple Silicon
Enigmus is built exclusively for Apple platforms, leveraging MLX, Apple's machine learning framework, to deliver private AI on Mac, iPhone, and iPad. Because everything runs on-device, data never leaves the hardware.
Why Apple Silicon?
Apple's M-series chips (M1, M2, M3, M4, and M5) revolutionized what's possible for on-device AI. The key innovation is the unified memory architecture: CPU, GPU, and Neural Engine all share the same memory pool, eliminating the data-transfer bottlenecks that plague traditional systems.
This means large language models can run efficiently without expensive dedicated GPUs. A MacBook, iMac, or iPhone becomes a capable AI workstation.
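A minimal sketch of what this buys in practice, using MLX's Python API (Enigmus itself ships as a native app; the snippet only illustrates the framework behavior). The same arrays are handed to the GPU and the CPU without a single copy:

```python
import mlx.core as mx

a = mx.random.normal((4096, 4096))
b = mx.random.normal((4096, 4096))

# Unified memory: both processors read the same buffers, so moving
# work between them involves no host-to-device transfers.
c = mx.matmul(a, b, stream=mx.gpu)  # runs on the Metal GPU
d = mx.sum(a, stream=mx.cpu)        # runs on the CPU, same array
mx.eval(c, d)                       # MLX is lazy; this forces both
```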
MLX: Apple's ML Framework
MLX is Apple's array framework for machine learning, purpose-built for Apple Silicon. At WWDC 2025, Apple signaled MLX as a strategic component of their AI ecosystem, with deep integration into macOS and iOS.
Key Advantages
- Unified Memory: Arrays live in shared memory—operations run on CPU, GPU, or Neural Engine without data copying
- Metal GPU Acceleration: Purpose-built for Apple's Metal framework, maximizing performance on Apple hardware (a quick check follows this list)
- Native Swift Support: First-class Swift API makes it perfect for iOS and macOS app development
- Neural Engine Integration: On M5 chips, MLX leverages dedicated Neural Accelerators for matrix operations
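A quick way to confirm the Metal backend is active, again as a Python sketch (Enigmus uses the Swift bindings; the calls below are MLX's Python API):

```python
import mlx.core as mx

# MLX targets the Metal GPU by default whenever one is present.
print(mx.metal.is_available())  # True on Apple Silicon
print(mx.default_device())      # the GPU, unless overridden
```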
On-Device Benefits
Running AI locally on Apple devices provides:
- Complete Privacy: Conversations and data never leave the device
- No API Costs: No per-token fees or subscription requirements
- Offline Capable: Works without internet once the model is downloaded
- Low Latency: Instant responses without network round-trips
Supported Models
Enigmus supports leading models optimized for Apple Silicon:
GPT-OSS by OpenAI
OpenAI's first open-weight models since GPT-2, released August 2025:
- gpt-oss-20b: 21B parameters, runs within 16GB memory—ideal for M1/M2/M3 Macs
- gpt-oss-120b: 117B parameters for high-memory configurations
Both use a mixture-of-experts (MoE) architecture with 4-bit quantization, delivering excellent performance on Apple Silicon.
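To experiment with these models outside Enigmus, a quantized build can be loaded with the mlx-lm Python package. This is a minimal sketch; the Hugging Face repo id below is illustrative, and the published name may differ:

```python
from mlx_lm import load, generate

# Illustrative repo id; check the mlx-community organization on
# Hugging Face for the actual 4-bit GPT-OSS conversions.
model, tokenizer = load("mlx-community/gpt-oss-20b-4bit")

reply = generate(
    model,
    tokenizer,
    prompt="Summarize unified memory in two sentences.",
    max_tokens=128,
)
print(reply)
```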
Qwen3 by Alibaba
Alibaba's hybrid reasoning models, released April 2025:
- Qwen3-0.6B / Qwen3-1.7B / Qwen3-4B / Qwen3-8B: Models for iPhone and iPad, with Qwen3-8B for high-memory devices
- Qwen3-14B / Qwen3-32B: Full-featured models for Mac
- Qwen3-30B-A3B: Sparse MoE variant—32B-class performance with only 3B parameters active
- Qwen3-Next-80B / Qwen3-235B-A22B: Large models for high-memory Macs (64GB+)
Qwen3 features hybrid reasoning (a toggle between fast and deep thinking), a 128K context window, and support for 119 languages.
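The hybrid-reasoning toggle is exposed through the model's chat template rather than a separate API. A sketch with mlx-lm, assuming an illustrative repo id; enable_thinking is the switch documented for Qwen3's template:

```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3-4B-4bit")  # illustrative id

messages = [{"role": "user", "content": "What is 17 * 24?"}]

# enable_thinking=False asks for a fast, direct answer; True lets the
# model emit a deliberate "thinking" pass before responding.
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
print(generate(model, tokenizer, prompt=prompt, max_tokens=128))
```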
Performance on Apple Devices
Mac (Apple Silicon)
On M1 and newer Macs, Enigmus delivers responsive AI interactions; a rough sizing sketch follows this list:
- M1/M2 (8GB): Qwen3-0.6B and Qwen3-1.7B run smoothly for everyday tasks
- M1/M2 Pro (16GB+): gpt-oss-20b and Qwen3-14B for advanced use cases
- M3/M4 Max (64GB+): Run large models including Qwen3-32B, Qwen3-Next-80B
- M3/M4 Max (128GB+): Run the largest models including Qwen3-235B-A22B
- M5 with Neural Accelerators: Optimized matrix operations for fastest inference
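The tiers above follow from simple arithmetic: at 4-bit quantization a weight costs about half a byte, and the KV cache and activations come on top (the 4-bit assumption matches the quantized builds discussed on this page):

```python
def weight_gb(params_billion: float, bits_per_weight: float = 4.0) -> float:
    """Weight memory only; KV cache and activations come on top."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

for name, size in [("Qwen3-1.7B", 1.7), ("gpt-oss-20b", 21),
                   ("Qwen3-32B", 32), ("Qwen3-235B-A22B", 235)]:
    print(f"{name}: ~{weight_gb(size):.1f} GB of weights")
# gpt-oss-20b -> ~10.5 GB, which is why it fits in a 16GB machine;
# Qwen3-235B-A22B -> ~117.5 GB, hence the 128GB+ tier.
```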
iPhone & iPad (iOS 18+)
Enigmus brings on-device AI to mobile:
- iPhone 13+ / iPad: Run Qwen3-0.6B, Qwen3-1.7B, Qwen3-4B, or Qwen3-8B (high-memory devices)
- Increased Memory Entitlement: Enables larger models on capable devices
- Metal GPU Required: Real device needed (simulators not supported)
The Privacy Advantage
Unlike cloud-based AI services, Enigmus processes everything locally. When you ask a question, draft an email, or analyze a document:
- Input stays on the device
- The AI model runs on Apple Silicon
- The response is generated locally
- Nothing is uploaded to external servers
This architecture ensures privacy by design—data remains on the device at all times.
Frequently Asked Questions
What is MLX and why does Enigmus use it?
MLX is Apple's open-source machine learning framework, designed specifically for Apple Silicon. Unlike other ML frameworks, MLX uses a unified memory model where arrays live in shared memory—allowing operations to run on CPU, GPU, or Neural Engine without copying data between them. This makes it exceptionally efficient for running large language models on Mac, iPhone, and iPad.
At WWDC 2025, Apple announced deeper MLX integration into macOS and iOS, signaling it as a core component of their AI strategy.
Sources: Apple MLX Open Source · GitHub - ml-explore/mlx · WWDC 2025 MLX Session
What's the minimum Mac configuration to run Enigmus?
Enigmus requires any Mac with Apple Silicon (M1 or newer) running macOS 14 Sonoma or later. The experience scales with your hardware:
- 8GB RAM: Run compact models like Qwen3-0.6B or Qwen3-1.7B for everyday tasks
- 16GB RAM: Run mid-size models like gpt-oss-20b or Qwen3-14B
- 32GB+ RAM: Run larger models with better context handling
- 64GB+ RAM: Run large models like Qwen3-32B, Qwen3-Next-80B
- 128GB+ RAM: Run the largest models like Qwen3-235B-A22B
The M5 chips with Neural Accelerators provide the fastest inference thanks to dedicated matrix multiplication hardware.
Sources: MLX Documentation · Apple M5 Neural Accelerators
Can I run Enigmus on iPhone or iPad?
Yes. Enigmus supports iOS 18+ on devices with sufficient hardware:
- iPhone 13 or newer (A15 chip or later)
- iPad with A15 chip or later
A real device is required; iOS Simulators don't support the Metal GPU features MLX requires. For larger models, Enigmus relies on the "Increased Memory Limit" entitlement, which raises the app's memory ceiling on capable devices.
The Qwen3-0.6B and Qwen3-1.7B models run well on all supported devices, with Qwen3-4B and Qwen3-8B available on high-memory devices.
Sources: MLX Swift on iOS · GitHub - ml-explore/mlx-swift
What is GPT-OSS and how does it compare to ChatGPT?
GPT-OSS is OpenAI's first open-weight model family since GPT-2, released in August 2025. It includes two variants:
- gpt-oss-20b: 21 billion parameters, fits in 16GB memory
- gpt-oss-120b: 117 billion parameters for high-end systems
Both use a mixture-of-experts (MoE) architecture with 4-bit quantization (MXFP4). The gpt-oss-120b matches or exceeds OpenAI's o4-mini on benchmarks for coding, math, and tool use—running entirely on-device with no API costs or data sharing.
Sources: Introducing GPT-OSS | OpenAI · GPT-OSS Model Card · GitHub - openai/gpt-oss
What makes Qwen3 special for local AI?
Qwen3, released by Alibaba in April 2025, offers several advantages for local deployment:
- Hybrid reasoning: Toggle between fast responses and deep thinking mode
- Efficient variants: The Qwen3-30B-A3B uses only 3B active parameters while delivering 32B-class performance
- Massive context: 128K token context window (1M tokens in Qwen3-2507)
- Multilingual: Supports 119 languages and dialects
- Size range: From 0.6B (ultra-light) to 32B (full-featured)
The compact Qwen3-0.6B and Qwen3-1.7B models work on all supported iOS devices, Qwen3-4B and Qwen3-8B run on high-memory devices, and the larger variants shine on Mac.
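The efficiency claim is easy to quantify: in a sparse MoE model, memory scales with the total parameter count while per-token compute scales with the active subset. Using the numbers from this page:

```python
total_params = 30e9   # Qwen3-30B-A3B: parameters held in memory
active_params = 3e9   # parameters actually used for each token

# Per-token compute is roughly proportional to the active set, so a
# forward pass costs about a tenth of a dense 30B model's.
print(f"active fraction: {active_params / total_params:.0%}")  # 10%
```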
Sources: Alibaba Qwen3 Announcement · GitHub - QwenLM/Qwen3 · Qwen on Hugging Face
Is my data really private with Enigmus?
Yes, completely. Enigmus processes everything on-device using Apple's MLX framework. Here's what that means:
- No cloud connection required: Once a model is downloaded, Enigmus works entirely offline
- No data transmission: Prompts, documents, and conversations never leave the Mac, iPhone, or iPad
- No telemetry: No usage data, analytics, or interaction information is collected
- Local ownership: Everything stays in local storage under user control
This is fundamentally different from cloud AI services like ChatGPT or Claude, which process data on remote servers. With Enigmus, privacy isn't a policy—it's architecture.
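The offline claim can be checked with open tooling: once the weights sit in the local cache, generation needs no network at all. A sketch with mlx-lm; HF_HUB_OFFLINE is a standard huggingface_hub switch, and the repo id is illustrative:

```python
import os

# Refuse all Hugging Face Hub network traffic; loading now succeeds
# only if the model is already cached on disk.
os.environ["HF_HUB_OFFLINE"] = "1"

from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Qwen3-1.7B-4bit")  # cached earlier
print(generate(model, tokenizer, prompt="Hello!", max_tokens=32))
```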
Sources: MLX Unified Memory Model · On-device ML with MLX Swift