Installation & Network
RobotVision is designed to act as a local edge vision sensor. Setting up the connection enables local communication between your phone, backend server, and downstream controllers.
1. Enable Hotspot
Enable Personal Hotspot on the iPhone running the app. The iPhone then acts as the network gateway at the default hotspot address, 172.20.10.1.
2. Connect Downstream Clients
Connect your backend host machine and any downstream controllers (such as an ESP32-S3 microcontroller or a database server) to the iPhone's Hotspot.
3. Verify Connection
The iOS app runs its REST server on port 8080 (http://172.20.10.1:8080). Downstream microcontrollers can query the phone at this address directly. Ensure all devices are on the same local subnet.
// Hotspot Topology Map
iPhone Hotspot Router (172.20.10.1)
├── iOS REST Service -> http://172.20.10.1:8080
├── Backend Server -> http://172.20.10.X (Dynamic IP)
└── Downstream Client -> http://172.20.10.Y (e.g. Robot / Database)
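The "same local subnet" requirement can be checked from the backend with a few lines of Python. This is a minimal sketch: the /28 prefix length is an assumption based on the usual iPhone Personal Hotspot netmask (255.255.255.240) and is not stated in this guide, and `on_hotspot_subnet` is a hypothetical helper name.

```python
import ipaddress

# Assumption: the hotspot hands out addresses from 172.20.10.0/28.
# Only the gateway address 172.20.10.1 is given in this guide.
HOTSPOT_NET = ipaddress.ip_network("172.20.10.0/28")


def on_hotspot_subnet(addr: str) -> bool:
    """Return True if addr falls inside the assumed hotspot subnet."""
    return ipaddress.ip_address(addr) in HOTSPOT_NET
```

Calling `on_hotspot_subnet` with the address a controller prints in its serial log is a quick way to confirm that it joined the hotspot rather than another Wi-Fi network.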
Permissions & Data
RobotVision relies strictly on local device capabilities to protect your data. Below are the permissions required and how face/depth data is managed:
Camera Access
Essential for capturing real-time color video frames used for neural network inference and ARKit spatial raycasting.
Photo Library Access
Requested only when you tap the screenshot button to save an image of the detection bounding boxes to your device. The app never uploads images.
When you switch to the front-facing camera, ARKit starts face geometry tracking. This processing happens entirely on-device and in memory. We never collect, store, share, or transmit facial shapes or biometric data. All face-tracking data is discarded as soon as the AR session closes.
Core Features
RobotVision leverages advanced iOS technologies for optimal machine vision performance:
Dynamic Resolutions
Runs at 720p (1280x720) instead of raw 4K. This reduces tensor downscaling overhead, preserves battery, and helps sustain a frame rate of 30-60 FPS.
ANE Co-Processing
Forces CoreML models to load in cpuAndNeuralEngine mode. This offloads model calculations to Apple's Neural Engine, keeping the GPU free for ARKit UI overlays.
Built-In NMS
CoreML exports include Non-Maximum Suppression inside the model itself. This moves the filtering of redundant boxes out of CPU code and brings inference latency below 30 ms.
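In RobotVision the suppression step runs inside the exported model, so none of the following code is needed on the client. The sketch below only illustrates the overlap measure (intersection over union) that an IoU threshold is compared against. The box layout borrows the centerX/centerY/width/height fields of the /detect response; the function name is hypothetical, and the exact in-model computation is not described in this guide.

```python
def iou_center_boxes(a: dict, b: dict) -> float:
    """IoU of two boxes given as centerX/centerY/width/height dicts."""

    def corners(box: dict) -> tuple:
        # Convert center format to (x1, y1, x2, y2) corner format.
        half_w, half_h = box["width"] / 2.0, box["height"] / 2.0
        return (box["centerX"] - half_w, box["centerY"] - half_h,
                box["centerX"] + half_w, box["centerY"] + half_h)

    ax1, ay1, ax2, ay2 = corners(a)
    bx1, by1, bx2, by2 = corners(b)
    # Overlap is clamped at zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a["width"] * a["height"] + b["width"] * b["height"] - inter
    return inter / union if union > 0 else 0.0
```

During suppression, two boxes of the same class whose IoU exceeds the threshold are treated as duplicates and only the higher-confidence one is kept.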
REST API Reference
The iOS app hosts an HTTP server on port 8080. The backend calls the endpoints below directly to change settings and read detection coordinates.
GET /health
Check if the REST server is reachable and inspect current model state.
// Response JSON (200 OK)
{
"status": "ok",
"model": "colorcapyolo26x",
"state": "idle"
}
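A backend could wrap this endpoint in a small reachability check. The following is a minimal sketch using only the Python standard library. The base URL comes from the setup steps above; the helper names and the 2-second timeout are illustrative choices, not part of the app.

```python
import json
import urllib.request

BASE_URL = "http://172.20.10.1:8080"  # gateway address and port from the setup steps


def parse_health(body: bytes) -> bool:
    """Return True when the JSON payload reports status 'ok'."""
    data = json.loads(body)
    return isinstance(data, dict) and data.get("status") == "ok"


def check_health(base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """GET /health; returns False on timeouts, connection errors, or bad JSON."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and parse_health(resp.read())
    except (OSError, ValueError):
        # URLError and socket timeouts are OSError; JSON decode errors are ValueError.
        return False
```

Calling `check_health()` before issuing detection requests lets the backend fail fast when the phone has left the hotspot or the server is switched off in the app.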
GET /status
Get rolling averages of on-device inference speed, system thermal state, and uptime.
// Response JSON (200 OK)
{
"uptimeSeconds": 612.0,
"thermalState": "nominal",
"processedFPS": 1.5,
"lastInferenceMs": 24.3,
"p95InferenceMs": 29.1
}
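One way to use these numbers is to compare the 95th-percentile inference time against the time available per frame, which is 1000 / FPS milliseconds. The sketch below is a rough check only: it ignores camera capture and post-processing overhead, and the function names are hypothetical.

```python
def frame_budget_ms(target_fps: float) -> float:
    """Time available per frame, in milliseconds, at the given frame rate."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    return 1000.0 / target_fps


def fits_frame_budget(status: dict, target_fps: float) -> bool:
    """True if the p95 inference time from /status fits in one frame interval."""
    return status["p95InferenceMs"] <= frame_budget_ms(target_fps)
```

With the sample response above, a p95 of 29.1 ms fits the roughly 33.3 ms budget at 30 FPS but not the roughly 16.7 ms budget at 60 FPS.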
POST /detect
Trigger a single YOLO inference on the latest camera frame. Returns an array of detected objects, each with its bounding box, distance, and estimated real-world size.
| Field | Type | Required | Description |
|---|---|---|---|
| requestId | String | Yes | UUID-v4 string to trace the request and match logs. |
// Request Body
{
"requestId": "9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d"
}
// Response JSON (200 OK)
{
"requestId": "9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d",
"success": true,
"objects": [
{
"class": "red",
"confidence": 0.96,
"centerX": 320.0,
"centerY": 240.0,
"width": 80.0,
"height": 80.0,
"distanceMeters": 0.42,
"realWidthMeters": 0.08,
"realHeightMeters": 0.08
}
]
}
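The request and response above could be handled from a Python backend roughly as follows. This is a sketch under assumptions: the `Content-Type: application/json` header is not specified in this reference, and `build_detect_request`, `nearest_object`, and `detect` are hypothetical helper names.

```python
import json
import urllib.request
import uuid


def build_detect_request(base_url: str) -> urllib.request.Request:
    """Build a POST /detect request carrying a fresh UUID-v4 requestId."""
    body = json.dumps({"requestId": str(uuid.uuid4())}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/detect",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


def nearest_object(response: dict):
    """Return the detected object with the smallest distanceMeters, or None."""
    objects = response.get("objects", []) if response.get("success") else []
    return min(objects, key=lambda obj: obj["distanceMeters"], default=None)


def detect(base_url: str, timeout: float = 5.0) -> dict:
    """Send one detection request and check that the requestId is echoed back."""
    req = build_detect_request(base_url)
    sent_id = json.loads(req.data)["requestId"]
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        result = json.load(resp)
    if result.get("requestId") != sent_id:
        raise ValueError("response requestId does not match the request")
    return result
```

A typical call would be `nearest_object(detect("http://172.20.10.1:8080"))`, which yields the closest detection or `None` when nothing was found.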
POST /capture
Fetch a JPEG snapshot of the raw camera sensor (without overlay) for logging/debugging.
// Response
Binary image stream (image/jpeg)
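A snapshot could be saved to disk along these lines. The sketch assumes the endpoint accepts an empty request body, which this reference does not state, and it checks the JPEG start-of-image marker (bytes FF D8) before writing so that an error payload is not saved as an image. The helper names are hypothetical.

```python
import urllib.request
from pathlib import Path

JPEG_SOI = b"\xff\xd8"  # every JPEG file starts with this start-of-image marker


def looks_like_jpeg(data: bytes) -> bool:
    """Cheap sanity check on the first two bytes of the payload."""
    return data[:2] == JPEG_SOI


def save_capture(base_url: str, out_path: str, timeout: float = 5.0) -> Path:
    """POST /capture and write the returned JPEG to out_path."""
    # Assumption: an empty body is acceptable for this endpoint.
    req = urllib.request.Request(f"{base_url}/capture", data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        data = resp.read()
    if not looks_like_jpeg(data):
        raise ValueError("response body is not a JPEG image")
    path = Path(out_path)
    path.write_bytes(data)
    return path
```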
POST /settings
Adjust camera FPS, confidence thresholds, non-maximum suppression parameters, and model states.
| Field | Type | Required | Description |
|---|---|---|---|
| selectedModel | String | No | Name of the target CoreML model to compile. |
| confidenceThreshold | Float | No | Detection confidence limit (0.0 to 1.0). |
| iouThreshold | Float | No | IOU overlap limit for box suppression. |
| targetFPS | Float | No | Target frames per second (e.g. 20.0, 30.0). |
| isScanning | Boolean | No | Set to false to suspend background CoreML task. |
// Request Body
{
"confidenceThreshold": 0.55,
"targetFPS": 20.0
}
// Response JSON (200 OK)
{
"success": true,
"updatedFields": ["confidenceThreshold", "targetFPS"]
}
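Because every field is optional, a client may want to validate a partial update before sending it. The following sketch rejects field names that are not in the table above and enforces the documented 0.0 to 1.0 range for confidenceThreshold. The helper names and the `Content-Type` header are assumptions, not part of the documented API.

```python
import json
import urllib.request

# Field names taken from the table above.
ALLOWED_FIELDS = {
    "selectedModel", "confidenceThreshold", "iouThreshold",
    "targetFPS", "isScanning",
}


def build_settings(**fields) -> dict:
    """Validate a partial settings update and return it as a dict."""
    unknown = set(fields) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown settings fields: {sorted(unknown)}")
    conf = fields.get("confidenceThreshold")
    if conf is not None and not 0.0 <= conf <= 1.0:
        raise ValueError("confidenceThreshold must be between 0.0 and 1.0")
    return dict(fields)


def post_settings(base_url: str, timeout: float = 5.0, **fields) -> dict:
    """POST the validated fields to /settings and return the parsed response."""
    body = json.dumps(build_settings(**fields)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/settings",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)
```

The request body shown above corresponds to `post_settings("http://172.20.10.1:8080", confidenceThreshold=0.55, targetFPS=20.0)`.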
Troubleshooting
Check the following items when encountering local connection or inference failures:
REST Server Unreachable (timeout/error)
Confirm your backend computer is connected to the iPhone's Hotspot. Test connection directly using curl http://172.20.10.1:8080/health. Verify that the REST server is set to active inside the iOS app GUI.
Downstream Controller Not Connected
Verify that your controller is powered and its serial log displays Wi-Fi connection success. Check SSID and password settings. Ensure it has obtained a local IP on the Hotspot subnet.
Thermal Warning or FPS Dropping
The iOS system will limit Neural Engine performance if the device overheats. Try reducing camera target FPS (e.g. to 15.0) via POST /settings, turn off live bounding box drawing on screen, or move the device away from direct heat.
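The FPS reduction can also be automated from the backend by reading thermalState from GET /status and deciding whether to send a lower targetFPS. This sketch assumes the thermalState strings mirror the iOS `ProcessInfo.ThermalState` case names (nominal, fair, serious, critical); only "nominal" appears in this reference. The function names and the 15.0 FPS fallback are illustrative.

```python
from typing import Optional

# Assumption: these strings match the values the app reports when the device is hot.
THROTTLE_STATES = {"serious", "critical"}


def suggest_target_fps(thermal_state: str, current_fps: float,
                       reduced_fps: float = 15.0) -> float:
    """Return a lower FPS when the device reports thermal pressure."""
    if thermal_state in THROTTLE_STATES:
        return min(current_fps, reduced_fps)
    return current_fps


def settings_update_for(status: dict, current_fps: float) -> Optional[dict]:
    """Build a POST /settings body, or None when no change is needed."""
    new_fps = suggest_target_fps(status["thermalState"], current_fps)
    return {"targetFPS": new_fps} if new_fps != current_fps else None
```

Polling /status every few seconds and posting the returned body only when it is not `None` avoids sending redundant settings updates.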
FAQ
No internet connection is required. The AI models run entirely on-device, and the REST server operates over a local Wi-Fi hotspot, which makes the app well suited to remote or network-isolated research labs.
Custom models are supported. The app loads CoreML model packages (.mlpackage). You can compile a custom YOLO model using our training pipeline and load it via Xcode or transfer it to the app's document directory.
Distance accuracy depends on the hardware. On devices with LiDAR sensors (iPhone Pro series), spatial distance is accurate to within 1-2 millimeters at close range (0.3 m to 1.5 m). On non-LiDAR devices, accuracy relies on ARKit feature point mapping and stays within 1-2 centimeters.