Installation & Network

RobotVision is designed to act as a local edge vision sensor. Setting up the connection enables local communication between your phone, backend server, and downstream controllers.

1. Enable Hotspot

Enable Personal Hotspot on the iPhone running the app. The iPhone then acts as the network gateway at the default hotspot address, 172.20.10.1.

2. Connect Downstream Clients

Connect your backend host machine and downstream clients (such as an ESP32-S3 microcontroller or a database server) to the iPhone's Hotspot.

3. Verify Connection

The iOS app runs its REST server on port 8080 (http://172.20.10.1:8080). Downstream microcontrollers can query the phone's IP directly. Ensure all devices are on the same local subnet.

// Hotspot Topology Map
iPhone Hotspot Router (172.20.10.1)
  ├── iOS REST Service    -> http://172.20.10.1:8080
  ├── Backend Server      -> http://172.20.10.X (Dynamic IP)
  └── Downstream Client   -> http://172.20.10.Y (e.g. Robot / Database)
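
The "same local subnet" requirement can be checked in code. The sketch below assumes the hotspot hands out addresses from 172.20.10.0/28, which is what iPhone Personal Hotspot typically uses; confirm the mask on your own device. The function name is illustrative and not part of RobotVision.

```python
import ipaddress

# Assumption: iPhone Personal Hotspot typically uses 172.20.10.0/28
# (gateway 172.20.10.1). Adjust if your device reports a different mask.
HOTSPOT_NET = ipaddress.ip_network("172.20.10.0/28")


def on_hotspot_subnet(ip: str) -> bool:
    """Return True if `ip` falls inside the assumed hotspot subnet."""
    return ipaddress.ip_address(ip) in HOTSPOT_NET


for ip in ("172.20.10.1", "172.20.10.5", "192.168.1.20"):
    print(ip, on_hotspot_subnet(ip))
```

Run it with the address your backend or controller reports after joining the hotspot.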

Permissions & Data

RobotVision relies strictly on local device capabilities to protect your data. Below are the permissions required and how face/depth data is managed:

Camera Access

Essential for capturing real-time color video frames used for neural network inference and ARKit spatial raycasting.

Photo Library Access

Requested only when you tap the screenshot button to save a frame with its detection bounding boxes locally. The app never uploads images.

TrueDepth & Facial Data Guarantee

When you switch to the front-facing camera, ARKit starts face geometry tracking. This processing happens entirely on-device and in memory. We never collect, store, share, or transmit facial shapes or biometric data. All face-tracking data is discarded as soon as the AR session closes.

Core Features

RobotVision leverages advanced iOS technologies for optimal machine vision performance:

Dynamic Resolutions

Runs at 720p (1280x720) instead of raw 4K. This reduces tensor downscaling overhead, saves battery, and helps sustain a steady 30-60 FPS.
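
To put the resolution choice in numbers, the short calculation below compares pixels per frame. It assumes "raw 4K" means 3840x2160 (UHD), which the text does not state explicitly.

```python
# Assumption: "raw 4K" means 3840x2160 (UHD); the docs do not specify it.
UHD = (3840, 2160)
HD720 = (1280, 720)


def pixel_count(size):
    """Number of pixels in a (width, height) frame."""
    width, height = size
    return width * height


ratio = pixel_count(UHD) / pixel_count(HD720)
print(f"4K frame:   {pixel_count(UHD):,} px")
print(f"720p frame: {pixel_count(HD720):,} px")
print(f"4K carries {ratio:.0f}x more pixels per frame")
```

Under that assumption a 4K frame holds nine times as many pixels as a 720p frame, all of which would have to be downscaled before inference.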

ANE Co-Processing

Loads CoreML models with the cpuAndNeuralEngine compute-unit setting. This steers model inference to Apple's Neural Engine (with CPU fallback) and keeps the GPU free for ARKit UI overlays.

Built-In NMS

CoreML exports include Non-Maximum Suppression (NMS) inside the model itself. This moves redundant-box filtering out of app-side CPU code and brings inference latency below 30 ms.
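
For readers unfamiliar with NMS, here is a minimal greedy version in Python. It only illustrates the idea of filtering redundant boxes; it is not the implementation embedded in the exported model, and the box format (x1, y1, x2, y2) is an assumption made for the example.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score, keep a box only if it
    does not overlap an already kept box by more than iou_threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two heavily overlapping boxes plus one separate box.
boxes = [(0, 0, 100, 100), (5, 5, 105, 105), (200, 200, 260, 260)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # indices of the boxes that survive
```

The iou_threshold argument plays the same role as the iouThreshold setting exposed by the REST API.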

REST API Reference

The iOS app hosts an HTTP server on port 8080. The backend sends requests to it directly to change settings and read detection coordinates.

GET /health

Check if the REST server is reachable and inspect current model state.

// Response JSON (200 OK)
{
  "status": "ok",
  "model": "colorcapyolo26x",
  "state": "idle"
}
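
A backend could poll this endpoint with a sketch like the following, using only the Python standard library. The helper names are illustrative, and the check relies solely on the "status": "ok" value shown above; other status values are not documented here.

```python
import json
import urllib.request

BASE_URL = "http://172.20.10.1:8080"  # address from the setup steps


def is_healthy(payload: dict) -> bool:
    """True when the health payload reports status "ok"."""
    return payload.get("status") == "ok"


def fetch_health(base_url: str = BASE_URL, timeout: float = 3.0) -> dict:
    """GET /health and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

From a machine on the hotspot, is_healthy(fetch_health()) gives a quick yes/no before any detection request is sent.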

GET /status

Get rolling averages of on-device inference speed, system thermal state, and uptime.

// Response JSON (200 OK)
{
  "uptimeSeconds": 612.0,
  "thermalState": "nominal",
  "processedFPS": 1.5,
  "lastInferenceMs": 24.3,
  "p95InferenceMs": 29.1
}
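
One possible use of these fields is a simple throttling rule on the backend. The sketch below is hypothetical: it assumes thermalState uses the iOS thermal state names ("nominal", "fair", "serious", "critical"), of which only "nominal" appears in this reference, and the halving policy and 30 ms budget are illustrative choices.

```python
# Assumption: thermalState mirrors the iOS thermal state names
# ("nominal", "fair", "serious", "critical"); only "nominal" is documented.
HOT_STATES = {"serious", "critical"}


def suggest_target_fps(status: dict, current_fps: float,
                       latency_budget_ms: float = 30.0,
                       floor_fps: float = 15.0) -> float:
    """Return a reduced target FPS when the device is hot or slow,
    otherwise return current_fps unchanged."""
    too_hot = status.get("thermalState") in HOT_STATES
    too_slow = status.get("p95InferenceMs", 0.0) > latency_budget_ms
    if too_hot or too_slow:
        return max(floor_fps, current_fps / 2)
    return current_fps


sample = {"uptimeSeconds": 612.0, "thermalState": "nominal",
          "processedFPS": 1.5, "lastInferenceMs": 24.3,
          "p95InferenceMs": 29.1}
print(suggest_target_fps(sample, 30.0))
```

The returned value could then be sent as targetFPS through POST /settings.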

POST /detect

Trigger a single YOLO inference on the latest camera frame. Returns an array of detected objects, each with its bounding box, estimated distance, and real-world size.

Field      Type    Required  Description
requestId  String  Yes       UUID-v4 string to trace the request and match logs.

// Request Body
{
  "requestId": "9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d"
}

// Response JSON (200 OK)
{
  "requestId": "9b1deb4d-3b7d-4bad-9bdd-2b0d7b3dcb6d",
  "success": true,
  "objects": [
    {
      "class": "red",
      "confidence": 0.96,
      "centerX": 320.0,
      "centerY": 240.0,
      "width": 80.0,
      "height": 80.0,
      "distanceMeters": 0.42,
      "realWidthMeters": 0.08,
      "realHeightMeters": 0.08
    }
  ]
}
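
The request and response above could be handled from Python roughly as follows. This is a sketch: the helper names are made up for illustration, and sending a Content-Type: application/json header is an assumption, since the reference does not say whether the server requires it.

```python
import json
import urllib.request
import uuid

BASE_URL = "http://172.20.10.1:8080"


def build_detect_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /detect request carrying a fresh UUID-v4 requestId."""
    body = json.dumps({"requestId": str(uuid.uuid4())}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/detect",
        data=body,
        method="POST",
        # Assumption: the server accepts (or expects) a JSON content type.
        headers={"Content-Type": "application/json"},
    )


def nearest_object(response: dict, min_confidence: float = 0.5):
    """Return the closest detection at or above min_confidence, or None."""
    candidates = [o for o in response.get("objects", [])
                  if o["confidence"] >= min_confidence]
    return min(candidates, key=lambda o: o["distanceMeters"], default=None)
```

Pass the request to urllib.request.urlopen(req, timeout=5), decode the body with json.load, and hand the result to nearest_object to pick a target for a downstream controller.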

POST /capture

Fetch a JPEG snapshot of the raw camera sensor (without overlay) for logging/debugging.

// Response
Binary image stream (image/jpeg)
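
Before writing the returned bytes to disk, a client may want a quick sanity check that the body really is a JPEG and not an error page. The sketch below checks the standard JPEG start bytes; the function names are illustrative.

```python
JPEG_MAGIC = b"\xff\xd8\xff"  # JPEG files start with SOI followed by a marker


def looks_like_jpeg(data: bytes) -> bool:
    """Cheap check on the leading bytes; does not validate the whole image."""
    return data.startswith(JPEG_MAGIC)


def save_snapshot(data: bytes, path: str) -> None:
    """Write the snapshot to `path`, refusing bodies that are not JPEG."""
    if not looks_like_jpeg(data):
        raise ValueError("response body does not look like a JPEG")
    with open(path, "wb") as fh:
        fh.write(data)
```

Here data would be the raw body read from the POST /capture response.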

POST /settings

Adjust camera FPS, confidence thresholds, non-maximum suppression parameters, and model states.

Field                Type     Required  Description
selectedModel        String   No        Name of the target CoreML model to compile.
confidenceThreshold  Float    No        Detection confidence limit (0.0 to 1.0).
iouThreshold         Float    No        IoU overlap limit for box suppression.
targetFPS            Float    No        Target frames per second (e.g. 20.0, 30.0).
isScanning           Boolean  No        Set to false to suspend background CoreML task.

// Request Body
{
  "confidenceThreshold": 0.55,
  "targetFPS": 20.0
}

// Response JSON (200 OK)
{
  "success": true,
  "updatedFields": ["confidenceThreshold", "targetFPS"]
}
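
Since every field is optional, a client can build the body from only the settings it wants to change. The following sketch validates field names against the table above and enforces the documented 0.0 to 1.0 range for confidenceThreshold; the helper itself is hypothetical, and whether the server rejects unknown fields is not documented.

```python
import json

# Field names taken from the POST /settings table.
ALLOWED_FIELDS = {"selectedModel", "confidenceThreshold", "iouThreshold",
                  "targetFPS", "isScanning"}


def build_settings_payload(**fields) -> bytes:
    """Return a JSON body containing only the settings to update."""
    unknown = set(fields) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown settings fields: {sorted(unknown)}")
    conf = fields.get("confidenceThreshold")
    if conf is not None and not 0.0 <= conf <= 1.0:
        raise ValueError("confidenceThreshold must be within 0.0 to 1.0")
    return json.dumps(fields).encode("utf-8")


body = build_settings_payload(confidenceThreshold=0.55, targetFPS=20.0)
print(body.decode("utf-8"))
```

The resulting bytes can be sent as the body of a POST request to /settings.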

Troubleshooting

Check the following items when encountering local connection or inference failures:

REST Server Unreachable (timeout/error)

Confirm your backend computer is connected to the iPhone's Hotspot. Test connection directly using curl http://172.20.10.1:8080/health. Verify that the REST server is set to active inside the iOS app GUI.
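
If curl is not available on the backend machine, a plain TCP connection test tells you whether anything is listening on the port at all. This is a generic standard-library sketch, not a RobotVision tool.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Calling port_open("172.20.10.1", 8080) from the backend distinguishes a network problem (False) from an application-level error on a reachable server (True).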

Downstream Controller Not Connected

Verify that your controller is powered and its serial log displays Wi-Fi connection success. Check SSID and password settings. Ensure it has obtained a local IP on the Hotspot subnet.

Thermal Warning or FPS Dropping

iOS throttles Neural Engine performance when the device overheats. Try reducing the target FPS (e.g. to 15.0) via POST /settings, turning off live bounding box drawing on screen, or moving the device away from direct heat.

FAQ

No internet access is required. The AI models run entirely on-device, and the REST server operates over a local Wi-Fi hotspot, so detections work offline. This makes the app well suited to remote or network-isolated research labs.

Custom models are supported. The app is built to support CoreML model packages (.mlpackage). You can compile a custom YOLO model using our training pipeline and load it via Xcode or transfer it to the app's document directory.

On devices with LiDAR sensors (iPhone Pro series), spatial distance is accurate within 1-2 millimeters at close range (0.3m to 1.5m). On non-LiDAR devices, accuracy relies on ARKit feature point mapping, which remains within 1-2 centimeters.