Voice notes

How to record voice observations, what gets transcribed, and how the AI structures the recording into a description and recommended action.

By Richard Pryce · Last updated 2026-05-07

Voice notes are for the moments when typing on the iPad slows you down: high up a stair with cold hands, in a tight bin store, or mid-walk between flat doors. Tap and talk. FRA Flow transcribes the recording and structures it into the same fields you would otherwise have typed.

How a voice observation is captured

Tap + Observation on a location, then tap Voice. The mic permission prompt appears the first time. Allow it. Record. Tap the stop button. The recording attaches to the observation and you move to the risk-rating step like any other capture.

Recordings can be up to a few minutes long, but the sweet spot is fifteen to thirty seconds. Keep to one observation per recording.

What gets transcribed

Once the iPad is back online, FRA Flow sends the audio to a transcription service. The service is EU-hosted, and the audio bytes never leave the EU region. The result is a verbatim text transcript stored on the observation.

The transcript then goes through a second AI pass, with a prompt that asks for two things:

A description of what you observed. Grammar is cleaned up and the text is reordered if you doubled back mid-thought, but nothing is editorialised.
A recommended action. Pulled out only if you actually said one. The AI does not invent recommendations.
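The two-field output can be pictured as a record in which the action is optional. The sketch below assumes, hypothetically, that the structuring pass returns JSON with `description` and `recommended_action` keys; FRA Flow's actual internal format is not documented here. It illustrates the rule above: a missing or blank action stays empty rather than being filled with a default.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class StructuredObservation:
    description: str
    # None when the assessor did not say an action; never auto-filled.
    recommended_action: Optional[str]


def parse_structured_response(raw: str) -> StructuredObservation:
    """Parse a (hypothetical) JSON reply from the structuring pass."""
    data = json.loads(raw)
    description = data["description"].strip()
    if not description:
        raise ValueError("structuring pass returned an empty description")
    action = data.get("recommended_action")
    # Missing, null, or blank all mean "no action was spoken".
    if action is not None:
        action = action.strip() or None
    return StructuredObservation(description=description, recommended_action=action)
```

Keeping the action as `None` instead of an empty placeholder makes it easy to tell "the assessor said nothing" apart from "the assessor said something short".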

Both fields land on the observation card alongside the photo. You can edit either before saving.

Speaking patterns that work

A few habits that produce clean structured output:

State the location first. "Third floor landing, east stair." The spoken location is checked against the location you registered, so a mismatch is a useful flag.
Describe before recommending. "There's a 10 mm gap on the strike side of this fire door, and the seal looks compressed. We need to get a fire-door contractor to refit the seal and re-test under the alarm." The AI splits this cleanly into description and action.
Speak the measurements. Numbers and units ("eight millimetre gap", "third floor", "FD30S") are critical. The no-fabricated-numbers guard checks the report against the original transcript, so a number you said gets traced through the report; a number nobody said triggers a flag.
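A check like the no-fabricated-numbers guard can be pictured as a set comparison: collect the numbers in the report text, remove those that also appear in the transcript, and flag whatever is left. The Python sketch below is an illustration under assumptions, not FRA Flow's actual guard; `NUMBER_WORDS`, `extract_numbers`, and `unsupported_numbers` are hypothetical names, and the word map is deliberately minimal.

```python
import re

# Hypothetical minimal map from spoken words to digits. A production guard
# would need a real number parser ("twenty-five", "one and a half", ...).
NUMBER_WORDS = {
    "one": "1", "two": "2", "three": "3", "four": "4", "five": "5",
    "six": "6", "seven": "7", "eight": "8", "nine": "9", "ten": "10",
    "first": "1", "third": "3", "fourth": "4", "fifth": "5",
}


def extract_numbers(text: str) -> set[str]:
    """Return every number in `text` as a normalised digit string."""
    # Digit runs, including decimals ("10", "2.5", the "30" in "FD30S").
    found = set(re.findall(r"\d+(?:\.\d+)?", text))
    # Spoken numbers covered by the small word map above.
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in NUMBER_WORDS:
            found.add(NUMBER_WORDS[word])
    return found


def unsupported_numbers(transcript: str, report_text: str) -> set[str]:
    """Numbers that appear in the report but were never said in the transcript."""
    return extract_numbers(report_text) - extract_numbers(transcript)
```

With this sketch, "eight millimetre" in the transcript and "8 mm" in the report both normalise to "8", so nothing is flagged, while a "12 mm" that nobody said would be.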

What the assessor sees in the workbench

Each voice observation has a player on the card so a reviewer can listen to the original recording. The transcript is stored too, so the reviewer can read along or search across recordings.

If the transcript looks wrong, you can edit it. The AI is not re-run after a transcript edit; once generated, the description and recommended action are independent fields, so update them by hand if your correction affects them.

Privacy and data residency

Audio bytes are uploaded to a transcription service hosted in the EU and processed there. The audio is retained while the report is live so the reviewer can play it back; once the report is signed off and archived, audio is retained for the audit period agreed in your data-processing addendum.
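The retention rule amounts to a small date calculation. In this hypothetical sketch, the audit period is passed in as a number of days (`audit_period_days`); the real period is whatever your data-processing addendum states, and `audio_delete_after` is an illustrative name, not an FRA Flow function.

```python
from datetime import date, timedelta
from typing import Optional


def audio_delete_after(archived_on: Optional[date], audit_period_days: int) -> Optional[date]:
    """Earliest date the audio may be deleted, or None while the report is live.

    `audit_period_days` is a hypothetical stand-in for the audit period
    agreed in the data-processing addendum.
    """
    if archived_on is None:
        # Report still live: keep the audio so reviewers can play it back.
        return None
    return archived_on + timedelta(days=audit_period_days)
```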

Data residency and GDPR covers where each piece of data lives, how long it is retained, and what sub-processors handle which step.

When the recording does not transcribe

Three common causes:

The audio is too short (under a second). The recording attaches to the observation but produces no transcript. Re-record.
Background noise drowns out the voice. The transcript will be garbled or empty. The voice-transcribe troubleshooter covers re-running.
The transcription service is temporarily unreachable. The recording stays attached, and the transcription queue retries with backoff. You can still play the audio back; only the text fields are missing.
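Retrying "with backoff" usually means waiting longer after each failed attempt. The sketch below shows one common variant, exponential backoff with full jitter; the attempt count, base delay, cap, and the `transcribe` callable are illustrative assumptions, not FRA Flow's actual queue settings.

```python
import random
import time
from typing import Callable, Optional


def retry_with_backoff(
    transcribe: Callable[[], str],  # hypothetical: returns text or raises ConnectionError
    max_attempts: int = 5,
    base_seconds: float = 2.0,
    cap_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Call `transcribe` until it succeeds or the attempts run out.

    Returns the transcript, or None if every attempt failed. The recording
    itself is not touched either way.
    """
    rng = rng or random.Random()
    for attempt in range(max_attempts):
        try:
            return transcribe()
        except ConnectionError:
            if attempt == max_attempts - 1:
                break  # no wait after the final failed attempt
            # Full jitter: wait a random time up to base * 2**attempt,
            # capped so the delay never grows without bound.
            sleep(rng.uniform(0, min(cap_seconds, base_seconds * 2 ** attempt)))
    return None
```

Passing `sleep` and `rng` in keeps the function testable without real waits, and the jitter spreads retries out so a backlog of queued recordings does not hit the service at the same moment.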

Where to go next

Photo capture covers the parallel photo flow.
Risk levels explains how the three-tier risk score maps to the action plan.
Edit and delete observations covers the workflows for fixing a captured observation later.
