
Voice and speak

Fulcrum can listen and speak. /voice records from your mic, transcribes the audio with Whisper-Large-v3 on scx.ai, and drops the text into your prompt so you can review it before you submit. /speak reads assistant replies aloud through scx.ai's TTS catalogue. Both are off by default: invoke either one per message, or turn on auto-speak to hear replies persistently.

/voice — mic to prompt

Type /voice in the TUI and hit Enter. Recording starts inline above the prompt input. There is no fullscreen modal, so the transcript and prompt stay fully visible.

What happens

  1. PortAudio (via sounddevice) opens the system default input device. On the first run, macOS may ask for microphone permission; the recording indicator stays visible while that permission dialog is showing.
  2. Audio is buffered locally and POSTed to scx.ai's /audio/transcriptions endpoint with Whisper-Large-v3.
  3. The returned text drops into the prompt input. You can edit, prepend, or delete it before pressing Enter.
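The three steps above can be sketched with the standard library alone. This is a minimal illustration, not Fulcrum's code: it assumes the captured audio is signed 16-bit mono PCM at 16 kHz, that scx.ai's /audio/transcriptions endpoint accepts OpenAI-style multipart fields named `model` and `file`, and that the reply carries the transcript in a `text` field. `BASE_URL`, `YOUR_KEY`, and the helper names are hypothetical. The sounddevice capture call itself is left out.

```python
import io
import json
import sys
import uuid
import urllib.request
import wave
from array import array

# Hypothetical gateway URL; substitute the real scx.ai base URL and key.
BASE_URL = "https://api.scx.ai/v1"


def pcm_to_wav(samples: array, sample_rate: int = 16_000) -> bytes:
    """Pack signed 16-bit mono PCM samples into an in-memory WAV file."""
    data = array("h", samples)  # copy so the caller's buffer is not mutated
    if sys.byteorder == "big":
        data.byteswap()  # WAV stores samples little-endian
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(data.tobytes())
    return buf.getvalue()


def build_transcription_request(wav_bytes: bytes, model: str = "Whisper-Large-v3",
                                api_key: str = "YOUR_KEY") -> urllib.request.Request:
    """Build a multipart POST for an OpenAI-style /audio/transcriptions endpoint."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="file"; filename="voice.wav"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode()
    body = head + wav_bytes + f"\r\n--{boundary}--\r\n".encode()
    return urllib.request.Request(
        f"{BASE_URL}/audio/transcriptions",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def parse_transcription(response_body: bytes) -> str:
    """Assumes an OpenAI-style JSON reply with the transcript under "text"."""
    return json.loads(response_body)["text"].strip()
```

Sending it would be `urllib.request.urlopen(req)`, after which the caller inserts `parse_transcription(resp.read())` into the prompt buffer rather than submitting it.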

Review before submit

The transcribed text lands in the prompt input, not on the wire. This is intentional: Whisper mishears words and occasionally hallucinates text, and the prompt is going to a coding agent that will read files, run shell commands, and edit your repo. You don't want misheard or hallucinated input shipped without a glance.

Indicator UI

A one-line bar docks above the prompt input while you record: amber-on-dark when capturing, with elapsed time and a level meter. After you stop, the bar switches to a “processing on scx.ai…” animation while the transcription request is in flight, then disappears as the text appears in the prompt.
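The level meter and the two bar states can be approximated as follows. This is an illustrative sketch only: the glyphs, the 256-colour amber-on-dark escape codes, and the RMS-based meter are assumptions, since this page does not say how Fulcrum draws the bar or measures level.

```python
import math
from array import array

AMBER_ON_DARK = "\x1b[38;5;214;48;5;235m"  # assumed: 256-colour amber text on near-black
RESET = "\x1b[0m"


def rms_level(samples: array) -> float:
    """RMS of signed 16-bit samples, normalised to 0.0..1.0."""
    if len(samples) == 0:
        return 0.0
    mean_square = sum(s * s for s in samples) / len(samples)
    return min(1.0, math.sqrt(mean_square) / 32768)


def recording_bar(elapsed_s: float, level: float, width: int = 10) -> str:
    """'Capturing' state: elapsed mm:ss plus a block level meter."""
    filled = max(0, min(width, round(level * width)))
    minutes, seconds = divmod(int(elapsed_s), 60)
    meter = "█" * filled + "░" * (width - filled)
    return f"● REC {minutes:02d}:{seconds:02d} {meter}"


def processing_bar(tick: int) -> str:
    """Shown while the transcription request is in flight; cycles one to three dots."""
    return "processing on scx.ai" + "." * (tick % 3 + 1)


def styled(line: str) -> str:
    """Wrap a bar line in the assumed amber-on-dark colours."""
    return f"{AMBER_ON_DARK}{line}{RESET}"
```

A render loop would call `recording_bar` with each new audio block's `rms_level`, then switch to `processing_bar` once capture stops.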

Stopping

While recording: Space stops capture and triggers transcription; Esc cancels without sending. Both are wired as priority bindings — they take effect even when the prompt input has focus.
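One way to model the priority bindings is a small state machine that sees every key before the prompt input does. This is a sketch under assumptions: the state names, key strings, and action labels are hypothetical, not Fulcrum's internals.

```python
from enum import Enum, auto
from typing import Optional, Tuple


class VoiceState(Enum):
    IDLE = auto()
    RECORDING = auto()
    TRANSCRIBING = auto()


def on_key(state: VoiceState, key: str) -> Tuple[VoiceState, Optional[str]]:
    """Checked before the focused prompt input sees the key.

    Returns (next_state, action). An action of None means the key was not
    consumed and should fall through to the prompt input as usual.
    """
    if state is VoiceState.RECORDING:
        if key == "space":
            return VoiceState.TRANSCRIBING, "stop_and_transcribe"
        if key == "escape":
            return VoiceState.IDLE, "cancel_and_discard"
    return state, None
```

Because only Space and Esc are intercepted, and only while recording, typing in the prompt behaves normally the rest of the time.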

Cross-platform notes

  • macOS: Core Audio via PortAudio. Core Audio ships with stock macOS; PortAudio is installed via Homebrew by the release workflow at build time, so users of the prebuilt binary get it for free.
  • Linux: install libportaudio2 and portaudio19-dev (e.g. sudo apt-get install libportaudio2 portaudio19-dev). Without them, /voice shows a clean error and falls back to keyboard input.
  • Windows: WASAPI via PortAudio, which ships bundled with the wheel.
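The "clean error" fallback on Linux can be approximated by probing the audio stack at import time. The sketch relies on sounddevice raising `OSError` when it cannot load the PortAudio library. The function name and messages are hypothetical, and the importer is injectable so the check can be exercised without real audio hardware.

```python
import importlib
from typing import Callable, Tuple


def probe_audio_backend(import_module: Callable = importlib.import_module) -> Tuple[bool, str]:
    """Return (available, message) for the mic path without crashing the TUI."""
    try:
        import_module("sounddevice")
    except OSError:
        # sounddevice raises OSError at import when the PortAudio shared library is missing.
        return False, ("PortAudio not found. On Debian/Ubuntu: "
                       "sudo apt-get install libportaudio2 portaudio19-dev")
    except ImportError:
        return False, "sounddevice is not installed; /voice is unavailable, keep typing."
    return True, "ok"
```

A /voice handler would call this first and, on `False`, show the message and leave the prompt in keyboard mode.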

/speak — assistant to audio

Per-message

After a turn, type /speak on its own to hear a short summary of the last assistant reply through your system audio. /speak some text here speaks the inline argument verbatim and skips summarisation. The MP3 is fetched from scx.ai's /audio/speech endpoint and handed to the platform's native player: afplay on macOS, paplay/aplay/ffplay on Linux, start on Windows.
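Player selection can be sketched as a lookup that tries the Linux candidates in the order listed above. The ffplay flags, the `cmd /c start` invocation (with an empty window-title argument), and the function name are assumptions for illustration, not Fulcrum's actual commands.

```python
import platform
import shutil
from typing import Callable, List, Optional


def player_command(mp3_path: str, system: Optional[str] = None,
                   which: Callable[[str], Optional[str]] = shutil.which) -> Optional[List[str]]:
    """Return an argv that plays mp3_path, or None if no known player exists."""
    system = system or platform.system()
    if system == "Darwin":
        return ["afplay", mp3_path]
    if system == "Windows":
        # start is a cmd.exe builtin; the empty string fills its window-title slot.
        return ["cmd", "/c", "start", "", mp3_path]
    # Linux and other POSIX: first available player, in the order this page lists.
    for candidate in (["paplay"], ["aplay"],
                      ["ffplay", "-nodisp", "-autoexit", "-loglevel", "quiet"]):
        if which(candidate[0]):
            return candidate + [mp3_path]
    return None
```

Passing `which` explicitly keeps the lookup testable; in normal use the default `shutil.which` searches `PATH`.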

Persistent auto-speak

/speak-on enables auto-speak: a one-line spoken summary of every terminal assistant reply, until disabled.

/speak-off disables it.

Voice selection

/voices opens a cursor picker over the catalogue (↑/↓ to navigate, Enter to pick, Esc to cancel). The choice persists to ~/.fulcrum/config.json as the tts_voice field — same shape as /models.
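The picker's key handling reduces to one step function over the option list. This is a hypothetical sketch: the key names are assumptions, and so is the wrap-around at the top and bottom, which this page does not specify.

```python
from typing import Optional, Sequence, Tuple


def picker_step(index: int, key: str,
                options: Sequence[str]) -> Tuple[int, Optional[str], bool]:
    """Return (new_index, chosen_option, finished) for one key press."""
    if key == "up":
        return (index - 1) % len(options), None, False  # assumed: wraps to the bottom
    if key == "down":
        return (index + 1) % len(options), None, False  # assumed: wraps to the top
    if key == "enter":
        return index, options[index], True
    if key == "escape":
        return index, None, True  # cancel: nothing is written to config
    return index, None, False
```

When `finished` is true and `chosen_option` is not None, the caller would write it to the `tts_voice` field.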

TTS model

tts-1 is the default. scx.ai exposes an OpenAI-compatible /audio/speech endpoint but with its own voice catalogue — these are not the OpenAI alloy/echo/nova names.
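Because the endpoint is OpenAI-compatible, a speech request plausibly takes the usual `model`, `input`, and `voice` JSON fields, with a voice from scx.ai's catalogue rather than an OpenAI name. The sketch assumes that shape and that the response body is the MP3 bytes. `BASE_URL` and `YOUR_KEY` are placeholders.

```python
import json
import urllib.request

BASE_URL = "https://api.scx.ai/v1"  # hypothetical; use your real gateway URL


def build_speech_request(text: str, voice: str = "serene-assistant",
                         model: str = "tts-1",
                         api_key: str = "YOUR_KEY") -> urllib.request.Request:
    """Build a JSON POST for an OpenAI-style /audio/speech endpoint."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        f"{BASE_URL}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
    )
```

Fetching the clip would then be `urllib.request.urlopen(req).read()`.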

Playback

One clip plays at a time — issuing a new /speak terminates any in-flight playback before starting the next, so two clips never overlap. Replies above ~3500 characters are trimmed before synthesis to stay under the upstream per-request cap.
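Both rules can be sketched in a few lines. Here a player object owns at most one child process and terminates it before launching the next, and a trim helper enforces the ~3500-character cap. Backing off to the previous word boundary is an assumption, as are the class and function names. On Windows, `start` hands the file to the default app and returns at once, so terminating that process would not stop audio; this sketch does not handle that case.

```python
import subprocess
from typing import List, Optional

MAX_TTS_CHARS = 3500  # the approximate upstream cap described above


def trim_for_tts(text: str, limit: int = MAX_TTS_CHARS) -> str:
    """Cut text to at most `limit` characters, preferring a word boundary."""
    if len(text) <= limit:
        return text
    cut = text[:limit]
    if not text[limit].isspace():  # the cut lands mid-word
        space = cut.rfind(" ")
        if space > 0:
            cut = cut[:space]
    return cut.rstrip()


class SingleClipPlayer:
    """Keeps at most one playback process alive at a time."""

    def __init__(self) -> None:
        self._proc: Optional[subprocess.Popen] = None

    def play(self, argv: List[str]) -> subprocess.Popen:
        self.stop()  # terminate any in-flight clip first, so clips never overlap
        self._proc = subprocess.Popen(argv, stdout=subprocess.DEVNULL,
                                      stderr=subprocess.DEVNULL)
        return self._proc

    def stop(self) -> None:
        proc, self._proc = self._proc, None
        if proc is not None and proc.poll() is None:
            proc.terminate()
            try:
                proc.wait(timeout=2)
            except subprocess.TimeoutExpired:
                proc.kill()  # player ignored SIGTERM; force it
                proc.wait()
```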

Configuration

Three fields in ~/.fulcrum/config.json control the voice surface. Defaults are sensible — most users never touch them.

Field                 Default             Effect
transcription_model   Whisper-Large-v3    Speech-to-text model used by /voice.
tts_model             tts-1               Text-to-speech model used by /speak.
tts_voice             serene-assistant    Voice ID from the scx.ai catalogue (see below).

Persist a setting by editing ~/.fulcrum/config.json directly, or by exporting the corresponding FULCRUM_* environment variable (e.g. FULCRUM_TTS_VOICE=alice-bennett) at shell startup. /voices writes the config file for you, atomically, with mode 0600.
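A sketch of how the three fields might be resolved and saved follows. It is illustrative only. Two points are assumptions: that environment variables take precedence over the file, and that every field maps to `FULCRUM_` plus its upper-cased name (only FULCRUM_TTS_VOICE is confirmed on this page). The atomic write uses a temp file in the same directory plus `os.replace`, which is the standard way to get an all-or-nothing update.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path
from typing import Dict, Mapping, Optional

CONFIG_PATH = Path.home() / ".fulcrum" / "config.json"
DEFAULTS = {
    "transcription_model": "Whisper-Large-v3",
    "tts_model": "tts-1",
    "tts_voice": "serene-assistant",
}


def load_voice_config(path: Path = CONFIG_PATH,
                      env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Defaults, then config.json, then FULCRUM_* env vars (assumed precedence)."""
    env = os.environ if env is None else env
    config = dict(DEFAULTS)
    if path.exists():
        stored = json.loads(path.read_text())
        config.update({k: stored[k] for k in DEFAULTS if k in stored})
    for key in DEFAULTS:
        override = env.get("FULCRUM_" + key.upper())  # e.g. FULCRUM_TTS_VOICE
        if override:
            config[key] = override
    return config


def save_voice(voice: str, path: Path = CONFIG_PATH) -> None:
    """Set tts_voice, keep all other fields, and replace the file atomically at 0600."""
    data = json.loads(path.read_text()) if path.exists() else {}
    data["tts_voice"] = voice
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=".config-", suffix=".json")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(data, fh, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        os.chmod(tmp, 0o600)  # mkstemp already uses 0600 on POSIX; make it explicit
        os.replace(tmp, path)  # atomic rename within the same filesystem
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise
```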

Voice catalogue

Six voices, hard-coded in the picker because scx.ai exposes no /voices endpoint as of this release. Set tts_voice to an ID in config, or pick one visually with /voices.

ID                 Character
serene-assistant   neutral, calm; good default
alice-bennett      warm female, conversational
ito                soft, narrator-like
australian-sam     Australian male, casual
friendly-kiwi      New Zealand male, upbeat
likeable-aussie    Australian female, friendly
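Because the catalogue is hard-coded, a client can validate tts_voice locally before calling the API. The sketch below mirrors the six IDs in the catalogue table. Falling back to serene-assistant with a warning, rather than failing, is a hypothetical policy, not documented Fulcrum behaviour.

```python
from typing import Optional, Tuple

# Mirrors the catalogue table; scx.ai offers no /voices endpoint to query instead.
VOICES = {
    "serene-assistant": "neutral, calm",
    "alice-bennett": "warm female, conversational",
    "ito": "soft, narrator-like",
    "australian-sam": "Australian male, casual",
    "friendly-kiwi": "New Zealand male, upbeat",
    "likeable-aussie": "Australian female, friendly",
}
DEFAULT_VOICE = "serene-assistant"


def resolve_voice(configured: Optional[str]) -> Tuple[str, Optional[str]]:
    """Return (voice_id, warning). Unknown IDs fall back to the default voice."""
    if configured is None:
        return DEFAULT_VOICE, None
    if configured in VOICES:
        return configured, None
    return DEFAULT_VOICE, f"unknown tts_voice {configured!r}; using {DEFAULT_VOICE}"
```

This also catches a common mistake: setting an OpenAI voice name such as alloy, which is not in the scx.ai catalogue.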

Use cases

  • Hands-free pair programming. /speak-on with an ambient mic is well-suited to conference calls where you want to watch the agent work and narrate decisions without typing every reaction.
  • Documentation by dictation. /voice drafts long prompts faster than typing — useful for design rationales, bug repros, and post-mortems where you want to think aloud first and edit the transcript second.
  • Accessibility. /voice plus /speak-on makes Fulcrum operable without keyboard or screen for long stretches — recordings start with a single command and replies come back spoken.

Why scx.ai TTS instead of OpenAI

The voice catalogue is curated for Australian English (australian-sam, likeable-aussie, friendly-kiwi): the same structural sovereign-AI choice Fulcrum makes for text, carried into voice. The Australian Privacy Act jurisdiction that applies to text inference also applies to speech, so the audio you send and receive stays in scx.ai's Australian region rather than passing through a US-routed third party.

Privacy note

Your voice data goes to scx.ai for transcription and TTS. Locally, it exists only in transit: an in-memory WAV chunk on the way out, and an MP3 written to $TMPDIR for the OS player on the way back. If you're discussing sensitive material (credentials, third-party customer data, anything covered by an NDA), use the keyboard. The voice path is best-effort transcription, not a confidential channel.
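The return-path lifecycle can be sketched as a context manager. It writes the clip with `tempfile.mkstemp`, which honours $TMPDIR on POSIX, and removes the file when the block exits. Deleting the file right after playback is an assumption about Fulcrum's behaviour, and `temp_clip` is a hypothetical name. The caller must wait for the player to exit before leaving the `with` block.

```python
import contextlib
import os
import tempfile
from typing import Iterator


@contextlib.contextmanager
def temp_clip(mp3_bytes: bytes) -> Iterator[str]:
    """Write a TTS clip where an OS player can read it; delete it afterwards."""
    fd, path = tempfile.mkstemp(suffix=".mp3")  # created under $TMPDIR on POSIX
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(mp3_bytes)
        yield path
    finally:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(path)
```

Typical use on macOS would be `with temp_clip(mp3) as path: subprocess.run(["afplay", path])`, so the file lives only as long as playback does.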

Next steps

  • Models → The scx.ai catalogue Fulcrum talks to — Whisper-Large-v3, tts-1, and the text-completion models the same gateway routes.
  • Best practices → When to dictate, when to type, and how to keep auto-speak from becoming noise.