
SwiftWhisper – The easiest way to transcribe audio in Swift

Easily add transcription to your app or package. Powered by whisper.cpp.


Swift Package Manager

Add SwiftWhisper as a dependency in your Package.swift file:

let package = Package(
  dependencies: [
    // Add the package to your dependencies
    .package(url: "", branch: "master"),
  ],
  targets: [
    // Add SwiftWhisper as a dependency on any target you want to use it in
    .target(name: "MyTarget",
            dependencies: [.byName(name: "SwiftWhisper")]),
  ]
)


Xcode: add the package in the “Swift Package Manager” tab.


API Documentation

import SwiftWhisper

let whisper = Whisper(fromFileURL: /* Model file URL */)
let segments = try await whisper.transcribe(audioFrames: /* 16kHz PCM audio frames */)

print("Transcribed audio:", segments.map(\.text).joined())

Delegate methods

You can subscribe to segments, transcription progress, and errors by implementing WhisperDelegate and setting whisper.delegate = ...

protocol WhisperDelegate {
  // Progress updates as a percentage from 0-1
  func whisper(_ aWhisper: Whisper, didUpdateProgress progress: Double)

  // Any time new segments of text have been transcribed
  func whisper(_ aWhisper: Whisper, didProcessNewSegments segments: [Segment], atIndex index: Int)

  // Finished transcribing, includes all transcribed segments of text
  func whisper(_ aWhisper: Whisper, didCompleteWithSegments segments: [Segment])

  // Error with transcription
  func whisper(_ aWhisper: Whisper, didErrorWith error: Error)
}
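As a minimal sketch, a conforming type might look like the following (the Transcriber class name and the print statements are illustrative, not part of the library):

import SwiftWhisper

class Transcriber: WhisperDelegate {
    func whisper(_ aWhisper: Whisper, didUpdateProgress progress: Double) {
        print("Progress: \(Int(progress * 100))%")
    }

    func whisper(_ aWhisper: Whisper, didProcessNewSegments segments: [Segment], atIndex index: Int) {
        // Stream partial results as they arrive
        segments.forEach { print($0.text) }
    }

    func whisper(_ aWhisper: Whisper, didCompleteWithSegments segments: [Segment]) {
        print("Done:", segments.map(\.text).joined())
    }

    func whisper(_ aWhisper: Whisper, didErrorWith error: Error) {
        print("Transcription failed:", error)
    }
}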


Downloading Models 📥

You can find the pre-trained models here for download.
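Once downloaded, point Whisper at the model file on disk. A hedged sketch, assuming you bundle a downloaded model with your app (the ggml-tiny resource name is illustrative):

import SwiftWhisper

// Load a bundled model file; the resource name is an assumption.
guard let modelURL = Bundle.main.url(forResource: "ggml-tiny", withExtension: "bin") else {
    fatalError("Model file missing from bundle")
}
let whisper = Whisper(fromFileURL: modelURL)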

Converting audio to 16kHz PCM 🔧

The easiest way to get audio frames into SwiftWhisper is to use AudioKit. The following example takes an input audio file, converts and resamples it, and returns an array of 16kHz PCM floats.

import AudioKit

func convertAudioFileToPCMArray(fileURL: URL, completionHandler: @escaping (Result<[Float], Error>) -> Void) {
    var options = FormatConverter.Options()
    options.format = .wav
    options.sampleRate = 16000
    options.bitDepth = 16
    options.channels = 1
    options.isInterleaved = false

    let tempURL = URL(fileURLWithPath: NSTemporaryDirectory()).appendingPathComponent(UUID().uuidString)
    let converter = FormatConverter(inputURL: fileURL, outputURL: tempURL, options: options)
    converter.start { error in
        if let error {
            completionHandler(.failure(error))
            return
        }

        let data = try! Data(contentsOf: tempURL) // Handle error here

        // Skip the 44-byte WAV header, then read 16-bit little-endian samples
        // and normalize them to the range -1.0...1.0
        let floats = stride(from: 44, to: data.count, by: 2).map {
            return data[$0..<$0 + 2].withUnsafeBytes {
                let short = Int16(littleEndian: $0.load(as: Int16.self))
                return max(-1.0, min(Float(short) / 32767.0, 1.0))
            }
        }

        try? FileManager.default.removeItem(at: tempURL)

        completionHandler(.success(floats))
    }
}
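Putting the pieces together, a hedged sketch of feeding the converted frames into SwiftWhisper (audioFileURL and modelFileURL are placeholders you would supply):

import SwiftWhisper

convertAudioFileToPCMArray(fileURL: audioFileURL) { result in
    switch result {
    case .success(let frames):
        Task {
            // Transcribe the 16kHz PCM frames produced by the converter
            let whisper = Whisper(fromFileURL: modelFileURL)
            let segments = try await whisper.transcribe(audioFrames: frames)
            print("Transcribed audio:", segments.map(\.text).joined())
        }
    case .failure(let error):
        print("Conversion failed:", error)
    }
}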

Speed boost 🚀

You may find the performance of the transcription slow when compiling your app for the Debug build configuration. This is because the compiler doesn’t fully optimize SwiftWhisper unless the build configuration is set to Release.

You can get around this by installing a version of SwiftWhisper that uses .unsafeFlags(["-O3"]) to force maximum optimization. The easiest way to do this is to use the latest commit on the fast branch. Alternatively, you can configure your scheme to build in the Release configuration.

  dependencies: [
    // Using latest commit hash for `fast` branch:
    .package(url: "", revision: "6ed3484c5cf449041b5c9bcb3ac82455d6a586d7"),
  ],

- SwiftWhisper on GitHub
- Whisper.cpp on GitHub
- OpenAI Whisper on GitHub