
Audio Session Basics

Read this chapter to learn which iOS development problems audio sessions solve and to get a first look at audio session features.

Why Does iOS Need to Manage Audio?

On your morning commute you unlock your iPhone and start listening to a new episode of a podcast, which plays back through the built-in speaker. As your seatmate frowns, you quickly plug in your headset and the podcast continues, its output now rerouted and at the volume you last used with the headset. Maybe you start a sudoku game, whose sound effects mix with the podcast output. A few seconds later the podcast fades to silence, an alarm sounds, and an alert appears, reminding you of a birthday. You dismiss the alert. The podcast fades back in and resumes where it left off, and the sounds in your sudoku game resume working.

You might do all this in the space of a minute—and without touching any audio settings. The remarkable simplicity of the audio user experience on iPhone belies an underlying complexity greater than that on a Mac Pro. The infrastructure that makes the simplicity possible is exposed to your application through an audio session object.

An audio session lets you provide a seamless audio experience in your application. In fact, any iOS application that uses the AV Foundation framework, Audio Queue Services, OpenAL, or the I/O audio unit must use the audio session programming interface to meet Apple’s recommendations as laid out in iPhone Human Interface Guidelines.

What Is an Audio Session?

An audio session is the intermediary between your application and iOS for configuring audio behavior. Upon launch, your application automatically gets a singleton audio session. You configure it to express your application’s audio intentions. For example:

Should your audio mix with audio from other applications, or silence it?
How should your application respond to an audio interruption, such as a Clock alarm?
How should your application respond when the user plugs in or unplugs a headset?

Audio session configuration influences all audio activity while your application is running, except for user-interface sound effects that you play. You can query the audio session to discover hardware characteristics of the device your application is on—characteristics such as channel count, sample rate, and availability of audio input. These can vary by device and can change due to user actions while your application runs.
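For example, the following minimal sketch (in Swift, using the AVAudioSession class described in “The Two Audio Session APIs”; the property names shown are those of later SDKs and are assumptions relative to this document) queries several hardware characteristics:

    import AVFoundation

    // Minimal sketch: query the shared audio session for the hardware
    // characteristics of the current device. Property names are from
    // later SDKs and are illustrative.
    let session = AVAudioSession.sharedInstance()

    print("Hardware sample rate: \(session.sampleRate) Hz")
    print("Output channels: \(session.outputNumberOfChannels)")
    print("Audio input available: \(session.isInputAvailable)")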

You can explicitly activate and deactivate your audio session. For application sound to play, or for recording to work, your audio session must be active. The system can also deactivate your audio session—which it does, for example, when a phone call arrives or an alarm sounds. Such a deactivation is called an interruption. The audio session APIs provide ways to respond to and recover from interruptions.
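For instance, a playback application might activate its session just before playing and deactivate it when playback finishes. Here is a minimal Swift sketch, assuming the AVAudioSession class; the error handling shown is illustrative:

    import AVFoundation

    let session = AVAudioSession.sharedInstance()
    do {
        // Activate before playing or recording. The system can refuse the
        // request (for example, during a phone call) by throwing an error.
        try session.setActive(true)
        // ... play or record ...
        // Deactivate when finished so other applications' audio can resume.
        try session.setActive(false)
    } catch {
        print("Audio session activation error: \(error)")
    }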

What Is an Audio Session Category?

An audio session category is a key that identifies a set of audio behaviors for your application. By setting a category, you indicate your audio intentions to the system—such as whether your audio should continue when the screen locks. The six audio session categories in iOS, along with a set of override and modifier switches, let you customize your app’s audio behavior.

Each audio session category specifies a particular pattern of “yes” and “no” for each of the following behaviors, as detailed in “Audio Session Categories”:

Whether your audio interrupts nonmixable audio from other applications or mixes with it
Whether your audio is silenced by the Ring/Silent switch and by screen locking
Whether your application supports audio input (recording)
Whether your application supports audio output (playback)

Most applications need only set the category once, at launch. That said, you can change the category as often as you need to, and can do so whether your session is active or inactive. If your session is inactive, your category request is sent when you activate your session. If your session is already active, your category request is sent immediately.
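In code, setting the category is a single call. A minimal Swift sketch, assuming the AVAudioSession class; the .playback category shown here is an illustrative choice whose audio continues when the screen locks:

    import AVFoundation

    let session = AVAudioSession.sharedInstance()
    do {
        // Declare the application's audio intent. You can make this call
        // whether the session is active or inactive.
        try session.setCategory(.playback, mode: .default, options: [])
    } catch {
        print("Could not set audio session category: \(error)")
    }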

Audio Session Default Behavior—What You Get for Free

An audio session comes with some default behavior. Specifically:

Playback is enabled; recording is disabled.
When the user moves the Ring/Silent switch to the “silent” position, your audio is silenced.
When the screen locks, your audio is silenced.
When your audio starts, other audio on the device, such as iPod audio, is silenced.

This collection of behavior is encapsulated in, and named by, the AVAudioSessionCategorySoloAmbient audio session category—the default category.

Your audio session is automatically activated on application launch. This allows you to play (or record, if you specify one of the categories that support audio input). However, relying on this default activation is risky. For example, if an iPhone rings and the user ignores the call—leaving your application running—your audio may no longer play, depending on which playback technology you’re using. The next section describes some strategies for dealing with such issues, and “Handling Audio Interruptions” goes into depth on the topic.

You can take advantage of default behavior as you’re bringing your application to life during development. However, the only times you can safely ignore the audio session for a shipping application are these:

Your application plays no audio and does not record.
Your application produces only user-interface sound effects, such as those played with System Sound Services, which the audio session does not influence.

In all other cases, do not ship your application with the default audio session. You may elect to keep the default category, but explicitly configuring and managing an audio session that uses the “solo ambient” category behaves differently from passively accepting the default, as described next.

Why a Default Audio Session Usually Isn’t What You Want

If you don’t initialize, configure, and explicitly use your audio session, your application cannot respond to interruptions or audio hardware route changes. Moreover, your application has no voice in OS decisions about audio mixing between applications.
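When you do configure and use your session explicitly, the system can tell you about interruptions and route changes through notifications. A sketch in Swift; the notification names and keys shown are those of later SDKs and are assumptions relative to this document:

    import AVFoundation

    let session = AVAudioSession.sharedInstance()
    let center = NotificationCenter.default

    // Interruptions: pause on .began; reactivate and resume on .ended.
    // Keep the returned observer tokens so you can unregister later.
    let interruptionObserver = center.addObserver(
        forName: AVAudioSession.interruptionNotification,
        object: session, queue: .main) { note in
        guard let raw = note.userInfo?[AVAudioSessionInterruptionTypeKey] as? UInt,
              let type = AVAudioSession.InterruptionType(rawValue: raw) else { return }
        switch type {
        case .began:
            break   // pause playback and update the user interface
        case .ended:
            break   // reactivate the session and resume playback
        @unknown default:
            break
        }
    }

    // Route changes: for example, the user plugging in or unplugging a headset.
    let routeChangeObserver = center.addObserver(
        forName: AVAudioSession.routeChangeNotification,
        object: session, queue: .main) { _ in
        // Inspect session.currentRoute and adjust playback accordingly.
    }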

Note: Default audio session behavior changed between iOS 2.1 and 2.2. Starting in iOS 2.2, audio, by default, is silenced when the user moves the Ring/Silent switch to the “silent” position or when the screen locks. To ensure that your audio keeps playing or recording in these circumstances, assign an appropriate category to your audio session. For details, see “Audio Session Categories” and “Setting the Category.”

Here is a scenario that clarifies audio session default behavior and how you can change it: suppose your application should keep playing when the screen locks. With the default category, audio stops at screen lock; assigning the AVAudioSessionCategoryPlayback category to your session instead allows playback to continue.

How the System Resolves Competing Audio Demands

As your iOS application launches, built-in applications (Messages, iPod, Safari, the phone) may be running in the background. Each of these may want to produce audio: a text message arrives, a podcast you started 10 minutes ago continues playing, and so on.

If you think of an iPhone as an airport, with applications represented as taxiing planes, the system serves as a sort of control tower. Your application can make audio requests and state its desired priority, but final authority over what happens “on the tarmac” comes from the system. You communicate with the “control tower” using the audio session. Figure 1-1 illustrates a typical scenario—your application wanting to use audio while the iPod is already playing. In this scenario, your application, in effect, interrupts the iPod.

Figure 1-1  Core Audio manages competing audio demands

[Figure: a comic-book representation of the sequence of events surrounding the activation of an audio session.]

In step 1 in the figure, your application requests activation of its audio session. You’d make such a request, for example, on application launch, or perhaps in response to a user tapping the Play button in an audio recording and playback application. In step 2, Core Audio considers the activation request. Specifically, it considers the category you’ve assigned to your audio session. In Figure 1-1, the SpeakHere application uses a category that requires other audio to be silenced.

In steps 3 and 4 in the figure, Core Audio deactivates the iPod application’s audio session, stopping its audio playback. Finally, in step 5, Core Audio activates the SpeakHere application’s audio session and playback begins.
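In code, the request in step 1 reduces to a category assignment followed by an activation call. A minimal Swift sketch; the .playAndRecord category is an assumption here, chosen to match a record-and-playback application like SpeakHere:

    import AVFoundation

    let session = AVAudioSession.sharedInstance()
    do {
        // A non-mixing category such as .playAndRecord asks the system to
        // silence other audio (steps 2 through 4 in the figure) when the
        // session is activated (step 5).
        try session.setCategory(.playAndRecord, mode: .default, options: [])
        try session.setActive(true)
    } catch {
        // Core Audio has final authority; activation can fail.
        print("Audio session activation refused: \(error)")
    }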

Core Audio manages competing audio demands by having final authority to activate or deactivate any of the audio sessions present on a device. In deciding, it follows the inviolable rule that “the phone always wins.” No application, no matter how vehemently it demands priority, can trump the phone. When a call arrives, the user gets notified and your application is interrupted—no matter what audio operation you have in progress and no matter what category you have set.

To ensure that your audio is not disrupted by a phone call, the user must turn on Airplane Mode. This highlights another inviolable rule: the user, not your application, is in control of the device. For example, there is no programmatic way to silence a Clock alarm; to prevent an alarm from ruining a recording, the user must turn off scheduled alarms. Similarly, there is no way to programmatically set hardware playback volume. The user is always in control of hardware volume, using the volume buttons on the side of the device.

The Two Audio Session APIs

iOS offers two APIs for working with the audio session object, each with its own advantages:

The AVAudioSession class, an Objective-C interface in the AV Foundation framework, which provides a streamlined way to handle basic audio session needs
Audio Session Services, a C-based interface in the Audio Toolbox framework, which provides access to the full range of audio session features

You can mix and match calls to the two audio session APIs—they are completely compatible with each other. The rest of this document goes into depth on using the various features of these APIs.

Developing with the Audio Session APIs

When you add audio session support to your application, you can run your app in the Simulator or on a device. However, the Simulator does not simulate audio session behavior and does not have access to the hardware features of a device. When running in the Simulator, you cannot:

Invoke an interruption, such as the arrival of a phone call or a Clock alarm
Test audio hardware route changes, such as the user plugging in or unplugging a headset
Query the audio hardware characteristics of a device

To test the behavior of your audio session code, you need to run on a device. For some development tips when working with the Simulator, see “Running Your App in the Simulator.”




Last updated: 2010-07-09
