  		compress_offload.txt
  		=====================
  	Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
  		Vinod Koul <vinod.koul@linux.intel.com>
  
  Overview
  
  Since its early days, the ALSA API has been defined with PCM support or
  constant-bitrate payloads such as IEC61937 in mind. Arguments and
  returned values in frames are the norm, making it a challenge to
  extend the existing API to compressed data streams.
  
  In recent years, audio digital signal processors (DSPs) have been
  integrated in system-on-chip designs as well as in audio
  codecs. Processing compressed data on such DSPs results in a dramatic
  reduction of power consumption compared to host-based
  processing. Support for such hardware has not been very good in Linux,
  mostly for lack of a generic API in the mainline kernel.
  
  Rather than requiring a compatibility break with an API change of the
  ALSA PCM interface, a new 'Compressed Data' API is introduced to
  provide a control and data-streaming interface for audio DSPs.
  
  The design of this API was inspired by two years of experience with the
  Intel Moorestown SoC, with many corrections required to upstream the
  API in the mainline kernel instead of the staging tree and make it
  usable by others.
  
  Requirements
  
  The main requirements are:
  
  - Separation between byte counts and time. Compressed formats may have
    a header per file, per frame, or no header at all. The payload size
    may vary from frame to frame. As a result, it is not possible to
    reliably estimate the duration of audio buffers when handling
    compressed data. Dedicated mechanisms are required to allow for
    reliable audio-video synchronization, which requires precise
    reporting of the number of samples rendered at any given time.
  
  - Handling of multiple formats. PCM data only requires a specification
    of the sampling rate, number of channels and bits per sample. In
    contrast, compressed data comes in a variety of formats. Audio DSPs
    may also provide support for a limited number of audio encoders and
    decoders embedded in firmware, or may support more choices through
    dynamic download of libraries.
  
  - Focus on main formats. This API provides support for the most
    popular formats used for audio and video capture and playback. It is
    likely that as audio compression technology advances, new formats
    will be added.
  
  - Handling of multiple configurations. Even for a given format like
    AAC, some implementations may support multichannel AAC but only
    stereo HE-AAC. Likewise WMA10 level M3 may require too much memory
    and too many CPU cycles. The new API needs to provide a generic way
    of listing these formats.
  
  - Rendering/Grabbing only. This API does not provide any means of
    hardware acceleration, where PCM samples are provided back to
    user-space for additional processing. This API focuses instead on
    streaming compressed data to a DSP, with the assumption that the
    decoded samples are routed to a physical output or logical back-end.
  
  - Complexity hiding. Existing user-space multimedia frameworks all
    have existing enums/structures for each compressed format. This new
    API assumes the existence of a platform-specific compatibility layer
    to expose, translate and make use of the capabilities of the audio
    DSP, e.g. Android HAL or PulseAudio sinks. By construction, regular
    applications are not supposed to make use of this API.
  
  
  Design
  
  The new API shares a number of concepts with the PCM API for flow
  control. Start, pause, resume, drain and stop commands have the same
  semantics no matter what the content is.
  
  The concept of a memory ring buffer divided into a set of fragments is
  borrowed from the ALSA PCM API. However, only sizes in bytes can be
  specified.
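
  The byte-only ring buffer can be pictured with the toy model below. It is
  a sketch in Python, not kernel code: the class ByteRing and its methods
  are hypothetical names chosen for illustration, and the "DSP" is modelled
  as consuming at most one fragment at a time.

```python
class ByteRing:
    """Toy model of a ring buffer whose sizes are expressed only in bytes."""

    def __init__(self, fragment_size, fragments):
        self.fragment_size = fragment_size      # bytes per fragment
        self.size = fragment_size * fragments   # total ring size in bytes
        self.buf = bytearray(self.size)
        self.written = 0    # cumulative bytes written by the application
        self.consumed = 0   # cumulative bytes consumed by the modelled DSP

    def avail(self):
        """Bytes that can be written without overwriting unconsumed data."""
        return self.size - (self.written - self.consumed)

    def write(self, data):
        """Copy as much of data as fits; return the number of bytes accepted."""
        count = min(len(data), self.avail())
        for i in range(count):
            self.buf[(self.written + i) % self.size] = data[i]
        self.written += count
        return count

    def consume_fragment(self):
        """Model the DSP taking up to one fragment; return bytes consumed."""
        count = min(self.fragment_size, self.written - self.consumed)
        self.consumed += count
        return count


ring = ByteRing(fragment_size=4096, fragments=4)
accepted = ring.write(bytes(20000))   # more data than the ring can hold
freed = ring.consume_fragment()       # the modelled DSP drains one fragment
```

  Nothing in the model refers to frames or durations; the application only
  learns how many bytes were accepted and how many can still be written.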
  
  Seeks/trick modes are assumed to be handled by the host.
  
  The notion of rewinds/forwards is not supported. Data committed to the
  ring buffer cannot be invalidated, except when dropping all buffers.
  
  The Compressed Data API does not make any assumptions about how the data
  is transmitted to the audio DSP. DMA transfers from main memory to an
  embedded audio cluster or to an SPI interface for external DSPs are
  possible. As in the ALSA PCM case, a core set of routines is exposed;
  each driver implementer will have to write support for a set of
  mandatory routines and possibly make use of optional ones.
  
  The main additions are:

  - get_caps
  This routine returns the list of audio formats supported. Querying the
  codecs on a capture stream will return encoders; decoders will be
  listed for playback streams.
  
  - get_codec_caps
  For each codec, this routine returns a list of capabilities. The
  intent is to make sure all the capabilities correspond to valid
  settings, and to minimize the risks of configuration failures. For
  example, for a complex codec such as AAC, the number of channels
  supported may depend on a specific profile. If the capabilities were
  exposed with a single descriptor, it may happen that a specific
  combination of profiles/channels/formats is not supported. Likewise,
  embedded DSPs have limited memory and CPU cycles, so it is likely that
  some implementations make the list of capabilities dynamic and
  dependent on existing workloads. In addition to codec settings, this
  routine returns the minimum buffer size handled by the
  implementation. This information can be a function of the DMA buffer
  sizes, the number of bytes required to synchronize, etc., and can be
  used by userspace to define how much needs to be written in the ring
  buffer before playback can start.
  
  - set_params
  This routine sets the configuration chosen for a specific codec. The
  most important field in the parameters is the codec type; in most
  cases decoders will ignore other fields, while encoders will strictly
  comply with the settings.

  - get_params
  This routine returns the actual settings used by the DSP. Changes to
  the settings should remain the exception.
  
  - get_timestamp
  The timestamp becomes a multi-field structure. It lists the number
  of bytes transferred, the number of samples processed and the number
  of samples rendered/grabbed. These values can be used to determine
  the average bitrate, whether the ring buffer needs to be refilled,
  and the delay due to decoding/encoding/I/O on the DSP.
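
  As an illustration of how the timestamp fields combine, the Python sketch
  below derives an average bitrate and the DSP delay. The structure and
  field names are hypothetical, not the kernel's, and the sampling rate is
  assumed to be known from the stream parameters. The bitrate is only an
  approximation, since bytes transferred may not all be decoded yet.

```python
from dataclasses import dataclass


@dataclass
class Timestamp:
    """Hypothetical stand-in for the multi-field timestamp."""
    bytes_transferred: int   # bytes handed over to the DSP so far
    samples_processed: int   # samples decoded by the DSP
    samples_rendered: int    # samples actually played out
    sampling_rate: int       # Hz; assumed known from the stream parameters


def average_bitrate(ts):
    """Approximate average bitrate in bits per second."""
    seconds = ts.samples_processed / ts.sampling_rate
    if seconds == 0:
        return 0.0
    return ts.bytes_transferred * 8 / seconds


def dsp_delay_seconds(ts):
    """Time between decoding and rendering, from the two sample counters."""
    return (ts.samples_processed - ts.samples_rendered) / ts.sampling_rate


ts = Timestamp(bytes_transferred=160_000,
               samples_processed=480_000,
               samples_rendered=475_200,
               sampling_rate=48_000)
bitrate = average_bitrate(ts)     # 160000 bytes over 10 s of audio
delay = dsp_delay_seconds(ts)     # 4800 samples still inside the DSP
```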
  
  Note that the list of codecs/profiles/modes was derived from the
  OpenMAX AL specification instead of reinventing the wheel.
  Modifications include:
  - Addition of FLAC and IEC formats
  - Merge of encoder/decoder capabilities
  - Profiles/modes listed as bitmasks to make descriptors more compact
  - Addition of set_params for decoders (missing in OpenMAX AL)
  - Addition of AMR/AMR-WB encoding modes (missing in OpenMAX AL)
  - Addition of format information for WMA
  - Addition of encoding options when required (derived from OpenMAX IL)
  - Addition of rateControlSupported (missing in OpenMAX AL)
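
  To show why bitmasks keep descriptors compact, here is a small Python
  sketch. The Mode names and values are hypothetical and purely
  illustrative; they are not taken from the kernel headers or from OpenMAX.

```python
from enum import IntFlag


class Mode(IntFlag):
    """Hypothetical codec modes; values are illustrative only."""
    MAIN = 0x1
    LC = 0x2
    HE = 0x4


# One integer field in a descriptor can advertise several modes at once.
supported = Mode.MAIN | Mode.LC


def is_supported(mask, mode):
    """Return True if every bit of mode is set in mask."""
    return (mask & mode) == mode
```

  A single integer then replaces a variable-length list of supported modes.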
  
  Gapless Playback
  ================
  When playing through an album, the decoders have the ability to skip the
  encoder delay and padding and move directly from one track's content to the
  next. The end user perceives this as gapless playback, as there is no silence
  while switching from one track to another.

  Also, there might be low-intensity noises due to encoding. Perfect gapless is
  difficult to reach with all types of compressed data, but works fine with most
  music content. The decoder needs to know the encoder delay and encoder padding,
  so this information has to be passed to the DSP. This metadata is extracted from
  ID3/MP4 headers and is not present by default in the bitstream, hence the need
  for a new interface to pass this information to the DSP. The DSP and userspace
  also need to switch from one track to another and start using data for the
  second track.
  
  The main additions are:
  
  - set_metadata
  This routine sets the encoder delay and encoder padding. The decoder can use
  these values to strip the silence. They need to be set before the data in the
  track is written.

  - set_next_track
  This routine tells the DSP that metadata and write operations sent after this
  call correspond to the subsequent track.

  - partial_drain
  This is called when the end of file is reached. Userspace informs the DSP that
  EOF has been reached, so the DSP can start skipping the padding delay. The next
  data written belongs to the next track.
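
  The effect of the encoder delay and padding values can be sketched in
  Python as follows. This is a simplified model working on already-decoded
  samples, with a hypothetical helper strip_gapless; a real decoder on the
  DSP would apply the same trimming internally.

```python
def strip_gapless(decoded, encoder_delay, encoder_padding):
    """Drop the encoder delay at the start and the padding at the end."""
    end = len(decoded) - encoder_padding
    return decoded[encoder_delay:end]


# Two decoded tracks; zeros stand for silence introduced by the encoder.
track1 = [0, 0, 0, 11, 12, 13, 14, 0, 0]   # delay = 3, padding = 2
track2 = [0, 0, 21, 22, 23, 0]             # delay = 2, padding = 1

# Concatenating the trimmed tracks leaves no silence at the boundary.
album = strip_gapless(track1, 3, 2) + strip_gapless(track2, 2, 1)
```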
  
  Sequence flow for gapless would be:
  - Open
  - Get caps / codec caps
  - Set params
  - Set metadata of the first track
  - Fill data of the first track
  - Trigger start
  - User space finishes sending all data of the first track
  - Indicate next track data by sending set_next_track
  - Set metadata of the next track
  - Call partial_drain to flush most of the buffer in the DSP
  - Fill data of the next track
  - DSP switches to second track
  (note: the order of partial_drain and the write for the next track can be
  reversed as well)
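
  The sequence above can be sketched in Python against a stand-in device
  that only records the order of calls. RecordingDevice and play_gapless
  are hypothetical; the method names mirror the routines named in this
  document rather than any real library API.

```python
class RecordingDevice:
    """Stand-in that records the name of each call; not a real device."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes that do not exist, i.e. the routines.
        if name.startswith("_"):
            raise AttributeError(name)

        def record(*args):
            self.calls.append(name)
        return record


def play_gapless(dev, tracks):
    """tracks: list of (metadata, data) tuples in playback order."""
    dev.open()
    dev.get_caps()
    dev.get_codec_caps()
    dev.set_params()
    first_meta, first_data = tracks[0]
    dev.set_metadata(first_meta)      # must precede the track's data
    dev.write(first_data)
    dev.start()
    for meta, data in tracks[1:]:
        dev.set_next_track()          # what follows belongs to the next track
        dev.set_metadata(meta)
        dev.partial_drain()           # may also be issued after the write
        dev.write(data)


dev = RecordingDevice()
play_gapless(dev, [({"delay": 576, "padding": 1024}, b"track-1"),
                   ({"delay": 576, "padding": 800}, b"track-2")])
```

  The delay and padding values in the usage lines are made-up numbers.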
  
  Not supported:
  
  - Support for VoIP/circuit-switched calls is not the target of this
    API. Support for dynamic bit-rate changes would require a tight
    coupling between the DSP and the host stack, limiting power savings.
  
  - Packet-loss concealment is not supported. This would require an
    additional interface to let the decoder synthesize data when frames
    are lost during transmission. This may be added in the future.
  
  - Volume control/routing is not handled by this API. Devices exposing a
    compressed data interface will be considered as regular ALSA devices;
    volume changes and routing information will be provided with regular
    ALSA kcontrols.
  
  - Embedded audio effects. Such effects should be enabled in the same
    manner, no matter if the input was PCM or compressed.
  
  - Multichannel IEC encoding. Unclear if this is required.
  
  - Encoding/decoding acceleration is not supported as mentioned
    above. It is possible to route the output of a decoder to a capture
    stream, or even implement transcoding capabilities. This routing
    would be enabled with ALSA kcontrols.
  
  - Audio policy/resource management. This API does not provide any
    hooks to query the utilization of the audio DSP, nor any preemption
    mechanisms.
  
  - No notion of underrun/overrun. Since the bytes written are compressed
    in nature and data written/read doesn't translate directly to
    rendered output in time, this API does not deal with underrun/overrun;
    these may be dealt with in a user-space library.
  
  Credits:
  - Mark Brown and Liam Girdwood for discussions on the need for this API
  - Harsha Priya for her work on intel_sst compressed API
  - Rakesh Ughreja for valuable feedback
  - Sing Nallasellan, Sikkandar Madar and Prasanna Samaga for
    demonstrating and quantifying the benefits of audio offload on a
    real platform.