Upsampling

Updated 6 August 2000

What is upsampling?

(the terms up-conversion and, in some contexts, oversampling are also used)

Upsampling means oversampling the original pulse-code-modulated (PCM) signal to a higher sampling rate and interpolating its word length to a larger number of bits.

Upsampling takes place, broadly, in four stages₍₁₎:

In the first stage, the sampling rate of the original 44.1/16 signal is increased with an oversampling filter.
In the next stage, the number of bits is changed in an interpolation filter or a noise-shaping modulator.
In the third stage, the digital signal is converted to analog by a DAC circuit.
The fourth stage is a lowpass filter.

The principle of upsampling

Original 44.1/16 signal
Original signal after interpolation
Original signal after oversampling
Original signal after upsampling (= interpolated and oversampled)

Figure: MSB Technology

The images of the original signal that upsampling creates around the oversampling frequencies can be removed with a gentle lowpass filter, which reduces the harmful effect of the filtering that reaches down into the audible range.
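As a rough illustration of why the analog filter can be gentler, the short Python sketch below works out where the first spectral image begins for a few oversampling factors. The 20 kHz audio band edge and the function names are assumptions made for this example; they are not figures taken from MSB Technology.

```python
import math

AUDIO_BAND_HZ = 20_000.0  # assumed upper edge of the audible band


def image_band_start(fs_hz, audio_band_hz=AUDIO_BAND_HZ):
    """Lowest frequency occupied by the first spectral image at rate fs_hz."""
    return fs_hz - audio_band_hz


def transition_octaves(fs_hz, audio_band_hz=AUDIO_BAND_HZ):
    """Octaves available to the analog filter between the top of the
    audio band and the bottom of the first image."""
    return math.log2(image_band_start(fs_hz, audio_band_hz) / audio_band_hz)


for factor in (1, 4, 8):
    fs = 44_100.0 * factor
    print(factor, image_band_start(fs), round(transition_octaves(fs), 2))
```

At the original 44.1 kHz rate the filter has to go from passband to stopband in roughly a quarter of an octave; at four times that rate it has close to three octaves.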

Upsampling Explained (pdf)

Effects of upsampling on sound quality

When the lowpass filtering of the analog signal takes place well above the audible range, a gentler lowpass filter can be used and its harmful effects within the audible range are reduced, which shows up as tonal richness and naturalness in the upper register. In addition, the accuracy of low-level signals improves, because upsampling reduces quantization error.

Upsampling does not, however, restore original information that has been irreversibly lost at the various stages of producing the recording.

Victor Khomenko, the famous designer at Balanced Audio Technology, compared upsampling to the restoration of frescoes: the missing pieces of a deteriorated painting are filled in on the basis of the information available in the part that has survived. Restoration does not bring back the original, missing information, but the painting acquires a credible appearance that is nobler than the deteriorated original.

This is exactly what upsampling is about: an incomplete signal is supplemented with plausible information inferred from the surrounding information, and the result is more convincing than the original but incomplete signal.

Max Hauser's account of upsampling

Bob Ohlsson has kindly posted the article below, which originally appeared on Usenet in 1991 in the rec.audio.high-end newsgroup:

Introduction

This is a broad and not-very-technical online summary of CD oversampling, antidotal to the lies and nonsense served up in consumer-audio retailing and an alternative to the well-intentioned but misleading or fractional explanations often seen in the popular audio press (including the so-called serious or high-end publications). This synopsis is less detailed but much broader than my earlier "technical summary," written for engineers and posted to rec.audio in 1987 and occasionally since.

If you want more depth of information on these topics, my colleague Prasanna Shah recently published a dense technical overview of audio oversampling in the popular magazine Audio in January. I published a long technical tutorial (not specific to CD or DAT products) in the Journal of the Audio Engineering Society, January/February 1991 (vol. 39 no. 1/2 pp. 3-26). A lighter and shorter overview of mine is also available as Preprint #2973 from the Audio Engineering Society, 60 East 42nd Street, New York, New York 10165 USA, Telephone 212-661-8528, FAX 212-682-0477. This preprint was recently recommended by the popular magazine Stereophile. AES charges $5 for such preprints ($4 for members), $3 for journal-article copies and $10 for back issues, and is pretty efficient about getting them into your hands if you FAX them a request with V. or MC. credit-card information (I did so recently and the copies arrived in the mail in about three days). These papers contain many further references. Some of these sources emphasize the A/D rather than D/A path but the core principles are identical (the circuit implementation of each section interchanges between analog and digital).

Do not be alarmed if the following summary takes an approach different from what you read elsewhere. There are details here not usually mentioned in popular summaries (or even in the research literature).

A. Thumbnail Sketch of Oversampling (Upsampling)

Signals such as audio, stored digitally, entail a finite *sampling rate* (44.1 kilosamples/sec for the 12-cm CD) whereas in their natural (analog) form they are continuous-time waveforms (you can think of this usefully as an "infinite" sampling rate). The circuitry that regenerates the continuous-time analog output in a CD player has two major tasks: translating a stream of digital numbers into analog values ("conversion") and also bridging between the finite sampling rate of the digital sequence and the "infinite" sampling rate of the outside world (that is, restoring a correct continuous waveform from discrete samples) -- "reconstruction."

Non-oversampling conversion-reconstruction (C-R) systems make the transition from finite to "infinite" sampling rate in one step, while oversampling systems do it through one or more intermediate sampling rates (higher than the original, but still finite). Although the details may not be obvious at this point, producing these intermediate signals with elevated sampling rates is a purely digital process and can thus be performed predictably and repeatably (although it does require that you have technologies where digital logic is very cheap, and therefore it was unattractive until recent years, although the basic techniques have been known since the 1950s and in embryonic forms since the time of the second world war).

Not only the reconstruction process but also separately the conversion process (bits into volts) benefits from the use of an intermediate sampling rate on the way to continuous time. Designers can orchestrate eloquent mathematical tricks to trade a higher deliberate sampling rate for lower required resolution in internal digital-to-analog converter (DAC) circuitry. This in turn tends to render the analog part of the C-R chain simpler and more tolerant of component fluctuations. But moreover, in practice oversampling C-R systems blend the two tasks of conversion and reconstruction so that they overlap in actual hardware, unlike a classical, non-oversampling system. The subjects of this paragraph are extremely complex and seductively counterintuitive even to well-trained engineers, and they habitually garner the most imaginative misinterpretations in popular-press writing.

An oversampling conversion-reconstruction (C-R) system in practice normally contains a series of four major blocks. The first is a sampling-rate-increasing digital filter, the second a digital quantization-management subsystem or "noise-shaping modulator," the third a DAC circuit _per se_ and the fourth an analog lowpass filter. A classical, or non-oversampling, system lacks the first two blocks, but is far more demanding of the last two blocks, which are analog circuits that largely determine the performance and subtler behaviors of the signal path. (That's the whole reason for oversampling, in a nutshell.)

By the way, these four blocks reflect a combination of traditionally separate specialties in electrical engineering, each with a different intuition and set of assumptions about what is technically difficult or important. This is why you will find many different explanations of oversampling (some of them seemingly in conflict) even from competent specialists. The first of the four blocks is generically a digital filter, the second a quantizer (or quantized feedback system), the third a precision analog circuit and the fourth an analog filter, and most or all are realized in integrated circuitry. Thus, for example, someone familiar with digital filtering will usually focus on the first of the four blocks, and when asked for more information will instinctively steer you to the general digital-filter literature (which unfortunately is extremely dilute on this subject). In reality an oversampling C-R chain entails intimate concert between all four blocks and between multiple technical specialties -- none alone is sufficient to explain what is going on.

B. Interpolator

The first block, the sampling-rate-increasing digital filter, in an oversampling C-R system is commonly nicknamed an "interpolator." This jargon is triply unfortunate. First, almost everybody unfamiliar with multirate digital filtering assumes from the name, incorrectly, that this block performs "interpolation" in the common mathematical sense (such as linear or polynomial interpolation between data points). Actually the name is a specialized digital-filtering coinage subtly but crucially different. Second and third, as if that weren't trouble enough, the term "interpolative" is sometimes applied in two further ways to oversampling C-R systems (one of these usages is a subset of the other). More details about this and other glorious terminological pitfalls are in my recent AES Journal paper.

Here is the briefest sketch of how the rate-increasing filter works. The objective is to convert a signal at a sampling rate like 44.1 ks/s to a signal at a higher sampling rate *without* changing the information content. Mathematically this is a well-defined and tractable problem. If you just take the original sequence of samples and insert after each of them, for example, three new samples (with value zero, or holding the last old-sample value, or almost anything else intelligent) then you will obtain a new sequence at four times the original rate. In frequency spectrum this new sequence will however include new high-frequency replicas (images) of the original signal's spectrum. A digital lowpass filter will remove these images and leave a signal spectrally identical to the original. In the time domain, you will now see a higher-rate sequence that will look like the original but with the "right" new samples smoothly inserted between old. (In actual practice the "insertion" of new samples is NOT a separate step as above, but is incorporated into the digital-filter arithmetic.)
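The two steps described above (insert zeros, then lowpass-filter away the images) can be sketched in plain Python. This is a minimal illustration, not the filter of any actual CD player: the Hann-windowed sinc design, the tap count and the function names are assumptions chosen for brevity, and a practical design would fold the zero insertion into the filter arithmetic, as the text notes.

```python
import math


def windowed_sinc_lowpass(factor, taps_per_side=16):
    """Hann-windowed sinc lowpass whose cutoff is the ORIGINAL Nyquist
    frequency. Passband gain is about `factor`, which makes up for the
    energy lost by inserting zeros."""
    n_taps = 2 * factor * taps_per_side + 1
    centre = (n_taps - 1) / 2
    h = []
    for n in range(n_taps):
        t = (n - centre) / factor  # time in original sample periods
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (n_taps - 1))
        h.append(sinc * window)
    return h


def upsample(x, factor=4, taps_per_side=16):
    """Raise the sampling rate of x by an integer factor."""
    # Step 1: insert factor-1 zeros after every original sample.
    stuffed = []
    for s in x:
        stuffed.append(s)
        stuffed.extend([0.0] * (factor - 1))

    # Step 2: digital lowpass filter removes the spectral images.
    h = windowed_sinc_lowpass(factor, taps_per_side)
    delay = (len(h) - 1) // 2  # compensate the filter's group delay
    y = []
    for n in range(len(stuffed)):
        acc = 0.0
        for k, hk in enumerate(h):
            i = n + delay - k
            if 0 <= i < len(stuffed):
                acc += hk * stuffed[i]
        y.append(acc)
    return y


fs = 44_100.0
x = [math.sin(2 * math.pi * 1_000.0 * n / fs) for n in range(200)]
y = upsample(x, factor=4)
print(len(x), len(y))
```

With this particular filter the original samples pass through essentially unchanged and the new samples in between land on the same sine wave, away from the ends of the block where the filter runs out of data.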

C. Multibit vs. single-bit vs. MASH etc.

All four of the major blocks of an oversampling C-R system, outlined in Section A, admit endless variations, opportunities for design cleverness, and high/low quality choices, and account for practical performance and manufacturing-cost differences among CD players. (All of which players, incidentally, appear more or less indistinguishable in the rudimentary and unrevealing standard specifications -- peak SNR, frequency response, step response -- commonly published.) A lot of fuss and advertising copy are however devoted to one particular design difference, the organization of the noise-shaping modulator (and consequently the format of the internal DAC circuit(s)). The common organizations are:

Multibit feedback noise shaping. The modulator properly predistorts (noise-shapes) the oversampled digital signal sent to a lower-resolution multibit DAC so that when properly analog-postfiltered its output will yield the full 16-bit resolution stored on the CD. This is the oldest scheme common in consumer products, widely popularized by the NV Philips SAA 7030 / TDA 1540 chip set (1983) with a 14-bit internal DAC and 4:1 oversampling yielding 16-bit final resolution. (For more details on how 4:1 relates to two additional bits, see either of my AES papers above.)
One-bit feedback noise shaping (called "Bitstream" by Philips and "delta-sigma" by the research community [Note 1]). Same as the previous but taken to an extreme: the internal DAC has only one bit of resolution and the 16-bit net D-to-A resolution is accomplished by the oversampling, noise-shaping and postfiltering process. This approach requires a higher oversampling factor, such as 128 or 256, other things being equal.
Feedforward or multistage noise shaping (abbreviated "MASH" [Note 2] by Nippon Telegraph and Telephone). A series of small one-bit feedback noise shapers, each of which operates on a quantization-error (residue) output from the previous stage, while simultaneously the quantized outputs are combined properly to form the main output. "MASH" data converters are definitely not "one-bit" data converters in a meaningful sense, as I've explained in more depth in print, although they are commonly made up of one-bit subsections and this sometimes causes confusion.
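To make the feedback idea behind the first two organizations concrete, here is a deliberately simple Python sketch of first-order noise shaping. It assumes a first-order loop and uses invented function names; commercial modulators are of higher order and differ in many details, and the multistage (MASH) arrangement is not shown.

```python
def noise_shape_requantize(x, step):
    """Multibit case: quantise each sample to a multiple of `step`,
    subtracting the previous sample's quantisation error first
    (first-order error feedback)."""
    err = 0.0
    out = []
    for s in x:
        target = s - err              # correct for the last error made
        q = step * round(target / step)
        err = q - target              # error made on this sample
        out.append(q)
    return out


def delta_sigma_1bit(x):
    """One-bit case: first-order modulator for inputs in [-1, 1].
    The output is a stream of +1.0 / -1.0 values."""
    state = 0.0                       # running sum of (input - output)
    out = []
    for s in x:
        y = 1.0 if state >= 0.0 else -1.0
        state += s - y
        out.append(y)
    return out


# Hypothetical test signals: constant (DC) inputs make the effect easy to see.
coarse = noise_shape_requantize([0.123] * 1000, step=0.25)
bits = delta_sigma_1bit([0.3] * 1000)
print(sum(coarse) / len(coarse), sum(bits) / len(bits))
```

In both functions the quantization error is fed back so that it cancels over time: the running average of the coarse output tracks the input, and the error is pushed toward high frequencies, where the analog postfilter removes it.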

Each of these competing modulator topologies has technical strengths and weaknesses that are very involved and do not lend themselves to summary. The signal fidelity in each of them can be excellent but depends on different sets of circuit elements. It is all a matter of "second-order" electrical effects; if the components are all perfect (as they invariably are assumed to be, in popular explanations of this subject matter), then all the techniques work equally well. Very broadly, however, I would say that the one-bit designs have the fewest subtle distortion vulnerabilities.

D. What does it mean for sound

The electrical specifications of an oversampling C-R system depend on innumerable component values and design choices and are in no way simply predictable from whether the internal modulator uses, for example, MASH or Bitstream or some other topology. Still less predictable are perceptual fidelity measures, which of course are the ultimate figures of merit in audio and other human-interface electronics (a point taken for granted not only by musicians but also, of course, by competent engineering researchers and by the graduate communication-theory texts, such as Jayant and Noll).

This does not, of course, prevent glib advertising copywriters and cult audio pundits from directly linking this or that audible property to the glamorous _au courant_ labels like MASH and delta-sigma (just as it is thought very fashionable to talk about Fast Fourier Transforms, no matter how ignorantly or irrelevantly, in stock-price analysis, or about chaos theory in management -- I could go on and on, and I do). Such writers might even be right, but they almost certainly don't know it -- the audible differences are much more likely due to the oblique dependence of the C-R topology on other design choices in the player, or to the quality of analog-digital ground isolation, or to the choice of output-filter op amps.

Notes from the text

Note 1: "Delta-sigma" modulation and data conversion (the inventors' term) was unintentionally rechristened "sigma-delta" at the Bell Telephone Laboratories in 1963 and this reversal has propagated through many paper titles, so you will see both names in use. No difference is intended. I have made efforts to redress this reversal and the principals are now in accord. My recent JAES paper mentions this and I have further details if anyone professionally interested sends a mailing address.

Note 2: Some people dislike the acronym MASH for MultistAge noise SHaping, though there certainly are endless well-known precedents (UNIted nations ChildrEn's Fund; GEheime STAatsPOlizei). When its coiners introduced "MASH" in the US in 1986 a colleague remarked to me that MUSH was better on acronym style. I think however that MUSH would have less audio-marketing cachet.

The technique now dubbed "MASH" by NTT has existed in various forms since long before its recent popularization by Toshio Hayashi et al. in February 1986 (this origin itself is usually misattributed to a later paper by Uchimura et al.). I have antecedents going back at least to 1969. Multibit feedback noise shaping is even older, due to Cutler in 1954.

Max W. Hauser {mips,philabs,pyramid}!prls!max prls!max@mips.com