The role of anti-aliasing filters as the “source of all evil” has been discussed more and more often recently. I already addressed this in my first article many years ago, complete with many illustrative diagrams, though admittedly as just one aspect among many. The discussion subsequently went in a different direction. In the meantime, however, I am quite inclined to reduce the whole topic of “choosing the right sampling rate for music transmission” to the filters, so I would like to comment on it again. As things stand today, I would reduce everything to one sentence:
“Apply the sampling theorem correctly!” In principle, everyone knows the theorem: “The sampling rate must be at least twice as high as the highest frequency to be transmitted.” I don’t think anyone disagrees with that. However, the theorem is usually applied in exactly the opposite direction: the bandwidth is limited to half the desired sampling rate, and the human hearing range is used to justify the limit. The hearing range is the wrong reference here. The only decisive reference is, as the sampling theorem says, the bandwidth of the useful signal. If the bandwidth is reduced by technical means (anti-aliasing filters), the signal does not remain unchanged within the useful band either. In the beginning, people thought they were on the safe side with the wonderfully linear-phase, symmetrical digital filters. Today, the term ringing is well known. These filters generate temporal smearing and new spectral components that are not contained in the original signal and that unfortunately do not lie outside the range of human perception.
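The two directions of applying the theorem can be put into numbers with a minimal sketch. The function names and the 30 kHz example component are illustrative assumptions, not values from a measurement; the second function only shows where an unfiltered component above half the sampling rate would land, which is the effect the anti-aliasing filter is there to prevent.

```python
def min_sampling_rate(highest_frequency_hz: float) -> float:
    """Lower bound on the sampling rate for a signal whose highest
    frequency component is highest_frequency_hz."""
    return 2.0 * highest_frequency_hz


def alias_frequency(f_hz: float, fs_hz: float) -> float:
    """Frequency at which a tone at f_hz shows up after sampling at fs_hz
    when no anti-aliasing filter removes it first."""
    f_mod = f_hz % fs_hz
    return min(f_mod, fs_hz - f_mod)


# Hypothetical useful-signal component at 30 kHz (assumed for illustration).
component_hz = 30_000.0

print(min_sampling_rate(component_hz))          # 60000.0
print(alias_frequency(component_hz, 44_100.0))  # 14100.0, folded into the audio band
print(alias_frequency(component_hz, 192_000.0)) # 30000.0, stays where it belongs
```

At 44.1 kHz the component must either be filtered out or it folds down to 14.1 kHz; at 192 kHz it is simply captured.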
The decisive question is therefore not whether humans derive any benefit from frequencies above 20 kHz. These frequency components are simply present in the music signal to be digitized, because instruments generate them, microphones pick them up and analog amplifier technology transmits them. If the sampling theorem is not fulfilled in the correct way, but only through the “back door”, the filters used will inevitably degrade the signal. If the sampling theorem is applied correctly, i.e. the sampling rate is actually chosen to be at least twice as high as the highest frequency contained in the useful signal, then these errors do not occur and no artifacts are superimposed on the music signal. For this reason, and perhaps only for this reason, sampling rates higher than 48 kHz sound better than those below. Even at 96 kHz, the filters work with significantly fewer artifacts, and at 192 kHz these artifacts are virtually non-existent, because there is really nothing left in the frequency spectrum of music that needs to be filtered out. In this correct interpretation, the sampling theorem is only completely fulfilled from a sampling rate of 192 kHz upwards.
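The ringing of a symmetrical digital filter can be made visible with a small sketch. The filter below is a generic windowed-sinc low-pass with a Hamming window, 101 taps and a 20 kHz cutoff at 44.1 kHz; all of these parameters are assumptions chosen for illustration and do not describe any particular converter. It shows that part of the impulse-response energy arrives before the main peak.

```python
import math


def windowed_sinc_lowpass(cutoff_hz, fs_hz, num_taps=101):
    """Symmetric (linear-phase) FIR low-pass: ideal sinc response
    multiplied by a Hamming window, normalized to unity gain at DC."""
    fc = cutoff_hz / fs_hz          # cutoff in cycles per sample
    mid = (num_taps - 1) / 2        # index of the centre tap
    taps = []
    for i in range(num_taps):
        n = i - mid
        if n == 0:
            ideal = 2 * fc
        else:
            ideal = math.sin(2 * math.pi * fc * n) / (math.pi * n)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (num_taps - 1))
        taps.append(ideal * window)
    gain = sum(taps)
    return [t / gain for t in taps]


def pre_ringing_fraction(taps):
    """Share of the impulse-response energy located before the main peak."""
    peak = max(range(len(taps)), key=lambda i: abs(taps[i]))
    total = sum(t * t for t in taps)
    return sum(t * t for t in taps[:peak]) / total


fs = 44_100.0
h = windowed_sinc_lowpass(20_000.0, fs)
peak_index = max(range(len(h)), key=lambda i: abs(h[i]))
print("taps before the peak:", peak_index)
print("pre-ringing window in ms:", 1000.0 * peak_index / fs)
print("energy share before the peak:", pre_ringing_fraction(h))
```

Because the response is symmetric, the same amount of energy follows the peak; the sketch says nothing about audibility, only that the filter responds before the event it is filtering.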
So any argument of the following kind, made when the CD was invented and which I read again only recently, is wrong: “A sampling frequency of 44.1 kHz is sufficient to store everything that the human ear hears in digitized form.” A formulation such as “A sampling frequency of 44.1 kHz is sufficient to store everything contained in the useful signal in digitized form” would be permissible if it were true, but with music that is only the case in exceptional situations. In my opinion, understanding this subtle difference is the core of the issue!
The bandwidth of the useful signal as it leaves the microphone sets the reference value to which everything must conform if the transmission is to be as ideal as possible. If this is not observed, the result is automatically a lower level of quality (see also Claude Elwood Shannon’s “A Mathematical Theory of Communication” from 1948, page 47/48).
DSD
The central design flaw in the CD format, the negative influence of the filters, had apparently also been recognized in Sony’s development department in the mid-1990s. By that time, the delta-sigma principle had also largely established itself as the converter type in the audio sector. The actual converter stages from and to analog work with a bitstream on the digital side. Further stages convert this bitstream into PCM (decimation filters) or generate it from PCM (interpolation filters). The basic idea behind DSD and SACD was simply to omit these decimation and interpolation filters and to transport the bitstream without the intermediate stages. The main advantage of DSD is that the errors described above are avoided when these filter stages are left out. Measured against the state of the art in the 1990s, the format could really only mean an improvement (although I personally never perceived it that way acoustically).
However, there were also two decisive disadvantages. First, an audio signal in DSD format inherently contains noise at a relatively high level in the high-frequency range, starting from just under 20 kHz. The main disadvantage, however, is that DSD cannot be processed in the studio. It must either be processed in the analog domain or be converted to PCM. The latter is the method most often chosen, unless the production was done completely in PCM anyway. There are probably very few genuine DSD productions.
Technical development then continued and, from today’s perspective, this 20-year-old idea is long outdated. Although the delta-sigma principle has remained, it has undergone an enormous development process. Today, the leading chip manufacturers usually work internally in their ADC and DAC chips with 12.288 MHz at 6 bits instead of the 2.822 MHz at 1 bit that was standard when SACD was invented in the mid-1990s and from which the DSD format for SACD was derived. The logic of the time, that the decimation and interpolation filters in the AD and DA converters should simply be omitted and the bitstream stored instead, has been overtaken by technical developments. The step from 1-bit to multi-bit is a very decisive one. If you look into the theory behind DSD, you come across the problem of idle tones, which is perhaps the explanation for the acoustic dissatisfaction mentioned above. This problem is completely avoided with multi-bit.
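How a 1-bit bitstream carries a signal, and where idle tones come from, can be sketched with a first-order delta-sigma modulator. This is a deliberately simplified assumption for illustration: real converters and SACD encoders use higher-order loops, and the function name and test inputs are hypothetical.

```python
def delta_sigma_1bit(samples):
    """First-order 1-bit delta-sigma modulator.
    Input samples are expected in [-1, 1]; the output is a list of +1/-1."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        # Accumulate the difference between the input and the last output bit.
        integrator += x - feedback
        # 1-bit quantizer: only the sign of the integrator is kept.
        feedback = 1.0 if integrator >= 0.0 else -1.0
        bits.append(feedback)
    return bits


# The average of the bitstream follows the input level (here a constant 0.25).
dc_bits = delta_sigma_1bit([0.25] * 4000)
print(sum(dc_bits) / len(dc_bits))

# With a constant input the bit pattern becomes periodic. Such repeating
# patterns are what shows up as idle tones in the spectrum.
print(delta_sigma_1bit([0.0] * 8))
```

With zero input this simple loop just alternates between +1 and -1, and with a small constant input it settles into a longer repeating pattern. A quantizer with more than two levels has more freedom here, which is the direction modern multi-bit converters have taken.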
On the other hand, the negative influences of the filters in PCM are easily avoided with current technology if the delta-sigma bitstream is converted to PCM at a sufficiently high sampling rate, sufficiently high in the sense of fulfilling the sampling theorem. What DSD was supposed to solve in the mid-1990s is no longer a problem with the modern use of PCM. With 24/192 there are no filter artifacts, and the bandwidth is in any case significantly greater than with DSD.
The impulse responses that are often shown are misleading in this respect and make DSD appear more precise than it really is for a typical music signal. High frequencies in a music signal always have comparatively low levels, which is why analog tape machines and records work with corresponding equalization. An audio transmission system must therefore be able to process high frequencies at low levels as accurately as possible. Even the inventors of the CD were aware of this, as the emphasis option shows. The inventors of the SACD, however, had apparently completely forgotten what type of signal they were dealing with. Only the ever-popular impulses (Dirac impulses) are perfect for DSD. They contain all frequencies at the same level. Such an impulse, at a level just below full scale, is therefore transmitted very well via DSD, because all of its spectral components lie above the HF noise. However, this has nothing to do with the reality of music transmission, where the high-frequency components of the music disappear in the noise. DSD has exactly the opposite of the characteristics that music requires and for which, for example, the old record and tape formats were optimized. DSD can only transmit high frequencies at high levels, but music does not contain high frequencies at high levels.
More realistic statements can therefore be obtained from tests with square wave signals, in which the amplitude of the nth harmonic falls off as 1/n. Not surprisingly, these tests show very clearly that 24/192 PCM is better than DSD64. It only becomes interesting from DSD128 onwards, because the noise then only starts about as high in frequency as the point where the bandwidth of 24/192 PCM ends. But another very important problem remains: DSD cannot be processed further in the studio. The digital conversions to PCM and back are anything but trivial, and the losses incurred in the process call into question any benefits that may have existed beforehand. The same applies to the alternative production route via analog stages and the additional conversions it requires. The admittedly positive aspect of DSD with regard to filter artifacts can also be achieved in PCM formats with a sufficiently high sampling rate, and these formats can easily be handled in the studio at a very high level of quality.
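Why a square wave is a more music-like test signal than an impulse can be illustrated by listing its harmonic levels. The sketch assumes an ideal square wave, which contains only odd harmonics with amplitude 1/n relative to the fundamental; the 1 kHz fundamental and the 96 kHz upper limit (half of 192 kHz) are example values.

```python
import math


def square_wave_harmonics(fundamental_hz, max_hz):
    """Return (frequency in Hz, level in dB relative to the fundamental)
    for the odd harmonics of an ideal square wave up to max_hz."""
    result = []
    n = 1
    while n * fundamental_hz <= max_hz:
        # Amplitude 1/n corresponds to a level of -20*log10(n) dB.
        result.append((n * fundamental_hz, -20.0 * math.log10(n)))
        n += 2  # an ideal square wave has no even harmonics
    return result


harmonics = square_wave_harmonics(1_000.0, 96_000.0)
for freq_hz, level_db in harmonics[:3] + harmonics[-3:]:
    print(f"{freq_hz / 1000:6.1f} kHz  {level_db:7.2f} dB")
```

The highest harmonic below 96 kHz lies almost 40 dB below the fundamental. Unlike the impulse, whose components all have the same level, the upper harmonics are weak, so the test shows how a format handles high frequencies at low level.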
If the idea behind DSD were to be brought up to date, the 12.288 MHz/6-bit signals from modern converter chips would have to be stored. However, it is doubtful whether this immense effort really makes sense compared to 24/192 PCM, which, as we know, can be processed directly. The gap between an optimally converted 24/192 PCM signal and an excellent analog signal is already too small for that, if it still exists at all. The renaissance of DSD should rather be seen in the context of the design flaw in first-generation digital formats, which has never really been understood. The flaw is recognized acoustically, but the wrong or at least less effective solution is chosen.
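The effort involved can be estimated from the raw data rates per channel. The sketch below multiplies sampling rate by word length and ignores any packing or container overhead; the DSD64 rate is taken as 64 × 44.1 kHz = 2.8224 MHz, and the dictionary keys are labels chosen for this illustration.

```python
def bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Raw data rate of one channel in bits per second."""
    return sample_rate_hz * bits_per_sample


formats = {
    "DSD64 (2.8224 MHz, 1 bit)": bit_rate(2_822_400, 1),
    "PCM 24/192": bit_rate(192_000, 24),
    "Modulator output (12.288 MHz, 6 bit)": bit_rate(12_288_000, 6),
}

for name, rate in formats.items():
    print(f"{name:40s} {rate / 1e6:8.4f} Mbit/s")

ratio = formats["Modulator output (12.288 MHz, 6 bit)"] / formats["PCM 24/192"]
print("factor relative to 24/192 PCM:", ratio)
```

This gives 2.8224 Mbit/s for DSD64, 4.608 Mbit/s for 24/192 PCM and 73.728 Mbit/s for the 12.288 MHz/6-bit stream, i.e. sixteen times the data of 24/192 PCM per channel.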