Audio Innovation & IoT; My Audiophile Journey

October 27, 2020

Good audio depends more than ever on software and its integration with hardware. Because the two are so intertwined, audio has been a great way for me to deeply understand both, and to appreciate their importance in the realm of IoT (Internet of Things) as well as in data engineering across the audio spectrum.

Getting custom ear molds made at a local audiologist. I had quite the conversation with him about the latest in hearing aid tech. Also, these go DEEP. Talk about ear wax clean up 😅 This was my second time doing this for ear molds.

I blew up my dad’s large floor-standing speakers when I was 8 years old playing what I can only imagine was the Pokemon soundtrack. As it was his first “married person” purchase, you can imagine his reaction. Ever since, though, I’ve been fascinated with how audio works and what I can do to fix and optimize it.

Paired with a recent understanding of my ADHD (auditory sensitivity, specifically), I’ve come to realize that my ears are innately capable of picking out subtle audio discrepancies that others can’t hear. It’s also why, when it’s time for me to drown out the world, audio is my escape.

My first main point is that audio is deeply personal. What sounds good to one person may sound bad to another, making most comparisons pointless. If you’re happy with your Apple AirPods, great. However, audio products, much like anything else, carry considerable brand premium, and if you dive under the hood you can maximize your investment by learning what makes audio great and what doesn’t.

Those ear molds? They resulted in this custom molded setup built from Ultimate Ears Triple Fi Pros, effectively turning $300 earbuds (eBay’d for $120!) into the equivalent of UE’s $1,200 custom IEMs for 20% of the cost.

Hardware: What Makes Sound (Super Simplified)

In the most basic sense a speaker system needs three things: a source, an amplifier, and a driver (the speaker cone and magnet). The source provides your music at a low level and passes it to the amplifier, which controls how loud it gets by scaling its output to the driver.
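As a toy sketch of that chain, here’s the amplifier stage as code. The names and numbers here are made up purely for illustration, not how any real amp firmware works: the amp scales the source’s quiet signal and clips whatever exceeds its power limits.

```python
def amplify(samples, gain):
    """Scale low-level source samples by the amp's gain, clipping at the rails."""
    rail = 1.0  # normalized maximum the amplifier can swing
    return [max(-rail, min(rail, s * gain)) for s in samples]

line_level = [0.01, -0.02, 0.015]            # quiet signal from the source
speaker_feed = amplify(line_level, gain=40.0)  # what actually reaches the driver
```

Turn the gain knob too far and the rails flatten the waveform: clipping, aka distortion, and presumably how I cooked my dad’s speakers.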

Turntables provided consumers with 2 “channels” of audio; that is, records could carry both a left speaker and a right speaker signal. When recording and mastering for vinyl, music engineers had “2 products” to deliver, in the form of grooves cut into the record that told the player what sounds to play from each speaker. The more expensive the production, the better it could exploit these two variables; the Beatles, for example, would combine multiple recorded layers onto the 2 channels. This source is primarily responsible for SQ, or sound quality.

A simple 2 channel setup I had in college. Polk Monitors using an old Onkyo amp.

The amplifier, then, is responsible for the SPL, or sound pressure level, aka the volume we hear. A good amplifier lets you turn music to whatever volume you desire while staying true to the source.

This is why analog vinyl players hold such a special place in the hearts of many audiophiles, both young and old. It’s audio in its purest form; the setup is simple, yet it still lets audiophiles reach for higher and higher end equipment to meet their sound goals.

The Subwoofer is Born; Let There Be BASS

Leave it to the hippies (or was it just audio junkies?) to express the desire to “feel” music. That movement’s embrace shepherded the creation of new audio sources allowing additional bass effects to come into play. Because vinyl’s grooves can’t carry the needed “data bandwidth,” new technologies like the cassette gave sound engineers more variables with which to deliver sound.

Using these new technologies required a new component to emerge: the processor. The processor’s main responsibility was to route sound where it best needed to go. In this case it would direct bass signals to a dedicated subwoofer that in turn could be further amplified. It also allowed other elements, such as treble, to be altered. You may know this as EQ adjustments.

Above is a great way to visualize the audio spectrum. Certain speaker hardware is dedicated to representing each segment of sound. As tech has improved we’re able to better send signals exactly to where they can be best played and, likewise, better optimize sound.

The higher the quality of the source, the better the processor could direct and manipulate sound. Vinyl, lacking the data for deep bass, would often be processed anyway and, without that data, could end up sounding worse. Cassettes gave processors more data to play with, albeit still analog, allowing for more sophisticated sound without degradation.

All of the above, though, was still processed with analog hardware, which was both expensive and difficult to understand and manipulate unless you were a pro.

Enter Digital & MP3

CDs and MP3s introduced digital audio to consumers for the first time, and with them the processor could become digital too. In effect this allowed sound engineers to mix as they intended while giving aftermarket processors direct access to the data itself to further improve sound.

This is why true digital sources can be manipulated much more than analog sources. More data = more to adjust and fine tune without distortion.

This was my “mobile” setup throughout most of high school and college. Due to the loud noise at swim meets I used an external headphone amplifier, bypassing the iPod’s internal amp so I was getting the cleanest analog signal possible. Also likely why my poor CPU died early, thanks to all the lossless conversions.

In the above diagram, users could now set an 80 Hz “filter” and direct all audio within a digital source to where it’s desired, whereas analog equipment had to run dedicated hardware to achieve the same effect. The cost of innovation and improvement dropped drastically, and hardware became obtainable at lower price points.
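To make that 80 Hz crossover idea concrete, here’s a minimal pure-Python sketch. This is a crude single-pole filter, nowhere near the DSP a real receiver runs, but it shows the principle: everything below the cutoff is routed to the sub feed, and the remainder goes to the mains.

```python
import math

def one_pole_lowpass(samples, cutoff_hz, sample_rate):
    """Crude single-pole low-pass: keeps content below the cutoff frequency."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for s in samples:
        y += alpha * (s - y)   # RC-style smoothing toward the input
        out.append(y)
    return out

def split_at_crossover(samples, cutoff_hz=80.0, sample_rate=48000):
    """Route the lows to the subwoofer feed, the remainder to the main speakers."""
    lows = one_pole_lowpass(samples, cutoff_hz, sample_rate)
    highs = [s - lo for s, lo in zip(samples, lows)]
    return lows, highs
```

The analog equivalent needs physical capacitors and inductors per channel; in the digital domain the “crossover” is a few lines of arithmetic, which is exactly why the cost of this kind of innovation collapsed.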

This is where Bose emerged as a major contender, allowing for better sound from more compact equipment. They invested in their own processing on top of digital sources to maximize a speaker’s output based on the data’s attributes. Their price premium wasn’t for better hardware; it was the cost of the digital sound engineering they used as a differentiator. This is why most enthusiasts refer to Bose as Buy Other Sound Equipment: if savvy enough, you could purchase better hardware for the same price as Bose and be better off. Still, it’s a great product lesson that continues to show UX is king.

Audio Innovation Pre-Sonos; More Channels or Manipulate What’s There

With new formats such as DVDs and larger MP3 players came a fork in the road for sound innovation. You could either develop audio processing for sources already in existence (2 channels) or add channels and, likewise, more variables for sound engineers to tweak.

This is a simple diagram showing how many 1990s and 2000s processors, or AV receivers, would turn a 2-channel signal into faux “surround sound.” They tried to fill in the missing space by duplicating data, either through custom hardware or by licensing software decoding from Dolby, THX or DTS.
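The core trick those receivers used can be sketched in a few lines. This is a simplified passive-matrix decode in the spirit of the classic Dolby Surround approach, not any vendor’s actual licensed algorithm: content common to both channels is steered to a center feed, and content that differs between them is steered to the surrounds.

```python
def passive_matrix_upmix(left, right):
    """Derive center and surround feeds from a 2-channel signal:
    center gets what the channels share, surround gets what differs."""
    center = [(l + r) / 2 for l, r in zip(left, right)]
    surround = [(l - r) / 2 for l, r in zip(left, right)]
    return center, surround

# A vocal panned dead center appears equally in both channels,
# so it lands entirely in the center feed and not in the surrounds.
center, surround = passive_matrix_upmix([0.8, 0.8], [0.8, 0.8])
```

No new data is created here; it’s the same 2 channels rearranged, which is why faux surround never sounded like a true discrete 5.1 mix.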

I’ll save it for a dedicated home theater post, but Dolby Digital and DTS emerged as standards for 5.1-channel audio, allowing movie sound to envelop consumers inside their own homes. However, due to headphones and mobile storage limitations, 2-channel audio remained the standard for music production. More channels = more engineering and thus more expense to create music, so why bother pivoting away from the norm?

Looking towards product innovation, develop where most users already are; don’t create a competing platform.

This is where the majority of sound processing and innovation has taken place: processing 2-channel audio files and turning them into something more.

Lossless Audio & New Codecs

Although plastered with logos, this does a good job of visualizing the fidelity (more data points) that lossless audio provides.

The average MP3 file is roughly 5MB for a typical 3-minute song. However, that size is only a fraction of the actual mastered track, which may be as large as 200MB or more. Because early digital sources didn’t have much storage, songs were compressed using MP3 or other codecs to reach smaller file sizes.

These file sizes reflect a song’s bit rate, or the amount of data delivered per second. The higher it is, the more the source can process for output to the speakers. Lossless music retains every element of the song, while MP3 and other lossy codecs keep only the most audible frequencies and leave out the rest.
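The relationship between bit rate and file size is simple arithmetic, and worth a quick sanity check. The bitrates below are the standard published ones; the 3-minute duration is just a convenient example:

```python
def stream_size_mb(bitrate_kbps, seconds):
    """Approximate track size: kilobits/s * seconds, divided by 8 bits per byte,
    then by 1000 to go from kilobytes to megabytes."""
    return bitrate_kbps * seconds / 8 / 1000

three_minutes = 180
mp3_mb = stream_size_mb(320, three_minutes)   # 7.2 MB at Spotify's best quality
cd_mb = stream_size_mb(1411, three_minutes)   # ~31.7 MB for CD-quality lossless
```

That gap is exactly why a 200MB studio master has to be squeezed so hard to fit the small files early players could hold.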

I once dragged a few college swimming teammates to a movie theater 2 hours away just to watch The Hobbit in High Frame Rate AND Dolby Atmos. The first and, still, only movie to ever include both in the same viewing. Worth it? Eh.

To many, the difference is hard to decipher, and a well-compressed MP3 can sound every bit as good as a FLAC file, as most of the discarded “data” is elements such as a drumstick hitting the floor: sounds outside our auditory range and, likewise, worth leaving out.

Although we no longer have to worry about the 500-song limit of an iPod Classic, these limitations are still in effect across streaming services today. Tidal, for example, advertises 1411 kbps (lossless quality), whereas Spotify’s highest streaming option is 320 kbps and defaults to even lower unless changed. Amazon and Tidal even offer Master quality, where streaming can be as high as 4000 kbps (1080P movie territory!!).

These services, though, cost more than Spotify, reflecting the increased cost of streaming such high quality files. Unless you have high end equipment the difference will never be noticeable, which is exactly why Tidal and the others advertising this remain after a very niche market while Spotify continues to dominate the industry.

It is worth noting that Sony has recently been pushing a 360 degree spatial audio format. To date only select Sony recording artists have used the technology and although it does sound better, it’s yet another format engineers need to develop for. Recycling the aforementioned product scenario; build where people are vs creating something new.

Aaron Mahnke’s recent 13 Days of Halloween podcast highlights the better approach: add spatial audio to pre-existing formats. I believe this is another doomed Sony innovation.

The Innovation Path of the Future; Speaker Optimization

Have you noticed how Sonos has now become more premium than Bose? How about why Sonos is suing Google?

Sonos is using what’s been around for years in the high-end audio space: time correction and bass correction using microphones. Knowing that sound processing has gotten about as good as it can, why not optimize the speakers directly for their environment?

A patent image highlighting how this processing works in an automated fashion. Although the tech itself is nothing new, the application and UX make it the biggest audio innovation of the last decade.

What they did was take what required multiple steps in a home theater or lab and reduce it to a few seconds when users first turn on a device. They integrated a microprocessor with a microphone so the device could tune itself to wherever you placed the unit, maximizing not the audio reproduction (like Bose’s tech) but instead adjusting the speaker’s output to its environment.

Looking at an audio graph you may see some frequencies disappear entirely while others are amplified, so that the perception of sound is improved (the only thing that matters when selling), not the actual source. Genius!
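Conceptually, that correction step reduces to comparing a measured response against a target curve, band by band, and capping the adjustments so the drivers aren’t overdriven. Here’s a deliberately simplified sketch; the band values and limits are invented for illustration, and the real Sonos and Dirac algorithms also correct timing and phase, not just levels:

```python
def correction_gains_db(measured_db, target_db, max_boost_db=6.0, max_cut_db=12.0):
    """Per frequency band, the gain that pulls the measured response toward
    the target, limited so boosts don't overdrive the speaker or amp."""
    gains = []
    for m, t in zip(measured_db, target_db):
        g = t - m                    # how far off this band is
        g = min(g, max_boost_db)     # cap boosts (protects drivers)
        g = max(g, -max_cut_db)      # cap cuts
        gains.append(g)
    return gains

# A room with a bass peak in one band and a deep dip in another:
measured = [-3.0, 8.0, 0.0, -10.0]
target = [0.0, 0.0, 0.0, 0.0]
gains = correction_gains_db(measured, target)   # [3.0, -8.0, 0.0, 6.0]
```

Note the last band: the 10 dB dip only gets a 6 dB boost, because pushing a small driver hard enough to fully fill a room dip would just add distortion. Improving perception within limits, not chasing the source.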

It’s a magnificent example of how a simplified UX and more integrated engineering results in better products and a reason they became the dominant speaker of choice for many over the past decade.

This is the best I can do to highlight why this is such an advancement in audio. The tight integration between the CPU and the audio processor is critical, and it’s only feasible with hardware/software integration that has recently become viable at scale through tightly coupled SoC (System-on-Chip) “big tech” deals.

This has only recently become viable for speakers outside of $1k+ home audio receivers, as this type of processing needs enough power for the UX to come through. As the above highlights in red, an additional non-audio processor needs to use a microphone to figure out improvements before setting them on the audio processor, which then yields the benefit.

As the lawsuit notes; both Amazon and Google use the same technology within their IoT-based smart speakers. Whether they stole the technology or simply cloned it is another case entirely but it’s the type of technology that once done for one device, can be applied to many if using the same chipsets.

I was one of the beta testers for Bragi’s Dash fully wireless earbuds before AirPods and Bluetooth 5.0 took off. They featured a very half-baked Alexa experience with terrible sound quality, mainly due to poor processing, not the speakers.

This is also the reason many third-party speakers boasting Google and Alexa smarts may simply not sound as good as others. Each brand is responsible for its own tuning, and without this feature, better hardware (higher manufacturing cost) is going to sound worse than lower-end hardware that is software-optimized.

The IP that makes this work is the embedded code processing the mic information and relaying it back to a compatible audio processor. Without this IP, any speaker in the marketplace is doomed to fail, as it simply can’t compete with the tuning and the (very noticeable) audio quality boost.

If Sonos, Google and Amazon can make their tiny speakers sound great why can’t I?

Well, it comes down to cost and that aforementioned IP. This type of technology needs tightly integrated software and firmware. These companies have standardized on chipsets that will be used in products for many years to come whereas other players in the space don’t have the alternative revenue streams to support the easy UX.

Being the hacker I am, I attempted to figure out years ago what was making phones like the OnePlus 6 series so great in this department and how I could get it on my Google Pixel.

While browsing the XDA Developers forums I stumbled upon the company responsible for these improvements across the Android landscape: Dirac Audio. Much to my disappointment, though, unlike other Android audio enhancement mods, this technology didn’t come in the form of an APK (Android app) or even a system-level service. It was firmly integrated into the chipset’s sound processor. Good for Dirac, as this means the tech can’t be pirated; it’s truly embedded to the highest degree and shows the sophistication of their process.

My previous desktop audio setup included a Dayton Audio amplifier (Class T is seriously awesome for its size) and a Schiit Modi 2 DAC/Processor feeding it.

While at CES I stopped by their booth for a demo and was blown away. They had me guess the woofer size of a bookshelf speaker. I guessed at least 5.25” with the bass it gave out. Instead, it was a single 2” tweeter. I was in awe. Then they played for me a tiny Bose Bluetooth speaker with the processing…instant goosebumps.

I knew I wanted this tech and immediately high-tailed it to CES’s dedicated home theater section. However, my excitement quickly diminished. Receivers started at $3k, well outside my budget, and used components would be a rarity (and still pricey in this market segment). There’s just not enough demand to drive down prices.

Going back to my Yamaha (still high end, I might add) processing and my Schiit desktop processor, I simply couldn’t help but wonder just how much I was missing out on.

Dreams Do Come True

Some 2.5 years later, I finally found the Dirac processor in the form of the $449 MiniDSP DDRC-24. I know, very engineeringly named. However, this is the most cost-effective way to get Dirac processing, and after selling my previous DAC, even more so.

The included microphone came with a wind guard and a 10' USB cable to ensure it would reach all tuning positions. MiniDSP even includes a custom calibration file for each microphone ensuring the most accurate reading possible.

I’m able to plug it directly into my computer via USB and have a true DAC. Without Dirac turned on it’s a good processor. Maybe not as good as my Schiit but that’s not why I bought it. No, what I paid for is the tight firmware/hardware to Dirac software pairing.

Following the instructions, I took 9 separate measurements around my “tightly focused” listening area, using a tripod for accuracy.

Once the measurements were taken, the real magic happened: Dirac’s AI-driven processing kicked in and worked out all the kinks in my listening setup. The unit is paired with Polk’s LSiM 703 bookshelf speakers powered by a self-refurbished Parasound 2250 THX (250W RMS) amplifier I picked up on eBay back in 2008 for 1/5 of the brand-new price. My dream high school desktop setup, thanks to many hours navigating the Polk Audio forums for the optimal pairing.

The mic stand didn’t support the tripod’s standard screw size, so I rigged it on there through other means. Yeah, yeah, audiophiles can heckle me later.

This processing took an already amazing setup over the edge. I had goosebumps while listening.

A great amplifier and speaker pairing will give you a “wide soundstage,” which was the case prior to tuning. But after? The soundstage seems to disappear entirely, putting you right in the action. Even the highest-end open-back headphones I’ve tested haven’t come close.

The best way I can describe it is imagine being front stage at your favorite concert…for your entire Spotify playlist. Heaven.

Conclusion — Smart Speakers “Smarts” Need to Trickle Down; Legacy Audio Companies Need to Upskill

I now wonder why every speaker doesn’t sound this good when it’s “simply” software.

The reality is that it’s too complex and too expensive to tightly pair all the needed components. Dirac Research has spent years fine-tuning their AI processing, and to stay afloat they have to make money too. That’s a tough ask in such a small market.

They were back at CES this year and I demoed their tech improving the most popular headphones on the market. It made Sony’s XM3s sound even better, which I didn’t think could be done. Sadly, it’s been radio silence since, and I hope they find an alternative revenue stream soon.

The reality is that brands such as Sony, or automakers such as GM, would rather build their own or use cheaper tuning to maximize profit, as most people don’t need this level of fidelity, impressive as it may be. Coupled with an ongoing licensing fee, it’s a tough sell.

However, as ambient computing grows and consumers come to expect this new level of audio performance, legacy audio brands such as Polk Audio, Yamaha, Klipsch and countless others will need to unite on standards that allow this deep level of customization. If they don’t, I fear we’ll be living in a world dominated by the few key players who hold the expertise (software).

Luckily, I’ve kept a library of around 1,500 FLAC songs that I’ve been enjoying. You can see some of the tuning settings on the processor, in addition to the whopping bitrate of the Eagles’ Hotel California (Live).

My hope is that we’ll continue to see great hardware made better by software-enabled innovation, not simply cheaper hardware using it to improve production-cost margins. Only time will tell, but you can already hear it in the new voice assistants.

In the meantime I’ll keep enjoying the amazing sound and helping those who wish to achieve it obtain it within their budget. I promise you won’t regret it.

My dad most definitely enjoyed listening to some of his favorites after I set everything up! He’s the reason I got into this hobby after all 😀