Posts from 2023

Memories of Molly

Published 1 year, 9 months past

The Web is a little bit darker today, a fair bit poorer: Molly Holzschlag is dead. She lived hard, but I hope she died easy. I am more sparing than most with my use of the word “friend”, and she was absolutely one. To everyone.

If you don’t know her name, I’m sorry. Too many didn’t. She was one of the first web gurus, a title she adamantly rejected (“We’re all just people, people!”), but it fit nevertheless. She was a groundbreaker, expanding and explaining the Web in its infancy. So many people, on hearing the mournful news, have described her as a force of nature, and that’s a title she would have accepted with pride. She was raucous, rambunctious, open-hearted, never ever close-mouthed, blazing with fire, and laughed (as she did everything) with her entire chest, constantly. She gave and took and she hurt and she wanted to heal everyone, all the time. She was messily imperfect, would tell you so loudly and repeatedly, and gonzo in all the senses of that word. Hunter S. Thompson should have written her obituary.

I could tell so many stories. The time we were waiting to check into a hotel, talking about who knows what, and realized Little Richard was a few spots ahead of us in line. Once he’d finished checking in, Molly walked right over to introduce herself and spend a few minutes talking with him. The evening a group of us had dinner on the top floor of a building in Chiba City and I got the unexpectedly fresh shrimp hibachi. The time she and I were chatting online about a talk or training gig, somehow got onto the subject of Nick Drake, and coordinated a playing of “Three Hours” just to savor it together. A night in San Francisco where the two of us went out for dinner before some conference or other, stopped at a bar just off Union Square so she could have a couple of drinks, and she got propositioned by the impressively drunk couple seated next to her after they’d failed to talk the two of us into hooking up. The bartender couldn’t stop laughing.

At SXSW 2005 with Dave Shea, her co-author on The Zen of CSS, and wearing an XFN shirt.

Standing outside Moscone Center in San Francisco with Cia Romano. I think this is that time we all got evacuated due to a fire alarm.

Or the time a bunch of us were gathered in New Orleans (again, some conference or other) and went to dinner at a jazz club, where we ended up seated next to the live jazz trio and she sang along with some of the songs. She had a voice like a blues singer in a cabaret, brassy and smoky and full of hard-won joys, and she used it to great effect standing in front of Bill Gates to harangue him about Internet Explorer. She raised it to fight like hell for the Web and its users, for the foundational principles of universal access and accessible development. She put her voice on paper in some three dozen books, and was working on yet another when she died. In one book, she managed to sneak past the editors an example that used a stick-figure Kama Sutra custom font face. She could never resist a prank, particularly a bawdy one, as long as it didn’t hurt anyone.

Holding court in somebody’s hotel suite, with a baby Matt Mullenweg in attendance.

Once again holding court, this time at a bar with Jason Santa Maria.

She made the trek to Cleveland at least once to attend and be part of the crew for one of our Bread and Soup parties. We put her to work rolling tiny matzoh balls and she immediately made ribald jokes about it, laughing harder at our one-up jokes than she had at her own. She stopped by the house a couple of other times over the years, when she was in town for consulting work, “Auntie Molly” to our eldest and one of my few colleagues to have spent any time with Rebecca. Those pictures were lost, and I still keenly regret that.

Rolling matzoh balls in our kitchen, *still* holding court.

On top of a bus somewhere in the world, probably London, with my partner Kat.

There were so many things about what the Web became that she hated, that she’d spent so much time and energy fighting to avert, but she still loved it for what it could be and what it had been originally designed to be. She took more than one fledgling web designer under her wing, boosted their skills and careers, and beamed with pride at their accomplishments. She told a great story about one, I think it was Dunstan Orchard but I could be wrong, and his afternoon walk through a dry Arizona arroyo.

I could go on for pages, but I won’t; if this were a toast and she were here, she would have long ago heckled me (affectionately) into shutting up. But if you have treasured memories of Molly, I’d love to hear them in the comments below, or on your own blog or social media or podcasts or anywhere. She loved stories. Tell hers.

Memories of Molly was published on Wednesday, September 6th, 2023.
It was assigned to the Personal and Web categories.
There have been forty replies.

Designing the Igalia Chats Logo

Published 1 year, 10 months past

One of the things I’ve been doing at Igalia of late is podcasting with Brian Kardell. It’s called “Igalia Chats”, and last week, I designed it a logo. I tried out a number of different ideas, ran them past the Communication team for feedback, and settled on this one.

D&AD Awards committee, you know where to find me.

And there you have it, the first logo I’ve designed in… well, in quite a while. My work this time around was informed by a few things.

Podcast apps, sites, etc. expect a square image for the podcast’s logo. This doesn’t mean you have to make the visible part of it square, exactly, but it does mean any wide-and-short logo will simultaneously feel cramped and lost in a vast void. Or maybe just very far away. The version shown in this post is not the square version, because this is not a podcast app and because I could. The square version just adds more empty whitespace at the top and bottom, anyway.
I couldn’t really alter the official logo in any major way: the brand guidelines are pretty strong and shouldn’t be broken without collective approval. Given the time that would take, I decided to just work with the logo as-is, and think about possible variants (say, the microphone icon in the blank diamond of the logo) in a later stage. I did think about just not using the official logo at all, but that felt like it would end up looking too generic. Besides, we have a pretty nifty logo there, so why not use it?
A typeface for the word “Chats” that works well with Igalia’s official logo. I used Etelka, which is a font we already use on the web site, and I think is the basis of the semi-serifed letters in the official logo anyway. Though I could be wrong about that; while I definitely have opinions about typefaces these days, I’m not very good at identifying them, or being able to distinguish between two similar fonts. Call it typeface blindness.
Using open-source resources where possible; thus, the microphone icon came from The Noun Project. I then modified it a bit (rounded the linecaps, shortened the pickup’s brace) to balance its visual weight with the rest of the design, and not crowd the letters too much. I also added a subtle vertical gradient to the icon, which helped the word “Chats” to stand out a little more. Gotta make the logo pop, donchaknow?

There are probably some adjustments I’ll make after a bit of time, but I was determined not to let perfect be the enemy of shipping. As for how I came to create the logo, you’re probably thinking fancy CSS Grid layout and custom fonts and all that jazz, but no, I just dumped everything into Keynote and fiddled with ideas until I had some I liked. It’s not a fantastic environment for this sort of work, I expect, but it’s Good Enough For Me™.

So, if you’re subscribed to Igalia Chats via your listening channel of choice, you should be seeing a new logo. If you aren’t subscribed… try us, won’t you? Brian and I talk about a lot of web-related stuff with a lot of really interesting people  — most recently, with Kilian Valkhof about the web development application Polypane, with Stephen Shankland about undersea data cables, with Zach Leatherman about open-source work and funding, and many more. Plus sometimes we just talk with each other about what’s new in Web land, things like Google Baseline or huge WebKit updates. And, yes, sometimes we talk about what Igalia is up to, like our work on the Servo engine or the Steam Deck.

This is one of the things I quite enjoy about working for Igalia  — the way I can draw upon all the things I’ve learned over my many (many) years to create different things. A logo last week, a thumbnail-building tool the week before, writing news posts, recording podcasts, doing audio production, figuring out transcription technology, and on and on and on. It can sometimes be frustrating in the way all work can be, but it rarely gets boring. (And if that sounds good to you, we are hiring for a number of roles!)

Designing the Igalia Chats Logo was published on Tuesday, August 22nd, 2023.
It was assigned to the Design and Work categories.
There have been no replies.

First-Person Scrollers

Published 2 years, 1 week past

I’ve played a lot of video games over the years, and the thing that just utterly blows my mind about them is how every frame is painted from scratch. So in a game running at 30 frames per second, everything in the scene has to be calculated and drawn every 33 milliseconds, no matter how little or much has changed from one frame to the next. In modern games, users generally demand 60 frames per second. So everything you see on-screen gets calculated, placed, colored, textured, shaded, and what-have-you in 16 milliseconds (or less). And then, in the next 16 milliseconds (or less), it has to be done all over again. And there are games that render the entire scene in a single-digit number of milliseconds!
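Those budgets fall straight out of dividing one second by the frame rate. Here’s a tiny Python sketch of that arithmetic; the 120 fps row is an added illustration of where single-digit frame times come from, not a figure from any particular game or browser.

```python
def frame_budget_ms(fps: float) -> float:
    """Milliseconds available to compute and paint one frame at a given rate."""
    return 1000.0 / fps


for fps in (30, 60, 120):
    # 30 fps -> ~33.3 ms, 60 fps -> ~16.7 ms, 120 fps -> ~8.3 ms
    print(f"{fps:>3} fps: {frame_budget_ms(fps):.1f} ms per frame")
```

At 60 fps the budget is about 16.7 ms, which is where the “16 milliseconds (or less)” figure comes from.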

I mean, I’ve done some simple 3D render coding in my day. I’ve done hobbyist video game development; see Gravity Wars, for example (which I really do need to get back to and make less user-hostile). So you’d think I’d be used to this concept, but somehow, I just never get there. My pre-DOS-era brain rebels at the idea that everything has to be recalculated from scratch every frame, and doubly so that such a thing can be done in such infinitesimal slivers of time.

So you can imagine how I feel about the fact that web browsers operate in exactly the same way, and with the same performance requirements.

Maybe this shouldn’t come as a surprise. After all, we have user interactions and embedded videos and resizable windows and page scrolling and stuff like that, never mind CSS animations and DOM manipulation, so the viewport often needs to be re-rendered to reflect the current state of things. And to make all that feel smooth like butter, browser engines have to be able to display web pages at a minimum of 60 frames per second.

Admittedly, this would be a popular UI for browsing social media.

This demand touches absolutely everything, and shapes the evolution of web technologies in ways I don’t think we fully appreciate. You want to add a new selector type? It has to be performant. This is what blocked :has() (and similar proposals) for such a long time. It wasn’t difficult to figure out how to select ancestor elements — it was very difficult to figure out how to do it really, really fast, so as not to lower typical rendering speed below that magic 60fps. The same logic applies to new features like view transitions, or new filter functions, or element exclusions, or whatever you might dream up. No matter how cool the idea, if it bogs rendering down too much, it’s a non-starter.
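To get a feel for why speed was the sticking point, here’s a deliberately naive toy model in Python. It is not how any browser engine actually implements :has(); it simply answers div:has(img) for every div by rescanning that element’s whole subtree. The Node class and the helper names are invented for illustration. On a chain of nested divs with no img anywhere, the work grows with the square of the nesting depth, which is exactly the kind of cost that can’t be allowed to eat a 16 ms frame budget.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A bare-bones stand-in for a DOM element."""
    tag: str
    children: list[Node] = field(default_factory=list)


def build_chain(depth: int) -> list[Node]:
    """Create `depth` nested divs (no img anywhere), outermost first."""
    nodes = [Node("div") for _ in range(depth)]
    for parent, child in zip(nodes, nodes[1:]):
        parent.children.append(child)
    return nodes


def naive_has(node: Node, tag: str) -> tuple[bool, int]:
    """Answer node:has(tag) by scanning descendants; also count nodes visited."""
    visited = 0
    stack = list(node.children)
    while stack:
        current = stack.pop()
        visited += 1
        if current.tag == tag:
            return True, visited
        stack.extend(current.children)
    return False, visited


def restyle_cost(depth: int) -> int:
    """Total nodes visited to evaluate div:has(img) on every div in the chain."""
    return sum(naive_has(node, "img")[1] for node in build_chain(depth))


for depth in (10, 100, 1000):
    print(depth, restyle_cost(depth))
```

This prints 45, 4950, and 499500 visited nodes for depths of 10, 100, and 1,000: ten times the nesting costs roughly a hundred times the work. Real engines use far smarter matching and invalidation than this.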

I should note that none of this is to say it’s impossible to get a browser below 60fps: pile on enough computationally expensive operations and you’ll still jank like crazy. It’s more that the goal is to keep any new feature from dragging rendering performance down too far in reasonable situations, both alone and in combination with already-existing features. What constitutes “down too far” and “reasonable situations” is honestly a little opaque, but that’s a conversation slash vigorous debate for another time.

I’m sure the people who’ve worked on browser engines have fascinating stories about what they do internally to safeguard rendering speed, and ideas they’ve had to spike because they were performance killers. I would love to hear those stories, if any BigCo devrel teams are looking for podcast ideas, or would like to guest on Igalia Chats. (We’d love to have you on!)

Anyway, the point I’m making is that performance isn’t just a matter of low asset sizes and script tuning and server efficiency. It’s also a question of the engine’s ability to redraw the contents of the viewport, no matter what changes for whatever reason, with reasonable anticipation of things that might affect the rendering, every 16 milliseconds, over and over and over and over and over again, just so we can scroll our web pages smoothly. It’s kind of bananas, and yet, it also makes sense. Welcome to the web.

First-Person Scrollers was published on Tuesday, June 20th, 2023.
It was assigned to the Browsers and Web categories.
There has been one reply.

From ABC’s to 9999999

Published 2 years, 2 months past

The other week I crossed a midpoint, of sorts: as I was driving home from a weekly commitment, my iPhone segued from Rush’s “Mystic Rhythms” to The Seatbelts’ “N.Y. Rush”, which is, lexicographically speaking, the middle of my iTu — oh excuse me, the middle of my Music Dot App Library, where I passed from the “M” songs into the “N” songs.

See, about a year or so ago, I took inspiration from Kevin Smokler to set about listening through my entire music library alphabetically by song title. Thus, I started with “ABC’s” by K’naan and will end, probably in a year or so, with “9999999” by Mike Morasky (a.k.a. Aperture Science Psychoacoustics Laboratory).

Every time I have to drive my car for more than a few minutes, I’ll plug in my iPhone and continue the listen from where I left off. This mainly happens during the aforementioned weekly commitment, which usually sees me driving for an hour or so. I also listen to it while I’m doing chores around the house like installing ceiling fans or diagnosing half-dead Christmas light strings.

This sort of listen is, in many ways, like listening to the entire library on shuffle, because, as Jared Spool used to point out (and probably still does), alphabetically sorting a long list of things is indistinguishable from having it randomized. For me, the main difference between alphabetical and random is that it’s a lot easier to pick back up where you left off when working through alphabetically. (Yes, Music Dot App should do that automatically, but sometimes it forgets where it was.) You can also be a lot more certain that every song gets a listen, something that’s harder to ensure if you’re listening to a random shuffle of a couple thousand tracks and your software loses its place.
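Picking up where you left off in an alphabetical list is a simple search, because the list is already sorted. A minimal Python sketch using the standard library’s bisect module, assuming a plain case-insensitive sort; the Music app may apply extra ordering rules of its own (such as ignoring a leading “The”) that this doesn’t model, and remaining_queue() is a hypothetical helper name.

```python
from __future__ import annotations

import bisect


def remaining_queue(titles: list[str], last_played: str) -> list[str]:
    """Return the tracks still to come after `last_played`, in alphabetical order."""
    ordered = sorted(titles, key=str.casefold)
    keys = [title.casefold() for title in ordered]
    # bisect_right lands just past every title that sorts <= last_played,
    # so this works even if last_played has since been removed from the library.
    start = bisect.bisect_right(keys, last_played.casefold())
    return ordered[start:]


print(remaining_queue(["Narayan", "ABC's", "N.Y. Rush", "Mystic Rhythms"], "Mystic Rhythms"))
```

That call returns ["N.Y. Rush", "Narayan"], so the next track up after “Mystic Rhythms” is “N.Y. Rush”.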

There are other advantages: sometimes, artists will use the same song title, and you get interesting combinations. For example, there was “America”, which gave me a song by K’naan and then a same-titled, but very different, song by Spinal Tap. Similarly, there are titular combinations that pop out, like “Come On” by The Goo Goo Dolls, “Come On In, The Dreams Are Fine” by Deee-Lite, and “Come On Over” by Elana Stone.

Some of these combinations groove, some delight, some earn the stank face, and some make me literally laugh out loud. And some aren’t related by title but still go together really, really well. A recent example was the segue from The Prodigy’s “Narayan” to Radiohead’s “The National Anthem”, which sonically flowed just right at the switchover, almost like they’d been composed to have that effect. It made this old long-ago radio DJ smile.

I say I took inspiration from Kevin because my listen has a couple of differences to his:

Kevin has a “no skips, ever” rule, but I will skip songs that are repeats. This happens a lot when you have both live and studio albums, as I do for a few artists (particularly Rush), or have copied tracks for lightly-engineered playlists, as I have a few times. That said, if I have a song by one artist and a cover of that song by another, I don’t skip either of them. For remixes or alternate recordings of a song by the same artist, I generally don’t skip, unless the remix is just the original song with a vaguely different beat track.
I filtered out most of my classical content before starting. This is not because I dislike classical, but because they tend to sort together in unrelenting clumps  — all of Beethoven’s and Mozart’s symphonies one after another after another, for example  — and I wanted a varietal mix. I did keep “classical” albums like Carreras Domingo Pavarotti in Concert and Carmina Burana because they have normal-length tracks with titles that scatter them throughout the sort. The same reasoning was used to retain classic film and TV scores, even if I was stretching it a bit to leave in The Music of Cosmos (the 1980 one), which prefixes all its tracks with Roman numerals… but each track is a medley, so it got a pass. The whole-album-in-a-single-MP3 The Music of Osmos, on the other hand, did not.

All that said, I have a much shorter road than Kevin: he has a library of over twelve thousand tracks, whereas my slightly-filtered library is just shy of 2,500 tracks, or right around 160 hours. The repeated-song skips knock the total time down a bit, probably by a few hours but not much more than that. So, figure at an average of 80 minutes per week, that’s about 120 weeks, or two years and four months to get from beginning to end.
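The estimate above is straightforward unit conversion. A quick Python sketch using the round numbers from the paragraph (160 hours, 80 minutes a week) and ignoring the few hours saved by skipping repeats; listening_time() is a hypothetical helper.

```python
from __future__ import annotations


def listening_time(library_hours: float, minutes_per_week: float) -> tuple[float, int, int]:
    """Return (weeks, whole years, remaining months) to hear the whole library."""
    weeks = library_hours * 60 / minutes_per_week
    days = weeks * 7
    years, leftover_days = divmod(days, 365.25)
    months = round(leftover_days / (365.25 / 12))  # rough calendar months
    return weeks, int(years), months


weeks, years, months = listening_time(160, 80)
print(f"{weeks:.0f} weeks, or about {years} years and {months} months")
```

It prints “120 weeks, or about 2 years and 4 months”.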

And what will I do when I reach the end? Probably go back to better curate the sorting (e.g., configuring Soundgarden’s “4th of July” to be sorted as “Fourth of July”), create a playlist that cuts out the repeats ahead of time, and start over. But we’ll see when I get there. Maybe next time I’ll listen to it in reverse alphabetical order instead.
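Re-curating the sort mostly means giving certain titles a different sort name. Here’s a hedged Python sketch of that idea: a hypothetical override table, plus an assumed rule that titles starting with a digit come after all the lettered ones (which is how a library can run from “ABC’s” to “9999999”). It models the idea, not how the Music app actually sorts.

```python
from __future__ import annotations

# Hypothetical hand-entered overrides, in the spirit of giving a track a
# custom sort name so "4th of July" files under F instead of with the numbers.
SORT_OVERRIDES = {"4th of July": "Fourth of July"}


def sort_key(title: str) -> tuple[int, str]:
    name = SORT_OVERRIDES.get(title, title).casefold()
    # Assumption: titles beginning with a digit go after all the letters.
    return (1 if name[:1].isdigit() else 0, name)


library = ["9999999", "N.Y. Rush", "ABC's", "Mystic Rhythms", "4th of July", "America"]
print(sorted(library, key=sort_key))
```

With the override in place, “4th of July” lands between “America” and “Mystic Rhythms”, while “9999999” stays at the very end.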

From ABC’s to 9999999 was published on Monday, April 10th, 2023.
It was assigned to the Personal category.
There has been one reply.

Ventura Vexations

Published 2 years, 2 months past

I’ve been a bit over a month now on my new 14” MacBook Pro, and I have complaints. Not about the hardware, which is solid yet lightweight, super-quiet yet incredibly fast and powerful, long-lived on battery, and decent enough under the fingertips. Plus, all the keyboard keys Just Work™, unlike the MBP it replaced! So that’s nice.

No, my complaints are entirely about the user environment. At first I thought this was because I skipped directly from macOS 10.14 to macOS 13, and simply wasn’t used to How The Kids Do Things These Days®, but apparently I would’ve felt the same even if I’d kept current with OS updates. So I’m going to gripe here in hopes someone who knows more than me will have recommendations to ameliorate my annoyance.

DragThing Dismay

This isn’t on Apple, but still, it’s a huge loss for me. I know I already complained about the lack of DragThing, but I really, really do miss what it did for me. You never know what you’ve got ’til it’s gone, right? But let me be clear about exactly what it did for me, which so far as I can tell no macOS application does, nor does macOS itself.

The way I used DragThing was to have a long shelf down the right side of my monitor containing small-but-recognizable icons representing my most-used folders (home directory, Downloads, Documents, Applications, a few other folders) and a number of applications. It stayed there all the time, and the icons were always there whether or not the application was running.

When I launched, say, Firefox, then there would be a little indicator next to its application icon in DragThing to indicate it was running. When I quit Firefox, the indicator went away but the Firefox icon stayed. And also, if I launched an application that wasn’t in the DragThing shelf, it did not add an icon for that application to the shelf. (I used the Dock at the bottom of the screen to show me that.)

There are super-powered application switchers available for macOS, but as far as I’ve seen, they only list the applications actually running. Launch an application, its icon is added. Quit an application, its icon disappears. None of these switchers let me keep persistent static one-click shortcuts to launch a variety of applications and open commonly-used folders.

Dock Folder Disgruntlement

Now I’m on to macOS itself. Given the previous problem, the Dock is the only thing available to me, and I have gripes about it. One of the bigger ones is rooted in folders kept on the Dock, to the right of the bar that divides them from the application icons. When I click on them, I get a popup (wince) or a Stack (shudder) instead of them just opening the target folder in the Finder.

In the Before Times, I could create an alias to the folder and drop that in the Dock, the icon in the Dock would look like the target folder, and clicking on the alias opened the folder’s window. If I do that now, the click-to-open part works, but the aliases all look like blank text documents with tiny arrows. What the hell?

If I instead add actual folders (not aliases) to the Dock, holding down ⌥⌘ (option-command) when I click them does exactly what I want. Only, I don’t want to have to hold down modifier keys, especially when using the trackpad. I’ve mostly adapted to the key combo, but even on desktop I still sometimes click a folder and blink in irritation at the popup thingy for a second before remembering that things are stupider now.

Translucency Tribulation

The other problem with the Dock is that mine is too opaque. That’s because the nearly-transparent Finder menu bar was really not doing it for me, so acting on a helpful tip, I went and checked the “Reduce Transparency” option in the Accessibility settings. That fixed the menu bar nicely, but it also made the Dock opaque, which I didn’t actually want. I can pretty easily live with it, but I do wish I could make just the menu bar opaque (without having to resort to desktop wallpaper hacks, which I suspect do not do well with changes of display resolution).

Shortcut Stupidity

And while I’m on the subject of the menu bar: no matter the application or even the Finder itself, dropdown menus from the menu bar render the actions you can do in black and the actions you can’t do in washed-out gray. Cool. But also, all the keyboard shortcuts are now a washed-out gray, which I keep instinctively thinking means they’ve been disabled or something. They’re also a lot more difficult for my older eyes to pick out, and I have to flick my eyes back and forth to make sure a given keyboard shortcut corresponds to a thing I actually can do. Seriously, Apple, what the hell?

Trash Can Troubles

I used to have the Trash can on the desktop, down in the lower right corner, and now I guess I can’t. I vaguely recall this is something DragThing made possible, so maybe that’s another reason to gripe about the lack of it, but it’s still bananas to me that the Trash can is not there by default. I understand that I may be very old.

Preview Problems

On my old machine, Preview was probably the most rock-solid application on there. On the new machine, Preview occasionally hangs on closing heavily-commented PDFs when I choose not to save changes. I can force-quit it and so far haven’t experienced any data corruption, but it’s still annoying.

Those are the things that have stood out the most to me about Ventura. How about you? What bothers you about your operating system (whichever one that is) and how would you like to see it fixed?

Oh, and I’ll follow this up soon with a post about what I like in Ventura, because it’s not all frowns and grumbles.

Ventura Vexations was published on Tuesday, April 4th, 2023.
It was assigned to the Commentary, Mac, and Rants categories.
There have been five replies.

Echoed Whisper

Published 2 years, 3 months past

The two videos I was using Whisper on have been published, so you can see for yourself how the captioning worked out. Designed as trade-show booth reel pieces, they’re below three minutes each, so watching both should take less than ten minutes, even with pauses to scrutinize specific bits of captioning.

As I noted in my previous post about this, I only had to make one text correction to the second video, plus a quick find-and-replace to turn “WPE WebKit” into “WPEWebKit”. For the first video, I did make a couple of edits beyond fixing transcription errors; specifically, I added the dashes and line breaks in this part of the final SubRip Subtitle (SRT) file uploaded to YouTube:

00:00:25,000 --> 00:00:32,000
- Hey tell me, is Michael coming out?
- Affirmative, Mike's coming out.

This small snippet actually embodies the two things where Whisper falls down a bit: multiple voices, and caption line lengths.

Right now, Whisper doesn’t even try to distinguish between different voices, the technical term for which is “speaker diarisation”. This means Whisper is ideal for transcribing, say, a conference talk or a single-narrator video. It’s a lot less useful for things like podcasts, because while it will probably get (nearly) all the words right, it won’t even throw in a marker that the voice changed, let alone try to tell which bits belong to a given voice. You have to go into the output and add those yourself, which for an hourlong podcast could be… quite the task.

There are requests for adding this to Whisper scattered in their GitHub discussions, but I didn’t see any open pull requests or mention of it in the README, so I don’t know if that’s coming or not. If you do, please leave a comment!
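Until Whisper handles speaker changes, the dashes have to go in by hand or by script. As a sketch, here’s a small Python helper that formats one SRT cue with a leading dash per speaker, assuming you’ve already split the text by voice yourself; srt_timestamp() and dialogue_cue() are hypothetical names, and the cue number passed in is arbitrary.

```python
from __future__ import annotations


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{ms:03}"


def dialogue_cue(index: int, start: float, end: float, lines: list[str]) -> str:
    """Build one SRT cue where each speaker's line starts with a dash.

    `lines` must already be split by speaker (by hand), since Whisper
    does not mark speaker changes.
    """
    body = "\n".join(f"- {line}" for line in lines)
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{body}\n"


print(dialogue_cue(1, 25, 32, ["Hey tell me, is Michael coming out?",
                               "Affirmative, Mike's coming out."]))
```

Called with the 25- and 32-second times from the snippet above, it produces the same dashed, two-line cue, preceded by the cue number that complete SRT files include.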

As for the length of captions, I agree with J David Eisenberg: Whisper too frequently errs on the side of “too long”. For example, here’s one of the bits Whisper output:

00:01:45,000 --> 00:01:56,000
Here is the dash.js player using MSE, running in a page, and using Widevine DRM to decrypt and play rights-managed video with EME, all fluidly.

That’s eleven seconds of static subtitling, with 143 characters of line length. The BBC recommends line lengths at or below 37 characters, and Netflix suggests a limit of 42 characters, with actual hard limits for a few languages. You can throw in line breaks to reduce line length, but should never have more than three lines, which wouldn’t be possible with 143 characters. But let’s be real, that 11-second caption really should be split in twain, at the absolute minimum.
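Splitting a cue in two means cutting the text somewhere sensible and dividing the time between the halves. A rough Python sketch, assuming a cut at the space nearest the middle and time shared out in proportion to character count, which is only a stand-in for real speech timing; split_cue() is a hypothetical helper.

```python
from __future__ import annotations


def split_cue(start: float, end: float, text: str) -> list[tuple[float, float, str]]:
    """Split one cue into two at the space closest to the middle of the text.

    The time is shared out in proportion to each half's character count,
    a crude stand-in for how long each half actually takes to say.
    """
    spaces = [i for i, ch in enumerate(text) if ch == " "]
    if not spaces:
        return [(start, end, text)]
    middle = len(text) / 2
    cut = min(spaces, key=lambda i: abs(i - middle))
    first, second = text[:cut], text[cut + 1:]
    boundary = start + (end - start) * len(first) / (len(first) + len(second))
    return [(start, boundary, first), (boundary, end, second)]


caption = ("Here is the dash.js player using MSE, running in a page, and using "
           "Widevine DRM to decrypt and play rights-managed video with EME, all fluidly.")
for cue_start, cue_end, cue_text in split_cue(105.0, 116.0, caption):  # 1:45 to 1:56
    print(f"{cue_start:6.2f} --> {cue_end:6.2f}  {cue_text}")
```

Applied to the 143-character caption above, it breaks after “Widevine” and gives each half a bit over five seconds. Each half is still longer than 42 characters, so both would still want a line break.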

Whisper does not, as of yet, have a way to request limiting caption lengths, either in time or in text. There is a fairly detailed discussion of this over on Whisper’s repository, with some code graciously shared by people working to address this, but it would be a lot better if Whisper accepted an argument to limit the length of any given bit of output. And also if it threw in line breaks on its own, say around 40 characters in English, even when not requested.
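Inserting line breaks, at least, is something Python’s standard textwrap module handles well. A minimal sketch, assuming the 42-character figure and the three-line ceiling mentioned above; wrap_caption() and fits_in_cue() are hypothetical helpers.

```python
from __future__ import annotations

import textwrap

LONG_CAPTION = (
    "Here is the dash.js player using MSE, running in a page, and using "
    "Widevine DRM to decrypt and play rights-managed video with EME, all fluidly."
)


def wrap_caption(text: str, width: int = 42) -> list[str]:
    """Break caption text into lines of at most `width` characters."""
    # Don't split words, and keep hyphenated terms like "rights-managed" whole.
    return textwrap.wrap(text, width=width, break_long_words=False, break_on_hyphens=False)


def fits_in_cue(text: str, width: int = 42, max_lines: int = 3) -> bool:
    """True if the wrapped text stays within `max_lines` lines."""
    return len(wrap_caption(text, width)) <= max_lines


for line in wrap_caption(LONG_CAPTION):
    print(line)
print(fits_in_cue(LONG_CAPTION))
```

On the 143-character caption, wrap_caption() produces four lines, so fits_in_cue() returns False: a sign the cue needs splitting in time, not just wrapping.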

The last thing I’d like to see improved is speed. It’s not terribly slow as is, to be clear. Using the default model size (small), which is what I used for the videos I wrote about, Whisper worked at about 2:1 speed: a two-minute video took about a minute to process. I tried the next size up, the medium model, and it worked at roughly 1:1.5 speed, taking about an hour fifteen to process a 46-minute video.

The thing is, all that is running solely on the CPU, which in my case is a 12-core M2. According to this pull request, problems in one of Whisper’s dependencies, PyTorch, mean GPU utilization is essentially unavailable on the hardware I have. (Thanks to Chris Adams for the pointer.) I expect that will be cleared up sooner or later, so the limitation feels minor.
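To put those speed figures in comparable terms, here’s a quick Python calculation of real-time factor (minutes of video transcribed per minute of processing) from the approximate numbers above; realtime_factor() is just a hypothetical helper name.

```python
def realtime_factor(media_minutes: float, processing_minutes: float) -> float:
    """Minutes of media transcribed per minute of processing; above 1 beats real time."""
    return media_minutes / processing_minutes


small = realtime_factor(2, 1)     # two-minute video, about a minute of processing
medium = realtime_factor(46, 75)  # 46-minute video, about an hour fifteen
print(f"small: {small:.2f}x, medium: {medium:.2f}x")
```

It prints 2.00x for the small model and 0.61x for medium; that is, the medium model needed roughly 1.6 minutes of processing per minute of video on this CPU-only setup, in the same ballpark as the rough 1:1.5 estimate.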

Overall, it’s a powerful tool, with accuracy I still find astounding, only coming up short in quality-of-life features that aren’t critical in some applications (transcribing a talk) or relatively easily worked around in others (hand-correcting caption length in short videos; using a small script to insert line breaks in longer videos). The lack of speaker diarisation is the real letdown for me, and definitely the hardest to work around, so I hope it gets addressed soon.

Echoed Whisper was published on Friday, March 31st, 2023.
It was assigned to the Tools category.
There have been two replies.

Peerless Whisper

Published 2 years, 3 months past

What happened was, I was hanging out in an online chatter channel when a little birdy named Bruce chirped about OpenAI’s Whisper and how he was using it to transcribe audio. And I thought, Hey, I have audio that needs to be transcribed. Brucie Bird also mentioned it would output text, SRT, and WebVTT formats, and I thought, Hey, I have videos I’ll need to upload with transcription to YouTube! And then he said you could run it from the command line, and I thought, Hey, I have a command line!

So off I went to install it and try it out, and immediately ran smack into some hurdles I thought I’d document here in case someone else has similar problems. All of this took place on my M2 MacBook Pro, though I believe most of the below should be relevant to anyone trying to do this at the command line.

The first thing I did was what the GitHub repository’s README recommended, which is:

$ pip install -U openai-whisper

That failed because I didn’t have pip installed. Okay, fair enough. I figured out how to install that, setting up an alias of python for python3 along the way, and then tried again. This time, the install started and then bombed out:

Collecting openai-whisper
  Using cached openai-whisper-20230314.tar.gz (792 kB)
  Installing build dependencies ...  done
  Getting requirements to build wheel ...  done
  Preparing metadata (pyproject.toml) ...  done
Collecting numba
  Using cached numba-0.56.4.tar.gz (2.4 MB)
  Preparing metadata (setup.py) ...  error
  error: subprocess-exited-with-error

…followed by some stack trace stuff, none of which was really useful until ten or so lines down, where I found:

RuntimeError: Cannot install on Python version 3.11.2; only versions >=3.7,<3.11 are supported.

In other words, the version of Python I have installed is too modern to run AI. What a world.
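If you want to catch this before pip gets partway through an install, comparing the interpreter’s version against the range in that error message is enough. A minimal Python sketch; the >=3.7,<3.11 bounds are specific to the numba release in the log above (newer numba releases may well support newer Pythons), and the function name is hypothetical.

```python
from __future__ import annotations

import sys


def supported_by_numba_0_56(version: tuple[int, ...]) -> bool:
    """True if `version` satisfies the >=3.7,<3.11 range from the error above."""
    return (3, 7) <= tuple(version[:2]) < (3, 11)


if not supported_by_numba_0_56(sys.version_info[:3]):
    print(
        f"Python {sys.version_info.major}.{sys.version_info.minor} is outside "
        ">=3.7,<3.11; install an older interpreter (pyenv is one option) first."
    )
```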

I DuckDucked around a bit and hit upon pyenv, which is, I guess, a way of installing and running older versions of Python without having to overwrite whatever version(s) you already have. I’ll skip over the error part of my trial-and-error process and give you the commands that made it all work:

$ brew install pyenv

$ pyenv install 3.10

$ export PATH="$HOME/.pyenv/shims:${PATH}"

$ pyenv local 3.10

$ pip install -U openai-whisper

That got Whisper to install. It didn’t take very long.

At that point, I wondered what I’d have to configure to transcribe something, and the answer turned out to be precisely zilch. Once the install was done, I dropped into the directory containing my MP4 video, and typed this:

$ whisper wpe-mse-eme-v2.mp4

Here’s what I got back. I’ve marked the very few errors.

[00:00.000 --> 00:07.000]  In this video, we'll show you several demos showcasing multi-media capabilities in WPE WebKit,
[00:07.000 --> 00:11.000]  the official port of the WebKit engine for embedded devices.
[00:11.000 --> 00:18.000]  Each of these demos are running on the low-powered Raspberry Pi 3 seen in the lower right-hand side of the screen here.
[00:18.000 --> 00:25.000]  Infotainment systems and media players often need to consume digital rights-managed videos.
[00:25.000 --> 00:32.000]  They tell me, is Michael coming out?  Affirmative, Mike's coming out.
[00:32.000 --> 00:45.000]  Here you can see just that, smooth streaming playback using encrypted media extensions, or EME, with PlayReady 4.
[00:45.000 --> 00:52.000]  Media source extensions, or MSE, are used by many players for greater control over playback.
[00:52.000 --> 01:00.000]  YouTube TV has a whole conformance test suite for this, which WPE has been passing since 2021.
[01:00.000 --> 01:09.000]  The loan exceptions here are those tests requiring hardware support not available on the Raspberry Pi 4, but available for other platforms.
[01:09.000 --> 01:16.000]  YouTube TV has a conformance test for EME, which WPE WebKit passes with flying colors.
[01:22.000 --> 01:40.000]  Music
[01:40.000 --> 01:45.000]  Finally, perhaps most impressively, we can put all these things together.
[01:45.000 --> 01:56.000]  Here is the dash.js player using MSE, running in a page, and using Widevine DRM to decrypt and play rights-managed video with EME all fluidly.
[01:56.000 --> 02:04.000]  Music
[02:04.000 --> 02:09.000]  Remember, all of this is being played back on the same low-powered Raspberry Pi 3.
[02:27.000 --> 02:34.000]  For more about WPE WebKit, please visit WPE WebKit.com.
[02:34.000 --> 02:42.000]  For more information about EGALIA, or to find out how we can help with your embedded device needs, please visit us at EGALIA.com.

I am, frankly, astonished. This has no business being as accurate as it is, for all kinds of reasons. There’s a lot of jargon and very specific terminology in there, and Whisper nailed pretty much every last bit of it, first time in, no special configuration, nothing. I didn’t even bump up the model size from the default of small. I felt a little like that Frollo guy in the animated Hunchback of Notre Dame meme yelling about sorcery or whatever.

True, the output isn’t absolutely perfect. Let’s review the glitches in reverse order. The last two errors, turning “Igalia” into “EGALIA”, seem fair enough given I didn’t specify that there would be languages other than English involved. I routinely have to spell it for my fellow Americans, so there’s no reason to think a codebase could do any better.

The space inserted into “WPEWebKit” (which happens throughout) is similarly understandable. I’m impressed it understood “WebKit” at all, never mind that it was properly capitalized and not-spaced.

As for the place where it says “Music”, which I’m counting as an error: that’s essentially an echoing countdown followed by a white-noise roar from rocket engines. There’s a “music today is just noise” joke in here somewhere, but I’m too hip to find it.

Whisper turning “lone” into “loan” doesn’t particularly faze me, given the difficulty of handling soundalike words. Hell, just yesterday, I was scribing a conference call and mistakenly recorded “gamut” as “gamma”, and those aren’t even technically homophones. They just sound like they are.

Rounding out the glitch tour, “Hey” got turned into “They”, which (given the audio quality of that particular part of the video) is still pretty good.

There is one other error I couldn’t mark because there’s nothing to mark, but if you scrutinize the timeline, you’ll see a gap from 02:09.000 to 02:27.000. In there, a short clip from a movie plays, with a brief dialogue between two characters in not-very-Dutch-accented English. It’s definitely louder and clearer than the 00:25.000 --> 00:32.000 bit, so I’m not sure why Whisper just skipped over it. Manually transcribing that part isn’t a big deal, but it’s odd to see it perform so flawlessly on every other piece of speech and then drop this completely on the floor.
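Gaps like that are easy to miss by eye in a long transcript. The sketch below assumes the console output has been saved to a file and that each segment line starts with "[start --> end]", as shown above. A short awk script can then flag any stretch between segments longer than a chosen number of seconds. find_gaps is a made-up name, and the 5-second default is arbitrary.

```shell
# Print gaps longer than a threshold (seconds; default 5) between consecutive
# segments of whisper console output, whose lines look like
#   [02:04.000 --> 02:09.000]  Remember, all of this is being played back...
find_gaps() {
  awk -v min="${1:-5}" '
    # Convert "MM:SS.mmm" or "HH:MM:SS.mmm" to seconds.
    function secs(t,    parts, n, i, total) {
      n = split(t, parts, ":")
      total = 0
      for (i = 1; i <= n; i++) total = total * 60 + parts[i]
      return total
    }
    # Only segment lines: first field starts with "[", second is the arrow.
    substr($1, 1, 1) == "[" && $2 == "-->" {
      start = substr($1, 2)                  # strip the leading "["
      stop = substr($3, 1, length($3) - 1)   # strip the trailing "]"
      s = secs(start)
      if (have_prev && s - prev_end > min)
        printf "gap: %s -> %s (%.1f s)\n", prev_stop, start, s - prev_end
      prev_end = secs(stop)
      prev_stop = stop
      have_prev = 1
    }
  '
}

# Example (file name hypothetical):
#   find_gaps 10 < transcript.txt
```

Run against the first transcript with the default threshold, this would also flag the six seconds between 01:16.000 and 01:22.000. With a 10-second threshold, only the 02:09.000 to 02:27.000 gap remains.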

Before posting, I decided to give Whisper another go, this time on a different video:

$ whisper wpe-gamepad-support-v3.mp4

This was the result, which has just one actual error:

[00:00.000 --> 00:13.760]  In this video, we demonstrate WPE WebKit's support for the W3C's GamePad API.
[00:13.760 --> 00:20.080]  Here we're running WPE WebKit on a Raspberry Pi 4, but any device that will run WPE WebKit
[00:20.080 --> 00:22.960]  can benefit from this support.
[00:22.960 --> 00:28.560]  The GamePad API provides a JavaScript interface that makes it possible for developers to access
[00:28.560 --> 00:35.600]  and respond to signals from GamePads and other game controllers in a simple, consistent way.
[00:35.600 --> 00:40.320]  Having connected a standard Xbox controller, we boot up the Raspberry Pi with a customized
[00:40.320 --> 00:43.040]  build route image.
[00:43.040 --> 00:48.560]  Once the device is booted, we run cog, which is a small, single window launcher made specifically
[00:48.560 --> 00:51.080]  for WPE WebKit.
[00:51.080 --> 00:57.360]  The window cog creates can be full screen, which is what we're doing here.
[00:57.360 --> 01:01.800]  The game is loaded from a website that hosts a version of the classic video arcade game
[01:01.800 --> 01:05.480]  Asteroids.
[01:05.480 --> 01:11.240]  Once the game has loaded, the Xbox controller is used to start the game and control the spaceship.
[01:11.240 --> 01:17.040]  All the GamePad inputs are handled by the JavaScript GamePad API.
[01:17.040 --> 01:22.560]  This GamePad support is now possible thanks to work done by Igalia in 2022 and is available
[01:22.560 --> 01:27.160]  to anyone who uses WPE WebKit on their embedded device.
[01:27.160 --> 01:32.000]  For more about WPE WebKit, please visit wpewebkit.com.
[01:32.000 --> 01:35.840]  For more information about Igalia, or to find out how we can help with your embedded device
[01:35.840 --> 01:39.000]  needs, please visit us at Igalia.com.

That should have been “buildroot”. Again, an entirely reasonable error. I’ve made at least an order of magnitude more typos writing this post than Whisper has in transcribing these videos. And this time, it got the spelling of Igalia correct. I didn’t make any changes between the two runs. It just… figured it out.
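If you would rather not leave spellings like Igalia or buildroot to chance, the whisper command line accepts options that make its choices explicit. The wrapper below is a hypothetical sketch. As far as I know, --model, --language, --initial_prompt, and --output_format are real whisper CLI options (whisper --help lists them), but nobody has tested whether an initial prompt would have fixed these particular errors. The WHISPER variable override exists only so the command can be previewed with echo.

```shell
# Hypothetical wrapper around the whisper CLI with explicit settings.
# --initial_prompt seeds the decoder with spellings we expect to hear;
# its effect on these specific errors is untested.
# Set WHISPER=echo to print the command instead of running it.
transcribe() {
  "${WHISPER:-whisper}" "$1" \
    --model small \
    --language en \
    --initial_prompt "WPEWebKit, Igalia, buildroot, cog, Raspberry Pi" \
    --output_format srt
}

# Example:
#   transcribe wpe-gamepad-support-v3.mp4
```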

I don’t have a lot to say about this other than, wow. Just WOW. This is some real Clarke’s Third Law stuff right here, and the technovertigo is Marianas deep.

Peerless Whisper was published on Thursday, March 23rd, 2023.
It was assigned to the Technovertigo, Today I Learned, and Tools categories.
There have been five replies.

A Leap of Decades

Published 2 years, 4 months past

I’ve heard it said there are two kinds of tech power users: the ones who constantly update to stay on the bleeding edge, and the ones who update only when absolutely forced to do so. I’m in the latter camp. If a program, setup, or piece of hardware works for me, I stick by it like it’s the last raft off a sinking island.

And so it has been for my early 2013 MacBook Pro, which has served me incredibly well across all those years and many continents, but was sliding into the software update chasm: some applications, and for that matter its operating system, could no longer be run on its hardware. Oh and also, the top row of letter keys was becoming unresponsive, in particular the E-R-T sequence. Which I kind of need if I’m going to be writing English text, never mind reloading pages and opening new browser tabs.

Stepping Up

An early 2013 MacBook Pro sitting on a desk next to the box of an early 2023 MacBook Pro, the latter illuminated by shafts of sunlight. — The grizzled old veteran on the verge of retirement and the fresh new recruit that just transferred in to replace them.

So on Monday, I dropped by the Apple Store and picked up a custom-built early 2023 MacBook Pro: M2 Max with 38 GPU cores, 64GB RAM, and 2TB SSD. (Thus quadrupling the active memory and nearly trebling the storage capacity of its predecessor.) I went with that balance, or perhaps imbalance, because I intend to have this machine last me another ten years, and in that time, RAM is more likely to be in demand than SSD. If I’m wrong about that, I can always plug in an external SSD. Many thanks to the many people in my Mastodon herd who nudged me in that direction.

I chose the 14” model over the 16”, so it is a wee bit smaller than my old 15” workhorse. The thing that surprises me is the new machine looks boxier, somehow. Probably it’s that the corners of the case are not nearly as rounded as those on the 2013 model, and I think the thickness ratio of display to body is closer to 1:1 than before. It isn’t a problem or anything, it’s just a thing that I notice. I’ll probably forget about it soon enough.

Some things I find mildly-to-moderately annoying:

DragThing doesn’t work any more. It had stopped being updated before the 64-bit revolution, never mind the shift to Apple silicon, so this was expected, but wow do I miss it. Like a stick-shift driver uselessly stomping the floorboards and blindly grasping air while driving an automatic car, I still flip the mouse pointer toward the right edge of the screen, where I kept my DragThing dock, before remembering it’s gone. I’ve looked at alternatives, but none of them seem like they’re meant as straight up replacements, so I’ve yet to commit to one. Maybe some day I’ll ask Daniel to teach me Swift so I can build my own. (Because I definitely need more demands on my time.)
The twisty arrows in the Finder to open and close folders don’t have enough visual weight. Really, the overall UI feels like a movie’s toy representation of an operating system, not an actual operating system. I mean, the visual presentation of the OS looks like something I would create, and brother, that is not a compliment.
The Finder’s menu bar has no visually distinct background. What the hell. No, seriously, what the hell? The Notch I’m actually okay with, but removing the distinction between the active area of the menu bar and the inert rest of the desktop seems… ill-advised. Do not like. HARK, A FIX: over on Mastodon, Cory Birdsong pointed me to “System Settings… > Accessibility > Display > Reduce Transparency”, which fixes this. Thanks, Cory!
I’m not used to the system default font(s) yet. I imagine that will come with time, but for now they still catch me here and there.
The alert and other system sounds are different, and I don’t like them. Sosumi.

Oh, and it’s weird to me that the Apple logo on the back of the display doesn’t glow. Not annoying, just weird.

Otherwise, I’m happy with it so far. Great display, great battery life, and the keyboard works!

Getting Migratory

The 2013 MBP was backed up nightly to a 1TB Samsung SSD, so that was how I managed the migration: plugged the SSD into the new MBP and let Migration Assistant do its thing. This got me 90% of the way there, really. The remaining 10% is what I’ll talk about in a bit, in case anyone else finds themselves in a similar situation.

The only major hardware hurdle I hit was that my Dell U2713HM monitor, also of mid-2010s vintage, seems to limit HDMI signals to 1920×1080 despite supposedly supporting HDMI 1.4, which caught me by surprise. When connected to a machine via DisplayPort, even my 2013 MBP, the Dell will go up to 2560×1440. The new MBP only has one HDMI port and three USB-C ports. Fortunately, the USB-C ports can be used as DisplayPorts, so I acquired a DisplayPort–to–USB-C cable and that fixed the situation right up.

Yes, I could upgrade to a monitor that supports USB-C directly, but the Dell is a good size for my work environment, it still looks pretty good, and did I mention I’m the cling-tightly-to-what-works kind of user?

Otherwise, in the hardware space, I’ll have to figure out how I want to manage connecting all the USB-A devices I have (podcasting microphone, wireless headset, desktop speaker, secondary HD camera, etc., etc.) to the USB-C ports. I expected that to be the case, just as I expected some applications would no longer work. I expect an adapter cable or two will be necessary, at least for a while.

Trouble Brewing

I said earlier that Migration Assistant got me 90% of the way to being switched over. Were I someone who doesn’t install stuff via the Terminal, I suspect it would have been 100% successful, but I’m not, so it wasn’t. As with the cables, I anticipated this would happen. What I didn’t expect was that covering that last 10% would take me only an hour or so of actual work, most of it spent waiting on downloads and installs.

First, the serious and quite unexpected problem: my version of Homebrew used the installation prefix from Intel-era Macs, one that could break newer packages. So, I needed to migrate Homebrew itself from /usr/local to /opt/homebrew. Some searching around indicated that the best way to do this was to uninstall Homebrew entirely, then install it fresh.
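A quick way to see whether a migrated Homebrew install is in this situation is to compare brew --prefix with the default for the machine's architecture: /opt/homebrew on Apple silicon, /usr/local on Intel Macs. The functions below are a hypothetical, macOS-only sketch. Note that a shell running under Rosetta reports x86_64 from uname -m even on Apple silicon.

```shell
# Default Homebrew prefix on macOS for a CPU architecture as reported by
# "uname -m": Apple silicon (arm64) uses /opt/homebrew, Intel uses /usr/local.
expected_brew_prefix() {
  case "$1" in
    arm64) echo /opt/homebrew ;;
    *)     echo /usr/local ;;
  esac
}

# Report whether the installed brew lives at the default prefix for this
# machine; returns 1 on a mismatch (e.g. an Intel-era install carried over).
check_brew_prefix() {
  want=$(expected_brew_prefix "$(uname -m)")
  if ! command -v brew >/dev/null 2>&1; then
    echo "brew not found (default prefix here would be $want)"
    return 0
  fi
  have=$(brew --prefix)
  if [ "$have" = "$want" ]; then
    echo "OK: Homebrew is at $have"
  else
    echo "Homebrew is at $have; the default for $(uname -m) is $want"
    return 1
  fi
}
```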

Okay, except that would also remove everything I’d installed with Homebrew. Which was maybe not as much as some of y’all, but it was still a fair number of fairly essential packages. When I ran brew list, I got over a hundred packages, of which most were dependencies. What I found through further searching was that brew leaves returns a list of the packages I’d installed, without their dependencies. Here’s what I got:

automake
bash
bison
chruby
ckan
cmake
composer
ffmpeg
gh
git
git-lfs
httpd
imagemagick
libksba
lynx
minetest
minimal-racket
pandoc
php
php@7.2
python@3.10
ruby
ruby-install
wget
yarn
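Conceptually, brew leaves answers one question: which installed formulae does no other installed formula depend on? The toy model below illustrates that idea. It is not Homebrew's implementation, which has more cases to handle. It assumes input in the "name: dep dep ..." line format that, as far as I know, brew deps --installed prints, and leaves_from_deps is a made-up name.

```shell
# Toy model of "leaves": read "formula: dep1 dep2 ..." lines on stdin and
# print the formulae that no other listed formula depends on, sorted.
leaves_from_deps() {
  awk -F': *' '
    {
      seen[$1] = 1                      # every formula that is installed
      n = split($2, deps, " ")
      for (i = 1; i <= n; i++) needed[deps[i]] = 1
    }
    END {
      for (f in seen) if (!(f in needed)) print f
    }
  ' | sort
}

# Example:
#   brew deps --installed | leaves_from_deps
```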

That felt a lot more manageable. After a bit more research, boiled down to its essentials, the New Brew Shuffle I came up with was:


$ brew leaves > brewlist.txt

$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/uninstall.sh)"

$ xcode-select --install

$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

$ xargs brew install < brewlist.txt

The above does elide a few things. In step two, the Homebrew uninstall script identified a bunch of directories that it couldn’t remove and that would have to be deleted manually. I saved all that to a text file (thanks to Warp’s “Copy output” feature) for later study, and pressed onward. I probably also had to sudo some of those steps; I no longer remember.

In addition to all the above, I elected to delete a few of the packages in brewlist.txt before I fed it back to brew install in the last step (things like ckan, left over from my Kerbal Space Program days) and to remove the version numbers from the PHP and Python entries. Overall, the process was pretty smooth. I just had to sit there and watch Homebrew chew through all the installs, including all the dependencies.
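That hand-editing step can also be scripted. This is a hypothetical sketch, and clean_brewlist and brewlist-clean.txt are made-up names. It drops unwanted packages by exact name and strips "@version" suffixes such as php@7.2 and python@3.10. It also removes the duplicates that stripping can create, since php@7.2 and php both become php. It assumes the unversioned names resolve to Homebrew's current default formulae, which I believe is true for php and python.

```shell
# Hypothetical cleanup of a saved "brew leaves" list before reinstalling.
# Usage: clean_brewlist INPUT OUTPUT [PACKAGE_TO_DROP ...]
clean_brewlist() {
  src=$1
  dest=$2
  shift 2
  drop=$(mktemp)
  printf '%s\n' "$@" > "$drop"          # one package name to drop per line
  grep -v -x -F -f "$drop" "$src" \
    | sed 's/@[0-9][0-9.]*$//' \
    | sort -u > "$dest"                 # strip "@version", then de-duplicate
  rm -f "$drop"
}

# Example:
#   clean_brewlist brewlist.txt brewlist-clean.txt ckan
#   xargs brew install < brewlist-clean.txt
```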

Cleanup

Once all the reinstalls from the last step had finished, I was left with a few things to clean up. For example, Python didn’t seem to have installed. Eventually I realized it had actually installed as python3 instead of just plain python, so that was mostly fine and I’m sure there’s a way to alias one to the other that I might get around to looking up one day.
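Here is one way to get that alias, as a sketch: a shell function in ~/.zshrc (or ~/.bashrc) that forwards plain python to python3. It uses a function rather than an alias because non-interactive bash does not expand aliases by default. It only affects shells that define it, though. Scripts with a "#!/usr/bin/env python" line need a real python on PATH. For that case, Homebrew's caveats for its Python formula (shown by brew info python) describe unversioned python and pip symlinks in the formula's libexec/bin directory, which can be added to PATH instead.

```shell
# Forward plain "python" to Homebrew's python3 in shells that source this.
# "command" bypasses functions and aliases, so this cannot call itself.
python() {
  command python3 "$@"
}
```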

Ruby also didn’t seem to reinstall cleanly: there was a library it was looking for that complained about the chip architecture, and attempts to overcome that spawned even more errors, and none of them were decipherable to me or my searches. Multiple attempts at uninstalling and then reinstalling Ruby through a variety of means, some with Homebrew, some other ways, either got me the same indecipherable errors or a whole new set of indecipherable errors. In the end, I just uninstalled Ruby, as I don’t actually use it for anything I’m aware of, and the default Ruby that comes with macOS is still there. If I run into some script I need for work that requires something more, I’ll revisit this, probably with many muttered imprecations.

Finally, httpd wasn’t working as intended. I could launch it with brew services start httpd, but the resulting server was pointing to a page that just said “It works!”, and not bringing up any of my local hosts. Eventually, I found where Homebrew had stuffed httpd and its various files, and then replaced its configuration files with my old configuration files. Then I went through the cycle of typing sudo apachectl start, addressing the errors it threw over directories or PHP installs or whatever by editing httpd.conf, and then trying again.

After only three or four rounds of that, everything was up and running as intended. As a bonus, I was able to mark httpd as a Login item in System Settings, so it will automatically come back up whenever I reboot! Which my old machine wouldn’t do, for some reason I never got around to figuring out.
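The swap-and-retry loop described above can be shortened a little with apachectl configtest, which checks the configuration syntax without starting the server. The helper below is a hypothetical sketch of the config swap. It assumes the Homebrew layout, where httpd.conf lives in "$(brew --prefix)/etc/httpd", and it keeps Homebrew's default file as a backup. macOS also ships its own apachectl in /usr/sbin, so which one runs depends on PATH order.

```shell
# Hypothetical helper: back up Homebrew's default httpd.conf, drop in the
# old machine's copy, then check syntax before trying to start the server.
# APACHECTL can be overridden (e.g. APACHECTL=true) for a dry run.
restore_httpd_conf() {
  old_conf=$1    # httpd.conf saved from the previous machine
  conf_dir=$2    # e.g. "$(brew --prefix)/etc/httpd" on a Homebrew install
  cp "$conf_dir/httpd.conf" "$conf_dir/httpd.conf.homebrew-default" &&
    cp "$old_conf" "$conf_dir/httpd.conf" &&
    "${APACHECTL:-apachectl}" configtest
}

# Example (old config path hypothetical):
#   restore_httpd_conf ~/old-mac/httpd.conf "$(brew --prefix)/etc/httpd"
```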

Now I just need to decide what to call this thing. The old MBP was “CoCo”, as in the TRS-80 Color Computer, meant as a wry commentary on the feel of the keyboard and a callback to the first home computer I ever used. That joke still works, but I’m thinking the new machine will be “C64” in honor of the first actually powerful home computer I ever used and its 64 kilobytes of RAM. There’s a pleasing echo between that and the 64 gigabytes of RAM I now have at my literal fingertips, four decades later.

Now that I’m up to date on hardware and operating system, I’d be interested to hear what y’all recommend for good quality-of-life improvement applications or configuration changes. Link me up!

A Leap of Decades was published on Thursday, February 23rd, 2023.
It was assigned to the Guide and Tech categories.
There have been twelve replies.
