Thoughts From Eric Archive

Nuclear Anchored Sidenotes

Published 1 year, 9 months past

Exactly one year ago today, which I swear is a coincidence I only noticed as I prepared to publish this, I posted an article on how I coded the footnotes for The Effects of Nuclear Weapons. In that piece, I mentioned that the footnotes I ended up using weren’t what I had hoped to create when the project first started. As I said in the original post:

Originally I had thought about putting footnotes off to one side in desktop views, such as in the right-hand grid gutter. After playing with some rough prototypes, I realized this wasn’t going to go the way I wanted it to…

I came back to this in my post “CSS Wish List 2023”, when I talked about anchor(ed) positioning. The ideal, which wasn’t really possible a year ago without a bunch of scripting, was to have the footnotes arranged structurally as endnotes, which we did, but in a way that would let me place the notes as sidenotes, next to their footnote references, whenever there was enough space to show them.

As it happens, that’s still not really possible without a lot of scripting today, unless you have:

A recent (as of late 2023) version of Chrome
With the “Experimental Web Platform features” flag enabled

With those things in place, you get experimental support for CSS anchor positioning, which lets you absolutely position an element in relation to any other element, anywhere in the DOM, essentially regardless of their markup relationship to each other, as long as they conform to a short set of constraints related to their containing blocks. You could reveal an embedded stylesheet and then position it next to the bit of markup it styles!

Anchoring Sidenotes

More relevantly to The Effects of Nuclear Weapons, I can enhance the desktop browsing experience by turning the popup footnotes into Tufte-style static sidenotes. So, for example, I can style the list items that contain the footnotes like this:

.endnotes li {
	position: absolute;
	top: anchor(top);
	bottom: auto;
	left: calc(anchor(--main right) + 0.5em);
	max-width: 23em;
}

A sidenote next to the main text column, with its number aligned with the referencing number found in the main text column.

Let me break that down. The position is absolute, and bottom is set to auto to override a previous bit of styling that’s needed in cases where a footnote isn’t being anchored. I also decided to restrain the maximum width of a sidenote to 23em, for no other reason than it looked right to me.

(A brief side note, pun absolutely intended: I’m using the physical-direction property top because the logical-direction equivalent in this context, inset-block-start, only gained full desktop cross-browser support a couple of years ago, and that’s only true if you ignore IE11’s existence. Plus, it arrived in several mobile browsers only this year, and I still fret about those kinds of things. Since this is desktop-centric styling, I should probably set a calendar reminder to fix these at some point in the future. Anyway, see MDN’s entry for more.)
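For when that calendar reminder fires, here is a rough sketch of what a logical-property version of the sidenote rule might look like. It assumes the anchor() function’s logical side keywords start and end, which resolve against the containing block’s writing mode in the current draft of the CSS Anchor Positioning specification; it has not been checked against the experimental Chrome build discussed here, so treat it as untested.

```css
/* Logical-property sketch of the sidenote rule.
   Assumption: anchor() accepts the logical side keywords start/end. */
.endnotes li {
	position: absolute;
	inset-block-start: anchor(start);                      /* was top: anchor(top) */
	inset-block-end: auto;                                 /* was bottom: auto */
	inset-inline-start: calc(anchor(--main end) + 0.5em);  /* was left: ... */
	max-inline-size: 23em;                                 /* was max-width */
}
```

In a left-to-right, horizontal writing mode this should behave the same as the physical version; the payoff would only show up in other writing modes.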

Now for the new and unfamiliar parts.

 top: anchor(top);

This sets the position of the top edge of the list item to be aligned with the top edge of its anchor’s box. What is a footnote’s anchor? It’s the corresponding superscripted footnote mark embedded in the text. How does the CSS know that? Well, the way I set things up (and this is not the only option for defining an anchor, but it’s the option that worked in this use case), the anchor is defined in the markup itself. Here’s what a footnote mark and its associated footnote look like, markup-wise.

explosion,<sup><a href="#fnote01" id="fn01">1</a></sup> although

<li id="fnote01" anchor="fn01"><sup>1</sup> … </li>

The important bits for anchor positioning are the id="fn01" on the superscripted link, and the anchor="fn01" on the list item: the latter establishes the element with an id of fn01 as the anchor for the list item. Any element can have an anchor attribute, thus creating what the CSS Anchor Positioning specification calls an implicit anchor. It’s explicit in the HTML, yes, but that makes it implicit to CSS, I guess. There’s even an implicit keyword, so I could have written this in my CSS instead:

 top: anchor(implicit top);

(There are ways to mark an element as an anchor and associate other elements with that anchor, without the need for any HTML. You don’t even need to have IDs in the HTML. I’ll get to that in a bit.)

Note that the superscripted link and the list item are just barely related, structurally speaking. Their closest common ancestor is the page’s single <main> element, which is the link’s fourth-great-grandparent, and the list item’s third-great-grandparent. That’s okay! Much as a <label> can be associated with an input element across DOM structures via its for attribute, any element can be associated with an anchoring element via its anchor attribute. In both cases, the value is an ID.

So anyway, that means the top edge of the endnote will be absolutely positioned to line up with the top edge of its anchor. Had I wanted the top of the endnote to line up with the bottom edge of the anchor, I would have said:

 top: anchor(bottom);

But I didn’t. With the top edges aligned, I now needed to drop the endnote into the space outside the main content column, off to its right. At first, I did it like this:

 left: anchor(--main right);

Wait. Before you think you can just automatically use HTML element names as anchor references, well, you can’t. That --main is what CSS calls a dashed-ident, as in a dashed identifier, and I declared it elsewhere in my CSS. To wit:

main {
	anchor-name: --main;
}

That assigns the anchor name --main to the <main> element in the CSS, no HTML attributes required. Using the name --main to identify the <main> element was me following the common practice of naming things for what they are. I could have called it --mainElement or --elMain or --main-column or --content or --josephine or --📕😉 or whatever I wanted. It made the most sense to me to call it --main, so that’s what I picked.

Having done that, I can use the edges of the <main> element as positioning referents for any absolutely (or fixed) positioned element. Since I wanted the left side of sidenotes to be placed with respect to the right edge of the <main>, I set their left to be anchor(--main right).

Thus, taking these two declarations together, the top edge of a sidenote is positioned with respect to the top edge of its implicit anchor, and its left edge is positioned with respect to the right edge of the anchor named --main.

	top: anchor(top);
	left: anchor(--main right);

Yes, I’m anchoring the sidenotes with respect to two completely different anchors, one of which is a descendant of the other. That’s okay! You can do that! Literally, you could position each edge of an anchored element to a separate anchor, regardless of how they relate to each other structurally.

Once I previewed the result of those declarations, I saw the sidenotes were too close to the main content, which makes sense: I had made the edges adjacent to each other.

Red borders showing the edges of the sidenote and the main column touching.
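Marks like the ones in that screenshot are easy to reproduce while experimenting. A throwaway rule along these lines (an assumption, not the styles actually used for the screenshot) outlines both boxes; outline is used rather than border so the debugging marks don’t shift the layout being inspected.

```css
/* Temporary debugging aid (hypothetical, not from the original styles):
   outlines are drawn without taking up space, so layout is unaffected */
main,
.endnotes li {
	outline: 1px solid red;
}
```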

I thought about using a left margin on the sidenotes to push them over, and that would work fine, but I figured what the heck, CSS has calculation functions and anchor functions can go inside them, and any engine supporting anchor positioning will also support calc(), so why not? Thus:

 left: calc(anchor(--main right) + 0.5em);
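For comparison, the margin-based approach mentioned above would look something like this sketch. For an absolutely positioned box, the left inset places the margin edge, so a left margin pushes the border box over by the same amount and the visual result should match the calc() version.

```css
/* Alternative sketch: keep the edges adjacent in the inset,
   then push the sidenote away from <main> with a margin */
.endnotes li {
	left: anchor(--main right);
	margin-left: 0.5em;
}
```

The calc() route keeps the whole offset in a single declaration; the margin route keeps the anchor expression itself simpler. Either should work.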

I wrapped those in a media query that only turned the footnotes into sidenotes at or above a certain viewport width, and wrapped that in a feature query so as to keep the styles away from non-anchor-position-understanding browsers, and I had the solution I’d envisioned at the beginning of the project!
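Structurally, that double wrapping might look something like the following sketch. The 60em breakpoint is a placeholder, since the actual width isn’t given here, and testing for anchor-name support is one plausible feature query rather than the published code.

```css
/* Only browsers that understand anchor positioning get sidenotes... */
@supports (anchor-name: --main) {
	/* ...and only when the viewport is wide enough (placeholder value) */
	@media (min-width: 60em) {
		main {
			anchor-name: --main;
		}
		.endnotes li {
			position: absolute;
			top: anchor(top);
			bottom: auto;
			left: calc(anchor(--main right) + 0.5em);
			max-width: 23em;
		}
	}
}
```

Browsers that fail either test simply keep the popup footnote styling.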

Except I didn’t.

Fixing Proximate Overlap

What I’d done was fine as long as the footnotes were well separated. Remember, these are absolutely positioned elements, so they’re out of the document flow. Since we still don’t have CSS Exclusions, there needs to be a way to deal with situations where there are two footnotes close to each other. Without one, you get this sort of thing.

Two sidenotes completely overlapping with each other. This will not do.

I couldn’t figure out how to fix this problem, so I did what you do these days, which is I posted my problem to social media. Pretty quickly, I got a reply from the brilliant Roman Komarov, pointing me at a Codepen that showed how to do what I needed, plus some very cool highlighting techniques. I forked it so I could strip it down to the essentials, which is all I really needed for my use case, and also have some hope of understanding it.

Once I’d worked through it all and applied the results to TEoNW, I got exactly what I was after.

The same two sidenotes, except now there is no overlap.

But how? It goes like this:

.endnotes li {
	position: absolute;
	anchor-name: --sidenote;
	top: max(anchor(top), calc(anchor(--sidenote bottom) + 0.67em));
	bottom: auto;
	left: calc(anchor(--main right) + 0.5em);
	max-width: 23em;
}

Whoa. That’s a lot of functions working together there in the top value. (CSS is becoming more and more functional, which I feel some kind of way about.) It can all be verbalized as, “the position of the top edge of the list item is either the same as the top edge of its anchor, or two-thirds of an em below the bottom edge of the previous sidenote, whichever is further down”.

The browser knows how to do this because the list items have all been given an anchor-name of --sidenote (again, that could be anything, I just picked what made sense to me). That means every one of the endnote list items will have that anchor name, and other things can be positioned against them.

Those styles mean that I have multiple elements bearing the same anchor name, though. When any sidenote is positioned with respect to that anchor name, it has to pick just one of the anchors. The specification says the named anchor that occurs most recently before the thing you’re positioning is what wins. Given my setup, this means an anchored sidenote will use the previous sidenote as the anchor for its top edge.

At least, it will use the previous sidenote as its anchor if the bottom of the previous sidenote (plus two-thirds of an em) is lower than the top edge of its implicit anchor. In a sense, every sidenote’s top edge has two anchors, and the max() function picks which one is actually used in every case.
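One defensive variant worth considering: the anchor() function accepts an optional fallback length as its last argument, which is used when no valid anchor can be resolved. The first sidenote, for instance, has no earlier --sidenote to stack beneath. This sketch supplies 0px as that fallback (an assumed value), so in that case max() just compares the implicit anchor’s top against 0.67em. Whether the experimental build discussed here actually needs this hasn’t been verified.

```css
.endnotes li {
	/* The second argument to anchor() is a fallback used only if no
	   earlier --sidenote can be resolved; 0px is an assumed value */
	top: max(anchor(top), calc(anchor(--sidenote bottom, 0px) + 0.67em));
}
```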

CSS, man.

Remember that all this is experimental, and the specification (and thus how anchor positioning works) could change. The best practices for accessibility are also not clear yet, from what I’ve been able to find. As such, this may not be something you want to deploy in production, even as a progressive enhancement. I’m holding off myself for the time being, which means none of the above is currently used in the published version of The Effects of Nuclear Weapons. If people are interested, I can create a Codepen to illustrate.

I do know this is something the CSS Working Group is working on pretty hard right now, so I have hopes that things will finalize soon and support will spread.

My thanks to Roman Komarov for his review of and feedback on this article. For more use cases of anchor positioning, see his lengthy (and quite lovely) article “Future CSS: Anchor Positioning”.

Nuclear Anchored Sidenotes was published on Tuesday, September 12th, 2023.
It was assigned to the CSS category.
There have been eight replies.

Memories of Molly

Published 1 year, 10 months past

The Web is a little bit darker today, a fair bit poorer: Molly Holzschlag is dead. She lived hard, but I hope she died easy. I am more sparing than most with my use of the word “friend”, and she was absolutely one. To everyone.

If you don’t know her name, I’m sorry. Too many didn’t. She was one of the first web gurus, a title she adamantly rejected (“We’re all just people, people!”), but it fit nevertheless. She was a groundbreaker, expanding and explaining the Web at its infancy. So many people, on hearing the mournful news, have described her as a force of nature, and that’s a title she would have accepted with pride. She was raucous, rambunctious, open-hearted, never ever close-mouthed, blazing with fire, and laughed (as she did everything) with her entire chest, constantly. She was giving and took and she hurt and she wanted to heal everyone, all the time. She was messily imperfect, would tell you so loudly and repeatedly, and gonzo in all the senses of that word. Hunter S. Thompson should have written her obituary.

I could tell so many stories. The time we were waiting to check into a hotel, talking about who knows what, and realized Little Richard was a few spots ahead of us in line. Once he’d finished checking in, Molly walked right over to introduce herself and spend a few minutes talking with him. An evening when a group of us had dinner on the top floor of a building in Chiba City and I got the unexpectedly fresh shrimp hibachi. The time she and I were chatting online about a talk or training gig, somehow got onto the subject of Nick Drake, and coordinated a playing of “Three Hours” just to savor it together. A night in San Francisco where the two of us went out for dinner before some conference or other, stopped at a bar just off Union Square so she could have a couple of drinks, and she got propositioned by the impressively drunk couple seated next to her after they’d failed to talk the two of us into hooking up. The bartender couldn’t stop laughing.

At SXSW 2005 with Dave Shea, her co-author on The Zen of CSS, and wearing an XFN shirt.

Standing outside Moscone Center in San Francisco with Cia Romano. I think this is that time we all got evacuated due to a fire alarm.

Or the time a bunch of us were gathered in New Orleans (again, some conference or other) and went to dinner at a jazz club, where we ended up seated next to the live jazz trio and she sang along with some of the songs. She had a voice like a blues singer in a cabaret, brassy and smoky and full of hard-won joys, and she used it to great effect standing in front of Bill Gates to harangue him about Internet Explorer. She raised it to fight like hell for the Web and its users, for the foundational principles of universal access and accessible development. She put her voice on paper in some three dozen books, and was working on yet another when she died. In one book, she managed to sneak past the editors an example that used a stick-figure Kama Sutra custom font face. She could never resist a prank, particularly a bawdy one, as long as it didn’t hurt anyone.

Holding court in somebody’s hotel suite, with a baby Matt Mullenweg in attendance.

Once again holding court, this time at a bar with Jason Santa Maria.

She made the trek to Cleveland at least once to attend and be part of the crew for one of our Bread and Soup parties. We put her to work rolling tiny matzoh balls and she immediately made ribald jokes about it, laughing harder at our one-up jokes than she had at her own. She stopped by the house a couple of other times over the years, when she was in town for consulting work, “Auntie Molly” to our eldest and one of my few colleagues to have spent any time with Rebecca. Those pictures were lost, and I still keenly regret that.

Rolling matzoh balls in our kitchen, *still* holding court.

On top of a bus somewhere in the world, probably London, with my partner Kat.

There were so many things about what the Web became that she hated, that she’d spent so much time and energy fighting to avert, but she still loved it for what it could be and what it had been originally designed to be. She took more than one fledgling web designer under her wing, boosted their skills and careers, and beamed with pride at their accomplishments. She told a great story about one, I think it was Dunstan Orchard but I could be wrong, and his afternoon walk through a dry Arizona arroyo.

I could go on for pages, but I won’t; if this were a toast and she were here, she would have long ago heckled me (affectionately) into shutting up. But if you have treasured memories of Molly, I’d love to hear them in the comments below, or on your own blog or social media or podcasts or anywhere. She loved stories. Tell hers.

Memories of Molly was published on Wednesday, September 6th, 2023.
It was assigned to the Personal and Web categories.
There have been forty replies.

Designing the Igalia Chats Logo

Published 1 year, 10 months past

One of the things I’ve been doing at Igalia of late is podcasting with Brian Kardell. It’s called “Igalia Chats”, and last week, I designed it a logo. I tried out a number of different ideas, ran them past the Communication team for feedback, and settled on this one.

D&AD Awards committee, you know where to find me.

And there you have it, the first logo I’ve designed in… well, in quite a while. My work this time around was informed by a few things.

Podcast apps, sites, etc. expect a square image for the podcast’s logo. This doesn’t mean you have to make the visible part of it square, exactly, but it does mean any wide-and-short logo will simultaneously feel cramped and lost in a vast void. Or maybe just very far away. The version shown in this post is not the square version, because this is not a podcast app and because I could. The square version just adds more empty whitespace at the top and bottom, anyway.
I couldn’t really alter the official logo in any major way: the brand guidelines are pretty strong and shouldn’t be broken without collective approval. Given the time that would take, I decided to just work with the logo as-is, and think about possible variants (say, the microphone icon in the blank diamond of the logo) in a later stage. I did think about just not using the official logo at all, but that felt like it would end up looking too generic. Besides, we have a pretty nifty logo there, so why not use it?
A typeface for the word “Chats” that works well with Igalia’s official logo. I used Etelka, which is a font we already use on the web site, and I think is the basis of the semi-serifed letters in the official logo anyway. Though I could be wrong about that; while I definitely have opinions about typefaces these days, I’m not very good at identifying them, or being able to distinguish between two similar fonts. Call it typeface blindness.
Using open-source resources where possible; thus, the microphone icon came from The Noun Project. I then modified it a bit (rounded the linecaps, shortened the pickup’s brace) to balance its visual weight with the rest of the design, and not crowd the letters too much. I also added a subtle vertical gradient to the icon, which helped the word “Chats” to stand out a little more. Gotta make the logo pop, donchaknow?

There are probably some adjustments I’ll make after a bit of time, but I was determined not to let perfect be the enemy of shipping. As for how I came to create the logo, you’re probably thinking fancy CSS Grid layout and custom fonts and all that jazz, but no, I just dumped everything into Keynote and fiddled with ideas until I had some I liked. It’s not a fantastic environment for this sort of work, I expect, but it’s Good Enough For Me™.

So, if you’re subscribed to Igalia Chats via your listening channel of choice, you should be seeing a new logo. If you aren’t subscribed… try us, won’t you? Brian and I talk about a lot of web-related stuff with a lot of really interesting people: most recently, with Kilian Valkhof about the web development application Polypane, with Stephen Shankland about undersea data cables, with Zach Leatherman about open-source work and funding, and many more. Plus sometimes we just talk with each other about what’s new in Web land, things like Google Baseline or huge WebKit updates. And, yes, sometimes we talk about what Igalia is up to, like our work on the Servo engine or the Steam Deck.

This is one of the things I quite enjoy about working for Igalia: the way I can draw upon all the things I’ve learned over my many (many) years to create different things. A logo last week, a thumbnail-building tool the week before, writing news posts, recording podcasts, doing audio production, figuring out transcription technology, and on and on and on. It can sometimes be frustrating in the way all work can be, but it rarely gets boring. (And if that sounds good to you, we are hiring for a number of roles!)

Designing the Igalia Chats Logo was published on Tuesday, August 22nd, 2023.
It was assigned to the Design and Work categories.
There have been no replies.

First-Person Scrollers

Published 2 years, 2 weeks past

I’ve played a lot of video games over the years, and the thing that just utterly blows my mind about them is how every frame is painted from scratch. So in a game running at 30 frames per second, everything in the scene has to be calculated and drawn every 33 milliseconds, no matter how little or much has changed from one frame to the next. In modern games, users generally demand 60 frames per second. So everything you see on-screen gets calculated, placed, colored, textured, shaded, and what-have-you in 16 milliseconds (or less). And then, in the next 16 milliseconds (or less), it has to be done all over again. And there are games that render the entire scene in a single-digit number of milliseconds!
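The per-frame budgets here come straight from dividing one second by the frame rate $f$:

```latex
t_{\text{frame}} = \frac{1000\ \text{ms}}{f},
\qquad
t_{30} = \frac{1000}{30} \approx 33.3\ \text{ms},
\qquad
t_{60} = \frac{1000}{60} \approx 16.7\ \text{ms}
```

So “16 milliseconds” is the 60fps budget rounded down, and any rendering work that runs longer than that risks a dropped frame.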

I mean, I’ve done some simple 3D render coding in my day. I’ve done hobbyist video game development; see Gravity Wars, for example (which I really do need to get back to and make less user-hostile). So you’d think I’d be used to this concept, but somehow, I just never get there. My pre-DOS-era brain rebels at the idea that everything has to be recalculated from scratch every frame, and doubly so that such a thing can be done in such infinitesimal slivers of time.

So you can imagine how I feel about the fact that web browsers operate in exactly the same way, and with the same performance requirements.

Maybe this shouldn’t come as a surprise. After all, we have user interactions and embedded videos and resizable windows and page scrolling and stuff like that, never mind CSS animations and DOM manipulation, so the viewport often needs to be re-rendered to reflect the current state of things. And to make all that feel smooth like butter, browser engines have to be able to display web pages at a minimum of 60 frames per second.

Admittedly, this would be a popular UI for browsing social media.

This demand touches absolutely everything, and shapes the evolution of web technologies in ways I don’t think we fully appreciate. You want to add a new selector type? It has to be performant. This is what blocked :has() (and similar proposals) for such a long time. It wasn’t difficult to figure out how to select ancestor elements — it was very difficult to figure out how to do it really, really fast, so as not to lower typical rendering speed below that magic 60fps. The same logic applies to new features like view transitions, or new filter functions, or element exclusions, or whatever you might dream up. No matter how cool the idea, if it bogs rendering down too much, it’s a non-starter.
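For anyone who hasn’t used it yet, :has() lets a selector match an element based on what it contains, which is the “ancestor selection” referred to above. A minimal sketch, with a purely illustrative rule:

```css
/* Match any figure that contains a figcaption, i.e. select the
   ancestor based on its descendant (illustrative example only) */
figure:has(figcaption) {
	border: 1px solid gray;
}
```

The performance challenge is that a change deep inside the document can alter whether some ancestor matches, so the engine has to work out which ancestors to re-check without slowing every frame down.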

I should note that none of this is to say it’s impossible to get a browser below 60fps: pile on enough computationally expensive operations and you’ll still jank like crazy. It’s more that the goal is to keep any new feature from dragging rendering performance down too far in reasonable situations, both alone and in combination with already-existing features. What constitutes “down too far” and “reasonable situations” is honestly a little opaque, but that’s a conversation slash vigorous debate for another time.

I’m sure the people who’ve worked on browser engines have fascinating stories about what they do internally to safeguard rendering speed, and ideas they’ve had to spike because they were performance killers. I would love to hear those stories, if any BigCo devrel teams are looking for podcast ideas, or would like to guest on Igalia Chats. (We’d love to have you on!)

Anyway, the point I’m making is that performance isn’t just a matter of low asset sizes and script tuning and server efficiency. It’s also a question of the engine’s ability to redraw the contents of the viewport, no matter what changes for whatever reason, with reasonable anticipation of things that might affect the rendering, every 16 milliseconds (or less), over and over and over and over and over again, just so we can scroll our web pages smoothly. It’s kind of bananas, and yet, it also makes sense. Welcome to the web.

First-Person Scrollers was published on Tuesday, June 20th, 2023.
It was assigned to the Browsers and Web categories.
There has been one reply.

From ABC’s to 9999999

Published 2 years, 2 months past

The other week I crossed a midpoint, of sorts: as I was driving home from a weekly commitment, my iPhone segued from Rush’s “Mystic Rhythms” to The Seatbelts’ “N.Y. Rush”, which is, lexicographically speaking, the middle of my iTu — oh excuse me, the middle of my Music Dot App Library, where I passed from the “M” songs into the “N” songs.

See, about a year or so ago, I took inspiration from Kevin Smokler to set about listening through my entire music library alphabetically by song title. Thus, I started with “ABC’s” by K’naan and will end, probably in a year or so, with “9999999” by Mike Morasky (a.k.a. Aperture Science Psychoacoustics Laboratory).

Every time I have to drive my car for more than a few minutes, I’ll plug in my iPhone and continue the listen from where I left off. This mainly happens during the aforementioned weekly commitment, which usually sees me driving for an hour or so. I also listen to it while I’m doing chores around the house like installing ceiling fans or diagnosing half-dead Christmas light strings.

This sort of listen is, in many ways, like listening to the entire library on shuffle, because, as Jared Spool used to point out (and probably still does), alphabetically sorting a long list of things is indistinguishable from having it randomized. For me, the main difference between alphabetical and random is that it’s a lot easier to pick back up where you left off when working through alphabetically. (Yes, Music Dot App should do that automatically, but sometimes it forgets where it was.) You can also be a lot more certain that every song gets a listen, something that’s harder to ensure if you’re listening to a random shuffle of a couple thousand tracks and your software loses its place.

There are other advantages: sometimes, artists will use the same song title, and you get interesting combinations. For example, there was “America”, which gave me a song by K’naan and then a same-titled, but very different, song by Spinal Tap. Similarly, there are titular combinations that pop out, like “Come On” by The Goo Goo Dolls, “Come On In, The Dreams Are Fine” by Deee-Lite, and “Come On Over” by Elana Stone.

Some of these combinations groove, some delight, some earn the stank face, and some make me literally laugh out loud. And some aren’t related by title but still go together really, really well. A recent example was the segue from The Prodigy’s “Narayan” to Radiohead’s “The National Anthem”, which sonically flowed just right at the switchover, almost like they’d been composed to have that effect. It made this old long-ago radio DJ smile.

I say I took inspiration from Kevin because my listen has a couple of differences to his:

Kevin has a “no skips, ever” rule, but I will skip songs that are repeats. This happens a lot when you have both live and studio albums, as I do for a few artists (particularly Rush), or have copied tracks for lightly-engineered playlists, as I have a few times. That said, if I have a song by one artist and a cover of that song by another, I don’t skip either of them. For remixes or alternate recordings of a song by the same artist, I generally don’t skip, unless the remix is just the original song with a vaguely different beat track.
I filtered out most of my classical content before starting. This is not because I dislike classical, but because they tend to sort together in unrelenting clumps (all of Beethoven’s and Mozart’s symphonies one after another after another, for example), and I wanted a varietal mix. I did keep “classical” albums like Carreras Domingo Pavarotti in Concert and Carmina Burana because they have normal-length tracks with titles that scatter them throughout the sort. The same reasoning was used to retain classic film and TV scores, even if I was stretching it a bit to leave in The Music of Cosmos (the 1980 one), which prefixes all its tracks with Roman numerals… but each track is a medley, so it got a pass. The whole-album-in-a-single-MP3 The Music of Osmos, on the other hand, did not.

All that said, I have a much shorter road than Kevin: he has a library of over twelve thousand tracks, whereas my slightly-filtered library is just shy of 2,500 tracks, or right around 160 hours. The repeated-song skips knock the total time down a bit, probably by a few hours but not much more than that. So, figuring an average of 80 minutes per week, that’s about 120 weeks, or two years and four months to get from beginning to end.
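Spelled out, the estimate works like this (using the 160-hour figure before repeat skips, so it errs slightly long):

```latex
160\ \text{h} \times 60\ \tfrac{\text{min}}{\text{h}} = 9600\ \text{min},
\qquad
\frac{9600\ \text{min}}{80\ \text{min/week}} = 120\ \text{weeks} \approx \frac{120}{52}\ \text{years} \approx 2.3\ \text{years}
```

And 0.3 of a year is a little under four months, which is where “two years and four months” comes from.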

And what will I do when I reach the end? Probably go back to better curate the sorting (e.g., configuring Soundgarden’s “4th of July” to be sorted as “Fourth of July”), create a playlist that cuts out the repeats ahead of time, and start over. But we’ll see when I get there. Maybe next time I’ll listen to it in reverse alphabetical order instead.

From ABC’s to 9999999 was published on Monday, April 10th, 2023.
It was assigned to the Personal category.
There has been one reply.

Ventura Vexations

Published 2 years, 3 months past

I’ve been a bit over a month now on my new 14” MacBook Pro, and I have complaints. Not about the hardware, which is solid yet lightweight, super-quiet yet incredibly fast and powerful, long-lived on battery, and decent enough under the fingertips. Plus, all the keyboard keys Just Work™, unlike the MBP it replaced! So that’s nice.

No, my complaints are entirely about the user environment. At first I thought this was because I skipped directly from OS X 10.14 to macOS 13, and simply wasn’t used to How The Kids Do Things These Days®, but apparently I would’ve felt the same even if I’d kept current with OS updates. So I’m going to gripe here in hopes someone who knows more than me will have recommendations to ameliorate my annoyance.

DragThing Dismay

This isn’t on Apple, but still, it’s a huge loss for me. I know I already complained about the lack of DragThing, but I really, really do miss what it did for me. You never know what you’ve got ’til it’s gone, right? But let me be clear about exactly what it did for me, which so far as I can tell no macOS application does, nor does macOS itself.

The way I used DragThing was to have a long shelf down the right side of my monitor containing small-but-recognizable icons representing my most-used folders (home directory, Downloads, Documents, Applications, a few other folders) and a number of applications. It stayed there all the time, and the icons were always there whether or not the application was running.

When I launched, say, Firefox, then there would be a little indicator next to its application icon in DragThing to indicate it was running. When I quit Firefox, the indicator went away but the Firefox icon stayed. And also, if I launched an application that wasn’t in the DragThing shelf, it did not add an icon for that application to the shelf. (I used the Dock at the bottom of the screen to show me that.)

There are super-powered application switchers available for macOS, but as far as I’ve seen, they only list the applications actually running. Launch an application, its icon is added. Quit an application, its icon disappears. None of these switchers let me keep persistent static one-click shortcuts to launch a variety of applications and open commonly-used folders.

Dock Folder Disgruntlement

Now I’m on to macOS itself. Given the previous problem, the Dock is the only thing available to me, and I have gripes about it. One of the bigger ones is rooted in folders kept on the Dock, to the right of the bar that divides them from the application icons. When I click on them, I get a popup (wince) or a Stack (shudder) instead of them just opening the target folder in the Finder.

In the Before Times, I could create an alias to the folder and drop that in the Dock, the icon in the Dock would look like the target folder, and clicking on the alias opened the folder’s window. If I do that now, the click-to-open part works, but the aliases all look like blank text documents with tiny arrows. What the hell?

If I instead add actual folders (not aliases) to the Dock, holding down ⌥⌘ (option-command) when I click them does exactly what I want. Only, I don’t want to have to hold down modifier keys, especially when using the trackpad. I’ve mostly adapted to the key combo, but even on desktop I still sometimes click a folder and blink in irritation at the popup thingy for a second before remembering that things are stupider now.

Translucency Tribulation

The other problem with the Dock is that mine is too opaque. That’s because the nearly-transparent Finder menu bar was really not doing it for me, so acting on a helpful tip, I went and checked the “Reduce Transparency” option in the Accessibility settings. That fixed the menu bar nicely, but it also made the Dock opaque, which I didn’t actually want. I can pretty easily live with it, but I do wish I could make just the menu bar opaque (without having to resort to desktop wallpaper hacks, which I suspect do not do well with changes of display resolution).

Shortcut Stupidity

And while I’m on the subject of the menu bar: no matter the application or even the Finder itself, dropdown menus from the menu bar render the actions you can do in black and the actions you can’t do in washed-out gray. Cool. But also, all the keyboard shortcuts are now a washed-out gray, which I keep instinctively thinking means they’ve been disabled or something. They’re also a lot more difficult for my older eyes to pick out, and I have to flick my eyes back and forth to make sure a given keyboard shortcut corresponds to a thing I actually can do. Seriously, Apple, what the hell?

Trash Can Troubles

I used to have the Trash can on the desktop, down in the lower right corner, and now I guess I can’t. I vaguely recall this is something DragThing made possible, so maybe that’s another reason to gripe about the lack of it, but it’s still bananas to me that the Trash can is not there by default. I understand that I may be very old.

Preview Problems

On my old machine, Preview was probably the most rock-solid application on there. On the new machine, Preview occasionally hangs on closing heavily-commented PDFs when I choose not to save changes. I can force-quit it and so far haven’t experienced any data corruption, but it’s still annoying.

Those are the things that have stood out the most to me about Ventura. How about you? What bothers you about your operating system (whichever one that is) and how would you like to see it fixed?

Oh, and I’ll follow this up soon with a post about what I like in Ventura, because it’s not all frowns and grumbles.

Ventura Vexations was published on Tuesday, April 4th, 2023.
It was assigned to the Commentary, Mac, and Rants categories.
There have been five replies.

Echoed Whisper

Published 2 years, 3 months past

The two videos I was using Whisper on have been published, so you can see for yourself how the captioning worked out. Designed as trade-show booth reel pieces, they’re below three minutes each, so watching both should take less than ten minutes, even with pauses to scrutinize specific bits of captioning.

As I noted in my previous post about this, I only had to make one text correction to the second video, plus a quick find-and-replace to turn “WPE WebKit” into “WPEWebKit”. For the first video, I did make a couple of edits beyond fixing transcription errors; specifically, I added the dashes and line breaking in this part of the final SubRip Subtitle (SRT) file uploaded to YouTube:

00:00:25,000 --> 00:00:32,000
- Hey tell me, is Michael coming out?
- Affirmative, Mike's coming out.

This small snippet actually embodies the two things where Whisper falls down a bit: multiple voices, and caption line lengths.

Right now, Whisper doesn’t even try to distinguish between different voices, the technical term for which is “speaker diarisation”. This means Whisper is ideal for transcribing, say, a conference talk or a single-narrator video. It’s a lot less useful for things like podcasts, because while it will probably get (nearly) all the words right, it won’t even throw in a marker that the voice changed, let alone try to tell which bits belong to a given voice. You have to go into the output and add those yourself, which for an hourlong podcast could be… quite the task.

There are requests for adding this to Whisper scattered in their GitHub discussions, but I didn’t see any open pull requests or mention of it in the README, so I don’t know if that’s coming or not. If you do, please leave a comment!
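Until then, speaker markers have to be added by hand, like the dashes in the cue above. For anyone doing that in bulk, here’s a minimal Python sketch of how such a two-speaker cue could be generated. It assumes the usual SRT layout (a sequence number, a timing line with comma-separated milliseconds, then the text lines). The function names and the cue number are hypothetical and have nothing to do with Whisper itself.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def dialogue_cue(index: int, start: float, end: float, lines: list[str]) -> str:
    """Build one SRT cue; when more than one speaker talks during the cue,
    prefix each speaker's line with a dash (a common captioning convention)."""
    if len(lines) > 1:
        lines = [f"- {line}" for line in lines]
    timing = f"{srt_timestamp(start)} --> {srt_timestamp(end)}"
    return "\n".join([str(index), timing, *lines]) + "\n"


if __name__ == "__main__":
    # Hypothetical cue number; the timing and text match the snippet above.
    print(dialogue_cue(5, 25.0, 32.0, [
        "Hey tell me, is Michael coming out?",
        "Affirmative, Mike's coming out.",
    ]))
```

Deciding *where* the voice changes is still the human’s job; this only takes care of the formatting once you know.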

As for the length of captions, I agree with J David Eisenberg: Whisper too frequently errs on the side of “too long”. For example, here’s one of the bits Whisper output:

00:01:45,000 --> 00:01:56,000
Here is the dash.js player using MSE, running in a page, and using Widevine DRM to decrypt and play rights-managed video with EME, all fluidly.

That’s eleven seconds of static subtitling, with 143 characters of line length. The BBC recommends line lengths at or below 37 characters, and Netflix suggests a limit of 42 characters, with actual hard limits for a few languages. You can throw in line breaks to reduce line length, but should never have more than three lines, which wouldn’t be possible with 143 characters. But let’s be real, that 11-second caption really should be split in twain, at the absolute minimum.
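Splitting a cue like that doesn’t take much logic. What follows is a rough Python sketch, not anything Whisper does: it breaks the text at the space closest to its midpoint and divides the time span in proportion to each half’s character count. That rests on the assumption that the speaking rate stays roughly constant across the cue, which won’t always hold. The function name is hypothetical.

```python
def split_cue(start: float, end: float, text: str) -> list[tuple[float, float, str]]:
    """Split one caption into two at the space nearest the middle of the text,
    sharing the duration in proportion to each half's length."""
    middle = len(text) // 2
    spaces = [i for i, ch in enumerate(text) if ch == " "]
    if not spaces:
        return [(start, end, text)]  # a single word; nothing to split on
    cut = min(spaces, key=lambda i: abs(i - middle))
    first, second = text[:cut], text[cut + 1:]
    # Assumption: speech rate is roughly constant across the cue.
    boundary = start + (end - start) * len(first) / (len(first) + len(second))
    return [(start, boundary, first), (boundary, end, second)]


if __name__ == "__main__":
    text = ("Here is the dash.js player using MSE, running in a page, and using "
            "Widevine DRM to decrypt and play rights-managed video with EME, all fluidly.")
    # 00:01:45 to 00:01:56 expressed in seconds.
    for part in split_cue(105.0, 116.0, text):
        print(part)
```

Each half of that particular caption is still longer than 42 characters, so it would also need a line break, but at least neither piece sits on screen for eleven seconds.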

Whisper does not, as of yet, have a way to request limiting caption lengths, either in time or in text. There is a fairly detailed discussion of this over on Whisper’s repository, with some code graciously shared by people working to address this, but it would be a lot better if Whisper accepted an argument to limit the length of any given bit of output. And also if it threw in line breaks on its own, say around 40 characters in English, even when not requested.
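In the meantime, inserting line breaks after the fact is a job for a short standard-library script. Here’s a minimal sketch, assuming a well-formed SRT file in which cues are separated by blank lines and each cue has a sequence number and a timing line before its text. The 42-character default is borrowed from the Netflix guideline mentioned above, and `rewrap_srt` is a hypothetical name. It only rewraps text; it doesn’t split cues that end up with too many lines, and it would merge two-speaker lines that start with dashes, so cues like that still need a human eye.

```python
import textwrap


def rewrap_srt(srt: str, width: int = 42) -> str:
    """Rewrap the text of every SRT cue so no line exceeds `width` characters.

    Assumes each cue is: sequence number, timing line, one or more text lines,
    with cues separated by blank lines.
    """
    srt = srt.replace("\r\n", "\n")  # tolerate Windows line endings
    cues = []
    for block in srt.strip().split("\n\n"):
        lines = block.splitlines()
        header, text_lines = lines[:2], lines[2:]
        # Re-join the existing text, then break it again at spaces only,
        # keeping hyphenated words like "rights-managed" intact.
        wrapped = textwrap.wrap(" ".join(text_lines), width=width,
                                break_long_words=False, break_on_hyphens=False)
        cues.append("\n".join(header + wrapped))
    return "\n\n".join(cues) + "\n"
```

Run over the 143-character caption above, this produces at least four lines, which is one more reason that cue needed splitting rather than just wrapping.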

The last thing I’d like to see improved is speed. It’s not terribly slow as is, to be clear. Using the default model size (small), which is what I used for the videos I wrote about, Whisper worked at about 2:1 speed: a two-minute video took about a minute to process. I tried the next size up, the medium model, and it worked at roughly 1:1.5 speed, taking about an hour fifteen to process a 46-minute video.

The thing is, all that is running solely on the CPU, which in my case is a 12-core M2. According to this pull request, problems in one of Whisper’s dependencies, PyTorch, mean GPU utilization is essentially unavailable on the hardware I have. (Thanks to Chris Adams for the pointer.) I expect that will be cleared up sooner or later, so the limitation feels minor.

Overall, it’s a powerful tool, with accuracy I still find astounding, only coming up short in quality-of-life features that aren’t critical in some applications (transcribing a talk) or relatively easily worked around in others (hand-correcting caption length in short videos; using a small script to insert line breaks in longer videos). The lack of speaker diarisation is the real letdown for me, and definitely the hardest to work around, so I hope it gets addressed soon.

Echoed Whisper was published on Friday, March 31st, 2023.
It was assigned to the Tools category.
There have been two replies.

Peerless Whisper

Published 2 years, 3 months past

What happened was, I was hanging out in an online chatter channel when a little birdy named Bruce chirped about OpenAI’s Whisper and how he was using it to transcribe audio. And I thought, Hey, I have audio that needs to be transcribed. Brucie Bird also mentioned it would output text, SRT, and WebVTT formats, and I thought, Hey, I have videos I’ll need to upload with transcription to YouTube! And then he said you could run it from the command line, and I thought, Hey, I have a command line!

So off I went to install it and try it out, and immediately ran smack into some hurdles I thought I’d document here in case someone else has similar problems. All of this took place on my M2 MacBook Pro, though I believe most of the below should be relevant to anyone trying to do this at the command line.

The first thing I did was what the GitHub repository’s README recommended, which is:

$ pip install -U openai-whisper

That failed because I didn’t have pip installed. Okay, fair enough. I figured out how to install that, setting up an alias of python for python3 along the way, and then tried again. This time, the install started and then bombed out:

Collecting openai-whisper
  Using cached openai-whisper-20230314.tar.gz (792 kB)
  Installing build dependencies ...  done
  Getting requirements to build wheel ...  done
  Preparing metadata (pyproject.toml) ...  done
Collecting numba
  Using cached numba-0.56.4.tar.gz (2.4 MB)
  Preparing metadata (setup.py) ...  error
  error: subprocess-exited-with-error

…followed by some stack trace stuff, none of which was really useful until ten or so lines down, where I found:

RuntimeError: Cannot install on Python version 3.11.2; only versions >=3.7,<3.11 are supported.

In other words, the version of Python I have installed is too modern to run AI. What a world.
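That error message encodes a version range, `>=3.7,<3.11`, which is why 3.11.2 got turned away at the numba step. To check whether a given interpreter falls inside a range like that, Python’s element-by-element tuple comparison on `sys.version_info` does the job. This is a hypothetical helper that hard-codes the bounds from the error above; it isn’t something pip or numba provides.

```python
import sys


def in_supported_range(version=None, minimum=(3, 7), below=(3, 11)) -> bool:
    """Return True if `version` satisfies >=minimum and <below.

    Tuples compare element by element, so (3, 11, 2) is not below (3, 11)
    while (3, 10, 9) is. Defaults to the running interpreter's version.
    """
    if version is None:
        version = tuple(sys.version_info[:3])
    return minimum <= tuple(version) < below


if __name__ == "__main__":
    ok = in_supported_range()
    print(f"Python {sys.version.split()[0]}: {'supported' if ok else 'not supported'}")
```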

I DuckDucked around a bit and hit upon pyenv, which is I guess a way of installing and running older versions of Python without having to overwrite whatever version(s) you already have. I’ll skip over the error part of my trial-and-error process and give you the commands that made it all work:

$ brew install pyenv

$ pyenv install 3.10

$ PATH="$HOME/.pyenv/shims:${PATH}"

$ pyenv local 3.10

$ pip install -U openai-whisper

That got Whisper to install. It didn’t take very long.

At that point, I wondered what I’d have to configure to transcribe something, and the answer turned out to be precisely zilch. Once the install was done, I dropped into the directory containing my MP4 video, and typed this:

$ whisper wpe-mse-eme-v2.mp4

Here’s what I got back. I’ve marked the very few errors.

[00:00.000 --> 00:07.000]  In this video, we'll show you several demos showcasing multi-media capabilities in WPE WebKit,
[00:07.000 --> 00:11.000]  the official port of the WebKit engine for embedded devices.
[00:11.000 --> 00:18.000]  Each of these demos are running on the low-powered Raspberry Pi 3 seen in the lower right-hand side of the screen here.
[00:18.000 --> 00:25.000]  Infotainment systems and media players often need to consume digital rights-managed videos.
[00:25.000 --> 00:32.000]  They tell me, is Michael coming out?  Affirmative, Mike's coming out.
[00:32.000 --> 00:45.000]  Here you can see just that, smooth streaming playback using encrypted media extensions, or EME, with PlayReady 4.
[00:45.000 --> 00:52.000]  Media source extensions, or MSE, are used by many players for greater control over playback.
[00:52.000 --> 01:00.000]  YouTube TV has a whole conformance test suite for this, which WPE has been passing since 2021.
[01:00.000 --> 01:09.000]  The loan exceptions here are those tests requiring hardware support not available on the Raspberry Pi 4, but available for other platforms.
[01:09.000 --> 01:16.000]  YouTube TV has a conformance test for EME, which WPE WebKit passes with flying colors.
[01:22.000 --> 01:40.000]  Music
[01:40.000 --> 01:45.000]  Finally, perhaps most impressively, we can put all these things together.
[01:45.000 --> 01:56.000]  Here is the dash.js player using MSE, running in a page, and using Widevine DRM to decrypt and play rights-managed video with EME all fluidly.
[01:56.000 --> 02:04.000]  Music
[02:04.000 --> 02:09.000]  Remember, all of this is being played back on the same low-powered Raspberry Pi 3.
[02:27.000 --> 02:34.000]  For more about WPE WebKit, please visit WPE WebKit.com.
[02:34.000 --> 02:42.000]  For more information about EGALIA, or to find out how we can help with your embedded device needs, please visit us at EGALIA.com.

I am, frankly, astonished. This has no business being as accurate as it is, for all kinds of reasons. There’s a lot of jargon and very specific terminology in there, and Whisper nailed pretty much every last bit of it, first time in, no special configuration, nothing. I didn’t even bump up the model size from the default of small. I felt a little like that Frollo guy in the animated Hunchback of Notre Dame meme yelling about sorcery or whatever.

True, the output isn’t absolutely perfect. Let’s review the glitches in reverse order. The last two errors, turning “Igalia” into “EGALIA”, seem fair enough given I didn’t specify that there would be languages other than English involved. I routinely have to spell it for my fellow Americans, so no reason to think a codebase could do any better.

The space inserted into “WPEWebKit” (which happens throughout) is similarly understandable. I’m impressed it understood “WebKit” at all, never mind that it was properly capitalized and not-spaced.

The place where it says Music and I marked it as an error: This is essentially an echoing countdown and then a white-noise roar from rocket engines. There’s a “music today is just noise” joke in here somewhere, but I’m too hip to find it.

Whisper turning “lone” into “loan” doesn’t particularly faze me, given the difficulty of handling soundalike words. Hell, just yesterday, I was scribing a conference call and mistakenly recorded “gamut” as “gamma”, and those aren’t even technically homophones. They just sound like they are.

Rounding out the glitch tour, “Hey” got turned into “They”, which (given the audio quality of that particular part of the video) is still pretty good.
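Soundalikes like “loan” need a human, but names that get misread the same way every time can be fixed mechanically after transcription. Here’s a minimal Python sketch of a find-and-replace pass over caption text. The corrections table is a hypothetical example built from the misreadings in this post, and the word-boundary matching is there so that, say, “build routes” elsewhere in a transcript isn’t mangled.

```python
import re

# Hypothetical corrections table: Whisper's spelling -> preferred spelling.
CORRECTIONS = {
    "WPE WebKit": "WPEWebKit",
    "EGALIA": "Igalia",
    "build route": "buildroot",
}


def apply_corrections(text: str, corrections: dict[str, str] = CORRECTIONS) -> str:
    """Replace each known misreading, matching whole words only (case-sensitive)."""
    for wrong, right in corrections.items():
        text = re.sub(rf"\b{re.escape(wrong)}\b", right, text)
    return text
```

Running this over the SRT or VTT file rather than the console output keeps the timing lines untouched, since none of the table entries can match a timestamp.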

There is one other error I couldn’t mark because there’s nothing to mark, but if you scrutinize the timeline, you’ll see a gap from 02:09.000 to 02:27.000. In there, a short clip from a movie plays, and there’s a brief dialogue between two characters in not-very-Dutch-accented English. It’s definitely louder and clearer than the 00:25.000 --> 00:32.000 bit, so I’m not sure why Whisper just skipped over it. Manually transcribing that part isn’t a big deal, but it’s odd to see it perform so flawlessly on every other piece of speech and then drop this completely on the floor.
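Spotting that kind of silent gap can be automated. This Python sketch parses lines in the `[MM:SS.mmm --> MM:SS.mmm]  text` form Whisper printed above (also accepting an hours field, on the assumption that longer media get one) and reports every stretch longer than a threshold where no segment was emitted. The five-second threshold is an arbitrary assumption, and a flagged gap may well be legitimate silence; the point is just to know where to look.

```python
import re

# Matches the leading "[start --> end]" of a Whisper console line.
SEGMENT = re.compile(r"^\[([\d:.]+) --> ([\d:.]+)\]")


def to_seconds(stamp: str) -> float:
    """Convert MM:SS.mmm (or HH:MM:SS.mmm) to seconds."""
    seconds = 0.0
    for part in stamp.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def find_gaps(transcript: str, threshold: float = 5.0) -> list[tuple[str, str]]:
    """Return (end, next_start) pairs where more than `threshold` seconds
    pass between one segment's end and the next segment's start."""
    gaps = []
    previous_end = None
    for line in transcript.splitlines():
        match = SEGMENT.match(line.strip())
        if not match:
            continue  # skip anything that isn't a timestamped segment
        start, end = match.groups()
        if previous_end is not None and to_seconds(start) - to_seconds(previous_end) > threshold:
            gaps.append((previous_end, start))
        previous_end = end
    return gaps
```

Fed the first transcript above, this should flag the 01:16–01:22 stretch as well as the 02:09–02:27 one, so it errs on the side of showing too much.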

Before posting, I decided to give Whisper another go, this time on a different video:

$ whisper wpe-gamepad-support-v3.mp4

This was the result, with the one actual error marked:

[00:00.000 --> 00:13.760]  In this video, we demonstrate WPE WebKit's support for the W3C's GamePad API.
[00:13.760 --> 00:20.080]  Here we're running WPE WebKit on a Raspberry Pi 4, but any device that will run WPE WebKit
[00:20.080 --> 00:22.960]  can benefit from this support.
[00:22.960 --> 00:28.560]  The GamePad API provides a JavaScript interface that makes it possible for developers to access
[00:28.560 --> 00:35.600]  and respond to signals from GamePads and other game controllers in a simple, consistent way.
[00:35.600 --> 00:40.320]  Having connected a standard Xbox controller, we boot up the Raspberry Pi with a customized
[00:40.320 --> 00:43.040]  build route image.
[00:43.040 --> 00:48.560]  Once the device is booted, we run cog, which is a small, single window launcher made specifically
[00:48.560 --> 00:51.080]  for WPE WebKit.
[00:51.080 --> 00:57.360]  The window cog creates can be full screen, which is what we're doing here.
[00:57.360 --> 01:01.800]  The game is loaded from a website that hosts a version of the classic video arcade game
[01:01.800 --> 01:05.480]  Asteroids.
[01:05.480 --> 01:11.240]  Once the game has loaded, the Xbox controller is used to start the game and control the spaceship.
[01:11.240 --> 01:17.040]  All the GamePad inputs are handled by the JavaScript GamePad API.
[01:17.040 --> 01:22.560]  This GamePad support is now possible thanks to work done by Igalia in 2022 and is available
[01:22.560 --> 01:27.160]  to anyone who uses WPE WebKit on their embedded device.
[01:27.160 --> 01:32.000]  For more about WPE WebKit, please visit wpewebkit.com.
[01:32.000 --> 01:35.840]  For more information about Igalia, or to find out how we can help with your embedded device
[01:35.840 --> 01:39.000]  needs, please visit us at Igalia.com.

That “build route” should have been “buildroot”. Again, an entirely reasonable error. I’ve made at least an order of magnitude more typos writing this post than Whisper has in transcribing these videos. And this time, it got the spelling of Igalia correct. I didn’t make any changes between the two runs. It just… figured it out.

I don’t have a lot to say about this other than, wow. Just WOW. This is some real Clarke’s Third Law stuff right here, and the technovertigo is Marianas deep.

Peerless Whisper was published on Thursday, March 23rd, 2023.
It was assigned to the Technovertigo, Today I Learned, and Tools categories.
There have been five replies.
