The two videos I was using Whisper on have been published, so you can see for yourself how the captioning worked out. Designed as trade-show booth reel pieces, they’re below three minutes each, so watching both should take less than ten minutes, even with pauses to scrutinize specific bits of captioning.
As I noted in my previous post about this, I only had to make one text correction to the second video, plus a quick find-and-replace to turn “WPE WebKit” into “WPEWebKit”. For the first video, I did make a couple of edits beyond fixing transcription errors; specifically, I added the dashes and line breaking in this part of the final SubRip Subtitle (SRT) file uploaded to YouTube:
00:00:25,000 --> 00:00:32,000
- Hey tell me, is Michael coming out?
- Affirmative, Mike's coming out.
This small snippet actually embodies the two things where Whisper falls down a bit: multiple voices, and caption line lengths.
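(As an aside: that “WPE WebKit” find-and-replace doesn’t even need an editor. A sed one-liner does it; the filenames below are stand-ins, since I’m not copying from the real project.)

```shell
# A stand-in SRT fragment, just for demonstration; the real file isn't shown here.
printf '2\n00:00:07,000 --> 00:00:11,000\nthe official port of WPE WebKit for embedded devices.\n' > captions.srt

# The actual find-and-replace:
sed 's/WPE WebKit/WPEWebKit/g' captions.srt > captions-fixed.srt
```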
Right now, Whisper doesn’t even try to distinguish between different voices, the technical term for which is “speaker diarisation”. This makes Whisper ideal for transcribing, say, a conference talk or a single-narrator video. It’s a lot less useful for things like podcasts, because while it will probably get (nearly) all the words right, it won’t even throw in a marker that the voice changed, let alone try to tell which bits belong to a given voice. You have to go into the output and add those yourself, which for an hourlong podcast could be… quite the task.
There are requests to add this to Whisper scattered through its GitHub discussions, but I didn’t see any open pull requests or mention of it in the README, so I don’t know if that’s coming or not. If you know, please leave a comment!
As for the length of captions, I agree with J David Eisenberg: Whisper too frequently errs on the side of “too long”. For example, here’s one of the bits Whisper output:
00:01:45,000 --> 00:01:56,000
Here is the dash.js player using MSE, running in a page, and using Widevine DRM to decrypt and play rights-managed video with EME, all fluidly.
That’s eleven seconds of static subtitling, with 143 characters of line length. The BBC recommends line lengths at or below 37 characters, and Netflix suggests a limit of 42 characters, with actual hard limits for a few languages. You can throw in line breaks to reduce line length, but should never have more than three lines, which wouldn’t be possible with 143 characters. But let’s be real, that 11-second caption really should be split in twain, at the absolute minimum.
Whisper does not, as of yet, have a way to request limiting caption lengths, either in time or in text. There is a fairly detailed discussion of this over on Whisper’s repository, with some code graciously shared by people working to address this, but it would be a lot better if Whisper accepted an argument to limit the length of any given bit of output. And also if it threw in line breaks on its own, say around 40 characters in English, even when not requested.
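In the meantime, line breaks have to be added after the fact, either by hand or by script. Here’s a rough sketch of the sort of small script I have in mind: it wraps caption text at 40 characters while passing cue numbers, timestamps, and blank lines through untouched. (The 40-character limit, and the whole approach, are my own choices, not anything Whisper provides.)

```shell
# Sketch of a post-processing pass over an SRT file: fold caption text
# at 40 characters; cue numbers, timestamp lines, and blanks go through as-is.
wrap_captions() {
	while IFS= read -r line; do
		case "$line" in
			*"-->"*|"")
				printf '%s\n' "$line" ;;        # timestamp or blank line
			*[!0-9]*)
				printf '%s\n' "$line" | fold -s -w 40 ;;  # caption text
			*)
				printf '%s\n' "$line" ;;        # cue number (all digits)
		esac
	done
}
```

Something like `wrap_captions < captions.srt > wrapped.srt` would do it, though a smarter version would rebalance the cue timings too.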
The last thing I’d like to see improved is speed. It’s not terribly slow as is, to be clear. Using the default model size (small), which is what I used for the videos I wrote about, Whisper worked at about 2:1 speed: a two-minute video took about a minute to process. I tried the next size up, the medium model, and it worked at roughly 1:1.5 speed, taking about an hour fifteen to process a 46-minute video.
The thing is, all that is running solely on the CPU, which in my case is a 12-core M2. According to this pull request, problems in one of Whisper’s dependencies, PyTorch, mean GPU utilization is essentially unavailable on the hardware I have. (Thanks to Chris Adams for the pointer.) I expect that will be cleared up sooner or later, so the limitation feels minor.
Overall, it’s a powerful tool, with accuracy I still find astounding, only coming up short in quality-of-life features that aren’t critical in some applications (transcribing a talk) or relatively easily worked around in others (hand-correcting caption length in short videos; using a small script to insert line breaks in longer videos). The lack of speaker diarisation is the real letdown for me, and definitely the hardest to work around, so I hope it gets addressed soon.
What happened was, I was hanging out in an online chatter channel when a little birdy named Bruce chirped about OpenAI’s Whisper and how he was using it to transcribe audio. And I thought, Hey, I have audio that needs to be transcribed. Brucie Bird also mentioned it would output text, SRT, and WebVTT formats, and I thought, Hey, I have videos I’ll need to upload with transcription to YouTube! And then he said you could run it from the command line, and I thought, Hey, I have a command line!
So off I went to install it and try it out, and immediately ran smack into some hurdles I thought I’d document here in case someone else has similar problems. All of this took place on my M2 MacBook Pro, though I believe most of the below should be relevant to anyone trying to do this at the command line.
The first thing I did was what the GitHub repository’s README recommended, which is:
$ pip install -U openai-whisper
That failed because I didn’t have pip installed. Okay, fair enough. I figured out how to install that, setting up an alias of python for python3 along the way, and then tried again. This time, the install started and then bombed out:
…followed by some stack trace stuff, none of which was really useful until ten or so lines down, where I found:
RuntimeError: Cannot install on Python version 3.11.2; only versions >=3.7,<3.11 are supported.
In other words, the version of Python I have installed is too modern to run AI. What a world.
I DuckDucked around a bit and hit upon pyenv, which is I guess a way of installing and running older versions of Python without having to overwrite whatever version(s) you already have. I’ll skip over the error part of my trial-and-error process and give you the commands that made it all work:
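For posterity, the working sequence was something along these lines. I’m reconstructing it here rather than copying from my actual history, so treat the specifics (Homebrew, the exact Python version) as illustrative; the point is to get a Python inside Whisper’s supported range:

```shell
$ brew install pyenv
$ pyenv install 3.10
$ pyenv local 3.10
$ pip install -U openai-whisper
```

(Depending on your shell setup, pyenv may also want its init line added to your profile so its shims come first in your PATH.)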
That got Whisper to install. It didn’t take very long.
At that point, I wondered what I’d have to configure to transcribe something, and the answer turned out to be precisely zilch. Once the install was done, I dropped into the directory containing my MP4 video, and typed this:
$ whisper wpe-mse-eme-v2.mp4
Here’s what I got back; the very few errors are discussed afterward.
[00:00.000 --> 00:07.000] In this video, we'll show you several demos showcasing multi-media capabilities in WPE WebKit,
[00:07.000 --> 00:11.000] the official port of the WebKit engine for embedded devices.
[00:11.000 --> 00:18.000] Each of these demos are running on the low-powered Raspberry Pi 3 seen in the lower right-hand side of the screen here.
[00:18.000 --> 00:25.000] Infotainment systems and media players often need to consume digital rights-managed videos.
[00:25.000 --> 00:32.000] They tell me, is Michael coming out? Affirmative, Mike's coming out.
[00:32.000 --> 00:45.000] Here you can see just that, smooth streaming playback using encrypted media extensions, or EME, with PlayReady 4.
[00:45.000 --> 00:52.000] Media source extensions, or MSE, are used by many players for greater control over playback.
[00:52.000 --> 01:00.000] YouTube TV has a whole conformance test suite for this, which WPE has been passing since 2021.
[01:00.000 --> 01:09.000] The loan exceptions here are those tests requiring hardware support not available on the Raspberry Pi 4, but available for other platforms.
[01:09.000 --> 01:16.000] YouTube TV has a conformance test for EME, which WPE WebKit passes with flying colors.
[01:22.000 --> 01:40.000] Music
[01:40.000 --> 01:45.000] Finally, perhaps most impressively, we can put all these things together.
[01:45.000 --> 01:56.000] Here is the dash.js player using MSE, running in a page, and using Widevine DRM to decrypt and play rights-managed video with EME all fluidly.
[01:56.000 --> 02:04.000] Music
[02:04.000 --> 02:09.000] Remember, all of this is being played back on the same low-powered Raspberry Pi 3.
[02:27.000 --> 02:34.000] For more about WPE WebKit, please visit WPE WebKit.com.
[02:34.000 --> 02:42.000] For more information about EGALIA, or to find out how we can help with your embedded device needs, please visit us at EGALIA.com.
I am, frankly, astonished. This has no business being as accurate as it is, for all kinds of reasons. There’s a lot of jargon and very specific terminology in there, and Whisper nailed pretty much every last bit of it, first time in, no special configuration, nothing. I didn’t even bump up the model size from the default of small. I felt a little like that Frollo guy in the animated Hunchback of Notre Dame meme yelling about sorcery or whatever.
True, the output isn’t absolutely perfect. Let’s review the glitches in reverse order. The last two errors, turning “Igalia” into “EGALIA”, seem fair enough given I didn’t specify that there would be languages other than English involved. I routinely have to spell it for my fellow Americans, so no reason to think a codebase could do any better.
The space inserted into “WPEWebKit” (which happens throughout) is similarly understandable. I’m impressed it understood “WebKit” at all, never mind that it was properly capitalized and not-spaced.
The place where it says Music and I marked it as an error: This is essentially an echoing countdown and then a white-noise roar from rocket engines. There’s a “music today is just noise” joke in here somewhere, but I’m too hip to find it.
Whisper turning “lone” into “loan” doesn’t particularly faze me, given the difficulty of handling soundalike words. Hell, just yesterday, I was scribing a conference call and mistakenly recorded “gamut” as “gamma”, and those aren’t even technically homophones. They just sound like they are.
Rounding out the glitch tour, “Hey” got turned into “They”, which (given the audio quality of that particular part of the video) is still pretty good.
There is one other error I couldn’t mark because there’s nothing to mark, but if you scrutinize the timeline, you’ll see a gap from 02:09.000 to 02:27.000. In there, a short clip from a movie plays, and there’s a brief dialogue between two characters in not-very-Dutch-accented English there. It’s definitely louder and clearer than the 00:25.000 --> 00:32.000 bit, so I’m not sure why Whisper just skipped over it. Manually transcribing that part isn’t a big deal, but it’s odd to see it perform so flawlessly on every other piece of speech and then drop this completely on the floor.
Before posting, I decided to give Whisper another go, this time on a different video:
$ whisper wpe-gamepad-support-v3.mp4
This was the result (the one actual error is discussed afterward):
[00:00.000 --> 00:13.760] In this video, we demonstrate WPE WebKit's support for the W3C's GamePad API.
[00:13.760 --> 00:20.080] Here we're running WPE WebKit on a Raspberry Pi 4, but any device that will run WPE WebKit
[00:20.080 --> 00:22.960] can benefit from this support.
[00:22.960 --> 00:28.560] The GamePad API provides a JavaScript interface that makes it possible for developers to access
[00:28.560 --> 00:35.600] and respond to signals from GamePads and other game controllers in a simple, consistent way.
[00:35.600 --> 00:40.320] Having connected a standard Xbox controller, we boot up the Raspberry Pi with a customized
[00:40.320 --> 00:43.040] build route image.
[00:43.040 --> 00:48.560] Once the device is booted, we run cog, which is a small, single window launcher made specifically
[00:48.560 --> 00:51.080] for WPE WebKit.
[00:51.080 --> 00:57.360] The window cog creates can be full screen, which is what we're doing here.
[00:57.360 --> 01:01.800] The game is loaded from a website that hosts a version of the classic video arcade game
[01:01.800 --> 01:05.480] Asteroids.
[01:05.480 --> 01:11.240] Once the game has loaded, the Xbox controller is used to start the game and control the spaceship.
[01:11.240 --> 01:17.040] All the GamePad inputs are handled by the JavaScript GamePad API.
[01:17.040 --> 01:22.560] This GamePad support is now possible thanks to work done by Igalia in 2022 and is available
[01:22.560 --> 01:27.160] to anyone who uses WPE WebKit on their embedded device.
[01:27.160 --> 01:32.000] For more about WPE WebKit, please visit wpewebkit.com.
[01:32.000 --> 01:35.840] For more information about Igalia, or to find out how we can help with your embedded device
[01:35.840 --> 01:39.000] needs, please visit us at Igalia.com.
That should have been “buildroot”. Again, an entirely reasonable error. I’ve made at least an order of magnitude more typos writing this post than Whisper has in transcribing these videos. And this time, it got the spelling of Igalia correct. I didn’t make any changes between the two runs. It just… figured it out.
I don’t have a lot to say about this other than, wow. Just WOW. This is some real Clarke’s Third Law stuff right here, and the technovertigo is Marianas deep.
The CSSWG (CSS Working Group) is currently debating what to name a conditional structure, and it’s kind of fascinating. There are a lot of strong opinions, and I’m not sure how many of them are weakly held.
Boiled down to the bare bones, the idea is to take the conditional structures CSS already has, like @supports and @media, and allow more generic conditionals that combine and enhance what those structures make possible. To pick a basic example, this:
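The actual example from the discussion isn’t reproduced here, so consider this an illustrative reconstruction; the exact test syntax was part of what was being debated:

```css
@conditional media(width >= 600px) and supports(display: grid) {
  main { display: grid; }
}
@otherwise {
  main { display: flex; }
}
```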
Except nobody wants to have to type @conditional and @otherwise, so the WG went in search of shorter names.
The Sass-savvy among you are probably jumping up and down right now, shouting “We have that! We have that already! Just call them @if and @else and finally get on our level!” And yes, you do have that already: Sass uses exactly those keywords. There are some minor syntactic differences (Sass doesn’t require parentheses around the conditional tests, for example) and it’s not clear whether CSS would allow testing of variable values the way Sass does, but they’re very similar.
And that’s a problem, because if CSS starts using @if and @else, there is the potential for syntactic train wrecks. If you’re writing with Sass, how will it tell the difference between its @if and the CSS @if? Will you be forever barred from using CSS conditionals in Sass, if that’s what goes into CSS? Or will Sass be forced to rename those conditionals to something else, in order to avoid clashing — and if so, how much upheaval will that create for Sass authors?
The current proposal, as I write this, is to use @when and @else in CSS Actual. Thus, something like:
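Sketching from the general shape of the proposal (don’t hold the WG to my exact test syntax):

```css
@when media(width >= 600px) and supports(display: grid) {
  main { display: grid; }
}
@else {
  main { display: flex; }
}
```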
Even though there is overlap with @else, apparently starting the overall structure with @when would allow Sass to tell the difference. So that would sidestep clashing with Sass.
But should the CSS WG even care that a third-party code base’s syntax gets trampled on by CSS syntax? I imagine Sass authors would say, “Uh, hell yeah they should”, but does that outweigh the potential learning hurdle of all the non-Sass authors, both now and over the next few decades, learning that @when doesn’t actually have temporal meaning and is just an alias for the more recognizable if statement?
Because while it’s true that some programming languages have a when conditional structure (kOS being the one I’ve used most recently), they usually also have an if structure, and the two sometimes mean different things. There is a view held by some that using the label when when we really mean if is a mistake, one that will stand out as a weird choice and a design blunder, 10 years hence, and will create a cognitive snag in the process of learning CSS. Others hold the view that when is a relatively common programming term, it’s sometimes synonymous with if, every language has quirks that new learners need to learn, and it’s worth avoiding a clash with tools and authors that already exist.
If you ask me, both views are true, and that’s the real problem. I imagine most of the participants in the discussion, even if their strong opinions are strongly held, can at least see where the other view is rooted, and sympathize with it. And it’s very likely the case that even if Sass and other tools didn’t exist, the WG would still be having the same debate, because both terms work in context. I suspect if would have won by now, but who knows? Maybe not. There have been longer debates over less fundamental concepts over the years.
A lot of my professional life has been spent explaining CSS to people new to it, so that may be why I personally lean toward @if over @when. It’s a bit easier to explain, it looks more familiar to anyone who’s done programming at just about any level, and semantically it makes a bit more sense to me. It’s also true that I come from a place of not having to worry about Sass changing on me, because I’ve basically never used it (or any other CSS pre-processor, for that matter) and I don’t have to do the heavy lifting of rewriting Sass to deal with this. So, easy for me to say!
That said, I have an instinctive distrust of arguments by majority. Yes, the number of Sass developers who’d have to adapt Sass to @if in CSS Actual is vanishingly small compared to the population of current and future CSS authors, and the number of Sass authors is likely much smaller than the number of total CSS authors. That doesn’t automatically mean they should be discounted. It’s good to keep CSS as future-proof as possible, but it should also be kept as present-proof as possible.
The rub comes in with “as possible”, though. This isn’t a situation where all things are possible. Something’s going to give, and there will be a group of people ill-served by the result. Will it be Sass authors? Future CSS learners? Another group? Everyone? We’ll see!
Thanks to the long and winding history of my blog, I write posts in Markdown in BBEdit, export them to HTML, and paste the resulting HTML into WordPress. I do it that way because switching WordPress over to auto-parsing Markdown in posts causes problems with rendering the markup of some posts I wrote 15-20 years ago, and finding and fixing every instance is a lengthy project for which I do not have the time right now.
(And I don’t use the block editor because whenever I use it to edit an old post, the markup in those posts gets mangled so much that it makes me want to hurl. This is as much the fault of my weird idiosyncratic bespoke-ancient setup as of WordPress itself, but it’s still super annoying and so I avoid it entirely.)
Anyway, the point here is that I write Markdown in BBEdit, and export it from there. This works okay, but there have always been things missing, like a way to easily add attributes to elements like my code blocks. BBEdit’s default Markdown exporter, CommonMark, sort of supports that, except it doesn’t appear to give me control over the class names: telling it I want a class value of css on a preformatted block means I get a class value of language-css instead. Also it drops that class value on the code element it inserts into the pre element, instead of attaching it directly to the pre element. Not good, unless I start using Prism, which I may one day but am not yet.
Pandoc, another exporter you can use in BBEdit, offers much more robust and yet simple element attribute attachment: you put {.class #id} or whatever at the beginning of any element, and you get those things attached directly to the element. But by default, it also wraps elements around, and adds attributes to, the pre element, apparently in anticipation of some other kind of syntax highlighting.
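To illustrate the syntax described above (as I understand it; the man page is the real authority), a fenced code block and a heading with attributes look like this in Pandoc’s Markdown:

```markdown
~~~ {.css}
pre { background: #EEE; }
~~~

## A heading with attributes {#anchor-id .fancy}
```

With highlighting turned off, that first block comes out, at least in my experience, as a pre element with the class attached directly to it.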
I spent an hour reading the Pandoc man page (just kidding, I was actually skimming, that’s the only way I could possibly get through all that in an hour) and found the --no-highlight option. Perfect! So I dropped into Preferences > Languages > Language-specific settings:Markdown > Markdown, set the “Markdown processor” dropdown to “Custom”, and filled in the following:
Command
pandoc
Arguments
--no-highlight
Done and done. I get a more powerful flavor of Markdown in an editor I know and love. It’s not perfect — I still have to manually tweak table markup by hand, for example — but it’s covering probably 95% of my use cases for writing blog posts.
Now all I need to do is find a Pandoc Markdown option or extensions or whatever that keeps it from collapsing the whitespace between elements in its HTML output, and I’ll be well and truly satisfied.
A sizeable chunk of my work at Igalia so far involves editing and updating the Mozilla Developer Network (MDN), and a smaller chunk has me working on the Web Platform Tests (WPT). In both cases, the content is stored in large public repositories (MDN, WPT) and contributors are encouraged to fork the repositories, clone them locally, and push updates via the fork as PRs (Pull Requests). And while both repositories roll in localhost web server setups so you can preview your edits locally, each has its own.
As useful as these are, if you ignore the whole “auto-force a browser page reload every time the file is modified in any way whatsoever” thing that I’ve been trying very hard to keep from discouraging me from saving often, each has to be started in its own way, from within their respective repository directories, and it’s generally a lot more convenient to do so in a separate Terminal window.
I was getting tired of constantly opening a new Terminal window, cding into the correct place, remembering the exact invocation needed to launch the local server, and on and on, so I decided to make my life slightly easier with a few short scripts and aliases. Maybe this will be useful to you as well.
First, I decided to keep things relatively simple. Instead of writing a small program that would handle all server startups by parsing shell arguments and what have you, I wrote a couple of very similar shell scripts. Here’s the script for launching MDN’s localhost:
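The script itself is about as minimal as a script can be. I’m abbreviating here: the repository path is whatever yours happens to be, and yarn is the launcher the MDN setup instructions specify.

```shell
#!/bin/bash
cd ~/repos/mdn/content/
yarn start
```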
I then pointed an alias at that script:
alias mdn-server="open -a Terminal.app ~/bin/mdn-start.bsh"
Translated into English, that means “open the file ~/bin/mdn-start.bsh using the -application Terminal.app”.
Thus, when I type mdn-server in any command prompt, a new Terminal window will open and the shell script mdn-start.bsh will be run; the script switches into the needed directory and launches the localhost server using yarn, as per the MDN instructions. What’s more, when I’m done working on MDN, I can switch to the window running the server, stop the server with ⌃C (control-C), and the Terminal window closes automatically.
I did something very similar for WPT, except in this case the alias reads:
alias wpt-server="open -a Terminal.app ~/bin/wpt-serve.bsh"
And the script to which it points reads:
#!/bin/bash
cd ~/repos/wpt/
./wpt serve
As I mentioned before, I chose to do it this way rather than writing a single alias (say, local-server) that would accept arguments (mdn, wpt, etc.) and fire off scripts accordingly, but that’s also an option and a viable one at that.
So that’s my little QoL (Quality of Life) upgrade to make working on MDN and WPT a little easier. I hope it helps you in some way!
For years, I’ve had a bash alias that re-runs the previous command via sudo. This is useful in situations where I try to do a thing that requires root access, and I’m not root (because I am never root). Rather than have to retype the whole thing with a sudo on the front, I just type please and it does that for me. It looked like this in my .bashrc file:
alias please='sudo "$BASH" -c "$(history -p !!)"'
But then, the other day, I saw Kat Maddox’s tweet about how she aliases please straight to sudo, so to do things as root, she types please apt update, which is equivalent to sudo apt update. Which is pretty great, and I want to do that! Only, I already have that word aliased.
What to do? A bash function! After commenting out my old alias, here’s what I added to .bash_profile:
please() {
	if [ "$1" ]; then
		sudo "$@"
	else
		sudo "$BASH" -c "$(history -p !!)"
	fi
}
That way, if I remember to type please apachectl restart, as in Kat’s setup, it will ask for the root password and then execute the command as root; if I forget my manners and simply type apachectl restart, then when I’m told I don’t have privileges to do that, I just type please and the old behavior happens. Best of both worlds!
What happened was, I was preparing to roll out new designs for the News section and event pages of An Event Apart, and I had each rollout in its own branch. Somewhere in the process of bringing both into the master branch, I managed to create a merge conflict that rapidly led to more and more conflicts. I very nearly had to take off and nuke the entire site from orbit, just to start over. (A couple of branches, including dev, did have to get erased and re-pulled.)
Part of what made it worse was that at one point, I accidentally committed a quick edit to master, because I’d forgotten to check out the branch I was trying to edit, and my attempts to undo that mistake just compounded whatever other mistakes already existed. Once all the dust settled and things were back into good shape, I said to myself, “Self, I bet there’s a way to prevent commits to the master branch, because git is second only to emacs in the number of things you can do to/with it.” So I went looking, and yes, there is a way: add the following to your .git/hooks/pre-commit file.
#!/bin/sh
branch="$(git rev-parse --abbrev-ref HEAD)"

if [ "$branch" = "master" ]; then
	echo "You can't commit directly to master branch"
	exit 1
fi
I got that from this StackOverflow answer, and it was perfect for me, since I use the bash shell. So I created the pre-commit file, made a trivial README.md edit, and tried to commit to master. That’s when macOS Mojave’s Terminal spit back:
fatal: cannot exec '.git/hooks/pre-commit': Operation not permitted
Huh. I mean, it prevented me from committing to master, but not in a useful way. Once I verified that it happened in all branches, not just master, I knew there was trouble.
I checked permissions and all the rest, but I was still getting the error. If I went into .git/hooks and ran the script directly with ./pre-commit, I got a slightly different error:
-bash: ./pre-commit: /bin/bash: bad interpreter: Operation not permitted
So I submitted my own StackOverflow question, detailing what I’d done and the file and directory permissions and all the rest. I was stunned to find out the answer was that Mojave itself was blocking things, through its System Integrity Protection feature. Why did this simple file trigger SIP? I don’t know.
The fix, shared by both Jeff and Rich, was to go into .git/hooks and check the file’s extended attributes:
xattr -l pre-commit
It showed a com.apple.quarantine value, so I then typed:
xattr -d com.apple.quarantine pre-commit
And that was it! Now if I try to commit a change to the master branch, the commit is rejected and I get a warning message. At that point, I can git stash the changes, check out the proper branch (or a new one), and then git stash pop to bring the changes into that branch, where I can commit them and then merge the changes in properly.
I may modify the script to reject commits to the dev branch as well, but I’m holding off on that for now, since the dev branch is often where merge conflicts are worked out before going to master. Either way, at least I’ll be less likely to accidentally foul up master when I’m hip-deep in other problems.
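If I ever do add dev to the blocklist, it’s a small change: swap the if for a case that catches both branches. Something like this untested sketch:

```shell
#!/bin/sh
branch="$(git rev-parse --abbrev-ref HEAD)"

case "$branch" in
	master|dev)
		echo "You can't commit directly to the $branch branch"
		exit 1
		;;
esac
```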
I’ve relied on a mouse for about a decade and a half. I don’t mean “relied on a mouse” in the generic sense, but rather in the sense that I’ve relied on one very specific and venerable mouse: a Logitech MX500.
I’ve had it for so long, I’d forgotten how long I’ve had it. I searched for information about its production dates and wouldn’t you know it, Wikipedia has an article devoted solely to Logitech products throughout history, because of course it does, and it lists (among other things) their dates of release. The MX500 was released in 2002, and superseded by the MX510 in 2004. I then remembered a photo I took of my eldest child when she was an infant, trying to chew on a computer mouse. I dug it out of my iPhoto library and yep, it’s my MX500. The picture is dated June 2004.
So I have photographic evidence that I’ve used this specific mouse for 15 years or more. The logo plate on top of the mouse has been worn half-smooth and half-paintless by the palm of my hand, much like the shiny-smooth areas worn into the subtle matte surface texture where the thumb and pinky finger grip the sides. The model and technical information printed on the underside has similarly worn away. It started out with four little oval glide nubs on the underside that held the bottom away from the desk surface; only one remains. Even though, as an optical mouse, it can be used on any surface, I eventually went back to soft mousepads, so as to limit further undercarriage damage.
The old gray mare — er, mouse — proving that it’s not the years, it’s the mileage
Why have I been so devoted to this mouse? Well, it’s incredibly well engineered, for one — it’s put up with 15 years of daily use. It’s exactly the right shape for my hand, and it has multiple configurable inputs right where I expect them. There are arrow buttons just above my thumb which I use as forward/backward in browsers, buttons above and below the scroll wheel that I map to Page Up/Page Down, an extra button at almost the apex of the mouse’s back mapped to ⌥⇥ (Option-Tab), and the usual right/left mouse click buttons. Plus the scroll wheel is itself a push-down-to-click button.
Most of these features can be found on one mouse or another, but it’s rare to find them all in one mouse — and next to impossible to find them in a shape and size that feels comfortable to me. I’d occasionally looked at the secondary market, but even used, the MX500 can command three figures. I checked Amazon as I wrote this, and an unused MX500 was listing for two hundred fifty dollars. Unused copies of its successor, the MX510, were selling for even more.
Now, if you were into gaming in the first decade of the 2000s, you may have heard of or used the MX510’s successor, the MX518. Released in 2005, it was basically an MX500/MX510, but branded for gaming, with some optical-sensor upgrades for more tracking precision. The MX518 lasted until 2011, when it was superseded by a different model, which itself was superseded, which et cetera, et cetera, et cetera.
Which brings me to the point of all this. A few weeks ago, after several weeks of sporadic glitches, the scroll wheel on my MX500 almost completely stopped responding to being scrolled. Which maybe doesn’t sound like a big deal, but try going without your scroll wheel for a while. I was surprised to discover how much I relied on it. So, glumly, knowing the model was long out of production and incredibly expensive to buy, I went searching for equivalents.
And that’s when I discovered that Logitech had literally announced less than a week earlier that they were releasing an updated MX518, available for pre-order.
Friends, I have never pre-ordered anything so fast.
This past Thursday afternoon, it arrived. I got it set up and have been working with it since. And I have some impressions.
Physically, the MX518 Legendary (as Logitech has branded it) is 95% a match for my old MX500. It’s ever so slightly smaller, just enough that I can tell but not quite enough to be annoying, odd as that may seem. Otherwise, everything feels like it should. The buttons are crisp and clicky, and right where I expect them. And the scroll wheel… well, it works.
The coloration is different — the surface and buttons are all black, as opposed to the MX500’s black-and-silver two-tone styling. While I miss the two-tone a bit, there’s an upgrade: the smooth black top surface has subtle little sparkles embedded in the paint. Shiny!
The changing of the guard
On the other hand, configuring the mouse was a bit of an odyssey. First off, let me make clear that I have a weird setup, even for a grumpy old Mac user. I plug a circa-2000 Macally original iKey 104-key keyboard into my 2013 MacBook Pro. (Yes, you have sensed a trend here: when I find hardware I really like, I hang onto it like a rabid weasel. Ditto software.) The “extra” keys on the Macally like Page Up, Home, and so on don’t get recognized by a lot of current software. Even the Finder can’t read the keyboard’s function keys properly. I’ve restored their functionality with the entirely excellent BetterTouchTool, but it remains that the keyboard is just odd in its ancientness.
Anyway, I first opened System Preferences and then the Logitech Control Center pane. It couldn’t find the MX518 Legendary at all. So next I opened the (separate) Logitech Options pane, which drives the wireless mouse I use when I travel. It too was unable to find the MX518.
Where my paging functions at?
Some Bing-ing led me to a download for Logitech Gaming Software (hereafter LGS), which I installed. That could see the MX518 just fine. Once I stumbled my way into an understanding of LGS’s UI, I set about trying to configure the MX518’s buttons to do what I wanted.
And could not. In the list of predefined mouse actions that could be assigned to the buttons, precisely none of my desires were listed. No ⌘-arrow combos, no page up or down, not even ⌥⇥ to switch apps. I mean, I guess that’s to be expected: it’s sold as a gaming mouse. LGS has plenty of support for on-the-fly dee-pee-eye switching and copy-paste and all that. Not so much for document editing and code browsing.
There is a way to assign keyboard combos to buttons, but again, the software could understand precisely none of the combos I wanted to record when I typed them on my Macally. So I went to the MacBook Pro’s built-in keyboard, where I was able to register ⌥⇥, ⌘→, and ⌘←. I could not, however much I tried, register Page Up or Page Down. I pressed Fn, which showed “Fn” in the LGS software, and then pressed the down arrow for Page Down, and as long as I held down both keys, it showed “Page Down”. But as soon as I let go of the down arrow, “Fn” was registered again. No Page Down for me.
Now, recall, this was happening on the laptop’s built-in keyboard. I can’t really blame this one on the age of the external Macally. I really think this one might fall on LGS itself; while a 2013 MacBook is old, it’s not that old.
I thought I might be stuck, but I intuited a workaround: I opened the Keyboard Viewer built into macOS. With that, I could just click the virtual Page Up and Page Down keys, and LGS registered them without a hiccup. While I was in there, I used it to set the scroll wheel’s middle-button click to trigger Mission Control (F3).
(Update: the key-repeat problem described below has since been fixed, and was not the fault of the MX518; see my comment for details on how I resolved it.) The one letdown I have is that the buttons don’t appear to repeat keystrokes. So if I hold down the button I’ve assigned to Page Down, for example, I get exactly one page-down, and that’s it until I release and click the button again. On the MX500, holding down the button assigned to Page Down would just constantly page down until I let go. This was sometimes preferable to scrolling with the scroll wheel, especially for long documents I wanted to very quickly scan for a certain figure or other piece of the page. The same was true for all the buttons: hold one down, and the thing it was configured to do happened repeatedly until you let go.
The MX518 Legendary isn’t doing that. I don’t know if this is an inherent limitation of the mouse, its software, my configuration of it, the interaction of software and operating system, or something else entirely. It’s not an issue forty-nine times out of fifty, but that fiftieth time is annoying.
The other annoyance is one of possibly missed potential. The mouse software has, in keeping with its gaming focus, the ability to set up multiple profiles; that way, you can assign unique actions to the buttons on a per-application basis. I set up a couple of profiles to test it out, but LGS is completely opaque about how to make profiles switch automatically when you switch to an app. I’ll look for an answer online, but it’s annoying that the software promises per-app profiles, and then apparently fails to deliver on that promise.
So after all that, am I happy? Yes. It’s essentially my old mouse, except brand new. My heartfelt thanks to Logitech for bringing this workhorse out of retirement. I look forward to a decade or more with it.