Finding Unicode
Published 11 years, 3 months past
A little while back, I was reading some text when I realized the hyphens didn’t look quite right. A little too wide, I thought. Not em-dash wide, but still…wide. Wide-ish? But when I copied some of the text into a BBEdit window, they looked just like the hyphens I typed into the document.
Of course, I know Unicode is filled with all manner of symbols and that the appearance of those symbols can vary from one font face to another. So I changed the font face, made the size really huge, and behold: they were indeed different characters. At this point, I was really curious about what I’d found. What exactly was it? How would I find out?
For the record, here’s the character in question: −
Googling “−” and “− Unicode” got me nothing useful. I knew I could try the Character Viewer in OS X, and eventually I did, but I was wondering if there was a better (read: lazier) solution. I asked the Twittersphere for advice, and while I don’t know if these solutions are any lazier, here are the best of the suggestions I received.
- Unicode Lookup, a site that lets you input or paste in any character and get a report on what it is and how one might call it in various encodings.
- Richard Ishida’s UniView Lite, which does much the same as Unicode Lookup with the caveat that once you’ve input your character, you have to hit the “Chars” button, not the “Search” button. The latter is apparently how you search Unicode character names for a word or other string, like “dash” or “quot”.
- UnicodeChecker (OS X), a nice utility that includes a character list pane as well as the ability to type or paste a character into an input and instantly get its gritty details.
Any of those will tell you that the − in question is MINUS SIGN, codepoint 8722 (decimal) / 2212 (UTF-16 hex) / U+2212 (Unicode hex) / et cetera, et cetera. Did you know it was designated in Unicode 1.1? Now you do, thanks to UnicodeChecker and this post. You’re welcome.
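If you would rather not install anything, the same details can be pulled out with a few lines of Python. This is only a minimal sketch using the standard `unicodedata` module; the `describe` helper is a made-up name for illustration, and note that `unicodedata` does not report the Unicode version in which a character was introduced, so that detail still needs one of the tools above.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Return a few common representations of a single character.

    `describe` is a hypothetical helper, not part of any tool named above.
    """
    cp = ord(ch)  # raises TypeError if `ch` is not exactly one character
    return {
        "name": unicodedata.name(ch, "<no name>"),
        "decimal": cp,
        "hex": f"{cp:04X}",
        "notation": f"U+{cp:04X}",
        # Byte sequences, shown as space-separated uppercase hex.
        "utf8": ch.encode("utf-8").hex(" ").upper(),
        "utf16": ch.encode("utf-16-be").hex(" ").upper(),
    }


if __name__ == "__main__":
    # "\u2212" is the character discussed in this post.
    for key, value in describe("\u2212").items():
        print(f"{key:>8}: {value}")
```

The `bytes.hex(" ")` separator form needs Python 3.8 or later.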
Update 2 Mar 12: Philippe Wittenberg points out in the comments that you can add a UnicodeChecker service. With that enabled, all you have to do is highlight a character, summon the contextual menu (right-click, for most of us), and have it shown in UnicodeChecker. Now that’s the kind of laziness I was trying to attain!
I think I must be the only person that’s specifically used this character in web content as part of an equation instead of a hyphen.
“Punch the keys, for God’s sake!” ~ Finding Forrester (2000) Or was that reference unintended?
I’ve known about and used the minus character since I can remember.
It’s had an HTML entity since, at least, HTML4: −
I meant &minus;
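The named entity in question is `&minus;`. As a rough sketch, Python's standard `html` module can map between the entity and the character in both directions. The `entity_for` helper below is a hypothetical name, and it relies on the `html.entities.codepoint2name` table, which covers the HTML 4.01 named entities only.

```python
import html
from html.entities import codepoint2name


def entity_for(ch: str):
    """Return the HTML 4.01 named entity for `ch`, or None if it has none.

    Hypothetical helper for illustration only.
    """
    name = codepoint2name.get(ord(ch))
    return f"&{name};" if name else None


if __name__ == "__main__":
    print(entity_for("\u2212"))            # named entity for the minus sign
    print(html.unescape("&minus;"))        # named entity back to the character
    print(html.unescape("&#8722;"))        # decimal numeric reference
    print(html.unescape("&#x2212;"))       # hexadecimal numeric reference
```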
http://live.gnome.org/Gucharmap is native on Linux systems
I found this jewel several days ago: http://graphemica.com/
Just paste your strange Unicode insect in the search box and hit Enter and you’ll know what it is.
I think the character that you get when you hit the regular hyphen key on most keyboards is technically called “hyphen-minus” or something.
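That is right: the keyboard character is U+002D HYPHEN-MINUS. A small Python sketch, using only the standard `unicodedata` module, lists it next to a handful of look-alikes. The selection in `LOOKALIKES` is my own pick for illustration and is far from exhaustive.

```python
import unicodedata

# A hand-picked, non-exhaustive set of characters that are easy to confuse.
LOOKALIKES = ["\u002D", "\u2010", "\u2012", "\u2013", "\u2014", "\u2212"]


def lookalike_names() -> dict:
    """Map each look-alike's U+XXXX notation to its official Unicode name."""
    return {f"U+{ord(ch):04X}": unicodedata.name(ch) for ch in LOOKALIKES}


if __name__ == "__main__":
    for notation, name in lookalike_names().items():
        print(notation, name)
```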
Emacs has a describe-char function. It’s old school, but it works for any encoding.
See also http://shapecatcher.com/ — you can actually draw a shape in a <canvas> element, and it finds Unicode characters that are similar to your drawing.
Different use case, but pretty cool, no?
Ctrl+Shift+K in Firefox to open the web console. Type '−'.charCodeAt(0), copy the number, then google “unicode 8722”. The first result was relevant:
Took approx. 15 seconds to do.
Another Richard Ishida gem http://people.w3.org/rishida/tools/conversion/
I wrote a Firefox extension called Character Identifier that does something similar. It’s a little clunky, but provides this sort of information quickly.
How is Character Viewer not the laziest method of all? Pop it up from the ever-present keyboard menu (you have it enabled, don’t you), and drag the character into it. Instant Unicode info!
Of course, I think you could set up a keyboard shortcut for a UnicodeChecker text service to display info for the selected character.
Robin, Tim: Me too!
Michael: I use describe-char a lot too. The only issue is Emacs’s quirky handling of UTF-8, which gives characters strange names at times. (Particularly with non-Latin scripts.)
Tim: totally unintentional; I’ve never seen it. I can’t even figure out the connection!
Rob L.: yep, I love that tool! It’s fun to scribble in the input and see what it returns.
Michael: I have a lot of trouble successfully selecting and dragging a single character anywhere. What I’d really hoped for was something like a system extension that would add “Identify this character…” to the text-selection contextual menu. Still, the alternatives (which I did explicitly say might not be any lazier) are pretty nice tools for other use cases.
UnicodeChecker has a services menu item (it also pops up in the context menu): select character, right click, ‘Display Character Information’ and it opens UnicodeChecker. You may need to activate it though (System Preferences > Keyboard > Keyboard Shortcuts, under Services). I use it all the time with a custom keyboard shortcut.
Philippe: THANK YOU! I don’t think I ever would have found that on my own. Now, do you know if there’s a way to rearrange the Services menu?
I use this:
Graphemica is faster but it’s useful to know how to get Python to tell you what character names are, in case you want to process them further.
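The snippet from this comment did not survive, so what follows is not the commenter's code. It is only a minimal sketch of the general idea, assuming the standard `unicodedata` module: scan a piece of text and report every non-ASCII character with its position, code point, and name. The function name `non_ascii_report` is hypothetical.

```python
import unicodedata


def non_ascii_report(text: str) -> list:
    """List (index, character, U+XXXX, name) for each non-ASCII character.

    Hypothetical helper; characters without a Unicode name get "<no name>".
    """
    report = []
    for index, ch in enumerate(text):
        if ord(ch) > 127:
            report.append(
                (index, ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<no name>"))
            )
    return report


if __name__ == "__main__":
    # The sample text uses a real minus sign, not a keyboard hyphen.
    for entry in non_ascii_report("3 \u2212 2 = 1"):
        print(entry)
```

Because the result is a plain list of tuples, it is easy to filter or count further, for example to find every odd dash in a long document.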
What OS are you running? On 10.6 and 10.7 you can enable or disable services in the Keyboard pane of System Preferences. On 10.5, not so much, although I seem to remember a utility that allows you to have some more control (check macupdate.com for ‘services’ maybe?).
Eric / Philippe:
I’m running Lion (10.7) and I can’t find anything in the System Preferences > Keyboard pane for UnicodeChecker. Which section is it supposed to be under (Internet, Searching, Text, Development, …) and what is the item called? Do you have to do anything else to make UnicodeChecker show up as an option there? TIA
In the keyboard shortcuts pane, under Services, the UnicodeChecker services are listed under ‘Text’. On my 10.7 machine, they are the first ones listed (various ‘Convert…’, ‘Display Character information’, ‘HTML entities → …’ etc, 16 in total), ymmv. They should be available without doing anything (except – maybe – running UnicodeChecker at least once, I think that is needed for the OS to know about them). If they don’t show up, log out of your current account, and log back in.
(I just checked under a fresh OS X account and they are all disabled by default, but they are present)
Thanks for the reply! Well, I did run UnicodeChecker, and I tried logging out and back in, but I don’t see them. Under ‘Text’ I have these four ‘Convert’ options, though they’re long enough that I can’t fully read them:
Convert Selected Simplifi…
Convert Selected Simplifi…
Convert Selected Traditio…
Convert Selected Traditio…
Those four are all checked (enabled). And then underneath those I have these, but none that start with ‘Display Character’ or ‘HTML’
Create Collection From Text
Create Font Library From…
Make New Sticky Note
New TextEdit Window Co…
Open man Page in Terminal
Search man Pages in Ter…
Show Address in Google…
and I don’t get any new options in the right-click context menu (that was the part I was really interested in). I don’t see any relevant options in UnicodeChecker Preferences. Not sure how I would get these to show up.
Hey, Brent. As a disclaimer, I’m still using Snow Leopard, so I don’t know if this is all invalidated by Lion or not.
After I ran UnicodeChecker, I did as Philippe recommended and went into the Keyboard preferences. Under “Text” in the “Services” section, I have a whole bunch of “Convert…” entries. Right after them is “Display Character Information”, and right after that is “HTML Entities → Unicode”. My guess is that what actually shows up in “Services” depends greatly on the applications you’ve run over the lifetime of the system. Weirdly enough, the list doesn’t seem to be alphabetical by service name, which makes finding anything in the list more of a challenge.
Sorry I can’t provide any more guidance than that! You’d think there would be an easier way to go about all this.
Thanks again, to you both, for your replies. I looked at every entry in “Text” regardless of order, in fact I checked all the other sections too, but there’s nothing that starts with “Display Character” or “HTML”.
There must be something different with my box. I’ve left a note on the UnicodeChecker site and if I figure anything out I’ll follow up here. I would love to be able to highlight a character and right-click to run UnicodeChecker on it immediately, without some copy+start-app+paste steps.
Wikipedia is also quite good for finding Unicode characters, and it shows some information on how a character should be used, along with other similar characters.
Eric, fwiw, you may want to try my String Analyser too at http://rishida.net/tools/analysestring/?list=%E2%88%92. You can paste/type any number of characters in the input field, top right.
(Of course, there’s also the full version of UniView, which provides additional information not in the lite version – such as the date a character appeared in Unicode. See http://rishida.net/scripts/uniview/?char=2212)
There are two advantages to the apps at the ends of these links: (a) you don’t have to download anything, and it works on all platforms, and (b) you can see what the character looks like, even if you don’t have a font that supports it, e.g. http://rishida.net/tools/analysestring/index.php?list=%E1%AC%AB%E1%AC%A6%E1%AC%84
To see the code behind a fancy character, copy the text into Winword, e.g.
»Sarantanӕ Vallis deſcriptio, & incolarū mores«.
Now mark the ū, for example. Then type Alt-C (C as in code). You’ll get:
»Sarantanӕ Vallis deſcriptio, & incolar016B mores«.
Voilà. You can go in the reverse direction as well: type in the Unicode code, mark it, and Alt-C gets you the character, in this case LATIN SMALL LETTER U WITH MACRON (used to shorten um or un in olden texts). Example.
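The same round trip between a character and its hex code can be sketched in Python for anyone without Word at hand. Both helper names here are hypothetical, and this only mimics the conversion itself, not Word's in-place editing.

```python
import unicodedata


def char_to_code(ch: str) -> str:
    """Return the four-or-more digit uppercase hex code of one character."""
    return f"{ord(ch):04X}"


def code_to_char(hex_code: str) -> str:
    """Return the character for a hex code such as '016B'."""
    return chr(int(hex_code, 16))


if __name__ == "__main__":
    code = char_to_code("\u016B")   # the u with macron from the example
    print(code)
    print(code_to_char(code), unicodedata.name(code_to_char(code)))
```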