Finding Unicode

Published 13 years, 4 months past

A little while back, I was reading some text when I realized the hyphens didn’t look quite right. A little too wide, I thought. Not em-dash wide, but still…wide. Wide-ish? But when I copied some of the text into a BBEdit window, they looked just like the hyphens I typed into the document.

Of course, I know Unicode is filled with all manner of symbols and that the appearance of those symbols can vary from one font face to another. So I changed the font face, made the size really huge, and behold: they were indeed different characters. At this point, I was really curious about what I’d found. What exactly was it? How would I find out?

For the record, here’s the character in question:

−

Googling “−” and “− Unicode” got me nothing useful. I knew I could try the Character Viewer in OS X, and eventually I did, but I was wondering if there was a better (read: lazier) solution. I asked the Twittersphere for advice, and while I don’t know if these solutions are any lazier, here are the best of the suggestions I received.

Unicode Lookup, a site that lets you input or paste in any character and get a report on what it is and how one might call it in various encodings.
Richard Ishida’s UniView Lite, which does much the same as Unicode Lookup with the caveat that once you’ve input your character, you have to hit the “Chars” button, not the “Search” button. The latter is apparently how you search Unicode character names for a word or other string, like “dash” or “quot”.
UnicodeChecker (OS X), a nice utility that includes a character list pane as well as the ability to type or paste a character into an input and instantly get its gritty details.

Any of those will tell you that the − in question is MINUS SIGN, codepoint 8722 (decimal) / 2212 (UTF-16 hex) / U+2212 (Unicode hex) / et cetera, et cetera. Did you know it was designated in Unicode 1.1? Now you do, thanks to UnicodeChecker and this post. You’re welcome.
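
If you'd rather not leave the command line, a scripting language can pull up the same details. Here's a minimal sketch, assuming Python 3 and its standard `unicodedata` module; the `describe` function is a name made up for this illustration and isn't part of any of the tools above.

```python
import unicodedata


def describe(ch):
    """Return the basic Unicode details for a single character."""
    cp = ord(ch)  # the code point as a decimal integer
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        "decimal": cp,
        "hex": f"{cp:04X}",
        "notation": f"U+{cp:04X}",
        # UTF-8 bytes, as they would appear percent-encoded in a URL
        "utf8": " ".join(f"{b:02X}" for b in ch.encode("utf-8")),
    }


print(describe("\u2212"))  # the character from this post
print(describe("-"))       # the hyphen key on the keyboard
```

Running it on both characters side by side makes the difference obvious even when the glyphs look the same in your font.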

Update 2 Mar 12: Philippe Wittenbergh points out in the comments that you can add a UnicodeChecker service. With that enabled, all you have to do is highlight a character, summon the contextual menu (right-click, for most of us), and have it shown in UnicodeChecker. Now that’s the kind of laziness I was trying to attain!

Finding Unicode was published on Thursday, March 1st, 2012.
It was assigned to the Tools category.
There have been twenty-eight replies.

Comments (28)

I think I must be the only person that’s specifically used this character in web content as part of an equation instead of a hyphen.

Signed,
Robin
Thursday, March 1st, 2012 11:22am
“Punch the keys, for God’s sake!” ~ Finding Forrester (2000) Or was that reference unintended?

Signed,
Tim
Thursday, March 1st, 2012 11:40am
I’ve known about and used the minus character since I can remember.

It’s had an HTML entity since, at least, HTML4: −

Signed,
Pawel Decowski
Thursday, March 1st, 2012 11:42am
I meant &minus;

Signed,
Pawel Decowski
Thursday, March 1st, 2012 11:44am
http://live.gnome.org/Gucharmap is native on Linux systems

Signed,
Steven
Thursday, March 1st, 2012 11:52am
I found this jewel several days ago: http://graphemica.com/
Just paste your strange Unicode insect in the search box and hit Enter and you’ll know what it is.

Signed,
Cristian Tincu
Thursday, March 1st, 2012 12:02pm
I think the character that you get when you hit the regular hyphen key on most keyboards is technically called “hyphen-minus” or something.

Signed,
Paul D. Waite
Thursday, March 1st, 2012 12:08pm
Emacs has a describe-char function. It’s old school, but it works for any encoding.

Signed,
Michael
Thursday, March 1st, 2012 1:27pm
See also http://shapecatcher.com/ — you can actually draw a shape in a <canvas> element, and it finds Unicode characters that are similar to your drawing.

Different use case, but pretty cool, no?

Signed,
Rob L.
Thursday, March 1st, 2012 2:39pm
Ctrl+Shift+K in Firefox to open the web console. Type '−'.charCodeAt(0), copy the number, then google unicode 8722. The first result was relevant:

Took approx. 15 seconds to do.

Signed,
Michael Haufe
Thursday, March 1st, 2012 4:12pm
Another Richard Ishida gem: http://people.w3.org/rishida/tools/conversion/

Signed,
sam
Thursday, March 1st, 2012 5:05pm
I wrote a Firefox extension called Character Identifier that does something similar. It’s a little clunky, but provides this sort of information quickly.

Signed,
David Baron
Thursday, March 1st, 2012 5:42pm
How is Character Viewer not the laziest method of all? Pop it up from the ever-present keyboard menu (you have it enabled, don’t you), and drag the character into it. Instant Unicode info!

Of course, I think you could set up a keyboard shortcut for a UnicodeChecker text service to display info for the selected character.

Signed,
Michael Zajac
Thursday, March 1st, 2012 8:01pm
Terminal:
unicode −

Signed,
runoid
Thursday, March 1st, 2012 9:48pm
Robin, Tim: Me too!

Michael: I use describe-char a lot too. The only issue is Emacs’s quirky handling of UTF-8, which gives characters strange names at times. (Particularly with non-Latin scripts.)

Signed,
Aankhen
Friday, March 2nd, 2012 6:13am
Tim: totally unintentional; I’ve never seen it. I can’t even figure out the connection!

Rob L.: yep, I love that tool! It’s fun to scribble in the input and see what it returns.

Michael: I have a lot of trouble successfully selecting and dragging a single character anywhere. What I’d really hoped for was something like a system extension that would add “Identify this character…” to the text-selection contextual menu. Still, the alternatives (which I did explicitly say might not be any lazier) are pretty nice tools for other use cases.

Signed,
Eric Meyer
Friday, March 2nd, 2012 9:15am
Eric Meyer.
UnicodeChecker has a services menu item (it also pops up in the context menu): select character, right click, ‘Display Character Information’ and it opens UnicodeChecker. You may need to activate it though (System Preferences > Keyboard > Keyboard Shortcuts, under Services). I use it all the time with a custom keyboard shortcut.

Signed,
Philippe Wittenbergh
Friday, March 2nd, 2012 9:35am
Philippe: THANK YOU! I don’t think I ever would have found that on my own. Now, do you know if there’s a way to rearrange the Services menu?

Signed,
Eric Meyer
Friday, March 2nd, 2012 9:44am
I use this:

gist.github.com: letters.py.

Graphemica is faster but it’s useful to know how to get Python to tell you what character names are, in case you want to process them further.
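
The gist isn't reproduced here. As a rough sketch of the general idea (not the contents of letters.py), assuming Python 3's standard `unicodedata` module and a hypothetical helper named `name_characters`:

```python
import unicodedata


def name_characters(text):
    """Yield (character, U+ notation, Unicode name) for each character in text."""
    for ch in text:
        yield ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")


# Walk a short string that contains the minus sign from the post.
for ch, notation, name in name_characters("3 \u2212 1"):
    print(notation, name)
```

Because the names come back as ordinary strings, they can be filtered, counted, or written out for whatever further processing is needed.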

Signed,
Pat
Friday, March 2nd, 2012 1:54pm
Eric,
What OS are you running? On 10.6 and 10.7 you can enable or disable services in the Keyboard pane of System Preferences. On 10.5, not so much, although I seem to remember a utility that allows you to have some more control (check macupdate.com for ‘services’ maybe?).

Signed,
Philippe Wittenbergh
Friday, March 2nd, 2012 7:11pm
Eric / Philippe:

I’m running Lion (10.7) and I can’t find anything in the System Preferences > Keyboard pane for UnicodeChecker. Which section is it supposed to be under (Internet, Searching, Text, Development, …) and what is the item called? Do you have to do anything else to make UnicodeChecker show up as an option there? TIA

Signed,
Brent J. Nordquist
Monday, March 5th, 2012 3:36pm
Brent,

In the keyboard shortcuts pane, under Services, the UnicodeChecker services are listed under ‘Text’. On my 10.7 machine, they are the first ones listed (various ‘Convert…’, ‘Display Character information’, ‘HTML entities → …’ etc, 16 in total), ymmv. They should be available without doing anything (except – maybe – running UnicodeChecker at least once, I think that is needed for the OS to know about them). If they don’t show up, log out of your current account, and log back in.
(I just checked under a fresh OS X account and they are all disabled by default, but they are present)

Signed,
Philippe Wittenbergh
Tuesday, March 6th, 2012 12:31am
Thanks for the reply! Well, I did run UnicodeChecker, and I tried logging out and back in, but I don’t see them. Under ‘Text’ I have these four ‘Convert’ options, though they’re long enough that I can’t fully read them:

Convert Selected Simplifi…
Convert Selected Simplifi…
Convert Selected Traditio…
Convert Selected Traditio…

Those four are all checked (enabled). And then underneath those I have these, but none that start with ‘Display Character’ or ‘HTML’:

Create Collection From Text
Create Font Library From…
Make New Sticky Note
New TextEdit Window Co…
Open man Page in Terminal
Search man Pages in Ter…
Show Address in Google…
Summarize

and I don’t get any new options in the right-click context menu (that was the part I was really interested in). I don’t see any relevant options in UnicodeChecker Preferences. Not sure how I would get these to show up.

Signed,
Brent J. Nordquist
Tuesday, March 6th, 2012 2:53pm
Hey, Brent. As a disclaimer, I’m still using Snow Leopard, so I don’t know if this is all invalidated by Lion or not.

After I ran UnicodeChecker, I did as Philippe recommended and went into the Keyboard preferences. Under “Text” in the “Services” section, I have a whole bunch of “Convert…” entries. Right after them is “Display Character Information”, and right after that is “HTML Entities → Unicode”. My guess is that what actually shows up in “Services” depends greatly on the applications you’ve run over the lifetime of the system. Weirdly enough, the list doesn’t seem to be alphabetical by service name, which makes finding anything in the list more of a challenge.

Sorry I can’t provide any more guidance than that! You’d think there would be an easier way to go about all this.

Signed,
Eric Meyer
Wednesday, March 7th, 2012 11:42am
Thanks again, to you both, for your replies. I looked at every entry in “Text” regardless of order; in fact, I checked all the other sections too, but there’s nothing that starts with “Display Character” or “HTML”.

There must be something different with my box. I’ve left a note on the UnicodeChecker site and if I figure anything out I’ll follow up here. I would love to be able to highlight a character and right-click to run UnicodeChecker on it immediately, without some copy+start-app+paste steps.

Signed,
Brent J. Nordquist
Wednesday, March 7th, 2012 2:03pm
Wikipedia is also quite good for finding Unicode characters; it shows some information on how a character should be used, along with other similar characters.

Signed,
n
Friday, March 9th, 2012 9:15am
Eric, fwiw, you may want to try my String Analyser too at http://rishida.net/tools/analysestring/?list=%E2%88%92. You can paste/type any number of characters in the input field, top right.

(Of course, there’s also the full version of UniView, which provides additional information not in the lite version – such as the date a character appeared in Unicode. See http://rishida.net/scripts/uniview/?char=2212)

There are two advantages to the apps at the ends of these links: (a) you don’t have to download anything, and it works on all platforms, and (b) you can see what the character looks like, even if you don’t have a font that supports it, eg. http://rishida.net/tools/analysestring/index.php?list=%E1%AC%AB%E1%AC%A6%E1%AC%84

Signed,
Richard Ishida
Sunday, March 18th, 2012 4:44pm
To see the code for a fancy character, copy the text into Winword, e.g.
»Sarantanӕ Vallis deſcriptio, & incolarū mores«.
Now mark the ū, for example. Then type Alt-C (C as in code). You’ll get:
»Sarantanӕ Vallis deſcriptio, & incolar016B mores«.
Voilà. You can go the reverse direction as well: typing in the Unicode value, marking it, and pressing Alt-C gets you the character, in this case LATIN SMALL LETTER U WITH MACRON (used to shorten um or un in olden texts).

Signed,
Fritz Jörn
Tuesday, December 8th, 2015 4:01am
