meyerweb.com


Finding Unicode

A little while back, I was reading some text when I realized the hyphens didn’t look quite right.  A little too wide, I thought.  Not em-dash wide, but still…wide.  Wide-ish?  But when I copied some of the text into a BBEdit window, they looked just like the hyphens I typed into the document.

Of course, I know Unicode is filled with all manner of symbols and that the appearance of those symbols can vary from one font face to another.  So I changed the font face, made the size really huge, and behold: they were indeed different characters.  At this point, I was really curious about what I’d found.  What exactly was it?  How would I find out?

For the record, here’s the character in question: −

Googling “−” and “− Unicode” got me nothing useful.  I knew I could try the Character Viewer in OS X, and eventually I did, but I was wondering if there was a better (read: lazier) solution.  I asked the Twittersphere for advice, and while I don’t know if these solutions are any lazier, here are the best of the suggestions I received.

  • Unicode Lookup, a site that lets you input or paste in any character and get a report on what it is and how one might call it in various encodings.
  • Richard Ishida’s UniView Lite, which does much the same as Unicode Lookup with the caveat that once you’ve input your character, you have to hit the “Chars” button, not the “Search” button.  The latter is apparently how you search Unicode character names for a word or other string, like “dash” or “quot”.
  • UnicodeChecker (OS X), a nice utility that includes a character list pane as well as the ability to type or paste a character into an input and instantly get its gritty details.

Any of those will tell you that the − in question is MINUS SIGN, codepoint 8722 (decimal) / 2212 (UTF-16 hex) / U+2212 (Unicode hex) / et cetera, et cetera.  Did you know it was designated in Unicode 1.1?  Now you do, thanks to UnicodeChecker and this post.  You’re welcome.
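The same details can be pulled out of a scripting language. Here is a minimal sketch using Python's standard `unicodedata` module; the variable names are just illustrative, and the character is written as an escape so it can't be confused with an ordinary hyphen.

```python
import unicodedata

ch = "\u2212"  # the character from the post, written as an escape

name = unicodedata.name(ch)            # official Unicode character name
decimal = ord(ch)                      # code point as a decimal integer
hex_form = f"U+{ord(ch):04X}"          # conventional U+XXXX notation
utf16 = ch.encode("utf-16-be").hex()   # UTF-16 code unit, big-endian
utf8 = ch.encode("utf-8").hex()        # UTF-8 byte sequence

print(name, decimal, hex_form, utf16, utf8)
```

The UTF-8 bytes differ from the code point itself, which is why the same character can show up under several different-looking numbers depending on the encoding a tool reports.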

Update 2 Mar 12:  Philippe Wittenbergh points out in the comments that you can add a UnicodeChecker service.  With that enabled, all you have to do is highlight a character, summon the contextual menu (right-click, for most of us), and have it shown in UnicodeChecker.  Now that’s the kind of laziness I was trying to attain!

27 Responses»

    • #1
    • Comment
    • Thu 1 Mar 2012
    • 1122
    Robin wrote in to say...

    I think I must be the only person that’s specifically used this character in web content as part of an equation instead of a hyphen.

    • #2
    • Comment
    • Thu 1 Mar 2012
    • 1140
    Tim wrote in to say...

    “Punch the keys, for God’s sake!” ~ Finding Forrester (2000) Or was that reference unintended?

    • #3
    • Comment
    • Thu 1 Mar 2012
    • 1142
    Pawel Decowski wrote in to say...

    I’ve known about and used the minus character since I can remember.

    It’s had an HTML entity since, at least, HTML4: −

    • #4
    • Comment
    • Thu 1 Mar 2012
    • 1144
    Pawel Decowski wrote in to say...

    I meant &minus;
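As a quick check of the entity mentioned above, here is a small sketch with Python's standard `html` module. It assumes nothing beyond the standard library; the variable names are illustrative.

```python
import html
from html.entities import name2codepoint

# The named entity and both numeric forms should decode to the same character.
from_named = html.unescape("&minus;")
from_decimal = html.unescape("&#8722;")
from_hex = html.unescape("&#x2212;")

# The HTML 4 entity table maps the entity name to its code point.
entity_codepoint = name2codepoint["minus"]

print(from_named == from_decimal == from_hex, hex(entity_codepoint))
```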

    • #5
    • Comment
    • Thu 1 Mar 2012
    • 1152
    Steven wrote in to say...

    http://live.gnome.org/Gucharmap is native on Linux systems

    • #6
    • Comment
    • Thu 1 Mar 2012
    • 1202
    Cristian Tincu wrote in to say...

    I found this jewel several days ago: http://graphemica.com/
    Just paste your strange Unicode insect in the search box and hit Enter and you’ll know what it is.

    • #7
    • Comment
    • Thu 1 Mar 2012
    • 1208
    Paul D. Waite wrote in to say...

    I think the character that you get when you hit the regular hyphen key on most keyboards is technically called “hyphen-minus” or something.
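To see how the keyboard hyphen differs from its look-alikes, here is a short sketch that prints the official names of a few hyphen-like characters. The selection of characters is my own, not an exhaustive list.

```python
import unicodedata

# A handful of hyphen-like characters, written as escapes to avoid ambiguity.
lookalikes = ["\u002D", "\u2010", "\u2013", "\u2014", "\u2212"]

# Pair each code point with its official Unicode name.
report = [(f"U+{ord(c):04X}", unicodedata.name(c)) for c in lookalikes]

for codepoint, name in report:
    print(codepoint, name)
```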

    • #8
    • Comment
    • Thu 1 Mar 2012
    • 1327
    Michael wrote in to say...

    Emacs has a describe-char function. It’s old school, but it works for any encoding.

    • #9
    • Comment
    • Thu 1 Mar 2012
    • 1439
    Rob L. wrote in to say...

    See also http://shapecatcher.com/ — you can actually draw a shape in a <canvas> element, and it finds Unicode characters that are similar to your drawing.

    Different use case, but pretty cool, no?

    • #10
    • Comment
    • Thu 1 Mar 2012
    • 1612
    Michael Haufe wrote in to say...

    Ctrl+Shift+K in Firefox to open the web console. Type '−'.charCodeAt(0), copy the number, then google “unicode 8722”. The first result was relevant.

    Took approx. 15 seconds to do.

    • #11
    • Comment
    • Thu 1 Mar 2012
    • 1705
    sam wrote in to say...

    Another Richard Ishida gem http://people.w3.org/rishida/tools/conversion/

    • #12
    • Comment
    • Thu 1 Mar 2012
    • 1742
    David Baron wrote in to say...

    I wrote a Firefox extension called Character Identifier that does something similar. It’s a little clunky, but provides this sort of information quickly.

    • #13
    • Comment
    • Thu 1 Mar 2012
    • 2001
    Michael Zajac wrote in to say...

    How is Character Viewer not the laziest method of all? Pop it up from the ever-present keyboard menu (you have it enabled, don’t you), and drag the character into it. Instant Unicode info!

    Of course, I think you could set up a keyboard shortcut for a UnicodeChecker text service to display info for the selected character.

    • #14
    • Comment
    • Thu 1 Mar 2012
    • 2148
    runoid wrote in to say...

    Terminal:
    unicode −

    • #15
    • Comment
    • Fri 2 Mar 2012
    • 0613
    Aankhen wrote in to say...

    Robin, Tim: Me too!

    Michael: I use describe-char a lot too. The only issue is Emacs’s quirky handling of UTF-8, which gives characters strange names at times. (Particularly with non-Latin scripts.)

    • #16
    • Comment
    • Fri 2 Mar 2012
    • 0915
    Eric Meyer wrote in to say...

    Tim: totally unintentional; I’ve never seen it. I can’t even figure out the connection!

    Rob L.: yep, I love that tool! It’s fun to scribble in the input and see what it returns.

    Michael: I have a lot of trouble successfully selecting and dragging a single character anywhere. What I’d really hoped for was something like a system extension that would add “Identify this character…” to the text-selection contextual menu. Still, the alternatives (which I did explicitly say might not be any lazier) are pretty nice tools for other use cases.

    • #17
    • Comment
    • Fri 2 Mar 2012
    • 0935
    Philippe Wittenbergh wrote in to say...

    Eric Meyer.
    UnicodeChecker has a services menu item (it also pops up in the context menu): select character, right click, ‘Display Character Information’ and it opens UnicodeChecker. You may need to activate it though (System Preferences > Keyboard > Keyboard Shortcuts, under Services). I use it all the time with a custom keyboard shortcut.

    • #18
    • Comment
    • Fri 2 Mar 2012
    • 0944
    Eric Meyer wrote in to say...

    Philippe: THANK YOU! I don’t think I ever would have found that on my own. Now, do you know if there’s a way to rearrange the Services menu?

    • #19
    • Comment
    • Fri 2 Mar 2012
    • 1354
    Pat wrote in to say...

    I use this:

    gist.github.com: letters.py.

    Graphemica is faster but it’s useful to know how to get Python to tell you what character names are, in case you want to process them further.
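For anyone who wants to try the Python route, here is a minimal sketch of the idea. It is not the contents of the gist above, just an illustration using the standard `unicodedata` module; the function name `describe_non_ascii` and the sample string are hypothetical.

```python
import unicodedata

def describe_non_ascii(text):
    """Return (code point, name) pairs for every non-ASCII character in text."""
    found = []
    for ch in text:
        if ord(ch) > 0x7F:
            # Fall back to a placeholder for characters that have no name.
            name = unicodedata.name(ch, "<unnamed>")
            found.append((f"U+{ord(ch):04X}", name))
    return found

# The first operator is a real minus sign; the second is a plain hyphen-minus.
sample = "5 \u2212 3 = 2, not 5 - 3"
result = describe_non_ascii(sample)
print(result)
```

Because the result is an ordinary list, it can be filtered, counted, or fed into a search-and-replace step.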

    • #20
    • Comment
    • Fri 2 Mar 2012
    • 1911
    Philippe Wittenbergh wrote in to say...

    Eric,
    What OS are you running? On 10.6 and 10.7 you can enable or disable services in the Keyboard pane of System Preferences. On 10.5, not so much, although I seem to remember a utility that allows you to have some more control (check macupdate.com for ‘services’ maybe?).

    • #21
    • Comment
    • Mon 5 Mar 2012
    • 1536
    Brent J. Nordquist wrote in to say...

    Eric / Philippe:

    I’m running Lion (10.7) and I can’t find anything in the System Preferences > Keyboard pane for UnicodeChecker. Which section is it supposed to be under (Internet, Searching, Text, Development, …) and what is the item called? Do you have to do anything else to make UnicodeChecker show up as an option there? TIA

    • #22
    • Comment
    • Tue 6 Mar 2012
    • 0031
    Philippe Wittenbergh wrote in to say...

    Brent,

    In the keyboard shortcuts pane, under Services, the UnicodeChecker services are listed under ‘Text’. On my 10.7 machine, they are the first ones listed (various ‘Convert…’, ‘Display Character information’, ‘HTML entities → …’ etc, 16 in total), ymmv. They should be available without doing anything (except – maybe – running UnicodeChecker at least once, I think that is needed for the OS to know about them). If they don’t show up, log out of your current account, and log back in.
    (I just checked under a fresh OS X account and they are all disabled by default, but they are present)

    • #23
    • Comment
    • Tue 6 Mar 2012
    • 1453
    Brent J. Nordquist wrote in to say...

    Thanks for the reply! Well, I did run UnicodeChecker, and I tried logging out and back in, but I don’t see them. Under ‘Text’ I have these four ‘Convert’ options, though they’re long enough that I can’t fully read them:

    Convert Selected Simplifi…
    Convert Selected Simplifi…
    Convert Selected Traditio…
    Convert Selected Traditio…

    Those four are all checked (enabled). And then underneath those I have these, but none that start with ‘Display Character’ or ‘HTML’

    Create Collection From Text
    Create Font Library From…
    Make New Sticky Note
    New TextEdit Window Co…
    Open man Page in Terminal
    Search man Pages in Ter…
    Show Address in Google…
    Summarize

    and I don’t get any new options in the right-click context menu (that was the part I was really interested in). I don’t see any relevant options in UnicodeChecker Preferences. Not sure how I would get these to show up.

    • #24
    • Comment
    • Wed 7 Mar 2012
    • 1142
    Eric Meyer wrote in to say...

    Hey, Brent. As a disclaimer, I’m still using Snow Leopard, so I don’t know if this is all invalidated by Lion or not.

    After I ran UnicodeChecker, I did as Philippe recommended and went into the Keyboard preferences. Under “Text” in the “Services” section, I have a whole bunch of “Convert…” entries. Right after them is “Display Character Information”, and right after that is “HTML Entities → Unicode”. My guess is that what actually shows up in “Services” depends greatly on the applications you’ve run over the lifetime of the system. Weirdly enough, the list doesn’t seem to be alphabetical by service name, which makes finding anything in the list more of a challenge.

    Sorry I can’t provide any more guidance than that! You’d think there would be an easier way to go about all this.

    • #25
    • Comment
    • Wed 7 Mar 2012
    • 1403
    Brent J. Nordquist wrote in to say...

    Thanks again, to you both, for your replies. I looked at every entry in “Text” regardless of order, in fact I checked all the other sections too, but there’s nothing that starts with “Display Character” or “HTML”.

    There must be something different with my box. I’ve left a note on the UnicodeChecker site and if I figure anything out I’ll follow up here. I would love to be able to highlight a character and right-click to run UnicodeChecker on it immediately, without some copy+start-app+paste steps.

    • #26
    • Comment
    • Fri 9 Mar 2012
    • 0915
    n wrote in to say...

    Wikipedia is also quite good for finding Unicode characters; it shows some information on how a character should be used, along with other similar characters.

    • #27
    • Comment
    • Sun 18 Mar 2012
    • 1644
    Richard Ishida wrote in to say...

    Eric, fwiw, you may want to try my String Analyser too at http://rishida.net/tools/analysestring/?list=%E2%88%92. You can paste/type any number of characters in the input field, top right.

    (Of course, there’s also the full version of UniView, which provides additional information not in the lite version – such as the date a character appeared in Unicode. See http://rishida.net/scripts/uniview/?char=2212)

    There are two advantages to the apps at the ends of these links: (a) you don’t have to download anything, and it works on all platforms, and (b) you can see what the character looks like, even if you don’t have a font that supports it, eg. http://rishida.net/tools/analysestring/index.php?list=%E1%AC%AB%E1%AC%A6%E1%AC%84
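The `%E2%88%92` in the first link above is just the character's UTF-8 bytes, percent-encoded. Here is a small sketch with Python's standard `urllib.parse` showing the round trip; the variable names are illustrative.

```python
from urllib.parse import quote, unquote

# Percent-encode the minus sign as it would appear in a query string.
encoded = quote("\u2212")

# Decode the three-character example from the last link and list its code points.
decoded = unquote("%E1%AC%AB%E1%AC%A6%E1%AC%84")
codepoints = [f"U+{ord(c):04X}" for c in decoded]

print(encoded, codepoints)
```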
