Snipe.Net - Geeky Stuff

Fixing Curly Quotes and Em Dashes in PHP

Curly quotes, or “smart quotes,” generated by Microsoft Word and other applications can be a real headache for developers. If you’ve built an administration area for your content publishers, and they frequently compose their posts in Word and then copy and paste them into your form to publish to the web, you may find that the curly quotes are replaced by your browser’s placeholder for an unrecognized character, often a question mark. This is particularly frustrating when Word-generated characters such as curly quotes or em dashes break XML feeds generated from that content, even after you’ve been careful to convert the “normal” HTML special characters so that your XML would be valid. Fortunately, there is an easy workaround.


Rather than trying to convince your publishers to stop using Word to compose their content, the easier (and more effective) solution is to replace the curly quotes with “normal” quotes before the data is inserted into the database.

The function below converts curly quotes and em dashes into straight quotes and hyphens (“-”). If you’ve got a handful of classes or functions that you routinely use as part of your data scrubbing process (to clean data before it is written to the database), you may want to include this function in that group. That way, you don’t ever have to think about it again.

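function convert_smart_quotes($string)
{
    // Windows-1252 byte values that Word emits for "smart" punctuation
    $search = array(chr(145),  // left single quote
                    chr(146),  // right single quote
                    chr(147),  // left double quote
                    chr(148),  // right double quote
                    chr(151)); // em dash

    // Plain ASCII equivalents, in the same order as $search
    $replace = array("'",
                     "'",
                     '"',
                     '"',
                     '-');

    return str_replace($search, $replace, $string);
}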
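
One caveat worth spelling out: chr(145) through chr(151) are single-byte Windows-1252 codes, so the function above only does the right thing when the submitted text arrives in that encoding. If the form is submitted as UTF-8, the same characters are three-byte sequences, and a byte-level replace can mangle them. What follows is a minimal sketch of a UTF-8 variant, not part of the original post. The name convert_smart_quotes_utf8 is made up for illustration, and the sketch assumes the input is valid UTF-8.

```php
<?php
// Hypothetical UTF-8 counterpart to convert_smart_quotes().
// Assumes $string is UTF-8; each key spells out one character's UTF-8 bytes.
function convert_smart_quotes_utf8($string)
{
    $map = array(
        "\xE2\x80\x98" => "'", // U+2018 left single quote
        "\xE2\x80\x99" => "'", // U+2019 right single quote
        "\xE2\x80\x9C" => '"', // U+201C left double quote
        "\xE2\x80\x9D" => '"', // U+201D right double quote
        "\xE2\x80\x94" => '-', // U+2014 em dash
    );

    // strtr() with an array replaces whole keys, so each
    // multi-byte sequence is matched as a unit
    return strtr($string, $map);
}
```

Which of the two functions fits depends on the character set your form and database connection actually use, so it is worth checking that before picking one.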


This entry was posted on Thursday, December 11th, 2008 at 10:23 am and is filed under PHP/mySQL. You can follow any responses to this entry through the RSS 2.0 feed. You can leave a response, or trackback from your own site.
  • http://realm3.com/ Brian D.

    Couldn’t you also just make sure you’re using Unicode? If you’re both receiving data in Unicode and putting it out to the browser in Unicode, then everything should work properly. (It’s possible I’m misunderstanding how this works, since I’m no expert on Unicode).

  • http://www.snipe.net snipe

    You probably could, but in some cases (existing system, etc.) there may be reasons why you can’t just switch to Unicode. Also, I prefer to clean out characters like that before the data hits the database. If the system changes, or we decide to no longer show curly quotes (but now have a database full of them), this can create problems. I’m a bit of a purist in that respect: I’d rather have clean data and then decide how I want to display it, so that it’s standard regardless of the encoding, style, etc.

  • hel

    Great Post

  • Don De.

    I don’t get it. How do I use this function? Do I use it on the page where the text is? Do I say “” etc. Sorry I’m so stupid.

    Don

  • http://www.snipe.net snipe

    @Don – you could either include the function on the same page as the script where your text would be displayed, or within a functions file that you include in that file. Check out this tutorial on using functions in PHP for more info: http://www.php-mysql-tutorial.com/wikis/php-tutorial/php-functions.aspx
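
As a rough illustration of the reply above (not from the original thread): the sketch below inlines the article's function so that it runs on its own, where in practice it might live in a shared functions file that the page includes. The variable names and the sample input are made up, and the input is assumed to be Windows-1252 bytes.

```php
<?php
// In a real project this function might sit in a shared functions file
// that the page includes; it is inlined here so the example is self-contained.
function convert_smart_quotes($string)
{
    $search  = array(chr(145), chr(146), chr(147), chr(148), chr(151));
    $replace = array("'", "'", '"', '"', '-');
    return str_replace($search, $replace, $string);
}

// Hypothetical form input pasted from Word, as Windows-1252 bytes
$submitted = "\x93It\x92s here\x94 \x97 done";

// Clean the text before storing or displaying it
$clean = convert_smart_quotes($submitted);

echo $clean, "\n"; // prints: "It's here" - done
```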

  • Scott

    There's no reason to stop with just smart quotes. You can fix all manner of illegal characters with the following:

    $allEntities = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES);
    $specialEntities = get_html_translation_table(HTML_SPECIALCHARS, ENT_NOQUOTES);
    $noTags = array_diff($allEntities, $specialEntities);

    And, that will leave tags alone, assuming that's what you want.
    $valid = strtr($invalid, $noTags);
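
Put together as a runnable sketch (an illustration, not Scott's exact code): this version passes 'UTF-8' explicitly as the encoding and assumes the input is UTF-8, and the sample string is made up. Note that this approach turns the characters into named HTML entities rather than into plain ASCII quotes.

```php
<?php
// Every character PHP has a named HTML entity for
$allEntities = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES, 'UTF-8');
// The markup-significant characters (&, <, >)
$specialEntities = get_html_translation_table(HTML_SPECIALCHARS, ENT_NOQUOTES, 'UTF-8');
// Drop the markup characters so that tags pass through untouched
$noTags = array_diff($allEntities, $specialEntities);

// Hypothetical input: curly double quotes and an accented letter inside a tag
$invalid = "<b>\xE2\x80\x9CHi caf\xC3\xA9\xE2\x80\x9D</b>";
$valid = strtr($invalid, $noTags);

echo $valid, "\n";
```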
