I thought that would fix everything, but it didn’t. Some of our URLs started causing Apache to explode with an unexpected 404 “NOT FOUND” error. This link hated me: http://www.wikiDOMO.com/toronto_on/results/Café

I Googled around for a good 4 hours, trying to find something about mod_rewrite, UTF-8, accented character URIs, internationalization, etc. I found lots, but nothing helped. I even enlisted the help of Chris Hartjes, Julian Simpson, and Jeff Kolesnikowicz, but we all came up empty-handed… so I went back to basics, modifying all the RewriteRules one by one (we have 129 lines of them). Eventually I figured it out.

UTF-8 characters are not part of the a-zA-Z character set, so many of our rewrite rules now failed. To fix it, I simply had to change them from this:

([a-zA-Z0-9_-]*)

To this:

(.*)

Period means “any character”, and * means as many times as you like.

A few key articles:
- The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
- Portable php-mysql connection charset fix
- MySQL and UTF-8 at WACT
- Turning MySQL data in Latin1 to UTF-8
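
To see why the rewrite-rule change works, here is a minimal sketch in Python rather than Apache config. It assumes the rule is matched against the raw UTF-8 bytes of the decoded URL path, which is how a byte-oriented regex engine would see "Café". The names `ASCII_ONLY`, `ANY_CHARS`, `ONE_SEGMENT`, and `captures` are made up for illustration, and `ONE_SEGMENT` is a narrower alternative that is not from the original fix.

```python
import re

# Byte patterns standing in for the capture groups in the rewrite rules.
ASCII_ONLY = re.compile(rb"([a-zA-Z0-9_-]*)")  # the old group
ANY_CHARS = re.compile(rb"(.*)")               # the new group
# Assumption, not from the post: a tighter group that still accepts
# non-ASCII bytes but stops at a slash.
ONE_SEGMENT = re.compile(rb"([^/]+)")


def captures(pattern, segment):
    """Return the captured bytes if the whole segment matches, else None."""
    match = pattern.fullmatch(segment)
    return match.group(1) if match else None


if __name__ == "__main__":
    for word in ("Cafe", "Café"):
        raw = word.encode("utf-8")  # "é" becomes the two bytes C3 A9
        print(
            word,
            captures(ASCII_ONLY, raw) is not None,
            captures(ANY_CHARS, raw) is not None,
            captures(ONE_SEGMENT, raw) is not None,
        )
```

The ASCII-only class rejects "Café" because neither byte of the encoded "é" falls inside a-z, A-Z, 0-9, "_" or "-". The trade-off with `(.*)` is that it also matches slashes and empty strings, so a rule using it may capture more of the path than the old one did.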
Comments from my old blog:
Juan said: you’re in for a lot of surprises if you continue playing with UTF-8…. watch out for most string functions, usually they’ll have a mb_* equivalent.
I’ve been “enjoying” utf-8 programming since I joined this company. at 2008-11-24 18:28:25
Lex said: You just saved me your 4 hours looking for a solution. Thanks a lot! at 2009-04-12 15:02:19
Ankzu said: You just made two days worth of headaches disappear :D at 2012-08-24 18:56:23
- aggregate 35 English language feeds
- aggregate a few Russian feeds
- translate them into English AND create a super-feed out of the union of the English and (previously) Russian feeds
- filter out all the feed items that don’t match my ‘wanted’ keywords (having to do with UFO sightings, the paranormal, etc.)
- filter out all the feed items that DO match my ‘unwanted’ keywords (such as ‘illegal alien’, and ‘hoax’)
- sort the feed items by publication date descending
- remove duplicates based on the item’s link URL
- extract keywords from each item’s description, and append them to each item as an extra “keywords” attribute
- extract location information from each item’s content (mentions of cities, etc.), translate it into latitudes & longitudes, and append them to each item as an extra “location” attribute
- publish it as a native-PHP array, for easy consumption by file_get_contents($url)
Note: My pipe is automatically also available as RSS, JSON, & KML
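
For anyone who would rather see the middle of that pipeline as code, here is a rough Python sketch of just the filter, sort, and de-duplicate steps. It is not how Yahoo Pipes does it internally. The item shape (dicts with `title`, `description`, `link`, `published`), the keyword lists, and the function names are all assumptions made for illustration, and the aggregation, translation, keyword extraction, and geocoding steps are left out.

```python
from datetime import datetime

# Hypothetical keyword lists, loosely based on the examples in the list above.
WANTED = ("ufo", "paranormal")
UNWANTED = ("illegal alien", "hoax")


def text_of(item):
    """Lower-cased title plus description; a missing description counts as empty."""
    title = item.get("title") or ""
    description = item.get("description") or ""
    return (title + " " + description).lower()


def run_pipe(items, wanted=WANTED, unwanted=UNWANTED):
    # Keep items that mention a wanted keyword and no unwanted keyword.
    kept = [
        item for item in items
        if any(word in text_of(item) for word in wanted)
        and not any(word in text_of(item) for word in unwanted)
    ]
    # Newest first.
    kept.sort(key=lambda item: item["published"], reverse=True)
    # Drop later (older) items that repeat a link URL already seen.
    seen = set()
    unique = []
    for item in kept:
        if item["link"] not in seen:
            seen.add(item["link"])
            unique.append(item)
    return unique
```

Because the sort runs before the de-duplication, the copy that survives for any given link is the most recently published one.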
I’m not even kidding. One Hour. It’s insane. There’s one little bug in it somewhere, having to do with ATOM feeds not having an item.description, but it still works. In fact, you can see it working here:
- as an RSS feed
- as JSON
- as native PHP
- as KML
- embedded in Yahoo’s interface. Click the [LIST] tab.
Let me know what you think in the comments on this blog post!
Things are going great…I should be in London, Ontario sometime for a meeting and hopefully we can get together for a pint!!
Naj. at 2009-01-14 22:24:36
Jen said: Hi Derek! I am surfing taking a break from report cards…I am so sorry to hear your grandma passed. As you know, I really respected her and thought she was a cool woman! On another note, when did you move to town? Are you liking it? What part of town are you and your lady in??? Be well, jen at 2009-03-09 01:13:02