Migrating a Rails app from MySQL with Spatial Extensions to PostgreSQL with PostGIS

There’s much documentation about converting a Rails app from MySQL to PostgreSQL (if you haven’t written any SQL in your code, you can often just use Taps), but very little about converting a Rails app (or any app, actually) from using MySQL Spatial Extensions to using PostGIS.

In my case, I’m working on a Rails 2.3.14 app using the spatial_adapter, geokit, geokit-rails and georuby gems. If you’re using Rails 3 with rgeo and activerecord-postgis-adapter, these tips will probably help you, too.

Code changes

If you wrote any SQL in your code (or if you’re not sure), you may have to update it to work with PostgreSQL. Code snippets to search for:

Read things to know when moving from MySQL to PostgreSQL to learn that, for example:

  • PostgreSQL uses single quotes to quote values, never double quotes.
  • PostgreSQL use double quotes, not backticks, to quote system identifiers.
  • PostgreSQL is case-sensitive for string comparisons.
  • All non-aggregate SELECT columns must appear in the GROUP BY clause.
  • An INNER JOIN without an ON clause must become a CROSS JOIN.
  • Column aliases in SELECT clauses require the AS keyword.

Using Jinja2 in Django with Coffin: The Easy Way

There is a lot of old, incorrect and just plain bad documentation on how to use the Jinja2 templating language in Django. Here’s how to do it The Easy Way.

  1. Install Jinja2, Pygments and Coffin with easy_install Jinja2 Pygments Coffin or by editing your Pip requirements.txt file.

  2. In views.py, change:

    from django.shortcuts import render_to_response
    

    to:

    from coffin.shortcuts import render_to_response
    

    as described in the Coffin docs.

  3. Update your templates to use Jinja2 syntax. Change block.super to super().

  4. If you use {% compress %} tags, add:

    JINJA2_EXTENSIONS = [
      'compressor.contrib.jinja2ext.CompressorExtension',
    ]
    

    to your settings.py as described here.

Fix Facebook Like button for IE6 and IE7

You may have noticed in your server logs or analytics a significant number of requests to pages on your site with the query string ?fb_xd_fragment=. This behaviour is a symptom of the Facebook Like button working improperly in IE6 and IE7. In order to correct this bug, you must set a channelUrl for your Facebook Like button, following Facebook’s documentation on FB.init.

window.fbAsyncInit = function() {
  FB.init({
    appId  : 'YOUR_APP_ID',
    status : true,
    cookie : true,
    oauth  : true,
    xfbml  : true,
    channelUrl  : 'YOUR_CHANNEL_URL'
  });
};
(function(d, s, id) {
  var js, fjs = d.getElementsByTagName(s)[0];
  if (d.getElementById(id)) {return;}
  js = d.createElement(s); js.id = id;
  js.src = "//connect.facebook.net/en_US/all.js#appId=YOUR_APP_ID&xfbml=1";
  fjs.parentNode.insertBefore(js, fjs);
}(document, 'script', 'facebook-jssdk'));

A few notes on the above: As documented by Facebook, “the function assigned to window.fbAsyncInit is run as soon as the Facebook SDK is loaded.” Importantly, YOUR_CHANNEL_URL should be a URL that returns a single line of HTML:

<script src="http://connect.facebook.net/en_US/all.js"></script>

Google News Search parameters (The Missing Manual)

There is a severe lack of documentation of Google News Search’s query parameters. Here is the most comprehensive list I am aware of. A number of these overlap with Google Search’s query parameters. I exclude any Google Search parameters that don’t make sense in the context of Google News Search. These parameters are primarily of interest to anyone querying Google News for articles using its RSS and Atom output formats.

Search

The following query parameters control Google’s interpretation of keywords. Although as_epq may be attractive for some use cases, it breaks sorting by date (scoring=n) when output is “rss” or “atom”, so stick with q for consistent sorting. Update: It seems the as_* parameters don’t work when output is “rss” or “atom”.

  • q=TERMS retrieve results with all of the terms. Alias: as_q
  • as_epq=TERMS retrieve results with the exact phrase
  • as_oq=TERMS retrieve results with at least one of the words
  • as_eq=TERMS retrieve results without the terms
  • as_occt=SECTION retrieve results where my terms occur…
    • any anywhere in the article
    • title in the headline of the article (same as using “allintitle:” in q)
    • body in the body of the article (same as using “allintext:” in q)
    • url in the URL of the article (same as using “allinurl:” in q)

Filter

The following query parameters filter results by Google News edition, topic, location, date, news source or author.

  • ned=EDITION limits results to a specific edition. Possible values: editions
  • topic=TOPIC limits results to a specific topic. Possible values: topics
  • geo=LOCATION limits results to a specific location
    • detect_metro_area determines location based on IP
    • a city, state, country, or US zip code
  • as_drrb=q retrieves articles added by Google News…
    • as_qdr=a anytime
    • as_qdr=h last hour
    • as_qdr=d last day
    • as_qdr=w past week
    • as_qdr=m past month
    • as_qdr=y past year
  • as_drrb=b retrieves articles added by Google News between…
    • as_minm=NUM minimum month. Possible values: [1, 12]
    • as_mind=NUM minimum day. Possible values: [1, 31]
    • as_maxm=NUM maximum month. Possible values: [1, 12]
    • as_maxd=NUM maximum day. Possible values: [1, 31]
  • as_nsrc=SOURCE limits results to a specific news source (same as using “source:” in q)
  • as_nloc=LOCATION limits results to news sources from a specific location (same as using “location:” in q)
  • as_author limits results to a specific author (same as using “author:” in q)

The as_ddrb family of parameters is occasionally set using the tbs parameter when using the web interface. You do not need to learn the tbs syntax. Note that as_nloc and geo are not synonymous.

Boost

I’m not confident that these parameters do anything when output is “rss” or “atom”, but here they are for completeness.

  • gl=COUNTRY boosts search results from a specific country of origin. Possible values: country codes
  • gll=LATITUDE,LONGITUDE boosts search results near that point. Latitude and longitude must be integer microdegrees. In otherwords, multiply each number by a million and round to the nearest integer.
  • gr=REGION boosts search results from a specific region. Possible values: province codes
  • gm=METRO boosts search results from a specific metropolitan area. Possible values: metro codes
  • gpc=ZIPCODE boosts search results from a specific zip code. gl must be “us”.
  • gcs=CITY boosts search results from a specific city. Possible values: city names in the United States and worldwide

Order

  • scoring=ORDER sorts search results. Default: “r”. Alias: “as_scoring”
    • "r" by relevance
    • "n" by date (newest first)
    • "d" by date (newest first) with duplicates
    • "o" by date (oldest first)

Paginate

  • num=NUM retrieves NUM results. Default: 10. Possible values: if q present [1,100], otherwise [1, 30]
  • start=OFFSET retrieves results starting from OFFSET. NUM plus OFFSET must be less than 1000, otherwise you will get zero results. Ignored if output is “rss” or “atom”. Default: 0. Requires q.

Output

  • output=FORMAT sets the output format
    • rss retrieves RSS feed
    • atom retrieves Atom feed
  • hl=LANGUAGE sets host language. Default: “us”. Possible values: languages
  • hdlOnly=1 displays headlines only
  • qsid=ID used in combination with cf=q. Update: This feature has been removed.

In older versions of Google News, it was possible to change the output of the web interface using a cf parameters. It no longer seems to work, but it is here for completeness. Note that when output is set to “rss” or “atom”, this parameter is in fact ignored.

  • cf=CODE
    • all retrieve any content
    • q retrieve only quotes. Requires qsid. Update: This feature has been removed.
    • i retrieve only images
    • b retrieve only blogs

Google tracking

Google uses parameters to track how users are using the web interface, which may include aq, authuser, btnmeta_news_search, edchanged, client, rls, oi, oq, resnum, sa, source, sourceid, swrnum, tab. You don’t need to worry about these, unless you want to bias Google’s internal statistics on user behavior.

Google Search

If using the web interface, performing a keyword search from Google News will redirect you to a Google Search page. The tbm=nws query parameter informs Google to display news results only.

Undocumented

I haven’t yet figured out what these do, but their impact seems minimal. If you have a clue, please mention it in the comments!

  • pz is usually set to 1. Default: 1. Possible values: [0, 1]
  • ict Possible values: “ln”, “itn0”, “tnv0”
  • csid

Deprecated

The following parameters work only on the deprecated Google News Search API.

  • v=1.0 sets the API version. Possible values: “1.0”
  • rsz=SIZE sets results size
    • small retrieves four results
    • large retrieves eight results
  • userip sets user’s IP as an abuse counter-mesure
  • callback runs JavaScript callback
  • context sets callback context
  • key sets API key

Sources

Google Refine fingerprint clustering algorithm in Ruby

If you’ve used Google Refine, you know how useful its clustering algorithms are for finding and merging alternative representations of the same thing, e.g. “Gödel, Escher, Bach”, “Godel, Escher, Bach” (accents), “Gödel Escher Bach” (punctuation), “gödel, escher, bach” (case).

In my open data and data journalism projects, I perform similar data consolidation and reconciliation steps to avoid unwanted duplicates. Following Google Refine’s algorithm description, I wrote this snippet that adds a #fingerprint method to Ruby’s String class.

http://gist.github.com/1374639

Remove all accents and diacritics from string in Ruby

As a follow-up to this post, if you want to remove all accents and diacritics from a text string in Ruby, while preserving case, use this snippet:

http://gist.github.com/1374687

Reserved methods in Rails

I occasionally waste many hours debugging code that fails because I accidentally redefine a method inherited from a Rails class. For future reference:

ActionMailer

How to cache an internationalized site with Rack::Cache in Rails 3

I have bilingual and international websites that serve different translated content depending on the current locale. The current locale may be set according to the domain, subdomain, path, query string, or a cookie. If it is set according to a cookie, read on! If it is set using anything in the URL, Rails 3 (using Rack::Cache internally) will cache pages correctly for you.

Now, in my cookie-based site, I want to implement caching. However, if I naively set a Cache-Control header on a page, I’ll end up caching a single language’s content, and serving that language’s content to all visitors, independent of their current locale! I can’t have French visitors reading English just because an English visitor was the first to visit the page.

Rails 3 uses the Rack::Cache middleware internally for caching. Rack::Cache generates a “cache key” based on the URL used to access a page. You can read the code to generate the cache key. It then looks up that key in its cache. If it exists, it returns the cached content. If not, Rails serves the request, and Rack::Cache caches the response under that key for next time.

We can solve the above caching problem by setting a custom cache key for Rack::Cache that includes the current locale, so that we cache one version of the page for each language. The code to generate the default cache key is:

Rack::Cache::Key.new(request).generate

If you follow all documentation on this subject, you’ll probably write something like this in production.rb to set a custom cache key:

config.middleware.use Rack::Cache,
  :metastore => 'file:/var/cache/rack/meta',
  :entitystore => 'file:/var/cache/rack/body',
  :cache_key => Proc.new { |request|
    [I18n.locale, ':', Rack::Cache::Key.new(request).generate].join
  }

This won’t work! If you read the code for ActionDispatch’s Railtie, you’ll see the line:

config.action_dispatch.rack_cache = {
  :metastore => "rails:/",
  :entitystore => "rails:/",
  :verbose => true,
}

The solution is then to set the custom cache key in production.rb like this:

config.action_dispatch.rack_cache[:cache_key] = Proc.new { |request|
  [I18n.locale, ':', Rack::Cache::Key.new(request).generate].join
}

Tada!

How to safely create friendly From addresses including name and email

You have a contact form that collects a person’s name and email address. When sending their email, you want to get these two pieces of information from looking like this:

Name: John Doe
Email: john@example.com

to this:

From: John Doe <john@example.com>

or:

Reply-To: John Doe <john@example.com>

If you simple concatenate strings, then you will either send a malformed email or introduce a security risk. Here’s a safe, proper way of doing it using the Mail gem:

http://gist.github.com/1360883

Next page Something went wrong, try loading again? Loading more posts