Clone My Fields, Please

The Introduction

I recently started using the SearchManager from the Mercury Tide white paper on using MySQL full-text search with Django. It's been helpful, but I ran into a bug while trying to add a default filter to a SearchManager subclass.

The Boring Context

Rather than deleting objects from the database, my application sets a boolean flag to indicate that the content is no longer relevant. I wanted my manager to apply a filter to every query set so that it includes only items that are not disabled. Here's what the manager class looks like:

class SearchableItemManager(SearchManager):
    def __init__(self):
        zuper = super(SearchableItemManager, self)
        zuper.__init__(('name','description',))

    def get_query_set(self):
        query = super(SearchableItemManager, self).get_query_set()
        return query.filter(is_enabled=True)

The Ugly Crash

When I made the change, I found that calling the search() method raised a TypeError: "'NoneType' object is not iterable." The error occurred when the SearchQuerySet tried to construct the SQL for the MATCH…AGAINST clause. Somehow, the _search_fields tuple on the SearchQuerySet was None.

The Mystery Solved

This had me baffled until I had a look at the _QuerySet code in Django. It seems obvious now, but adding an additional filter to a query set returns a clone of the original with the new filter added. The _QuerySet object contains a _clone method that copies a hard-coded list of fields from the old QS to the new one. Naturally, that hard-coded list doesn't know anything about my _search_fields, so the property has no value on the clone.
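
To make the failure concrete, here is a minimal sketch in plain Python with no Django involved. BaseQuerySet, its filters attribute, and SearchQuerySetSketch are hypothetical stand-ins for the real classes. The only behavior they imitate is that filter() returns a clone built by copying a fixed set of attributes.

```python
class BaseQuerySet(object):
    """Hypothetical stand-in for Django's query set; not the real class."""

    def __init__(self, model=None):
        self.model = model
        self.filters = {}

    def _clone(self):
        # Only the attributes the base class knows about are copied.
        clone = self.__class__(model=self.model)
        clone.filters = dict(self.filters)
        return clone

    def filter(self, **kwargs):
        # Filtering never mutates the original; it returns a modified copy.
        clone = self._clone()
        clone.filters.update(kwargs)
        return clone


class SearchQuerySetSketch(BaseQuerySet):
    def __init__(self, model=None, fields=None):
        super(SearchQuerySetSketch, self).__init__(model)
        self._search_fields = fields


original = SearchQuerySetSketch(model="Item", fields=("name", "description"))
filtered = original.filter(is_enabled=True)
# original still has its fields, but filtered._search_fields is None,
# because _clone() rebuilt the object without passing fields along.
```

Any code that later iterates over filtered._search_fields will fail in the same way my search() call did.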

The Fix

Now, depending on how much of a zealot you are about modifying “private” functions, there are two ways to fix this. The easiest method is to simply override the _clone method and add the _search_fields tuple to the clone. The alternative is to override every method that depends on the _clone method, and copy over the _search_fields tuple for each one. I think that would be stupid, and will speak of it no further. Here's the code I added to generate happiness:

class SearchQuerySet(models.query.QuerySet):
    # ... code from the original Mercury Tide class
    def _clone(self, klass=None, **kwargs):
        zuper = super(SearchQuerySet, self)
        clone = zuper._clone(klass, **kwargs)
        clone._search_fields = self._search_fields
        return clone
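
As a rough check that this kind of override survives chained calls, here is a self-contained sketch in plain Python. BaseQuerySet and PatchedSearchQuerySet are hypothetical stand-ins rather than Django's classes, and the base _clone accepts klass and **kwargs only to mirror the signature used above.

```python
class BaseQuerySet(object):
    """Hypothetical stand-in for Django's query set; not the real class."""

    def __init__(self, model=None):
        self.model = model
        self.filters = {}

    def _clone(self, klass=None, **kwargs):
        # Copies only the attributes this base class knows about.
        # kwargs is accepted for signature parity and ignored in this sketch.
        klass = klass or self.__class__
        clone = klass(model=self.model)
        clone.filters = dict(self.filters)
        return clone

    def filter(self, **kwargs):
        clone = self._clone()
        clone.filters.update(kwargs)
        return clone


class PatchedSearchQuerySet(BaseQuerySet):
    def __init__(self, model=None, fields=None):
        super(PatchedSearchQuerySet, self).__init__(model)
        self._search_fields = fields

    def _clone(self, klass=None, **kwargs):
        clone = super(PatchedSearchQuerySet, self)._clone(klass, **kwargs)
        # Carry the subclass-only attribute over to the copy.
        clone._search_fields = self._search_fields
        return clone


base = PatchedSearchQuerySet(model="Item", fields=("name", "description"))
chained = base.filter(is_enabled=True).filter(name="widget")
# chained._search_fields is still ("name", "description") after two clones.
```

Because every chaining method in the sketch goes through _clone, one override covers all of them, which is the reason I preferred it to patching each method separately.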
  • http://www.yeago.net/works/ Yeago

    Hmm.. tried to adapt that white-paper.

    I keep getting the following error upon MyModel.objects.search("x"):

    “module has no attribute quote_name”

    Any clue?

  • http://subakva.com Jason Wadsworth

    That sounds familiar, but I don’t remember the cause. This is probably a stupid question, but you are using MySQL, right?

  • http://www.yeago.net/works/ Yeago

    Yes, mysql.

    I’m using Django SVN and I’ve verified that quote_name exists in that class.

  • http://www.yeago.net/works/ Yeago

    Actually, I’m going to ignore the stuff at mercurytide as this has since been built-in to Django.

    Also, for those non-mysqlers:

    http://www.davidcramer.net/code/79/in-depth-django-sphinx-tutorial.html

  • http://www.yeago.net/works/ Yeago

    Mmm…. ok, I retract the above remark about it being built in. Apparently Django doesn’t provide for full-text across columns.

  • http://www.yeago.net/works/ Yeago

    Maybe the whole comment was dumb. Sphinx is for MySQL+Django only.

  • http://subakva.com Jason Wadsworth

    Last I checked, the built-in Django support only supported single columns, and only in boolean mode. That’s definitely useful, but it’s not appropriate for every situation.

    I looked at my code and remembered the fix for the quote_name problem. The organization of the backends in Django changed in the trunk so that quote_name was accessed through a DatabaseOperations interface.

    Replace:
    backend.quote_name(...)
    
    With:
    ops = backend.DatabaseOperations()
    ops.quote_name(...)
    
  • http://www.yeago.net/works/ Yeago

    Don’t suppose the SearchManager is choking upon the last svn update?

    django/db/models/query.py

    line c = klass(model=self.model, query=self.query.clone())

    “__init__() got an unexpected keyword argument ‘query’”

    Digging around. Letcha know if I find something out.

  • http://www.softwarevoices.com/ Craig Ogg

    I haven’t checked it to any real extent, but it looks like just bad subclassing form. Adding *args and **kwargs in the usual way should make the problem disappear:

    def __init__(self, index_column, *args, **kwargs):
        super(SearchManager, self).__init__(*args, **kwargs)

    Similarly for SearchQuerySet.

  • http://www.yeago.net/works/ Yeago

    I think that fixed one issue, Craig.

    Now I’m attempting to track down an odd bug whereby attempting to filter() a result-set results in [], regardless of match.

  • JR

    This article and the accompanying comments were most helpful in converting MercuryTide’s search to Django 1.0+ compatibility, thank you!

    I ended up using MercuryTide’s search as a starting point, but wrote the actual search aspect myself, because InnoDB doesn’t support MySQL’s fulltext search.

  • http://www.djangodummy.com Andrew Pelt

    Thanks for your article. I am new at python and this will be a big help.

  • gabriel

    Hi,
    have you guys made this work?

    I get a StopIteration exception at:
    /usr/local/lib/python2.6/dist-packages/django/db/models/sql/query.py in add_extra, line 1691

    here is my code:
    class SearchQuerySet(models.query.QuerySet):
        def __init__(self, model=None, fields=None, *args, **kwargs):
            super(SearchQuerySet, self).__init__(model, *args, **kwargs)
            self._search_fields = fields

        def search(self, query):
            meta = self.model._meta

            # Get the table name and column names from the model
            # in `table_name`.`column_name` style
            columns = [meta.get_field(name, many_to_many=False).column for name in self._search_fields]
            full_names = ["%s.%s" % (connection.ops.quote_name(meta.db_table), connection.ops.quote_name(column)) for column in columns]

            # Create the MATCH…AGAINST expressions
            fulltext_columns = ", ".join(full_names)
            match_expr = ("MATCH(%s) AGAINST (%%s)" % fulltext_columns)

            # Add the extra SELECT and WHERE options
            return self.extra(select={'relevance': match_expr}, where=[match_expr], params=[query, query])

    class SearchManager(models.Manager):
        def __init__(self, fields, *args, **kwargs):
            super(SearchManager, self).__init__(*args, **kwargs)
            self._search_fields = fields

        def get_query_set(self):
            return SearchQuerySet(self.model, self._search_fields)

        def search(self, query):
            return self.get_query_set().search(query)

  • http://subakva.com Jason Wadsworth

    Gabriel,

    It’s been a long time since I’ve looked at this code, and both Django and Python have moved on without me.

    You might want to try posting your question on Stack Overflow:

    http://stackoverflow.com/questions/ask

  • iff

    Gabriel,
    did you solve that problem?
    I am having the same one.
    I am using Python 2.6 and Django 1.3; it works fine on Django 1.0.

  • Wieslaw

    Gabriel, here you have the correct line of code:

    # Add the extra SELECT and WHERE options
    return self.extra(select={'relevance': match_expr},
                      where=[match_expr],
                      params=[query],
                      select_params=[query])

    One parameter (the variable query) has to be passed to the extra SELECT clause, and another (the same query variable in this case) to the WHERE clause. select_params is used for the former and params for the latter.