Tuesday, September 11, 2012

filter, spurious tuples and distinct

In django ORM, we can use filter to query our models; we can use double-underscores __ to follow relationships in our model, much like doing JOIN in SQL (in fact, it gets translated to join statements); however, the filter method introduces 'spurious tuples' when doing joins; this may or may not be a big deal, but it is good to know. Using distinct() will eliminate those extra tuples.

For example, imagine a simple model, consisting of a person class, which may have more than one email.
class Person(models.Model):
name=models.CharField(max_length=20);
def __unicode__(self):
return self.name;
class Phone(models.Model):
person=models.ForeignKey(Person);
area_code=models.CharField(max_length=3);
number=models.CharField(max_length=7);
def __unicode__(self):
return "("+self.area_code+")"+self.number +" for " +self.person.name;
view raw models.py hosted with ❤ by GitHub
Then, if we add data for a person with one email, as follows:
from joins.models import *;
p1=Person.objects.create(name='Orlando');
ph1=Phone.objects.create(person=p1,area_ code='770',number='1234567');
view raw sample data1.py hosted with ❤ by GitHub
We can use the following code to get the people with a phone number in area '770'
Person.objects.filter(phone_set__area='770');
view raw filter.py hosted with ❤ by GitHub
However, if the same person has TWO phones with that area code, we get TWO rows for the person. Basically, the filter function with the double underscores is doing something equivalent to the following SQL:
SELECT Person.*
FROM Person JOIN Phone ON (Person.id=Phone.person_id)
WHERE Phone.area_code='770'
view raw sql_join hosted with ❤ by GitHub
where it should be doing:
SELECT *
FROM Person
WHERE id in (
SELECT person_id
FROM Phone
WHERE area_code='770'
)
view raw sql_in hosted with ❤ by GitHub
however, adding a call to distinct() will fix it; our query, adjusting for django's idiosyncrasy, is:
Person.objects.filter(phone_set__area='770').distinct();
and we can see the different results returned:
In [8]: Person.objects.filter(phone__area_code='770')
Out[8]: [<Person: Orlando>, <Person: Orlando>]
In [9]: Person.objects.filter(phone__area_code='770').distinct()
Out[9]: [<Person: Orlando>]
view raw output hosted with ❤ by GitHub
And, for completeness, here's the code for adding the second phone number.
ph2=Phone.objects.create(person=p1,area_code='770',number='1234567');
view raw data2.py hosted with ❤ by GitHub

No comments:

Post a Comment