When to use or not use iterator() in the django ORM


This is from the Django docs on the QuerySet iterator() method:

A QuerySet typically caches its results internally so that repeated evaluations do not result in additional queries. In contrast, iterator() will read results directly, without doing any caching at the QuerySet level (internally, the default iterator calls iterator() and caches the return value). For a QuerySet which returns a large number of objects that you only need to access once, this can result in better performance and a significant reduction in memory.

After reading this, I’m still confused: the line about better performance and reduced memory suggests we should just always use iterator(). Can someone give some examples of good and bad uses of iterator()?

Even though the query results are not cached, if someone really wanted to access the models more than once, couldn’t they just do the following?

saved_queries = list(Model.objects.all().iterator())

Note the first part of the sentence you call out:
For a QuerySet which returns a large number of objects that you only need to access once

So the converse is: if you need to reuse a set of results, and they are not so numerous as to cause a memory problem, then you should not use iterator(), because the extra database round trip will always reduce your performance compared with using the cached result.

You could force your QuerySet to be evaluated into a list, but:

  • it requires more typing than just saved_queries = Model.objects.all()
  • say you are paginating results on a web page: you will have forced all results into memory (back to possible memory problems) rather than allowing the subsequent paginator to select the slice of 20 results it needs
  • QuerySets are lazy, so you can have a context processor, for instance, that puts a QuerySet into the context of every request but only evaluates it on the requests that actually access it; if you have forced evaluation, that database hit happens on every request

The typical web app case is for relatively small result sets (they have to be delivered to a browser in a timely fashion, so pagination or a similar technique is employed to decrease the data volume if required) so generally the standard QuerySet behaviour is what you want. As you are no doubt aware, you must store the QuerySet in a variable to get the benefit of the caching.

Good use of iterator: processing results that take up a large amount of available memory (lots of small objects or fewer large objects). In my experience this is often in management commands when doing heavy data processing.

I agree with Steven, and I would like to add a couple of observations:

  • “it requires more typing than just saved_queries = Model.objects.all()”. Yes, it does, but there is a bigger reason not to use list(Model.objects.all()). If you assign list(Model.objects.all()) to a variable, the query executes immediately and every result is stored in that list. Imagine you have more than 1M records: you now hold over 1M model instances in a list that you may or may not use right away. So, as Steven said, I would recommend assigning just Model.objects.all(): a QuerySet assigned to a variable is not executed until you actually use it, which can save you database calls.

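  • You should use prefetch_related() to avoid making too many calls to the database: when you access related objects (for example through a reverse foreign key lookup), it fetches them for all objects in one extra query per relation instead of one query per object, which can save you a lot of time. Note that before Django 4.1, prefetch_related() is ignored when the QuerySet is consumed with iterator(); from 4.1 on it is applied as long as you pass a chunk_size.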

The answers/resolutions are collected from Stack Overflow and are licensed under CC BY-SA 2.5, CC BY-SA 3.0 and CC BY-SA 4.0.