Is Haskell’s laziness an elegant alternative to Python’s generators?


In a programming exercise, we were first asked to program the factorial function and then to compute the sum 1! + 2! + 3! + ... + n! in O(n) multiplications (so we can’t use the factorial function directly). I am not looking for the solution to this specific (trivial) problem; I’m trying to explore Haskell’s abilities, and this problem is a toy I would like to play with.

I thought Python’s generators could be a nice solution to this problem. For example:

from itertools import islice
def ifact():
    i , f = 1, 1
    yield 1
    while True:
        f *= i
        i += 1
        yield f

def sum_fact(n):
    return sum(islice(ifact(), n))

Then I tried to figure out whether there was something in Haskell with behavior similar to this generator, and I thought that laziness does all the work without any additional concept.

For example, we could replace my Python ifact with

fact = scanl1 (*) [1..]

And then solve the exercise with the following:

sum n = foldl1 (+) (take n fact)

I wonder whether this solution is really “equivalent” to the Python one regarding time complexity and memory usage. I would say that the Haskell solution never stores the whole list fact, since its elements are used only once.

Am I right or totally wrong?

I should have checked more precisely:

Prelude> foldl1 (+) (take 4 fact)
33
Prelude> :sprint fact
fact = 1 : 2 : 6 : 24 : _

So (my implementation of) Haskell stores the result, even if it’s no longer used.

Indeed, lazy lists can be used this way. There are some subtle differences though:

  • Lists are data structures, so you can keep them after evaluating them, which can be both good and bad: you can avoid recomputing values and do recursive tricks as @ChrisDrost described, at the cost of keeping memory unreleased.
  • Lists are pure. Generators can perform computations with side effects; you can’t do that with lists (which is often desirable).
  • Since Haskell is a lazy language, laziness is everywhere and if you just convert a program from an imperative language to Haskell, the memory requirements can change considerably (as @RomanL describes in his answer).
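To illustrate the first point, here is a minimal sketch (the names facts and sumFactsLocal are mine, not from the answer): a top-level lazy list is a constant applicative form (CAF), so its evaluated prefix is retained for the lifetime of the program, whereas a list bound inside a function can be garbage-collected once it has been consumed.

```haskell
-- Sketch: a top-level lazy list is a CAF; once forced, its evaluated
-- prefix stays in memory for the rest of the program (no recomputation,
-- but the memory is never released).
facts :: [Integer]
facts = scanl1 (*) [1..]

-- A list bound locally (and not returned) can be collected as soon as
-- the sum has consumed it. (GHC's full-laziness optimization may still
-- float it to the top level; compile with -fno-full-laziness to be sure.)
sumFactsLocal :: Int -> Integer
sumFactsLocal n = sum (take n local)
  where local = scanl1 (*) [1..]

main :: IO ()
main = do
  print (sum (take 10 facts))  -- forces and retains the first 10 cells
  print (sumFactsLocal 10)     -- same value; the list is transient
```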

But Haskell offers more advanced tools to accomplish the generator/consumer pattern. Currently there are three libraries that focus on this problem: pipes, conduit and iteratees. My favorite is conduit, it’s easy to use and the complexity of its types is kept low.

They have several advantages, in particular that you can create complex pipelines and you can base them on a chosen monad, which allows you to say what side effects are allowed in a pipeline.

Using conduit, your example could be expressed as follows:

import Data.Functor.Identity
import Data.Conduit
import qualified Data.Conduit.List as C

ifactC :: (Num a, Monad m) => Producer m a
ifactC = loop 1 1
  where
    loop r n = let r' = r * n
               in yield r' >> loop r' (n + 1)

sumC :: (Num a, Monad m) => Consumer a m a
sumC = C.fold (+) 0

main :: IO ()
main = (print . runIdentity) (ifactC $= C.isolate 5 $$ sumC)
-- alternatively running the pipeline in IO monad directly:
-- main = (ifactC $= C.isolate 5 $$ sumC) >>= print

Here we create a Producer (a conduit that consumes no input) that yields factorials indefinitely. Then we compose it with isolate, which ensures that no more than a given number of values are propagated through it, and then we compose it with a Consumer that just sums values and returns the result.
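For reference, the $= and $$ operators used above have since been deprecated; in conduit ≥ 1.3 the same pipeline is written with runConduit and .|, and the Producer/Consumer synonyms are replaced by ConduitT. A sketch of the same program against the newer API:

```haskell
import Data.Functor.Identity (runIdentity)
import Data.Conduit
import qualified Data.Conduit.List as C

-- The same infinite factorial producer, written with ConduitT.
ifactC :: (Num a, Monad m) => ConduitT i a m ()
ifactC = loop 1 1
  where
    loop r n = let r' = r * n
               in yield r' >> loop r' (n + 1)

main :: IO ()
main = print . runIdentity . runConduit $
  ifactC .| C.isolate 5 .| C.fold (+) 0   -- 1 + 2 + 6 + 24 + 120 = 153
```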

Your examples are not equivalent in memory usage. It is easy to see if you replace * with + (so that the numbers don’t get big too quickly) and then run both examples with a large n such as 10^7. Your Haskell version will consume a lot of memory, while the Python version will keep it low.

The Python generator will not generate a list of values and then sum it up. Instead, the sum function will get values one by one from the generator and accumulate them. Thus, the memory usage remains constant.

Haskell evaluates functions lazily, but in order to calculate, say, foldl1 (+) (take n fact), it has to build up the complete expression. For large n this unfolds into a huge thunk, the same way foldl (+) 0 [0..n] does. For more details on evaluation and reduction, have a look here:

You can fix your sum n by using foldl1' instead of foldl1, as described at the link above. As @user2407038 explained in his comment, you’d also need to keep fact local. The following works in GHCi with constant memory use:

import Data.List (foldl')

let notfact = scanl1 (+) [1..]
let n = 20000000
let res = foldl' (+) 0 (take n notfact)
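To see concretely why foldl1 blows up while the primed folds don’t, here is a small self-contained sketch (lazySum and strictSum are my names, not from the answer):

```haskell
import Data.List (foldl1')

-- lazySum builds the nested thunk (((x1 + x2) + x3) + ...) before any
-- addition is performed; its memory use grows linearly with the list.
lazySum :: [Integer] -> Integer
lazySum = foldl1 (+)

-- strictSum forces the accumulator at every step, so the running total
-- is always a single evaluated Integer.
strictSum :: [Integer] -> Integer
strictSum = foldl1' (+)

main :: IO ()
main = do
  print (lazySum [1 .. 100])   -- 5050, but via a 100-deep thunk
  print (strictSum [1 .. 100]) -- 5050, with a constant-size accumulator
```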

Note that in the case of the actual factorial in place of notfact, memory considerations are less of a concern: the numbers get big quickly, arbitrary-precision arithmetic slows things down, and you won’t be able to get to large values of n to actually see the difference.

Basically, yes: Haskell’s lazy lists are a lot like Python’s generators, if those generators were effortlessly cloneable, cacheable, and composable. Instead of raising StopIteration you return [] from your recursive function, which can thread state into the generator.

Lazy lists can do some cooler stuff thanks to self-recursion. For example, your factorial generator is more idiomatically written as:

facts = 1 : zipWith (*) facts [1..]

or the Fibonaccis as:

fibs = 1 : 1 : zipWith (+) fibs (tail fibs)

In general, any iterative loop can be converted to a recursive algorithm by promoting the loop state to arguments of a function and then calling it recursively to get the next loop cycle. Generators are just like that, but we prepend some elements on each iteration of the recursive function: `go ____ = (stuff) : go ____`.

The perfect equivalent is therefore:

ifact :: [Integer]
ifact = go 1 1
  where go f i = f : go (f * i) (i + 1)

sum_fact n = sum (take n ifact)

In terms of what’s fastest, the absolute fastest in Haskell will probably be the “for loop”:

sum_fact n = go 0 1 1
  where go acc fact i
          | i <= n = go (acc + fact) (fact * i) (i + 1)
          | otherwise = acc

The fact that this is “tail-recursive” (a call of go does not pipe any sub-calls to go through another function like (+) or (*)) means that the compiler can package it into a really tight loop, which is why I compare it with “for loops”, even though that’s not really a native idea in Haskell.

The above sum_fact n = sum (take n ifact) is a little slower than this, but faster than sum (take n facts) where facts is defined with zipWith. The speed differences are not very large, and I think they mostly come down to memory allocations that don’t get reused.
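These speed claims are easy to check with the criterion benchmarking library; the following is a sketch, not a definitive benchmark (defaultMain, bench and whnf are criterion’s real API; the three definitions repeat the ones from this answer). Note that ifact and facts are CAFs, so after the first run their evaluated prefixes are cached, which flatters those two benchmarks.

```haskell
import Criterion.Main

-- Tail-recursive "for loop" version.
sumFactLoop :: Integer -> Integer
sumFactLoop n = go 0 1 1
  where go acc fact i
          | i <= n    = go (acc + fact) (fact * i) (i + 1)
          | otherwise = acc

-- Explicit-state generator version.
ifact :: [Integer]
ifact = go 1 1
  where go f i = f : go (f * i) (i + 1)

-- Self-recursive zipWith version.
facts :: [Integer]
facts = 1 : zipWith (*) facts [1 ..]

main :: IO ()
main = defaultMain
  [ bench "tail-recursive loop" $ whnf sumFactLoop 1000
  , bench "take n ifact"        $ whnf (\n -> sum (take n ifact)) 1000
  , bench "take n facts"        $ whnf (\n -> sum (take n facts)) 1000
  ]
```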

The answers/resolutions are collected from Stack Overflow and are licensed under cc by-sa 2.5, cc by-sa 3.0 and cc by-sa 4.0.