These aren’t the reducing functions you are looking for

unsorted — cgrand, 8 October 2014 @ 1 h 12 min

Transducers are powerful and easy to grasp when they claim they transform reducing functions. However once you scratch their surface you quickly realize that’s not their true nature: they transform stateful processes.

In a previous post, I explained why seeded transduce forces transducers to return stateful reducing functions. However this can be fixed. The current implementation of transduce reads:

(defn transduce
  "reduce with a transformation of f (xf). If init is not
  supplied, (f) will be called to produce it. f should be a reducing
  step function that accepts both 1 and 2 arguments, if it accepts
  only 2 you can add the arity-1 with 'completing'. Returns the result
  of applying (the transformed) xf to init and the first item in coll,
  then applying xf to that result and the 2nd item, etc. If coll
  contains no items, returns init and f is not called. Note that
  certain transforms may inject or skip items."  {:added "1.7"}
  ([xform f coll] (transduce xform f (f) coll))
  ([xform f init coll]
     (let [f (xform f)
           ret (clojure.core.protocols/coll-reduce coll f init)]
       (f ret))))

To fix it you have to make the seeded case the special case and not the normal case:

(defn fseed [f init]
    ([] init)
    ([x] (f x))
    ([x y] (f x y))))

(defn transduce
  "reduce with a transformation of f (xf). If init is not
  supplied, (f) will be called to produce it. f should be a reducing
  step function that accepts both 1 and 2 arguments, if it accepts
  only 2 you can add the arity-1 with 'completing'. Returns the result
  of applying (the transformed) xf to init and the first item in coll,
  then applying xf to that result and the 2nd item, etc. If coll
  contains no items, returns init and f is not called. Note that
  certain transforms may inject or skip items."  {:added "1.7"}
  ([xform f init coll] (transduce xform (fseed f init) coll))
  ([xform f coll]
     (let [f (xform f)
           ret (clojure.core.protocols/coll-reduce coll f (f))]
       (f ret))))

By making the seeded reduce the special case, the init value can be wrapped in a composite init value by transducer-returned reducing functions. Now writing, for example, a pure take is possible.

This modification fixes the seeded reduce case but it requires allocating intermediate objects at each step which goes against the promise that transducers alleviate allocative pressure (promise inherited from reducers). So it’s fixable but for performance reasons transducer-returned reducing functions remain stateful.

Once you go stateful…

Once you assume transducer-returned reducing functions are stateful, you realize they have some cruft from their functional origins (this was discussed at the last Paris Clojure Meetup (fr))):

  • 0-arity always delegate to the 0-arity of the downstream reducing function,
  • in any arity, the accumulator must be used in a linear manner with the only allowed operation being the 2-arity of the downstream function.
  • don’t forget to check for reduced return values!

So the accumulator values are passed around but never used except by the most downstream reducing function: the one that was passed to transduce by the user! Why not, then, encapsulates the accumulator state in a process?

A process in this model has only two operations: process one input and complete which could be respectively mapped to 1-arity and 0-arity of a function:

  ([] ...) ; completes the process, return value is unspecified
  ([x] ...)) ; processes x, returns true when no more input should be fed in.

So now transduce could be written as:

(defn transduce [xform f init coll]
  (let [vacc (volatile! init)
        p (fn 
            ([x] (reduced? (vswap! vacc f x))))
        p (xform p)]
    (feed! p coll) ; transducers are now process -> process
    (f (let [acc @vacc] (if (reduced? acc) @acc acc)))))

Where feed! is the iteration primitive – for exposition it can be implemented using reduce but it’s backwards: reduce should be implemented on top of feed!.

(defn feed!
 "Returns true if the process can't process more input."
 [p coll]
  (reduce #(and (p %2) (reduced true)) false coll))

Now we can try to reimplement some transducers as process-transforming functions:

(defn map [f]
  (fn [p1]
      ([] (p1))
      ([x] (p1 (f x))))))

(defn filter [pred]
  (fn [p1]
      ([] (p1))
      ([x] (and (pred x) (p1 x))))))

(defn take [n]
  (fn [p1]
    (let [vn (volatile! (dec n))]   
        ([] (p1))
        ([x] (or (neg? @vn) (p1 x) (neg? (vswap! vn dec))))))))

(defn partition-by [f]
  (fn [p]
    (let [a (java.util.ArrayList.)
          pv (volatile! ::none)]
          (when-not (.isEmpty a)
            (let [v (vec (.toArray a))]
              ;;clear first!
              (.clear a)
              (p v)))
          (let [pval @pv
                val (f input)]
            (vreset! pv val)
            (if (or (identical? pval ::none)
                  (= val pval))
              (do (.add a input) false) ; .add returns true
              (let [v (vec (.toArray a))]
                (.clear a)
                (or (p v)
                  (do (.add a input) false))))))))))


To me the main advantage of transducers as process transformers is that their interface is a bit less complected; especially by not having to deal with reduced wrappers – because of that it may be easier to have them support primitive types.

The process model also seems to be less of a mismatch when reasoning about transducers in other contexts (e.g. sequences, channels).

A better name

There has been much discussion on how to best type them but not that much about how to name them in a more patterned way. What about BuilderDecoratorFactories?

The Rules of Transducer Club

unsorted — cgrand, 11 September 2014 @ 15 h 46 min

First rule: you don’t call a transducer.
Second rule: only composition is allowed.
Third rule: better stateful than pure.

Here are the rationale behind each rule:

First rule: you don’t call a transducer.

Transducers may be so-called “stateful” that is they create stateful reducing functions. As a consequence you should not hold onto them for too long (otherwise state goes sour…). And there’s no betetr way to not hold them for too long that to never get a hold on them at all!

That’s why transduce, sequence, iteration and chan all take a transducer to avoid you the perils of having to call it.

Second rule: only composition is allowed.

It’s a direct consequence of the first rule: you should only compose transducers.

Third rule: better stateful than pure.

This one is a bit more subtle and caused exclusively by transduce.

A reducing fn has now threes arities: [] -> acc, [acc x] -> acc and [acc] -> result.

When one wants to pass state from one step to the next there are two options: be pure and pass it in the accumulator (and you’ll use the 1-arity to clean up) or be stateful.

It turns out that the pure option is a bad one, for two reasons. First reason, it’s going to increase object churn. Second reason, it’s going to change the type of the accumulator and this is a problem because:

  • in transduce an init value may be specified and this init value must be of the type of the accumulator,
  • we don’t have a way to map from the result domain to the accumulator domain.

So it means the user has to be aware of an implementation detail of your transducer (the way you smuggle state in the accumulator) to craft a proper init value. It’s an abstraction leak, it’s bad. Be stateful.

Optimal justification of text with Dynamic Programming

unsorted — cgrand, 29 August 2014 @ 11 h 46 min

This is an exercise in dynamic programming I found on /ftp.

The goal is to tightly arrange a given sequence of n words within page margins, maximizing overall neatness. To be more precise, we wish to minimize the sum, over all lines except the last, of the cubes of the number of blank characters at the end of each line. See the comments in the code for more details.

The algorithm has O(n^2) time and space complexities.

Below is my take on it and I believe the complexity to be O(n*width). I achieve linearity in the words count by laying out the text back to front: the key insight is that the optimal layout of some words only depends on the amount of space left on the current line, it does not depend on the layout of the words before them.

(defn layout [words width]
  (let [cat (fn [word {u :ugliness [l & ls] :lines}] ; adds the word to the start of the first line
              {:ugliness u :lines (cons (cons word l) ls)})
        br (fn [rem {u :ugliness ls :lines}] ; adds a break (creates a new line)
              {:ugliness (+ u (* rem rem rem)) :lines (cons () ls)})
        layout ; layout words, knowing that the first line shouldn't have more than rem characters
          (fn [layout words rem]
            (if-let [[word & ws] (seq words)]
                (= rem width) ; blank line ahead
                  (cat word (layout ws (- rem (count word))))
                (< (count word) rem) ; enough room for a space and the current word
                  (cat word
                    (min-key :ugliness
                      (layout ws (- rem (count word) 1))
                      (br rem (layout ws width))))
                :else (br rem (layout words width)))
              {:ugliness 0 :lines (list ())}))
        mlayout (memoize layout)
        layout (fn self [words rem] (mlayout self words rem))]
    (:lines (layout words width))))

This is an exercise I prepared for a Lambda Next workshop but that we never used.

Macros, closures and unexpected object retention

unsorted — cgrand, 11 September 2013 @ 12 h 37 min

The common advice about macros is that they should emit as little code as possible and delegate to ancillary functions as soon as possible. Here is an example from

(defmacro transaction
  [& body]
  `(transaction* (fn [] ~@body)))

I still think this is good advice but it has unintended consequences. The problem with this piece of code is that all closed-over objects in body are going to be retained longer than expected, longer than they would have been retained if the macro had emitted all the logic implemented in transaction* instead of delegating to it. (See this discussion as an example of issues created by such code.)

The closure object has references to all closed-over objects and since a closure can be called many times, it can’t get rid of them. So the only time where they are going to be garbage collectible is once the closure itself becomes collectible… and a closure can’t be collected while it’s executing.

Helpfully there’s a low-level feature for that:

(defmacro transaction
  [& body]
  `(transaction* (^:once fn* [] ~@body)))

It instructs the compiler that the closure is one-shot and that closed-over references should be cleared, thus allowing referenced objects to be garbage collected before the closure returns.

This problem is not specific to macros but can easily be solved in most cases: the closure is an implementation detail and the macro writer knows enough about its life-cycle to fix it. However any regular closure (fn or reify) is going to prevent closed-overs to be garbage-collected while one of its (java) methods is running because the closure is referenced by the stack.

During the last LambdaNext workshop a delegate stumbled on such a case while experimenting with reducers (and incidentally it made me understand a memory issue I worked around last year):

=> (time (reduce + 0 (map identity (range 1e8))))
"Elapsed time: 5729.579 msecs"
=> (time (reduce + 0 (r/map identity (range 1e8))))
;; Interrupting...
Expression was interrupted: null

(Depending on your memory settings, you may have to tweak the length of the range to exhibit the problem; more details here)

If one modifies reducers to not use (java) methods but external extensions:

(in-ns 'clojure.core.reducers)

(defrecord Folder [coll xf])

(defn folder
  "Given a foldable collection, and a transformation function xf,
  returns a foldable collection, where any supplied reducing
  fn will be transformed by xf. xf is a function of reducing fn to
  reducing fn."
  {:added "1.5"}
  ([coll xf]
     (Folder. coll xf)))

(extend-type Folder
      (coll-reduce [fldr f1]
                   (clojure.core.protocols/coll-reduce (:coll fldr) ((:xf fldr) f1) (f1)))
      (coll-reduce [fldr f1 init]
                   (clojure.core.protocols/coll-reduce (:coll fldr) ((:xf fldr) f1) init))

      (coll-fold [fldr n combinef reducef]
                 (coll-fold (:coll fldr) n combinef ((:xf fldr) reducef))))

Then the problem disappears:

(in-ns 'user)
=> (time (reduce + 0 (r/map identity (range 1e8))))
"Elapsed time: 4437.012 msecs"

This is because the protocol methods is not a java method of the reducer object anymore and thus it can be reclaimed while the (protocol) method is executing.

So next time you have a memory issue, look for closures tying the life-cycle of their closed-overs to theirs!

Tarjan’s strongly connected components algorithm

unsorted — cgrand, 18 March 2013 @ 19 h 48 min

I dislike algorithms that are full of indices and mutations. Not because they are bad but because I always have the feeling that the core idea is buried. As such, Tarjan’s SCC algorithm irked me.

So I took the traditional algorithm, implemented it in Clojure with explicit environment passing, then I replaced indices by explicit stacks (thanks to persistence!) and after some tweaks, I realized that I’ve gone full circle and could switch to stacks lengths instead of stacks themselves and get rid of the loop. However the whole process made the code cleaner to my eye. You can look at the whole history.

Here is the resulting code:

(defn tarjan 
  "Returns the strongly connected components of a graph specified by its nodes
   and a successor function succs from node to nodes.
   The used algorithm is Tarjan's one."
  [nodes succs]
  (letfn [(sc [env node]
            ; env is a map from nodes to stack length or nil,
            ; nil means the node is known to belong to another SCC
            ; there are two special keys: ::stack for the current stack 
            ; and ::sccs for the current set of SCCs
            (if (contains? env node)
              (let [stack (::stack env)
                    n (count stack)
                    env (assoc env node n ::stack (conj stack node))
                    env (reduce (fn [env succ]
                                  (let [env (sc env succ)]
                                    (assoc env node (min (or (env succ) n) (env node)))))
                          env (succs node))]
                (if (= n (env node)) ; no link below us in the stack, call it a SCC
                  (let [nodes (::stack env)
                        scc (set (take (- (count nodes) n) nodes))
                        ; clear all stack lengths for these nodes since this SCC is done
                        env (reduce #(assoc %1 %2 nil) env scc)]
                    (assoc env ::stack stack ::sccs (conj (::sccs env) scc)))
    (::sccs (reduce sc {::stack () ::sccs #{}} nodes))))

As always, if you need some short-term help with Clojure (code review, consulting, training etc.), contact me!

Decaying lists: log scale for lists

pondering — cgrand, 12 February 2013 @ 13 h 44 min

The concept of exponentially decaying lists by David Barbour piqued my interest and I implemented it. It can be summed up as: log scale for lists!

As for log scale, decaying lists allow to manage large range of values and thus to get a better understanding of the data.

So a decaying list grows logarithmically with the number of conjed items. It follows that some items are dropped when others are inserted. Let’s see:

=> (seq (into (decaying-list) (range 100)))
(99 98 97 96 95 94 92 91 90 89 88 87 86 84 83 80 70 65 61 59 57 49 44 30 29 27 25 23 22 18 12 4 0)

So most of the recent items are there but the older elements have a greater probability to have been dropped.

A decaying list is parametrized by its half-life. The default half-life is 20, it means that an item in the list has one chance out of two to “survive” the insertion of 20 other items.

=> (seq (into (decaying-list 5) (range 100))) ; a half-life of 5 conjs
(99 98 97 96 95 94 93 91 83 81 79 78 68 61 57 54 52 31 15)

You can know where the next decay (if any: nil is returned when no decay) is going to happen:

=> (def dl (into (decaying-list 5) (range 100)))
=> (decay-loc dl)
[(93 95 97 98 99) (92 91 89 87 85 81 77 72 71 63 54 52 46 31 20 16)]

So here the decay is going to happen between 93 and 92 (the first items of the “left” and “right” sequences). By default the most recent one is kept.

=> (seq (conj dl 100))
(100 99 98 97 95 93 91 89 87 85 81 77 72 71 63 54 52 46 31 20 16)

Indeed 92 has been dropped. (Repeated calls to decay-loc always yield the same result.)

However we may elect to synthetize a new value rather than just keep one of the two candidates to decay. For example we can compute their average:

=> (seq (into (decaying-list 5 
                :collapse (fn [dl]
                            (let [[[l] [r]] (decay-loc dl)] 
                              (/ (+ l r) 2.0))))
          (range 100)))
(99 98 97 95.5 93.5 91.5 90 89 87.125 82.75 79.5 72.875 64.8125 60.875 58 50.732421875 27.0 17.75 11.25 2.8125)

Or something a bit more elaborate (and precise):

=> (defn stats [x]
     (if (map? x)
       {:min x :max x :avg x :n 1}))
=> (defn merge-stats [{mina :min maxa :max avga :avg na :n}
                      {minb :min maxb :max avgb :avg nb :n}]
     {:min (min mina minb)
      :max (max maxa maxb)
      :avg (/ (+ (* na avga) (* nb avgb)) (double (+ na nb)))
      :n (+ na nb)})
=> (seq (into (decaying-list 5 
                :collapse (fn [dl]
                            (let [[[l] [r]] (decay-loc dl)] 
                              (merge-stats (stats l) (stats r)))))
          (range 100)))
(99 98 97 {:min 92, :max 96, :avg 94.0, :n 5} 91 {:min 87, :max 90, :avg 88.5, :n 4} 86 85 84 83 {:min 80, :max 82, :avg 81.0, :n 3} {:min 78, :max 79, :avg 78.5, :n 2} 77 {:min 70, :max 76, :avg 73.0, :n 7} 69 68 {:min 49, :max 67, :avg 58.0, :n 19} {:min 41, :max 48, :avg 44.5, :n 8} {:min 36, :max 40, :avg 38.0, :n 5} 35 {:min 32, :max 34, :avg 33.0, :n 3} {:min 10, :max 31, :avg 20.5, :n 22} {:min 3, :max 9, :avg 6.0, :n 7} {:min 0, :max 2, :avg 1.0, :n 3})

Decaying lists can also maintain a state – that is updated after each conj or decay. Below the implementation, there is an example of state management.

You can also cap the length of the decaying list by using the :capacity option. A quick note on capacity: increasing the capacity by the half-life value (eg going from 1000 to 1050 when half-life is 50), doubles the range the decaying list remembers.

From lazy seqs to reducers and back

unsorted — cgrand, 11 February 2013 @ 17 h 27 min

Sometimes in a seq pipeline, you know that some intermediate results are, well, intermediate and as such don’t need to be persistent but, on the whole, you still need the laziness.

You can’t always opt for reducers because while being non-persistent (and thus faster), they imply getting rid of laziness.

So you can’t mix and match transformations on sequences and on reducers when you care for laziness. Or, wait!, maybe you can.

At core, reducers are just a clever way to compose big functions while giving the user the illusion of manipulating collections. So we should be able to apply a reducer pipeline if we get hold of the composed function. This is exactly what the below seq-seq function does:

(defn reverse-conses 
  ([s tail] 
    (if (identical? (rest s) tail)
      (reverse-conses s tail tail)))
  ([s from-tail to-tail]
    (loop [f s b to-tail]
      (if (identical? f from-tail)
        (recur (rest f) (cons (first f) b))))))

(defn seq-seq [f s]
  (let [f1 (reduce #(cons %2 %1) nil 
              (f (reify clojure.core.protocols.CollReduce
                   (coll-reduce [this f1 init]
    ((fn this [s]
         (when-let [s (seq s)]
           (let [more (this (rest s)) 
                 x (f1 more (first s))]
             (if (reduced? x)
               (reverse-conses @x more nil)
               (reverse-conses x more)))))) s)))

(defmacro seq->> [s & forms]
  `(seq-seq (fn [n#] (->> n# ~@forms)) ~s))

Note that the captured function (f1) may be impure, so don’t share it!

Now one can specifies non-persistent lazy seq pipelines:

=> (seq->> (range) (r/map str) (r/take 25) (r/drop 5))
("5" "6" "7" "8" "9" "10" "11" "12" "13" "14" "15" "16" 
 "17" "18" "19" "20" "21" "22" "23" "24")

To prove laziness is at play here:

=> (take 2 (seq->> (range) (r/map #(str (doto % prn))) (r/take 25) (r/drop 5)))
"5" "6")

Of course seq-seq could be made to support chunked sequences and, if you try to play with it, beware of r/mapcat.


I added reverse-conses to account for the fact that when several items are added during a single call to f1 (eg during a r/mapcat “step”) they were added in the wrong order – if only everything was more set-like.

Clojure unconference, somewhere in France, early Spring.

unsorted — cgrand, 21 December 2012 @ 14 h 25 min

Some people are willing to set up a Clojure unconference in France in early Spring. Follow the clojure-fr mailing list for further discussion.

Extensible macros (A flatter cond follow-up)

unsorted — cgrand, 21 September 2012 @ 8 h 51 min

I pushed Utils out. It features reduce-by but also variations on cond (the ones introduced in the comments of A Flatter Cond) and if-let.

cond supports the following operators: :let, :when, :when-let and vectors in test position are treated as the binding form to a if-let (thus setting the local bindings) for the “then” expression not for the rest of the cond.

if-let supports :let to introduce new bindings without testing their expression for truth.

One notable twist to these macros is that thanks to net.cgrand.xmacros they are extensible: new operators can safely be participated to the macros (proper namespacing is enforced). See the cond implementation, abbreviated below:

(defmacro cond
 "An extensible variation on cond. Trailing else is supported.
keywords in test position are treated as operators.

Operators impls signature is [rhs else] where rhs is the \"then\"
expression next to the operator."
  [& clauses]
  (when-let [[op rhs & more-clauses] (seq clauses)]
    (if (next clauses)
      (x/expand-op op rhs `(cond ~@more-clauses))

(x/defdefault-op cond [test-expr then else]
  (if (vector? test-expr)
    `(if-let ~test-expr ~then ~else)
    `(if ~test-expr ~then ~else)))

(x/defop cond :let
  "Introduces local bindings."
  [bindings cont]
  `(let ~bindings ~cont))

(x/defop cond :when
  "Short-circuits the rest of the cond if false"
  [test-expr cont]
  `(when ~test-expr ~cont))

(x/defop cond :when-let
 "Short-circuits the rest of the cond if false and introduces local
  [bindings cont]
  `(when-let ~bindings ~cont))

Follow-up: A world in a ref

unsorted — cgrand, @ 8 h 35 min

I recently published Megaref a refined (and working) version of the idea described in A World in a Ref.

New and notesworthy:

  • actual working code,
  • the concurrency (number of guards) can be tuned at runtime,
  • tuning happens through an uniform interface (get-options, set-option!),
  • subrefs (and backward compatible replacements for usual STM functions) allow to easily migrate “polyref” code to megaref (see the conversion of Rich Hickey’s ant demo — mind the typo in the commit message).

Feedback welcome!

Next Page »
(c) 2018 Clojure and me | powered by WordPress with Barecity