Meetup summary

2026-01-23 - Set partition generation - part 2

Agenda:

Review background motivation from last time if anybody is interested.
Quick notes about multiset “shapes”. I’ve adopted the term protoype to describe a canonical multiset shape⁴.
- If you have ordered (“identifiable”/non-exchangeable) items within your multiset, then the number of distinct unrestricted prototypes is $c(n)$ , where $n$ is the number of items in your multiset and $c$ is the integer partition function.
- If you have exchangeable items, then the number of unrestricted prototypes $p(n)$ , where $p$ is the integer partition function.
- If you have $m$ distinct, non-exchangeable items in a multiset of size $n$ , the number of prototypes is equal to the number of (strict) $m$ -compositions of $n$ .
- If you have $m$ _distinct, exchangeable items in a multiset of size $n$ , the number of prototypes is equal to the number of (strict) $m$ -partitions of $n$ .
Develop 5 different techniques to generate all (unrestricted) set partitions:
- Using strict canonicalization and generating in lexcigographical order
- Using the Bell triangle recurrence
- Using the Stirling set number recurrence and summing over all part counts
- Using lexicographically ordered “restricted growth” strings. (This is equivalent to Knuth’s Algorithm H above.)
- Using a Gray code over the same restricted growth strings noted above. Surprisingly, this was the last technique I found and possibly the hardest to derive, yet in my opinion the easiest to understand (if you’re familiar with Gray codes). We’ll briefly discuss Gray codes before drilling into this one.
Motivation for the above is to see different ways to approach the regular set partitioning problem and try to extend it to multisets in a similar way. I’ve started work on this, but would like input from you on some different approaches we could take for multiset partitions. I highly recommend reading the referenced TAOCP section for inspiration.

Notes

We got through Stirling-based generation as well as lexicographically sorted (canonical) partitions. We’ll pick up form there next time.

We’ve been using Python and JavaScript heavily for combinatorial generation thanks to their friendly generator syntax. However, sets are so expensive to materialize that I couldn’t get them to run “acceptably” fast for moderate values of set size. The Rust and Go implementations use somewhat aggressive memory reuse and minimize allocations to get around this. Unfortunately, as a result, they are relatively less readable than they could be. We’ll be whiteboarding in pseudo-code to make this more pleasant. ↩
This is also known as the Peirce triangle or Aitken’s array. Unfortunately, I haven’t seen any good discussion of where this comes from or how/why the recurrence works. Wikipedia gives a combinatorial interpretation which was apparently retroactively discovered. At the meetup, I will present a different combinatorial interpretation which I discovered (essentially reverse-engineered from the known recurrence) and find easier to understand. It’s possible that this is equivalent to Peirce’s original formulation, but I wouldn’t know. I’ve only skimmed the original paper, which does not focus on set partitions and uses archaic notation and terminology in discussing the recurrence of interest. Similarly, Aitken uses an indirect route through generating functions to develop the recurrence. Neither seems to yield an intuitive combinatorial interpretation. If anybody is able to unpack either of these, please let me know! ↩
This is the only technique I didn’t develop on my own (I took it from Knuth). Interestingly, the “restricted growth” string representation is essentially the dense array in an “indexed partition” which I developed for efficient partition augmentation in the Bell triangle and Stirling recurrences. (Note that I did use this encoding as inspiration for the Gray code technique which generates partitions with minimal step-wise changes. Knuth covers this as well, but my technique is apparently not the same as Ehrlich’s as it produces suffixes in a different order and does not attempt to avoid recursion.) ↩
Unfortunately, multiset generation and counting is an under-studied area due to the explosion of shapes you must deal with. As a result, there are no succinct, general formulas for interesting results and asymptotics. Instead, you have to derive a bespoke generating function for the specific multiset (or shape) that you care about and do your own analysis. This is a recurring theme when working with multisets, so I’ve chosen to use the word prototype to capture the canonical multiset for a given multiset (probably becasue I’ve been writing too much JavaScript lately 😅). A canonical multiset consists of only the integers $\{1, 2, \dots, n\}$ and contains every such integer. Moreover, if the original multiset elements (before mapping to natural numbers) do not have an inherent total order, then the prototype has the additional criterion that the parts must be canonicalized (e.g., by ordering the part sizes). If the items do have a total order, then each distinct integer mapping corresponds to a different multiset in an identifiable way, so there is no need to canonicalize. I’m not aware of any standard terminology for this. ↩

Meetup summary

2026-01-23 - Set partition generation - part 2

Recommended reading:

Agenda:

Notes

Footnotes