Meetup summary

2026-01-23 - Multiset generation - part 2

  • TAOCP Volume 4A, Section 7.2.1.5 - Generating set partitions. In particular:
    • Partitions of a set and Algorithm H.
    • Partitions of a multiset and Algorithm M.
    • This appears to be in the O’Reilly catalog, which in principle means you should be able to access it through a KCLS or SPL library account. I’ve tested it out, but the book is quite heavy and doesn’t render well on the web (sadly, O’Reilly uses an epub adaptation rather than the blessed PDF render). Moreover, I couldn’t even get it to load on my phone. 🤦
  • Optionally, check out these example programs for generating set partitions1 in different ways. We will “rediscover” these in the session, so avoid this if you don’t want spoilers and consider for review instead:
    • Bell triangle2 recurrence (Rust) (Go)
    • Stirling set number (S2) recurrence (Rust)
    • Lexicographical order (Rust) (JavaScript)
    • Restricted growth strings3 (Rust)
    • Pseudo-Gray code over restricted growth strings. I only realized in retrospect this was not a proper Gray code, but it works as intended in that edit distances are at most 1 in “real” partition operations. (Rust)
    • Proper Gray code over restricted growth strings. This is due to Ehrlich and is mentioned in TAOCP. (Rust)

NOTE: The structure of the combinatorics project is subject to change, as I’ve been using this as a quick-and-dirty testing ground. I’ve used absolute commit links above in attempt to keep links from breaking.

Agenda:

  • Review background motivation from last time if anybody is interested.
  • Quick notes about multiset “shapes”. I’ve adopted the term protoype to describe a canonical multiset shape4.
    • If you have ordered (“identifiable”/non-exchangeable) items within your multiset, then the number of distinct unrestricted prototypes is c(n)c(n), where nn is the number of items in your multiset and cc is the integer partition function.
    • If you have exchangeable items, then the number of unrestricted prototypes p(n)p(n), where pp is the integer partition function.
    • If you have mm distinct, non-exchangeable items in a multiset of size nn, the number of prototypes is equal to the number of (strict) mm-compositions of nn.
    • If you have mm _distinct, exchangeable items in a multiset of size nn, the number of prototypes is equal to the number of (strict) mm-partitions of nn.
  • Develop 5 different techniques to generate all (unrestricted) set partitions:
    • Using strict canonicalization and generating in lexcigographical order
    • Using the Bell triangle recurrence
    • Using the Stirling set number recurrence and summing over all part counts
    • Using lexicographically ordered “restricted growth” strings. (This is equivalent to Knuth’s Algorithm H above.)
    • Using a Gray code over the same restricted growth strings noted above. Surprisingly, this was the last technique I found and possibly the hardest to derive, yet in my opinion the easiest to understand (if you’re familiar with Gray codes). We’ll briefly discuss Gray codes before drilling into this one.
  • Motivation for the above is to see different ways to approach the regular set partitioning problem and try to extend it to multisets in a similar way. I’ve started work on this, but would like input from you on some different approaches we could take for multiset partitions. I highly recommend reading the referenced TAOCP section for inspiration.

Footnotes

  1. We’ve been using Python and JavaScript heavily for combinatorial generation thanks to their friendly generator syntax. However, sets are so expensive to materialize that I couldn’t get them to run “acceptably” fast for moderate values of set size. The Rust and Go implementations use somewhat aggressive memory reuse and minimize allocations to get around this. Unfortunately, as a result, they are relatively less readable than they could be. We’ll be whiteboarding in pseudo-code to make this more pleasant.

  2. This is also known as the Peirce triangle or Aitken’s array. Unfortunately, I haven’t seen any good discussion of where this comes from or how/why the recurrence works. Wikipedia gives a combinatorial interpretation which was apparently retroactively discovered. At the meetup, I will present a different combinatorial interpretation which I discovered (essentially reverse-engineered from the known recurrence) and find easier to understand. It’s possible that this is equivalent to Peirce’s original formulation, but I wouldn’t know. I’ve only skimmed the original paper, which does not focus on set partitions and uses archaic notation and terminology in discussing the recurrence of interest. Similarly, Aitken uses an indirect route through generating functions to develop the recurrence. Neither seems to yield an intuitive combinatorial interpretation. If anybody is able to unpack either of these, please let me know!

  3. This is the only technique I didn’t develop on my own (I took it from Knuth). Interestingly, the “restricted growth” string representation is essentially the dense array in an “indexed partition” which I developed for efficient partition augmentation in the Bell triangle and Stirling recurrences. (Note that I did use this encoding as inspiration for the Gray code technique which generates partitions with minimal step-wise changes. Knuth covers this as well, but my technique is apparently not the same as Ehrlich’s as it produces suffixes in a different order and does not attempt to avoid recursion.)

  4. Unfortunately, multiset generation and counting is an under-studied area due to the explosion of shapes you must deal with. As a result, there are no succinct, general formulas for interesting results and asymptotics. Instead, you have to derive a bespoke generating function for the specific multiset (or shape) that you care about and do your own analysis. This is a recurring theme when working with multisets, so I’ve chosen to use the word prototype to capture the canonical multiset for a given multiset (probably becasue I’ve been writing too much JavaScript lately 😅). A canonical multiset consists of only the integers {1,2,,n}\{1, 2, \dots, n\} and contains every such integer. Moreover, if the original multiset elements (before mapping to natural numbers) do not have an inherent total order, then the prototype has the additional criterion that the parts must be canonicalized (e.g., by ordering the part sizes). If the items do have a total order, then each distinct integer mapping corresponds to a different multiset in an identifiable way, so there is no need to canonicalize. I’m not aware of any standard terminology for this.