nephnerd.xyzAboutProjectsBlog

An Interesting Combinatorics Problem

June 17th, 2023

The Motivation

A few weeks ago, I was preparing for my pharmacology subject examination and stumbled across a combinatorics problem with a rather interesting solution. I had completed the online question bank and wanted to repeat all of the questions again as a final review in the 10 days before the exam. However I had no way of resetting the question bank without buying another subscription, so I settled on the next best thing: generating tests that were maximum-sized random samples of 40 questions from the bank while wondering how many of these tests I would need to complete such that I could be reasonably certain that I would see every question in the question bank.

After a bit of searching on the internet, I stumbled across this question posted on a mathematics forum and realized that my question and the question posed on the forum are materially equivalent. In both cases we ask how many samples of objects with replacement from a collection of objects are required on average to have sampled each object at least once.

Deriving a Formula

Suppose there are questions and each generated test randomly selects questions to review. Let be a random variable representing the number of tests it takes to review all of the questions. For a given set of questions, the probability that you have managed to avoid in each of the first tests is

The probability that you have managed to avoid at least one question during the first tests is equal to the alternating sum over all subsets according to the principle of inclusion-exclusion. We may group these probabilities by the number of elements to succinctly define the tail probability

Using the tail sum for expectation formula, we are able to derive an expression for the expectation of Because the series converge absolutely, we can rearrange the summations and simplify using the formula for the sum of an infinite geometric series to derive a closed-form solution for the expectation.

Using the above derived formula on the problem posed in the forum post ( and ), we would expect it to take, on average, 9.05 days to replace all of the Belgian chocolates. Meanwhile, for the UWorld pharmacology question bank problem with of 639 and of 40, we would expect it take 109.45 tests on average to see every question of the question bank.