I’ve been playing with different ways to use a sieve to approach the Collatz Problem, and wanted to share it with you. The Collatz Problem is one of those problems that are easily stated but have real depth – in fact, the Collatz problem is still unsolved. Isn’t it great that you can have a problem named after you even if you don’t have a solution and neither does anybody else? I’ll state the problem in this form:
Each counting number n past 1 has a successor number, as follows:
The number “1” is considered home, and when you’re home, you stop. If you start at a given number away from home, and cycle through its successors, you may end up home. Is there any starting number from which you will not eventually reach home?
You might want to play with this for yourself, to get a feel for how this works. When you are at an even number, the rules of this game bring you half-way home in one step. But if you land on an odd number, the next step yanks you roughly three times as far from home as you were. But at least, because of the 3n+1, you are then guaranteed to land on an even number, and so the very next step will get you closer to home again.
If you start at 1, you’re done. If you start at 2, you’re done in one step. If you start at 3, it gets very interesting already. From 3, you land on 10, next is 5, next is all the way up to 16, but then it is smooth sailing: 8, 4, 2, 1. Here is a summary of the travels for the first six starting numbers:
The Wolfram site from which I linked the problem also shows a beautiful and evocative graph, reproduced below, which links the starting number (on the horizontal axis) with the time it takes to return home. Note that the version on the right includes ten times more data points than the one on the left, resulting in a very compressed horizontal scale. The vertical scale, on the other hand, is not very different from the one on the left.
On the same site, there is mention of experimental verification in 2008 that, yes, indeed, you return home no matter the starting point, at least if the starting point is less than a 5 followed by 18 zeroes. Note that this is not considered proof that any starting number will get you home; a general proof is still elusive.
A Sieve for the Collatz problem
What might a sieve for experimental verification of this problem look like? If you think of the entry in our table for the starting number of 6, you see that the work done there also gives us the entries in the table for 3, 5, 4 and 2 before, but it also gives us the entries for 10 and 16 after. As often with a sieve, we turn the problem around, and instead of looking at successors, we are going to look for predecessors. The following diagram illustrates this:
Each number in the diagram has an obvious predecessor: the number twice as large. So 32 is a predecessor of 16, since the successor of 32 is half of it, 16. But some numbers have a second predecessor. These are the numbers, like 16, where if you subtract one, you get a multiple of three, and the number you get by dividing by three is an odd number greater than 1. Few numbers have a second predecessor, yet enough of them do so making the diagram is actually kind of tricky: keeping it readable is somewhat of a challenge. Of course, the diagram is unceremoniously cut off at the right – it doesn’t actually stop there, but you can continue it indefinitely.
The diagram shows three kinds of numbers, and this will form our connection with the sieve. There are the black numbers, which we will call “settled”. For settled numbers, their path to home is known. Also, their predecessors have been dealt with. The second kind of number is marked blue. These we will call “pending”. Their path to home is known, but their predecessors have not yet been dealt with. And the third kind of number? Well, that’s all the other numbers – numbers that are neither settled nor pending. For unsettled numbers, no path to home is yet known. In the diagram above, 7 is an unsettled number. Unsettled numbers can later become pending, and pending numbers later become settled.
In our diagram, all the pending numbers are to the right. But this is not necessary. The pending numbers form a frontier that is moving rightward and downward. In the sieve, the operation of moving the frontier looks as follows: you find a pending number. For the pending number, n, you compute its double, 2n. You mark 2n as pending, and fill in its successor, n, and its time, which is one higher than the time for n. You now look if n has another predecessor. You compute (n-1)/3 and see if it is a whole number, odd, and greater than 1. If so, you mark it as pending, fill in its successor as n, and fill in its time, all just like you did for 2n. At this point, you mark n as settled. In the snapshots of sieves below, we use “blank” for unsettled, “-” for pending, and “+” for settled. This allows for easy update from pending to settled by changing the “-” into a “+”. We start out the sieve with everything blank except for the entry for 1:
The second snapshot above shows the (single) predecessor of “1” being marked. The third snapshot shows the predecessor of “2” being marked. In the fourth snapshot, we show the result of two steps, and now “8” and “16” are marked, and “16” is pending. The number “16” is the first one with two predecessors. However, its “normal” predecessor, “32”, is outside the range of the sieve as shown. The second predecessor of “16” is “5”, and the last snapshot shows it marked as pending. The process continues until there are no more pending entries. Since the sieve is set up in advance with a fixed range (here shown as 20), this must happen eventually, as more and more predecessors fall outside the range of the sieve. The final entries for the sieve of range 20 are shown below:
This sieve replicates the information we had gained from our early table, though it is shown in a different way. There is more information in the sieve, for example, 12 and 20 show as settled also, though we still don’t know whether starting at “7” will get us home. From this sieve, we can’t conclude that “7” is an exception. It is easy to make a much bigger sieve, though it won’t show well on the screen, and settle all the numbers up to 1000, and replicate the information shown in the left graph above. I assume that there is no easy way to size the sieve up front so that we can guarantee that any number left unsettled below 1000 is an exception – I assume that the general problem of sizing the sieve up front is identical to the Collatz problem itself. (By “easy way” I mean a primitive recursive function.) Yet you should allow at the very least three times as much space for the sieve as the highest number you want to see settled – and three times only allows for a single yank away from home. No wonder, for our sieve of size 20, that we didn’t get to settle “7”. Its successor is already “21”, outside of the sieve – so “21” had no chance of being marked a predecessor at any point along the way.
To settle all twenty entries shown, it turns out a sieve of range 256 is sufficient. Another sieve for the Collatz problem, this one using successors, will be shown in another post.