Though there has been much speculation, no one really knows how Santa Claus does it all. Traditionally, he and his crew spend most of the year making and gathering toys for children all over the world. Then he delivers them all on Christmas Eve and early Christmas morning. The delivery itself creates some alarming statistics for how fast Santa’s sleigh must travel and how much time Santa has at each house, for delivering presents and consuming whatever milk and cookies the children leave for him. But I’m wondering, how can Santa even figure out a flight path? How does he choose the first house or the last house or any of them in between?
First of all, I’m narrowing the scope of the problem to just delivering presents in the United States, particularly the 48 contiguous states (not Alaska or Hawaii). This scope is smaller, though still not exactly small, since the U.S. population was estimated at 330,098,987 as of 12/05/2019 at 6:28 AM EST[1]. It wasn’t a particularly small scope 100 years ago either, as the U.S. population was reportedly 104,514,000 as of July 1, 1919.[2] So in the past 100 years, the population has more than tripled, with an increase factor of about 3.16. I’m assuming the number of children has increased at roughly the same rate, basically tripling in that time. Needless to say, Santa has a lot to do at Christmas time, even when just considering the lower 48 states.
Back to the problem of choosing a flight path, how can Santa visit all those children in the contiguous 48 U.S. states in one night? He obviously needs a plan. The challenge of choosing the order in which to visit each location can be modeled as the traveling salesman problem, from math and computer science.[3] This famous problem involves finding the optimum path – maybe the quickest or shortest – for visiting each location (called a node) exactly once. The time for finding an optimal path grows roughly exponentially as the number of nodes increases. So with millions of nodes, finding the best path can take a very long time.
Can Santa just wing it and pick a path at random? The actual number of possible paths to choose from, grows at a factorial rate. The factorial of a given whole number is of course the product from multiplying together all the numbers from one to that whole number. For just ten locations, Santa first has to first choose one of those ten, then one of the remaining nine, followed by one of the remaining eight, and so on. Add one more node, and now the number of possible paths is eleven times greater. Eventually, factorials grow faster than exponential functions.
All this may sound a little bit alarmist. First of all, the factorial’s combinatorial explosion[4] described above applies only to the number of possible paths. The time required to travel to just one more node, generally does not lead to a factorial increase in travel time. Second, the exponential time for finding an optimal path is – as described – for finding an “optimal” path. A path that is “good enough” can probably be found more quickly. Third, he could potentially compute a reasonable solution during the months prior to Christmas and then follow the computed path when delivering presents.
On the other hand, some of the concern is justified. For the lower 48 states alone, Santa needs to visit millions of children, spread out over a very large area. A sudden change in the weather or in the Nice and Naughty lists can require arbitrary adjustments in Santa’s plan. There are also babies being born every day, and children sometimes have to change address. Come Christmas Eve, Santa cannot rely completely on a precalculated path, nor does he have a lot of time for computing or recomputing a good path. And besides, it’s an interesting problem to look at.
Fortunately, there are some assumptions that can help Santa deliver toys more efficiently. Since some traditions often have him arriving maybe shortly after midnight in the first hour of Christmas, he should probably finish each time zone in one hour. On the bright side, that splits the contiguous U.S. into four smaller chunks. Furthermore, nearly two thirds of Americans live in cities, which are said to take up just 3.5% of the U.S. land area.[5] We might assume that Santa completes a city before leaving it, so each of the four large traveling salesman problems (one per time zone) is now broken into two levels. The upper level problem involves efficiently reaching each city in a given time zone. The lower level involves efficiently visiting the home of every child in a given city (each city with its own instance of the traveling salesman problem). Gifts for the remaining third of the children still need to be delivered as well, but Santa at least could travel larger average distances between households and therefore could fly faster. Also he would be performing once third of the starting and stopping.
So how bad is it in terms of calculations? The time needed for calculating an optimum path in Santa’s various traveling salesman problems is bounded by the worst instance of the problem that he has to deal with. Looking at the upper level of the problem, how many cities are there? According to the Statista web site, in 2018 there were “19,495 incorporated places registered in the United States.”[6] Presumably those are cities, towns, etc. Not all those are in the lower 48 states, and the number may have changed, but that number is close enough for our purposes. What is the highest count of cities per U.S. time zone? It could be close to the average. Assume instead it’s skewed so that about one half the cities are in one time zone. That’s about 10,000 cities.
At the lower level of complexity, what is the most populous city in the U.S.? The U.S. Census estimates that New York, New York, had the most people as of July 1, 2018, at 8,398,748 people. How many households are there in New York City with children? This might take some trickery. A quick search didn’t show how many households are in New York City, but the CCC New York web site estimates there were 1,739,256 children in New York City in 2018.[7] Maybe we can use national statistics to get local estimates. The Census Bureau says that nationally there were 121,520,180 households in 2018, of which 36,878,070 had at least one child.[8] Also in 2018 there were an estimated 73,105,551 children in the U.S. under the age of 18.[9] That’s approximately 0.5044 households per child. Applying that fraction to New York City gives us a total of 877,300 households with children in that city. Granted, in New York City, I suspect the fraction of families living in the same building is higher than the national amount, we’ll overlook that for now. Anyhow, that number of households is much bigger than the number of cities expected to be in a time zone.
Now that we have a better idea of what scale we’re dealing with for the problem itself, how long does it take to find an optimal solution to the traveling salesman problem? Someone at the University of Michigan computed an optimum solution for just 24,978 nodes on a cluster of 96 dual-Xeon computers in 2004.[10] That was reportedly the equivalent of 84.8 years for a single Xeon processor. With nearly 200 CPUs I’m guessing the actual time is more along the lines of 84/200 = 0.42 years. If we apply the 80-20 rule, we might assume a reasonable solution can be found in 20% of that time. That’s still 0.084 years, or 30 days. But now at the end of 2019, we can take advantage of (or maybe misapply) Moore’s law somewhat and guess we can cut that time in half for every two years since 2004. Rounding up the current year to 2020, the CPU speed has supposedly doubled 8 times, so now the problem can presumably be solved 64 times more quickly, coming to maybe a half day (30/64 = ~0.5).
Back to Santa’s problem, we still have more than 30 times as many nodes for New York City as there were in the study at the University of Michigan. If we stick with factorials, the amount of necessary computing time could increase by orders of magnitude. Instead we’ll assume the nodes are further grouped in a size to roughly match the study. Sometimes it’s useful to stick with numbers you have, to just get a general idea of the scale. As for the amount of computing time, a half day isn’t even close to the target computing time of one hour, so we still need to improve the time by a factor of 360 (30 nodes * 12 hours). We already gave up on having an optimum solution, so that “discount” of accepting a non-optimal reasonable solution is not available. Cities typically have some kind of grid structure somewhere. Maybe Santa further divides New York City along some grid lines. At roughly 900,000 nodes, surely it’s possible to divide New York somewhat into smaller chunks. With the simple example of a square New York City, we could divide it into 19 rows and 19 columns (since 19*19 = 361 = ~360), and that would allow the computations to be done in one hour, if done in parallel. We can assume Santa can find a good way to divide the city. Though a cluster of 196 computers is a lot, maybe Santa is able to throw even more computing power (and/or magic) at the problem. We could also allow for continued research into the traveling salesman problem since that study. One way or another, Santa could very well have his path computed for him in under an hour, just in time for each time zone.
So now we have some idea of how Santa can compute a flight path. Admittedly, determining the flight path itself is just one part of the puzzle of Santa Claus. How does he do it all? For the time being, I’m just going to call it a Christmas miracle.
References
[1] https://www.census.gov/popclock/
[2] https://www.census.gov/population/estimates/nation/popclockest.txt
[3] https://en.wikipedia.org/wiki/Travelling_salesman_problem
[4] https://en.wikipedia.org/wiki/Combinatorial_explosion#Arithmetics
[5] https://www.census.gov/newsroom/press-releases/2015/cb15-33.html
[6] https://www.statista.com/statistics/241695/number-of-us-cities-towns-villages-by-population-size/
[7] https://data.cccnewyork.org/data/map/98/child-population#98/a/3/148/40/a
[10] https://www.me.utexas.edu/~jensen/ORMM/models/unit/combinatorics/tsp.html
(c) Copyright 2019 by Mike Ferrell