My Mathematical Journey: Factorization and Primality Testing

By: David Bressoud @dbressoud


David Bressoud is DeWitt Wallace Professor Emeritus at Macalester College and former Director of the Conference Board of the Mathematical Sciences


As I start the Launchings columns for this new year, I ask for an indulgence from my readers. I stopped teaching almost six years ago. I have now stepped away from my position at CBMS. This is a time in my life when I am looking back over the journey that has brought me to this point. My story is one of a love affair with mathematics and a growing appreciation for the difficulties of effective teaching. I intend to use this column to reflect back on my own development as a mathematician and a teacher of mathematics. This is not to say that I will not allow myself to share developments in mathematics education as they come to my attention, but my primary focus for the foreseeable future will be to share my story.

Each episode of this story needs a theme. It occurs to me to build them around the mathematical problems or questions that have most inspired me. I am starting with the topic of my first textbook, Factorization and Primality Testing.

Problem: Given a very large number, say 2^128 + 1 = 340282366920938463463374607431768211457, find its factorization into primes.

This resonates with my first real encounter with mathematics. Throughout elementary school, mathematics seemed to be about nothing more than memorizing procedures. I found it dull and uninteresting. In 7th grade, I was lucky enough to be taught by Mr. Checkley. In a meeting outside of class, he shared with me the remarkable rules for determining divisibility by 3, 9 or 11. And then he challenged me to explain why they work. I could see that they were true, but I struggled with the explanation. I was in an accelerated group, and the following year I took Algebra I. What struck me the most was that this course gave me the tools I needed to explain Checkley’s rules.
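For readers who have not seen them: n is divisible by 3 or by 9 exactly when its digit sum is, and by 11 exactly when the alternating sum of its digits is. The reason is that 10 is one more than 9 and one less than 11. A few lines of Python, offered purely as illustration, let you test the rules for yourself:

    # The divisibility rules for 3, 9, and 11, as assertions. They hold
    # because 10 is congruent to 1 modulo 3 and modulo 9 (each digit
    # contributes itself) and to -1 modulo 11 (the digits contribute
    # with alternating signs).
    def digit_sum(n):
        return sum(int(d) for d in str(n))

    def alternating_sum(n):
        # alternate signs, starting from the units digit
        return sum((-1) ** i * int(d) for i, d in enumerate(reversed(str(n))))

    for n in (918273645, 987654321, 123456789123456789):
        assert (n % 3 == 0) == (digit_sum(n) % 3 == 0)
        assert (n % 9 == 0) == (digit_sum(n) % 9 == 0)
        assert (n % 11 == 0) == (alternating_sum(n) % 11 == 0)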

During my graduate school years, I shared this story with my doctoral adviser, Emil Grosswald. He told me of a similar experience when he was a student in a German gymnasium. Knowing the roots of a monic quadratic polynomial, it is easy to determine where it hits its minimum value. What about a monic cubic polynomial? If it has three known roots, where will the local maximum and minimum be located? He wrestled with this problem for most of a year. The following year he was introduced to calculus and was struck by how it simplified an apparently intractable problem. That, of course, is the power of good mathematics and what should be the driving force behind any mathematics instruction, that it provides powerful tools for attacking difficult problems that can capture a student’s imagination.
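(For the record, calculus does make short work of it: if the cubic is (x – r)(x – s)(x – t), expanding and differentiating gives 3x^2 – 2(r + s + t)x + (rs + rt + st), so the local maximum and minimum sit at x = ((r + s + t) ± sqrt((r + s + t)^2 – 3(rs + rt + st)))/3.)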

Back to Factorization and Primality Testing. I earned my PhD in 1977 and was hired on a two-year contract at Penn State. That December, I went to my first number theory conference, the West Coast Number Theory Conference, started by Dick and Emma Lehmer in 1969 and held that year at UCLA. In succeeding years I often went back to that conference. It is where I first met Dick and Emma as well as Paul Erdős, John Brillhart, Sam Wagstaff, Richard McIntosh, Carl Pomerance, Hugh Williams, and many others.

Factorization and Primality Testing

During the 1980s, no topic at that conference was as hot as the problems of factorization and primality testing, driven by the arrival in the late 1970s of the RSA public-key cryptosystem. How hard is it to factor a very large number, say an integer with 100 digits? Trial division is hopeless. On average, the largest prime factor of n is about n^0.62. Of course, we only have to check the prime divisors up to the second largest, which will be about (n^0.38)^0.62, or roughly n^0.24. How many primes do we have to check? The prime number theorem tells us that the number of primes less than x is approximately x/ln(x). For a hundred-digit number, we are looking at roughly 6.7•10^21 primes that need to be checked. That will take a while. If we can manage to check a billion primes a second, we can expect this will take us over 200,000 years.
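The arithmetic behind those figures is easy to reproduce. Here is a back-of-envelope sketch (my own illustration, using the heuristic exponents quoted above):

    # Back-of-envelope cost of trial division on a 100-digit number.
    from math import log

    n = 10.0 ** 100                # a generic 100-digit number
    bound = n ** (0.38 * 0.62)     # primes needed up to about n^0.24
    primes = bound / log(bound)    # prime number theorem: pi(x) ~ x/ln x
    seconds = primes / 1e9         # a billion trial divisions per second
    years = seconds / 3.15e7       # about 3.15e7 seconds in a year
    print(f"{primes:.1e} primes to check, about {years:,.0f} years")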

The key to fast factorization was a simple observation by Maurice Kraitchik (1882–1957). If we can find a “random” pair of perfect squares with the property that x^2 – y^2 is divisible by n, then there is at least a 50-50 chance that some of the factors of n divide x – y while the remaining factors divide x + y. Taking the greatest common divisor of n and x – y (a very fast and efficient algorithm going back to Euclid) will yield a nontrivial factor of n. There is no guarantee that this factor is prime, but we have cracked our gigantic number into the product of two smaller integers, on which we can repeat the process until we get primes.
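On a toy composite, the whole idea fits in a few lines (an illustrative sketch, not a serious factoring program):

    # Kraitchik's observation on a toy composite. If x^2 == y^2 (mod n)
    # while x is not congruent to +-y, then gcd(n, x - y) is very likely
    # a nontrivial factor.
    from math import gcd

    n = 91                 # 7 * 13, small enough to see what happens
    x, y = 10, 3           # 10^2 = 100 == 9 = 3^2 (mod 91)
    assert (x * x - y * y) % n == 0
    print(gcd(n, x - y))   # 7: the factors of 91 split between x - y ...
    print(gcd(n, x + y))   # 13: ... and x + y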

The problem, of course, is generating such pairs that are truly random. Any obvious choice, such as a pair of integers that differ by a multiple of n, is guaranteed to fail: the divisor of x – y will be n, while the divisor of x + y is 1. It is not very useful to know that n = n•1. It was Lehmer and Powers in 1931 who realized that the continued fraction algorithm could be used to find such pairs. But their approach was labor intensive and suffered from a basic defect in this era of mechanical and human calculators: if a given pair did not work, did that mean an error had been made in the calculation, or was this simply one of the 50% or so that do not work? By the 1960s, computers were sufficiently advanced and reliable that the Lehmer-Powers approach could be coded. John Brillhart and Michael Morrison used it to factor the 7th Fermat number, 2^128 + 1, roughly 3.4•10^38.
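The congruent squares the continued fraction supplies rest on a classical identity: if h_k denotes the numerators of the convergents to sqrt(N), then h_(k–1)^2 ≡ (–1)^k d_k (mod N), where the d_k in the expansion never exceed 2•sqrt(N). A short sketch (mine, not the Morrison–Brillhart implementation) shows the congruences appearing:

    # Congruent squares from the continued fraction expansion of sqrt(N).
    # The residues (-1)^k * d_k are small; in CFRAC they are multiplied
    # together to build the perfect squares Kraitchik's method needs.
    from math import isqrt

    def cfrac_congruences(N, steps):
        a0 = isqrt(N)
        m, d, a = 0, 1, a0
        h_prev, h = 1, a0 % N           # convergent numerators mod N
        for k in range(1, steps + 1):
            m = d * a - m               # standard sqrt(N) expansion step
            d = (N - m * m) // d
            a = (a0 + m) // d
            yield h, (-1) ** k * d % N  # h^2 == (-1)^k * d (mod N)
            h_prev, h = h, (a * h + h_prev) % N

    for x, r in cfrac_congruences(13, 6):
        assert x * x % 13 == r          # six congruences, all tiny residues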

By the mid-1980s, sixty-digit integers were being routinely factored, but the Continued Fraction approach was reaching its limits. Adding one more digit to the number to be factored multiplies the size of the integer by roughly ten, at least tripling the running time of the algorithm. A better approach was needed. Kraitchik himself, as well as many others, had proposed methods that relied on the mathematics of quadratic residues. The problem was that they all required prodigious amounts of computer memory. But now the cost of computer memory had come down dramatically. Carl Pomerance’s algorithm, the Quadratic Sieve, became the method of choice, succeeded in the following years by an assortment of refinements and extensions. Technology was driving the directions for mathematical discoveries.

Trial division by primes is deterministic. You know exactly how long it should take and when you are done. If you have not found a prime factor by the time you reach the square root of n, then n must be prime. The methods that grew out of Kraitchik’s approach are probabilistic. They operate orders of magnitude faster, but there is a positive probability that on any given iteration they will fail. How do you know when you are done? Each time you apply the algorithm—which might take hours, days, or even weeks to run—you have a positive and nontrivial probability of failing to break a composite number into a product of two smaller integers. But what if your number actually is a prime? These probabilistic methods do not distinguish between bad luck and trying to apply the algorithm to a prime.
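For contrast, here is deterministic trial division in its entirety (a sketch; fine at this scale, hopeless at a hundred digits):

    # Trial division is deterministic: if nothing divides n up to
    # sqrt(n), then n is prime. Complete certainty, exponential cost.
    from math import isqrt

    def smallest_factor(n):
        for d in range(2, isqrt(n) + 1):
            if n % d == 0:
                return d        # the smallest divisor found is prime
        return None             # no divisor up to sqrt(n): n is prime

    print(smallest_factor(1009))   # None -> 1009 is prime
    print(smallest_factor(1007))   # 19, since 1007 = 19 * 53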

Fortunately, there is an easy way to test whether or not a number is prime that works in most cases. Fermat proved that 2^(p–1) is always one more than a multiple of p when p is a prime larger than 2. If 2^(n–1) is not one more than a multiple of n, then we know that n is composite. And calculating a^k modulo n is very fast. The number of steps is at most a small multiple of the number of digits in k. However, there are composite numbers that pass this test, known as pseudoprimes. It takes additional work to verify that a number really is prime and that we can stop trying to break it into smaller pieces.
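In Python the Fermat test is a one-liner, thanks to built-in modular exponentiation, and the smallest base-2 pseudoprime, 341 = 11•31, shows exactly how it can mislead:

    # The Fermat test: for a prime p > 2, 2^(p-1) == 1 (mod p). The
    # three-argument pow computes this by repeated squaring, so the
    # number of multiplications is proportional to the digits of the
    # exponent. Composites can slip through.
    def fermat_test(n, a=2):
        return pow(a, n - 1, n) == 1

    print(fermat_test(1009))   # True: 1009 is prime
    print(fermat_test(1007))   # False: certifies 1007 composite
    print(fermat_test(341))    # True, yet 341 = 11 * 31: a pseudoprime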

I was fascinated by the fact that the primary tools for factorization and primality testing—modular arithmetic, continued fractions, and quadratic residues—are the stuff of a first course in undergraduate number theory. RSA itself is based on Euler’s theorem that if a and n are relatively prime, then a^phi(n) is always 1 more than a multiple of n, where phi(n) is the Euler totient function, the number of integers 1 through n that are relatively prime to n. One could build an entire undergraduate course in number theory around RSA and the problem of factorization into primes. By the time I decided to offer such a course, elliptic curve methods were demonstrating their superiority, but it was not too much of a stretch to bring the basic ideas of elliptic arithmetic into it.
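To make the connection concrete, here is the full RSA cycle with absurdly small primes, nothing like a secure choice of parameters:

    # A toy RSA round trip built directly on Euler's theorem (tiny
    # primes, purely illustrative; real RSA moduli run to hundreds of
    # digits).
    from math import gcd

    p, q = 61, 53
    n = p * q                     # public modulus: 3233
    phi = (p - 1) * (q - 1)       # phi(n) for n = p*q: 3120
    e = 17                        # public exponent, coprime to phi(n)
    assert gcd(e, phi) == 1
    d = pow(e, -1, phi)           # private exponent (Python 3.8+): 2753

    message = 1234
    cipher = pow(message, e, n)          # encrypt: m^e mod n
    assert pow(cipher, d, n) == message  # decrypt: Euler's theorem at work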

Dan Shanks (1917–1996)

In the fall of 1987, I taught a number theory course at Penn State built around these problems. I had previously used Niven and Zuckerman for undergraduate number theory. It is clear, accurate, and complete. But I found it uninspiring. Instead, I chose Solved and Unsolved Problems in Number Theory by Dan Shanks, someone else I knew from the West Coast Number Theory Conference. That text is built around problems. The official list consists of perfect numbers, repeating decimals, and Pythagorean triples, but in fact everything that is presented is motivated by an interesting problem. I really liked it. Aspects of it, especially the treatment of continued fractions, fed easily into what I needed, but it was not quite right. I began to write extensive lecture notes to complement the book. Given these notes, it was a natural next step to publish them as a book, which I pounded out on my brand-new Apple IIe. Some of the idiosyncrasies in this book, such as using X for every product, are a result of the fact that it was typed in ordinary Apple IIe text and I was not certain how it would translate when typeset. That is my only book for which I did not type the manuscript myself in LaTeX.

By the spring of 1988 I had a complete manuscript that I sent out to every potential publisher I could think of. No one was interested, no one except John Ewing who was then on the editorial board for Springer’s Undergraduate Texts in Mathematics. I met John at the summer meeting in Providence that year. He was very enthusiastic and encouraged me to publish with Springer. My book came out in the spring of 1989.

A word on the cover, which illustrates the geometry behind addition on an elliptic curve. This was before the days when a desktop computer could produce such an image with the required clarity and precision. My father was a graphic artist, working in the advertising division of Bethlehem Steel. I asked if he would draw it. He said that, to be done properly, it needed to go to a professional draughting house, which is what we did. Springer did choke when I presented them with the bill for $300, but given that the book is essentially devoid of other illustrations, they did reimburse me.

I am amazed that 32 years later it still sells. Not a lot, but Springer reports sales of several copies each year. The book is now woefully out of date in terms of explaining methods for factoring large integers. The field has moved on. The quadratic sieve has been supplanted by the number field sieve, which has succeeded in factoring 2^1193 – 1, a 360-digit monster whose second-largest prime factor has 104 digits. Today’s algorithms go deep into graduate- and post-graduate-level mathematics. For anyone interested in knowing the latest results, Sam Wagstaff still maintains a website of the records, which can be found at https://homes.cerias.purdue.edu/~ssw/cun/index.html. Sam also has his own book on factorization and primality testing, The Joy of Factoring, published by the AMS in 2013.

Despite its age and the fact that the methods are now so far advanced, I have an affection for this book. I still like how it motivates the study of number theory. It prefigures all of my other textbooks in exhibiting a clear story line, a narrative to help students understand why anyone cares about this particular topic.


Download the list of all past Launchings columns, dating back to 2005, with links to each column.