Big O is often used to describe how fast an algorithm is. However, that's not quite right. It doesn't measure the speed of an algorithm. Rather, it categorizes algorithms based on how fast their cost grows as the input grows. Formally, we say that it describes the complexity of the algorithm. This can be quite confusing, because the word "complex" is usually used to describe things that are difficult for humans to understand, while the complexity described by Big O is not about humans at all, but about how much work a computer has to do. It is even more confusing because it's not obvious what it means for a problem to be complex from a computer's point of view.
Formally, we can write \(f(x) = O(g(x))\), which is read as "\(f(x)\) is big O of \(g(x)\)", if there exists a positive real number \(M\) such that:
$$ |f(x)| \le M g(x) \quad \text{for all} \quad x \in D $$

where \(D\) is the domain of \(f\). To put it simply, we say \(f\) is big O of \(g\) if \(f\) is bounded by \(g\): if we can find a function \(g\) and a constant \(M\) such that \(M\times g(x)\) is always greater than or equal to \(|f(x)|\), then \(f\) is bounded by \(g\) and \(f\) is big O of \(g\). If we want to be very pedantic, this is not the full story (the usual definition only requires the bound to hold for all sufficiently large \(x\)), but it is enough to understand what big O is.
Here are some examples:
- \(x^2 = O(x^2)\), well because \(x^2 \le x^2\).
- \(10x^2 + 1000x = O(x^2)\), because \(10x^2 + 1000x \le 1010x^2\), so by choosing \(M=1010\), we can bound the function \(f\) with \(x^2\).
- \(x^3 + 2x^2 + 3 = O(x^3)\), because for \(x \ge 2\), \(x^3 + 2x^2 + 3 \le x^3 + x^3 + x^3 = 3\times x^3\). So, by choosing \(M=3\), we can bound the function \(f\).
- \(x^3 + 2x^2 + 3 = O(x^4)\), because \(x^4\) always bounds \(x^3\) (for \(x \ge 1\)), so anything bounded by \(x^3\) is also bounded by \(x^4\).
- \(\sin(x) = O(1)\), because the \(\sin\) function always outputs a real number between \(-1\) and \(1\), so \(|\sin(x)| \le 1\).
- \(\frac{10}{1 + e^{-x}} = O(1)\), because this function always outputs a real number between zero and ten. By choosing \(M = 10\), we have \(\frac{10}{1 + e^{-x}} \le M \times 1 \iff \frac{10}{1 + e^{-x}} \le 10 \times 1\).
- \(1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \dots = O(1)\), because this geometric series converges to \(2\), so choosing \(M = 2\) works.
However, you can't say that \(x^2 = O(x)\), because no matter what \(M\) you choose, there is always an \(x\) such that \(x^2 > M\times x\) (any \(x > M\) will do).
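If you prefer to see this numerically, here is a small Python sketch (an illustration of the definition, not a proof) that samples a few values of \(x\) to check the \(10x^2 + 1000x = O(x^2)\) bound and the \(x^2 \neq O(x)\) counterexample:

```python
# A quick numerical sanity check of two of the claims above.
# It only samples a few values of x, so it is an illustration, not a proof.

def f(x):
    return 10 * x**2 + 1000 * x

def g(x):
    return x**2

M = 1010

# 10x^2 + 1000x <= 1010 * x^2 holds for every sampled x >= 1.
for x in range(1, 10_001):
    assert abs(f(x)) <= M * g(x)

# In contrast, x^2 = O(x) fails: for any fixed M, any x > M already
# gives x^2 > M * x.
M = 1_000_000
x = M + 1
assert x * x > M * x

print("bounds behave as described")
```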
Now, what is this \(f\) all about? Well, technically, it can be anything. But usually, we think of \(f\) as some metric in our program that represents the amount of work our algorithm has to do. The number of "instructions" is one option. We can also think of it as the number of lines executed. It depends on what we want to measure. In some cases, we don't really care about the CPU at all and use big O for the number of I/O operations instead. It's up to us. Then, \(x\) is the input of whatever \(f\) you choose; usually, it's the size of our input. For example, in the context of sorting, we can define \(f(x)\) as the number of instructions needed to sort an array of \(x\) integers.
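As a minimal sketch, suppose we pick "number of comparisons" as our metric \(f\) and measure it on a simple selection sort (the function here is just for illustration):

```python
# Count comparisons performed by a simple selection sort on x integers.
# "Number of comparisons" is just one possible choice of the metric f.

def selection_sort_comparisons(arr):
    comparisons = 0
    a = list(arr)
    for i in range(len(a)):
        smallest = i
        for j in range(i + 1, len(a)):
            comparisons += 1          # one comparison of two integers
            if a[j] < a[smallest]:
                smallest = j
        a[i], a[smallest] = a[smallest], a[i]
    return comparisons

# f(x) = number of comparisons needed to sort x integers.
for x in (10, 100, 1000):
    print(x, selection_sort_comparisons(range(x, 0, -1)))
# Prints 45, 4950, 499500: f(x) = x*(x-1)/2, which is O(x^2).
```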
Imagine you have two sorting algorithms, algorithm A and algorithm B. Algorithm A needs \(1000N^2\) instructions to sort \(N\) integers, and algorithm B needs \(2N^2\) instructions to sort \(N\) integers. You may notice that both of them are \(O(N^2)\), even though the first algorithm clearly needs 500 times more instructions. Big O notation doesn't track the constant. The constant doesn't matter; what matters is only the growth. Both algorithms grow in the same way: if you make the input twice as large, both need roughly four times as many instructions.
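Plugging those hypothetical instruction counts into a quick Python sketch makes the point concrete (the counts are made up for the example, not measured):

```python
# Hypothetical instruction counts from the example above, not real measurements.

def instructions_a(n):
    return 1000 * n * n   # algorithm A

def instructions_b(n):
    return 2 * n * n      # algorithm B

for n in (1_000, 2_000, 4_000):
    a, b = instructions_a(n), instructions_b(n)
    print(n, a, b, a // b)
# Doubling n multiplies both counts by four, and the ratio A/B stays at 500:
# the constants differ, but the growth is identical, so both are O(N^2).
```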
People use Big O with some assumptions. In the context of an integer sorting algorithm, we usually assume that comparing two integers takes a constant amount of work. On a modern CPU, comparing two 64-bit integers is a single CPU instruction, and checking the result is one more, which means we need roughly two CPU instructions to compare two integers and use the result. However, imagine your integers are very big, say \(S\) bytes in size. Comparing two \(S\)-byte integers may now take roughly \(2\times S\) instructions. In this case, an \(O(N^2)\) algorithm that assumes comparison is \(O(1)\) is actually \(O(N^2\times S)\) once we consider the comparison to be \(O(S)\).
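As a sketch (assuming the big integers are stored as equal-length byte strings, most significant byte first), the comparison itself looks something like this:

```python
# A sketch of comparing two S-byte integers stored as byte strings,
# most significant byte first. Illustrative only, not a real big-integer library.

def compare_big_ints(a: bytes, b: bytes) -> int:
    """Return -1, 0, or 1. Both inputs are assumed to have the same length S."""
    for byte_a, byte_b in zip(a, b):   # up to S iterations, so O(S) work
        if byte_a != byte_b:
            return -1 if byte_a < byte_b else 1
    return 0

# With S-byte keys, a sort doing O(N^2) comparisons does O(N^2 * S) byte work.
print(compare_big_ints(b"\x01\x00", b"\x00\xff"))  # 1, because 256 > 255
```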
Another example is calculating the \(N\)-th term of the Fibonacci sequence using matrix exponentiation. People claim that we can calculate the \(N\)-th term of the Fibonacci sequence with \(O(\log N)\) complexity using matrix exponentiation. However, this hides one big assumption: that multiplication is \(O(1)\). That is true for most competitive programming problems, because they usually want the answer modulo \(10^9+7\), forcing it to fit in a 32-bit integer, and multiplying such numbers takes roughly one CPU instruction. In reality, the Fibonacci sequence grows quite fast; the 100th term already has more than 20 digits. Multiplying big integers shouldn't be considered \(O(1)\). In fact, there are big-integer multiplication algorithms with various complexities, like Karatsuba with \(O(N^{\log_2 3})\) complexity and fast Fourier transform based multiplication with roughly \(O(N\log N)\) complexity.
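Here is a minimal Python sketch of that matrix-exponentiation approach, with the usual competitive-programming assumption baked in: every multiplication is reduced modulo \(10^9+7\), so each one stays word-sized and counts as \(O(1)\):

```python
# Minimal sketch: N-th Fibonacci number modulo 10^9 + 7 via matrix exponentiation.
# The O(log N) claim counts matrix multiplications and assumes each integer
# multiplication is O(1), which only holds because we reduce modulo MOD.

MOD = 10**9 + 7

def mat_mul(a, b):
    """Multiply two 2x2 matrices modulo MOD."""
    return [
        [(a[0][0] * b[0][0] + a[0][1] * b[1][0]) % MOD,
         (a[0][0] * b[0][1] + a[0][1] * b[1][1]) % MOD],
        [(a[1][0] * b[0][0] + a[1][1] * b[1][0]) % MOD,
         (a[1][0] * b[0][1] + a[1][1] * b[1][1]) % MOD],
    ]

def fibonacci(n):
    """Return F(n) mod 10^9 + 7 using O(log n) matrix multiplications."""
    result = [[1, 0], [0, 1]]   # identity matrix
    base = [[1, 1], [1, 0]]     # [[F(2), F(1)], [F(1), F(0)]]
    while n > 0:
        if n & 1:
            result = mat_mul(result, base)
        base = mat_mul(base, base)
        n >>= 1
    return result[0][1]         # F(n)

print(fibonacci(10))    # 55
print(fibonacci(100))   # the 100th term, already reduced modulo 10^9 + 7
```

Each call to `mat_mul` performs a constant number of word-sized multiplications, which is exactly the assumption that breaks down once the numbers no longer fit in a machine word.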
So, the complexity of an algorithm depends on our assumptions and on which variables we care about. Technically, we can say that every problem on competitive programming sites like Codeforces and Leetcode is \(O(1)\), because their input is bounded by a constant constraint. Take the Median of Two Sorted Arrays problem, for example. If you treat \(m\) and \(n\) as the variables you care about, you can find an efficient algorithm with \(O(\log(n+m))\) complexity. However, since \(m\) and \(n\) are bounded by \(1000\), you can also say that your algorithm is \(O(1)\): whatever input you give it, since the constraint is \(0 \le m, n \le 1000\), the algorithm will stop within some fixed maximum number of steps. It is just not a very useful statement for the reader.