Jauhar's Blog

Why Circular Import Is Illegal In Go


One surprising fact about Golang is that they don’t allow circular import in our code, unlike the other mainstream proramming languages like Java, Javascript and Rust. When talking about circular import, some people like to tell me that “If your code has circular import, then it is not well designed”. I don’t understand this argument. A lot of concept in computer science are circulars. Most (if not all) tree data structure are circular, a node has parent which is also has a node, a node also can have zero or more children which are also nodes. Not just data structure, function can also circular, and they are called recursive function. Circular data structures and recursive functions are definitely not a bad design. A lot of software built in these concept.

There are actually several cases when circular “things” don’t make sense. One of them is because cirular “things” forms a loop and loop can run infinitely. If you think about it, loop looks like something circular. If you see your program as a CFG (control flow graph), you will indeed see a graph with a node that can have a path to itself. The problem with loops is that they can go forever infinitely. And when things go forever, you can’t iterate them because it will never be finish. Some programming language requires size information of all type defined in the program, especially compiled programming languages. Consider Go program below:

 1// available at: https://play.golang.com/p/wXHna8BPwPf
 2package main
 4import (
 5	"fmt"
 6	"reflect"
 9type User struct {
10	ID   int64
11	Name string
14func main() {
15	fmt.Println(reflect.TypeOf(User{}).Size()) // prints 24

In the above program, we can see that the User struct has size of 24. This is because it contains one int64 field called ID which has 8 bytes in size, and string that has size of 16 bytes. Go’s string has size of 16 because internally string is just a pair of 8 bytes integer, one to represent the length of the string and one 8 bytes integer that represent the address of the underlying memory address that store the raw bytes of the string in UTF-8 encoding (Note that some architecture might use smaller integer size to represent the len and pointer).

Now, when you have a circular data structure in your Go program, the compiler might have an infinite loop when calculating the struct size. Consider Go program below:

 1// available at: https://play.golang.com/p/-ys0tucyTJQ
 2package main
 4type User struct {
 5	ID   int64
 6	Name string
 7	Pet  Pet
10type Pet struct {
11	ID    int64
12	Owner User
15func main() {}
17// Compile error message:
18// ./prog.go:3:6: invalid recursive type User
19// 	./prog.go:3:6: User refers to
20// 	./prog.go:9:6: Pet refers to
21// 	./prog.go:3:6: User

If you think about what the programmer trying to achieve in the above program, there is nothing wrong with it. They create a User with reference to the user’s pet, and the pet has reference to the owner, nothing’s wrong with that. In fact, if you define a structure like this in javascript, python, java, kotlin dan dart, you won’t get any error. This is because the struct size calculation can’t be done. The size of struct user is effectively can be expressed as: sizeof(int64) + sizeof(string) + sizeof(Pet), and the size of struct Pet can be expressed as: sizeof(int64) + sizeof(User). You can see based on two expression above, we can’t really calculate the size of the struct User and Pet due to infinite loop:

sizeof(User) = sizeof(int64) + sizeof(string) + sizeof(Pet)
sizeof(User) = 24 + sizeof(Pet)
sizeof(User) = 24 + sizeof(int64) + sizeof(User)
sizeof(User) = 32 + sizeof(User)
sizeof(User) = 32 + sizeof(int64) + sizeof(string) + sizeof(Pet)
sizeof(User) = 56 + sizeof(Pet)
... and goes forever

You can see in the above calculation that we will never be finish calculating the size of User struct. Another way to look at it is that the User struct has infinite size and thus can never fit in the memory. Rust, C, C++ and Go don’t allow this kind of struct definition because of this very reason. But, why do Javascript, Python, Java, Kotlin and Dart allow this? To answer that, take a look at a slightly modified version of above code:

 1// available at: https://play.golang.com/p/PdcaMUEvY_1
 2package main
 4type User struct {
 5	ID   int64
 6	Name string
 7	Pet  Pet
10type Pet struct {
11	ID    int64
12	Owner *User
15func main() {}

We change the Pet.Owner from User into *User and suddenly it compiles just fine. This is because internally pointer is just a number that represent an address. in 64-bit computer, pointers have size of 8 bytes, and in 32-bit computer, pointers have size of 4 bytes. No matter what type behind the pointer is, we can always refer their position in the memory as a 8 bytes / 4 bytes number. Now, the size calculation can be done like this:

sizeof(Pet) = sizeof(int64) + sizeof(*User)
sizeof(Pet) = 8             + 8
sizeof(Pet) = 16

sizeof(User) = sizeof(int64) + sizeof(string) + sizeof(Pet)
sizeof(User) = 8             + 16             + 16
sizeof(User) = 40

Now, back to the question, why do Javascript, Python, Java, Kotlin and Dart allow this? Well, it’s because of the very same reason. When we define a struct (or rather, a class) inside another struct, we actually define a pointer to the inner struct, not embedding the inner struct inside the outer struct. Condider the java code below:

1public class A {
2    B b;
5public class B {
6    A a;

In the above code, the field b in the class A is actually a pointer to object B. That’s why we used to say that everything is pointer in Java. The same reason goes for Javascript, Python and Kotlin.

So, that’s one reason why having circular data structure might not be ok. Now, let’s experiment a little further. Let’s take the circular code that compiles above and split them into different package.

 1// user/user.go
 2package user
 4import "example.com/circularimport/pet"
 6type User struct {
 7	ID   int64
 8	Name string
 9	Pet  pet.Pet
12// pet/pet.go
13package pet
15import "example.com/circularimport/user"
17type Pet struct {
18	ID    int64
19	Owner user.User
22// main.go
23package main
25import (
26	"example.com/circularimport/pet"
27	"example.com/circularimport/user"
30func main() {
31	fmt.Println(reflect.TypeOf(user.User{}).Size())
32	fmt.Println(reflect.TypeOf(pet.Pet{}).Size())

Above code won’t compile with this error message:

package command-line-arguments
        imports example.com/circularimport/pet
        imports example.com/circularimport/user
        imports example.com/circularimport/pet: import cycle not allowed

You see there, the exact same code only splitted into two different package, and suddenly it is not ok according to Golang’s compiler. Why is that? We know that the code makes sense, we know the size of all of those struct is computable. And in fact, if you use Rust, the code above might still compile successfully.

One of the most lovely feature of Golang is its fast compilation. And Go does this by using a technique called incremental compilation. Note that compiled language like C, C++ and Rust also support incremental compilation. But, C++ and Rust can be slow to compile due to the heavy use of generic. Using generic requires compiler to do a process called monomorphization which can make incremental compilation not optimal.

So, what is incremental compilation? Basically, it is like a cache. The idea is that: if you already compile a piece of code, and you don’t change it, you don’t need to compile it again next time. You might notice that if you compile your Go project for the first time, it takes a lot of time compared to the second time. This is because the second time you compile your Golang project, the compiler only need to recompile the piece of code that you change only. Well, sort of. Now, what I mean by “piece of code” is actually called compilation unit by compiler. In Go, one package is considered as one compilation unit. Which means, the cache works in package level, and everytime you change a package, the whole package will be invalidated and recompiled. This is not 100% true, but close enough. Sometimes, if pacakge A imports package B, changing package B can trigger recompilation of package A.

Now, the way incremental compilation work is by using calling convention. In order to call a function, in assembly level, you don’t need to know the callee function body, you only need to know the signature. So, if you have a package main, and you want to call os.Open, you don’t actually need to compile os package in order to generate assembly for main package, you only need the signature. You only need to know what are the arguments to pass, and what are the returned value.

The fact that we use incremental compilation itself doesn’t necessarily we need to block circular import. If package A import package B and package B import package A, the incremental compilation can still happen. In order to compile A, we don’t need B to be compiled first, we only need to know their signature. So, we can compile package A and B separately and then combine them in linking phase. Now, it’s still mistery why Golang blocks circular import.

To understand why circular import is blocked, we need to talk about global initialization. In Golang, we can have global variable that can be assigned with initial value. Consider code below:

 1// Available at: https://play.golang.com/p/96idEMAe6oF
 2package main
 4import (
 5	"fmt"
 8var PI2 float32 = PI * PI
 9var PI float32 = 3.14
11func main() {
12	fmt.Println(PI, PI2)

In the above code, we have two global variables, PI and PI2, and we assign 3.14 to PI and PI*PI to PI2. If you look closely at the code, I intentionally put the PI2 above PI. To compute the initial value of PI2, we need to know the value of PI. Golang know this, and because of that, Go will initialize PI first before PI2. Now, if you have circular initialization, Golang actually won’t compile your code:

 1// Available at: https://play.golang.com/p/nOlfgNEhdCi
 2package main
 4var A int = B
 5var B int = A
 7func main() {}
 9// Compile error message:
10// ./prog.go:3:5: initialization cycle for A
11// 	./prog.go:3:5: A refers to
12// 	./prog.go:4:5: B refers to
13// 	./prog.go:3:5: A

This is because the same reason as circular data structure. Golang will never know what is the initial value of A and B. When trying to calculate the initial value of A, we need to calculate the initial value of B, which then we need to calculate the initial value of A, and it will goes forever infinitely. That’s why circular global initialization is not allowed.

Circular initialization can be more complex, consider the code below:

 1// Available at: https://play.golang.com/p/3QfLoapLjwi
 2package main
 4var A int = f()
 5var B int = A
 7func f() int {
 8	return B * 2
11func main() {}
13// Compile error message:
14// ./prog.go:3:5: initialization cycle for A
15// 	./prog.go:3:5: A refers to
16// 	./prog.go:6:6: f refers to
17// 	./prog.go:4:5: B refers to
18// 	./prog.go:3:5: A

The code above is also blocked by Golang since it’s unclear how the intialization should work. Effectively, A should be initialized with two times B, but B should be initialized with A. It’s infinite loop all over again.

Now, imagine if the function f above is not defined in package main, but in some other package. Can the compiler tell if there is circular intialization? Well, technically yes, but they won’t be able to do incremental compilation. Think about it, in order to know if there is a circular intialization, you need to know the dependency of all global variable initialization. In the case above, in order to check if there is circular initialization for calculating initial value of A, you need to know what are other global variables required to calculate A, and what are the functions need to be called to calculate A. And you need to know what are the global variables and functions required to calculate the dependency of A, and this process go on until you sure there is no cycle. In the case above, to calculate A, you need f, and f requires B, and B requires A, then we conclude that there is a cycle. Now, in order to know that f requires B, you need to inspect the function body of f. If f is defined in other package, then you need to inspect that package, which means if you change the main package, you are no longer allowed to only recompile main, because there is a chance that there is a package that main depends on but they also depends on main and their global value initialization need to be changed.

Now, if that’s the case, how come Rust allow circular import? Well, technically Rust doesn’t allow circular import either. Modules in Rust is not the compilation unit. Go’s compilation unit is package, and that’s why circular package are not allowed. Rust compilation unit is actually crate, not module. And it is actually true that you can’t have circular crate in Rust.