Jauhar's Blog


Advice For Junior Software Engineers

2024/06/01

I’ve been working professionally as a software engineer for more than 5 years, and I want to share some advice on becoming a better software engineer. These are the things that I think would help most people start their journey in the software engineering world.

Learn From Multiple Sources

Most of the content I find on the web is not of good quality, even the paid content. If you read an article or a tutorial on the web, you should always assume that it might not be good quality content. As a beginner, you likely can’t tell whether what you read is good quality content or not because you don’t know what you don’t know. So, don’t be too attached to a single source of learning.

This situation happens because people who make a course, a bootcamp, videos, or articles have other agendas. Maybe they do it for the money. To make more money, they need to create content that is beginner-friendly, because most people are beginners. Beginner content has the biggest audience because everyone can consume it. If they cover a topic that is a little more advanced, of course, it will be consumable by fewer people. They often simplify things to help you understand, but the simplified version is technically incorrect. This kind of content can mislead you.

People may also make content for marketing. A company might create a tutorial on using its product, such as a certain database, programming language, or framework. Of course, they are biased. They tell you how to build stuff using their product, which translates into more revenue on their end. A cloud company may create a tutorial on building a website that uses their compute engine, their CDN, their CI/CD, their database, their object storage, their message queue, their DNS service, etc., when you don’t actually need them. They can show you how easy it is to do all that stuff, while you are actually burning your money if you use all of those services.

To make things worse, most content on the internet is not reviewed by anyone. You can write a blog post and say anything in it without any review. You can also publish YouTube videos without any review. This also leads to bad quality content. And on top of that, people make this worse by promoting bad quality content because they don’t know it is bad. I once saw a post comparing MySQL and PostgreSQL that was completely wrong; the writer didn’t seem to know what he was talking about. The worst part is that it got a lot of likes, was shared in a lot of places, and got positive comments. I tried to comment on that post, but nothing happened. I got some confirmation from other people, but it had very little impact, and the writer didn’t even respond.

Be aware that first impressions are important and that there is a thing called confirmation bias. If you are learning from a bootcamp, and they teach you about MongoDB, they might tell you it is better than a traditional database like MySQL or PostgreSQL. And the thing is, after you hear that for the first time, you will have a confirmation bias. The next time you learn about MongoDB, you will notice a lot of benchmarks and arguments that favor MongoDB over the traditional databases. If you read something that says otherwise, you just assume that it is not true and consider the whole article or the author a non-credible source of learning. Or, it can be the opposite. You might learn from a source that says MongoDB is bad because of this and that, and it makes you learn anything but MongoDB. The truth is, most of the time, it is hard to say which tech is better than the other. There are companies that are successful by choosing MySQL, there are companies that are successful by choosing PostgreSQL, and there are also companies that are successful by choosing MongoDB.

To reduce the effect of bad quality content, you need to consume a lot of content from different places. Read from multiple sources and see which one makes more sense. Try to be objective and not biased when learning a new thing. Every time you see someone claiming that you should do this, or this is good, or don’t use this, you should be skeptical about that claim and try to read other content that argues the opposite.

You can also try to learn from more credible sources like books. It is not easy to write a book, and there are reviewers and editors. You can’t just put anything in a book. If someone writes a book, they are in for a long journey. It might take years before the book can be published. Additionally, books also have reviews from people who buy them. Buying a book is not cheap, so we can say that their reviews might be more credible than a like button or comment on a blog post. It’s so easy to click the like button and comment on a post on social media or a blog. Your likes and comments almost have no value unless you are sacrificing something. A book review is more valuable since it’s harder to do.

Another reason to read from multiple sources is because different content focuses on different things. You might read about authentication in one piece of content that focuses more on how to use the libraries but only touches a little bit on the security aspect. Another source might go a little deeper into the security aspect but not how to implement it in practice. You should read both of them to gain more understanding about that topic.

Personally, I don’t like learning from paid resources other than books and universities because they’re rarely worth it, especially if you are learning something as common as web development — at least from an economic point of view. Most of the things you can get from paid sources, you can also get from free ones. Psychologically, buying a course might force you to learn because you have a feeling that you’ve paid for it and feel bad if you don’t watch that course. That’s fine. But for me, that’s not the case. If I buy a course, what will happen is that I’ll become very biased toward that course. Because I already spent my money on it, I don’t want that course to go to waste, and I want that course to be the source of truth. And that will prevent me from learning from other sources. Now, this doesn’t mean I never spend money to learn. I do spend a lot of money buying books, renting AWS servers to do experiments, etc. Most of the time, I spend that money because there is no other way I can learn other than from those resources. For example, I once wanted to learn about compilers, but unfortunately, the resources to learn about compilers are not that plentiful. Most of them also don’t go very deep. I ended up buying a book that goes very deep into optimizing compilers.

Pick A Language And Start Writing Code

What’s the next big thing in tech? Is it worth it to learn Rust now? Should I learn Python because AI and data are big things now? Those are the questions that are often asked by people who want to pursue software engineering. Unfortunately, I never have the answers. To answer these types of questions, people are just speculating, and there is a big chance that they are wrong. Predicting the future is extremely hard. Let me give you some examples.

The field of machine learning has been around for a long time. The LLMs that we have today, such as ChatGPT and Claude, build on techniques that were proposed a long time ago. Artificial neural networks have been studied since the 1940s. Back then, everything was theoretical and we didn’t have enough computing power to train a massive neural network. At that time, nobody would have said that machine learning would be a big thing. It seemed impossible to train a massive neural network back then. But, as we can see now, because we have much better computing technology, machine learning has taken a leap. On top of that, Python is now the de facto standard for working in machine learning. If you think about it, Python is not the best language for machine learning. First of all, it’s slow. It can call C code to mitigate that, but so can other languages like Java, JavaScript, Lua, etc. It’s unclear why Python won in the field of AI. Nobody could have predicted this at that time.

It was also hard to predict that JavaScript would become the standard of web technology. If you think about it, JavaScript is not particularly well suited to what we need on the web. First of all, JavaScript was not designed well and there is a lot of weird stuff in JavaScript. On top of that, it is hard to make JavaScript fast because of the nature of the language. V8 has done a pretty good job making JavaScript fast by doing JIT compilation, but it’s extremely complicated. As a result, you need to ship the whole V8 engine to be able to run JavaScript efficiently. If I were to add a programming language for the web today, I wouldn’t choose JavaScript.

My first programming language was ActionScript 2.0, which was used to make games and animations in Macromedia Flash 8. You might never have heard of it, and yes, it is already dead; nobody uses it anymore and nobody supports it anymore. I learned it not because I wanted to learn the tech that was popular at that time or would be popular in the future. I learned it because there was a book about making games using Macromedia Flash back then that caught my eye. I was in high school and I liked playing games, but I didn’t know how to make a game. Making a game was also not considered a cool thing back then, but I thought it was cool. I didn’t know anybody who knew how to make games, and I didn’t even know that games were made using code. So, I bought that book, started following the instructions, and made my first game. One thing led to another: I learned about sorting algorithms because I wanted to show the users sorted by their score, I learned about sockets because I wanted to make an online game that could sync its state remotely, I learned MySQL to store user scores persistently, I learned how to display a 3D object on a 2D monitor, etc. You see, in my journey, one thing led to another. I started my journey by just wanting to make a game. I didn’t even want to sell it or anything. I just thought that it was cool. From that starting point, I learned about data structures and algorithms, networking, databases, transformation matrices, etc. If I had overthought what I should learn back then, I think I would have been too distracted and wouldn’t have learned as much as I did.

Programming languages used in software engineering are quite similar to each other. Usually, learning one programming language deeply will make it easier to learn another one. If you know JavaScript deeply, you can probably pick up other programming languages like Python, Dart, Ruby, Lua, etc. If you know C++ deeply, I think it might be easy for you to understand most programming languages out there. I think which first programming language you choose does matter in the short run, but it won’t matter that much in the long run. Look at me: I started with ActionScript, but now I can write JavaScript, Python, Java, Kotlin, Rust, Go, Lua, etc.

So, how do you choose your first programming language? Well, my suggestion is to just choose a language where you can get the most help. For example, if you are pursuing web frontend development, just start with either JavaScript or TypeScript. Because that’s what people are using. They don’t have much choice either because browsers nowadays only support JavaScript. If you’re pursuing web development in general, I also recommend JavaScript because you’re going to need to learn it to build the web frontend anyway. If you’re pursuing web backend development, just choose any language you think is easier. Because at this point, what matters is to start coding. In the long run, you can always learn a new language easily. If you’re pursuing a specific topic like AI, blockchain, IoT, or operating systems, try to choose a language that most people use in that particular area. This is because your chance to get help is bigger. If you’re pursuing AI using Go, you might not get as much help as when you’re pursuing AI with Python because most people working in AI use Python, not Go.

Then, the next question is, how do you know which particular topic to pursue? Is it AI, blockchain, web development, game development, or anything else? Well, that’s something that should come from yourself. But, let me tell my story. I started programming by making games because I liked playing games. I really enjoyed it. In contrast, nowadays, I don’t like to play games that much. I don’t build many games either. Now, I like competitive programming, compilers, and databases. How did I get here? Because, like I said, one thing leads to another. One day I found a book about data structures, and it blew my mind. It was like a whole other universe opened. Before, I had only written code like for loops, do this when one character touches another character, or go to this screen when the player’s HP is zero, etc. After I read this book, I realized there are so many tricks we can do on the computer. I started to solve programming challenges that looked kind of like LeetCode, but harder! I remember spending all my time after school until midnight solving problems, asking for clues in a Facebook community, reading articles about algorithms, etc., until one day, I participated in the National Science Olympiad doing competitive programming. Back when I was spending my time writing games, I didn’t even know that competitive programming was a thing. So, if you don’t know what you like, just start with anything, and try to find what you really like along the way.

Don’t Focus Too Much On Clean Code

First of all, I’m not telling you to avoid learning about clean code. I’m telling you not to focus too much on it. Now that we’re clear, let’s continue.

You may have seen a lot of content talking about clean code and even watched or read such content. Do you know why there are so many people talking about clean code? Because people can understand it. Clean code is beginner-friendly. No matter what technology you are currently learning, whether you are an experienced engineer or a junior engineer, if you’ve written just a few lines of code, you can relate to that topic and understand it. This contrasts with topics like distributed systems. To follow that kind of topic, you need to know more things beforehand, and thus, only a smaller number of people want to read about it. Because of this, there are many people out there making content about clean code. Because they can. They don’t have to be very experienced to make claims about clean code. And clean code is also very subjective. You can’t be wrong when talking about it. That’s why so many people can talk about it. Even if you are a beginner, you can talk about it.

Because there are so many people talking about clean code, it gives us the perception that it is very important. Well, it is important, but not the most important. I think nowadays, people prioritize clean code too much compared to other things that matter more.

Most content I read about clean code is shallow. They tell you what you should do to have “clean code,” but they don’t really tell you why, or what kind of code is considered clean. They also don’t tell you the downsides. Sometimes, structuring your code in a certain way can significantly reduce your performance. Making many abstractions can also hide what really happens in your code and make it hard to debug performance issues. They usually use examples that are very abstract so they cannot be wrong, but these examples are useless most of the time. Or they use examples that are tailor-made to prove their point. But in reality, things are more complicated than that, and those clean code principles don’t matter that much.

The problem is, if you are a beginner, you don’t know what matters and what doesn’t. For example, you read about the single responsibility principle. You want your functions and modules to have a single responsibility, while it is actually unclear what is considered a responsibility. You start questioning whether you should split your function or not. Maybe you have a function that needs to read data, update it if the data already exists, but insert a new entry if it doesn’t. You start to question whether you should split it into three functions or just do it in one function. But you know what? It doesn’t matter which one you do. The point of the single responsibility principle is to make your code more readable and easy to refactor. Whether you split it into three functions or not doesn’t make your code any less readable or any less maintainable.
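
To make the read/update/insert question concrete, here is a minimal sketch of both shapes in Python. The in-memory `store` dictionary and every function name here are hypothetical, invented only for this illustration.

```python
def save_score(store: dict, name: str, score: int) -> str:
    """One function: read, then update or insert."""
    if name in store:          # read: does the entry already exist?
        store[name] = score    # update the existing entry
        return "updated"
    store[name] = score        # insert a new entry
    return "inserted"


# The same behavior split into three small functions plus a coordinator.
def exists(store: dict, name: str) -> bool:
    return name in store


def update_score(store: dict, name: str, score: int) -> None:
    store[name] = score


def insert_score(store: dict, name: str, score: int) -> None:
    store[name] = score


def save_score_split(store: dict, name: str, score: int) -> str:
    if exists(store, name):
        update_score(store, name, score)
        return "updated"
    insert_score(store, name, score)
    return "inserted"
```

Both versions behave the same from the caller's point of view, and a teammate could read either one without trouble, which is roughly the point of the paragraph above.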

My suggestion is, don’t focus too much on clean code or things like clean architecture. Instead, focus on the goal you want to achieve. The whole point of clean code is to have maintainable code. The next time you are confused about some choice, just ask yourself: Which option makes my code more maintainable? If I do it this way, will my team be able to understand it? If I do it this way, can I test it? If I do it this way, will I be able to add more functionality to it later?

There is nothing wrong with the concept of clean code. Of course, you want your code to be maintainable, and that’s why you learn about clean code. The thing is, if you are a beginner, you don’t have enough experience to justify how far you should apply the principles of clean code. As a result, you get stuck thinking about whether your code is clean enough or not, when it doesn’t matter that much. If you are wrong, your team will tell you during code review. And the same code might be seen as clean by one person but not by another because it’s subjective. Different companies also have their own conventions about their code. So, don’t focus too much on clean code; you will get it once you have enough experience.

One thing that you should remember when reading about clean code is that these are just pieces of advice, not rules. And there is a reason for that advice. Find the reason first, then justify the advice. For example, you might read somewhere that your function should be small enough to fit on one screen and you should not indent your code more than three levels. If you take this advice literally, you might run into problems. If you use a programming language like Rust, just implementing a trait already takes three indents, so how are you supposed to write an if statement? If you are writing a matrix multiplication, the most straightforward way is to nest three for loops, which already adds three more indentations. Instead, you should ask why your function can’t have more than three indentations. The reason is that people tend to lose track when there is too much nested logic. But it’s not always the case. Matrix multiplication is one example. You can have three indentations to write matrix multiplication and your code is already super readable because people already know what it’s doing. The next time you have a lot of indentation in your code, the first thing you should do is not to refactor them into a smaller number of indentations, but ask yourself whether people can understand it or not. Again, it’s very subjective. If you are working in a field where that nested logic is common, the answer might be yes, people can understand it. But if you are working in another field where that nested logic is rare, you might want to split it into multiple functions.
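
As an illustration of the matrix multiplication case, here is a small sketch in Python using plain lists of rows. The function name `matmul` and the list-of-lists representation are assumptions made for this example.

```python
def matmul(a, b):
    """Multiply matrix a (n x m) by matrix b (m x p), both given as lists of rows."""
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("columns of a must match rows of b")
    result = [[0] * p for _ in range(n)]
    for i in range(n):              # each row of a
        for j in range(p):          # each column of b
            for k in range(m):      # dot product of row i and column j
                result[i][j] += a[i][k] * b[k][j]
    return result
```

The three nested loops sit inside a function body, so the innermost line is already four levels deep, yet most readers who know what matrix multiplication is will follow it at a glance.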

Keep Asking Why

Since most of the content you read on the internet is of poor quality, every time someone makes a claim, you should always ask “why.” If someone tells you that you should use Kafka for a specific task, you should ask yourself why. Try to come up with alternatives, and ask why you can’t just do it differently. What makes their way better than this? Don’t just accept an answer like “Because Kafka is designed for this workload.” Stay skeptical. If that’s the reason, how does Kafka achieve that? Why can’t MySQL be used for that workload?

I once asked about how to trigger a periodic action in a web application. There were two servers, and every midnight, the first server wanted to trigger the second server to start fetching data from somewhere. Their solution was to send a message through Kafka at midnight, and the subscriber would start the process when it received this message. I thought there must be some limitations that required them to use Kafka for this. So, I kept asking if they had this or that constraint. I kept asking, but I just couldn’t justify why they needed Kafka for this. At the end of the discussion, I understood that they used it because they had read somewhere that Kafka is used to trigger background jobs. They didn’t know the reason why people use Kafka for that. As a result, they used Kafka when they could have just made a simple HTTP request to trigger the process, or even just sent a TCP message. Now, instead of a simple solution, they need to deploy Kafka and, depending on the Kafka version, ZooKeeper as well. They spend too much money on such a simple job. If they had known why Kafka is used to spawn background tasks, they wouldn’t have gone that route. Kafka is used because background tasks usually run for a long time. In the afternoon, the producers may produce more tasks than the handler can handle, and thus we should put them in a queue to not block the producer. Later at night, since the number of new tasks is smaller, the consumer can complete all the tasks. But if you have only one task, there’s no point in doing this. Just run your task directly.
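
For comparison, here is a rough sketch of the "simple HTTP request" alternative, using only the Python standard library. The `/start-fetch` path, the `jobs_started` list, and all function names are hypothetical; scheduling the call at midnight (for example with cron) is left out.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Stand-in for the real nightly job: we only record that it was triggered.
jobs_started = []


class TriggerHandler(BaseHTTPRequestHandler):
    """Second server: starts the fetch when it receives POST /start-fetch."""

    def do_POST(self):
        if self.path != "/start-fetch":
            self.send_error(404)
            return
        jobs_started.append("fetch")  # a real server would start fetching here
        self.send_response(202)       # 202 Accepted: the job has been started
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def trigger(url: str) -> int:
    """First server: send the trigger and return the HTTP status code."""
    request = urllib.request.Request(url, data=b"", method="POST")
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


def demo() -> int:
    """Run both sides in one process on a free local port."""
    server = HTTPServer(("127.0.0.1", 0), TriggerHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return trigger(f"http://127.0.0.1:{port}/start-fetch")
    finally:
        server.shutdown()
        server.server_close()
```

With a single trigger per night there is nothing to buffer, so there is no queue here at all; a queue only starts to pay off when producers can outpace the consumer.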

Software engineers like to use buzzwords nowadays. Terms are rarely standardized. Even if they are standardized, people don’t always use the standard meaning. People like to use different terms when talking about the same idea. Additionally, they also like to use one term to describe different ideas in different contexts. One example is the term we use to describe the HTTP path. People call it by different names like endpoint, path, route, API, etc. They mean the same thing, but we use different terms for it and we don’t seem to agree on the standard term. Another example is the term “strong consistency.” People in databases often use this term. In an academic context, people often describe strong consistency as linearizable consistency. But in practice, you often see a bunch of services claiming that they support “strong consistency,” while their definition of strong consistency is read-your-writes consistency. AWS S3 is one of them. They claim that S3 is strongly consistent, but it’s actually read-your-writes consistent. They technically aren’t wrong because the term “strong consistency” doesn’t have a standard meaning. But it’s very misleading. Another example is the terms clean architecture, onion architecture, hexagonal architecture, and layered architecture. They mean the same thing. The idea is to split your application into layers where each layer has its own concern. The outer layer depends on the inner layer. That’s the whole idea, but somehow they have many names for this concept. Be critical when reading these kinds of terms. Ask yourself why you should use this or that, and you’ll often conclude that they actually have the same reasoning—they’re talking about the same thing.

Often, when you learn a new concept, there are many details that are not clear to you. For example, when you learn about logging, you may have no idea how much you should log. Should you add logs for every layer in your app? Should you log all the requests coming to your app? Should you add logs here and there? It’s unclear. To be able to answer those questions, try to ask yourself why you log in the first place. What is the overall intention of logging? Once you can answer that question, you can use that answer to answer your other questions. The point of logging is for debugging. So the ultimate goal of your logging strategy should be to make it easy for you to debug. Now, think about what might go wrong. If something goes wrong, what would you want to know next? That’s the answer to your question. You should log in the places where having that log would help you debug.
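
Here is a small sketch of that idea using Python's standard `logging` module. The order-parsing scenario, the logger name `orders`, and the function `parse_quantity` are made up for illustration.

```python
import logging
from typing import Optional

logger = logging.getLogger("orders")


def parse_quantity(order_id: str, raw: str) -> Optional[int]:
    """Parse a quantity field, logging what we would want to know if it fails."""
    try:
        quantity = int(raw)
    except ValueError:
        # If this goes wrong, the next questions are "which order?" and
        # "what was the input?", so both go into the log line.
        logger.warning("order %s: quantity %r is not an integer", order_id, raw)
        return None
    if quantity <= 0:
        logger.warning("order %s: quantity %d must be positive", order_id, quantity)
        return None
    # The happy path is not logged here; it would add noise without helping debugging.
    return quantity
```

Whether the happy path deserves a log line depends on what you expect to debug later, so treat the choice above as one possible answer rather than a rule.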

Do Experiments

Experiments are very important to strengthen your understanding. Remember that most of the content you find on the web is not of good quality. Doing experiments is a way to verify the information you just learned. To check whether a claim is legitimate or not, sometimes you just need to prove it yourself. For example, you read that using UUID makes your database slower compared to ULID. Well, this is just a claim, and they might give you benchmark results, but how do you know they’re telling the truth? Well, you can test it yourself.
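
As a starting point for such an experiment, here is a minimal sketch using Python's built-in `sqlite3` module. Several things are assumptions: `time_ordered_key` is not a real ULID, only a timestamp prefix followed by random bytes that imitates its ordering; the table layout and row count are arbitrary; and an in-memory database hides most disk effects, so a serious test should use the database you actually run, on disk, with far more rows.

```python
import os
import sqlite3
import time
import uuid


def random_key() -> str:
    """Fully random key, like UUIDv4 (32 hex characters)."""
    return uuid.uuid4().hex


def time_ordered_key() -> str:
    """ULID-like key (NOT a real ULID): 48-bit millisecond timestamp + 80 random bits."""
    millis = time.time_ns() // 1_000_000
    return f"{millis:012x}{os.urandom(10).hex()}"


def time_inserts(make_key, rows: int = 20_000) -> float:
    """Insert `rows` rows keyed by make_key() and return the elapsed seconds."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, payload TEXT)")
    data = [(make_key(), "x" * 100) for _ in range(rows)]  # keys generated up front
    start = time.perf_counter()
    with conn:  # one transaction around all inserts
        conn.executemany("INSERT INTO items VALUES (?, ?)", data)
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed


if __name__ == "__main__":
    print("random keys:      ", time_inserts(random_key))
    print("time-ordered keys:", time_inserts(time_ordered_key))
```

Run it several times and on different machines before drawing any conclusion; a single run of a benchmark this small is easily dominated by noise.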

Reproducibility is important to validate a claim; if you can’t reproduce their results, then their claim should be questioned. I’m a little bit sad because people often neglect this nowadays. They make claims that can’t be verified because their code, their data, and their machines are not accessible.

Another reason to do experiments is to make sure that you understand what you read correctly. For example, you might read that you should do full-text search using Elasticsearch instead of SQLite because Elasticsearch can scale well while SQLite can’t. But “scale well” is a very ambiguous term. You can try experimenting with it yourself. If you have 100 documents in your database, your SQLite might perform better compared to Elasticsearch. You’ll start to understand that Elasticsearch matters only when you have really big data that can’t be handled by one machine.

Master Your Tools

To learn faster, you need to iterate faster. In basketball, people who are very good at shooting the ball into the ring have probably already done it thousands of times. They’ve failed more times than you have tried. Now, if you practice for one hour using one ball, you might not perform as well as someone who practices with ten balls because they shoot more than you. You spend time picking up the ball, while they do it less. Some people might even hire others to pick up their balls so they can focus on shooting without wasting their time retrieving them.

Software engineering is the same thing. What you do most of the time is writing code, running it, finding problems, trying to debug it, fixing it, and running it again, etc. It’s a cycle. The more you do this, the better you become. That’s why it’s critical for you to master your tools. You should be able to use your code editor fluently. Navigation, editing, searching, etc., should be very natural to you. The same goes for other tasks like debugging, running, testing, etc. This is why we have many tools to help us iterate faster. In web development frameworks, we have hot reloading so we don’t have to recompile our code in order to run it. We also have developer tools in Chrome to debug our code. We have tools like Postman to test our HTTP endpoints, etc.

Even though I don’t put this advice at the top, please master your tools. Nowadays, people don’t even really understand file systems—what a directory is, what a file is, how to find files, etc. Programming languages that we use today rely heavily on these concepts. To understand their module systems, you need to understand what files and directories are. Sometimes, just to install a programming language compiler, you need to understand environment variables. Your OS is part of the tools you use to program; it’s the basic one, so please be comfortable working with it.

Read Other People’s Code

Reading other people’s code is very helpful for improving your programming skills. Sometimes, you can program things, but you’re not sure if you are doing it correctly, effectively, and efficiently. This is where reading other people’s code can help you. Reading other people’s code exposes you to how real software looks. Just because your program looks fine doesn’t mean it is good enough. In industry, it is important to make your program maintainable. Often, what you learn in bootcamps or tutorials is very far from what people actually do. In the real world, things get complicated. Reading other people’s code is really helpful to see these differences.

Sometimes, you can also find new ways of programming by reading other people’s code. I remember back then I read the source code of CodeIgniter. It opened up a bunch of new things that I had never learned or known before. I started to see the patterns they used. I even realized that there were a bunch of features in PHP that I never knew existed. Sometimes, they also teach you new things. You may read code that you don’t understand and encounter words that you’ve never learned before. For example, you may read the word “idempotent” in their source code. Maybe it’s in a function name, variable name, a comment, or something else. You might not know what this word means, or at least what it means in the context of software. You can start looking up that keyword, and now you’ve learned a new thing.
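
For readers who have not met the word yet, here is a tiny Python sketch of what "idempotent" usually means in software: doing the operation twice leaves you in the same state as doing it once. The function names are hypothetical.

```python
def add_tag(tags: set, tag: str) -> set:
    """Idempotent: adding the same tag again changes nothing."""
    return tags | {tag}


def append_tag(tags: list, tag: str) -> list:
    """Not idempotent: every call adds another copy of the tag."""
    return tags + [tag]
```

The distinction matters when an operation might be retried, because a repeated `add_tag` is harmless while a repeated `append_tag` leaves a duplicate behind.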

Competitive Programming

I put this advice at the very end because I think it won’t be useful for most people. But I think the thing that improved my programming skills significantly in a short amount of time was competitive programming. Competitive programming strengthened my basics because it covers very broad topics like sorting, statistics, hashing, data structures, etc. Not only does it have broad topics, but it forces us to understand them deeply. You can’t just learn sorting to solve competitive programming problems; sometimes you need to modify your sorting algorithm. It makes you understand that sorting is not as simple as it seems—you can do various tricks with it. You start to gain some intuition about these topics. Now, the next time I read about B-trees in databases, I immediately understand what they mean, and reading about them is very easy.
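
One classic example of modifying a sorting algorithm is counting inversions (pairs that are out of order) by adding a single line of bookkeeping to merge sort. The sketch below is a common textbook version of that trick in Python, not something taken from a specific contest problem.

```python
def count_inversions(values):
    """Return (sorted_values, inversion_count) using a modified merge sort."""
    if len(values) <= 1:
        return list(values), 0
    mid = len(values) // 2
    left, left_count = count_inversions(values[:mid])
    right, right_count = count_inversions(values[mid:])

    merged = []
    count = left_count + right_count
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every element still waiting in left,
            # so it forms an inversion with each of them.
            merged.append(right[j])
            j += 1
            count += len(left) - i
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, count
```

For example, `count_inversions([3, 1, 2])` returns `([1, 2, 3], 2)`, because the pairs (3, 1) and (3, 2) are out of order.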

I think competitive programming is almost like a gym. I learn some basic concepts that don’t seem to mean anything and don’t have any particular application, but they actually strengthen my computational thinking. If you often move your body and train it to be more agile, you might not automatically be good at martial arts. But when you do learn martial arts, you will improve faster than people who rarely exercise their body. Competitive programming is kind of like that. Because I’m very used to thinking about data structures, algorithms, proofs, etc., reading about complex things like the Raft consensus algorithm is very easy for me. I know what it does, why it does it that way, what its limitations are, and how to modify it pretty quickly.

Thinking is not easy. Sometimes we are just too lazy to think about things, just like sometimes we are too lazy to do activities like exercise or cooking. In my opinion, this is because we are not used to doing these things. If we rarely cook, or have never cooked before, we might be too lazy to start because we think about how messy and exhausting it will be. However, if we have done something thousands of times, it won’t be as exhausting. It will feel like a simple task because everything becomes easier once we’ve done it many times. Doing a lot of competitive programming helps us get used to thinking computationally. The next time we need to think about something, it won’t be as hard because we are used to it. As a result, we become more critical when facing problems in software engineering.