Beginner guide for novice researchers
Why am I writing this?
I was going to officially be a mentor to 2 students starting next week, and I was trying to figure out what would help them right from the start. Once I came up with some things, I realised that it is perhaps better to structure them properly and maybe make them useful for anyone (and for future students I may work with).
I believe that the advice might go down well with concrete examples. Personally, I tend to remember things better if they have some accompanying fun analogies to go with the serious explanations. So for each “technique”, I’ll try to explain it seriously and with a corresponding analogy from a popular movie.
Selecting an advisor
Selecting an advisor is probably the hardest and most important decision you will make in your academic career. You are basically trying to find the specific person that will not only introduce you to a specific research area but also essentially train you to do independent research for the entire duration of your apprenticeship (Master’s or PhD)! You are going to be working with this person on a regular (almost daily) basis for the next 4 to 5 years of your life. So, if you don’t get along with this person, it really doesn’t matter if that person is objectively the best researcher in your chosen field.
Movie analogy: Let’s consider some of the final scenes of Indiana Jones and the Last Crusade. (Spoiler Alert duh!) The templar knight asks Indy and the Nazi dude to choose the correct cup to find the Holy Grail. The Nazi dude chooses poorly and gets turned into a skeleton instantly. Similarly, if you choose poorly when you select your advisor, you will also turn into a skeleton albeit not that instantly.
Note: If you are a student who is already working with me, ha, gotcha!:)
Presenting your research
If you are starting out, then you may think that doing the research is the only important part of the job – doing good research is all that matters! I’d say that makes you only half right. Doing good research is one important part of the job but communicating about your research is just as important. You may have solved the biggest research problem of your time, but if you do not communicate it well and present it in a way that people can easily understand then your research won’t have as much impact as it should.
So, the question is how do you actually present your research then? This is something I was first taught by Margo Seltzer and further emphasised by Jonathan Mace, Antoine Kaufmann, and Ivan Beschastnikh! You should think of every research paper as a story. A story that has a villain (the problem you are specifically trying to solve) and a hero (the solution/system/approach that you are proposing). If you follow this analogy, you can see that for a compelling story that readers will buy, you have to answer the following:
- Why is the villain particularly bad? : Why is the problem important?
- Why is the villain hard to defeat? : What are the challenges that have prevented this problem from being solved?
- Why is the hero capable of beating this treacherous villain? : What are the key insights and ideas that overcome the challenges?
- How does the hero beat the villain? : What evidence shows that your proposed solution has solved this problem?
Normally, you can attribute all the paper rejections you get to a failure to answer one or more of the above questions. You may also get a rejection saying that your solution is not new and that this was done before. I view this as a failure to adequately set up the villain. On top of that, reviewers and their reviews tend to be very subjective and biased by their preconceived notions about things.
Ok, so at this point we have this storyline-based approach to viewing our research, but how do we actually use it in our day-to-day life? Let’s find out.
Fairytales
Well, the first step is the elevator pitch! An elevator pitch is essentially a 30-second to 1-minute blurb that you would say about your specific research project. It’s often hard to figure out what you should really say in that small amount of time.
One way that I was taught (once again by Margo!) is to think of our research as a short fairytale. In my experience, having a fairytale for your project maps almost 1:1 to having an elevator pitch!
As practice, in Margo’s seminar, we often used to write fairytale versions of the papers we were assigned to read. It helped us understand and distill the key ideas of each paper, while also giving us practice in summarizing other research works.
Here is an example structure of a fairytale:
Once upon a time, <What is the problem>.
<Why the problem is important>.
<What are the challenges>.
To solve these challenges, we propose a new system \sys.
\sys solves the challenges with <insert key insights>.
With \sys, we see <insert best evidence>.
Because of \sys, everyone now lives happily ever after.
Storyboarding
The next place where I think the research-as-a-story idea helps is when you are writing the first draft of the full research paper for your project.
By the time you have to write your first draft, you will have read a lot of research papers and will have a general sense of what kind of content needs to go into the paper. Even though you may know what content needs to go in, you may be unsure of how to get started, how to write, and what to write. Even to this day, I often struggle with this to an extent, but I usually find it easier to manage this struggle with the idea of storyboarding.
Storyboarding in the context of a research paper, in my opinion, refers to writing the paper in a hierarchical fashion. You start at the paper level and simply write the section headers for each section you think should be in the paper. Next, for each section, you write the high-level takeaway you want the reader to take away and how it connects to the takeaways of the prior sections. These section headers with the key takeaways essentially form the storyboard of your paper and show how the story will flow at a high level through the paper.
Oftentimes, I take this approach to writing individual sections as well. For each section, I know what the key takeaway should be at the end of the section and what I can assume the reader knows at the start of the section. What I ideally want to achieve is to get from what the reader knows at the start to the key takeaways at the end through a linear storyline. To do this, I break down the section into individual paragraphs and then write the top-level sentence for each paragraph. This forms the storyboard of the section.
Once you have the storyboard (or the structure) in place, it is easier to add more content within the confines of the storyboard than it is to start from scratch.
Movie reference: Let’s take one of my absolute favourite movies of all time: Ocean’s 11. Nearly half of the movie is about planning for the heist and putting all the pieces in the right positions for executing the heist. If you think about writing the paper as executing a heist, then storyboarding is simply the planning part of the heist. Planning in advance usually results in much faster and smoother execution.
Know Your Audience
When you are presenting your work to a person or a group of people, you should try to tailor your presentation to that group as much as possible. At MPI-SWS, we often have to do institute-wide presentations! A presentation that you made especially for a conference targeting your community may not be the best version to present to a general CS audience. A good rule of thumb is that the wider the scope of the audience, the more context and background you would have to provide in your presentation. Naturally, this often comes at the cost of sacrificing specific detail from the rest of your presentation to stay within the confines of your time limit.
Meta Movie reference: In the latest Superman movie (2025), James Gunn specifically chose not to adapt Superman’s pod landing scene or any scenes showing Superman’s bond with his adoptive parents. This is because these scenes had already been adapted in previous movies over multiple decades, and almost everyone knows Superman and his background. By leveraging the assumption that most watchers would know this, James Gunn could instead focus on some other details about Superman’s relationships over the course of the movie.
Doing the research
There are people better equipped than me to explain how to do good research in general or how to do research in a specific area. I’ll only talk about two techniques here that I think are critical but also widely applicable.
The Scientific Method: Fake Eval Edition
At the end of the day, we are science researchers (or at least claim to be), so it is imperative that we actually follow the scientific method (or the scientific process). In a nutshell, as part of the scientific method, we start with a research question, formulate a hypothesis, conduct experiments designed to test the hypothesis, and then analyze the experimental data to either validate the hypothesis or to disprove the hypothesis.
The scientific method is extremely useful right before you even start doing the evaluation experiments. The scientific method simply provides a strict ordering on the tasks you would undertake when doing experiments. The ordering is usually as follows:
- Step 1: Figuring out what is the exact experimental question you want to answer. Eg: Does my new system solve the problem?
- Step 2: Deciding the experimental setup and the data we will be collecting to answer the question. Eg: To see if our system solves the problem, we will execute a wide variety of workloads with our new system and collect a specific metric (eg: throughput or p99 latency). As a baseline, we will also run the same variety of workloads on another system that does not handle this problem and collect the same metric.
- Step 3: Formulating the hypothesis. This is essentially you saying what you expect to see from the experiment! Eg: We expect that our new system solves the problem across the variety of workloads, improving the collected metric in comparison to our baseline.
- Step 4: Executing the experiments and collecting the data!
- Step 5: Testing the hypothesis. You now test the hypothesis you formulated in step 3 by using the data you collected in step 4 from running the experiments you designed in step 2.
You may notice that Steps 1-3 are completely independent of ever running the experiments. They will remain the same regardless of whether you have the results or not.
And this is where the Fake Eval kicks in. Instead of waiting for the results to come in before writing the section, you can pretty much write your full evaluation section by using all the artifacts you created in Steps 1 through 3. Moreover, you can write a full evaluation by using the predictions based on your hypothesis to generate fake graphs (the ones you expect to see) even before you have actually run the experiments. This also helps you write, in advance, the scripts that you would need to parse and analyze the collected data.
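To make the Fake Eval a bit more concrete, here is a minimal sketch in Python of what "writing the analysis scripts in advance" could look like. Everything in it is an assumption for illustration: the system names ("baseline", "newsys"), the workload labels, the CSV layout, and the predicted 1.5x improvement are all made up. The idea is that the analysis code (Step 5) is written and exercised against data generated from the hypothesis (Step 3), so that it can later be pointed at the real results unchanged.

```python
import csv
import io
import random
import statistics

# Hypothetical names throughout: "newsys", "baseline", and the workload labels
# are placeholders, and the 1.5x figure is an assumed prediction, not a result.
HYPOTHESIZED_SPEEDUP = 1.5
WORKLOADS = ["read-heavy", "write-heavy", "mixed"]


def fake_results_csv(seed=0, runs=5):
    """Generate the CSV we *expect* the real experiments (Step 4) to produce."""
    rng = random.Random(seed)
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["system", "workload", "throughput"])
    for workload in WORKLOADS:
        base = rng.uniform(800, 1200)  # made-up baseline throughput (ops/s)
        for _ in range(runs):
            noise = rng.uniform(0.95, 1.05)
            writer.writerow(["baseline", workload, round(base * noise, 1)])
            noise = rng.uniform(0.95, 1.05)
            predicted = base * HYPOTHESIZED_SPEEDUP * noise
            writer.writerow(["newsys", workload, round(predicted, 1)])
    return out.getvalue()


def mean_speedups(csv_text):
    """Step 5 analysis: mean newsys throughput / mean baseline throughput."""
    samples = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["system"], row["workload"])
        samples.setdefault(key, []).append(float(row["throughput"]))
    workloads = sorted({workload for (_, workload) in samples})
    return {
        w: statistics.mean(samples[("newsys", w)])
        / statistics.mean(samples[("baseline", w)])
        for w in workloads
    }


def hypothesis_holds(speedups, threshold=1.0):
    """Step 3 hypothesis: the new system improves the metric on every workload."""
    return all(s > threshold for s in speedups.values())


if __name__ == "__main__":
    speedups = mean_speedups(fake_results_csv())
    for workload, s in speedups.items():
        print(f"{workload}: {s:.2f}x")
    print("hypothesis holds:", hypothesis_holds(speedups))
```

Once the real experiments have run, only `fake_results_csv` needs to be swapped for the real results file; `mean_speedups` and `hypothesis_holds` stay the same, and the real data may well disprove the hypothesis. Fake data like this should of course never end up in the submitted paper.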
Reproducible Experiments?
During the course of working on a research project, you will inevitably have to conduct experiments to collect the necessary data and then parse, clean, and convert the data into graphs that you can use in your evaluation sections. This is a very critical and common task for any research project. Most of the time, you will have to re-run these experiments multiple times because you had to make some bug fixes, tweak some parameters, or run them on a different deployment environment. If you don’t set things up so that you can easily re-run these experiments, you will have to set everything up from scratch each time. This is not only a big source of frustration but also a major time sink, which can be problematic when you are working against deadlines.
Movie Analogy: Let’s consider the premise of Groundhog Day where the main character is stuck in a loop having to relive the same day over and over again and how it is an utterly frustrating experience for the main character. Well, re-running your experiments (especially if they are distributed) can be a similar frustrating experience if you have to start from scratch over and over again! Make your experiments (and scripts for generating the necessary graphs) as push-button as possible.
Added Bonus: Your experiments being easily re-runnable already sets you up well for the artifact evaluation stage of the submission process. Moreover, research is built on prior research. Having your work be easily re-runnable also significantly helps the people in your community who would want to build upon your research.
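As a rough illustration of what "push-button" could mean in practice, here is a small Python sketch of a single entry point that runs a whole experiment matrix and stores the results next to the exact configuration that produced them. All names and values are hypothetical: `CONFIG`, `run_once`, and the system and workload labels are placeholders, and `run_once` just returns a made-up number where a real script would launch the actual system.

```python
import csv
import itertools
import json
import random
import tempfile
from pathlib import Path

# Hypothetical experiment matrix: every name and value here is a placeholder.
CONFIG = {
    "systems": ["baseline", "newsys"],
    "workloads": ["read-heavy", "write-heavy"],
    "repetitions": 3,
    "seed": 42,
}


def run_once(system, workload, rng):
    """Stand-in for launching one real run; returns a made-up throughput."""
    return round(rng.uniform(900, 1100), 1)


def run_all(config, out_dir):
    """Run the whole matrix and write config.json and results.csv to out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Save the exact configuration next to the data it produced.
    (out_dir / "config.json").write_text(json.dumps(config, indent=2))
    rng = random.Random(config["seed"])
    results_path = out_dir / "results.csv"
    with results_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["system", "workload", "repetition", "throughput"])
        matrix = itertools.product(config["systems"], config["workloads"])
        for system, workload in matrix:
            for rep in range(config["repetitions"]):
                value = run_once(system, workload, rng)
                writer.writerow([system, workload, rep, value])
    return results_path


if __name__ == "__main__":
    path = run_all(CONFIG, tempfile.mkdtemp(prefix="experiment-"))
    print("results written to", path)
```

The design choice here is that changing a parameter means editing one dictionary and re-running one command, rather than remembering a sequence of manual steps. Real experiments will not be deterministic the way this placeholder is, but keeping the configuration alongside the output still makes it clear which settings produced which data.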
Writing a Thesis
Section coming before GTA VI release.