More and more scientists are talking about open science and preregistering their research designs. The main goals are to safeguard the quality of research and to ensure that knowledge is shared. My own experience with preregistration has been positive. However, it can often seem daunting and binding. Fellow PhD candidates ask me: ‘What if I make a mistake in my preregistration?’ This question reflects two misconceptions: (1) the idea that you are not allowed to make mistakes and (2) the idea that preregistration is yet another millstone round our necks. In this blog I will discuss both misconceptions and share my experience with preregistration. I will focus especially on the pre-analysis plan, since writing one has helped me a lot throughout my career.
We talked a bit about this in my previous blog: Open science and pre-registration: let’s unite instead of fight. To summarize it quickly: preregistration is a step you take before collecting your data. Boom! That is it. It is really nothing more than what you should already be doing before data collection: you specify your research question, hypotheses, dependent variable and conditions. Why is this important? It helps you think things through before you collect your data. You can also share it with peers, who can give you useful feedback while there is still time to act on it. A more practical reason to preregister is that it becomes clearer (to readers and to yourself) what you set out to do (confirming) and what you discovered along the way (exploring). Both are important, but it is equally important to be clear and honest about which findings were based on a hypothesis (confirmation) and which were found by accident (exploration).
Pre-analysis data plan
There is one small box in your preregistration that asks you to specify the analyses: which analyses will you conduct to examine the main question or hypothesis? At the beginning of my PhD this was a hard question for me to answer. It is not that I considered thinking about it beforehand unnecessary, but because it was such a struggle I did not see the merit in it. However, I forced myself to do it anyway, and I am glad I did. Why? I will explain that after showing you how I fill in my pre-analysis data plan.
I start with my hypotheses. I write three versions of each hypothesis: one as presented in my paper (academic), one in plain terms (layman) and one in code (analytical). It is important that you take the time to master all three versions. Next, I focus on the analytical version, which I write out as: the independent variable (IV) influences the dependent variable (DV). I then specify what type of variable the DV is: continuous or categorical. I do the same for the IV.
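To make the analytical version more concrete, here is a minimal Python sketch of one way to record a hypothesis in all three versions. The names `Hypothesis`, `Variable` and `VarType`, and the donation-matching hypothesis itself, are hypothetical illustrations rather than part of any preregistration template. The analytical version is rendered loosely in the `DV ~ IV` formula style used by R and statsmodels.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class VarType(Enum):
    CONTINUOUS = "continuous"
    CATEGORICAL = "categorical"


@dataclass
class Variable:
    name: str
    var_type: VarType
    n_categories: Optional[int] = None  # only meaningful for categorical variables


@dataclass
class Hypothesis:
    academic: str        # wording as it will appear in the paper
    layman: str          # wording a non-specialist can follow
    ivs: List[Variable]  # independent variable(s)
    dvs: List[Variable]  # dependent variable(s)

    def analytical(self) -> str:
        """Render the analytical version as 'DV ~ IV', formula style.

        Multiple DVs are simply listed with commas for readability;
        that is not valid formula syntax for a multivariate model.
        """
        lhs = ", ".join(v.name for v in self.dvs)
        rhs = " + ".join(v.name for v in self.ivs)
        return f"{lhs} ~ {rhs}"


# Hypothetical example hypothesis, for illustration only
h1 = Hypothesis(
    academic="Matching a donation increases the amount donated.",
    layman="People give more when their gift is doubled.",
    ivs=[Variable("matched", VarType.CATEGORICAL, n_categories=2)],
    dvs=[Variable("donation_amount", VarType.CONTINUOUS)],
)
print(h1.analytical())  # donation_amount ~ matched
```

Writing a hypothesis down this way forces you to commit to each variable's type and number of categories before data collection, which is exactly the information needed to choose a test.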
I consider whether I have one or multiple independent variables, and likewise whether I have one or multiple dependent variables. If a variable is categorical, I also note how many categories it has. These steps basically help me pick the right test to analyze the data. I also consider whether I expect the data to be normally distributed. In my field, philanthropy, the data are hardly ever normally distributed (they tend to be right-skewed). Then I write out which test I will use to analyze each hypothesis. Make sure you check the assumptions of that test and consider whether the data you expect will meet them.
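The decision steps above can be sketched as a simple lookup. This is a simplified sketch under clear assumptions: it covers only common between-subjects designs with independent observations, the helper `suggest_test` and its inputs are hypothetical, and it is no substitute for checking each test's assumptions. Note that for ANOVA and regression the normality assumption concerns the residuals rather than the raw scores.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Var:
    name: str
    categorical: bool
    n_categories: Optional[int] = None  # set only for categorical variables


def suggest_test(ivs: List[Var], dvs: List[Var], expect_normal: bool) -> str:
    """Return a candidate test for a simple between-subjects design.

    Hypothetical, simplified helper: always verify the suggested test's
    assumptions against the data you expect to collect.
    """
    if len(dvs) != 1:
        return "multivariate analysis (e.g. MANOVA)"
    dv = dvs[0]

    if dv.categorical:
        if len(ivs) == 1 and ivs[0].categorical:
            return "chi-square test of independence"
        # Categorical outcome with a continuous IV or several IVs
        return "logistic regression" if dv.n_categories == 2 else "multinomial logistic regression"

    # From here on the DV is continuous
    if len(ivs) == 1:
        iv = ivs[0]
        if iv.categorical:
            if iv.n_categories == 2:
                return "independent-samples t-test" if expect_normal else "Mann-Whitney U test"
            return "one-way ANOVA" if expect_normal else "Kruskal-Wallis test"
        return "Pearson correlation" if expect_normal else "Spearman rank correlation"

    # Several IVs, one continuous DV
    test = "factorial ANOVA" if all(iv.categorical for iv in ivs) else "multiple linear regression"
    if not expect_normal:
        test += " (check residuals; consider a transformation or robust method)"
    return test


condition = Var("matched", categorical=True, n_categories=2)  # hypothetical IV
amount = Var("donation_amount", categorical=False)            # hypothetical DV
print(suggest_test([condition], [amount], expect_normal=False))  # Mann-Whitney U test
```

Used this way, the helper is mostly a checklist: if a hypothesis lands in a branch you did not expect, that is a signal to rethink the design before collecting data.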
I also decide in advance what I will do with outliers and exclusions. For outliers, I follow a single-construct technique of standard deviation analysis (Aguinis, Gottfredson, & Joo, 2013) and consider a data point an outlier if it is more than three standard deviations from the mean.
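The three-standard-deviation rule can be written out as a small function. This is a minimal sketch, assuming the sample standard deviation (n - 1 denominator) and a single pass over the data; the text does not specify either detail, and the function name `flag_outliers_sd` and the donation amounts are hypothetical.

```python
import statistics
from typing import List, Tuple


def flag_outliers_sd(values: List[float], k: float = 3.0) -> Tuple[List[float], List[float]]:
    """Split values into (kept, outliers): outliers lie more than k SDs from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator); an assumption
    kept = [x for x in values if abs(x - mean) <= k * sd]
    outliers = [x for x in values if abs(x - mean) > k * sd]
    return kept, outliers


# Hypothetical donation amounts, right-skewed with one extreme gift
donations = [5, 10, 10, 15, 20, 20, 25, 10, 5, 15,
             20, 10, 5, 15, 10, 20, 25, 10, 15, 500]
kept, outliers = flag_outliers_sd(donations)
print(outliers)  # [500]
```

Because the mean and standard deviation are computed with the extreme value included, that value inflates both and can mask other outliers; with right-skewed data the rule will also mostly flag large values.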
The idea that you are not allowed to make mistakes
The haunting question ‘What if I make a mistake?’ seems to scare off some people who want to preregister but do not dare to. Let me comfort you: everyone makes mistakes. Now let me depress you a bit: you are not Superman, and academics make mistakes too. That said, I have noticed that I am less likely to make mistakes thanks to preregistration, not least because it is so easy to share with your peers. You can go beyond that by sharing it on social media like Twitter (you should definitely do this; it leads to interesting conversations and new friends).
If you do make a mistake, for example by picking the wrong test, simply indicate it: ‘in retrospect, I should have chosen …’. Always be transparent about your steps.
Is preregistration a millstone round our necks?
While it is true that you should try to follow your pre-analysis data plan, there are exceptions to every rule. Sometimes it is unwise to stick to an outdated plan. If the plan no longer fits the data and would make the results uninterpretable, say so. If you did not interpret the assumptions of the test (correctly), be transparent about this as well. Describe the new plan and the reasons for the change, and then move on to the more appropriate test. It is still important to include the original pre-analysis plan. Some journals might reject your paper for admitting your mistake(s), but transparency is important for further research. I advise you to focus on journals that value transparency and open science.
Writing out my pre-analysis plan has helped me specify my hypotheses and improve my overall design. It is not really extra work, since all of this information ends up in your paper anyway; rather, it is a helpful way to strengthen your design. Share your pre-analysis data plan with others and discuss it before you collect your data. This will help you collect better and more useful data to answer your burning question.
Aguinis, H., Gottfredson, R. K., & Joo, H. (2013). Best-practice recommendations for defining, identifying, and handling outliers. Organizational Research Methods, 16(2), 270–301.