Create, scan, and correct exams with R

As instructors, we create exams and often use multiple-choice questions to assess our students' knowledge. I have lost count of how many exams I have corrected by hand, which is why this blog post gives a short introduction to the R exams package and shows why you should consider it for single- or multiple-choice exams, even if you are not a heavy R user. The package automates the entire process, from generating the exam to assigning grades. To encourage people with little R experience, this post walks through how it works.

First, why should you consider R for exams? The short answer: the exams package helps you generate, scan, and assess the exam. It creates a scan sheet that your students fill out during the exam. Afterwards, you scan the sheets with a regular copy machine, and the exams package reads the scanned images, assesses the students' answers, and instantly provides documents with the results. That alone is an awesome reason, but the package also reduces mistakes because students' answers are no longer corrected manually. Furthermore, it returns an HTML file for each participant that shows the scan sheet, the given and the correct answers, and how many points the person has earned.

Let's see how it works. To use exams, we need to set up a folder that contains all the questions of the exam, we need images of the completed scan sheets to extract the answers, and we have to assess whether the students' answers are right or wrong. Because the exams package comes with a nice tutorial and demo files, you can try out each step before you set up your own exam. The next subsections give a quick summary of the demo, and you can find a step-by-step guide on the R exams website as well.

A short demo from the R exams package

Before we can create the exam, we need exercises (questions), and the package provides several exercise files that you can download from the R exams website. Create a new folder where all files will be stored, copy the exercises from the website, and save them in a new folder named exercises. Next, create a new R script; do not forget to install the package first and then load it.
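In code, this boils down to two steps. The requireNamespace() check is an optional addition that skips the installation when exams is already installed.

```r
# Install the exams package from CRAN only if it is not installed yet
if (!requireNamespace("exams", quietly = TRUE)) {
  install.packages("exams")
}

# Load the package for the current R session
library("exams")
```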


To create the exam, we first need to decide which questions it should include. Therefore, we create a list that stores the names of the questions available in the exercises folder. In the last section, we will see how to create our own questions, but as a first step it is fine to use the questions provided by the exams package. As the next code shows, I created a list (exercises_exam) with several example questions that will be included in the exam:

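exercises_exam <- list(
  # ...further exercise files (elided in this excerpt): one list element
  # per question; the `points` vector below assumes six questions
  c("boxplots.Rnw", "scatterplot.Rnw")
)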
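A list element that is a vector of several files, such as c("boxplots.Rnw", "scatterplot.Rnw"), tells exams to draw one of these exercises at random for each exam version. The base-R sketch below only mimics that idea to make it tangible. It is not the package's internal code, and the file name question1.Rmd, the list demo_list, and the helper draw_version() are hypothetical.

```r
# Hypothetical exercise list: plain strings are always included,
# character vectors are pools from which one exercise is drawn
demo_list <- list(
  "question1.Rmd",                       # hypothetical file, always used
  c("boxplots.Rnw", "scatterplot.Rnw")   # pool of two alternatives
)

# Draw one exercise per list element for a single exam version
draw_version <- function(exercises) {
  vapply(exercises, function(pool) {
    if (length(pool) > 1) sample(pool, 1) else pool
  }, character(1))
}

set.seed(42)
draw_version(demo_list)
```

Calling draw_version() again without resetting the seed may pick the other plot exercise, which is how two exam versions can differ in content.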

Next, we use this list to create the exam with the exams2nops() function. It creates a PDF file based on the exercises_exam list and adds the scan sheet as the first page. As the exams documentation suggests, you may want to set a seed to make the results reproducible, and we store the exam as test_exam. After the next code is executed, the exams package creates a PDF with the questions of the exercises_exam list.

set.seed(1234)  # any fixed value makes the random draws reproducible
test_exam <- exams2nops(exercises_exam,
                        dir = "nops_pdf",
                        name = "demo", date = "2015-07-29",
                        points = c(1, 1, 1, 2, 2, 3),
                        n = 2)

There are several options to adjust the exam, so check the documentation of the package to adapt the minimal code to your own purposes. As the minimal code shows, we provide the directory (dir) where the PDF files will be stored (here: nops_pdf), a name, and the date of the exam. I also created two different versions of the exam (n = 2), but you can create more versions if you want to. In addition to the exported PDF files, the exams2nops() function saves an .rds file in the directory, which contains all of the metadata about the exam (e.g. the solutions). We will need this file later to evaluate the exam.

The package provides everything needed for a test run. It even includes test images, which we will use to learn how to scan images and extract the answers given by our students. The next code snippet stores the paths of two example scan files that ship with the exams package as scan_image. We can use these two files to see how to scan and evaluate an exam.

scan_image <- dir(system.file("nops", package = "exams"), 
           pattern = "nops_scan",
           full.names = TRUE)

All scan images must be stored in one directory. Following the minimal example of the exams package, we create a new directory with dir.create("nops_scan") where the scan files will be saved, and the second line of code copies the demo files (scan_image) into the nops_scan folder with file.copy().

dir.create("nops_scan")
file.copy(scan_image, to = "nops_scan")

After you run this code chunk, two fake scan images appear in your folder, one from Ambi Dexter and another one from Jane Doe, as the next figure illustrates.

Now that we have prepared all the essential steps, we can use the nops_scan() function to scan the images. As the console output shows, it trims and rotates the images and extracts the information from the PNG files.

nops_scan(dir = "nops_scan")

Have a look at your directory. The nops_scan() function saves the result as a ZIP archive, which includes the PNG files as well as a text file with the extracted information.

Before we can finally evaluate the results, we need a list with information about our students, which will be used to match them with the exam results. The minimal example also provides a code snippet to create a CSV file that contains the information about the two fake students (Jane Doe, Ambi Dexter) from the demo.

write.table(data.frame(
  registration = c("1501090", "9901071"),
  name = c("Jane Doe", "Ambi Dexter"),
  id = c("jane_doe", "ambi_dexter")),
  file = "Exam-2015-07-29.csv", sep = ";", quote = FALSE, row.names = FALSE)
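Registration numbers are safest when kept as text, so that leading zeros survive. A quick way to double-check the register file is to read it back with every column as character. This sketch writes to a temporary file (an assumption for illustration) so it does not touch your exam folder.

```r
# Rebuild the demo register and write it to a temporary file
register <- data.frame(
  registration = c("1501090", "9901071"),
  name = c("Jane Doe", "Ambi Dexter"),
  id = c("jane_doe", "ambi_dexter")
)
reg_file <- tempfile(fileext = ".csv")
write.table(register, file = reg_file, sep = ";", quote = FALSE, row.names = FALSE)

# Read it back with all columns as character, so an ID like "0123456"
# would keep its leading zero instead of becoming the number 123456
check <- read.table(reg_file, sep = ";", header = TRUE, colClasses = "character")
str(check)
```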

Finally, we use the nops_eval() function to evaluate the scanned images. The register argument points to the students' matching list, solutions points to the metadata of the exam (the .rds file), scans provides the directory and name of the scan results, eval determines how the results are evaluated (e.g. whether we give partial points), and interactive specifies whether errors should be handled interactively. Check out the documentation of nops_eval() for more information about these options.

exam_results <- nops_eval(
  register = "Exam-2015-07-29.csv",
  solutions = "nops_pdf/demo.rds",
  scans = Sys.glob("nops_scan/nops_scan_*.zip"),
  eval = exams_eval(partial = FALSE, negative = FALSE),
  interactive = TRUE
)

nops_eval() returns a data frame that contains the answers, solutions, and points for each student!


Having data with the exam answers is awesome, but nops_eval() gives us more. It has already exported the results to a CSV file and created an archive that contains a short summary document of the exam results for each participant. As the next figure shows, this document displays the meta information of your students, an assessment of each question, and the image of the scan sheet used to extract the information. Thus, you are well prepared if your students show up to review their exam.

Thus, the exams package takes a lot of pain out of correcting exams. I am sure you can handle the discussed steps with the help of the demo files from the R exams package, even if you have limited experience with R. To boost the popularity of the package, and to convince you to stick with R for the next steps as well, the next section shows how you can use R to work with the exam data. Often we need to upload the exam results in a specific format, we may want to automate this process as well, and we may want to create a document to communicate the results. Feel free to use any software to finalize these steps, but R has some nice features to automate them without much effort. Unfortunately, this means that the next section assumes some basic knowledge of R. I outline the most important steps to explain what happens when you run the code, but I skip the details.

R is your exam friend

How do we communicate the results of the exam, and how can we automate this process? The data is saved as nops_eval.csv, and we can use R to wrangle the exam data, prepare a final list with the results (e.g. a list to upload the grades), and communicate the results to our students.

The exam data is already loaded in the current session (exam_results), but we have to import it if we want to provide a short summary for the participants later or rerun the data management steps. For this, we use the readr package; the tidyverse bundles readr together with other packages for data wrangling. If you are not familiar with loading data in R, use the Import Dataset dialog in RStudio: it gives you a preview of the data and shows the corresponding packages and the code to import it. As the following code snippet shows, read_delim() reads delimited files (including CSV and TSV), and we have to set the delimiter explicitly, because our file uses semicolons instead of commas to separate values.

library(tidyverse)  # loads readr, dplyr, ggplot2, and more

exam_df <- read_delim("nops_eval.csv",
                      delim = ";", escape_double = FALSE, trim_ws = TRUE)
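To try read_delim() without running the whole demo, you can write a tiny semicolon-separated file yourself. The column names registration and points below are assumptions for illustration; check the header of your own nops_eval.csv.

```r
library(readr)

# A small, made-up semicolon-separated file standing in for nops_eval.csv
demo_file <- tempfile(fileext = ".csv")
writeLines(c("registration;points",
             "1501090;7",
             "9901071;4"), demo_file)

# delim = ";" splits on semicolons; reading registration as character
# preserves leading zeros in registration numbers
demo_df <- read_delim(demo_file, delim = ";",
                      col_types = cols(registration = col_character(),
                                       points = col_double()))
demo_df
```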

After importing the data, I exclude all variables that are no longer necessary for the report. The dplyr package, also part of the tidyverse, offers many handy functions to work with data. We can use select() to create a narrow data frame that includes only an ID variable (the registration number) and the points from the exam. To illustrate this process, let's generate some fake data (exam_df); the code also shows how to keep only the selected variables with select():

exam_df <- tribble(
  ~ID, ~points,
   1,  57,
   2,  60,
   3,  84,
   4,  45,
   5,  82,
   6,  23,
   7,  99,
   8,  47,
   9,  37,
   10, 77
)

exam_df <- exam_df  %>% 
  select(ID, points)
## # A tibble: 10 x 2
##       ID points
##    <dbl>  <dbl>
##  1     1     57
##  2     2     60
##  3     3     84
##  4     4     45
##  5     5     82
##  6     6     23
##  7     7     99
##  8     8     47
##  9     9     37
## 10    10     77

Again, the exams package makes our life very easy, since there is not much to do to prepare the data. This is why I encourage people to use R even with little R experience. As the next code chunk illustrates, we generate a new variable that stores the grade depending on the points people have achieved. Use mutate() to extend the data frame and case_when() to assign grades according to the points. I decided that passing grades range from 100 down to 50 points in steps of 5 points per grade level, but that is not the important point here. The case_when() function checks the conditions in order (e.g. points >= 95 ~ 1.0) and assigns the grade of the first condition that is fulfilled. Let's see how it works:

exam_df <- exam_df %>% 
  mutate(
    grade = case_when(
      points >= 95 ~ 1.0,
      points >= 90 ~ 1.3,
      points >= 85 ~ 1.7,
      points >= 80 ~ 2.0,
      points >= 75 ~ 2.3,
      points >= 70 ~ 2.7,
      points >= 65 ~ 3.0,
      points >= 60 ~ 3.3,
      points >= 55 ~ 3.7,
      points >= 50 ~ 4.0,
      points < 50  ~ 5.0  # "< 50" instead of "<= 49" also covers fractional points such as 49.5
    )
  )
## # A tibble: 10 x 3
##       ID points grade
##    <dbl>  <dbl> <dbl>
##  1     1     57   3.7
##  2     2     60   3.3
##  3     3     84   2  
##  4     4     45   5  
##  5     5     82   2  
##  6     6     23   5  
##  7     7     99   1  
##  8     8     47   5  
##  9     9     37   5  
## 10    10     77   2.3

As the output shows, person 1 has 57 points and gets the grade 3.7 (German grading system), person 4 gets the grade 5 because he or she achieved fewer than 50 points, and so on. Please check each grade level carefully to make sure there are no mistakes, typos, or other problems.
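If the case_when() ladder feels long to check, the same 5-point bands can be written as a lookup table with base R's findInterval(). This is an alternative technique, not the approach used above, and the helper name points_to_grade() is hypothetical. Because every score below 50 (including fractional ones like 49.5) falls into the first band, there are no gaps at the boundaries.

```r
# Lower bounds of the passing bands: 50, 55, ..., 95
lower_bounds <- seq(50, 95, by = 5)

# One grade per band: below 50, [50, 55), [55, 60), ..., [95, Inf)
grade_scale <- c(5.0, 4.0, 3.7, 3.3, 3.0, 2.7, 2.3, 2.0, 1.7, 1.3, 1.0)

# findInterval() counts how many lower bounds are <= points (0 to 10),
# so adding 1 turns that count into a position in grade_scale
points_to_grade <- function(points) {
  grade_scale[findInterval(points, lower_bounds) + 1]
}

points_to_grade(c(57, 60, 84, 45, 82, 23, 99, 47, 37, 77))
```

For the fake data, this reproduces the grades from the case_when() version, and the whole grading scale sits in two short vectors that are easy to proofread.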

The next steps depend on how you have to upload the grades at your university. For instance, my institution requires a sorted list with the grades multiplied by 100. Nothing easier than that: use mutate() again to add the converted grade as an additional column and arrange() to sort the data.

exam_df %>% 
  mutate(grade_system = grade * 100) %>%
  arrange(ID)
## # A tibble: 10 x 4
##       ID points grade grade_system
##    <dbl>  <dbl> <dbl>        <dbl>
##  1     1     57   3.7          370
##  2     2     60   3.3          330
##  3     3     84   2            200
##  4     4     45   5            500
##  5     5     82   2            200
##  6     6     23   5            500
##  7     7     99   1            100
##  8     8     47   5            500
##  9     9     37   5            500
## 10    10     77   2.3          230

Finally, I need to match the exam list with a list of students who actually showed up. This step also depends on what your institution expects you to deliver, which makes it hard to give general advice. You may want to check out how to merge data (in my case I used left_join()), because we only need a list of the people who showed up and took the exam. After merging, we can save the final results with the readr package. In my case, I save the exam results as a CSV file, which can then be uploaded.
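A minimal sketch of that merge, assuming a hypothetical attendance list with an ID column that matches the exam data; the student names and the data frame names graded_df and attendance are made up. Putting the attendance list on the left side of left_join() keeps exactly the students who took the exam and creates the final_results object that write_csv() saves.

```r
library(dplyr)

# Graded exam data (same structure as exam_df above, shortened)
graded_df <- tibble(
  ID     = c(1, 2, 3, 4),
  points = c(57, 60, 84, 45),
  grade  = c(3.7, 3.3, 2.0, 5.0)
)

# Hypothetical attendance list: students who actually showed up
attendance <- tibble(
  ID   = c(1, 3, 4),
  name = c("Student A", "Student B", "Student C")
)

# Keep every attendee and add their exam results by ID
final_results <- attendance %>%
  left_join(graded_df, by = "ID") %>%
  arrange(ID)

final_results
```

An attendee without a matching scan would end up with NA in points and grade, which is a useful signal to check that case by hand.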

write_csv(final_results, "final_results.csv")

Thus, you can create, scan, and correct exams even if you have only limited knowledge of R, but I know from my own experience that the start can be tricky and we all sometimes need an incentive.

I guess reducing mistakes when correcting exams is already a huge incentive, but you can also use R to generate an exam report for your students. Check out rmarkdown, which lets you easily create different file types (PDF, HTML, Word). In my case, I have a standard document for my students that contains a table with individual results as well as a histogram that depicts the distribution of exam grades. The next code chunk generates a histogram for the fake exam data with the ggplot2 package.

mean_grade <- exam_df %>% 
  pull(grade) %>% 
  mean() %>% 
  round(2)  # rounded for the plot label

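ggplot(exam_df, aes(x = grade)) +
  geom_histogram(colour = "black", fill = "white", bins = 11) +
  # linewidth replaces size for lines since ggplot2 3.4.0
  geom_vline(xintercept = mean_grade, linewidth = 1.5, colour = "red") +
  # annotate() draws the label once; geom_text() with aes() would draw it once per row
  annotate("text", x = mean_grade + 0.5, y = 8,
           label = paste0("Mean\n", mean_grade)) +
  theme_minimal(base_size = 14)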
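For the table in such a report, a few summary numbers often suffice. A sketch with dplyr, using the fake grades from above in a separate data frame (report_df, a hypothetical name); counting 5.0 as failing follows the German scale used in this post.

```r
library(dplyr)

# Fake grades from the example above
report_df <- tibble(
  ID    = 1:10,
  grade = c(3.7, 3.3, 2.0, 5.0, 2.0, 5.0, 1.0, 5.0, 5.0, 2.3)
)

# One-row summary for the report
exam_summary <- report_df %>%
  summarise(
    participants = n(),
    mean_grade   = round(mean(grade), 2),
    passed       = sum(grade < 5),
    failed       = sum(grade == 5)
  )

# Frequency table: how many students received each grade
grade_counts <- report_df %>% count(grade)

exam_summary
grade_counts
```

In an R Markdown report, both tables could be printed with knitr::kable().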

ggplot2, rmarkdown, and the exams package? Maybe you feel a little overwhelmed, depending on your background. I just wanted to outline the advantages of doing all steps in the same environment, and R gives you the tools for all essential steps when it comes to exams. Moreover, rmarkdown and ggplot2 are easy to learn in case you have never heard of them before. I hope the code examples give you a start for applying them on your own. I could share my own R Markdown template, but RStudio comes with several R Markdown templates, and if you copy the code from above, you will have essentially the same document as I use.

In my opinion, there is only one thing left to do: you have to create your own exercises before you can use R for your exams.

Create your own exercises

The next output shows an example of an exercise. Exercises are written as R/Markdown (.Rmd) or R/LaTeX (.Rnw) files, like the demo exercises used above; the example below uses the Markdown format. Obviously, this is another good reason to learn more about R Markdown. Anyway, even if you are not familiar with either, creating new exercises is easy, and their structure is not complicated at all. Let's have a look at an example:

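Question
========
What is the question?

Answerlist
----------
* A
* B
* C
* D

Solution
========
A and C

Answerlist
----------
* True
* False
* True
* False

Meta-information
================
exname: question1
extype: mchoice
exsolution: 1010
exshuffle: TRUE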

I don't think there is much to explain about the question, the answer list, or the solution section in this example; just copy and paste it for your first questions. However, let's have a look at the meta-information at the end of the exercise, which is needed to scan and assess the results. First, you have to give your question (exercise) a name (exname). Next, you have to specify the exercise type (extype), that is, whether you create a single-choice (schoice) or multiple-choice (mchoice) question. The exsolution field gives the solution as a binary string: A and C are correct in the example above, which leads to the solution 1010. Finally, you decide whether the answer options should be shuffled (exshuffle).
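To see what the binary string means, here is a small base-R sketch that compares a student's ticked boxes with exsolution. It only illustrates all-or-nothing scoring (the idea behind partial = FALSE in the nops_eval() call above); it is not the exams package's own evaluation code, and the helpers as_answer() and score_item() are hypothetical.

```r
# Convert a string such as "1010" into a logical vector (TRUE = box should be ticked)
as_answer <- function(bits) strsplit(bits, "")[[1]] == "1"

# All-or-nothing scoring: full points only if every box matches the solution
score_item <- function(given, solution, points = 1) {
  if (identical(as_answer(given), as_answer(solution))) points else 0
}

score_item("1010", "1010")   # A and C ticked: full points
score_item("1000", "1010")   # C missing: no points
```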

The R exams package has much more to offer than I could possibly show in this post. I just tried to give a quick summary of how R can be used for exams. Visit the R exams website for tutorials, dynamic exercises, and e-learning tests. Most of all, I hope I could convince some people that learning how to create, scan, and correct exams with R is not rocket science.

Edgar Treischl
Senior Research Fellow

My research interests include quantitative methods, evaluation, causality, and so much more.