A Novel Technique for A/B Testing Using Static Prototypes

Abstract

A/B testing is a powerful technique for evaluating the success of a specific design element, but it is not yet widely adopted among library user experience professionals. Many libraries cannot or choose not to do A/B testing on a live website for a variety of practical reasons. Appalachian State University Libraries recently piloted a variety of A/B testing that has the potential to address some of these shortcomings: a Qualtrics survey of tasks carried out on static prototype websites embedded into the survey as inline frames. The technique allowed us to capture qualitative data in the form of survey questions and link it to quantitative server data typical in “live” A/B tests. Prototype A/B testing allowed us to reap the benefits of A/B testing without needing to modify a production server environment. Based on our findings from a large sample of undergraduate and graduate students, we were able to justify a post-migration design choice.

This paper was refereed by Weave's peer reviewers.

Introduction

A/B testing is a research methodology for evaluating an isolatable design element of a product or service. At its core, A/B testing (also referred to as split testing or multivariate testing) involves presenting users with versions of an interface differing in only one respect (the independent variable) and collecting data on performance metrics (dependent variables) to determine which version outperforms the others. A/B testing has become a staple in the UX community as companies perpetually test and make changes to their online presence. It is a powerful, data-driven technique to improve how users interact with systems. In the context of libraries, however, its adoption has been complicated by scale.

Generally speaking, libraries have not been able to successfully “split” a service and gather data from the entire user population. A recent and notable exception is Young (2014), who describes how Montana State University leveraged Google Experiments to test which word best approximated students’ mental model for getting information on research services. Focusing on library websites in particular, there are some barriers to conducting such a large-scale split test. Many libraries do not have direct access to a production server or lack the wherewithal to set up a test in the appropriate software. In addition, A/B testing could interfere with librarians who teach or create e-learning resources based on the current version of the website. Obradovich, Canuel, & Duffy (2015) found that over 70 percent of Canadian Association of Research Libraries and Association of Research Libraries member institutions that provide instructional videos on their library website feature content on using a library catalog, discovery tool, or specific databases. As many of these videos likely use the library homepage as a starting point, it could confuse viewers to encounter a version of the website different from the one seen in the video. What many libraries have done is adopt the basic principle of A/B testing but conduct it on a much smaller scale, typically with a tiny subset of users. Instead of collecting in-the-wild data, they sit the user down and observe a common task the user might perform on the website. And instead of comparing against live variations of the website, they rely on a surrogate or prototype website for variations from the live control. This kind of A/B testing is known as prototype A/B testing and is common in web development projects outside of libraries (Frome & Cohn, 2015).

Appalachian State University Libraries recently performed a prototype A/B test of the library website that scaled to a level more typical of a live A/B test, giving us data from more than a hundred users. The technique we used offers libraries the potential for performing larger-scale A/B testing while avoiding many of the problems noted above. We accomplished this by inserting the prototype A/B test inside an electronic survey sent out to a large number of students. The survey consisted of tasks to be carried out on a static prototype website embedded as an inline frame, or iframe. Our setup allowed us to track two kinds of data: qualitative, self-reported data in the form of survey questions similar to what is collected in face-to-face usability tests, and quantitative data associated with live A/B tests, such as what pages were visited and for how long. Best of all, the two kinds of data were not disparate but could be joined together, allowing us to see how reliable each measure was when compared with the other. This method relies on standard survey software and a class of software known as static site generators, and it requires only a sandbox server. As no changes need to be made to a live site, libraries may find this a more feasible approach to A/B testing. This paper will walk through what issue we tested, how the test was set up, and how the data were extracted and analyzed to inform our decision.

Literature Review

A/B testing has been a staple in product development for the past fifty years. Large tech companies such as Google and Facebook continuously experiment with website and app features to gather data on what their user base prefers. As Young’s (2014) article points out, the literature on A/B testing is extensive in fields such as computer science and marketing but essentially nonexistent in library science. However, librarians are clearly thinking about the same kinds of user-centered design considerations, especially given the large emphasis placed on website usability studies.

Prototyping, like A/B testing, is an integral part of the design process. It can be used in different stages of the process but is often used early to test features without having to create a functional final product. Prototypes can range from low-fidelity paper mockups to high-fidelity themed websites. Some prototypes have little to no functionality and are used very early in the brainstorming phase, while others have enough functionality to test relevant aspects of the user interaction, albeit with some imagination required on the part of the test participant. For example, one can make interactive paper prototypes by having a human “computer” process a user’s tap; when an element is tapped, the sheets of paper are rearranged, thereby representing the response of the system. Facilitators occasionally need to explain to users that some expected responses could be nonfunctional and ask them to make believe (Pernice, 2016). The literature on using website prototypes in the library setting is somewhat sparse, but representative studies have many commonalities, including recruiting a small number of participants and comparing task performance between a prototype website and a current website. Ellis and Callahan (2012) discuss prototyping in the context of reimagining how online finding aids are organized. They tested users with a Bootstrap-based prototype to see how they would interact with an “atomized”-components approach to viewing archival content. Other examples are more straightforwardly A/B testing using prototypes. Swanson, Hayes, Kolan, Hand, and Miller (2017), Dougan and Fulton (2009), and Reynolds (2008) used prototype websites in the course of usability testing their library websites. Metrics including total time on task, task completion, the path of webpages taken, “think aloud” comments, and self-reported ratings all showed that the prototype outperformed the current website.

In recent years, prototyping has been aided by the introduction of a set of tools called static site generators. A static site generator is a web development program that resides on the developer’s local machine rather than on a server. It allows the developer to arrange website content and systematize how it interacts with other aspects of a website, such as site layouts and CSS assets. The website can then be compiled, generating a complete and self-contained static website that is ready to be uploaded to any server. As the website only contains HTML, CSS, and JavaScript, it might be asked what is gained by working with this software. The fact is static site generators have found a wide audience owing to their many advantages over larger content management systems such as WordPress or Drupal. For example, static websites load faster, have fewer security vulnerabilities, and are easier to preserve and version, that is, to save as a snapshot at a particular time. For websites that are smaller or lack the need for complex, database-driven interactive features, static websites are a viable and increasingly preferred option. Newson (2017) describes how the Ontario Historical Topographic Maps Digitization Project website was developed using Hugo, a Go-based static site generator. Diaz (2018) describes how Northwestern University Libraries used Jekyll, perhaps the most popular static site generator currently, to publish conference proceedings and an open educational resource. Outside of libraries, static site generators are commonly used in the design process to create static mockups.

Surveys are frequently used as a research methodology within libraries and are almost certainly the most common data collection method in the field (Halpern, Eaker, Jackson, & Bouquin, 2015). They are helpful for gathering data about the experience, opinions, and attitudes of library users, staff, and other stakeholders. Survey research is typically contrasted with experimental research, although it is possible to manipulate variables in so-called survey experiments. Surveys implemented in library contexts usually adopt a purely descriptive form, but other creative uses have been tried with some success. For example, Symonds (2011) describes how survey software can also be used as a remote asynchronous usability testing tool. Symonds and her colleagues used SurveyMonkey to disseminate both a link to a website and a series of tasks on that website that respondents were to complete. After completing each task, they were instructed to answer questions that could be used to assess how usable the website was.

The methodology described in this article builds on Symonds (2011) by combining a descriptive electronic survey with an interaction with a randomly assigned prototype website. We arrived at this approach mostly as a natural extension of our experience with the two featured software tools, electronic surveys and static site generators. In particular, we knew our preferred survey tool, Qualtrics, was capable of displaying and capturing much more information than we had needed in previous surveys. Around this same time, we were using static prototypes to mock up ideas during our website redesign process. Merging the two methods struck us not only as an interesting technology challenge but also as an opportunity to combine the benefits of rich qualitative survey data with those of quantitative server log data. The latter would allow us to determine which pages respondents visited and how much time was spent on each page. By itself, this information is useful, but combined with user feedback, it provides us with some insight as to why users did what they did.

Methodology

The A/B testing discussed in this article occurred after Appalachian State University’s main library website was redesigned and migrated to a new Drupal theme in August 2017. Leading up to the launch, members of the library’s Web Content Committee had conducted an electronic survey, which included an open and closed card sort implemented with the “Pick, Group, and Rank” question type in Qualtrics. Card sort data indicated there was a need to better organize our content relating to our spaces. Previously, information on the website relating to our rooms and spaces was situated in the “Services” menu item. However, from our card sort we realized that this type of content would be better categorized in a standalone section of the website. In addition, the library had been undergoing a major space renovation. More space for students was being created, the Writing Center was relocating to another floor, and other spaces such as our makerspace and video recording room were being created or expanded. A “Rooms & Spaces” menu item was seen as an important design decision to better inform our users of the changes taking place in the building.

When the website was launched, we had put our building floor maps as the “landing page” to this new section as it seemed, at first glance, to be the conceptually obvious choice. Our previous card sorting tests, however, had not specifically looked at where users expect to find a map of the building. The floor maps had been in the “About” section of our website prior to the migration, and a scan of other library websites indicated that this was a common location. We realized we lacked a data-driven rationale for our decision and began looking into ways to test this with our users. Since we were interested in testing two specific and definable options – i.e., floor maps under “About” versus floor maps under “Rooms & Spaces” – we opted to conduct an A/B test.

The prototypes were built in a static site generator called Sculpin. We chose Sculpin because it was developed in a programming language (PHP) we use for our own projects. However, nearly any static site generator would have been adequate for our purpose. To create the base theme for the static prototype, we simply asked our colleagues in Web Services if they could provide the flat files (HTML, CSS, JavaScript, and images) that together made up the template of the Drupal site. This is typically an easy request because many custom themes are created and refined first as static HTML before being converted into a form the content management system can handle. However, if these files are not readily available, they can always be recreated, though it will require some extra work.

The Web Services department provided us with a single HTML file that contained links to all the required assets on a central server. This HTML file served as the basis of our Sculpin theme. We inserted snippets of Twig code, which is the templating language Sculpin uses, to tell Sculpin where to put our page title and page content. Then we manually created all the site’s pages. Fortunately, for most pages, this was simply a matter of copying and pasting the source code from our live Drupal site and making some minor adjustments, such as removing Drupal-specific HTML code.

We created two versions of our prototype website. One situated our floor maps page under “About,” where it had been prior to our migration, and the other located it under “Rooms & Spaces,” where the page currently existed. The sites were deployed to one of the library’s sandbox servers. Since the websites were entirely static, it was not necessary to install or configure any specialized software to get the sites to run. However, there were two specific requirements for the server setup. First, server logging (in Common Log Format) needed to be enabled to capture the respondent’s browsing activity. Second, SSL needed to be enabled in order for the website to be embedded as an iframe in the Qualtrics survey software.
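To make the logging requirement concrete, the following is a minimal sketch of how one Common Log Format line could be split into its fields. It is an illustration only: the sample line, its IP address (from a reserved documentation range), and the function names are assumptions, not output or code from the study.

```javascript
// Common Log Format layout:
//   host ident authuser [timestamp] "method resource protocol" status bytes
const CLF_PATTERN =
  /^(\S+) (\S+) (\S+) \[([^\]]+)\] "(\S+) (\S+) (\S+)" (\d{3}) (\d+|-)$/;

const MONTHS = {
  Jan: "01", Feb: "02", Mar: "03", Apr: "04", May: "05", Jun: "06",
  Jul: "07", Aug: "08", Sep: "09", Oct: "10", Nov: "11", Dec: "12",
};

// Convert a CLF timestamp such as "01/Nov/2017:10:15:32 -0400" to a Date.
function parseClfTimestamp(stamp) {
  const m =
    /^(\d{2})\/(\w{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})$/.exec(stamp);
  if (!m || !MONTHS[m[2]]) return null;
  // Rebuild the timestamp as an ISO 8601 string so Date can parse it reliably.
  const iso = `${m[3]}-${MONTHS[m[2]]}-${m[1]}T${m[4]}:${m[5]}:${m[6]}${m[7]}${m[8]}:${m[9]}`;
  return new Date(iso);
}

// Parse one log line; returns null when the line does not match the format.
function parseClfLine(line) {
  const m = CLF_PATTERN.exec(line.trim());
  if (!m) return null;
  return {
    host: m[1],
    time: parseClfTimestamp(m[4]),
    method: m[5],
    resource: m[6], // path plus query string, where the r field travels
    status: Number(m[8]),
    bytes: m[9] === "-" ? 0 : Number(m[9]),
  };
}

// Hypothetical log line for a respondent viewing the "About" page.
const sampleLine =
  '203.0.113.7 - - [01/Nov/2017:10:15:32 -0400] "GET /var-a/about/?r=R_123_Q1 HTTP/1.1" 200 5120';
const entry = parseClfLine(sampleLine);
console.log(entry.resource, entry.status);
```

The resource field is the part that matters for this technique, since it records both the page requested and the query string used to identify the respondent's session.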

Qualtrics was the mechanism through which we recruited participants, but it also enabled us to link together the survey data with data from the browsing session. While we could have used IP addresses or a service like Google Analytics to achieve the same thing, we wanted to keep the survey truly anonymous. This was important to us because if respondents could be identified, we would be obligated to provide information on how their session was being tracked, which might have affected the response rate or the informality we were trying to convey as they approached the tasks. The nature of our project was such that it was limited in scope to gathering data about Appalachian State University users. As that data could not generalize to other populations, it was not subject to human subjects review. However, those wishing to utilize a technique like the one described would be wise to consult with their Institutional Review Board chair or administrator. For details on how we associated a respondent’s survey with his or her browsing activity, see the Appendix.

The static websites generated were high-fidelity functional prototypes. See the prototype homepage in figure 1. However, there were a number of features that one would expect on a dynamic library website that could not be replicated on the prototypes, and so we were forced to compromise. First, in areas where the content was too complex to be worth rendering in Sculpin, we left it out or replaced it with a shaded box. This informed the user what should have been there but was not fleshed out in detail. For example, we did not list detailed information on every one of our databases. Since none of these features were crucial for the tasks users were to complete, this was seen as an acceptable compromise, although even here we cannot rule out the possibility that this biased some user interactions. Second, we wanted to track just about everything a user did or encountered on the site in our server log. On a real library website, however, with links and forms directing users to databases, catalogs, and other third-party sites, it is unlikely that the totality of interactions would be captured in a single place. Furthermore, as we were only testing our website, we did not want to allow our users to leave, even accidentally. Our solution was to “lock down” external sites. For each outbound link or form submission, we redirected users to a single page (called “404”) as shown in figure 2. The page captured what link was clicked or what query was submitted, and displayed this to the user along with an explanation for the atypical behavior and access to a “Back” link. This is no doubt a stark case of asking the user to make believe for the purpose of a test, but we felt this was probably the smoothest way to handle this limitation. To confirm this, we did some in-person pretesting of the prototype website with students we found in the library. We simply asked them to complete some tasks that we knew would lead them to this explanation page. Based on observations and talking with them afterwards, we believed the page was sufficiently clear and not unacceptably disruptive.

Figure 1. Prototype homepage.
Figure 2. Workaround for outbound links.

The survey had a few sections. First, an introductory section explained the purpose of the survey, provided brief instructions on what they would encounter and what they were to do, and obtained informed consent. Next, the survey software randomly chose one of two paths, corresponding to the assignment of either the “A” website or the “B” website. The task question consisted of three parts: (1) a brief description of the task, (2) the prototype website loaded directly underneath it in an iframe, and (3) a Likert-type question that asked how easy the task was to complete (if it could be completed at all). This last question was a slight modification of the Single Ease Question and represents a simple way to gather data on perceived usability (Laubheimer, 2018). After the task question, we included a general open-ended feedback question before ending the survey. See figure 3 for a screenshot of the first section of the survey. We chose a task that we hoped would get our users searching for floor maps of the building: “You were told to meet up with your study group in room 303 of the library, but you have no idea where this is. Find out where exactly this room is in the library.”

Figure 3. Prototype website loaded in iframe beneath task description.

We collected a total of four pieces of data. In addition to the Single Ease Question, which came from the survey itself, we collected the total elapsed time on task measured in seconds; the “path length,” or total number of page loads needed to complete the task; and the final page that was loaded, which could be used to determine whether the respondent successfully completed the task. These three measures came from analysis of the server logs using the R programming language; see the Appendix for sample code.
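As a rough illustration of how these three log-based measures can be derived, the sketch below groups page loads by respondent and summarizes each session. It is written in JavaScript for consistency with the Appendix and is not the R analysis used in the study. The sample records are invented, the floor maps path is hypothetical, and two definitions are assumptions: time on task is taken as the gap between the first and last page load, and path length counts every page load including the landing page.

```javascript
// Summarize each respondent's browsing session from parsed log records.
// Each record: { r: tracking value, time: Date, page: path requested }.
function summarizeSessions(records) {
  // Group page loads by the tracking value r.
  const sessions = new Map();
  for (const rec of records) {
    if (!sessions.has(rec.r)) sessions.set(rec.r, []);
    sessions.get(rec.r).push(rec);
  }

  const summaries = [];
  for (const [r, hits] of sessions) {
    hits.sort((a, b) => a.time - b.time); // chronological order
    const first = hits[0];
    const last = hits[hits.length - 1];
    summaries.push({
      r,
      secondsOnTask: (last.time - first.time) / 1000, // assumed definition
      pathLength: hits.length, // assumed to include the landing page
      finalPage: last.page, // compared with the target page to judge success
    });
  }
  return summaries;
}

// Invented records for two respondents; paths are illustrative only.
const records = [
  { r: "R_123_Q1", time: new Date("2017-11-01T10:15:00-04:00"), page: "/var-a/" },
  { r: "R_456_Q1", time: new Date("2017-11-01T10:20:00-04:00"), page: "/var-a/" },
  { r: "R_123_Q1", time: new Date("2017-11-01T10:15:12-04:00"), page: "/var-a/about/" },
  { r: "R_456_Q1", time: new Date("2017-11-01T10:20:18-04:00"), page: "/var-a/about/" },
  { r: "R_123_Q1", time: new Date("2017-11-01T10:15:37-04:00"), page: "/var-a/about/floor-maps/" },
];

const summaries = summarizeSessions(records);
console.log(summaries);
```

Because a server log records when a page was requested but not when the respondent stopped reading it, a measure built this way cannot see time spent on the final page.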

The survey was launched on November 1, 2017. To generate our contacts list, we used a previously downloaded full population list of students currently enrolled at the university. (If a directory list of students is not easily accessible at your institution, consider talking with your office of institutional research as they may be able to provide a full or partial list of student email addresses.) We randomly sampled 4,500 students using the R programming language and sent them invitations to participate. A reminder email was sent to non-responders two weeks later, and the survey was closed on November 30, 2017. No incentives were used.
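For readers unfamiliar with drawing a simple random sample, the idea can be sketched as follows. The sample in this study was drawn in R; this JavaScript version is only an illustration, and the population size, email addresses, and function name are hypothetical.

```javascript
// Draw a simple random sample without replacement using a partial
// Fisher-Yates shuffle: only the first `size` positions are settled.
function sampleWithoutReplacement(population, size, rng = Math.random) {
  if (size > population.length) {
    throw new RangeError("Sample size exceeds population size");
  }
  const pool = population.slice(); // copy so the original list is untouched
  for (let i = 0; i < size; i++) {
    // Pick a random index from the not-yet-settled tail, then swap it into place.
    const j = i + Math.floor(rng() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, size);
}

// Hypothetical population list of 18,000 student email addresses.
const allStudents = Array.from({ length: 18000 }, (_, i) => `student${i}@example.edu`);
const invited = sampleWithoutReplacement(allStudents, 4500);
console.log(invited.length);
```

Every student has the same chance of being invited, and no student can be invited twice.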

Analysis

Before doing any analysis, the data first needed to be “cleaned.” Three types of respondents were excluded from the analysis. First, we excluded any respondent who did not finish the survey, which is simply anyone who did not click through to the end and submit the survey on the final screen. This removed 130 responses. Second, we excluded respondents who did not engage at all with the prototype, which we defined as anyone who did not navigate beyond the landing page. Given that our task could not reasonably be attempted without leaving the homepage, we felt this decision was justified. This excluded twenty-five responses. Third, we excluded extreme outliers with respect to time on task. Our goal was to have these tasks completed in a single uninterrupted sitting; we wanted to exclude users who became distracted with other things on their computer or device and came back to the task much later, when it was no longer fresh, as this would significantly overestimate the time needed. This excluded one response that took over nine hours to complete the survey.
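The three exclusion rules can be expressed as a single filtering pass. The sketch below is illustrative rather than the code used in the study: the field names and sample responses are invented, and the one-hour cutoff is an assumed value chosen for the example, since no specific threshold is stated above.

```javascript
// Apply the three exclusion rules in order and count how many responses
// each rule removes.
function cleanResponses(responses, maxSeconds) {
  const kept = [];
  const dropped = { unfinished: 0, noEngagement: 0, timeOutlier: 0 };
  for (const resp of responses) {
    if (!resp.finished) {
      dropped.unfinished++; // never submitted the final screen
    } else if (resp.pathLength <= 1) {
      dropped.noEngagement++; // never navigated beyond the landing page
    } else if (resp.secondsOnTask > maxSeconds) {
      dropped.timeOutlier++; // extreme time on task
    } else {
      kept.push(resp);
    }
  }
  return { kept, dropped };
}

// Invented responses, one per situation described above.
const responses = [
  { r: "R_1", finished: true, pathLength: 3, secondsOnTask: 37 },
  { r: "R_2", finished: false, pathLength: 2, secondsOnTask: 20 },
  { r: "R_3", finished: true, pathLength: 1, secondsOnTask: 5 },
  { r: "R_4", finished: true, pathLength: 4, secondsOnTask: 33000 },
  { r: "R_5", finished: true, pathLength: 2, secondsOnTask: 18 },
];

const ASSUMED_MAX_SECONDS = 3600; // hypothetical cutoff, not from the study
const cleaned = cleanResponses(responses, ASSUMED_MAX_SECONDS);
console.log(cleaned.kept.map((resp) => resp.r), cleaned.dropped);
```

Counting the exclusions by rule, as here, makes it straightforward to report how many responses each criterion removed.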

Out of 130 valid respondents, 67 had been randomly assigned by Qualtrics to interact with the prototype featuring floor maps under “About,” while 63 had been assigned to the prototype featuring floor maps under “Rooms & Spaces.” These sample sizes are significantly higher than anything we could have achieved with in-person usability testing, which typically relies on as few as five and as many as twenty users (Beck & Manuel, 2008). Small sample sizes in usability testing are justifiable but only for problem discovery; they are too underpowered to support conclusions about which of several interfaces performs best according to some metric (Nielsen, 2006).

The data for each condition are summarized in the table below. Only 31 percent of respondents ended the task on the correct page when floor maps were featured under the “About” page. A large number instead navigated to the page on group study rooms, which is a comprehensive source of information about our rooms, except that it lacks a map of their location in the building. Since room numbers are listed on this page, it is at least arguable whether it provides the necessary information to complete the original task, so the 31 percent may be an underestimate. By contrast, 83 percent of respondents correctly navigated to “Rooms & Spaces” in the other prototype website. This was clear evidence that users expected to find information about locations inside the building on this page. Not surprisingly, users perceived the task as easier when they found the information available on this page.

Table 1. Survey results.

| Floor maps under | Percent rated somewhat or very easy | Percent successfully completed task | Median time on task (in seconds) | Median path length (in page loads) |
| --- | --- | --- | --- | --- |
| “About” | 62 | 31 | 37 | 3 |
| “Rooms & Spaces” | 93 | 83 | 18 | 2 |

The other metrics we collected, time on task and path length, showed significant differences between the two conditions. When floor maps were featured under the “About” page, respondents took longer to complete the task and required browsing through more pages. These two metrics are represented as histograms in figures 4–7. Note that they are right-skewed, meaning most of the data points are at lower values, unlike a bell-shaped normal distribution. To determine if the two groups were statistically different, we used a Mann-Whitney test, which does not assume that the data come from a normal distribution. This test confirmed that the differences between the two conditions were statistically significant across both metrics.
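To show what the Mann-Whitney test computes, here is a minimal sketch using midranks for ties and a normal approximation for the two-sided p-value. It is an illustration, not the analysis code from the study (which was done in R), and the sample values are invented. Two simplifications are assumptions: no continuity correction is applied, and the error function uses a standard polynomial approximation. The normal approximation is crude for groups as small as the ones in this example and is better suited to groups of the size reported above.

```javascript
// Approximate erf(x) (Abramowitz and Stegun formula 7.1.26, error < 1.5e-7).
function erf(x) {
  const sign = x < 0 ? -1 : 1;
  const ax = Math.abs(x);
  const t = 1 / (1 + 0.3275911 * ax);
  const poly =
    t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
    t * (-1.453152027 + t * 1.061405429))));
  return sign * (1 - poly * Math.exp(-ax * ax));
}

// Standard normal cumulative distribution function.
function normalCdf(z) {
  return 0.5 * (1 + erf(z / Math.SQRT2));
}

function mannWhitney(groupA, groupB) {
  // Pool both groups, remembering which group each value came from.
  const pooled = groupA.map((v) => ({ v, fromA: true }))
    .concat(groupB.map((v) => ({ v, fromA: false })));
  pooled.sort((x, y) => x.v - y.v);

  // Assign midranks: tied values share the average of their positions.
  let rankSumA = 0;
  let tieTerm = 0; // sum of (t^3 - t) over each run of t tied values
  let i = 0;
  while (i < pooled.length) {
    let j = i;
    while (j + 1 < pooled.length && pooled[j + 1].v === pooled[i].v) j++;
    const t = j - i + 1;
    const rank = (i + 1 + j + 1) / 2; // mean of 1-based positions i+1..j+1
    for (let k = i; k <= j; k++) {
      if (pooled[k].fromA) rankSumA += rank;
    }
    tieTerm += t * t * t - t;
    i = j + 1;
  }

  const n1 = groupA.length;
  const n2 = groupB.length;
  const n = n1 + n2;
  const u1 = rankSumA - (n1 * (n1 + 1)) / 2;
  const u2 = n1 * n2 - u1;

  // Normal approximation with a correction for ties.
  const mean = (n1 * n2) / 2;
  const sd = Math.sqrt(((n1 * n2) / 12) * (n + 1 - tieTerm / (n * (n - 1))));
  const z = (u1 - mean) / sd;
  const p = 2 * (1 - normalCdf(Math.abs(z)));
  return { u1, u2, z, p };
}

// Invented times on task (in seconds) for two small groups.
const aboutTimes = [37, 45, 52, 60];
const roomsTimes = [18, 20, 25, 30];
const result = mannWhitney(aboutTimes, roomsTimes);
console.log(result);
```

Because the test works on ranks rather than raw values, a single very long session moves the result no more than any other largest value would, which suits right-skewed measures such as time on task.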

Figure 4. Histogram of time on task, “About » Floor Maps” version.
Figure 5. Histogram of time on task, “Rooms & Spaces” version.
Figure 6. Histogram of path length, “About » Floor Maps” version.
Figure 7. Histogram of path length, “Rooms & Spaces” version.

Conclusions

A/B testing allowed our library to confirm that a design choice, implemented with less than desirable forethought, was not a major source of confusion to our users. This would have been difficult to determine using any other single means. A/B testing was the most efficient methodology because two interfaces immediately suggested themselves given the context of the problem. Furthermore, with sample sizes of more than sixty per interface, we were able to make statistically valid conclusions, something that would be impossible if we were only able to recruit half a dozen users as is typical in usability testing scenarios. This is an important consideration for librarians and information professionals who often rely on small sample sizes to arrive at their conclusions. As Laubheimer (2018) points out, “Numeric data from five users should not inform design decisions, and reporting numbers collected with such a small sample is highly misleading.”

By the same token, A/B testing, like any other research methodology, should not be relied upon in isolation but should be triangulated with other approaches. As an example, the open-ended, free-text responses that we collected as the final survey question, while very helpful, do not come close to capturing the nuanced reactions that are observed in a face-to-face usability test, nor could we probe with follow-up questions to elicit further thoughts and feelings. Others may wish to extend this method by soliciting more data from the respondent, such as class year, major, and degree of familiarity with the library website. That data was not analyzed in our test and may have been relevant. Perhaps heavy users of the library website expected to find floor maps of the building in its “old” spot to a greater degree than light users. It may also be worthwhile to compare the effectiveness of live A/B testing to prototype A/B testing.

References

  • Beck, S. E., & Manuel, K. (2008). Practical research methods for librarians and information professionals. New York: Neal-Schuman.
  • Diaz, C. (2018). Using static site generators for scholarly publications and open educational resources. Code4Lib Journal, 42. Retrieved from https://journal.code4lib.org/articles/13861
  • Dougan, K., & Fulton, C. (2009). Side by side: What a comparative usability study told us about a web site redesign. Journal of Web Librarianship, 3(3), 217–237. doi:10.1080/19322900903113407
  • Ellis, S., & Callahan, M. (2012). Prototyping as a process for improved user experience with library and archives websites. Code4Lib Journal, 18. Retrieved from https://journal.code4lib.org/articles/7394
  • Frome, N., & Cohn, S. (2015, February). How split testing validates new product concepts without code. UX Magazine. Retrieved from http://uxmag.com/articles/how-split-testing-validates-new-product-concepts-without-code
  • Halpern, R., Eaker, C., Jackson, J., & Bouquin, D. (2015). #DitchTheSurvey: Expanding methodological diversity in LIS research. In the Library with the Lead Pipe. Retrieved from http://www.inthelibrarywiththeleadpipe.org/2015/ditchthesurvey-expanding-methodological-diversity-in-lis-research/
  • Laubheimer, P. (2018). Beyond the NPS: Measuring perceived usability with the SUS, NASA-TLX, and the Single Ease Question after tasks and usability tests. Retrieved from https://www.nngroup.com/articles/measuring-perceived-usability/
  • Newson, K. (2017). Tools and workflows for collaborating on static web projects. Code4Lib Journal, 38. Retrieved from https://journal.code4lib.org/articles/12779
  • Nielsen, J. (2006). Quantitative studies: How many users to test? Retrieved from https://www.nngroup.com/articles/quantitative-studies-how-many-users/
  • Obradovich, A., Canuel, R., & Duffy, E. P. (2015). A survey of online library tutorials: Guiding instructional video creation to use in flipped classrooms. Journal of Academic Librarianship, 41(6), 751–757. doi:10.1016/j.acalib.2015.08.006
  • Pernice, K. (2016). UX prototypes: Low fidelity vs. high fidelity. Retrieved from https://www.nngroup.com/articles/ux-prototype-hi-lo-fidelity/
  • Reynolds, E. (2008). The secret to patron-centered web design: Cheap, easy, and powerful usability techniques. Computers in Libraries, 28(6), 6–47.
  • Swanson, T. A., Hayes, T., Kolan, J., Hand, K., & Miller, S. (2017). Guiding choices: Implementing a library website usability study. Reference Services Review, 45(3), 359–367. doi:10.1108/RSR-11-2016-0080
  • Symonds, E. (2011). A practical application of SurveyMonkey as a remote usability-testing tool. Library Hi Tech, 29(3), 436–445. doi:10.1108/07378831111174404
  • Young, S. W. H. (2014). Improving library user experience with A/B testing: Principles and process. Weave: Journal of Library User Experience, 1(1). doi:10.3998/weave.12535642.0001.101

Appendix

Qualtrics has an internal ResponseID (rID) variable that it associates with each survey response. This identifier is both unique and anonymous, making it an ideal primary key to join together the Qualtrics and server log data. The rID was simply “passed” to the respondent’s browsing session through a query string and subsequently “passed around” as they navigated the prototype website. A query string is a part of a URL that contains extra information in the form of field-value pairs. It would therefore be captured in the resource field of the server log. We designated a field called r that held the respondent’s rID and question number in the survey concatenated together with an underscore. For example, if a respondent’s rID was R_123 and she were interacting with a prototype on question 1, r would be set to R_123_Q1. This field was initially set directly in the src attribute of the iframe. In Qualtrics, the HTML source of the iframe would appear as

<iframe src="https://abtest.library.appstate.edu/var-a/?r=${e://Field/ResponseID}_Q1"></iframe>
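Going the other way, the r field can be pulled back out of a logged resource and split into its two parts when joining the log to the survey data. The following sketch is illustrative only; the function name is hypothetical, and it assumes the question label itself contains no underscore, so that the last underscore marks the boundary.

```javascript
// Split a logged resource such as "/var-a/about/?r=R_123_Q1" into the page
// requested, the Qualtrics ResponseID, and the question label.
function parseTrackingField(resource) {
  // The URL class needs an absolute URL, so a base is supplied for parsing.
  const url = new URL(resource, "https://abtest.library.appstate.edu");
  const r = url.searchParams.get("r");
  if (r === null) return null; // request carried no tracking field

  // Split on the LAST underscore because the ResponseID itself contains one.
  const cut = r.lastIndexOf("_");
  if (cut === -1) return null;
  return {
    page: url.pathname,
    responseId: r.slice(0, cut),
    question: r.slice(cut + 1),
  };
}

const tracked = parseTrackingField("/var-a/about/?r=R_123_Q1");
console.log(tracked);
```

The responseId value is what would be matched against the ResponseID column of the exported survey data.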

We also needed a way to carry this information across a session as the respondent clicked on links to other pages; otherwise, the query string would not persist and the trail would be lost. This was accomplished with some custom JavaScript coding that altered all links to include the field-value pair currently in the URL. For example, if the URL were currently https://abtest.library.appstate.edu/var-a/about/?r=R_123_Q1 , then all links on the page would have “r=R_123_Q1” appended to the href attribute’s value.

/**
* Code created with the help of Stack Overflow question
* https://stackoverflow.com/questions/901115/how-can-i-get-query-string-values-in-javascript
* Question by Deleplace:
* https://stackoverflow.com/users/871134/deleplace
* Answer by Code Spy:
* https://stackoverflow.com/users/1045296/code-spy
*/

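// Return the value of the query-string field `name` from the current URL,
// '' if the field is present without a value, or null if it is absent.
function getParameterByName(name) {
  var url = window.location.href;
  // Escape square brackets so they are matched literally in the regex.
  name = name.replace(/[\[\]]/g, "\\$&");
  var regex = new RegExp("[?&]" + name + "(=([^&#]*)|&|#|$)"),
    results = regex.exec(url);
  if (!results) return null;
  if (!results[2]) return '';
  // Treat "+" as an encoded space, then decode percent-escapes.
  return decodeURIComponent(results[2].replace(/\+/g, " "));
}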

// escape text before inserting it into innerHTML
function escapeHTML(text) {
  var div = document.createElement("div");
  div.textContent = text;
  return div.innerHTML;
}

function trackRespondentID() {
  var serverName = "https://abtest.library.appstate.edu";
  var rValue = getParameterByName("r");
  var anchors = document.getElementsByTagName("a");
  for (var i = 0; i < anchors.length; i++) {
    var isInternal = anchors[i].href.substr(0, serverName.length) == serverName;
    if (isInternal && !anchors[i].hasAttribute("role")) {
      // internal link: add the r field before any #fragment, using "&" if a query string already exists
      var parts = anchors[i].href.split("#");
      parts[0] += (parts[0].indexOf("?") == -1 ? "?" : "&") + "r=" + encodeURIComponent(rValue);
      anchors[i].href = parts.join("#");
    } else if (!isInternal) {
      // external link: send the respondent to the placeholder “404” page instead
      anchors[i].href = "{{ site.url }}/404/?linkClicked=" + encodeURIComponent(anchors[i].textContent) + "&r=" + encodeURIComponent(rValue);
    }
  }
  var forms = document.getElementsByTagName("form");
  for (var j = 0; j < forms.length; j++) {
    if (forms[j].action.substr(0, serverName.length) == serverName) {
      // internal form: submit the r field as a hidden input
      var input = document.createElement("input");
      input.type = "hidden";
      input.name = "r";
      input.value = rValue;
      forms[j].appendChild(input);
    }
  }
  // on the “404” page, include an empty element with an ID of “msg”
  if (getParameterByName("tabClicked") != null) {
    document.getElementById("msg").innerHTML = "Well, we’ll just have to <em>pretend</em> that you did a search for “" + escapeHTML(getParameterByName("q")) + "” in a form named " + escapeHTML(getParameterByName("tabClicked")) + ".";
  }
  if (getParameterByName("linkClicked") != null) {
    document.getElementById("msg").innerHTML = "Well, we’ll just have to <em>pretend</em> that you clicked a link called “" + escapeHTML(getParameterByName("linkClicked")) + ".”";
  }
}

document.addEventListener("DOMContentLoaded", function (event) {
  if (getParameterByName("r") != null) {
    trackRespondentID();
  }
});
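
The link-rewriting step in trackRespondentID can also be written as a small pure function built on the standard URL interface, which makes it easy to test outside the browser. The following is a minimal sketch rather than the code used in the study; withRespondentField is a hypothetical name, and the sketch assumes an environment that provides the URL global (current browsers and Node.js).

```javascript
// Hypothetical helper (not part of the original study code).
// Returns href with the r field added, or overwritten if already present.
function withRespondentField(href, rValue) {
  // new URL() throws on relative or empty strings; the href property of an
  // anchor that has an href attribute is already an absolute URL.
  var url = new URL(href);
  // set() keeps other query fields and leaves any #fragment in place.
  url.searchParams.set("r", rValue);
  return url.toString();
}

console.log(withRespondentField(
  "https://abtest.library.appstate.edu/var-a/about/", "R_123_Q1"));
// https://abtest.library.appstate.edu/var-a/about/?r=R_123_Q1

console.log(withRespondentField(
  "https://abtest.library.appstate.edu/var-a/?page=2#top", "R_123_Q1"));
// https://abtest.library.appstate.edu/var-a/?page=2&r=R_123_Q1#top
```

Because the function neither reads nor writes the DOM, it could be called from the anchor loop while remaining checkable on its own.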

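The following R code shows one way to merge data from a Qualtrics CSV export with server log data. The column names assigned in the code (including http_referer and http_user_agent) correspond to the Combined Log Format, which extends the Common Log Format with referrer and user-agent fields.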

# read in data from csv files
qualtrics.df <- read.csv("survey_data.csv", na.strings = "")[-1:-2, ]
server.log.df <- read.csv("server_log_data.csv", header = FALSE, sep = " ", stringsAsFactors = FALSE)
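# rejoin the timestamp and UTC offset, then convert to POSIXct
# (POSIXct is a plain numeric vector, which is safer than POSIXlt as a data frame column)
server.log.df$V4 <- paste0(server.log.df$V4, server.log.df$V5)
server.log.df$V4 <- as.POSIXct(server.log.df$V4, format = "[%d/%b/%Y:%H:%M:%S%z]")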
# filter data containing relevant ResponseIDs
respondent.log <- server.log.df[grep(paste0(qualtrics.df$ResponseId, collapse = "|"), server.log.df$V6), ]
# filter just the resource requested
requests <- lapply(respondent.log$V6, strsplit, split = " ", fixed = TRUE)
respondent.log$resource <- sapply(unlist(requests, recursive = FALSE), "[[", 2)
# clean up column names
respondent.log[c(2, 3, 5, 6)] <- list(NULL)
colnames(respondent.log) <- c("remote_addr", "time_local", "status", "body_bytes_sent", "http_referer", "http_user_agent", "resource")
# list all combinations of ResponseID and questions of interest
expanded.df <- expand.grid(ResponseId = qualtrics.df$ResponseId, task = colnames(qualtrics.df)[5:10])
# create columns for metrics and fill with NAs
expanded.df[, "path_as_list"] <- NA
expanded.df[, "path_length"] <- NA
expanded.df[, "time_on_task"] <- NA
for (row in rownames(expanded.df)) {
  # if the respondent-question pair exists
  task_set <- grep(sprintf("%s_%s", expanded.df[row, 1], expanded.df[row, 2]), respondent.log$resource, fixed = TRUE)
  if (length(task_set)) {
   expanded.df[row, "path_as_list"][[1]] <- list(respondent.log[task_set, "resource"])
   # path_length is calculated by counting the total number of pages loaded
   expanded.df[row, "path_length"] <- length(task_set)
   # time_on_task is calculated by subtracting the date-time of the first page load from the date-time of the last page load
   expanded.df[row, "time_on_task"] <- as.numeric(difftime(respondent.log[tail(task_set, n = 1), "time_local"], respondent.log[task_set[1], "time_local"], units = "secs"))
  }
}
# remove all rows with NA
expanded.df <- na.omit(expanded.df)
# join the data frames on ResponseID
merged.df <- merge(qualtrics.df, expanded.df)
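
For readers who want to check the two log-derived metrics (path_length and time_on_task) on a handful of log lines, the same calculation can be sketched in JavaScript. This is an illustration under stated assumptions, not the code used in the study: parseLogLine, taskMetrics, and sampleLines are hypothetical names, the log lines are made up, the log is assumed to follow the space-delimited layout that the R code reads, and r is matched exactly rather than as a substring.

```javascript
// Hypothetical sketch (not part of the original study code).
var MONTHS = { Jan: 0, Feb: 1, Mar: 2, Apr: 3, May: 4, Jun: 5,
               Jul: 6, Aug: 7, Sep: 8, Oct: 9, Nov: 10, Dec: 11 };

// Extract the timestamp (ms since the epoch), the resource, and the r field.
function parseLogLine(line) {
  var m = /^(\S+) \S+ \S+ \[(\d{2})\/(\w{3})\/(\d{4}):(\d{2}):(\d{2}):(\d{2}) ([+-])(\d{2})(\d{2})\] "\S+ (\S+)[^"]*"/.exec(line);
  if (!m) return null;
  // Convert the local time plus UTC offset to a single UTC instant.
  var offsetMin = (m[8] === "-" ? -1 : 1) * (Number(m[9]) * 60 + Number(m[10]));
  var time = Date.UTC(Number(m[4]), MONTHS[m[3]], Number(m[2]),
                      Number(m[5]), Number(m[6]), Number(m[7])) - offsetMin * 60000;
  var r = /[?&]r=([^&#\s]+)/.exec(m[11]);
  return { time: time, resource: m[11], r: r ? r[1] : null };
}

// path length = number of page loads for the task;
// time on task = seconds between the first and last page load.
function taskMetrics(lines, rValue) {
  var hits = lines.map(parseLogLine).filter(function (h) {
    return h !== null && h.r === rValue;
  });
  if (hits.length === 0) return null;
  return {
    pathLength: hits.length,
    timeOnTask: (hits[hits.length - 1].time - hits[0].time) / 1000
  };
}

// Made-up log lines for illustration only.
var sampleLines = [
  '203.0.113.5 - - [10/Oct/2017:13:55:36 -0400] "GET /var-a/?r=R_123_Q1 HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
  '203.0.113.5 - - [10/Oct/2017:13:55:50 -0400] "GET /var-a/about/?r=R_123_Q1 HTTP/1.1" 200 1500 "-" "Mozilla/5.0"',
  '203.0.113.9 - - [10/Oct/2017:13:55:58 -0400] "GET /var-b/?r=R_456_Q1 HTTP/1.1" 200 2310 "-" "Mozilla/5.0"',
  '203.0.113.5 - - [10/Oct/2017:13:56:06 -0400] "GET /var-a/hours/?r=R_123_Q1 HTTP/1.1" 200 900 "-" "Mozilla/5.0"'
];

console.log(taskMetrics(sampleLines, "R_123_Q1"));
// { pathLength: 3, timeOnTask: 30 }
```

As in the R code, a task with a single page load yields a time on task of zero, because the log records when pages were requested, not when the respondent left the last one.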