Refutation of Data Reconciliation Claims by HBS

In an earlier post, I refuted Data Colada’s critique of the PNAS paper by demonstrating that:

  1. Data Colada cherry-picked the data it chose to include in its analysis,

  2. Data Colada excluded a third condition in the study,

  3. Data Colada excluded 2 of the 3 dependent variables in its analysis, and

  4. Data Colada completely misrepresented the Excel calcChain file to buttress its claims.

Together, this allowed Data Colada to create an enormously misleading impression about my work. Importantly, I also demonstrated that if one were to exclude all of the data observations that Data Colada claimed should have been considered “suspicious,” the findings of the study still hold, suggesting there was no motivation for me to manipulate data.

In a more recent post, Data Colada doubled down on their claims, arguing that HBS’ investigation found additional evidence of data manipulation in this particular study.

What is this additional evidence?

In a nutshell, HBS claimed the following:

  1. HBS claimed that it was able to obtain the original (unpublished) dataset for the study in question.

  2. HBS claimed that, with the help of a forensics firm, it was able to compare this original dataset to the dataset published on OSF.

  3. HBS claimed that this comparison yielded significant discrepancies.

  4. HBS claimed that these discrepancies were evidence of fraud on my part.

Now that I’ve been able to enlist the help of my own forensics team, I have been able to conduct my own investigation into this matter. In the analysis below, I offer my findings and refute each of the claims put forth by HBS.

Studies like this one often involve multiple dataset files.

To obtain what it saw as “original” (unpublished) data for this study, HBS contacted the lab manager at UNC (identified here as JF) and asked her to share materials from the study.

Before I proceed, it’s important to repeat a point I made in an earlier post: the original study was actually conducted using paper, so the phrase “original data” is incorrect. So far as I know, the original paper records no longer exist. What does exist are the spreadsheets that were later created by entering paper-based data into a computer. 

It is also important to note that there were actually multiple versions of these spreadsheets created during this time period. Why? As anyone who conducts these types of studies understands, once a spreadsheet is created, it may be modified for any number of benign reasons, including:

  • New data may be added to the dataset as new subjects are run through the experiment.

  • Existing data may be eliminated from the dataset if it is determined a subject was non-compliant for some reason.

  • Data may be changed as obvious errors in the dataset are corrected.

With multiple versions of the files, HBS needed to figure out which dataset was the “right” dataset to use. HBS received three files from JF: A version dated July 13, a version dated July 16, and a version dated July 27. As far as I know, these are the only files HBS considered in evaluating “original” data for this study.

NOTE: This was one of many errors HBS made in this investigation: It failed to conduct a comprehensive search for all available versions of the data. If HBS had done a more thorough investigation, it would have considered a fourth file in its investigation, a file that also had a date stamp of July 16. I will not go into the details of why HBS failed to include this fourth file in its investigation, but suffice it to say that it is part of the mountain of evidence my team is amassing about HBS’ mishandling of this matter.

HBS hired a forensics firm (Maidstone Consulting Group) to analyze this data, and Maidstone concluded that the July 16 version was the one to use in its analysis because it contained all of the July 13 data and was itself contained within the July 27 dataset. I will refer to this file as the “July 16 HBS version” of the dataset.
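To make that containment reasoning concrete, here is a minimal sketch (in Python, using pandas) of the kind of row-containment check such a conclusion implies. The file names and the “subject_id” key column are hypothetical placeholders, not the actual files or column names, and this is not a description of how Maidstone actually performed its analysis.

```python
# A minimal sketch of a row-containment check between spreadsheet versions.
# File names and the "subject_id" key column are hypothetical placeholders.
import pandas as pd

def is_contained(smaller_path: str, larger_path: str, key: str = "subject_id") -> bool:
    """True if every keyed row in the smaller file also appears in the larger file."""
    small = pd.read_excel(smaller_path)
    large = pd.read_excel(larger_path)
    return set(small[key]).issubset(set(large[key]))

# Does the July 16 file include the July 13 data, and is it in turn included
# in the July 27 file?
print(is_contained("july13.xlsx", "july16.xlsx"))
print(is_contained("july16.xlsx", "july27.xlsx"))
```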

Meanwhile, I will refer to the fourth file as the “July 16 OG version.” Here is a comparison of the metadata from these four files:
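For readers who want to inspect this kind of file metadata themselves, here is a minimal sketch of how the embedded creation and last-modified properties of .xlsx files can be read programmatically. The file names are placeholders, offered only as an illustration of the kind of comparison involved.

```python
# A minimal sketch: read the embedded document properties and the filesystem
# "last modified" timestamp for each file. File names are placeholders.
import os
from datetime import datetime
from openpyxl import load_workbook

for path in ["july13.xlsx", "july16_hbs.xlsx", "july16_og.xlsx", "july27.xlsx"]:
    props = load_workbook(path, read_only=True).properties
    fs_modified = datetime.fromtimestamp(os.path.getmtime(path))
    print(path, props.created, props.modified, props.lastModifiedBy, fs_modified)
```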

HBS used the wrong dataset

It took a while for my team and me to unpack the evidence in this case, but our unequivocal conclusion is that HBS used the wrong dataset in its investigation. They used the July 16 HBS version, when they should have used the July 16 OG version.

Before I explain how we came to this conclusion, let me note that I do not believe Maidstone is to blame here. On the contrary—and this is a significant detail—Maidstone repeatedly expressed their lack of certainty that the July 16 HBS version was the dataset they should be relying on for their analysis. Their report included caveats about this throughout, noting that there was reason to believe there might be additional datasets they were not privy to, and that it was therefore possible they were looking at the wrong file. Maidstone also explicitly noted that it was relying on “the Client’s description of provenance” (the client being HBS) in its analysis, in a clear attempt to distance itself from the possibility that the July 16 HBS version was not the correct dataset to be using.

As I said, I believe the July 16 OG file is the version that HBS should have used in its investigation. This file has the following characteristics: 

  1. It contains all of the data from the July 13 version, except for some data that appears to have been discarded or corrected.

  2. It contains some additional data that is not included in the July 13 version.

  3. It is completely consistent with the data posted on OSF, with a few exceptions (additional analysis entries with summary statistics).

The following sections explore each of these characteristics.

The July 16 OG file contains all the data from the July 13 file, except for some data that appears to have been discarded or corrected. Why was any data discarded or corrected?

In studies such as this one, there are multiple valid reasons why a researcher might decide to exclude data from the final analysis: perhaps the participant did not complete the study in its entirety, perhaps the participant did not follow proper instructions, perhaps the participant committed an obvious, major error, etc. Without specific notes or records of the conversations that took place at that time (remember, this was 13 years ago), it is hard to be specific about the details, but there is no doubt in my mind that my RAs and I created criteria for excluding participants we deemed to be non-compliant. As anyone who conducts studies of this nature knows, engaging in this kind of data hygiene is considered perfectly acceptable and consistent with research norms in my field.

Recall that the study in question required participants to complete 20 math puzzles within 5 minutes, and then offered them $1 for each puzzle they reported to have solved correctly. Here are two examples of the discrepancies between the July 13 file and the July 16 OG file:

  • One participant’s data was discarded because they reported completing one puzzle out of 20 within the allotted five minutes. It is rare that a participant would have been able to complete only one puzzle in this time frame, suggesting they may not have given the study their full attention. This is something the RA running the study would have made a note of.

  • One participant’s data was discarded because they reported solving all twenty puzzles, which is also implausible in a study like this one. In other studies using the same task, there have been participants who simply report having solved everything while doing no actual work, which is something that the RA running the study would have made a note of. 

As for corrections, a careful review of the data suggests legitimate reasons for each change based on apparent data entry errors (a number entered twice by mistake, a number confused with a similar-looking one). In Appendix A (coming soon), I will be posting a table with each discrepancy, along with an explanation.

The July 13 dataset included 50 participants. By comparison, in the July 16 OG file, two of those participants were dropped, both for legitimate reasons. Overall, the remaining 48 participants generated 816 different data entries (cells) relevant to the study. A total of 35 (4.1% of all entries) were corrected for the July 16 OG file. 

As this comparison demonstrates, any discarding/correcting of the data that occurred in this study was well within conventional research norms.
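For completeness, here is a minimal sketch of how a cell-level tally like the one above could be produced. The file names, the “subject_id” key, and the matching logic are assumptions made for illustration; they are not a description of how my team actually performed the comparison.

```python
# A minimal sketch: count cells that differ between two versions of the dataset,
# restricted to participants and columns present in both files.
import pandas as pd

def count_changed_cells(old_path: str, new_path: str, key: str = "subject_id") -> int:
    old = pd.read_excel(old_path).set_index(key)
    new = pd.read_excel(new_path).set_index(key)
    rows = old.index.intersection(new.index)
    cols = old.columns.intersection(new.columns)
    a, b = old.loc[rows, cols], new.loc[rows, cols]
    # A cell counts as changed if the values differ and they are not both empty.
    changed = (a != b) & ~(a.isna() & b.isna())
    return int(changed.to_numpy().sum())

print(count_changed_cells("july13.xlsx", "july16_og.xlsx"))
```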

My investigation also uncovered evidence that this study faced an unusual number of subject errors that needed to be addressed. Consider the email JF sent me on July 13, 2010 (an email that JF shared with the HBS investigating committee). What this email indicates is that the participants in this study were particularly prone to making errors or engaging in behavior that may have disqualified them from analysis, to the point where JF felt compelled to call it out in her email to me.

The July 16 OG file also contains some additional data not included in the July 13 file. What is this additional data?

The additional rows represent additional data from the experimental sessions conducted between July 13 and July 16.

The July 16 OG file is completely consistent with the OSF dataset, with a few exceptions. What are the exceptions?

The OSF dataset includes some additional columns containing further computations and formulas. That’s the only difference between these two files.
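Here is a minimal sketch of the consistency check this describes: the two files should agree on every column they share, with the OSF file allowed to carry the extra analysis columns. The file names and the sort key are hypothetical placeholders.

```python
# A minimal sketch: verify that two files match on all columns they have in common,
# ignoring any extra columns present in only one of them.
import pandas as pd

def shared_columns_match(path_a: str, path_b: str, key: str = "subject_id") -> bool:
    a = pd.read_excel(path_a)
    b = pd.read_excel(path_b)
    shared = [c for c in a.columns if c in b.columns]
    a_shared = a[shared].sort_values(key).reset_index(drop=True)
    b_shared = b[shared].sort_values(key).reset_index(drop=True)
    return a_shared.equals(b_shared)

print(shared_columns_match("july16_og.xlsx", "osf_published.xlsx"))
```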

So what is the evidence that the July 16 OG file is actually the correct one?

This is where it gets interesting. As I dug into the evidence in this case and started to pore over these different dataset files, I soon became 100% convinced that what I’m calling the July 16 OG file is, in fact, the file HBS should have used in its investigation. How did I know this with such certainty? Because in my heart, I know I did not commit fraud.

But I still had to figure out a way to prove it. This kept me up at night for weeks… until I remembered something important:

At UNC, we paid subjects for their participation in our studies. We did not pay participants if we considered them to be non-compliant.

Moreover, in order for a faculty member such as myself to receive reimbursement for these payments, UNC required careful documentation of these payments—including subject names, addresses, ID numbers, amount paid, and so on.

It is with immense relief and satisfaction that I can now share with you the fact that I have been able to locate the receipts for this study. Here’s a sample showing the format of these receipts. It took me a while to reconstruct all of these payments properly because the receipt sheets I kept from that time period included multiple studies I was running at the same time. Still, by cross-matching data (amount paid, gender, etc.) I have been able to establish that the July 16 OG version – which is consistent with the OSF published data – is the dataset that most accurately reflects the participants who were paid for their participation in the study. In other words, this dataset most accurately reflects the participants who legitimately complied with, and completed, the study. 

By contrast, the July 16 HBS version of the dataset does not jibe with the payment data. For instance, according to the July 16 HBS version, three participants should have received $7, but the subject receipts indicate that only two subjects were paid $7. Similarly, the July 16 HBS version says 14 subjects were paid $16, but the payment receipts say only 11 were paid that amount. In contrast, the July 16 OG file gets both those numbers exactly right.
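Here is a minimal sketch of the payment cross-check just described: tally how many subjects each file says were paid each amount, and compare those tallies against the receipts. The file names and column names are hypothetical placeholders, and the receipts file stands in for counts that were actually compiled by hand from the paper receipt sheets.

```python
# A minimal sketch: compare the distribution of payment amounts across the two
# July 16 versions and the payment receipts. File and column names are placeholders.
import pandas as pd

def payout_counts(path: str, pay_col: str = "amount_paid") -> pd.Series:
    """How many subjects received each payment amount, according to one file."""
    return pd.read_excel(path)[pay_col].value_counts().sort_index()

receipts = payout_counts("payment_receipts.xlsx")
hbs_version = payout_counts("july16_hbs.xlsx")
og_version = payout_counts("july16_og.xlsx")

# Rows where the HBS version disagrees with the receipts while the OG version
# matches them are the kinds of discrepancies discussed above.
comparison = pd.concat(
    {"receipts": receipts, "july16_hbs": hbs_version, "july16_og": og_version},
    axis=1,
).fillna(0).astype(int)
print(comparison)
```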

To repeat: I now have demonstrable proof that the file that HBS used in its investigation was an illegitimate dataset. That’s right, HBS neglected to use the correct dataset. The correct dataset—the July 16 OG file—is 100% consistent with the OSF dataset.

This is a critical piece of evidence that my team plans to reveal, in its entirety, when the time is right.


Lingering Questions

I realize there may be a few additional lingering questions about this study, which I will attempt to answer here.

If the July 16 OG file is the legitimate dataset, why did the July 16 HBS Version even exist?

As I described earlier, when studies like this are run, there are often multiple versions of spreadsheets circulating among the RAs and principal investigators as part of the legitimate process of correcting, cleaning and consolidating data. I don’t have a precise recollection about this specific file, but it seems entirely possible that the July 16 HBS version was simply one that did not have the appropriate corrections.

 

So does that mean that readers of this post should simply ignore the significant discrepancies the HBS investigative team found between the July 16 HBS version and the OSF published dataset? 

The simple answer to this question is yes, but I realize there may still be skeptics out there who have trouble disregarding these discrepancies.

To address those skeptics, I’ve conducted an in-depth analysis of these discrepancies, which I am including in Appendix B (coming soon). This analysis demonstrates that even these discrepancies are not incriminating upon close inspection. Even if you were to cling to the misguided notion that the July 16 HBS version is the “correct” one to be using in comparison with the OSF dataset, you would discover—upon rigorous examination of each and every discrepancy—that they are far less suspicious than they have been portrayed.

Let me give you a sample of what I’m talking about:

(As a reminder, the study in question required participants to complete 20 math puzzles within 5 minutes, and then offered them $1 for each puzzle they reported to have solved correctly.) In one of the observations, a participant reported completing 25 puzzles when the maximum was 20 (clearly an error). In another observation, a participant reported solving fewer puzzles than they actually solved (clearly an error). Again, in Appendix B, I go through every single one of the discrepancies and offer an explanation for why somebody – most likely an RA – concluded there was a legitimate reason to correct the data.

At the same time, it is also important to reiterate: The dataset associated with this file—the July 16 HBS Version—is not the one that was ultimately used for the analysis in the PNAS paper. So it is perhaps not surprising that the dataset is imperfect or flawed – we did not use it.

 

What about the July 27 version of the dataset? Why did this version exist?

I don’t quite know the purpose of this file. For one, it lived on JF’s computer, not mine. However, a close look at the metadata for this file shows that the “last modified” date stamp on it is 2022. (The other files have date stamps from 2010.) So perhaps this was a copy JF made of an older file on her laptop? I simply do not know.

A final aside

The Data Colada - HBS linkage has become a source of dark comedy among my team. If you read the Data Colada posts carefully, you will see that time and time again, whenever they need to bolster their argument and strengthen their persuasive impact, they will make reference to HBS’s investigation, which they imply was much more comprehensive than anything they are showing their readers. Meanwhile, if you read HBS’s investigative report, you will see error-after-error-after-error in its handling of this matter.