# Stakes, Scales, and Skepticism

## Abstract

There is conflicting experimental evidence about whether the “stakes” or importance of being wrong affect judgments about whether a subject knows a proposition. To date, judgments about stakes effects on knowledge have been investigated using binary paradigms: responses to “low” stakes cases are compared with responses to “high” stakes cases. However, stakes or importance are not binary properties—they are scalar: whether a situation is “high” or “low” stakes is a matter of degree. So far, no experimental work has investigated the scalar nature of stakes effects on knowledge: do stakes effects increase as the stakes get higher? Do stakes effects only appear once a certain threshold of stakes has been crossed? Does the effect plateau at a certain point? To address these questions, we conducted experiments that probe for the scalarity of stakes effects using several experimental approaches. We found evidence of scalar stakes effects using an “evidence-seeking” experimental design, but no evidence of scalar effects using a traditional “evidence-fixed” experimental design. In addition, using the evidence-seeking design, we uncovered a large, but previously unnoticed framing effect on whether participants are skeptical about whether someone can know something, no matter how much evidence they have. The rate of skeptical responses and the rate at which participants were willing to attribute “lazy knowledge”—that someone can know something without having to check—were themselves subject to a stakes effect: participants were more skeptical when the stakes were higher, and more prone to attribute lazy knowledge when the stakes were lower. We argue that the novel skeptical stakes effect provides resources to respond to criticisms of the evidence-seeking approach that argue that it does not target knowledge.

## 1. Background: Experimental Studies of Knowledge and Stakes

Is it easier to know something when very little is at stake in being right or wrong? If one believes that knowledge is sensitive to stakes, it might be the case that it is indeed easier to know trivial things. The view that knowledge is sensitive to stakes holds that whether a subject knows some proposition p depends on how much is at stake for that subject in being right or wrong about p; if knowledge is sensitive to stakes, it could be the case that the more that is at stake, the harder it is for the subject to know that p.

Advocates of the stakes-sensitivity of knowledge have assumed that their own intuitions that it is easier to know that something is the case in lower-stakes situations than in higher-stakes situations are representative of how ordinary people (that is, non-philosophers) would judge those cases. Experimental attempts to verify that assumption have yielded mixed results, however. While May et al. (2010) found evidence of an effect of stakes on knowledge, Buckwalter (2010) and Feltz and Zarpentine (2010) did not. Subsequent studies made the empirical case for the stakes-sensitivity of knowledge look more promising: Sripada and Stanley (2012), Pinillos (2012), and Pinillos and Simpson (2014) all found evidence that participants’ judgments of when an individual knew that p were sensitive to stakes. But a large cross-cultural study of judgments about cases involving the stakes-sensitivity of knowledge found effects only for Spanish, Japanese, and UK participants, and not for any of the other 16 sampled nationalities (Rose et al. 2019). Theoretical challenges to findings of stakes sensitivity have also been raised. Buckwalter (2014) and Buckwalter and Schaffer (2015) target what appears to be the strongest evidence in favor of the stakes-sensitivity of knowledge, namely Pinillos’s “evidence-seeking” studies (Pinillos 2012; Pinillos & Simpson 2014). These arguments make the case that the apparent stakes effects on knowledge are actually stakes effects on other features that appear in the experimental prompts, so no conclusions about knowledge per se follow from the experimental data.[1]

In this paper, we aim to advance our understanding of the empirical foundations of the stakes sensitivity of knowledge by looking at an aspect of the interaction of stakes and knowledge that has not received any experimental attention, namely the scalar nature of stakes. To date, stakes effects on knowledge have been investigated using binary paradigms: responses to “low” stakes cases are compared with responses to “high” stakes cases. However, whether a situation is low or high stakes is not a binary property but a scalar property: whether a situation counts as “high” or “low” stakes comes in degrees and depends on what it is being compared with (Anderson & Hawthorne 2019; Hansen 2014). No experimental work has investigated the scalar nature of stakes effects on knowledge: Do stakes effects increase as the stakes get higher? Do stakes effects only appear once a certain threshold of stakes has been crossed? Do stakes effects plateau at a certain point?

To address these questions, we conducted experiments that probe for the scalarity of stakes effects using several experimental approaches.[2] In our first experiment, which adopts the classic “evidence-fixed” design employed in the earlier experimental studies of stakes effects on knowledge (e.g., Sripada & Stanley 2012), we ask participants to rate their level of agreement with claims that S knows that P or S doesn’t know that P. To anticipate our results: Across several epistemic scenarios that vary the type of stakes, from personal injury to reputation, we did not find any evidence of stakes effects on judgments about knowledge, even when comparing relatively low and relatively high points on the scale of stakes. Since this failure to find an effect is at odds with the effect reported in Sripada and Stanley (2012), we conducted a pre-registered replication of Sripada and Stanley’s study. Our replication did find a small effect of stakes on knowledge using the evidence-fixed design, but in a different condition than Sripada and Stanley’s study.

The second series of experiments we conducted employed the “evidence-seeking” approach developed in Pinillos (2012). We asked participants to judge how much evidence a subject needs to collect before she counts as knowing various propositions. Results from the evidence-seeking experimental design revealed stakes effects across multiple scenarios and indicate that there is variability in the structure of stakes effects when different scales of stakes are at issue (e.g., when number of lives or money or degrees of embarrassment are at stake).

In addition, by testing both positive and negative polarity versions of the evidence-seeking prompts in an attempt to identify a threshold for ascribing knowledge, we were also able to uncover a large framing effect on participants’ willingness to say that a subject in a scenario can never know that something is the case: Participants tended to respond to the negative prompt (“How many times can S check F and still not know that P?”) by saying that S can never know that P at much higher rates than in the equivalent positive prompt (“How many times does S need to check F before she knows that P?”). Finally, we found evidence that the rate at which participants gave skeptical “never” responses in the negative frame and the rate at which participants gave “lazy knowledge” responses (S knows without having to check) in the positive frame were themselves subject to a stakes effect in some of our evidence-seeking experiments. We argue that a stakes effect on skeptical “never” responses provides a new way of responding to an important theoretical criticism of the evidence-seeking approach to uncovering a stakes effect on knowledge.

## 2. The Sensitivity of Knowledge to Stakes

Intellectualists about knowledge hold that “factors in virtue of which a true belief amounts to knowledge are exclusively truth-relevant, in the sense that they affect how likely it is that the belief is true” (DeRose 2009: 24). Those who believe that knowledge is stakes-sensitive are a type of anti-intellectualist about knowledge: anti-intellectualists “hold that whether a subject knows something or not depends in part on such non-truth-relevant ‘practical’ matters as the cost (to the subject) of being wrong” (DeRose 2009: 25; see also Gerken 2017: 34). Holding that knowledge is sensitive to stakes is only one way of being an anti-intellectualist. For example, one might hold that whether some subject knows something depends partly on whether various possibilities are salient to the subject (Hawthorne 2003; see Dinges 2017 for discussion), or on how much time is available to the subject (Shin 2014). Our focus in this paper will be on stakes-sensitivity about knowledge, the view that whether a subject knows some proposition p depends on how much is at stake for that subject in being right or wrong about p.

Anderson and Hawthorne (2019) precisify the notion of stakes at work in theories of the stakes sensitivity of knowledge (usually characterized as the stakes of being wrong about some proposition p) in the following way:

A natural place to start is to articulate a measure of how much turns on p in performing a certain action A (where the core ideology focuses on a three-place relation between agents, actions, and propositions). Henceforth, we shall call this the ‘p-stakes of the action.’… the p-stakes of an action is a matter of the gap between the utility of what would happen if one performed that action and p were the case, and what would happen if one performed that action and p were not the case.[3]

To illustrate the notion of the “p-stakes” of an action, consider the following action, proposition, and contrasting pair of scenarios:

Action: Leaving the apartment without checking to see if I turned the stove off.

p: The stove is turned off.

Not-so-bad-scenario: I’m going out to check the mail, and I’ll come back in five minutes, when I can check to see if the stove is off.

Bad scenario: I’m going out of town for a week and can’t check to see if the stove is off until I’m back (and I live alone).

Compare the gap between the utility of leaving the apartment without checking to see if I turned the stove off given p versus given not p in the bad scenario versus the not-so-bad-scenario:

Bad scenario: if p is the case, my apartment does not explode because I left the gas on, while if p is not the case, it does.

Not-so-bad-scenario: if p is the case, my apartment does not explode because I left the gas on, while if p is not the case, it smells bad and I get a headache.

The greater differential in utility between outcomes in the bad scenario is what constitutes its “higher stakes” status. The stakes-sensitivity of knowledge can thus be understood as the claim that all else being equal, whether a subject is in the bad, high-stakes scenario or the not-so-bad, low-stakes scenario can make a difference to whether the subject knows that p.
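As a rough illustration of the comparison just made, the following sketch computes the p-stakes of leaving without checking the stove in each scenario. It assumes that the “gap” between utilities can be read as an absolute difference, and the utility numbers are hypothetical placeholders chosen only to respect the ordering of outcomes described above; neither assumption comes from Anderson and Hawthorne.

```python
def p_stakes(utility_if_p: float, utility_if_not_p: float) -> float:
    """Gap between the utility of an action given p and given not-p.

    Reading the "gap" as an absolute difference is an assumption made here.
    """
    return abs(utility_if_p - utility_if_not_p)


# Hypothetical utilities for "leaving without checking the stove".
# In both scenarios nothing bad happens if p (the stove is off) is true.
SCENARIOS = {
    "not-so-bad": {"if_p": 0.0, "if_not_p": -5.0},   # bad smell, headache
    "bad": {"if_p": 0.0, "if_not_p": -1000.0},       # apartment explodes
}

stakes = {name: p_stakes(u["if_p"], u["if_not_p"]) for name, u in SCENARIOS.items()}
higher = max(stakes, key=stakes.get)
print(f"Higher-stakes scenario: {higher}")
```

On any assignment of utilities that ranks an explosion as worse than a headache, the bad scenario comes out with the larger gap, which is all the “higher stakes” label requires.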

## 3. What is a Scalar Stakes Effect on Knowledge?

Both the ordinary notion of “stakes” and the precisified notion given in the previous section are scalar, rather than binary, properties. That is, they admit of degrees of application beyond 0 and 1, and those degrees can be compared using expressions like “x is a higher-stakes scenario than y”. It is possible to arrange scenarios in terms of increasing stakes. For example, the “bad” scenario discussed above is nowhere near as bad as things could get. Suppose that while I’m on vacation, I leave my cat at home with an automatic feeder. The outcome of not checking to see if the gas is still on, given not-p, is that she’s blown to smithereens along with my apartment. The stakes in that scenario are therefore even higher than in the “bad” scenario. One can imagine how things could be even worse (imagine if I had two cats), and thereby also how the stakes could be even higher. There is no obvious upper bound to the stakes scale: for any given “high stakes” scenario, there will be a scenario that has even higher stakes. There is therefore no absolute notion of “high” stakes: whether stakes are high is a relative notion (Kennedy & McNally 2005).

Given that the notion of stakes is scalar, are stakes effects on knowledge (if there are any) also scalar? As stakes go up, how are judgments about knowledge affected? Do people attribute knowledge less and less as the stakes go up, or is there a “plateau” beyond which further increases in stakes stop affecting judgments about knowledge? How large do the differences in stakes have to be before one of the scenarios counts as “high” vs. “low” stakes?

In order to answer these questions, we designed experiments that created a variety of stakes scales, along which the stakes of being wrong about a particular proposition varied. To return to the example concerning whether the stove was left on, discussed in the previous section, instead of just a “bad” (high) and a “not-so-bad” (low) scenario, it could be supplemented with more scenarios that fill out the relevant stakes scale, as follows:

Action: Leaving the apartment without checking to see if I turned the stove off.

p: The stove is turned off.

Stakes 1, Not-so-bad-scenario: I’m going out to check the mail, and I’ll come back in five minutes, when I can check to see if the stove is off.

Stakes 2, Bad scenario: I’m going out of town for a week and can’t check to see if the stove is off until I’m back (and I live alone).

Stakes 3, Very bad scenario: I’m going out of town for a week and can’t check to see if the stove is off until I’m back, and I live with a cat.

Stakes 4, Terrible scenario: I’m going out of town for a week and can’t check to see if the stove is off until I’m back, and I live with a cat, a gray parrot, and I have several unpublished papers by J. L. Austin in my library, one of which contains a heretofore unknown and totally convincing response to external world skepticism.

One possibility that becomes clear when the stakes scale is expanded beyond two degrees is that the difference in stakes between any two adjacent points on the scale (between “not-so-bad” and “bad”, for example) might not be big enough to trigger a stakes effect on knowledge. Previous experiments which failed to uncover a stakes effect, all of which only use two points on a stakes scale, might simply have failed to pick points on the stakes scale that were distinct enough for a stakes effect to show up.

In order to examine the scalarity of stakes effects on judgments about knowledge, we designed two experiments. The first experiment used “evidence-fixed” prompts: participants were asked, across four different points on the relevant stakes scale for six different scenarios (concerning paramedics racing to the scene of an accident, scientists checking the formula for a vaccine, mountaineers checking their climbing rope, participants on a game show thinking about an answer to a question, a moderator for a talk looking up the pronunciation of a guest speaker’s name before introducing them, and a homeowner checking on her home sprinkler system in response to the threat of an arsonist), to agree or disagree with sentences attributing knowledge and denying knowledge to the protagonist of each scenario. In each scenario, the amount of evidence available to the protagonist remained fixed, while the point on the stakes scale was varied. The first experiment is discussed in Section 4. We also conducted a pre-registered replication of Sripada and Stanley’s (2012) “evidence-fixed” study, which will be discussed in relation to our first experiment, and the details of which are presented in Appendix II.

The second experiment used the “evidence-seeking” prompts introduced in Pinillos (2012). Instead of asking participants to agree or disagree with statements that the protagonist knows or doesn’t know a proposition, participants were asked to indicate how much evidence the protagonist would have to gather in order to know that the relevant proposition is true (in the positive polarity condition), or how much evidence the protagonist could gather and still not know the relevant proposition (in the negative polarity condition). The scenarios and points on the relevant stakes scales were the same as in the “evidence-fixed” experiment. The second experiment is discussed in Section 5.

## 4. Experiment 1: The “Evidence-Fixed” Design

### 4.1. Experimental Materials

Six scenarios were developed in which different types of stakes were manipulated (lives; physical injury; embarrassment; money; damage to objects of personal value). Four versions of each scenario were created in which the stakes were scaled in magnitude (see Supplementary Material for all scenarios and versions). For example, the ‘vaccine’ scenario involves changing the number of lives at stake should a vaccine be made incorrectly and administered to research participants:

Stakes 1: Low

Elaine is a medical researcher. Her task is to create a vaccine for a virus. Elaine has done this before, and she has a check list that specifies all of the steps she needs to take to make the vaccine. Elaine is following all of the steps correctly.

Elaine’s assistant has informed her that there is one human research participant who has volunteered to trial the vaccine before it is distributed more widely. If Elaine does not follow the steps correctly, it will produce an ineffective combination that when administered to the research participant will give them mild cold-like symptoms

In the above ‘low’ stakes version of the scenario, one individual will experience mild symptoms. These stakes are then incrementally raised in the remaining versions of the scenario:

Stakes 2

one human research participant who has volunteered to trial the vaccine before it is distributed more widely. If Elaine does not follow the steps correctly, it will produce an ineffective combination that when administered to the research participant will kill him within days

Stakes 3

15 human research participants who have volunteered to trial the vaccine before it is distributed more widely. If Elaine does not follow the steps correctly, it will produce an ineffective combination that when administered to the research participants will kill them all within days

Stakes 4: High

100 human research participants who have volunteered to trial the vaccine before it is distributed more widely. If Elaine does not follow the steps correctly, it will produce an ineffective combination that when administered to the research participants will kill them all after several days of excruciating pain

After reading each scenario, participants were asked to respond to a knowledge prompt along a 7-point Likert-type scale (the endpoints of the Likert scale were labeled as follows: 1-Strongly disagree, 7-Strongly agree). Whether participants received a negative or positive knowledge prompt was determined by initial condition assignment (positive polarity condition; negative polarity condition). For example, having read the vaccine scenario (above) participants in the positive polarity condition were asked to rate their level of agreement with the statement:

Elaine knows that she is making the vaccine correctly

Participants in the negative polarity condition were asked to rate their level of agreement with the statement:

Elaine doesn’t know that she is making the vaccine correctly

To exclude any participants who failed to understand the experimental instructions, all participants had to respond correctly to a control question at the beginning of the experiment (Prompt 1) and a control question at the end of the experiment (Prompt 2) in order to be included in the data analysis. Participants who responded “agree” or “strongly agree” to Prompt 1 were removed and participants who responded “disagree” or “strongly disagree” to Prompt 2 were removed.[4]

Prompt 1

You have a fair coin, with heads on one side and tails on the other, that you flip into the air and catch on the back of your hand without looking at it.

Please indicate whether you agree or disagree with the following statement about the scenario:

You know the coin landed heads

1 (strongly disagree) – 7 (strongly agree)

Prompt 2

You have a fair coin, with heads on one side and tails on the other, that you flip into the air and catch on the back of your hand without looking at it.

Please indicate whether you agree or disagree with the following description of the scenario:

You don’t know that the coin landed heads

1 (strongly agree) – 7 (strongly disagree)[5]
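The exclusion rule based on these two control prompts can be sketched as a simple filter. The sketch works with response labels rather than scale positions, since the two prompts label their endpoints differently; the `ControlResponses` record, the participant IDs, and the sample answers are hypothetical and are not drawn from the study’s data.

```python
from dataclasses import dataclass


@dataclass
class ControlResponses:
    participant_id: str
    prompt1: str  # label chosen for "You know the coin landed heads"
    prompt2: str  # label chosen for "You don't know that the coin landed heads"


# Labels that lead to removal, as stated in the exclusion rule.
EXCLUDE_PROMPT1 = {"agree", "strongly agree"}
EXCLUDE_PROMPT2 = {"disagree", "strongly disagree"}


def passes_controls(r: ControlResponses) -> bool:
    """Keep a participant only if neither control answer triggers removal."""
    return (r.prompt1.lower() not in EXCLUDE_PROMPT1
            and r.prompt2.lower() not in EXCLUDE_PROMPT2)


# Hypothetical participants, for illustration only.
sample = [
    ControlResponses("p01", "strongly disagree", "strongly agree"),
    ControlResponses("p02", "agree", "agree"),        # fails Prompt 1
    ControlResponses("p03", "disagree", "disagree"),  # fails Prompt 2
]
included = [r.participant_id for r in sample if passes_controls(r)]
print(included)
```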

### 4.2. Procedure

Participants in both the positive and negative polarity conditions were presented with six scenarios in four different degrees of stakes each (24 pairs of scenario + degree of stakes) in a randomized block design. Using this design ensures that all scenarios occur once in a sequence (or block) before any of them is repeated.
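One way to generate such a presentation order is sketched below. The scenario labels are shorthand introduced here, and the way stakes levels are distributed across blocks (an independent random order per scenario) is an assumption; the text specifies only that every scenario appears once per block before any is repeated.

```python
import random

# Shorthand labels for the six scenarios (hypothetical names).
SCENARIOS = ["paramedic", "vaccine", "mountaineering",
             "game_show", "moderator", "possessions"]
STAKES_LEVELS = [1, 2, 3, 4]


def randomized_block_order(scenarios, stakes_levels, rng):
    """Return a list of (scenario, stakes) trials arranged in blocks.

    Each block contains every scenario exactly once, in shuffled order.
    Assumption: each scenario's stakes levels are spread across the blocks
    in an independently shuffled order.
    """
    per_scenario = {s: rng.sample(stakes_levels, len(stakes_levels))
                    for s in scenarios}
    trials = []
    for block in range(len(stakes_levels)):
        block_trials = [(s, per_scenario[s][block]) for s in scenarios]
        rng.shuffle(block_trials)
        trials.extend(block_trials)
    return trials


order = randomized_block_order(SCENARIOS, STAKES_LEVELS, random.Random(0))
for i in range(0, len(order), len(SCENARIOS)):
    print(order[i:i + len(SCENARIOS)])
```

Each printed row is one block: six trials, one per scenario, with all 24 scenario and stakes pairings covered across the four blocks.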

### 5.3. Hypothesis

If knowledge is sensitive to stakes, we expect to find some effect of changing stakes on responses to a given polarity in a given scenario. For example, we should find some significant difference in responses to the “low” and “high” stakes conditions for positive or negative polarity prompts in at least one of the six scenarios we considered.

Regarding the effect of polarity, we do not expect to find a significant difference between responses to the negative and positive polarity prompts. That’s because participants should be searching for the threshold of how much evidence is required for knowledge when presented with either the positive or the negative prompts. To explain why, consider a particular example of a positive prompt in the “high” stakes version of the vaccine scenario:

Positive: How many times does Elaine need to consult her check list before she knows that she is making the vaccine correctly?

If participants respond to the positive prompt with 3 times, they should give a similar number in response to the negative prompt in the same combination of scenario and degree of stakes:

Negative: How many times can Elaine consult her check list and still not know that she is making the vaccine correctly?

It would be unexpected to find that participants would say that Elaine could consult her check list a significantly different number of times and still not know that she’s making the vaccine correctly while saying that she needs to consult her check list only 3 times before she knows that she’s making the vaccine correctly.[9]

### 5.4. Results

Overall and in line with our hypothesis, there were effects of changing stakes on responses across both the negative and positive polarities in all scenarios (in the possessions scenario, this change in responses was observed in the positive polarity only) (see Figure 3).[10] Across scenarios, as the stakes increased, there was a general pattern of participants stating that more evidence would be needed. In terms of the scalarity of these stakes effects, there was a clear pattern of lower stakes scenarios requiring less evidence than higher stakes scenarios (for a full summary of scenario and stakes scales analyses, see Appendix III). We did not observe any significant difference in responses to positive and negative polarity prompts (for a full summary of individual statistics for each scenario, see Appendix III).

In the evidence-seeking task, we also collected two additional types of response from participants:

enter a whole number: 1, 2, 3... etc. If you think Elaine knows without having to check, write "0" (in the positive polarity condition). If you think Elaine will never know no matter how many times she checks, write "never" (in both the positive and negative polarity conditions)

Previous papers simply discarded these “never” responses from further analysis (Pinillos & Simpson 2014: 21), and our own numerical analyses excluded responses less than or equal to 0.[11]

Upon closer inspection, we noticed something unexpected about these responses: There was a much larger number of “never” answers given in response to the negative polarity prompt concerning how much evidence a protagonist could have and still not know that p than in response to the positive polarity prompt. “Never” answers made up 22% of the overall responses in the negative polarity group but only 2% of responses in the positive polarity group (see Figure 4). In effect, many more participants were responding to the prompt “skeptically”, that is, responding that they thought that no matter how many times she checks, S will never know that p.[12] We also observed a larger percentage of “0” responses to the positive prompts (8.5%) than to the negative prompts (4%), but that isn’t surprising given that participants weren’t explicitly given the option to respond with “0” if the subject knows without having to check in the negative polarity prompts (see Figure 4). The meaning of a “0” response when given in response to a positive vs. a negative polarity prompt is therefore probably different: in response to a positive prompt such a response means S knows without having to check; in response to a negative prompt, it’s not clear what a “0” response means.

Once we noticed these large disparities in “never” and “0” responses between the positive and negative polarity prompts, we wondered whether there might be a stakes effect on the rate at which participants gave these responses. If knowledge is sensitive to stakes, it’s possible that the number of these declarations (S will never know no matter how many times she checks and S knows without having to check) will vary when the stakes vary.

In order to investigate this, we analysed the frequency of “never” and “0” responses in each stakes scale across all scenarios. We found no main effect of stakes in a global analysis across all stakes scales for “never” responses (see left-hand side of Figure 5).[13] But we did find an overall pattern of participants being less likely to say that S knows P without having to check (“0”) as the stakes increased in the positive polarity (see right-hand side of Figure 5).[14]
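The frequency tally described here can be sketched as follows. The classification rules and the `rates_by_stakes` helper are illustrative assumptions rather than the analysis code used in the study, and the toy responses are invented to exercise the function, not taken from the data.

```python
from collections import Counter, defaultdict


def classify(raw: str) -> str:
    """Sort a free-text answer into 'never', 'zero', 'count', or 'invalid'."""
    text = raw.strip().lower()
    if text == "never":
        return "never"
    try:
        n = int(text)
    except ValueError:
        return "invalid"
    if n == 0:
        return "zero"
    return "count" if n > 0 else "invalid"


def rates_by_stakes(responses):
    """Map each stakes level to the proportion of each response type.

    `responses` is an iterable of (stakes_level, raw_answer) pairs.
    """
    tallies = defaultdict(Counter)
    for level, raw in responses:
        tallies[level][classify(raw)] += 1
    return {level: {kind: n / sum(c.values()) for kind, n in c.items()}
            for level, c in sorted(tallies.items())}


# Invented toy responses for one polarity condition (not real data).
toy = [(1, "0"), (1, "2"), (1, "3"), (1, "never"),
       (4, "5"), (4, "Never"), (4, "never"), (4, "10")]
rates = rates_by_stakes(toy)
print(rates)
```

Running the same tally separately for each polarity group would give the per-level proportions of “never” and “0” answers that a test for a stakes effect on those responses would then compare.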

### 5.5. Discussion

Using the evidence-seeking approach, we found evidence of a stakes effect: As the stakes were raised, individuals stated that the protagonist would need to gather more evidence in order to know something or that she could gather more evidence and still not know something. This finding was consistent across the majority of our scenarios (bar the scenario involving personal possessions in which a stakes effect was found for the positive polarity only). We further found evidence that these stakes effects were scalar, not binary. In the majority of scenarios, participants stated that less evidence would be required by the protagonist in stakes 1 [low] cases when compared to the stakes 4 [high] cases. However, the details of how degrees on the scale of stakes affected knowledge judgments varied across different scenarios. For both the paramedic and vaccine scenarios (the vignettes in which lives were at stake), the stakes 1 [low] case was significantly different to the stakes 3 case, with the amount of evidence plateauing after this point. That could be due to participants reaching a saturation point in how much the number of lives at stake affects knowledge. In the mountaineering scenario, the stakes 1 [low] case was significantly different to all of the subsequent stakes cases (i.e., stakes 2, stakes 3, and stakes 4 [high]). In the possessions scenario, the reverse pattern was found, with the stakes 4 [high] case being significantly different to all previous stakes cases (i.e., stakes 1 [low], stakes 2, and stakes 3). Finally, in the game show scenario, the stakes 1 [low] case was significantly different to the stakes 3 and stakes 4 [high] cases. But the stakes 2 case was also significantly different to the stakes 4 [high] case.

The variability we observed in scalar stakes effects across the different scenarios is unsurprising, given that (i) the scenarios differed in their details, including what the scale of stakes was measuring (lives, money, embarrassment, non-monetary possessions, personal injury), and (ii) we relied on our own intuitive, non-systematic sense of what would make for noticeable differences between different degrees of stakes on each scale. Future investigations of scalar stakes effects could systematically construct particular types of stakes scales in order to evaluate more precisely how the shape of particular stakes scales affects knowledge judgments. For example, according to Kahneman and Tversky’s (1979) prospect theory, people experience monetary losses and gains with diminishing sensitivity: moving from $5 to $10 is experienced as a bigger gain than going from $95 to $100. And people are “loss-averse”: losing $5 hurts more than gaining $5 feels good. With these factors in mind, it would be possible to more systematically construct monetary stakes scales with the aim of making the differences between degrees on those scales more regular. That would also serve to bring the discussion of stakes understood as differences in expected utility (as on the Anderson and Hawthorne proposal discussed in §2) into contact with what we know about human decision making under risk and uncertainty. That is a potentially rich area for future experimental work on stakes effects.
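To make the two prospect-theoretic features concrete, the sketch below uses a power-form value function, which is one common way of parametrising a curve that is concave for gains and steeper for losses. The functional form, the curvature exponent, and the loss-aversion coefficient are illustrative assumptions and are not values estimated or used in this paper.

```python
ALPHA = 0.88          # curvature: values below 1 give diminishing sensitivity (assumed)
LOSS_AVERSION = 2.25  # losses weighted more heavily than gains (assumed)


def subjective_value(x: float,
                     alpha: float = ALPHA,
                     loss_aversion: float = LOSS_AVERSION) -> float:
    """Subjective value of a monetary change x relative to a reference point."""
    if x >= 0:
        return x ** alpha
    return -loss_aversion * ((-x) ** alpha)


# Diminishing sensitivity: the same $5 step matters less further up the scale.
step_low = subjective_value(10) - subjective_value(5)
step_high = subjective_value(100) - subjective_value(95)
print(f"$5 -> $10 step:   {step_low:.2f}")
print(f"$95 -> $100 step: {step_high:.2f}")

# Loss aversion: losing $5 weighs more than gaining $5.
print(f"gain of $5: {subjective_value(5):.2f}")
print(f"loss of $5: {subjective_value(-5):.2f}")
```

A monetary stakes scale built with such a function in mind could space its points by equal steps in subjective value rather than equal dollar amounts, which is one way of making the differences between degrees “more regular”.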

With regard to the “never” responses, we found a large framing effect in the different polarities: people were more likely to respond with “never” (S can never know that P) when presented with a negative prompt. That result is surprising. We predicted that there should be no significant difference in numerical responses to the positive and negative polarity prompts, so we assumed that there should be no significant difference between the “never” responses to the positive and negative polarity prompts as well. What could be driving this framing effect? One possibility is that participants are being encouraged by the negative prompt to consider possible situations in which someone could fail to know that P in spite of having a great deal of evidence that P is the case. When that possibility is suggested, it might appear reasonable to respond that S will never know, no matter how much evidence she has, because she can’t rule out the possibility that she is in such a situation. When confronted with the positive polarity prompt, in contrast, participants are not being encouraged to consider such a situation.[15]

The fact that we found a stakes effect on “0” responses in the positive polarity group is a previously unnoticed stakes effect: people are less likely to say that S knows P without having to check as the stakes increase. Although we did not find a significant stakes effect on the “never” responses, there was a pattern of people stating that S can never know that P in greater numbers as the stakes were raised. As we will discuss below, in a follow-up experiment we did observe a stakes effect on “never” responses.

In order to illuminate these unexpected findings, we ran two follow-up studies that addressed two worries that we had with the existing experimental design:

1. The existing negative prompt included the phrase (in bold): “how many times can S check and **still not know** that P”, which may have triggered the presupposition that the protagonist does not know that P, which might have contributed to the framing effect on “never” and “0” responses.
2. The option to “write ‘0’ if you think S knows without having to check” was only included after positive prompts (not negative prompts), which may have contributed to the framing effect on “never” responses by giving the positive polarity group a different response anchor.

To address our first concern, we ran a follow-up experiment (Symmetrical Experiment) which replicated the existing paradigm but with symmetrical negative and positive prompts, removing the phrase “still not know” (see Appendix IV for full experimental details). And to address our second concern, we ran another follow-up experiment (Matched Experiment) which replicated the design of the Symmetrical Experiment but simply removed the option to “write ‘0’ if you think S knows without having to check” from the positive polarity prompts (see Appendix IV for full experimental details).

The framing effect on “never” responses was preserved across both experiments when controlling for the wording of the prompt and when removing the additional prompt option in the positive polarity (see Figure 6).

In the second, Symmetrical, experiment, we found a stakes effect on “never” responses (see Figure 7).[16] (For full analyses, see Appendix IV.)

The finding of a stakes effect on “never” responses is important because it offers a response to a powerful theoretical objection that has been made to the use of evidence-seeking experiments as a way of investigating stakes effects on knowledge. Buckwalter and Schaffer (2015: 214) argue that the reason that changing stakes affect judgments about evidence-seeking prompts is that the evidence-seeking prompts contain a deontic modal (“need”, or “has” in Pinillos 2012), and it is uncontroversial that what is at stake affects judgments about what someone needs to do; it is the effect of changing stakes on the deontic modal that is driving participants’ responses, rather than any effect of stakes on knowledge. However, when participants respond with “never”, they are responding to a secondary prompt that does not contain a deontic modal, namely:

If you think Elaine will never know no matter how many times she checks, write "never".

Since we found evidence that the rate at which participants respond with “never” is itself sensitive to stakes, that looks like evidence of a genuine stakes effect on participants’ judgments about whether someone can ever know that p which cannot be explained as an effect of stakes interacting with an interpretation of a deontic modal.[17]

We offer this as a tentative response only, because we only found a stakes effect on skeptical “never” responses in one out of three of the evidence-seeking experiments we conducted. In the third and final Matched Experiment that we ran, as in the first experiment, we did not find a stakes effect on skeptical “never” responses. (See Appendix IV for detailed analyses.) The mixed findings across our three evidence-seeking experiments regarding the stakes effect on skeptical “never” responses might arise from the fact that the effect is small—this is something that will have to be addressed in future research designed specifically to investigate stakes effects on skeptical judgments.

A referee pointed out that our finding of a stakes effect on both positive and negative knowledge prompts (“S doesn’t know that P”) also presents a challenge to Buckwalter and Schaffer’s objection to the results of evidence-seeking experiments. The negative prompt, “How many times can S [check] and not know that P?” contains a modal expression (“can”), but it’s not a deontic modal—it’s either an ability modal or a modal expressing metaphysical possibility. While stakes clearly affect what one should do (for example, in how we interpret the deontic modal “have” in the positive prompt), stakes don’t obviously affect what one is able to do, or what is possible in a metaphysical sense. The stakes effect we found on responses to the negative prompts therefore isn’t easily explained in the same way that Buckwalter and Schaffer explain the stakes effect on positive prompts in evidence-seeking experiments.

A stakes effect on “0” responses was found in the Symmetrical Experiment (see analyses in Appendix IV), although this time the effect was seen across both prompts. That is, participants were more likely to respond with “0” in lower stakes scenarios whether they were presented with positive or negative prompts.

As well as allowing us to investigate the “never” responses further, these follow-up experiments also served as opportunities to replicate the stakes effects found in the first evidence-seeking experiment. Overall, we found that the stakes effects observed in our first evidence-seeking experiment replicated across one or both of the follow-up experiments (for a full summary of individual scenario analyses see Appendix IV), with the paramedic scenario (number of lives at stake) and the game show scenario (amount of money at stake) consistently producing stakes effects across all three of the evidence-seeking experiments.

## 6. General Discussion and Conclusions

### 6.1. Evidence-Fixed versus Evidence-Seeking Results

Using the classic evidence-fixed design employed in earlier experimental studies of stakes effects on knowledge, we did not find evidence of a stakes effect on judgments about knowledge across several epistemic scenarios.[18] However, in a second series of experiments which employ the evidence-seeking approach developed in Pinillos (2012), we did find evidence of a stakes effect across multiple scenarios.

Based on our findings and the results of previous studies, there are two types of competing explanations for this pattern of negative and positive results in the two types of experiment. The first type of explanation, which is favorable to the existence of genuine stakes effects on knowledge, is that there is some feature of the evidence-fixed design that obscures such a stakes effect. Pinillos (2012: 198) and Sripada and Stanley (2012: 10) hypothesize that stakes effects on knowledge exist, but they can be difficult to observe in evidence-fixed experimental designs because participants assume that protagonists in higher stakes situations will have gathered more evidence than those in lower stakes scenarios, leading to a tendency to judge that subjects know that P at greater rates in higher stakes situations, which would suppress any effect of higher stakes lowering the tendency to judge that subjects know that P. The possibility that participants may be revising their sense of how much evidence the protagonist has upwards in the high stakes case could therefore potentially explain why participants are equally likely to agree with the statement that the protagonist knows in the high stakes case as in the low stakes case.

Another factor that might be obscuring an underlying stakes effect in the evidence-fixed design is that the quality of the subject’s evidence that P is so high in all conditions (from low to high) that stakes aren’t having an observable effect on judgments about whether the subject knows that P, because judgments that the subject knows will already be at or near ceiling. The data represented in Figure 1 is compatible with this possibility, since agreement with “S knows that P” is consistently high across scenarios and degrees of stakes, and agreement with “S doesn’t know that P” is consistently low across scenarios and degrees of stakes. It’s possible that scenarios in which the subject’s evidence is lower quality might leave room for the stakes effect to show up in participants’ responses.[19]

A second type of explanation of the divergent patterns of results is not favorable to the existence of genuine stakes effects on knowledge. One version of this type of explanation holds that the failure to detect a stakes effect in most evidence-fixed experiments is because there is no stakes effect on knowledge, while the finding of an effect in the evidence-seeking experiments arises not from a stakes effect on knowledge, but a stakes effect on the deontic modal (“have”) that appears in the evidence-seeking prompt (Buckwalter & Schaffer 2015).

Our finding of a stakes effect on skeptical “never” responses in our second “Symmetrical” experiment, and our finding of a stakes effect on responses to the negative polarity prompts, provide a novel response to this explanation: since neither the prompt to which the “never” responses are directly offered, nor the negative polarity prompts, contains a deontic modal, the observed stakes effect cannot be explained away as the effect of stakes on a deontic modal.[20]

Another version of the second type of explanation (which is not favorable to the existence of a stakes effect on knowledge) is proposed in Gerken (2017). Gerken explains the apparent stakes effect in Pinillos’s evidence-seeking experiments as resulting from what he calls an “Epistemic Actionability-Proxy” heuristic, which leads participants to interpret the prompts in the evidence-seeking design as “concerning how much evidence S should gather before it is reasonable to act on P, rather than concerning the nature of knowledge” (2017: 271).

While Gerken’s explanation accounts for the stakes effect we found in responses to positive polarity prompts, it doesn’t easily account for our finding of a stakes effect on “never” responses, and on responses to the negative polarity prompts (“How many times can S check and not know that P?”). Gerken might argue that “never” responses indicate that participants think that the subject in the scenario should never perform the relevant action (checking that the steps for creating the vaccine have been correctly carried out, e.g.), but that strikes us as implausible (see Footnote 17, above). Also problematic for Gerken’s explanation is the fact that we found a stakes effect in response to negative polarity prompts, which can’t be interpreted as asking how much evidence S should gather before it is reasonable to act on P. He could potentially argue that the negative prompts are proxies for a question about how much evidence S could gather and yet not act on P, but as Nagel (2011) discusses, sentential negation (such as occurs in our negative prompts) generally triggers effortful type-2 processing, which would interfere with Gerken’s heuristic-based (type-1) explanation of responses to the evidence-seeking prompts.

Like Gerken, Nagel (2008) provides an explanation of stakes effects in terms of a psychological effect that is not (directly) a stakes effect on knowledge, namely “need-for-closure”. “Closure” is the arrival at a settled belief; prior to closure, one’s mental state is “open” or non-committed (2008: 287). “Low need-for-closure” is a state in which a subject is “strongly averse to inaccurate or premature judgment, as in [high-stakes scenarios]”, while subjects in low-stakes scenarios are in a state of “neutral” need-for-closure (2008: 288). Subjects who have a lower need-for-closure seek more evidence before settling on a belief, and are characterized by lower degrees of confidence in the belief even once settled. Given that subjects in high-stakes scenarios are also in a state of low need-for-closure, Nagel argues that stakes effects might be driven by the fact that it is belief formation that is directly sensitive to stakes, while knowledge is only sensitive to stakes indirectly (assuming belief is a component of knowledge). Nagel’s competing explanation of stakes effects in terms of need-for-closure is not ruled out by the stakes effects we found in the evidence-seeking experiments.[21]

While our experiments do not rule out the possibility that stakes effects could be explained in terms of need-for-closure, Pinillos (2012: 202) ran an experiment in which participants were explicitly told that the subject in the scenario “forms the belief” that P before they are asked to judge how many times the subject has to check before she knows that P. Even with this modification to the evidence-seeking design, Pinillos still found a significant stakes effect on responses to the knowledge prompt, which provides reason to doubt that stakes are affecting knowledge indirectly through affecting belief formation.[22]

As discussed above, Gerken (2017) and Nagel (2008) use mechanisms drawn from cognitive psychology to explain apparent stakes effects in a way that is consistent with “intellectualist invariantism”, the view that practical factors like stakes do not directly affect knowledge. Another approach to making intellectualist invariantism compatible with the apparent stakes effects on knowledge is to invoke pragmatic linguistic mechanisms, like conversational implicature, to explain what participants are responding to when asked to judge whether a subject knows something.[23] To the best of our knowledge, no one has proposed a pragmatic explanation of the apparent stakes effect revealed in evidence-seeking experiments. We did, however, consider one possible pragmatic confound present in the first version of our evidence-seeking experiment: the presupposition trigger “still” that appeared in the negative prompts (“How many times can S [check] and still not know that P?”). We replicated our findings of a stakes effect and a framing effect (in which there were far more “never” responses to negative prompts than to positive prompts) even when “still” was removed from the negative prompts (see §5.5).

### 6.2. Framing Effects and Skepticism

We uncovered a large framing effect on participants’ willingness to say that a subject in a scenario never can know that something is the case. These skeptical responses appeared at a much greater rate when participants responded to negative polarity prompts.

### 6.3. Advantages of Our Methodology

In order to investigate the stakes sensitivity of knowledge, we have incorporated a diverse set of scenarios that vary both what is at stake and how much is at stake. Previous research has predominantly incorporated a single pair of vignettes that involve more or less commonplace scenarios. By including a variety of stakes, from extreme cases involving dozens of lives at risk in spectacular circumstances, to less extreme cases involving degrees of embarrassment, as well as vignettes that scale these stakes in magnitude, we can begin to build a finer-grained picture of the stakes sensitivity of knowledge than is possible from previous studies. Aside from being statistically more powerful, the variety of scenarios we employed also speaks to the generality of the effect: where we have consistently found (in the evidence-seeking design) stakes, framing, and scalar effects across scenarios, we can be more confident that such results are not unforeseen artefacts of the particular vignettes employed.

Stakes effects were elusive in our original evidence-fixed study, although a registered replication of Sripada and Stanley (2012) confirmed that such effects could be found (though the overall pattern of stakes effects on knowledge that we observed in our replication did not match the pattern observed by Sripada and Stanley). In both the original Sripada and Stanley study and our replication, questions about the quality of the evidence available to the protagonist always preceded the question asking participants whether they agreed with a claim about knowledge, potentially contributing to the finding of an effect. In contrast, in our first experiment, where we did not find evidence of a stakes effect, we did not ask participants to assess the quality of evidence available. A direct comparison of evidence-quality question-present versus question-absent conditions is needed to clarify whether this difference contributes to determining whether stakes effects are observed in an evidence-fixed design.[24]

The contrast between the mixed results in evidence-fixed designs and the more consistent results in evidence-seeking designs serves to reinforce the notion that evidence-seeking designs are likely to be more informative experimental tools for further research on stakes effects.

Finally, because we included both negative and positive polarity prompts, we have been able to determine the role played by the polarity of a prompt in affecting judgments about knowledge. Most importantly, if we had not included this positive-negative prompt distinction, then we would not have been able to detect and interpret the large framing effects observed in our evidence-seeking experiments or uncover the stakes effects on “0” and “never” responses that we observed.

Though no single experimental investigation of a stakes effect on knowledge can definitively settle the existence of such an effect, the results of this study provide new reasons to think that such an effect exists.

## Acknowledgments

Thanks to Alexander Dinges, Julia Zakkou, audiences at the UCL Experimental Philosophy Workshop, the Buffalo Experimental Philosophy Workshop, the University of Illinois Chicago, the University of Reading philosophy department and Centre for Cognition Research, and two anonymous referees for very helpful comments on this paper. Funding from the Leverhulme Trust Research Project Grant RPG-2016-193 made this research possible.

## Appendix I: Experiment 1

### 1. Individual Scenario Analyses

For each individual scenario, a 2 x 4 mixed-model Analysis of Variance (ANOVA) was performed on levels of agreement, with polarity (know, doesn’t know) as the between-subjects factor and stakes scale (one [low], two, three, four [high]) as the within-subjects factor. This analysis was replicated with a generalized estimating equation (GEE) using a linear model. Note that, where these were conducted, the non-significant results of multiple comparison follow-up tests in the non-parametric analyses are likely due to the weak global stakes effect (η2 = .03).

#### 1.1. Paramedic Scenario

ANOVA found no main effect of stakes (p =.813) and no interaction of polarity x stakes (p =.333). There was a main effect of polarity, (F(1, 95) = 149.03, p <.001, η2 = .61) with lower levels of agreement for the negative polarity prompts. GEE similarly revealed no main effect of stakes (p =.795) and no interaction between stakes x polarity (p =.500). There was a main effect of polarity, (Wald X2[1] = 154.42, p < .001) as above.

#### 1.2. Vaccine Scenario

ANOVA found no main effect of stakes, (p = .075) and no interaction of polarity x stakes (p =.817). There was a main effect of polarity, (F(1, 95) = 212.46, p < .001, η2 = .69) with lower levels of agreement for the negative polarity prompts. GEE revealed a main effect of stakes, (Wald X2[3] = 8.58, p = .035) but no interaction between stakes x polarity (p =.863). There was a main effect of polarity, (Wald X2[1] = 206.30, p < .001) as above. Follow-up tests (sequential Bonferroni) examining the main effect of stakes were non-significant (ps >.092).

#### 1.3. Mountaineering Scenario

ANOVA found no main effect of stakes (p = .650) and no interaction of polarity x stakes (p = .776). There was a main effect of polarity (F(1, 95) = 163.29, p < .001, η2 = .63) with lower levels of agreement for the negative polarity prompts. GEE revealed no main effect of stakes (p = .617) and no interaction between stakes x polarity (p = .789). There was a main effect of polarity (Wald X2[1] = 147.06, p < .001) as above.

#### 1.4. Game Show Scenario

ANOVA revealed no main effect of stakes (p = .252) and no interaction of polarity x stakes (p = .513). There was a main effect of polarity (F(1, 95) = 72.55, p < .001, η2 = .43) with lower levels of agreement for the negative polarity prompts. GEE revealed no main effect of stakes (p = .088) and no interaction between stakes x polarity (p = .417). There was a main effect of polarity (Wald X2[1] = 71.27, p < .001) as above.

#### 1.5. Introductions Scenario

ANOVA found no main effect of stakes (p = .055) and no interaction of polarity x stakes (p = .803). There was a main effect of polarity (F(1, 95) = 278.95, p < .001, η2 = .75) with lower levels of agreement for the negative polarity prompts. Follow-up tests (Bonferroni) examining the stakes effect were non-significant (ps > .091). GEE revealed no main effect of stakes (p = .074) and no interaction between stakes x polarity (p = .871). There was a main effect of polarity (Wald X2[1] = 275.46, p < .001) as above.

#### 1.6. Possessions Scenario

ANOVA found no main effect of stakes (p = .983) and no interaction of polarity x stakes (p = .954). As expected, there was a main effect of polarity (F(1, 95) = 323.81, p < .001, η2 = .77) with lower levels of agreement for the negative polarity prompts. GEE revealed no main effect of stakes (p = .995) and no interaction between stakes x polarity (p = .931). There was a main effect of polarity (Wald X2[1] = 308.64, p < .001) as above.
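As a consistency check on the polarity effects reported in this appendix, the effect sizes can be recovered from the F statistics. The sketch below assumes that the reported η2 values are partial eta squared (the appendix does not say which variant was used); the function name `partial_eta_squared` is ours and purely illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared implied by an F statistic: F*df1 / (F*df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F statistics for the main effect of polarity, F(1, 95), as reported
# for each scenario in this appendix.
reported_f = {
    "Paramedic": 149.03,
    "Vaccine": 212.46,
    "Mountaineering": 163.29,
    "Game Show": 72.55,
    "Introductions": 278.95,
    "Possessions": 323.81,
}

for scenario, f_value in reported_f.items():
    # Both degrees of freedom are taken from the reported F(1, 95).
    print(f"{scenario}: {partial_eta_squared(f_value, 1, 95):.2f}")
```

Under that assumption, the six values come out at .61, .69, .63, .43, .75, and .77, in the order listed.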

## Appendix II: Registered Replication of Sripada and Stanley (2012)

### 1. Open Science Protocol

Adopting the experimental design of Sripada and Stanley (2012), we preregistered the experiment (background, methods, and power analysis) using the Open Science Framework repository (osf.io/sqeau). Our preregistration was submitted prior to data collection and is accessible to the public.

### 2. Participants

#### 1.3. Missing Values, Normality, and Outliers

Overall, 25 “never” responses were given in the positive polarity condition and 178 “never” responses were given in the negative polarity condition. These responses were removed from main analyses and analysed separately. As in the first evidence-seeking experiment, data were non-normal, with responses in both conditions being positively skewed. Two analyses were subsequently performed across the stakes versions of each scenario. Given normality and homogeneity of variance violations in the data, a Generalised Estimating Equation (GEE; non-parametric equivalent), with stakes (one [low]; two; three; four [high]) as the within-subjects factor and polarity (positive; negative) as the between-subjects factor, was conducted first; the results of a second GEE analysis following the removal of extreme outliers are also reported.
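The screening steps described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the function names are hypothetical, the sample responses are invented, and the 3 × IQR fence is an assumed definition of “extreme outlier”, since the criterion is not stated here.

```python
import statistics


def split_responses(raw):
    """Separate numeric evidence scores from skeptical "never" responses."""
    numeric, never_count = [], 0
    for answer in raw:
        text = answer.strip().lower()
        if text == "never":
            never_count += 1
        elif text.isdigit():
            numeric.append(int(text))
        # Anything else is treated as a missing value and skipped.
    return numeric, never_count


def drop_extreme_outliers(values, k=3.0):
    """Keep values within k * IQR of the quartiles (k = 3 is an assumption)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if low <= v <= high]


# Invented responses, for illustration only.
raw = ["1", "2", "never", "2", "3", "3", "4", "5", "100", "Never "]
scores, never_count = split_responses(raw)
kept = drop_extreme_outliers(scores)
print(never_count, scores, kept)
```

Keeping the “never” count separate from the numeric scores mirrors the decision to analyse skeptical responses on their own, and running the analysis on both `scores` and `kept` corresponds to the two reported passes (before and after outlier removal).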

#### 1.4. Summary of Descriptive Statistics (prior to outlier removal and following removal)

Across all scenarios, there was a general pattern of participants stating that more evidence would be required by the protagonist in order to know something (positive polarity)/still not know something (negative polarity) as the stakes increased. This pattern of responses remained with extreme outliers removed.

In terms of replicating the stakes effects observed in the first evidence-seeking experiment, we found a main effect of stakes in the paramedic and game show scenarios (see Figure 9). However, we did not observe a stakes effect in the vaccine, mountaineering, introductions, or possessions scenarios. Additionally, across four of the six scenarios, we did not observe significant differences in responses to positive and negative polarity prompts. There was an effect of polarity in the paramedic and mountaineering scenarios, in which evidence scores were higher for the positive polarity. A full breakdown of analysis by scenario follows this summary.

#### 1.5. Individual Scenario Analyses

##### 1.5.1. Paramedic Scenario

Initial analysis (N = 359, zero values ignored) revealed a main effect of stakes (Wald X2[3] = 16.12, p = .001) and a significant interaction of polarity x stakes (Wald X2[3] = 16.26, p = .001). Follow-up tests using sequential Bonferroni revealed no significant differences between stakes scenarios in either polarity. These non-significant follow-up tests are likely due to large variances in the dataset. There was no main effect of polarity (p = .096). Having extracted extreme outliers (N = 359 to N = 339, zero values ignored), further analysis confirmed a main effect of stakes (Wald X2[3] = 11.50, p = .009), and a main effect of polarity was also observed (Wald X2[1] = 4.19, p = .041). There was no significant interaction of polarity x stakes (p = .933). Follow-up tests using sequential Bonferroni revealed no significant differences between stakes scenarios in either polarity. These non-significant follow-up tests are likely due to a small global effect. Sequential Bonferroni comparisons did reveal that evidence scores were significantly higher in the positive polarity group when compared to the negative polarity group (p = .043).
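For readers who want to check the reported p-values against the Wald statistics, the following sketch converts a statistic to an upper-tail probability. It assumes that each Wald statistic is referred to a chi-square distribution with the bracketed degrees of freedom, which is the usual convention for GEE output but is not stated explicitly here. The function name `wald_p_value` is hypothetical, and only the two degrees of freedom that occur in these analyses (1 and 3) are handled, using their closed forms.

```python
import math


def wald_p_value(statistic: float, df: int) -> float:
    """Upper-tail chi-square probability for df = 1 or df = 3."""
    half_root = math.sqrt(statistic / 2.0)
    if df == 1:
        # Chi-square with 1 df is the square of a standard normal.
        return math.erfc(half_root)
    if df == 3:
        # Closed form of the chi-square survival function for 3 df.
        extra = math.sqrt(2.0 * statistic / math.pi) * math.exp(-statistic / 2.0)
        return math.erfc(half_root) + extra
    raise ValueError("only df = 1 and df = 3 are handled in this sketch")


# Paramedic scenario, after removal of extreme outliers.
print(round(wald_p_value(11.50, 3), 3))  # main effect of stakes
print(round(wald_p_value(4.19, 1), 3))   # main effect of polarity
```

On this assumption, the computed values agree with the reported p-values for the paramedic scenario to three decimal places.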

##### 1.5.2. Vaccine Scenario

Initial GEE analysis (N = 365, zero values ignored) revealed a main effect of stakes (Wald X2[3] = 9.65, p = .022), a main effect of polarity (Wald X2[1] = 14.64, p < .001), and a significant interaction of polarity x stakes (Wald X2[3] = 19.46, p < .001). Follow-up tests using sequential Bonferroni revealed no significant differences between stakes scenarios in either polarity. These non-significant follow-up tests are likely due to large variances in the dataset. Having extracted extreme outliers (N = 365 to N = 325), further analysis found no main effect of stakes (p = .087), no main effect of polarity (p = .100), and no significant interaction of polarity x stakes (p = .063).

##### 1.5.3. Mountaineering Scenario

Initial analysis (N = 389, zero values ignored) revealed a main effect of stakes (Wald X2[3] = 32.62, p < .001) and a main effect of polarity (Wald X2[1] = 4.05, p = .044). There was no interaction of polarity x stakes (p = .083). Comparisons using sequential Bonferroni found no significant differences between the polarities, and no significant differences between stakes scenarios. These non-significant follow-up tests are likely due to large variances in the dataset. Having extracted extreme outliers (N = 389 to N = 362, zero values ignored), another GEE analysis found no main effect of stakes (p = .112) and no interaction of polarity x stakes (p = .514). There was a main effect of polarity (Wald X2[1] = 7.61, p = .006), with evidence scores higher in the positive polarity group when compared to the negative polarity (p = .005).

##### 1.5.4. Game Show Scenario

Initial analysis (N = 337, zero values ignored) found no main effect of stakes, (p =.377), no main effect of polarity, (p =.625) and no interaction of polarity x stakes, (p = .057). Having extracted extreme outliers (N = 337 to N = 313, zero values ignored), further GEE analysis revealed a main effect of stakes, (Wald X2[3] = 10.24, p =.017). There was no main effect of polarity, (p =.556) and no interaction of polarity x stakes, (p =.757). Comparisons using sequential Bonferroni indicated that there was a significant difference between the stakes 1 [low] scenario and the stakes 3 scenario (p =.013) and the stakes 1 [low] scenario and the stakes 4 [high] scenario (p =.016). These effects were across polarity.
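The sequential Bonferroni correction used for the follow-up comparisons is commonly known as Holm's step-down procedure. The sketch below shows how adjusted p-values are obtained under that procedure. The function name `holm_adjust` and the example p-values are hypothetical and are not taken from the study's data.

```python
def holm_adjust(p_values):
    """Holm (sequential Bonferroni) adjusted p-values, in the input order."""
    m = len(p_values)
    # Visit the p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, index in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[index])
        # Adjusted values may not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[index] = running_max
    return adjusted


# Hypothetical unadjusted p-values for three pairwise comparisons.
example = [0.01, 0.04, 0.03]
print(holm_adjust(example))
```

Because each successive comparison is penalized less than under a plain Bonferroni correction, the procedure is less conservative while still controlling the family-wise error rate. Even so, a significant omnibus effect can fail to yield any significant pairwise difference after adjustment, as in several of the scenarios reported here.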

##### 1.5.5. Introductions Scenario

An initial GEE analysis (N = 375, zero values ignored) revealed a significant interaction of polarity x stakes (Wald X2[3] = 9.28, p = .026). There was no main effect of polarity (p = .130) and no main effect of stakes (p = .093). Comparisons using sequential Bonferroni found no significant differences between stakes scenarios. After extracting extreme outliers (N = 375 to N = 355), another GEE analysis found no main effect of stakes (p = .961), no main effect of polarity (p = .168), and no interaction of polarity x stakes (p = .756).

##### 1.5.6. Possessions Scenario

An initial GEE analysis was performed using a Poisson[25] (log link) model (N = 381). Analysis revealed a main effect of stakes (Wald X2[3] = 92.12, p < .001) and a significant interaction of polarity x stakes (Wald X2[3] = 47.71, p < .001). There was no main effect of polarity (p = .061). Comparisons using sequential Bonferroni found no differences between stakes scenarios in either polarity. These non-significant follow-up tests are likely due to large variances in the dataset. Having extracted extreme outliers (N = 336, zero values excluded), another GEE analysis with a gamma (log link) model was performed on the evidence scores. Analysis revealed no main effect of stakes (p = .119) and no interaction of polarity x stakes (p = .989). There was a main effect of polarity (Wald X2[1] = 4.47, p = .034), with higher levels of evidence given in response to positive polarity prompts.[26]

### 2. Matched Experiment

#### 2.1. Materials and Procedure

The same set of scenarios was presented to participants in the same randomized block design as in Experiment 2. For this follow-up experiment, the same set of symmetrical prompts was used in both the positive and negative polarity groups. However, the additional option to write “0” following the positive polarity prompts was removed:

Original

enter a whole number: 1, 2, 3... etc. If you think Elaine knows without having to check, write "0". If you think Elaine will never know no matter how many times she checks, write "never"

Modified

enter a whole number: 1, 2, 3... etc. If you think Elaine will never know no matter how many times she checks, write "never"

#### 2.2. Participants

##### FOUR (HIGH)

Tracy is taking part in a game show that involves answering general knowledge trivia questions. The game show host has asked Tracy, “What is the capital of Tanzania?”. Tracy has recently read a list of the most obscure world capitals and the city “Dodoma” pops into her head. In fact, Tracy is right: the capital of Tanzania is Dodoma. As this is the final round of the game show, \$1,000,000 is at stake: answering this question correctly will result in Tracy winning \$1,000,000 and answering incorrectly will result in her losing \$1,000,000.

##### Prompts:

+ How many minutes does S need to spend considering her answer before she knows that the capital of Tanzania is Dodoma?

- How many minutes can S spend considering her answer and still not know that the capital of Tanzania is Dodoma?

#### Introduction: reputation

##### ONE (LOW)

Siena teaches at a university and has been asked to introduce a guest speaker to her colleagues over lunch. There are 5 colleagues present at lunch. Siena wrote down the speaker’s name—“Dr. Woodbridge”— in her notebook earlier in the day. But if Siena introduces the guest speaker by the wrong name, she will feel slightly embarrassed in front of her colleagues.

##### TWO

Jane teaches at a university and has been asked to introduce a guest speaker to her colleagues during a seminar. There are 20 colleagues present at the seminar. Jane wrote down the speaker’s name—“Dr. Woodbridge”— in her notebook earlier in the day. If Jane introduces the guest speaker by the wrong name, she will feel embarrassed in front of her colleagues and it will reflect badly on her professional capabilities.

##### THREE

Agnes teaches at a university and has been asked to introduce a guest speaker to her colleagues and members of the public during a public lecture. There are 200 people present at the public lecture. Agnes wrote down the speaker’s name—“Dr. Woodbridge”— in her notebook earlier in the day. If Agnes introduces the guest speaker by the wrong name, she will feel very embarrassed in front of the audience and it will reflect very badly on her professional capabilities.

##### FOUR (HIGH)

Nicole teaches at a university and has been asked to introduce a guest speaker on national television as part of a live interview. The interview will be viewed live by thousands of people. Nicole wrote down the speaker’s name—“Dr. Woodbridge”— in her notebook earlier in the day. If Nicole introduces the guest speaker by the wrong name, she will feel very embarrassed in front of a live television audience and it will reflect very badly on her professional capabilities and on her university’s reputation.

##### Prompts:

+ How many minutes does S need to check her notebook before she knows that the guest speakers name is “Dr. Woodbridge”?

- How many minutes can S check her notebook and still not know that the guest speakers name is “Dr. Woodbridge”?

#### Arson: personal value

##### ONE (LOW)

Natalie is living in an area where there have been a series of fires set by arsonists recently. Only a functioning sprinkler system can stop a fire set by an arsonist. A week ago, Natalie checked that the sprinklers were working in her storage room, which contains her garbage and recycling. If the sprinklers do not work, everything in the room is at risk from arson. But the sprinklers in the room are fully functioning.

##### TWO

Winnie is living in an area where there have been a series of fires set by arsonists recently. Only a functioning sprinkler system can stop a fire set by an arsonist. A week ago, Winnie checked that the sprinklers were working in her living room, which contains Winnie’s laptop and hard drive containing all her family photos. If the sprinklers do not work, everything in the room, including her laptop and hard drive, is at risk from arson. But the sprinklers in the room are fully functioning.

##### THREE

Becky is living in an area where there have been a series of fires set by arsonists recently. Only a functioning sprinkler system can stop a fire set by an arsonist. A week ago, Becky checked that the sprinklers were working in her spare bedroom, where the family dog sleeps. If the sprinklers do not work, everything in the room, including the family dog, is at risk from arson. But the sprinklers in the room are fully functioning.

##### FOUR (HIGH)

Kylie is living in an area where there have been a series of fires set by arsonists recently. Only a functioning sprinkler system can stop a fire set by an arsonist. A week ago, Kylie checked that the sprinklers were working in her nursery room, where her baby sleeps. If the sprinklers do not work, everything in the room, including her baby, is at risk from arson. But the sprinklers in the room are fully functioning.

##### Prompts:

+ How many times does S need to check the sprinklers before she knows that they are working in the X room?

- How many times can S check the sprinklers and still not know that they are working in the X room?

### Prompt variations

#### Symmetrical prompts (for evidence-seeking):

In a second experiment, we use these symmetrical prompts (to remove any presuppositions that might be triggered by “and still not know” in the negative prompts in the first experiment).

##### Paramedic

+ What is the minimum number of times S needs to check her GPS before she knows that she will make it to the accident without taking a wrong turn?

- What is the maximum number of times S can check her GPS and not know that she will make it to the accident without taking a wrong turn?

##### Vaccine

+ What is the minimum number of times S needs to consult her check list before she knows that she is making the vaccine correctly?

- What is the maximum number of times S can consult her check list and not know that she is making the vaccine correctly?

##### Mountaineering

+ What is the minimum number of times S needs to inspect the rope before she knows that it is tied securely?

- What is the maximum number of times S can inspect the rope and not know that it is tied securely?

##### Game show

+ What is the minimum number of minutes S needs to spend considering her answer before she knows that the capital of Tanzania is Dodoma?

- What is the maximum number of minutes S can spend considering her answer and not know that the capital of Tanzania is Dodoma?

##### Introduction

+ What is the minimum number of minutes S needs to check her notebook before she knows that the guest speaker’s name is “Dr. Woodbridge”?

- What is the maximum number of minutes S can check her notebook and not know that the guest speaker’s name is “Dr. Woodbridge”?

##### Arson

+ What is the minimum number of times S needs to check the sprinklers before she knows that they are working in the X room?

- What is the maximum number of times S can check the sprinklers and not know that they are working in the X room?

Note: in all evidence-seeking experiments we include the additional instructions: “If you think S knows without having to check, write ‘0’. If you think S can never know no matter how many times she checks, write ‘never’.”
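The response format described in this note yields three kinds of answers: “0”, “never”, and a positive number of checks or minutes. The following is a minimal sketch of how such free-text answers might be coded before analysis. It is not the authors’ actual coding procedure; the function name `code_response` and the category labels (`lazy_knowledge`, `skeptical`, `numeric`, `uncodable`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from typing import Optional, Tuple


def code_response(raw: str) -> Tuple[str, Optional[float]]:
    """Classify one free-text answer to an evidence-seeking prompt.

    Returns a (category, value) pair. The category labels are
    hypothetical names, not taken from the original study materials.
    """
    # Normalise whitespace, case, and surrounding quotation marks.
    text = raw.strip().strip("\"'“”‘’").strip().lower()

    # "never": S can never know, however often she checks.
    if text == "never":
        return ("skeptical", None)

    try:
        value = float(text)
    except ValueError:
        return ("uncodable", None)

    # Reject NaN, infinities, and negative amounts of checking.
    if not math.isfinite(value) or value < 0:
        return ("uncodable", None)

    # "0": S knows without having to check.
    if value == 0:
        return ("lazy_knowledge", 0.0)

    return ("numeric", value)


# Example: tally categories for a handful of made-up answers.
answers = ["0", "3", "never", "2", " Never ", "a lot"]
tally = Counter(category for category, _ in map(code_response, answers))
```

Keeping “0” and “never” as separate categories, rather than folding them into the numeric scale, mirrors the way these responses are treated as count data in their own right, while only strictly positive values remain for analyses that assume a gamma-distributed dependent variable.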

#### Prompts (for evidence-fixed experiments):

For experiment one, we used the traditional approach of asking participants the extent to which they agree or disagree with knowledge claims:

To what extent do you agree or disagree with the following claim:

##### Paramedic

+ Subject x [specific to scenario] knows that she will make it to the accident without taking a wrong turn.

- Subject x [specific to scenario] doesn’t know that she will make it to the accident without taking a wrong turn.

##### Vaccine

+ Subject x [specific to scenario] knows that she is making the vaccine correctly.

- Subject x [specific to scenario] doesn’t know that she is making the vaccine correctly.

##### Mountaineering

+ Subject x [specific to scenario] knows that the rope is tied securely.

- Subject x [specific to scenario] doesn’t know that the rope is tied securely.

##### Game show

+ Subject x [specific to scenario] knows that the capital of Tanzania is Dodoma.

- Subject x [specific to scenario] doesn’t know that the capital of Tanzania is Dodoma.

##### Introduction

+ Subject x [specific to scenario] knows that the guest speaker’s name is “Dr. Woodbridge”.

- Subject x [specific to scenario] doesn’t know that the guest speaker’s name is “Dr. Woodbridge”.

##### Arson

+ Subject x [specific to scenario] knows that the sprinklers are working in the x room [specific to scenario].

- Subject x [specific to scenario] doesn’t know that the sprinklers are working in the x room [specific to scenario].

## Notes

1. See Gerken (2017: §2.5.b) for a longer summary of existing studies on practical effects on knowledge (stakes effects are one variety of practical effects—see §2 below for discussion).

2. The dataset from these experiments is available here: http://dx.doi.org/10.17864/1947.205.

3. Anderson and Hawthorne go on to problematize the notion of “stakes sensitivity”, but for our current purposes of evaluating the empirical evidence for theories that invoke the notion of stakes sensitivity, we will bracket their criticisms while exploiting their helpful precisified notion of stakes. See Armendt (2019) for additional dimensions (“odds” and “shape”) along which stakes can vary beyond their size.

4. The controls were simplified versions of the coin flipping scenario used in Horvath and Wiegman (2016), Swain, Alexander, and Weinberg (2008), and Weinberg, Nichols, and Stich (2001).

5. This is the only reversed Likert scale of all experiments reported in this paper.

6. Given recent concerns and emerging evidence that the integrity of MTurk-based studies has been compromised by bots or by responses from individuals using Virtual Private Servers (VPS) to fake their location, screening procedures were performed by identifying identical GPS locations with unique IP addresses, determining whether IP addresses derived from an Internet Service Provider (ISP) or a data center, and evaluating open-ended responses against a set of criteria (for full details regarding this procedure see Dennis, Goodson, & Pearson 2018).

7. This possibility is noted by Gerken (2017: 267 n. 8).

8. We did not include the option to write “0” if you think S knows without having to check in the negative polarity condition because the response sounded odd in response to the negative prompt.

9. More precisely, we expect that responses to the negative polarity prompts should have values one lower than responses to the positive polarity prompts. If S needs to check N times before she knows that P, then the maximum number she can check and still not know that P would be N-1. But we expected that we wouldn’t be able to detect this difference, even if it does exist. Thanks to an anonymous referee for asking about this.

10. The data summarised here follows outlier removal. For details regarding how outliers were removed and for replications of all analyses prior to outlier removal, see Appendix III.

11. When the distribution of the dependent variable (in this case, amount of evidence required) is specified as a gamma distribution, any values that are less than or equal to 0 are not used in subsequent analysis.

12. Pinillos and Simpson (2014: 40 n. 18) suggest that “never” responses “may indeed reveal a skeptical attitude toward the possibility of knowledge”.

13. Analysis (GEE, Poisson [loglinear] model, count data) revealed a main effect of polarity (Wald χ²[1] = 34.33, p < .001). There was no main effect of stakes (p = .522). The interaction of polarity × stakes was not significant (Wald χ²[3] = 7.28, p = .064).

14. Analysis (GEE, Poisson [loglinear]) revealed a main effect of stakes (Wald χ²[3] = 11.02, p = .012) and a significant interaction of polarity × stakes (Wald χ²[3] = 8.65, p = .034). There was no main effect of polarity (p = .168). When interpreting the interaction, comparisons using sequential Bonferroni indicated a significant difference between the zero counts in the stakes 1 [low] scenarios and the stakes 3 scenarios (p = .031) and between the stakes 1 [low] scenarios and the stakes 4 [high] scenarios (p = .031), with a lower number in the higher stakes scenarios. This effect was present for the positive polarity only.

15. When you ask someone how many times one can check something and still not know it, it’s reasonable to consider the fact that one can check as often as one likes and still not know it if, for example, one is checking carelessly. That is one way of understanding how the negative prompt encourages a greater frequency of “never” responses: by making salient the ever-present possibility of error. In contrast, when you ask someone how often one needs to check something before one knows it, it’s reasonable to consider the fact that one might not need to check at all because there are other ways of knowing it besides checking. That could explain why the positive prompt encourages a greater frequency of “0” responses. Thanks to Alexander Dinges for discussion of this issue.

16. In order to investigate stakes effects on these responses, a GEE analysis with a Poisson (loglinear) model was performed on the frequency of “never” responses in the follow-up experiment using symmetrical prompts. Analysis revealed a main effect of polarity (Wald χ²[1] = 20.82, p < .001), a main effect of stakes (Wald χ²[3] = 14.65, p = .002), and a significant interaction of polarity × stakes (Wald χ²[3] = 7.88, p = .049) (see Figure 7).

17. A referee asks whether “never” responses could be consistent with the Buckwalter and Schaffer view, if such responses were understood as meaning that the subject should never perform the relevant action (checking the rope to see if it’s secured in the climbing scenario, e.g.). We can’t rule out this interpretation, but we find it implausible that participants who respond with “never” would say that the subject should never perform the relevant action.

18. We did find evidence of a small effect of stakes on knowledge when replicating Sripada and Stanley’s (2012) evidence-fixed experiment.

19. Thanks to an anonymous referee for suggesting this possibility.

20. Our finding comports with the finding of a stakes effect on knowledge “retractions”—which also can’t be explained away as the result of a stakes effect on a deontic modal—described in Dinges and Zakkou (2019).

21. Thanks to an anonymous referee for raising this objection.

22. Pinillos’s modified experiment does not rule out the possibility that stakes are affecting knowledge by affecting confidence, however. See Bach (2005: §V) for defense of that possibility.

23. Brown (2006) and Rysiew (2007) offer such pragmatic explanations of patterns of judgments that might appear to lend support to epistemic contextualism (see Hansen & Chemla 2013; and Grindrod, Andow, & Hansen 2018 for experimental evidence that such patterns exist); Dinges (2018) and Stoutenberg (2017) are recent challenges to such pragmatic explanations of patterns of contextualist judgments.

24. Sripada and Stanley (2012: 7) argue that the quality of evidence question is needed to focus participants’ attention on the question of evidence in the knowledge question. Without it, they think, participants may be inclined to focus only on the factivity of the knowledge question.

25. A gamma distribution was not used in this instance because the data violated the assumptions for analysis (errors in computing the inverse log-link function).

26. Results replicated using Poisson distribution (log function).

27. Note that no GEE analysis was performed for zero responses in this experiment, as the corresponding response option (i.e., “If you think S knows without having to check, write ‘0’”) was removed to create the matched design.