Current projects
Resilience
I’ve been reading and thinking a lot about the concept of resilience across different disciplines. I am presenting something related to this at the 2026 online Pacific APA. Abstract to come.
Data, Data Quality, and the Garbage In Garbage Out Principle
Data quality, or the lack thereof, is often blamed for inferential failures. But what is data quality? This paper pulls together multiple threads in the philosophy of data and scientists’ complaints about data quality under the banner “garbage in, garbage out”. I show that theorizing about data quality involves conflicting considerations that pull in different directions. I conclude that we ought to be cautious in our discussions of data quality, as what a data gatherer means by good data may, justifiably, be very different from what a data user expects.
I am presenting this project at the 2026 Central APA.
On bias-in, bias-out (precise title redacted for blind review)
Abstract: Discussions of algorithmic bias often assume, without reflection, that biases in predictive outcomes reflect biases in society. That is, machine learning algorithms are biased because they are bias-preserving processes. This paper challenges this assumption by pointing to a tension between the philosophy of data literature, which largely rejects the idea that data possess essential features that are always preserved during analysis, and the claim that social biases are easily preserved despite attempts to remove them.
What is model interpretability good for?
When scientists fit statistical models to a set of data, multiple candidate models often remain even after the standard statistical criteria (fit, parsimony, etc.) have been applied. One strategy to break such ties is to interpret the model in non-statistical language by, for example, reading into the content of the variables or connecting the model with existing theory. Some have justified this strategy on the grounds that research ought to be theory-driven, while others worry that it is a dressed-up version of the old HARKing problem (“hypothesizing after results are known”). In this project, I evaluate both positions and reject them. On the one hand, I do not think that model interpretability serves the function expected by theorists. On the other hand, I do not think it is quite the same as HARKing or other questionable research practices.
I presented this project at the 2025 EPSA. It’s currently still just ideas in my head. I’m happy to chat about it.
Complementary science and the person-situation debate
I am working on a project in collaboration with Mike Schneider on a particular kind of scientific disagreement where (1) both sides agree that the other side is scientific (as opposed to pseudo-scientific) and that this is a scientific debate; (2) both sides recognize that they do not have a knock-down argument or obvious empirical evidence that should convince the other side once and for all; but (3) both sides nevertheless do not think that this debate presents a serious challenge to their own scientific project. This research has led me to read a lot about the person-situation debate in psychology that took place around the 1970s. It’s cool! Ask me about it!
Past projects
What Are Statistical Modeling Assumptions About? An Answer From Perspectival Pluralism (published, 2025)
Abstract: This paper presents a perspectivist framework for understanding and evaluating statistical assumptions. Drawing on the thesis of perspectivism from the philosophy of science, this framework treats statistical assumptions not as empirical hypotheses that are descriptively accurate or inaccurate about the world but as prescribing a particular perspective from which statistical knowledge is generated. This means that we ought not to judge statistical models solely by how closely they correspond with the world as we independently understand it, but by whether they paint a picture of the world that is epistemically significant.
This paper is published Open Access in Harvard Data Science Review.
Measuring the non-existent: validity before measurement (published, 2023)
Abstract: This paper examines the role existence plays in measurement validity. I argue that existing popular theories of measurement and of validity follow a correspondence framework, which starts by assuming that an entity exists in the real world with certain properties that make it measurable. Drawing on literature from the sociology of measurement, I show that the correspondence framework faces several theoretical and practical challenges. I propose the validity-first framework of measurement, which starts with a practice-based validation process as the basis for a measurement theory, and only posits objective existence when it is scientifically useful to do so.
This paper is published Open Access in Philosophy of Science.
I made a video abstract for this paper, available here
Sample Representation in the Social Sciences (published, 2021)
Abstract: The social sciences face a problem of sample nonrepresentation, where the majority of samples consist of undergraduate students from Euro-American institutions. The problem has been identified for decades with little sign of improvement. In this paper, I trace the history of sampling theory. The dominant framework, called the design-based approach, takes random sampling as the gold standard. The idea is that a sampling procedure that is maximally uninformative prevents samplers from introducing arbitrary bias, thus preserving sample representation. I show how this framework, while good in theory, faces many challenges in application. Instead, I advocate for an alternative framework, called the model-based approach to sampling, where representative samples are those balanced in composition, however they were drawn. I argue that the model-based framework is more appropriate in the social sciences because it allows for systematic assessment of imperfect samples and methodical improvement in resource-limited scientific contexts. I end with practical proposals for improving sample quality in the social sciences.
A post-peer-review, pre-copyedit version of this article, published in Synthese, can be found here. The final authenticated version is available online at: http://dx.doi.org/10.1007/s11229-020-02621-3
A Statistical Learning Approach to a Problem of Induction (unpublished)
Abstract: One “easier” form of the problem of induction questions our ability to pick out true regularities in nature, using limited data, with the assumption that such regularities do exist. Harman and Kulkarni (2012) take this problem to be a challenge to our ability to identify precise conditions under which the method of picking hypotheses based on limited datasets is or is not reliable. They identify an influential result from statistical learning theory, hereafter referred to as the VC theorem (Vapnik and Chervonenkis, 2015), which states that, under the condition that the starting hypothesis set has finite VC dimension, the hypothesis chosen from it converges to the true regularity as the size of the dataset goes to infinity.
This result seems to provide us with a condition (i.e., having finite VC dimension) under which a method (i.e., choosing a hypothesis based on its performance over data) is reliable. Indeed, Harman and Kulkarni take this result to be an answer to the form of the problem of induction they have identified. This paper examines this claim. By discussing the details of how the VC theorem may be construed as an answer and the connection between the VC theorem in statistical learning theory and the NIP property in model theory, I conclude that the VC theorem cannot give us the kind of general answers needed for Harman and Kulkarni’s response to the problem of induction.
A shorter version of the draft that was presented at the 2018 PSA meeting can be found here.
Intuitionistic Probabilism in Epistemology (unpublished)
Abstract: This paper examines the plausibility of a thesis of probabilism that is based on intuitionistic logic and exposits the difficulties faced by such a program. The paper starts by motivating intuitionistic logic as the logic of investigation, along lines of reasoning similar to those of Bayesian epistemology. It then considers two existing axiom systems for intuitionistic probability functions, that of Weatherson (2003) and that of Roeper and Leblanc (1999), and discusses the relationship between the two. It will be shown that a natural adaptation of an accuracy argument in the style of Joyce (1998) and de Finetti (1974) to these systems fails. The paper concludes with some philosophical reflections on the results.
The paper has not been published. You can read a draft of it here. It was presented at the 2018 Philosophy of Logic, Mathematics, and Physics Graduate Conference (LMP) at the University of Western Ontario.