Current projects
Resilience
No abstract yet. I’ve just been reading and thinking a lot about the concept of resilience.
On bias-in, bias-out (precise title redacted for blind review)
Abstract: Discussions of algorithmic bias often assume, without reflection, that biases in predictive outcomes reflect biases in society. That is, machine learning algorithms are biased because they are bias-preserving processes. This paper challenges this assumption by pointing to a tension between the philosophy of data literature, which largely rejects the idea that data possess essential features that are always preserved during analysis, and the claim that social biases are readily preserved despite attempts to remove them.
What does a model's interpretability tell us about its plausibility?
One diagnosis of the replication crisis in psychology is a lack of robust theory-building behind the models being tested. The idea is that, because of inherent noisiness in the data-generating process, practical limitations in study design, over-flexibility of the statistical methods, and so on, the fact that a model fits the data is not enough evidence for believing that the model is true or close to the truth (e.g. Eronen & Bringmann, 2021; Fried, 2020; Yarkoni, 2020). One proposed remedy is that we ought to establish a model's prior plausibility before testing it (e.g. Muthukrishna & Henrich, 2019; Scheel et al., 2021; van Rooij & Baggio, 2021). This is done by interpreting the model, which is a mathematical object, as representing a theory, which is stated in the language of science, and assessing the prior plausibility of this theory by assessing its theoretical merits and connecting it to findings from adjacent fields.
This project examines the plausibility of this line of reasoning. I present arguments to the effect that model interpretability (1) does not establish the prior probability we ought to give a model; (2) does not signal desirable model features such as simplicity or falsifiability; and (3) does not track the usability of a model once it is accepted. This is not to say that these aren't important goals, or that model interpretability is unimportant. Rather, if we think that some models are true and that fit-to-data is a good (though fallible) way of finding out which ones, then we are generally not justified in thinking that we can arrive at an interpretation of a model under which it would be true, if it were in fact true, without already knowing that it is true.
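For orientation, the schema behind the talk of prior plausibility is just Bayes' theorem (this is background, not a result of the project):

\[
P(M \mid D) \;=\; \frac{P(D \mid M)\,P(M)}{P(D)},
\]

so a good fit, i.e. a high likelihood $P(D \mid M)$, raises confidence in a model $M$ only insofar as the prior $P(M)$ is not negligible; interpreting the model as a theory is meant to be the way we earn that prior.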
Peculiar scientific disagreements and the person-situation debate
I am working on a project in collaboration with Mike Schneider on a particular kind of scientific disagreement in which (1) both sides agree that the other side is scientific (as opposed to pseudo-scientific) and that this is a scientific debate; (2) both sides recognize that they do not have a knock-down argument or obvious empirical evidence that should convince the other side once and for all; but (3) neither side thinks that the debate presents a serious challenge to its own scientific project. This research has led me to read a lot about the person-situation debate in psychology, which took place around the 1970s. It's cool! Ask me about it!
Past projects
What Are Statistical Modeling Assumptions About? An Answer From Perspectival Pluralism (published, 2025)
Abstract: This paper presents a perspectivist framework for understanding and evaluating statistical assumptions. Drawing on the thesis of perspectivism from the philosophy of science, this framework treats statistical assumptions not as empirical hypotheses that are descriptively accurate or inaccurate about the world, but as prescriptions of a particular perspective from which statistical knowledge is generated. What this means is that we ought not judge statistical models solely by how closely they correspond to the world as we independently understand it, but by whether they paint a picture of the world that is epistemically significant.
This paper is published Open Access in Harvard Data Science Review.
Measuring the non-existent: validity before measurement (published, 2023)
Abstract: This paper examines the role existence plays in measurement validity. I argue that existing popular theories of measurement and of validity follow a correspondence framework, which starts by assuming that an entity exists in the real world with certain properties that allow it to be measurable. Drawing on literature from the sociology of measurement, I show that the correspondence framework faces several theoretical and practical challenges. I suggest instead the validity-first framework of measurement, which starts with a practice-based validation process as the basis for a measurement theory, and only posits objective existence when it is scientifically useful to do so.
This paper is published Open Access in Philosophy of Science.
I made a video abstract for this paper, available here.
Sample Representation in the Social Sciences (published, 2021)
Abstract: The social sciences face a problem of sample nonrepresentation, where the majority of samples consist of undergraduate students from Euro-American institutions. The problem has been identified for decades, with little sign of improvement. In this paper, I trace the history of sampling theory. The dominant framework, called the design-based approach, takes random sampling as the gold standard. The idea is that a sampling procedure that is maximally uninformative prevents samplers from introducing arbitrary bias, thus preserving sample representation. I show how this framework, while good in theory, faces many challenges in application. Instead, I advocate for an alternative framework, called the model-based approach to sampling, where representative samples are those balanced in composition, however they were drawn. I argue that the model-based framework is more appropriate in the social sciences because it allows for systematic assessment of imperfect samples and methodical improvement in resource-limited scientific contexts. I end with practical proposals for improving sample quality in the social sciences.
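To make the contrast concrete (an illustration added here, not taken from the paper), the two frameworks anchor the estimate of a population mean in different things: known inclusion probabilities versus known population composition. In Horvitz-Thompson and post-stratified form, respectively,

\[
\hat{\bar{Y}}_{\mathrm{HT}} \;=\; \frac{1}{N}\sum_{i \in s} \frac{y_i}{\pi_i},
\qquad
\hat{\bar{Y}}_{\mathrm{ps}} \;=\; \sum_{h} W_h\,\bar{y}_h,
\]

where $\pi_i$ is unit $i$'s inclusion probability under the sampling design, $N$ is the population size, $W_h$ is the known population share of group $h$, and $\bar{y}_h$ is the sample mean within that group. The first requires a well-characterized random sampling design; the second only requires that the sample be balanced (or reweighted) against known population composition, however it was drawn.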
A post-peer-review, pre-copyedit version of this article, published in Synthese, can be found here. The final authenticated version is available online at: http://dx.doi.org/10.1007/s11229-020-02621-3
A Statistical Learning Approach to a Problem of Induction (unpublished)
Abstract: One “easier” form of the problem of induction questions our ability to pick out true regularities in nature using limited data, on the assumption that such regularities do exist. Harman and Kulkarni (2012) take this problem to be a challenge to identify precise conditions under which the method of picking hypotheses based on limited datasets is or is not reliable. They identify an influential result from statistical learning theory, hereafter referred to as the VC theorem (Vapnik and Chervonenkis, 2015), which states that, provided the starting hypothesis set has finite VC dimension, the hypothesis chosen from it converges to the true regularity as the size of the dataset goes to infinity.
This result seems to provide us with a condition (i.e., having finite VC dimension) under which a method (i.e., choosing a hypothesis based on its performance over data) is reliable. Indeed, Harman and Kulkarni take this result to be an answer to the form of the problem of induction they have identified. This paper examines this claim. By discussing the details of how the VC theorem may be construed as an answer, and the connection between the VC theorem in statistical learning theory and the NIP property in model theory, I conclude that the VC theorem cannot give us the kind of general answer needed for Harman and Kulkarni's response to the problem of induction.
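For context, the result at issue, in one standard formulation (constants and exact forms vary across presentations), is a distribution-free uniform convergence guarantee: if the hypothesis set $\mathcal{H}$ has finite VC dimension, then for every data-generating distribution and every $\varepsilon > 0$,

\[
\Pr\!\left( \sup_{h \in \mathcal{H}} \bigl| R(h) - \hat{R}_n(h) \bigr| > \varepsilon \right) \;\longrightarrow\; 0 \quad \text{as } n \to \infty,
\]

where $R(h)$ is the true error of hypothesis $h$ and $\hat{R}_n(h)$ its error on a sample of size $n$. It is this distribution-free generality that makes the theorem look like an answer to the problem of induction, and it is this generality that the paper scrutinizes.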
A shorter version of the draft, presented at the 2018 PSA meeting, can be found here.
Intuitionistic Probabilism in Epistemology (unpublished)
Abstract: This paper examines the plausibility of a thesis of probabilism based on intuitionistic logic and sets out the difficulties faced by such a program. The paper starts by motivating intuitionistic logic as the logic of investigation, along lines of reasoning similar to those behind Bayesian epistemology. It then considers two existing axiom systems for intuitionistic probability functions, those of Weatherson (2003) and of Roeper and Leblanc (1999), and discusses the relationship between the two. It is then shown that a natural adaptation of an accuracy argument in the style of Joyce (1998) and de Finetti (1974) to these systems fails. The paper concludes with some philosophical reflections on these results.
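For orientation (simplifying, and with details that may differ from the presentation in the draft), an axiomatization in the spirit of Weatherson (2003) keeps the familiar constraints on a probability function but reads the consequence relation intuitionistically:

\[
P(\bot) = 0, \qquad P(\top) = 1, \qquad A \vdash B \;\Rightarrow\; P(A) \le P(B), \qquad P(A) + P(B) = P(A \wedge B) + P(A \vee B),
\]

with $\vdash$ now intuitionistic consequence, so that, for example, $P(A \vee \neg A)$ need not equal 1.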
The paper has not been published. You can read a draft of it here. It was presented at the 2018 Philosophy of Logic, Mathematics, and Physics Graduate Conference (LMP) at the University of Western Ontario.