Research
Abstract: Perception solves computationally demanding problems at lightning speed. It recovers sophisticated representations of the world from degraded inputs, often in a matter of milliseconds. Any theory of perception must be able to explain how this is possible; in other words, it must be able to explain perception's computational tractability. One of the few attempts to move toward such an explanation has been the information encapsulation hypothesis, which posits that perception can be fast because it keeps computational costs low by forgoing access to information stored in cognition. I argue that we have no compelling reason to believe that encapsulation explains (or even contributes to an explanation of) perceptual tractability, and much reason to doubt it. This is because there exist much deeper computational challenges for perception than information access, and these threaten to make the costs of access irrelevant. If this is right, it undermines a core computational motivation for encapsulation and sends us back to the drawing board for explanations of perceptual tractability.
Abstract: Seeing is fast and thinking is slow. This claim often appears in arguments that aim either to classify mental processes as cognitive or perceptual, or to defend the existence of a divide between the two. But is it true, and if so, why? In this paper, I look at the evidence for this widely held belief and develop a potential computational explanation for it. I then show how the thesis sheds light on some otherwise puzzling phenomena in cognitive science and neuroscience. If this picture is right, it helps us understand one important way in which perception and cognition differ.
Foundations for a Science of Central Cognition (Draft available upon request)
Abstract: People are able to navigate a world of tremendous complexity. We keep track of many things that touch our lives, from the aesthetic preferences of a partner to the evidence for scientific theories. When we see something on the news, we can recognize its consequences for disparate parts of our lives, from our financial decisions to the well-being of a close friend. We put that information to use in plans that succeed often enough to shape the world around us. How do we do this? One of the chief challenges in answering this question is taming the massive computational costs that the problem appears to impose. (The challenge is sufficiently daunting that many thinkers have concluded it cannot be met.) I argue that certain methods in contemporary AI get us part of the way toward a solution, overcoming key barriers while still falling short in important ways. I present one way that these methods could be used as part of a larger system capable of addressing the challenge more fully. Finally, I argue that these methods put us in a position to model domain-general human cognition in a way that was not possible before.
Abstract: Contemporary AI systems now use human language and other modalities to solve many tasks that people do on short time scales, such as generating a block of text or identifying the contents of an image. These same systems struggle, however, to learn and integrate information in a way that coheres with their background beliefs, or to plan a series of actions that achieves a goal over longer time horizons. People do these things surprisingly well in comparison. We discuss the principles that allow for sustained and approximately coherent human cognition, and how they can inform future AI research aimed at automating these abilities.
Research Statement (Incl. Future Research):
When people learn of an event in the news, it can influence their voting behavior, their investment decisions, or their vacation plans, often in sensible ways. But how are people able to recognize such wide-ranging consequences of new information, when the potential connections are nearly endless? Similarly, when people open their eyes, they effortlessly see the 3D world before them, despite the fact that the light hitting the retina is compatible with infinitely many different 3D scenes. How do we do it? In both cases the space of possibilities is vast, yet it must be navigated quickly if we are to see and think at all. The challenge of doing so is so great that, when viewed through the lens of theoretical computer science, the problems the brain solves seem as though they should be impossible to solve. Some have taken this to prove that the computational theory of mind (the view that mental processes are computational processes, and the foundation of modern cognitive science) must be false. Others have argued that the challenge shows that the dream of human-like AI cannot be realized. My work explores this challenge, the challenge of computational tractability, and uses it to shed light on philosophical questions about how the mind works and why it works that way. I anticipate that my future research will extend this program in several ways, as well as explore applications of the resulting insights for non-ideal epistemology.
My written work to date consists of a set of closely related papers on computational tractability and ‘cognitive architecture’ (the study of the large-scale structure of the mind, including its division into parts such as perception and cognition). The first of these, published in The Philosophical Review as ‘How is Perception Tractable?’, develops a framework for thinking about the computational costs of mental operations, generalizing concepts from theoretical computer science so that they apply to arguments in philosophy and cognitive science. It then uses this framework to undermine a key motivation for a popular view in philosophy of mind and perceptual psychology: the view that what we think cannot influence what we see. Prior work had assumed that partitioning the mind into distinct parts or ‘modules,’ defined by limits on the information they can access, was essential to explaining the tractability of mental processes like perception (e.g. seeing and hearing) and cognition (e.g. reasoning and planning). This paper argues that this classical concept of modularity does not do the work that many had hoped it would with respect to computational costs. At the same time, computational tractability considerations do motivate a very different set of views about the relationship between perception and cognition, including new senses of modularity in the case of perception.
A second paper picks up on this thread and develops a positive view of how perception is tractable. It approaches the issue obliquely by first asking, ‘why is seeing so much faster than thinking?’ Despite their apparent computational demands, seeing and hearing happen in fractions of a second, much faster than comparable processes in cognition. An explanation of why this difference exists stands to illuminate the distinctive computational strategy that perception employs to achieve tractability. I argue that a particularly important part of this strategy is reliance on memory. In the case of perception, repeated exposure to a circumscribed task (say, recovering the 3D scene from a 2D retinal projection) means that a significant portion of the computational work can be offloaded onto prior information about the form that solutions to relevantly similar problem instances typically take. In contrast, the need for flexibility in cognition prevents it from using a similar strategy, at least in canonical cases. Using information theory and methods from AI, I make this key insight precise and tie it to its implementation in perception. I then show how this perspective can help us make sense of otherwise puzzling findings in the cognitive neuroscience of vision.
A third and a fourth paper examine the tractability of cognition. Cognition poses unique challenges not seen in perception because of its flexibility, holism, and open-endedness. People can reason and plan about seemingly arbitrary contents, make connections between seemingly arbitrary sets of beliefs, and do so in ways that are highly effective at accomplishing their goals. Doing this requires zeroing in on relevant considerations from among unfathomably many possible considerations. Classic arguments in the philosophy of cognitive science and AI hold that this simply cannot be done within a computational framework. If that’s right, it follows that the mind is not a machine and human-like AI is impossible. In the first of these projects, I revisit these impossibility arguments and contend that recent breakthroughs in AI show critical premises to be false. In particular, Large Language Models (machine learning systems trained on vast amounts of text to predict words based on their context) show that relevance can, in fact, be tractably computed. This allows us to defeat classic impossibility arguments and begin to see how we might reconcile the computational theory of mind with human cognitive performance. I argue that the existence of tractable methods for computing relevance clears a long-standing hurdle for cognitive science and AI, making it possible for the first time to explore classes of computational models that could plausibly reproduce human reasoning. This opens up a new era in the science of cognition, with the opportunity to answer questions about human cognition that were insoluble for most of the history of AI and cognitive science.
The next paper takes up this project and explores two key hypotheses about how human reasoning might be realized. The first is what we might call the ‘Pure Deep Learning Hypothesis’: that reasoning and planning are best modeled by large-scale neural networks, of which the Large Language Models (LLMs) discussed above are the leading candidates. I draw on a large body of empirical work on LLMs’ abilities to argue that these models differ systematically from people in the domain of normative reasoning, i.e. reasoning and planning using operations with a clear normative justification. If this is right, it motivates the search for alternatives to the Pure Deep Learning Hypothesis. In that spirit, I draw together several strands of work in AI to propose a novel architecture for human cognition that, I argue, can cash in on key strengths of LLMs (in particular, their capacity for relevance) while avoiding their shortcomings in the domain of reasoning. The core idea behind the architecture is that the mind works by building small, tractable cognitive models on the fly in response to task demands. These cognitive models then support reasoning and planning using familiar algorithms, which are computationally tractable at small scale. I argue that an architecture of this kind can square the mind’s capacity for relevance with its ability to reason. I end by showing how an account of this kind helps us make sense of the successes of Bayesian models in cognitive science.
I anticipate that my future work will fall into two broad strands: the first will further pursue these themes, using theoretical computer science and AI to understand the mind; the second will use these insights to draw lessons for non-ideal epistemology.
Computational Philosophy of Cognitive Science:
In the work above, I argue that people build models of small parts of the world in order to think. If this view is on the right track, it explains how people achieve a certain kind of local coherence in their reasoning and planning. But people do more than this: we undertake large planning and reasoning projects that far outstrip what we can hold in mind at once. To do this, people must coordinate the synthesis of a series of models to accomplish reasoning and planning tasks beyond the scope of any single model. This capacity for long-term reasoning and planning lies far beyond the abilities of contemporary AI; it captures a core component of agency and an important way in which our highly fractured minds approximate the functioning of a unified whole. With collaborators, I’m working to formalize the norms governing this process of model coordination from the perspective of an ideal reasoner. This will allow us to draw on a range of techniques in machine learning and AI to map out the space of ways such norms might be approximated by computationally limited agents. The result amounts to a design space for systems that might reason as we do.
In another project, one I’m just starting, I plan to develop a computational account of attention. We know a lot about attention. We have a folk theory of what it does: it allows the mind to take hold of certain objects and privilege them in seeing and thinking. We know of a large number of psychophysical effects of attention and a bit about the neural mechanisms that may be involved. What we don’t know is what attention does in a precise, computational sense. What are the computational resources that attention reallocates? And how does that reallocation contribute to effective seeing, hearing, reasoning, or planning? I’d like to approach this question first in perception by surveying the space of computational models of vision that come from machine learning (generative models, discriminative models, neural models, or models exploiting Bayesian inference) and asking, for each one, ‘what mechanisms do models in this class have to prioritize the computation of certain contents over others (i.e. to attend)?’ This yields a space of possible computational mechanisms, including the use of small models to approximate large models (as in cognition), inference techniques that trade precision in one dimension for imprecision in another, and more conventional changes to the prior, likelihood, or proposal distributions involved. I’d like to develop a few of the more promising mechanisms and assess their fit to the known psychological and neural data, helping us to zero in on a computational account of attention.
Non-Ideal Epistemology:
Across papers three through five discussed above, I develop a view of human cognition on which we reason by building small cognitive models tailored to the task at hand. I have a couple of projects in early stages of development that spell out consequences of this view for non-ideal and social epistemology. The first of these looks at the role of relevance in non-ideal epistemology. Ideal epistemology classically foregrounds one question: ‘what does my evidence support?’ (or, ‘what am I justified in believing?’). For computationally limited reasoners like us, there is another question that must be answered before the question of justification can even be entertained; namely, ‘what is relevant to think about?’ The strategies our minds use to answer this, highlighting certain things as salient and worthy of attention while backgrounding countless others, have consequences that ramify throughout our beliefs. I’d like to explore how such strategies might account for certain high-profile cognitive failures, such as confirmation bias, belief polarization, or weakness of will. I think each of these cases can be explained as an occasion on which a generally good cue to relevance (prior beliefs, the relevance judgements of others, or a merely proximal good) leads us astray.
Another project looks at the foundations of non-ideal epistemology and the theory of credences, or degrees of belief. Epistemologists who work on credences typically take an anti-realist perspective: credences are convenient fictions, part of a rational redescription of the agent’s behavior. Indeed, the full set of realist requirements on credences cannot be met by anything actually in the head (precisely because of computational tractability constraints). But if the view of cognition described above is on the right track, then there are probabilistic representations in the head that play many of the essential roles traditionally played by credences. I use this observation to develop a pluralist, but realist, account of credences. This is significant because the anti-realist account of credences has historically made non-ideal epistemology very hard to pursue. (When we’re silent about what actually exists in the mind, we can’t easily distinguish epistemic norms that are actionable from those that are not!) I argue that having a clearer idea of what credences are puts non-ideal epistemology on a more solid foundation, helping us uncover the norms of reasoning for agents with minds like ours.