# Accepted Papers

#### (Invited Talk) Algorithmic bias & other human-centric challenges in AI

*David Danks*

The introduction of autonomous capabilities produces a range of challenges, not only technological, but also more human-centric. I will discuss several such human-centric challenges raised by AI, as well as some possible human, social, & technological solutions. As a case study, I will focus particularly on issues of algorithmic bias (to use the widely employed term). Throughout, I will emphasize the importance & value of causal knowledge in identifying, characterizing, and responding to these human-centric challenges.

#### (Invited Talk) Towards a Decision-Theoretic Foundation for (Imprecise) Interventional Probabilities

*Jiji Zhang*

Causal Bayesian networks have proved to be a useful representation of causal knowledge in psychology and cognitive science as well as in machine learning. For the former purpose, it is natural to interpret interventional probabilities subjectively, as credences or degrees of belief (of some sort). In this talk I argue that these credences are in general imprecise and should usually be modelled by non-convex sets of probabilities. I then outline a generalization of Seidenfeld et al.'s theory of coherent choice functions that invokes such interventional credences and can potentially provide a decision-theoretic foundation for a subjective interpretation of causal Bayesian networks or the more general causal credal networks.

#### Causal Consistency of Structural Equation Models

*Paul K Rubenstein*, Sebastian Weichwald*, Stephan Bongers, Joris M. Mooij, Dominik Janzing, Moritz Grosse-Wentrup, Bernhard Schoelkopf*

Complex systems can be modelled at various levels of detail. Ideally, causal models of the same system should be consistent with one another in the sense that they agree in their predictions of the effects of interventions. We formalise this notion of consistency in the case of Structural Equation Models (SEMs) by introducing exact transformations between SEMs. This provides a general language to consider, for instance, the different levels of description in the following three scenarios: (a) models with large numbers of variables versus models in which the ‘irrelevant’ or unobservable variables have been marginalised out; (b) micro-level models versus macro-level models in which the macrovariables are aggregate features of the microvariables; (c) dynamical time series models versus models of their stationary behaviour. Our analysis stresses the importance of well specified interventions in the causal modelling process and sheds light on the interpretation of cyclic SEMs.

#### Probabilistic Active Learning of Functions in Structural Causal Models

*Paul K Rubenstein, Bernhard Schoelkopf, Ilya Tolstikhin and Philipp Hennig*

We consider the problem of learning the functions computing children from parents in a Structural Causal Model once the underlying causal graph has been identified. This is in some sense the second step after causal discovery. Taking a probabilistic approach to estimating these functions, we derive a natural myopic active learning scheme that identifies the intervention which is optimally informative about all of the unknown functions jointly, given previously observed data. We test the derived algorithms on simple examples, to demonstrate that they produce a structured exploration policy that significantly improves on unstructured base-lines.

#### Algebraic Equivalence of Linear Structural Equation Models

*Thijs van Ommen, Joris M. Mooij*

Despite their popularity, many questions about the algebraic constraints imposed by linear structural equation models remain open problems. For causal discovery, two of these problems are especially important: the enumeration of the constraints imposed by a model, and deciding whether two graphs define the same statistical model. We show how the half-trek criterion can be used to make progress in both of these problems. We apply our theoretical results to a small-scale model selection problem, and find that taking the additional algebraic constraints into account may lead to significant improvements in model selection accuracy.

#### SAT-Based Causal Discovery under Weaker Assumptions

*Zhalama, Jiji Zhang, Frederick Eberhardt, Wolfgang Mayer*

Using the flexibility of recently developed methods for causal discovery based on Boolean satisfiability (SAT) solvers, we encode a variety of assumptions that weaken the Faithfulness assumption. The encoding results in a number of SAT-based algorithms whose asymptotic correctness relies on weaker conditions than are standardly assumed. This implementation of a whole set of assumptions in the same platform enables us to systematically explore the effect of weakening the Faithfulness assumption on causal discovery. An important effect, suggested by simulation results, is that adopting weaker assumptions greatly alleviates the problem of conflicting constraints and substantially shortens solving time. As a result, SAT-based causal discovery is potentially more scalable under weaker assumptions.

#### Counting Markov Equivalence Classes by Number of Immoralities

*Adit Radhakrishnan, Liam Thomas Solus, Caroline Uhler*

Two directed acyclic graphs (DAGs) are called Markov equivalent if and only if they have the same underlying undirected graph (i.e. skeleton) and the same set of immoralities. When using observational data alone and typical identifiability assumptions, such as faithfulness, a DAG model can only be determined up to Markov equivalence. Therefore, it is desirable to understand the size and number of Markov equivalence classes (MECs) combinatorially. In this paper, we address this enumerative question using a pair of generating functions that encode the number and size of MECs on a skeleton G, and in doing so we connect this problem to classical problems in combinatorial optimization. The first generating function is a graph polynomial that counts the number of MECs on G by their number of immoralities. Using connections to the independent set problem, we show that computing a DAG on G with the maximum possible number of immoralities is NP-hard. The second generating function counts the MECs on G according to their size. Via computer enumeration, we show that this generating function is distinct for every connected graph on p nodes for all p≤10.

#### Overcoming Impediments to Estimation with Missing-data

*Karthika Mohan, Judea Pearl*

#### Learning Dynamic Structure from Undersampled Data

*John W. Cook, David Danks,Sergey M. Plis *

Most causal learning algorithms for time se- ries data assume that the underlying generative process operates on approximately the same timescale as the measurement process (or that any differences do not impede learning). This assumption fails in many domains, and so we first show that undersampling creates learn- ing challenges at the measurement timescale, even for simple generative processes. We then describe four algorithmic generalizations (some previously proposed, none previously tested)—two for continuous data, and two for either continuous or discrete data—and test them on simulated data. The results suggest that measurement timescale structure learning from undersampled time series data is feasible, but the appropriate model class needs to be used. Moreover, explicitly representing the possibility of undersampling can yield valuable regularization benefits.

#### Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information

*Jakob Runge *

Conditional independence testing is a fundamental problem underlying causal discovery and a particularly challenging task in the presence of nonlinear and high-dimensional dependencies. Here a fully non-parametric shuffle test based on conditional mutual information is presented. Through a nearest neighbor scheme it efficiently adapts to highly heterogeneous distributions due to strongly nonlinear dependencies. Numerical experiments demonstrate that the test reliably simulates the null distribution even for small sample sizes and with highdimensional conditioning sets. Especially for sample sizes below 2000 the test is better calibrated than kernel-based tests and reaches the same or higher power levels. While the conditional mutual information estimator scales more favorably with sample size than kernel-based approaches, a drawback of the test is its computational expensive shuffle scheme making more theoretical research to analytically approximate the null distribution desirable.

#### Causal Discovery in the Presence of Measurement Noise: Identifiability Conditions

*Kun Zhang, Mingming Gong, Joseph Ramsey, Kayhan Batmanghelich, Peter Spirtes, Clark Glymour*

Measurement error in the observed values of the variables can greatly change the output of various causal discovery methods. This problem has received much attention in multiple fields, but it is not clear to what extent the causal model for the measurement-error-free variables can be identified in the presence of measurement error with unknown variance. In this paper, we study precise sufficient identifiability conditions for the measurement-error-free causal model and show what information of the causal model can be recovered from observed data. In particular, we present two different sets of identifiability conditions, based on the second-order statistics and higher-order statistics of the data, respectively. The former was inspired by the relationship between the generating model of the measurement-error-contaminated data and the factor analysis model, and the latter makes use of the identifiability result of the over-complete independent component analysis problem.