Tuesday, 14 January 2020

Notes of "DAGs with NO TEARS: Continuous Optimization for Structure Learning"

Notes of "DAGs with NO TEARS: Continuous Optimization for Structure Learning"


The paper contributed to the score-based causal discovery methods which are combinatorial constraint optimization problems given score functions. Luckily, the combinatorial constraint, i.e., the acyclicity, can be transformed into a continuous optimization problem by designing a score function of a matrix in linear cases to capture the acyclicity. Next, the problem can be solved by the standard numerical algorithms.

  •  In linear cases, this is an optimization problem over a matrix that represents graph structure. In (post-)nonlinear cases, the advantages may be disappeared.
  • What is the philosophy of score-based methods? Why an edge in this way can be a causal relation? Can we remove the acyclicity constraint, and then what does the result mean?
  • A score function should be coherent with the optimization step. In other words, the score of a graph structure should tell us more about the potential correct structure that we are looking for. This is the built-in information of a score function, which is not being used well. And then can we make use of it to search the structure? The gradient could be a useful guide, what could be the other types of such guide?
  • What could be the relation between AutoML and score-based methods?

3 comments:

  1. Hi, Ruibo! I also like this paper. For your second question, I think the acyclicity constraint is necessary. For SEM like: X = AX + Z, A is adjacency matrix and G is the graph. If G is acyclic, we can find a node x_0 which does not have a parent. We can begin with x_0 and generate a sample X according to A. But is A is cyclic, maybe we can not find such x_0.

    ReplyDelete
    Replies
    1. Hi Yinghua,

      I just realized that there is a comment function in this blog. I think this work is a great example of applying machine learning techniques to causal discovery problems, which must be explored more in the near future.

      Yes, you are right, the assumption is necessary because the acyclicity is used as a key to be optimized over and do the model selection such that the causal graph can be achieved.
      In the example, you mentioned that sampling from a cyclic causal graph might be problematic, but we do have cyclic causal models such as the research in brain function study with fMRI. The cyclic indicates the dynamics. Suppose that there is a start point, which is the same with supposing that we have an egg. Then we have a bird, an egg, and a bird sequentially. Therefore, I guess for sampling from cyclic causal graphs, we need to suppose a start variable at a time step, and then sample all the other variables.

      Delete
  2. This comment has been removed by the author.

    ReplyDelete