By the way, a similar idea is discussed in a post by Jaime Sevilla on the limits of causal discovery: https://towardsdatascience.com/the-limits-of-graphical-causal-discovery-92d92aed54d6
Related to your causality comment above, two days ago I submitted a research proposal on Causal Representation Learning for AI Safety. You may want to see it here: https://www.lesswrong.com/posts/5BkEoJFEqQEWy9GcL/an-open-philanthropy-grant-proposal-causal-representation