Data science without causal inference is like a fish without water

Learning about the foundational concepts of causal inference is crucial for data analytics and AI, because most questions of interest are causal in nature.

Whether we perform impact evaluations, A/B testing, quality control or clinical trials, causal inference is the method of choice.

Causal inference is one of the most complex methods of data inference, but it offers high rewards in terms of insight. In essence, the theory and methods behind causal inference enable us to analyse causal relationships and thus fully unlock the value that data holds.

Being familiar with causal inference methods and techniques also equips us with problem-solving skills that are crucial for analysing data in a scientifically objective way.

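The fundamental problem of causal inference is a missing-data problem: for each unit, we observe the outcome only under the treatment it actually received, never under the alternative, so with a binary treatment half of all potential outcomes are always missing. Because incomplete data is a common problem in data analytics (and often leads to biased results), it is important to become familiar with causal inference methods and techniques, knowledge that enhances our skills for dealing effectively with incomplete data.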
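To make the missing-data view concrete, here is a minimal Python sketch using hypothetical numbers (not from any real study). Each unit has two potential outcomes, Y(1) under treatment and Y(0) under control, but only the one matching the treatment the unit actually received is observed; the other is recorded as `None`. The names `units` and `potential_outcomes_table` are illustrative only.

```python
# Hypothetical units: treatment indicator and the single outcome we observe.
units = [
    {"id": 1, "treated": 1, "y_obs": 7.0},
    {"id": 2, "treated": 0, "y_obs": 5.0},
    {"id": 3, "treated": 1, "y_obs": 6.0},
    {"id": 4, "treated": 0, "y_obs": 4.0},
]

def potential_outcomes_table(units):
    """Return rows (id, Y(1), Y(0)); the unobserved potential outcome is None."""
    rows = []
    for u in units:
        y1 = u["y_obs"] if u["treated"] == 1 else None
        y0 = u["y_obs"] if u["treated"] == 0 else None
        rows.append((u["id"], y1, y0))
    return rows

table = potential_outcomes_table(units)
for row in table:
    print(row)

# Half of all potential outcomes are missing, so the unit-level effect
# Y(1) - Y(0) cannot be computed directly for any unit.
n_missing = sum(v is None for _, y1, y0 in table for v in (y1, y0))
print(n_missing, "of", 2 * len(table), "potential outcomes are missing")
```

Every row has exactly one `None`, which is why causal effects must be estimated by comparing groups rather than by reading them off individual units.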

Understanding incomplete data is the path to handling missing data.

For a long time, causal inference was considered legitimate only within a randomised experimental framework. Thanks to recent developments in statistical methodology, we are now able to perform causal inference with observational data as well, i.e., data that do not come from a randomised experiment.

In recent years, causal inference has become one of the most popular approaches to analysing data. However, many people still struggle with the complexities of its conceptual framework.

The conceptual framework of causal inference provides foundational knowledge about the causal reasoning it requires, as well as about the use of modern statistical thinking when designing studies and analysing data to estimate causal effects. Such foundational knowledge is critical for analysing causal relationships in a scientifically objective way.

Causal inference also helps us understand how study design affects the trustworthiness of the insights we obtain from data, as well as our capacity to unlock the value of that data.

Some examples of questions that causal inference can answer:

Not all causal questions can be answered

For example, do black students perform better in educational attainment than white or Hispanic students?

In this example, race is treated as the cause. However, because race cannot be manipulated (there is no simple intervention that could transform a white person into a black person), the results of such a study cannot be called causal effects. They are rather associations, conditional on the set of covariates used in the comparative analysis.

Another example of a cause that cannot be manipulated is sex. We cannot give an individual a magic pill that transforms him or her into the opposite sex.

Within the randomised experimental framework, we use an intervention to manipulate the units of one of the groups being compared. For example, we apply the intervention to the units of one group (usually called the treated group) and withhold it from the units of another group (the control group). The intervention is a well-formulated, known cause.

A known cause is a cause that can be manipulated.

When we can define a known cause, we can use causal inference methods and techniques to estimate causal effects from observational data as well.

However, when using observational data, we must make every effort to construct comparable groups, that is, two or more groups of units that are approximately identical with respect to important covariates and differ only with respect to the applied intervention, i.e., the known cause.
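One common way to check whether groups are comparable on a covariate is the standardised mean difference (SMD): the difference in group means divided by a pooled standard deviation. The sketch below is a minimal Python illustration with hypothetical covariate values; the 0.1 threshold is a widely used rule of thumb, assumed here for illustration, not a universal standard.

```python
import math
from statistics import mean, variance

def standardized_mean_difference(x_treated, x_control):
    """(mean_treated - mean_control) / pooled SD, using sample variances."""
    pooled_sd = math.sqrt((variance(x_treated) + variance(x_control)) / 2)
    return (mean(x_treated) - mean(x_control)) / pooled_sd

# Hypothetical covariate values: (treated group, control group).
covariates = {
    "age": ([30, 35, 40, 45], [25, 30, 35, 40]),
    "income": ([50, 52, 48, 50], [49, 51, 50, 50]),
}

THRESHOLD = 0.1  # rule of thumb (assumption), not a fixed standard

smd = {name: standardized_mean_difference(t, c) for name, (t, c) in covariates.items()}
for name, value in smd.items():
    status = "balanced" if abs(value) < THRESHOLD else "imbalanced"
    print(f"{name}: SMD = {value:.3f} ({status})")
```

With these numbers, age is clearly imbalanced (SMD of roughly 0.77) while income is balanced, so a comparison of outcomes between these two groups would be confounded by age.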

Selection of covariates

Careful selection of covariates is of utmost importance, because it allows us to restructure observational data so that they mimic the data structure of a randomised experiment. Such restructuring is a complex task, but in essence it requires reconstructing the assignment mechanism of the observational data so that it mimics the assignment mechanism of a randomised experiment.

What is an Assignment Mechanism?

In a two-group randomised experimental design, units are assigned to either Group 1 or Group 2, commonly called the treated and the control group. The process that determines which units receive which treatment is called the assignment mechanism; in a randomised experiment, this mechanism is random and known to the researcher.
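A completely randomised design is one simple example of such a mechanism: exactly `n_treated` of `n_units` units are treated, and every subset of that size is equally likely. The Python sketch below is illustrative; the function name and parameters are hypothetical.

```python
import random

def completely_randomised_assignment(n_units, n_treated, seed=None):
    """Treat exactly n_treated of n_units units; every such subset is equally likely."""
    rng = random.Random(seed)
    treated_ids = set(rng.sample(range(n_units), n_treated))
    return [1 if i in treated_ids else 0 for i in range(n_units)]

assignment = completely_randomised_assignment(n_units=10, n_treated=5, seed=42)
print(assignment)
print(sum(assignment))  # always 5

# Under this mechanism every unit has the same probability n_treated / n_units
# of being treated, regardless of its covariates or potential outcomes.
draws = [completely_randomised_assignment(10, 5, seed=s) for s in range(2000)]
share_unit0 = sum(a[0] for a in draws) / len(draws)
print(round(share_unit0, 2))  # close to 0.5
```

Because assignment does not depend on any unit characteristic, the treated and control groups are comparable in expectation on all covariates, observed or not.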

Because with observational data such a random assignment mechanism either does not exist or is broken (for example, units may select themselves into treatment), it is important to reconstruct it so that it mimics the assignment mechanism of a randomised experiment.
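One common way to model an observational assignment mechanism, chosen here for illustration rather than prescribed by the text, is to estimate the propensity score e(x) = P(T = 1 | X = x), the probability of treatment given covariates. With a single discrete covariate, e(x) can be estimated as the share of treated units in each stratum, and inverse probability weights then reweight the groups so their covariate distributions match. The records and function names below are hypothetical, and the sketch assumes the stratum captures all relevant confounders.

```python
from collections import defaultdict

# Hypothetical observational records: (covariate stratum, treatment indicator).
# Older units are less likely to be treated, so assignment is NOT random.
records = [
    ("young", 1), ("young", 1), ("young", 1), ("young", 0),
    ("old", 1), ("old", 0), ("old", 0), ("old", 0),
]

def estimate_propensity_by_stratum(records):
    """Estimate e(x) = P(T = 1 | X = x) as the share of treated units in stratum x."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_treated, n_total]
    for stratum, treated in records:
        counts[stratum][0] += treated
        counts[stratum][1] += 1
    return {s: n_t / n for s, (n_t, n) in counts.items()}

def inverse_probability_weights(records, propensity):
    """Weight treated units by 1/e(x) and control units by 1/(1 - e(x))."""
    return [
        1 / propensity[s] if t == 1 else 1 / (1 - propensity[s])
        for s, t in records
    ]

def weighted_composition(records, weights, group):
    """Weighted number of units per stratum within one treatment group."""
    totals = defaultdict(float)
    for (s, t), w in zip(records, weights):
        if t == group:
            totals[s] += w
    return dict(totals)

propensity = estimate_propensity_by_stratum(records)
print(propensity)  # {'young': 0.75, 'old': 0.25}

weights = inverse_probability_weights(records, propensity)
# After weighting, both groups contain young and old units in equal
# proportions, as they would (in expectation) under random assignment.
print(weighted_composition(records, weights, group=1))
print(weighted_composition(records, weights, group=0))
```

The weights are undefined when e(x) is exactly 0 or 1, i.e., when a stratum contains only treated or only control units; in that case no reweighting can recover a comparison for that stratum.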

In many ways, reconstructing the assignment mechanism can be considered a work of art. Yes, science requires art! However, this ‘art’ requires us to be well familiar with the necessary causal inference assumptions and with ways to satisfy them.

Causal Inference without assumptions is mission impossible

A set of causal assumptions must be satisfied if we want to draw trustworthy conclusions from causal effect estimates (for example, the stable unit treatment value assumption, unconfoundedness and overlap). Justifying these assumptions is challenging: it requires creative thinking, modern statistical thinking and an understanding of the science of causal reasoning.
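Some assumptions can be partly checked in the data, while others cannot. Overlap (also called positivity) requires that units with any given covariate values have a non-zero probability of receiving each treatment, 0 < e(x) < 1. A minimal Python sketch of a sample-level diagnostic, using hypothetical records and a hypothetical function name, flags strata in which every unit is treated or every unit is a control. Unconfoundedness, by contrast, cannot be verified from the observed data alone and must be justified by subject-matter reasoning.

```python
def overlap_violations(records):
    """Return strata where all units are treated or all are controls.

    In such strata the estimated propensity score is 0 or 1, so there are
    no comparable units in the other group.
    """
    seen = {}
    for stratum, treated in records:
        seen.setdefault(stratum, set()).add(treated)
    return sorted(s for s, groups in seen.items() if groups != {0, 1})

# Hypothetical records: (covariate stratum, treatment indicator).
records = [("A", 1), ("A", 0), ("B", 1), ("B", 1), ("C", 0)]
print(overlap_violations(records))  # ['B', 'C']
```

Strata B and C have no counterpart in the other group, so causal conclusions for them would rest on extrapolation rather than on comparable units.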

Understanding the assumptions and how to justify them is of great importance when designing causal inference studies, because the effectiveness of a causal design depends on our capacity to justify the required assumptions.

The more effective the causal design, the better we can justify the required assumptions. Good design is so important that, as Donald B. Rubin put it, “Sometimes the design effort can be so extensive that a description of it, with no analyses of any outcome data, can be itself publishable” (Rubin, D. B. (2008). For objective causal inference, design trumps analysis. The Annals of Applied Statistics).

To design causal inference studies effectively, it is important to become familiar with the conceptual foundations of causal inference. Causal inference is neither an algorithm nor an equation, but a methodological and analytical approach to analysing causal relationships that requires heavy use of ‘human-mind’ software.

Want to learn more about how to analyse data in a scientifically objective way?

Join our online courses!
