STATISTICAL AND MACHINE METHODS FOR AUTOMATICALLY EXTRACTING CAUSAL RELATIONSHIPS FROM TEXT (REVIEW)

Keywords:

Causality, causal knowledge, natural language processing, machine learning, computational linguistics, hidden causality

Abstract

Until the 2000s, the automatic extraction of causal relationships (CR) from text relied on
non-statistical methods. These methods used manually constructed linguistic templates, so any
CR that did not fit a template could not be identified. Non-statistical methods required
constant manual supervision by experts, up to and including evaluation of the results.
Almost all of them were aimed at extracting explicit CR. Some attempted to decouple the
extraction system from a specific subject area. To overcome these disadvantages, later methods
began to shift towards statistical data processing and machine learning. This article reviews
statistical and machine learning methods of CR extraction. Several notable papers representing
the new paradigm of CR extraction were analyzed. The aim of the study was to evaluate the new
methods and identify their advantages and disadvantages. The main advantage of machine learning
and statistical methods is independence from the subject area while retaining acceptable
extraction accuracy: they are less accurate than template-based methods, but they are not tied
to a specific problem area. Unlike non-statistical methods, which matched text against manually
built linguistic and syntactic templates, these methods aim to discover such templates
themselves. Although machine learning and statistical methods are mostly independent of the
subject area and are trained on large text corpora, they are intended mainly for the English
language. There is also no standardized data set that would allow the methods to be compared
with each other. None of the reviewed works addressed the extraction of implicit CR.

Published

2024-01-05

Section

SECTION I. INFORMATION PROCESSING ALGORITHMS