DEEP LEARNING METHODS FOR NATURAL LANGUAGE TEXT PROCESSING

Keywords:

Deep learning, natural language processing, neural networks, convolutional neural networks, recurrent neural networks, representation learning

Abstract

This paper presents an analysis of deep learning (DL) approaches to natural language processing
(NLP) tasks. The study covers NLP tasks implemented with artificial neural networks (ANNs),
convolutional neural networks (CNNs), and recurrent neural networks (RNNs). These architectures
make it possible to solve a wide range of NLP tasks that previously could not be solved
effectively: sentence modeling, semantic role labeling, named entity recognition, question
answering, text categorization, and machine translation. Alongside the advantages of applying
CNNs to NLP problems, there are difficulties associated with the large number of tunable network
parameters and with the choice of network architecture. We propose an evolutionary algorithm for
optimizing the architecture of convolutional neural networks. The algorithm initializes a random
population of a small number of agents (no more than 5) and uses a fitness function to obtain an
estimate for each agent in the population. Tournament selection is then carried out among all
agents, and a crossover operator is applied to the selected agents. An advantage of the algorithm
is the small size of the network population; it uses several types of CNN layers: a convolutional
layer, a max-pooling (subsampling) layer, an average-pooling layer, and a fully connected layer.
The algorithm was tested on a local computer with an ASUS Cerberus GeForce® GTX 1050 Ti OC
Edition 4 GB GDDR5 graphics card, 8 GB of RAM, and an Intel(R) Core(TM) i5-4670 processor. The
experimental results showed that the proposed neuroevolutionary approach is able to quickly find
an optimized CNN architecture with acceptable accuracy for a given dataset; the algorithm
completed in about 1 hour. The popular TensorFlow framework was used to build and train the CNNs.
Two public datasets were used for evaluation: MNIST and MNIST-RB. These datasets contain
grayscale images of handwritten digits, with 50,000 training samples and 10,000 test samples.
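The evolutionary loop described in the abstract (a population of at most 5 agents, a fitness function, tournament selection, and a crossover operator over the four layer types) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the genome encoding, the toy `fitness` heuristic, and all function names are assumptions; in the paper, fitness would be the validation accuracy of the CNN built from the genome and trained in TensorFlow.

```python
import random

# Hypothetical layer vocabulary mirroring the four layer types named in the abstract.
LAYER_TYPES = ["conv", "max_pool", "avg_pool", "dense"]

def random_genome(length=4):
    """A genome is a list of layer types describing a candidate CNN."""
    return [random.choice(LAYER_TYPES) for _ in range(length)]

def fitness(genome):
    """Stand-in fitness. In the paper this would be the validation accuracy
    of the CNN built from the genome; here it is a toy heuristic that
    rewards convolutional layers and a dense output head."""
    score = sum(1 for layer in genome if layer == "conv")
    return score + (1 if genome[-1] == "dense" else 0)

def tournament(population, scores, k=2):
    """Tournament selection: sample k agents at random, return the fittest."""
    contenders = random.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]

def crossover(a, b):
    """One-point crossover between two parent genomes."""
    point = random.randint(1, min(len(a), len(b)) - 1)
    return a[:point] + b[point:]

def evolve(generations=10, pop_size=5):
    """Small-population neuroevolution loop (pop_size <= 5, as in the abstract)."""
    population = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(g) for g in population]
        # Keep the best agent and refill the rest via selection + crossover.
        elite = population[max(range(pop_size), key=lambda i: scores[i])]
        children = [crossover(tournament(population, scores),
                              tournament(population, scores))
                    for _ in range(pop_size - 1)]
        population = [elite] + children
    scores = [fitness(g) for g in population]
    return population[max(range(pop_size), key=lambda i: scores[i])]
```

Under this sketch, `evolve()` returns the best layer sequence found, which would then be instantiated as a TensorFlow model and trained on the target dataset.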


Published

2022-05-26

Section

SECTION III. INFORMATION PROCESSING ALGORITHMS