DEEP LEARNING METHODS FOR NATURAL LANGUAGE TEXT PROCESSING

  • V.V. Kureichik, Southern Federal University
  • S.I. Rodzin, Southern Federal University
  • V.V. Bova, Southern Federal University
Keywords: Deep learning, natural language processing, neural networks, convolutional neural networks, recursive neural networks, representation learning

Abstract

An analysis of deep learning (DL) approaches to natural language processing (NLP) tasks is presented. The study covers various NLP tasks implemented using artificial neural networks (ANNs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs). These architectures make it possible to solve a wide range of natural language processing tasks that previously could not be solved effectively: sentence modeling, semantic role labeling, named entity recognition, question answering, text categorization, and machine translation. Along with the advantages of using CNNs for NLP, there are challenges associated with the large number of tunable network parameters and the choice of architecture. We propose an evolutionary
algorithm for optimizing the architecture of convolutional neural networks. The algorithm initializes a small random population of agents (no more than five) and uses a fitness function to evaluate each agent in the population. Tournament selection is then carried out among all agents, and a crossover operator is applied to the selected agents. An advantage of the algorithm is the small size of the network population. It builds networks from several types of CNN layers: convolutional layers, max pooling (subsampling) layers, average pooling layers, and fully connected layers. The algorithm was tested on a local computer with an ASUS Cerberus GeForce® GTX 1050 Ti OC Edition GPU with 4 GB of GDDR5 memory, 8 GB of RAM, and an Intel(R) Core(TM) i5-4670
processor. The experimental results showed that the proposed neuroevolutionary approach is able
to quickly find an optimized CNN architecture for a given data set with an acceptable accuracy
value. The algorithm took about one hour to execute. The popular TensorFlow framework was used to create and train the CNNs. Two public datasets were used to evaluate the algorithm: MNIST and MNIST-RB. These datasets contain grayscale images of handwritten digits, with 50,000 training samples and 10,000 test samples.
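
A minimal sketch of the search loop described above, in Python with TensorFlow, is given below. It is an illustrative reconstruction, not the authors' implementation: the genome encoding, layer hyperparameters (filter counts, kernel sizes), number of generations, epochs per fitness evaluation, and the elitism step are all assumptions; only the population bound of five agents, tournament selection, crossover, and the four layer types are taken from the abstract.

import random
import tensorflow as tf

# Layer types named in the abstract; how they are decoded is an assumption.
LAYER_CHOICES = ["conv", "max_pool", "avg_pool"]

def random_genome(max_len=4):
    """A genome is a short list of layer-type tokens; the final fully
    connected (softmax) layer is added implicitly during decoding."""
    return [random.choice(LAYER_CHOICES) for _ in range(random.randint(1, max_len))]

def build_model(genome, input_shape=(28, 28, 1), num_classes=10):
    """Decode a genome into a Keras CNN (hypothetical decoding scheme)."""
    inputs = tf.keras.Input(shape=input_shape)
    x = inputs
    for gene in genome:
        if gene == "conv":
            x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)
        elif gene == "max_pool":
            x = tf.keras.layers.MaxPooling2D(2, padding="same")(x)
        else:  # avg_pool
            x = tf.keras.layers.AveragePooling2D(2, padding="same")(x)
    x = tf.keras.layers.Flatten()(x)
    # The fully connected layer from the abstract serves as the classifier.
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

def fitness(genome, data, epochs=1):
    """Fitness of an agent = test accuracy after a short training run."""
    (x_tr, y_tr), (x_te, y_te) = data
    model = build_model(genome)
    model.fit(x_tr, y_tr, epochs=epochs, batch_size=128, verbose=0)
    return model.evaluate(x_te, y_te, verbose=0)[1]

def crossover(a, b):
    """Single-point crossover of two parent genomes, capped to keep nets small."""
    cut_a = random.randint(1, len(a))
    cut_b = random.randint(0, len(b))
    return (a[:cut_a] + b[cut_b:])[:6]

def evolve(data, pop_size=5, generations=3):
    """Tournament selection and crossover over a population of <= 5 agents."""
    population = [random_genome() for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        scored = sorted(((fitness(g, data), g) for g in population), reverse=True)
        best = max(best, scored[0]) if best else scored[0]
        # Tournament selection: the better of two randomly drawn agents wins.
        parents = [max(random.sample(scored, 2))[1] for _ in range(pop_size)]
        population = [crossover(random.choice(parents), random.choice(parents))
                      for _ in range(pop_size)]
        population[0] = best[1]  # elitism: carry the best genome forward
    return best

if __name__ == "__main__":
    # 50,000 training and 10,000 test samples, as in the abstract.
    (x_tr, y_tr), (x_te, y_te) = tf.keras.datasets.mnist.load_data()
    x_tr, x_te = x_tr[..., None] / 255.0, x_te[..., None] / 255.0
    acc, genome = evolve(((x_tr[:50000], y_tr[:50000]), (x_te, y_te)))
    print(f"best architecture: {genome}, accuracy: {acc:.4f}")

Training each candidate for only a single epoch keeps each fitness evaluation cheap; under such a scheme, a run over a handful of generations would plausibly fit the reported one-hour budget on a GTX 1050 Ti-class GPU.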

Published: 2022-05-26
Section: SECTION III. INFORMATION PROCESSING ALGORITHMS