What neural networks know about linguistic complexity
Journal: Russian Journal of Linguistics (Vol. 26, No. 2)
Publication Date: 2022-06-30
Authors: Serge Sharoff
Pages: 371-390
Keywords: automatic text classification; deep learning; interpreting neural networks
Abstract
Linguistic complexity is a multifaceted phenomenon: it manifests itself on different levels (from texts to sentences to words to subword units), through different features (from genres to syntax to semantics), and in different tasks (language learning, translation training, the specific needs of other kinds of audiences). The results of complexity analysis also differ across languages because of their typological properties, the cultural traditions associated with specific genres in these languages, or simply the properties of the individual datasets used for analysis. This paper investigates these aspects of linguistic complexity by using artificial neural networks to predict complexity and to explain the predictions. Neural networks optimise millions of parameters to produce empirically efficient prediction models, but they operate as black boxes that do not reveal which linguistic factors lead to a specific prediction. This paper shows how to link neural predictions of text difficulty to detectable properties of linguistic data, for example the frequency of conjunctions, discourse particles or subordinate clauses. The study concerns neural difficulty prediction models that were trained to differentiate easier and more complex texts in different genres in English and Russian and were then probed for the linguistic properties that correlate with their predictions. It shows how the rate of nouns and the related complexity of noun phrases affect difficulty, using statistical estimates of what the neural model predicts as easy and difficult texts. The study also analyses the interplay between difficulty and genre: linguistic features often specialise for genres rather than for inherent difficulty, so some associations between features and difficulty are caused by differences in the relevant genres.
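The idea of relating a model's difficulty predictions to a countable linguistic property can be illustrated with a minimal sketch. It is not the paper's method or code: the conjunction list, the function names `conjunction_rate` and `pearson`, and the toy texts and scores in the demo are all assumptions made for illustration. It computes the share of conjunctions in each text and the Pearson correlation between that rate and a set of predicted difficulty scores.

```python
import math

# Hypothetical, deliberately tiny conjunction list (an assumption for
# illustration; the paper's actual feature inventory is not reproduced here).
CONJUNCTIONS = {"and", "but", "or", "because", "although", "while", "if"}


def conjunction_rate(text: str) -> float:
    """Return the share of whitespace-separated tokens that are conjunctions."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    return sum(t in CONJUNCTIONS for t in tokens) / len(tokens)


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy placeholder texts and made-up scores standing in for the output of
    # a trained difficulty model; they are NOT results from the paper.
    texts = [
        "The cat sat on the mat.",
        "We stayed inside because it rained, although the forecast was good.",
        "Prices rose and wages fell, but demand held while supply shrank.",
    ]
    predicted_difficulty = [0.1, 0.6, 0.7]
    rates = [conjunction_rate(t) for t in texts]
    print(pearson(rates, predicted_difficulty))
```

A single correlation of this kind cannot separate difficulty from genre, which is the confound the abstract points to; under the same assumptions, one way to address it would be to compute the correlation within each genre separately.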