ResearchBib Share Your Research, Maximize Your Social Impacts
Sign for Notice Everyday Sign up >> Login

An algorithm of idiom search in program source codes using subtree counting

Journal: Software & Systems (Vol.35, No. 1)

Publication Date:

Authors : ;

Page : 065-074

Keywords : python; ast; data mining; refactoring; program analysis; programming idiom;

Source : Downloadexternal Find it from : Google Scholarexternal

Abstract

The paper is dedicated to programming idiom extraction algorithm design. Programming idiom is the fragment of source code which often occurs in different programs and used for solving one typical programming task. In this research the programming idiom is a source code fragment that often occurs in different programs and used for solving one typical programming task. In this research, the programming idiom is considered as the part of a program abstract syntax tree (AST), which provides maximum reduce of information quantity in a source code, when all of programming idiom occurrences are replaced with certain syntax construction (e.g., function call). The developed subtree value metric estimates information amount reduce after such replace. Therefore, the idiom extraction is reduced to search of subtree value function maximum on AST subtree set. To reduce a number of subtrees inspected, the authors use steepest descent method for subtree value function maximum search. At each step subtree is extended with one node, which provides maximum increase of a subtree value metric. Subtrees are stored in a data structure that is a generalization of a trie data structure. The paper proposes an accelerated algorithm of idiom extraction. Programming idiom extraction speedup is achieved through reusing results of idiom efficiency maximum search. The paper also de-scribes the implementation of the developed algorithms. The algorithms are implemented in Python programming language. The implementation extracts programming idioms from source code written in Python. This programming language is chosen due to a large corpus of texts written in such language; it also includes convenient tools for building AST. The authors carried out an idiom extraction experiment using the developed implementation. The idioms were extracted from corpora of an open-source program source code. The extracted program-ming idioms are source code fragments with own meaning. It is also shown that applying developed algorithms to a source code of a single software project can reveal possibilities of investigated program refactoring.

Last modified: 2022-07-06 17:34:55