An algorithm of idiom search in program source codes using subtree counting
Journal: Software & Systems (Vol. 35, No. 1)
Publication Date: 2022-03-16
Authors : D.A. Orlov;
Page : 065-074
Keywords : python; ast; data mining; refactoring; program analysis; programming idiom;
Abstract
The paper addresses the design of an algorithm for extracting programming idioms. A programming idiom is a source code fragment that occurs often in different programs and is used to solve one typical programming task. In this research, a programming idiom is treated as a part of a program's abstract syntax tree (AST) that gives the maximum reduction in the amount of information in the source code when all occurrences of the idiom are replaced with a certain syntactic construct (e.g., a function call). The developed subtree value metric estimates the reduction in the amount of information after such a replacement. Idiom extraction therefore reduces to finding the maximum of the subtree value function over the set of AST subtrees. To reduce the number of subtrees inspected, the authors use the steepest descent method to search for the maximum of the subtree value function: at each step, the subtree is extended with the one node that gives the largest increase in the subtree value metric. Subtrees are stored in a data structure that generalizes the trie. The paper also proposes an accelerated idiom extraction algorithm, in which the speedup is achieved by reusing the results of earlier searches for the idiom efficiency maximum. The paper also describes the implementation of the developed algorithms. The algorithms are implemented in Python, and the implementation extracts programming idioms from source code written in Python. This language was chosen because a large corpus of programs written in it is available and because it includes convenient tools for building an AST. The authors carried out an idiom extraction experiment using the developed implementation. The idioms were extracted from a corpus of open-source program source code. The extracted programming idioms are source code fragments that carry a meaning of their own.
It is also shown that applying the developed algorithms to the source code of a single software project can reveal opportunities for refactoring the program under study.
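The general idea of counting repeated AST subtrees can be illustrated with a minimal Python sketch built on the standard `ast` module. It is not the paper's algorithm: it counts only complete subtrees (a node together with all its descendants) rather than growing partial subtrees node by node, it uses a plain `Counter` instead of the trie-like structure, and the scoring function `toy_value` is a hypothetical stand-in for the subtree value metric, which the abstract does not define. All function names here are assumptions made for illustration.

```python
import ast
from collections import Counter


def subtree_size(node):
    """Number of AST nodes in the subtree rooted at `node`."""
    return sum(1 for _ in ast.walk(node))


def count_subtrees(sources, min_size=3):
    """Count occurrences of each complete statement/expression subtree."""
    counts = Counter()
    sizes = {}
    for source in sources:
        for node in ast.walk(ast.parse(source)):
            if not isinstance(node, (ast.stmt, ast.expr)):
                continue
            size = subtree_size(node)
            if size < min_size:
                continue
            # ast.dump omits line/column attributes by default, so equal
            # structure at different positions yields the same key.
            key = ast.dump(node)
            counts[key] += 1
            sizes[key] = size
    return counts, sizes


def toy_value(count, size):
    """Hypothetical score (an assumption, not the paper's metric):
    nodes saved if each repeated occurrence shrank to a single node."""
    return (count - 1) * (size - 1)


def best_candidate(sources, min_size=3):
    """Return the dump of the subtree with the highest toy score."""
    counts, sizes = count_subtrees(sources, min_size)
    return max(counts, key=lambda k: toy_value(counts[k], sizes[k]))


if __name__ == "__main__":
    corpus = [
        "with open(p) as f:\n    data = f.read()\n",
        "with open(q) as f:\n    data = f.read()\n",
    ]
    print(best_candidate(corpus))
```

On this two-file corpus the enclosing `with` statements differ in one variable name, so a complete-subtree count can only pick up the repeated inner statement. Catching the whole `with` pattern needs partial subtrees, which is the case the node-by-node extension described in the abstract is meant to handle.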
Last modified: 2022-07-06 17:34:55