Visualization and Learning Applications of Code Clones

September 8, 2022

Muhammad Hammad defended his PhD thesis at the department of Mathematics and Computer Science on August 29th.

Source: Shutterstock.

We use software in our phones and washing machines, while in business, software helps with interactions between customers and partners. Software is based on computer code that is either written from scratch or duplicated as code clones, which are repeated code patterns taken from other software applications. Clones can be detected with algorithms for various purposes, such as plagiarism detection, program understanding, or bug detection. Post-detection analysis can help developers zero in on genuine code clones as opposed to false positives. For his PhD research, Muhammad Hammad looked at different aspects of post-detection analysis, such as visualization and learning approaches, to support certain software engineering tasks.


Clones are repeatable patterns. They exist everywhere: in architecture, in animals, in leaves, and in cars. Clones that exist in software applications are called code clones.

Marc Andreessen famously stated that “software is eating the world”. Today, software plays a central role in society, touching the lives of billions every day. It is also the primary way that companies run their businesses, from interacting with customers, prospects, and partners to making business decisions. To develop and maintain software, code needs to be written in programming languages such as Java, C#, or Python. These lines of code can be duplicated or cloned.

Code clones 101

According to Roy and Cordy, around 5–50% of the code in software systems could be contained in code clones. Code clones can be detected with the help of different algorithmic techniques, and several clone detection tools have been developed based on those techniques.
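To give a flavor of what such an algorithmic technique can look like, here is a minimal sketch of one simple, token-based idea. It is an illustration only, not a method from the thesis or from any particular clone detection tool: the names `normalize` and `find_clones`, the small keyword list, and the three-line window are all assumptions made for this example. Each line is normalized so that renamed variables still match, and any run of consecutive normalized lines that appears in more than one place is reported as a clone candidate.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny keyword list; a real tool would use a full lexer.
KEYWORDS = {"for", "if", "else", "while", "return"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(line):
    """Replace identifiers with ID and numbers with NUM so renamed copies still match."""
    out = []
    for tok in TOKEN_RE.findall(line):
        if tok in KEYWORDS:
            out.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            out.append("ID")
        elif tok.isdigit():
            out.append("NUM")
        else:
            out.append(tok)
    return " ".join(out)


def find_clones(files, window=3):
    """Group (filename, start_line) pairs whose `window` consecutive normalized lines match."""
    index = defaultdict(list)
    for name, source in files.items():
        # Keep original line numbers, skip blank lines.
        lines = [(i + 1, normalize(text))
                 for i, text in enumerate(source.splitlines()) if text.strip()]
        for i in range(len(lines) - window + 1):
            key = tuple(text for _, text in lines[i:i + window])
            index[key].append((name, lines[i][0]))
    # Only fragments seen in more than one location are clone candidates.
    return [locations for locations in index.values() if len(locations) > 1]


files = {
    "a.py": "total = 0\nfor x in items:\n    total += x\nprint(total)\n",
    "b.py": "s = 0\nfor v in values:\n    s += v\nreturn s\n",
}
print(find_clones(files))
```

Because such a sketch matches on normalized structure alone, short or generic fragments are also reported, which is one way false positives arise in practice.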

Clone detection tools can identify a huge number of clones in large software systems. Even larger numbers of clones are reported when clone detectors are used to identify clones in a family of similar software systems. Furthermore, tools often report false positive clones. Post-detection analysis of clones is therefore a must to help developers discard false positives and zoom in on the areas of interest in the bulk of cloning information. Visualization and learning approaches, along with abstraction, clustering, and filtering, are the cornerstone techniques for post-detection clone analysis.

Post-detection analysis

In this thesis, Hammad focuses on different aspects of post-detection analysis, such as visualization and learning approaches, to achieve certain goals related to code clones. His thesis is divided into two phases.

The first phase consists of a literature survey of software engineering tasks, called user goals, and of the information needs required to achieve those goals. Hammad also performs a comprehensive survey of clone visualization techniques. He believes this survey will be useful to many developers (i.e., developers of clone analysis tools), tool vendors, and researchers in the area of software clones.

The second phase of this research looks at the potential of using code clones for exploratory and rapid development of software. The reuse of duplicated patterns is also common in several other fields, such as building construction, car modeling, and clothes texturing.

Code clones can be considered ideal pieces of code for this purpose, as they are more stable and carry less risk than new code developed from scratch. Writing new code is also an expensive activity, consuming considerable time and effort. Developers frequently reuse code, searching for code snippets on the web or in a codebase, followed by judicious copying and pasting. Features like code search, code prediction, and code generation can help developers to write code faster and more easily. All these tasks are considered to be part of the “code completion” user goal.

Deep-Clone

Code clones are often overlooked as a source for code completion. Hammad proposes a novel technique known as ‘Deep-Clone’, which goes beyond predicting the next few tokens and predicts a complete clone method body based on the code written so far. This technique was further refined and named ‘Clone-Advisor’.

He also proposes the ‘Clone-Seeker approach’, which helps developers to perform code clone searches based on a search query written either as source code terms or as natural language. The main objective behind the proposed learning approaches is to develop and evaluate an automated software tool known as ‘Clone-Writer’, which helps developers to develop code swiftly by using code clones. The tool contains different visualization features, which he identifies in the survey, and integrates the proposed learning approaches.
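As a rough illustration of what searching clones with either code terms or natural language can involve, the sketch below ranks stored snippets by how many terms they share with a query. This is not the Clone-Seeker approach itself, which the article does not describe in detail. It is a simple stand-in based on Jaccard similarity over terms, and the function names `terms` and `search`, as well as the example snippets, are hypothetical.

```python
import re


def terms(text):
    """Split text into lowercase terms, breaking camelCase and snake_case identifiers."""
    parts = []
    for word in re.findall(r"[A-Za-z]+", text):
        # "readFileLines" -> read, File, Lines; "HTTPServer" -> HTTP, Server
        parts += re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", word)
    return {p.lower() for p in parts}


def search(query, snippets, top_k=3):
    """Rank snippet names by Jaccard similarity between query terms and snippet terms."""
    q = terms(query)
    scored = []
    for name, code in snippets.items():
        t = terms(code)
        union = q | t
        score = len(q & t) / len(union) if union else 0.0
        scored.append((score, name))
    # Highest score first; ties broken alphabetically by name.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in scored[:top_k] if score > 0]


# Hypothetical clone snippets, keyed by method name.
snippets = {
    "readFileLines": "def readFileLines(path): return open(path).read().splitlines()",
    "sumList": "def sumList(values): return sum(values)",
}
print(search("read lines from a file", snippets))
print(search("sumList", snippets))
```

Splitting identifiers into words is what lets the same index answer both a natural language query and a query written as a code term.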

Title of PhD thesis: Visualization and Learning Applications of Code Clones. Supervisors: Mark van den Brand, Önder Babur, and H.A. Basit.

Media contact

Barry Fitzgerald
(Science Information Officer)
