Better insight into interconnected graph data

23 April 2024

Larissa Capobianco Shimomura's work helps data analysts understand graph data better and faster

A vast amount of new data is created every day by activities such as browsing websites, conducting online searches and accepting social media friend requests. One of the types of data created is graph data, which represents information in a network-like structure. Because of its diversity and complexity, graph data is not always easy for data analysts to understand and interpret. To make this easier, PhD researcher Larissa Capobianco Shimomura introduced a novel formalism called Graph Generating Dependencies (GGDs), which enables the expression of semantic information derived from graph data. She defended her thesis on Friday, April 19th.

To illustrate with a simple example, consider a social network graph. A GGD can express that if two people share the same last name and are ‘friends’, they should also be connected as ‘family’. The GGD makes this previously hidden connection immediately clear, saving a data analyst a great deal of time and effort.
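
For readers who like to see the idea in code, the social network example can be sketched roughly as follows. This is a simplified illustration in Python, not the formal GGD definition from the thesis: the toy graph, the names and the functions `find_violations` and `repair` are all hypothetical.

```python
# Toy property graph. All names and data are hypothetical illustrations.
people = {
    "p1": {"first": "Ana", "last": "Silva"},
    "p2": {"first": "Bruno", "last": "Silva"},
    "p3": {"first": "Carla", "last": "Souza"},
}
# Edges are (source, label, target) triples, treated here as undirected.
edges = {
    ("p1", "friends", "p2"),
    ("p1", "friends", "p3"),
}


def has_edge(edges, a, label, b):
    """Return True if an edge with this label links a and b in either direction."""
    return (a, label, b) in edges or (b, label, a) in edges


def find_violations(people, edges):
    """Find 'friends' pairs with the same last name but no 'family' edge."""
    violations = []
    for src, label, dst in sorted(edges):
        if label != "friends":
            continue
        if people[src]["last"] != people[dst]["last"]:
            continue
        if not has_edge(edges, src, "family", dst):
            violations.append((src, dst))
    return violations


def repair(people, edges):
    """Return a copy of the edges with the missing 'family' edges added."""
    new_edges = set(edges)
    for src, dst in find_violations(people, edges):
        new_edges.add((src, "family", dst))
    return new_edges


violations = find_violations(people, edges)
repaired = repair(people, edges)
```

In this sketch, checking the rule points the analyst to the pairs that break it, and the repair step generates the missing ‘family’ connection, which loosely mirrors the "generating" part of the name.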

A powerful resource for data analysts

For data analysts, it is essential to understand and extract relevant data so that the applications and solutions they develop can be improved. With the rise of graph data, there is growing interest in how to express information from graphs. But graph data is far from always easy to understand. A dedicated research topic in Computer Science addresses this problem: data profiling. Its purpose is to develop new methods for representing data and new algorithms for gaining insights into datasets.

In her PhD thesis, Larissa Capobianco Shimomura introduces Graph Generating Dependencies (GGDs) to the field of data profiling: a new class of formalism for property graphs. In these graphs, objects are interconnected based on relationships or associations. GGDs help express semantic information from a graph. Data analysts can then use that information to better understand the content of the data, and also to identify and correct possible errors.

A step forward for data analysis and profiling

By introducing this formalism and outlining its practical applications, Capobianco Shimomura’s research contributes to advancing the field of data analysis and profiling, particularly in the context of graph data. Furthermore, her thesis outlines open challenges and directions for tasks like data profiling and data cleaning in graph data management systems.

Title of PhD thesis: On Graph Generating Dependencies and their Applications in Data Profiling
Supervisors: prof.dr. G.H.L. Fletcher, dr. N. Yakovets

Media contact

Anke Langelaan
(Science Information Officer)
