School of Computer Science Colloquium
"Recovering Data Semantics"
Speaker: Roee Shraga (Khoury College, Northeastern University, Boston, MA)
Room 420, Checkpoint Building
Abstract: In data science, the main challenge is increasingly finding, curating, and understanding the data available to solve the problem at hand. Furthermore, modern-day data is challenging in that it lacks many forms of semantics (the "meaning of data"): metadata may be incomplete or unreliable, data sources are unknown, and data documentation rarely exists. To address these challenges, the objective of my research is to recover data semantics throughout data discovery, versioning, integration, and quality. In this talk, I will discuss current data science challenges and highlight two specific aspects of my research that help address them. In particular, I will present ALITE, the first scalable integration solution for tables that may have been discovered in data lakes (repositories of big data). ALITE relaxes previous assumptions that tables share common attribute names (which completely determine the join columns), are complete (without null values), and have acyclic join patterns. I will also introduce Explain-Da-V, a solution that explains dataset versions by generating data transformations that resolve data changes.