Spark sessions 002

When? Dec 10 2024 at 18:00.

Where? Faculty of Computer and Information Science, Večna pot 113, Ljubljana.

What?

Intro: Is There a Data Engineering Minimum for a Professional Data Scientist? by Erik Štrumbelj

Seeing Through Data: Eye Tracking Insights into Graph Comprehension by Leon Hvastja (Data Science Student, University of Ljubljana)

Eye-tracking technology has gained traction over the past decade, mainly due to its increased availability. Traditionally, graphs were evaluated through performance metrics, but eye tracking offers a deeper understanding of the underlying cognitive processes involved. This technology has broad applications, including enhancing learning in education, understanding disabilities, building cognitive models of perception, and understanding the differences between expert and non-expert users.

Machine Learning and AI in Analysis of Neuroimaging Data by Grega Repovš (Professor, Department of Psychology, University of Ljubljana)

Neuroimaging processing and analysis work on large amounts of multimodal data in spatial and temporal domains, resulting in a wide range of features at multiple levels of observation. Machine learning and AI are becoming important tools in optimizing data preprocessing, enabling novel analytical approaches, relating neuroimaging features to individuals’ cognition and behavior, and supporting diagnosis and treatment of brain diseases.

Blazing Fast Computation with JAX by David Nabergoj (Young Researcher, University of Ljubljana)

JAX is a Python library for accelerator-oriented array computation and program transformation, designed for high-performance numerical computing and large-scale machine learning. We’ll see what makes it so much faster than PyTorch/TensorFlow, and how it achieves an over 4000x speed-up in reinforcement learning applications.

What’s It Like to Crawl 100M Web Pages Every Week by Roq Xever (CEO and Founder, PredictLeads)

I’ll talk about the infrastructure, costs, and challenges we face while crawling the web for information on over 70 million companies.

Using Knowledge Graphs to Improve Proprietary LLM-based Text Embeddings by Boshko Koloski (Young Researcher, Jožef Stefan Institute)

Semantic knowledge bases hold vast factual knowledge, but dense text representations often underutilize these resources. This work shows that augmenting LLM-based embeddings with knowledge base information improves text classification. Using AutoML and low-dimensional projections via matrix factorization, we achieve faster, more accurate classifiers with minimal performance loss, validated on five LLMs and six datasets.

DATA_FAIR by Dunja Rosina (DATA_CONFERENCE organizer and initiator)

DATA_FAIR is a conference dedicated to fostering an inclusive environment for knowledge exchange, networking, and upskilling in data engineering and data science. I’ll briefly explain our purpose, what to expect in the next edition of the conference, and kindly invite you all to join us on Feb 13 2025.