DataScience

The Spark Sessions are a recurring event designed to facilitate the exchange of ideas and inspiration. The event consists of several snappy presentations, followed by a casual gathering with refreshments.

Interested in giving a talk? Apply here!

Our Spark Sessions Partners

Upcoming: Spark Sessions 006

When? January 2026.

Where? Faculty of Computer and Information Science, Večna pot 113, Ljubljana.

What?

Upcoming: Spark Sessions 005

When? November 18 2025 at 18:00.

Where? Faculty of Computer and Information Science, Večna pot 113, Ljubljana.

What?

A new and improved traffic model for Slovenia - leveraging mobile positioning data by Leon Hvastja (Data scientist, Medius)

Are we able to replace classic survey-based traffic models using mobile data? Mobile telecommunication logs generate vast datasets on user positions, offering a powerful resource for modeling daily movement patterns. However, effectively harnessing this data presents significant challenges, including complex noise reduction, the absence of a ground truth for validation, and the need for computationally efficient algorithms to handle the sheer volume of data.

Lessons from Real-World Automated Grading and Simulation Systems by Frenk Dragar (Founder, Trivial Group)

In this talk, we’ll explore practical lessons from deploying automated grading and simulation systems in real-world educational settings. How do grader bias, incomplete data, and student feedback shape our system development? How can we make automated feedback not only accurate and scalable, but also constructive and fair? We’ll share insights from production-ready interactive learning experiences.

AI Real-Time Feedback in Sports by Maja Kolar (Data scientist, Valira)

Traditional gym training lacks immediate, objective feedback on exercise technique and performance. We developed a real-time computer vision system that tracks multiple athletes simultaneously, automatically recognizes exercises, counts repetitions, and provides instant technique corrections during workouts. In this talk, we will present the core challenges of building the end-to-end pipeline while maintaining real-time performance on live video streams.

In-house Tooling by Luka Androjna (Head of Data Science, Aurion11)

A quick primer on what in-house tooling is, with a focus on:
- Why it’s good to have in-house developed tooling.
- When and how to decide to invest in it.
- How to develop it and maintain it.
- How to get buy-in from others and spread its use.

Emotional intelligence vs gender and academic success by Stefan Popov (Senior Data Scientist, Sportradar)

Is one gender more emotionally intelligent than the other? Does higher emotional intelligence lead to better (or worse) grades? In this talk, we present a research effort that examines the relationship between emotional intelligence, gender, and average grade. We sent out a survey to 45 students and analysed the results. We present the motivation for our research, the methodology, results, and conclusions.

Past Event: Spark Sessions 004

When? June 10 2025 at 18:00.

Where? Faculty of Computer and Information Science, Večna pot 113, Ljubljana.

What?

On the Surprising Benefits of Vibe-Coding for Interdisciplinary Collaboration by Boshko Koloski (Young Researcher, Jozef Stefan Institute)

Stepping into an interdisciplinary project midstream, where documentation is sparse and data are disorganized, poses a significant challenge. Thanks to recent advances in LLMs, live “vibe-coding” workshops now offer a practical way to bridge those gaps. In this talk, I will present case studies from interdisciplinary collaborations in which real-time LLM-driven coding enabled teams to explore, prototype, and co-construct understanding on the fly, even under tight deadlines and uncertain conditions.

Automated Assignment Grading with Large Language Models: Insights From a Bioinformatics Course by Pavlin Poličar (assistant, UL FRI)

Are LLMs ready to replace TAs and grade student assignments? In a blind study conducted at UL FRI, we tested whether LLMs are able to grade student submissions and generate written feedback with the same accuracy and quality as human TAs. We compared six LLMs, both commercial and open-source, and found that, with the right prompts, LLMs can achieve human-level performance in both grading and feedback.

A General Approach to Visualizing Uncertainty in Statistical Graphics by Bernarda Petek (PhD student and researcher, UL FRI)

Quantifying and communicating uncertainty is integral to scientific communication. Despite this, it’s often neglected. Without general methods or guidelines, practitioners must rely on niche, domain-specific techniques or create ad hoc solutions that are time-consuming, complex, and error-prone. All this narrows how uncertainty is taught and understood. In this talk, I will present our new general approach to visualizing uncertainty and bootplot, our open-source Python implementation.

Automating Value Investing: A Data Scientist’s Journey into Finance by Lidija Jovanovska (Senior Data Scientist, Sportradar)

Explore the intersection of data analysis and personal finance. This talk illustrates the use of Python for analyzing financial data, enabling programmatic stock screening and informed personal finance decisions.

Trials & tribulations of pre-flight scoring in online advertising by Vid Stropnik (Data Scientist, Celtra)

Discover how Celtra built its first pre-flight ad scoring model, turning ad design into a data-powered prediction game! From wrangling Meta’s Graph API chaos to debating regressors vs. classifiers, we’ll explore the messiness of creative performance, why “good” is all relative, and what it takes to forecast ad success before it ever goes live.

Past Event: Spark Sessions 003

When? Feb 25 2025 at 18:00.

Where? Faculty of Computer and Information Science, Večna pot 113, Ljubljana.

What?

Game, Set, Match: Neural Networks in Tennis Video Analysis by Ivan Ivashnev (Senior Computer Vision Engineer, Sportradar)

In this presentation, we’ll explore how AI and neural networks revolutionize tennis video analysis. We’ll dive into the models we use, our approach to video processing, and the insights we generate. From identifying player movements to extracting performance metrics, discover how technology is shaping the future of sports analytics.

ML in High Efficiency Production by Luka Androjna (Senior Data Scientist / ML Guild Master, Cast AI)

Going from an experimentation and model validation environment to using models in production is not a trivial task, especially when other constraints come into the picture as well, such as access to data, limited resources available for inference, latency, and deployment method. This talk will give a brief overview of such constraints and explain how they affect our choices in the modelling stage.

Increasing forecast accuracy via statistical inference by Živa Stepančič (Quantitative analyst, GEN-I)

We will present the challenge of forecasting electricity demand in the Slovenian energy market and how to strengthen our belief in model predictions. One can increase the forecast accuracy of energy demand by building a new prediction model, using ensemble models, using domain knowledge, or applying statistical corrections. We are testing the last approach by determining the expected demand at different weather events through modeling the prediction bias of weather variables and its effect on the energy demand forecast.

TabPFN: Approximating Bayesian Inference with Transformers by Valter Hudovernik (Data Science Student at FRI)

TabPFN (Tabular Prior-Data Fitted Network) is a transformer-based foundation model designed for tabular data classification. Trained to approximate Bayesian inference on millions of synthetic datasets, it leverages in-context learning during inference, enabling fast predictions without retraining. TabPFN outperforms or matches traditional models in accuracy and efficiency on smaller datasets. In this talk, we’ll explore its capabilities and discuss how to integrate it into data science workflows.

Pie Charts: An Apology by Erik Štrumbelj (Researcher at FRI, University of Ljubljana)

For as long as I can remember, it has been my mission to point out that pie charts, while engaging, are terrible as statistical plots. To finally put the matter to rest, I studied 100 years of empirical results on pie charts. And while they are indeed very flawed in many ways, they also have some surprising qualities. I’ll share these, along with other insights into visualizing parts of a whole.

Past Event: Spark Sessions 002

When? Dec 10 2024 at 18:00.

Where? Faculty of Computer and Information Science, Večna pot 113, Ljubljana.

What?

Intro: Is There a Data Engineering Minimum for a Professional Data Scientist? by Erik Štrumbelj

Seeing Through Data: Eye Tracking Insights into Graph Comprehension by Leon Hvastja (Data Science Student, University of Ljubljana)

Eye-tracking technology has gained traction over the past decade, mainly due to its increased availability. Traditionally, graphs were evaluated through performance metrics, but eye tracking offers a deeper understanding of the underlying cognitive processes involved. This technology has broad applications, including in education to enhance learning, in understanding disabilities, building cognitive models of perception, and understanding the differences between expert and non-expert users.

Machine Learning and AI in Analysis of Neuroimaging Data by Grega Repovš (Professor, Department of Psychology, University of Ljubljana)

Neuroimaging processing and analysis works on large amounts of multimodal data in spatial and temporal domains, resulting in a wide range of features at multiple levels of observation. Machine learning and AI are becoming important tools in optimizing data preprocessing, enabling novel analytical approaches, relating neuroimaging features to individuals’ cognition and behavior, and supporting diagnosis and treatment of brain diseases.

Blazing Fast Computation with JAX by David Nabergoj (Young Researcher, University of Ljubljana)

JAX is a Python library for accelerator-oriented array computation and program transformation, designed for high-performance numerical computing and large-scale machine learning. We’ll see what makes it so much faster than PyTorch/TensorFlow, and how it achieves an over 4000x speed-up in reinforcement learning applications.

What’s It Like to Crawl 100M Web Pages Every Week by Roq Xever (CEO and Founder, PredictLeads)

I’ll talk about the infrastructure, costs, and challenges we face while crawling the web for information on over 70 million companies.

Using Knowledge Graphs to Improve Proprietary LLM-based Text Embeddings by Boshko Koloski (Young Researcher, Jozef Stefan Institute)

Semantic knowledge bases hold vast factual knowledge, but dense text representations often underutilize these resources. This work shows that augmenting LLM-based embeddings with knowledge base information improves text classification. Using AutoML and low-dimensional projections via matrix factorization, we achieve faster, more accurate classifiers with minimal performance loss, validated on five LLMs and six datasets.

DATA_FAIR by Dunja Rosina (DATA_CONFERENCE organizer and initiator)

DATA_FAIR is a conference dedicated to fostering an inclusive environment for knowledge exchange, networking, and upskilling in data engineering and data science. I’ll briefly explain our purpose, what to expect in the next edition of the conference, and kindly invite you all to join us on Feb 13 2025.

Past Event: Spark Sessions 001

When? Oct 22 2024 at 18:00.

Where? Faculty of Computer and Information Science, Večna pot 113, Ljubljana.

What?

LLMDataForge: Framework that Leverages Large Language Models (LLMs) to Generate High-quality Datasets Tailored to Your Needs by Gal Petkovšek (Data Scientist at Medius) and Tadej Justin (Chief Data Scientist at Medius)

This talk explores generating high-quality synthetic datasets using LLMs, utilising the LLMDataForge framework for filtering and prompt adjustments to enhance training smaller models for NLP tasks.

One Transformation to Rule Them All: Automated Search for Feature Transformations at Scale by Mark Žnidar (Data Science Intern at Outbrain)

Automated feature search for field-aware factorization models, generating and evaluating many interactions. This method boosts efficiency and accuracy, enabling scalable, robust AutoML model search.

Generating Synthetic Relational Data by Valter Hudovernik (Data Science Student at FRI)

An introduction to the emerging field of relational data generation, focusing on the strengths and limitations of current methods in preserving the characteristics of the original datasets.

Weighing in on Evaluating LLM System Performance by Greta Gašparac (Data Scientist at Sportradar)

Since the release of ChatGPT, there has been a surge of interest in various LLM-powered solutions across industries. However, discussions on evaluating these systems deserve equal, if not greater, attention. In this talk, we explore the critical aspects of LLM system evaluation through a practical example of developing a customer support AI assistant.

Next-Generation AI for Intelligent Waste Management: Leveraging LLMOps and Semantic Entity Linking by dr. Stevanče Nikoloski (Head of Data & AI at Result)

We present a next-gen intelligent waste management solution powered by AI, focusing on semantic entity linking. Using LLMOps, we enhance data collection and fine-tune our LLMs (Mistral, Zephyr, Llama2, Falcon), ensuring local infrastructure compatibility and data privacy. The architecture integrates vector databases with advanced chunking (map-reduce, refine) for applications like summarization, chatbots, and entity linking, enabling continuous improvement and adaptability in waste management.

Value-Based Conversion Tracking in Online Controlled Experiments: Frequentist vs. Bayesian Approach by Aljiša Vodopija (Data Scientist at Outbrain)

This talk explores value-based conversion tracking in online controlled experiments, contrasting frequentist and Bayesian approaches to optimize revenue from differently valued conversions.
