Data Science Conference 3.0
Speakers and Talks

Keynote speakers


Full Schedule


Data Science and Big Data are no longer just buzzwords. Insights from data can help companies make better choices. In this presentation, Darko will share experience from data projects based on real use cases. The VIP Mobile use case is about applying Machine Learning to telco network data. Understanding the data can make network capacity planning more economical, improving customer experience and boosting sales with minimal investment in hardware. The presentation is intended to be less technical and more about how companies can benefit from Data Science.


This talk will cover the introduction and application of data monetization in small and medium-sized enterprises. We will start by emphasizing the significance of all pre-conditions: people, domain knowledge, business processes, infrastructure, architecture and, finally, data. Then we will look at a working example within a sales company and demonstrate how to monetize two actual business processes. The first example will deal with improving the supply process. This is a story about how linear regression and moving averages can help us automate and optimize the supply of goods for wholesale and prevent missed sales due to inadequate supply of those goods. The second example will take us through an attempt to increase cash flow by detecting outstanding receivables based on the receivables turnover ratio. Stalled receivables are a known regional issue, and our approach makes them more visible.
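As a rough illustration of the techniques named in this abstract, the sketch below forecasts next-period demand with a moving average and with a one-variable linear trend, and computes a receivables turnover ratio. It is not the speakers' implementation: the function names, the window size and the sample figures are all assumptions made for the example.

```python
from statistics import mean


def moving_average_forecast(sales, window=3):
    """Forecast the next period as the mean of the last `window` periods."""
    if len(sales) < window:
        raise ValueError("need at least `window` observations")
    return mean(sales[-window:])


def linear_trend_forecast(sales):
    """Fit y = a + b*t by ordinary least squares and extrapolate one step ahead."""
    n = len(sales)
    if n < 2:
        raise ValueError("need at least two observations")
    t = list(range(n))
    t_bar, y_bar = mean(t), mean(sales)
    slope = (
        sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, sales))
        / sum((ti - t_bar) ** 2 for ti in t)
    )
    intercept = y_bar - slope * t_bar
    return intercept + slope * n


def receivables_turnover(net_credit_sales, opening_receivables, closing_receivables):
    """Receivables turnover ratio: net credit sales divided by average receivables."""
    average_receivables = (opening_receivables + closing_receivables) / 2
    return net_credit_sales / average_receivables


# Hypothetical monthly unit sales for one product (illustrative numbers only).
monthly_sales = [10, 12, 14, 16, 18]
next_month_ma = moving_average_forecast(monthly_sales, window=3)
next_month_trend = linear_trend_forecast(monthly_sales)

# Hypothetical customer: low turnover suggests slow-paying receivables.
turnover = receivables_turnover(1200.0, 100.0, 200.0)
```

A low turnover ratio relative to other customers is one possible signal that receivables are stalling; the threshold for flagging a customer would depend on the business.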


In a data-driven world where every business entity relies on data to make better decisions, interacting correctly with customers becomes imperative. One of the keys to successful interaction with a customer is applying the approach that best suits that customer. Every personality trait demands a different approach to communication, so our intention is to show how interaction over social media or other channels can help us model and predict personality traits for every individual who gives appropriate consent. Based on that personality trait prediction, combined with other personal information (demographics, interest in our services, etc.), we can create a unique strategy for approaching people on an individual level. Knowledge of personality traits can help any industry take appropriate action proactively. For example, there is a study showing that members of the teen population with specific traits are more prone to drug abuse, and this knowledge can help with preventive and corrective actions at the school level.


Genomics Data Science deals with some very interesting challenges, to some extent unique among the domains in which data science is applicable. Advances in high-throughput sequencing have led to enormous amounts of data and an increasing rate of data production, which makes genomics one of the most data-science-demanding domains (Big Data: Astronomical or Genomical? doi: 10.1371/journal.pbio.1002195). From a methodological perspective, one of the main challenges is to extract biologically meaningful results from highly heterogeneous and multidimensional data. The focus of this talk will be on the analysis of high-dimensional transcriptomics data from the GTEx project. Methods for data normalization, feature extraction and dimensionality reduction will be discussed in detail and compared. A counterintuitive result from this research, that reduced data sometimes reveals structures not present in the high-dimensional representation, will be shown and distributed to attendees as an interactive report to tweak and play with.


BigDL is a deep learning framework modeled after Torch and open-sourced by Intel in 2016. BigDL runs on Apache Spark, a fast, general, distributed computing platform that is widely used for Big Data processing and machine learning tasks. Apache Spark comes with its own machine learning library of algorithms, but it still lacks deep learning capabilities. BigDL efficiently fills this void by providing rich deep learning support and high performance through Intel's Math Kernel Library and multi-threaded task execution.

In this talk I will give a short overview of Apache Spark, a description of BigDL architecture and API, and then I will show a couple of demo examples of using BigDL on some real data.


We all know that every response to an issue ticket we receive comes with a cost that we have to cover. On the flip side, the time between receiving an issue ticket and responding to it is often unacceptable from a user perspective and contributes to additional frustration. So we are dealing with two problems: money and user satisfaction. To tackle these problems, we designed a classification model that learns from tracked support agents’ action logs, which contain data about their behaviour. This model provides immediate answers to user issue tickets that, in most cases, match what the user would get as an answer from an agent. This is possible because a large share of resolved tickets are covered by a finite set of predefined template answers. So we are automatically resolving issues that agents would normally reply to with a predefined answer. The model contains two parts: the textual part and the game domain part. The textual part extracts information from the text written in the issue ticket, and the game domain part relies on in-game data about the user and their recent activities.


There has been a huge explosion of big data captured and stored by both the private and public sectors: 90% of the data in the world today has been created in the last two years. As organizations increasingly rely on the insights gleaned from big data to make critical business decisions, the role of the data scientist has become crucial. McKinsey predicts that by 2018 the global demand for data scientists will exceed supply by more than 50%. In what has been declared the sexiest job of the 21st century, data scientists are tasked with adding value to organizations by extracting knowledge from data and turning it into actionable insights and innovative solutions, as well as enabling data-driven decision-making. Data scientists are a new breed of analytical data experts who have the technical skills to solve complex problems and the curiosity to explore what problems need to be solved.

Machine learning algorithms are usually regarded in a monolithic fashion, i.e. as black boxes, and they are usually taught to students that way. In reality, however, an algorithm faces many challenges in adapting to data and user requirements. Boris Delibašić and his team have conducted research on white-box machine learning algorithms since 2009 by developing a framework called WhiBo, in which algorithms are constructed from reusable components once the dataset and requirements are given. In this way the paradigm in algorithm design shifts from choosing algorithms from a set of existing algorithms to building algorithms from reusable components. The WhiBo team has already done research in the areas of decision trees, partitioning clustering algorithms, and data preprocessing. However, the area is open to other machine learning algorithm families as well.


A key to supporting efficient learning and extraction of knowledge from data often lies in the use of an adequate numerical optimization algorithm, e.g., to minimize an empirical loss function defined over the available data. In the era of Big Data, datasets are frequently too large to be stored and processed efficiently in conventional ways, e.g., by standard optimization methods implemented on a single computer. Instead, modern learning problems call for the development of parallel and distributed optimization methods, where data is partitioned over a number of processing nodes interconnected in a network; the nodes then process their locally stored data and exchange messages with (subsets of) the remaining nodes to solve the global learning problem of interest. In this talk, we review a selected body of recent state-of-the-art distributed optimization methods, highlighting important ideas and paradigms that enable progress in the field.
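To make the setting described in this abstract concrete, here is a minimal sketch of one well-known scheme, decentralized gradient descent, simulated in a single process. It is an illustration only, not a method taken from the talk: the ring topology, the mixing weights, the step size and the noise-free sample data (generated from y = 3x) are all assumptions chosen so that the example is easy to check.

```python
def local_gradient(w, data):
    """Gradient of the local loss 0.5 * mean((w*x - y)^2) with respect to w."""
    return sum(x * (w * x - y) for x, y in data) / len(data)


def decentralized_gradient_descent(node_data, neighbors, step=0.05, iterations=500):
    """Each node averages its estimate with its neighbors' estimates,
    then takes a gradient step using only its locally stored data."""
    n = len(node_data)
    w = [0.0] * n  # one scalar model estimate per node
    for _ in range(iterations):
        new_w = []
        for i in range(n):
            # Mixing step: weight 1/2 on the node's own estimate, with the
            # other 1/2 split equally among its neighbors. On a ring (two
            # neighbors per node) this gives a symmetric, doubly stochastic
            # mixing matrix; other graphs may need different weights.
            share = 0.5 / len(neighbors[i])
            mixed = 0.5 * w[i] + sum(share * w[j] for j in neighbors[i])
            # Local step: gradient of this node's own loss at its own estimate.
            new_w.append(mixed - step * local_gradient(w[i], node_data[i]))
        w = new_w
    return w


# Hypothetical data partitioned over four nodes; every pair satisfies y = 3x.
NODE_DATA = [
    [(1.0, 3.0), (2.0, 6.0)],
    [(2.0, 6.0), (3.0, 9.0)],
    [(1.0, 3.0), (3.0, 9.0)],
    [(0.5, 1.5), (1.5, 4.5)],
]
# Ring network: node i exchanges messages only with its two adjacent nodes.
RING_NEIGHBORS = [[1, 3], [0, 2], [1, 3], [2, 0]]

estimates = decentralized_gradient_descent(NODE_DATA, RING_NEIGHBORS)
```

Because the sample data is noise-free, every node's local minimizer coincides with the global one and all estimates approach 3. With noisy data and a constant step size, this basic scheme generally settles only in a neighborhood of the global solution, which is one of the issues more refined distributed methods address.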


In the ever-growing ecosystem of machine learning applications, we’re trying to help people make important decisions, like buying a car. Join us for a session where we will share our journey building an iOS solution that makes searching for cars easy and intuitive for everyone: just snap a photo of a car that you like and we’ll do the work for you. We'll talk about how we dealt with privacy and performance issues using the brand new CoreML framework, and how it fits into the entire development flow, from model training to performing in-app classifications.


In this talk we give some insights into why it is beneficial to host your own GPU box. Moreover, we show how we design, train and test image classifiers based on state-of-the-art methods. We outline the do's and don'ts from a practical point of view.


Machine Learning is starting to enable us to understand individual disease course, and treatment effect. Together with medical imaging it is a key to precision medicine – optimizing the treatment of individual patients. In the talk I will give an overview of how machine learning impacts our way of working with imaging biomarkers, how it makes the discovery of novel marker patterns possible, and how these patterns can be translated into predictive models that support treatment and patient management.


During the last 20 years, the volume and variety of broadly available data have driven science and business to search for ways of extracting valuable information and knowledge from that data.

Big data volumes have outgrown the well-known frameworks of statistics and manual data analysis. At the same time, very fast technological change, combined with the development of mathematical algorithms customized to achieve the best performance in the new technological and big data environment, has created a new framework for data analysis.

Data science has grown into the most popular applied science in the business sector. Companies in almost every industry are focused on collecting and exploiting internal and external data for competitive advantage. TeleSign generates a tremendous amount of Call Detail Record data per day. Possessing this big data enables us to use data science tools to automate the process of fraud detection.

Automated fraud detection has become a mandatory part of our business now that fraud has evolved into a big business. Large profits, the expansion of modern technology and the global superhighways of communication have fuelled the growth of a well-organized and well-informed community of fraudsters, resulting in huge financial losses worldwide each year. Fraud detection methods are continuously developed to defend against criminals as they adapt their strategies. This presentation describes a few main types of data science techniques available for automated fraud detection.


Businesses from all corners of the world, as well as in Serbia, have begun migrating from the physical world of handing out leaflets and selling by word of mouth to the mobile realm. While many consumers have started buying products via mobile apps, B2B customers are still reluctant to take this path, as they need approvals from their superiors. However, they should also have easy access to services that will help them in their everyday work.

In this presentation we are going to show how B2B customers are targeted with roaming add-ons in the My Telenor app using Machine Learning in R. For these add-ons, B2B customers whose Tariff Plan selection gives them pre-approved activation have been chosen.


Paid Search Marketing is becoming much more than just using Google’s suggested bid on a keyword in order to reach that first spot on the search result page. We can use the copious amounts of data we have about a customer’s journey, from that very first impression, to the click on an ad, and the path a customer takes on your website, in order to maximize ROI. In this talk, I will go through some strategies and algorithms I use to optimize keyword bidding, as well as share the results we got from implementing those tactics in our B2C marketing efforts.

There are many settings where we want to infer the effect of a released feature on user metrics and detect a causal relationship. However, it becomes particularly hard to estimate how one thing impacts another when A/B tests are inapplicable and several effects overlap in time. Thus, a common question is how recent changes around the Top Eleven product impact our revenue, retention, and other game-relevant metrics, since some changes can mask the positive impact of others or cause some negative impact to be overlooked. In this talk, we are going to present causal inference methods that helped us detect causality and estimate impact in a high-dimensional dataset, and discuss the issues we had to tackle in order to provide results. Furthermore, we are going to stress the importance and potential of using advances in causal inference to provide the insights needed to reach the best decisions. These methods are well established in social and biomedical research and in any other field involving observational studies, and we are going to discuss them from a gaming industry perspective.

Most global corporations face a similar challenge: understanding where their industry is heading and what the global trends will be in the following years. Data science is the only profession that can address this topic based on hard facts: real numbers and proven statistical methods. CBS Systems has developed a machine learning model for one of the global leaders in the tobacco industry. This model, implemented in an analytical software application, is able to predict the size of the global tobacco industry in terms of number of smokers, sales volumes and sales value for the next 15 years and over 60 countries. Furthermore, it enables the creation of simulation (what-if) scenarios, supporting decision making at the executive level.


An introduction to the two major components of SISENSE, for visualization and data preparation. We will cover how SISENSE works and how it performs in crunching large data sets and combining different data sources, as well as the user experience of its interactive visualizations. We will also talk about the implementation process and user engagement: what our expectations of SISENSE were and how it fitted our business needs, its pros and cons, and whether it is good value for money.


In this talk I'll show you an overview of scenarios I’ve come across working as a front-end developer on a web application for Big Data visualization. The talk will cover several real-world examples of big data visualization in a web browser, covering both realtime and batch data. As the main part of the talk I will compare different JavaScript libraries which can be used for such purposes with focus on performance, usability and extendibility.

"90% of the data in the world today has been created in the last two years alone" - IBM 2012.

This surge in data production is further complicated by the varied data needs of different organisations, and as a result, the complexity of development infrastructure is growing. Tools need to be easy to use and to support fast prototyping during development.

In this talk I will introduce Apache NiFi as one of the solutions for handling this flood of data and show how we can use it in organisations.

The talk will introduce the blockchain (also called distributed ledger technology) as a development platform. We will explain the technical problem these technologies are solving, present current and potential use cases, and give a short overview and comparison of the most popular blockchain platforms.


This talk will narrate our ongoing journey in transforming this Malta-based and heavily distributed startup into a fully data-informed and data-driven organization. Specifically, we will describe our big data infrastructure, the challenges we are facing while building it from scratch, our self-service platform for sharing insights derived from data science and analytics, and most importantly, our approach to shifting the cultural mindset, in which we want to fuse creativity, intuition and quantitative sciences.


Developing a cloud-based software solution usually means selecting a vendor and technology in advance, and writing software that is very specific to that vendor, platform or service.

With the growing range of managed Spark services, Spark is turning into a platform that allows vendor-independent big data solutions, a concept that Content Insights tries to use as much as possible.

This talk will go through the pros and cons of such an approach, from the perspective of a startup or any other case that involves a great deal of uncertainty and a need for fast adaptation.

Be among the first to buy a ticket with an Early Bird discount today.