# Scheduling with uncertain processing times given by empirical distributions

**Author**: Antonín Novák

The processing time of a job is arguably one of the most important parameters of scheduling problems, yet it is often also one of the most difficult to obtain: it may be unknown, costly to measure, or inherently non-deterministic. Classical deterministic scheduling methods that replace the missing information with, e.g., single-point estimates may produce solutions that are fragile and susceptible to unexpected realizations of the parameters. Therefore, optimization approaches that can deal with imprecise knowledge of the problem parameters must be used whenever protection against unfavorable realizations is vital. A key decision in the design of such optimization algorithms is the choice of the uncertainty model and of the approach used to mitigate its effects. Is an estimate of, e.g., the mean and variance enough? Can we assume that at most a certain number of parameters will deviate from their estimates? Do we need a risk-averse objective, and what is its price? These are essential considerations, since overly simplistic models can lead to conservative and inefficient solutions, or may fail to protect against uncertainty at all. Fortunately, the growing availability of large datasets covering many aspects of industrial processes allows us to apply powerful data-driven approaches to scheduling under uncertainty.

In this work, we study different means of integrating distributional knowledge of processing times into scheduling problems. Some of the methods proposed in this thesis take inspiration from the machine learning field, whose techniques are scalable and likewise deal with uncertain parameters. As an example, we have used in-sample data to estimate the parameters of an ambiguity set and obtain a distributionally robust solution via norm-regularization techniques, which greatly reduces the run time of the algorithm. Another way of improving the scalability of scheduling algorithms lies in the choice of a suitable model of uncertainty. For example, we have proposed to model jobs with uncertain processing times with an abstraction called an F-shape, which discretizes the cumulative distribution function of the processing times. The resulting problem formulation is essentially a packing problem that can be solved efficiently in practice, with the advantage that the probabilistic guarantees of the resulting schedules can be verified within the framework of Bayesian networks. Finally, we study a setting where the distribution of processing times can be approximated by a normal distribution. The full distributional knowledge is utilized in a beta-robust formulation that maximizes the probability that all jobs are completed before a given common due date. For the studied scheduling problems with uncertain processing times utilizing distributional knowledge, we have obtained complexity characterizations, and we have developed scalable algorithms that provide high-quality solutions for instances with hundreds of jobs.
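To make the beta-robust objective concrete, the following minimal sketch evaluates the probability that all jobs meet a common due date for a *given* assignment of jobs to machines, assuming independent, normally distributed processing times (the function name and the list-of-machines representation are illustrative choices, not the thesis's implementation; the actual formulation optimizes over assignments rather than merely evaluating one):

```python
import math

def completion_probability(machines, due_date):
    """Probability that every job finishes by the common due date.

    `machines` is a list of machines; each machine is a list of
    (mean, variance) pairs for the jobs assigned to it.  On one machine,
    the last completion time is the sum of the processing times, hence
    normal with the summed mean and variance; machines are independent,
    so the overall probability is the product over machines.
    """
    prob = 1.0
    for jobs in machines:
        mu = sum(m for m, _ in jobs)
        var = sum(v for _, v in jobs)
        if var == 0:
            # Degenerate (deterministic) machine: meets the due date or not.
            prob *= 1.0 if mu <= due_date else 0.0
            continue
        z = (due_date - mu) / math.sqrt(var)
        # Standard normal CDF expressed via the error function.
        prob *= 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return prob
```

For instance, a single machine with two jobs of means 2 and 3 (total mean 5) and a due date of 5 yields probability 0.5, since the due date coincides with the mean of the normal completion time.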