Model-Based Policy Search for Automatic Tuning of Multivariate PID Controllers
Proportional, Integral and Derivative (PID) control structures are still the main control tool in industrial applications, especially in the process industries. Their prevalence is mainly due to past success, wide availability and ease of use. Even in multi-loop or multivariate control problems, networks of PID controllers can often be employed. In practice, however, control design relies on tedious manual tuning, guided by tuning rules based on linear transfer-function models and standard PID design options. Furthermore, the system usually has to be decoupled into independent PID control loops, even though many industrial processes are inherently multivariate.
In this work, we extend a framework called PILCO, which, in contrast to classical control design methods, takes into account the full non-linear model of the system dynamics and all couplings between control loops. In this framework, a model-based optimal control concept is employed to find an optimal policy for the relevant operating scenarios. Gaussian process regression, a non-parametric function estimator, is used to model the state transition function of the system dynamics. These models can be viewed as distributions over a function space and thus provide not only a prediction but also a measure of its uncertainty. In regions with sparse data, where knowledge about the system's behavior is uncertain, and in regions with more complex dynamics, this uncertainty can be exploited to react accordingly, either through cautious control or through further exploration to improve the model. The main focus is on data efficiency, i.e., reducing the interaction time required with the real system.
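To make the uncertainty argument concrete, here is a minimal sketch of Gaussian process regression for a one-dimensional transition function x_{t+1} = f(x_t), written in plain Python. It assumes a zero-mean prior, a squared-exponential kernel with fixed, hand-picked hyperparameters and a hypothetical toy system `f`; PILCO itself uses multi-output GPs with learned hyperparameters and propagates uncertainty over whole trajectories, none of which is shown here. The sketch only illustrates that the posterior variance is small near the training data and returns toward the prior variance far from it.

```python
import math

def rbf(a, b, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel for scalar inputs (hyperparameters fixed by hand)."""
    return signal_var * math.exp(-0.5 * ((a - b) / lengthscale) ** 2)

def cholesky(A):
    """Return lower-triangular L with A = L L^T (A must be symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L x = b by forward substitution."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x

def solve_upper_t(L, b):
    """Solve L^T x = b by back substitution (L is lower-triangular)."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(X, y, x_star, noise_var=1e-2):
    """Posterior mean and latent-function variance of a zero-mean GP at x_star."""
    n = len(X)
    K = [[rbf(X[i], X[j]) + (noise_var if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y))  # alpha = (K + noise*I)^{-1} y
    k_star = [rbf(x_star, xi) for xi in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)                   # v = L^{-1} k_star
    var = rbf(x_star, x_star) - sum(vi * vi for vi in v)
    return mean, var

# Hypothetical toy transition x_{t+1} = f(x_t), observed only on [-1, 1].
def f(x):
    return 0.9 * x + 0.2 * math.sin(x)

X_train = [-1.0, -0.5, 0.0, 0.5, 1.0]
y_train = [f(x) for x in X_train]

mean_near, var_near = gp_predict(X_train, y_train, 0.25)  # inside the data
mean_far, var_far = gp_predict(X_train, y_train, 5.0)     # far from the data
print(f"x=0.25: mean={mean_near:.3f}, var={var_near:.4f}")
print(f"x=5.00: mean={mean_far:.3f}, var={var_far:.4f}")
```

In this toy setting `var_near` is far below `var_far`, which approaches the prior signal variance of 1.0; this growing uncertainty away from the data is what a cautious or exploratory controller could act on.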
Approach & Results
We extend the policy search framework to the setting of multivariate PID networks and demonstrate the applicability of this auto-tuning method on a challenging real-world problem that involves imperfect low-level tracking controllers and unobserved dynamics. We augment the system state with the information a PID controller requires, i.e., the tracking error together with its integral and derivative. This augmented state representation allows arbitrary multivariate and coupled PID controller structures to be expressed as a linear feedback law in the extended state. Using a model-based policy search technique such as PILCO, we then optimize the parameters of this policy, which yields an optimal PID controller given the sampled data. The evaluation results show that the approach can be used to obtain PID controllers for complex robotics tasks such as pole balancing.
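Writing a multivariate PID controller as a linear feedback law in an augmented state can be sketched as follows. This is a minimal plain-Python illustration under assumptions not taken from the original: the augmented state keeps only the PID-relevant components (the original system state would be stacked alongside them), the integral is a rectangle-rule sum, the derivative is a backward difference, and the helper names (`augment_state`, `pid_gain_matrix`, `linear_policy`) and all numbers are hypothetical. The gain matrix K = [K_p | K_i | K_d] plays the role of the policy parameters that a policy search method such as PILCO would optimize; that optimization step is not shown.

```python
def augment_state(setpoint, measurement, prev_error, prev_integral, dt):
    """Build the PID-relevant part of the augmented state z_t = [e_t, I_t, D_t].

    e_t: tracking error, I_t: rectangle-rule integral of the error,
    D_t: backward-difference approximation of the error derivative.
    """
    error = [r - y for r, y in zip(setpoint, measurement)]
    integral = [i + e * dt for i, e in zip(prev_integral, error)]
    derivative = [(e - ep) / dt for e, ep in zip(error, prev_error)]
    return error + integral + derivative  # stacked vector of length 3m

def pid_gain_matrix(Kp, Ki, Kd):
    """Stack m x m gain matrices side by side into the m x 3m matrix [Kp | Ki | Kd]."""
    return [rp + ri + rd for rp, ri, rd in zip(Kp, Ki, Kd)]

def linear_policy(K, z):
    """u_t = K z_t: a linear feedback law in the augmented state."""
    return [sum(k * zi for k, zi in zip(row, z)) for row in K]

# Two control loops (m = 2); all values are illustrative.
dt = 0.1
z = augment_state(setpoint=[1.0, 0.0], measurement=[0.8, 0.1],
                  prev_error=[0.3, -0.05], prev_integral=[0.0, 0.0], dt=dt)

# Diagonal gain blocks: two independent single-loop PID controllers.
Kp = [[2.0, 0.0], [0.0, 1.5]]
Ki = [[0.5, 0.0], [0.0, 0.2]]
Kd = [[0.1, 0.0], [0.0, 0.05]]
u_decoupled = linear_policy(pid_gain_matrix(Kp, Ki, Kd), z)

# Off-diagonal proportional gain: loop 1 also reacts to the error of loop 2.
Kp_coupled = [[2.0, 0.3], [0.0, 1.5]]
u_coupled = linear_policy(pid_gain_matrix(Kp_coupled, Ki, Kd), z)
print(u_decoupled, u_coupled)
```

With diagonal gain blocks the policy reduces to independent single-loop PID controllers; nonzero off-diagonal entries, as in `Kp_coupled`, let one loop respond to another loop's error, so coupled structures fit into the same linear parametrization.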
Video of Apollo balancing the pole
P. Cominos and N. Munro, “PID controllers: Recent tuning methods and design to specification,” IEE Proceedings - Control Theory and Applications, vol. 149, no. 1, pp. 46–53, 2002.
M. A. Johnson and M. H. Moradi, PID Control. Springer, 2005.
T. Yamamoto and S. Shah, “Design and experimental evaluation of a multivariable self-tuning PID controller,” IEE Proceedings - Control Theory and Applications, vol. 151, no. 5, pp. 645–652, 2004.
M. Deisenroth and C. E. Rasmussen, “PILCO: A model-based and data-efficient approach to policy search,” in Proceedings of the International Conference on Machine Learning (ICML), 2011, pp. 465–472.
J. Kocijan, A. Girard, B. Banko, and R. Murray-Smith, “Dynamic systems identification with Gaussian processes,” Mathematical and Computer Modelling of Dynamical Systems, vol. 11, no. 4, pp. 411–424, 2005.
R. Murray-Smith, D. Sbarbaro, C. E. Rasmussen, and A. Girard, “Adaptive, cautious, predictive control with Gaussian process priors,” 2003.
R. Murray-Smith and D. Sbarbaro, “Nonlinear adaptive control using non-parametric Gaussian process prior models,” 2002.
C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. MIT Press, 2006.