Our Fields Of Expertise

Deep Learning

What Motivates Us

Deep learning refers to a modern class of machine learning systems that rely on multi-stage processing of data in neural networks. As a field, it has produced some of the most exciting advances in AI in recent years: breakthroughs in computer vision, language understanding, and speech recognition all have their roots in deep learning. This makes deep learning the key enabler for Bosch applications in fields like automated driving, robotics, or embedded AI. However, despite this promise, modern deep learning systems also have deficiencies that make them ill-suited to tasks relevant to Bosch: the systems are compute- and data-hungry, often brittle in real-world deployments, and typically unexplainable. Addressing these challenges motivates our team to create robust, safe, and efficient deep learning systems as part of continuously learning AIoT products.

Our Approach

At Bosch, we are pushing forward the boundaries of deep learning to make these systems fully realizable within products. We create new methods of explainable and robust deep learning models that are ready to be validated and safely used in real-world environments. We develop novel training methods and tools to scale deep learning systems towards working efficiently on various types of embedded hardware. This way, we extend deep learning concepts from computer vision to the entire range of Bosch sensors. Data-efficient learning and generative models allow us to augment data sets with realistic variations of scenes, thereby reducing the efforts for data collection and labelling.​

Application

The broad product portfolio of Bosch offers countless application fields for modern deep learning systems. Some of our core domains include perception for driving assistants and automated driving, robotics, health care, smart cameras, and many other applications of Bosch AIoT devices.​

Embedded Deep Learning

Embedded deep learning aims to make the superior performance of deep learning algorithms for perception available for all types of sensors at Bosch while at the same time optimizing their computational footprint for deployment on embedded AI platforms and products. ​

In order to make perception models run efficiently on resource-constrained hardware, we focus on methods such as hardware-aware neural architecture search. Those methods are key enablers for AI-powered products.​

Embedded deep learning is therefore used in video, radar, and ultrasonic sensors, amongst others, to improve their performance and efficiency. At Bosch, AI, hardware, and domain experts collaborate closely to achieve the greatest benefits for embedded AI systems.

Use Case

Hardware-aware Neural Architecture Search - Highly performant and efficient deep neural networks for embedded AI

Figure 1: Deep learning

Introduction

Imagine that your machine learning algorithm could automatically design the best possible neural network model for your task, and even adapt the network to run as efficiently as possible on the chosen hardware platform. Our research team has developed tools and algorithms that save time and costs when training your AI model, and deliver results that achieve great accuracy with orders of magnitude smaller compute effort, and thus lower hardware costs and higher energy efficiency.

Deep learning, despite its numerous performance improvements over classical machine learning techniques, produces computationally intensive models. This often precludes their use in embedded systems, where strict compute limits and performance guarantees are crucial. Our research on neural architecture search (NAS) pursues two goals:

  • Developing methods to extract the best possible performance from deep learning architectures for a specific compute budget, or tailored for a specific hardware platform.​
  • Demonstrating the practicality of these methods by developing efficient deep learning architectures for domains of interest, such as deep networks that enable the latest generation of video cameras or radar sensors.
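The multi-objective goal above can be sketched as a toy search. The search space, the `accuracy_proxy`, and the `latency_ms` cost model below are illustrative stand-ins, not our actual search space or hardware measurements:

```python
import itertools

# Toy search space: depth x width of a hypothetical backbone.
def accuracy_proxy(depth, width):
    # Diminishing returns in both depth and width (illustrative).
    return 1.0 - 1.0 / (1.0 + 0.3 * depth + 0.1 * width)

def latency_ms(depth, width):
    # Cost grows with the amount of computation (illustrative).
    return 0.5 * depth * width

def pareto_front(candidates):
    """Keep architectures not dominated in (accuracy, latency)."""
    return [c for c in candidates
            if not any(o["acc"] >= c["acc"] and o["lat"] <= c["lat"] and o != c
                       for o in candidates)]

search_space = [{"depth": d, "width": w,
                 "acc": accuracy_proxy(d, w),
                 "lat": latency_ms(d, w)}
                for d, w in itertools.product(range(1, 6), (8, 16, 32, 64))]

front = pareto_front(search_space)
for arch in sorted(front, key=lambda a: a["lat"]):
    print(arch)
```

Each architecture on the resulting front is a best-possible trade-off: no other candidate is both at least as accurate and at least as fast.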

Our Research

Tools for NAS​ ​

Differentiable and evolutionary NAS​

We develop flexible and efficient re-usable Python tools that are building blocks for embedded AI applications in many domains. We integrate our latest research results, and specific optimizations for relevant hardware platforms.​​

Maximizing Efficiency​​

Hardware awareness leads to perfectly tailored solutions​​

In collaboration with hardware experts, we define models and cost functions that allow fine-tuning of deep learning models for various hardware platforms. The resulting models are highly performant and potentially orders of magnitude more efficient.
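A minimal sketch of such a hardware-aware cost function, assuming a hypothetical per-operator latency lookup table; real tables would come from profiling the actual target hardware:

```python
# Hypothetical per-operator latency table (ms) for one embedded platform.
LATENCY_TABLE = {
    "conv3x3": 1.20,
    "conv1x1": 0.35,
    "depthwise3x3": 0.50,
    "pool": 0.10,
}

def estimated_latency(architecture):
    """Sum the table entries for each layer in the candidate network."""
    return sum(LATENCY_TABLE[op] for op in architecture)

def hardware_aware_score(task_accuracy, architecture, lam=0.05):
    """Trade task accuracy against estimated latency (higher is better)."""
    return task_accuracy - lam * estimated_latency(architecture)

mobile_net = ["conv3x3", "depthwise3x3", "conv1x1", "pool"]
heavy_net = ["conv3x3"] * 6 + ["pool"]

# Assume the heavier network is slightly more accurate (illustrative numbers):
# the combined score still favours the efficient design.
print(hardware_aware_score(0.91, mobile_net))
print(hardware_aware_score(0.93, heavy_net))
```

Swapping in a different table re-tailors the same search to a different platform, which is the essence of hardware awareness.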

Pilot Use Cases​​

Demonstrating the benefits of NAS in embedded AI applications​​

NAS saves time to design network architectures for novel application fields and delivers near optimal solutions for complicated multi-objective optimization problems. Our tools are powerful, easy to apply, and can be integrated into continuously learning products.​​

Ahead of the Curve​​

Cutting-edge research in neural architecture search​​

Our NAS tools and methods result from world-class research at BCAI and our academic collaborators at the University of Freiburg. We constantly challenge and advance the state-of-the-art in the field, publish at top-tier venues, and put methods to test in real-world Bosch applications.​​

Figure 2: Deep learning

References ​

Elsken, T., Metzen, J. H., & Hutter, F. (2019). Neural architecture search: A survey. The Journal of Machine Learning Research, 20(1), 1997-2017.​​ [PDF]

Elsken, T., Metzen, J. H., & Hutter, F. (2018). Efficient Multi-Objective Neural Architecture Search via Lamarckian Evolution. ICLR.​​ [PDF]

Elsken, T., Zela, A., Metzen, J. H., Staffler, B., Brox, T., Valada, A., & Hutter, F. (2022). Neural Architecture Search for Dense Prediction Tasks in Computer Vision. arXiv preprint. [PDF]

Schorn, C., Elsken, T., Vogel, S., Runge, A., Guntoro, A., & Ascheid, G. (2020). Automated design of error-resilient and hardware-efficient deep neural networks. Neural Computing and Applications, 32(24), 18327-18345.​​​​ [PDF]

Data Efficiency and Uncertainty in Deep Learning​

Data is essential for achieving good performance and generalization of deep learning models. However, restrictions in the data collection process often mean that there is not enough training data to reach adequate production performance.

Therefore, data efficiency is important in deep learning since it enables the models to learn in complex domains without requiring large quantities of data. For this reason, we focus on synthetic data generation as well as unsupervised learning and domain adaptation techniques.​

A further challenge with deep learning models trained with limited data, but operating in an open world, is that they must be aware of the limits of their knowledge and the inherent uncertainty at the inference stage.​

To address this challenge, we are working on deep learning models that can make decisions under uncertainty and provide high-quality uncertainty quantification, in order to increase the safety of the systems and improve their out-of-distribution generalization.
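As a toy illustration of ensemble-based uncertainty quantification (one common approach, not necessarily the specific method used here), a small ensemble of bootstrapped linear models produces predictions whose spread grows far away from the training data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Training data from a simple linear process, observed only on [0, 1].
x_train = rng.uniform(0.0, 1.0, size=50)
y_train = 2.0 * x_train + rng.normal(0.0, 0.1, size=50)

# A small ensemble of linear models fit on bootstrap resamples stands in
# for a deep ensemble; sizes and model class are purely illustrative.
members = []
for _ in range(20):
    idx = rng.integers(0, len(x_train), size=len(x_train))
    slope, intercept = np.polyfit(x_train[idx], y_train[idx], deg=1)
    members.append((slope, intercept))

def predict_with_uncertainty(x):
    preds = np.array([a * x + b for a, b in members])
    return preds.mean(), preds.std()

mean_in, std_in = predict_with_uncertainty(0.5)     # inside training range
mean_out, std_out = predict_with_uncertainty(10.0)  # far outside it
print(f"in-distribution:     {mean_in:.2f} +/- {std_in:.3f}")
print(f"out-of-distribution: {mean_out:.2f} +/- {std_out:.3f}")
```

The disagreement between ensemble members is small where data was seen and large where it was not, which is exactly the signal a safe system can use to flag inputs outside its knowledge.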

One example application is deep learning models in the automotive domain that are robust to corner cases such as unusual weather or rare objects in the scene.

Use Case

Synthetic Data Generation and Augmentation - Enabling high performance and robustness of AI models with less effort on data collection

Figure: To ensure safety, a model put into production must generalize well across different scenarios and be robust to various corner cases. To enable this, these rare cases must be sufficiently present in the training data.

Introduction

Imagine an AI model that generalizes well to different scenarios (e.g. rare objects in the scene) and is robust to various corner cases (e.g. unusual weather). To enable this, rare corner cases must be sufficiently present in your training data. However, they are often extremely hard and costly to collect. One way to increase the representation of these rare samples in the training data is to synthesize them using deep generative models, such as Generative Adversarial Networks (GANs) or diffusion models.

​At BCAI, we are developing novel and high-performance models for data synthesis aiming to:​​

  • Reduce data collection and labelling costs by using synthetic data for augmentation,
  • Enable synthesis of realistic-looking samples with a particular focus on the samples that are underrepresented in the collected data,
  • Improve overall performance of the downstream models and their robustness on rare cases by using synthetic data for training.​
Figure: Images synthesized by our OASIS model by manipulating latent directions corresponding to day-to-night and cloudy-to-rainy scene properties.
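As a toy stand-in for such a generative model, the sketch below fits a simple Gaussian to a handful of rare samples and draws synthetic ones to rebalance the dataset; a real system would use a GAN or diffusion model in place of this toy generator:

```python
import numpy as np

rng = np.random.default_rng(1)

# Abundant common samples and only a dozen rare corner-case samples.
common = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(1000, 2))
rare = rng.normal(loc=[4.0, 4.0], scale=0.5, size=(12, 2))

# "Generative model": a Gaussian fit to the rare samples.
mu = rare.mean(axis=0)
cov = np.cov(rare, rowvar=False)
synthetic_rare = rng.multivariate_normal(mu, cov, size=1000 - len(rare))

# Augmented, balanced training set for the downstream model.
x_augmented = np.vstack([common, rare, synthetic_rare])
labels = np.array([0] * len(common) + [1] * (len(rare) + len(synthetic_rare)))
print("class balance after augmentation:", np.bincount(labels))
```

The downstream model now sees the rare class as often as the common one, without any additional data collection.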

Our Research

Extra Data in the Loop for Free​​

Reducing the data collection and annotation costs​​​

Data collection and annotation costs can be significantly reduced by using synthetic data generation tools for augmentation. Large amounts of high-quality training data can be synthesized for free and used to improve the performance of downstream models.

Dealing with Limited Data​​

Learning from one or a few data samples​​

Our algorithms also focus on use cases where training data is extremely limited, and only one or a few samples are available. Synthetic data generation methods can augment these samples with non-trivial transformations, boosting the performance of the downstream application models.​

Improved Robustness​​

Unlocking new potential by learning from synthetic data​

Our methods enable the synthesis of realistic-looking outliers and corner cases, as well as of samples that are underrepresented in the dataset. Such samples are crucial for improving the robustness of the model, yet they are often very difficult or even impossible to collect in greater numbers.

Ahead of the Curve​​

Benefitting from cutting-edge research on synthetic data generation​

Our research team is continuously producing novel methods for synthetic data generation. We constantly challenge and advance the state-of-the-art in the field, publish at top-tier venues, and put methods to test in various real-world Bosch applications.​

References ​

Zhang, D., & Khoreva, A. (2019). Progressive Augmentation of GANs. NeurIPS. [PDF] [Code]​​

Schoenfeld, E., Schiele, B., & Khoreva, A. (2020). A U-Net Based Discriminator for Generative Adversarial Networks. CVPR. [PDF] [Code] [Video]​​

Schoenfeld, E., Sushko, V., Zhang, D., Gall, J., Schiele, B., Khoreva, A. (2021). You Only Need Adversarial Supervision for Semantic Image Synthesis. ICLR. [PDF] [Code] [Video]​​

Sushko, V., Gall, J., Khoreva, A. (2021). One-Shot GAN: Learning to Generate Samples from Single Images and Videos. CVPR Workshops. [PDF] [Code] [Video]​​​​

Explainable and Robust Deep Learning​

Intelligent autonomous systems need to understand their environment – even if this environment changes over time and looks unlike what has been encountered previously. Moreover, providing an understanding of how and when the systems work reliably and robustly can increase users' trust.

We put a special emphasis on designing deep learning-based approaches for complex perception tasks such as multi-task perception, where for instance several object detection and semantic segmentation tasks are handled by a single neural network. Moreover, we are part of a larger team at Bosch exploring deep learning-based approaches for tracking objects over time and for localizing them in the 3D-world using multiple views of a scene.​​

Besides providing high-performing solutions for perception problems, we also develop tools for quantifying and improving robustness, explainability, and calibration of neural networks. ​​

​The main use cases of our perception systems are in domains like driver assistance, automated driving, or video surveillance.​

Use Case

Deep Learning for Temporal and Multi-View Fusion - Vision: Scalable, spatio-temporally consistent, and real-time capable environment representation of the vehicle's surroundings

Figure: A vehicle equipped with several cameras (left, fields of view in purple) generates a temporal sequence of per-camera frames (middle). We research AI-driven perception functionalities for generating an environment representation covering, for instance, information on vehicle locations and movements (right).

Introduction

Many intelligent systems need to be able to understand their surroundings based on sensors such as RGB cameras. Designing and training deep neural networks for perception tasks can be challenging as:​​

  • Information from various sensors needs to be aggregated to get a full understanding of a scene,
  • Sensory data need to be integrated over time to be able to track objects or estimate movements​​,
  • The size of neural architectures in embedded systems is limited by constraints on latency, energy consumption, and cost of hardware.​​

We address these challenges by developing neural architectures that are suited for multi-task perception, i.e. different perception tasks can be handled with the same network, which reduces computation costs. In addition, we extend these networks to perform multi-view and temporal fusion, which allows reasoning about the 3D geometry of the surroundings and tracking of moving objects.
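The shared-backbone idea can be sketched in a few lines; the layer shapes and random weights below are purely illustrative (a real system would learn them):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative weights; toy-sized shapes.
W_backbone = rng.normal(size=(64, 128))  # shared feature extractor
W_detect = rng.normal(size=(128, 4))     # head 1: box regression
W_segment = rng.normal(size=(128, 10))   # head 2: per-class logits

def backbone(x):
    # One shared forward pass whose cost is paid once per input.
    return np.maximum(x @ W_backbone, 0.0)  # ReLU features

def multi_task_forward(x):
    features = backbone(x)        # computed once ...
    boxes = features @ W_detect   # ... then reused by every task head
    seg_logits = features @ W_segment
    return boxes, seg_logits

x = rng.normal(size=(8, 64))      # a batch of 8 flattened inputs
boxes, seg_logits = multi_task_forward(x)
print(boxes.shape, seg_logits.shape)
```

Because the expensive backbone runs once and only the lightweight heads are task-specific, adding a task costs far less than running a second full network.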

A second challenge when designing neural networks for perception in highly automated systems is understanding their reliability and safety:​​

  • Can we understand the internal processing of sensory information in a perception system and use this understanding to increase trust in the system? ​​
  • Can a trained network deal with situations that were not part of its training data (domain shift and corner cases)? How can we quantify and improve the robustness of a network to such unexpected data?​​
  • How can a neural network provide reliable estimates on the confidence of its predictions?​​
  • How can we validate a neural network for perception in order to deploy it in safety-critical use cases?​​

We aim to build tools that advance the state of the art of deep learning-based perception, thus providing solutions to these challenging problems and questions.
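One of the questions above concerns reliable confidence estimates. Temperature scaling is a standard post-hoc calibration technique (shown here as one common approach, not necessarily the method used in our systems); the logits and temperature below are illustrative:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerically stable
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical over-confident logits from a trained classifier.
logits = np.array([[8.0, 1.0, 0.5]])

# Dividing logits by a temperature T > 1 softens the confidence
# without changing the predicted class.
for temperature in (1.0, 2.5):
    probs = softmax(logits / temperature)
    print(f"T={temperature}: max confidence {probs.max():.3f}")
```

The temperature is typically fit on a held-out validation set so that reported confidences match observed error rates.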

Our Research

Multi-Task Perception​​

Handling different perception tasks with a joint network​​​​

Neural networks can be used as models in different perception tasks such as object detection, semantic segmentation, or optical flow estimation. Using a shared neural backbone can reduce hardware requirements and latency.​​

Temporal and Multi-View Perception​​

Learning from temporal information of multiple sensors​

Dynamic scenes are hard to capture in their entirety from a single static image. Integrating information over time and from sensors providing different views makes it possible to achieve a much better understanding of the current state of the 3D environment.
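As a minimal illustration of multi-view fusion, a Bayesian log-odds update can combine independent per-view occupancy estimates for one grid cell; the prior and per-view probabilities below are illustrative:

```python
import numpy as np

def logit(p):
    return np.log(p / (1.0 - p))

def fuse(prior, observations):
    """Bayesian log-odds fusion of independent per-view occupancy estimates."""
    log_odds = logit(prior) + sum(logit(p) - logit(prior) for p in observations)
    return 1.0 / (1.0 + np.exp(-log_odds))

# Two cameras each see a grid cell as probably occupied; fusing the
# views yields a more confident estimate than either view alone.
prior = 0.5
views = [0.7, 0.8]
fused = fuse(prior, views)
print(f"fused occupancy: {fused:.3f}")
```

The same update applied over successive frames gives the temporal half of the fusion: each new observation nudges the cell's log-odds up or down.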

Improved Robustness​​

Dealing with domain shift and adversarial attacks​

Neural networks need to reliably process inputs that look unlike the typical inputs provided during their training, for instance those with a domain shift or those maliciously altered by an attacker. We devise architectures, training processes, and certification methods for robust neural perception.​
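The fast gradient sign method (FGSM) is a classic example of the kind of attack such robust perception must withstand; the sketch below applies it to a toy linear classifier with illustrative weights:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical trained linear classifier (weights are illustrative).
w = np.array([1.0, -2.0, 0.5])
b = 0.1

def bce_loss(x, y):
    p = sigmoid(x @ w + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

# FGSM: perturb the input along the sign of the loss gradient.
# For this linear model the input gradient is (p - y) * w.
def fgsm(x, y, eps):
    grad_x = (sigmoid(x @ w + b) - y) * w
    return x + eps * np.sign(grad_x)

x = np.array([0.5, -0.5, 1.0])
y = 1.0
x_adv = fgsm(x, y, eps=0.3)
print(f"clean loss: {bce_loss(x, y):.3f}  adversarial loss: {bce_loss(x_adv, y):.3f}")
```

A tiny, bounded perturbation measurably increases the loss; robust training and certification methods aim to bound exactly this effect.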

Interpretability and Validation​​

Understanding how and when neural networks work​​

Widespread adoption of neural networks for perception in sensitive or safety-critical domains requires an understanding of how they operate internally, and assurance that they do not exploit spurious correlations that could hamper robustness and fairness. We develop novel approaches for the explainability and validation of deep neural networks.

Figure: Shown is a sequence of semantic segmentations (top) and associated semantic grids (bottom; black areas correspond to unknown parts). Based on frames 1 and 2, a prediction of the semantic grid and the associated uncertainty is generated and compared to a target.

References ​

Eulig, E., Saranrittichai, P., Mummadi, C., Rambach, K., Beluch, W., Shi, X., & Fischer, V. (2021). DiagViB-6: A Diagnostic Benchmark Suite for Vision Models in the Presence of Shortcut and Generalization Opportunities. ICCV. [PDF] [Code]

Fuchs, F., Worrall, D., Fischer, V., & Welling, M. (2020). SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks. NeurIPS. [PDF] [Code] [Video]

​​Mummadi, C., Subramaniam, R., Hutmacher, R., Vitay, J., Fischer, V., & Metzen J. (2021). Does enhanced shape bias improve neural network robustness to common corruptions? ICLR. [PDF] [Video]​​

Mummadi, C., Hutmacher, R., Rambach, K., Levinkov, E., Brox, T., & Metzen J. (2021). Test-Time Adaptation to Distribution Shift by Confidence Maximization and Input Transformation. arXiv. [PDF]​

Wong, E., Schneider, T., Schmitt, J., Schmidt, F. R., & Kolter, J. Z. (2020). Neural Network Virtual Sensors for Fuel Injection Quantities with Provable Performance Specifications. IEEE IV. [PDF]

Wong, E., Schmidt, F. R., & Kolter, J. Z. (2019). Wasserstein Adversarial Examples via Projected Sinkhorn Iterations. ICML. [PDF]

Wong, E., Schmidt, F. R., Metzen, J. H., & Kolter, J. Z. (2018). Scaling Provable Adversarial Defenses. NeurIPS. [PDF]