When you run Kubernetes in production and at scale, you encounter many issues, both on the infrastructure side and in user space. Some of these issues emerge over time as usage, cluster size, and the number of workloads grow; others appear only once you go global and enter regions with vastly different technology landscapes, such as China. This talk goes into detail on lessons learned from concurrently operating 100+ clusters for big enterprises in production on different clouds and in data centers around the globe. Over the years we have worked through hundreds of post-mortems and want to share both operations and development best practices that can help you avoid the issues we ran into. A major focus of this talk is getting to a hardened and reliable cluster setup and handling multi-tenancy in clusters that are shared by many teams.