PSP as an admission controller resource is being deprecated. Deployed PodSecurityPolicy's will keep working until version 1.25, their target removal from the codebase. A new feature, with a working title of "PSP replacement policy", is being developed in KEP-2579. To learn more, read PodSecurityPolicy Deprecation: Past, Present, and Future.
Kustomize version in kubectl had a jump from v2.0.3 to v4.0.5. Kustomize is now treated as a library and future updates will be less sporadic.
Default Container Labels
Pod with multiple containers can use kubectl.kubernetes.io/default-container label to have a container preselected for kubectl commands. More can be read in KEP-2227.
Immutable Secrets and ConfigMaps
Immutable Secrets and ConfigMaps graduates to GA. This feature allows users to specify that the contents of a particular Secret or ConfigMap is immutable for its object lifetime. For such instances, Kubelet will not watch/poll for changes and therefore reducing apiserver load.
Structured Logging in Kubelet
Kubelet has adopted structured logging, thanks to community effort in accomplishing this within the release timeline. Structured logging in the project remains an ongoing effort -- for folks interested in participating, keep an eye / chime in to the mailing list discussion.
Storage Capacity Tracking
Traditionally, the Kubernetes scheduler was based on the assumptions that additional persistent storage is available everywhere in the cluster and has infinite capacity. Topology constraints addressed the first point, but up to now pod scheduling was still done without considering that the remaining storage capacity may not be enough to start a new pod. Storage capacity tracking addresses that by adding an API for a CSI driver to report storage capacity and uses that information in the Kubernetes scheduler when choosing a node for a pod. This feature serves as a stepping stone for supporting dynamic provisioning for local volumes and other volume types that are more capacity constrained.
Generic Ephemeral Volumes
Generic ephermeral volumes feature allows any existing storage driver that supports dynamic provisioning to be used as an ephemeral volume with the volume’s lifecycle bound to the Pod. It can be used to provide scratch storage that is different from the root disk, for example persistent memory, or a separate local disk on that node. All StorageClass parameters for volume provisioning are supported. All features supported with PersistentVolumeClaims are supported, such as storage capacity tracking, snapshots and restore, and volume resizing.
CSI Service Account Token
CSI Service Account Token feature moves to Beta in 1.21. This feature improves the security posture and allows CSI drivers to receive pods' bound service account tokens. This feature also provides a knob to re-publish volumes so that short-lived volumes can be refreshed.
CSI Health Monitoring
The CSI health monitoring feature is being released as a second Alpha in Kubernetes 1.21. This feature enables CSI Drivers to share abnormal volume conditions from the underlying storage systems with Kubernetes so that they can be reported as events on PVCs or Pods. This feature serves as a stepping stone towards programmatic detection and resolution of individual volume health issues by Kubernetes.
Known Issues
TopologyAwareHints feature falls back to default behavior
The feature gate currently falls back to the default behavior in most cases. Enabling the feature gate will add hints to EndpointSlices, but functional differences are only observed in non-dual stack kube-proxy implementation. The fix will be available in coming releases.
Urgent Upgrade Notes
(No, really, you MUST read this before you upgrade)
Kube-proxy's IPVS proxy mode no longer sets the net.ipv4.conf.all.route_localnet sysctl parameter. Nodes upgrading will have net.ipv4.conf.all.route_localnet set to 1 but new nodes will inherit the system default (usually 0). If you relied on any behavior requiring net.ipv4.conf.all.route_localnet, you must set ensure it is enabled as kube-proxy will no longer set it automatically. This change helps to further mitigate CVE-2020-8558. (#92938, @lbernail) [SIG Network and Release]
Kubeadm: during "init" an empty cgroupDriver value in the KubeletConfiguration is now always set to "systemd" unless the user is explicit about it. This requires existing machine setups to configure the container runtime to use the "systemd" driver. Documentation on this topic can be found here: https://kubernetes.io/docs/setup/production-environment/container-runtimes/. When upgrading existing clusters / nodes using "kubeadm upgrade" the old cgroupDriver value is preserved, but in 1.22 this change will also apply to "upgrade". For more information on migrating to the "systemd" driver or remaining on the "cgroupfs" driver see: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/configure-cgroup-driver/. (#99471, @neolit123) [SIG Cluster Lifecycle]
Newly provisioned PVs by EBS plugin will no longer use the deprecated "failure-domain.beta.kubernetes.io/zone" and "failure-domain.beta.kubernetes.io/region" labels. It will use "topology.kubernetes.io/zone" and "topology.kubernetes.io/region" labels instead. (#99130, @ayberk) [SIG Cloud Provider, Storage and Testing]
Newly provisioned PVs by OpenStack Cinder plugin will no longer use the deprecated "failure-domain.beta.kubernetes.io/zone" and "failure-domain.beta.kubernetes.io/region" labels. It will use "topology.kubernetes.io/zone" and "topology.kubernetes.io/region" labels instead. (#99719, @jsafrane) [SIG Cloud Provider and Storage]
Newly provisioned PVs by gce-pd will no longer have the beta FailureDomain label. gce-pd volume plugin will start to have GA topology label instead. (#98700, @Jiawei0227) [SIG Cloud Provider, Storage and Testing]
OpenStack Cinder CSI migration is on by default, Clinder CSI driver must be installed on clusters on OpenStack for Cinder volumes to work. (#98538, @dims) [SIG Storage]
Remove alpha CSIMigrationXXComplete flag and add alpha InTreePluginXXUnregister flag. Deprecate CSIMigrationvSphereComplete flag and it will be removed in v1.22. (#98243, @Jiawei0227)
Remove storage metrics storage_operation_errors_total, since we already have storage_operation_status_count.And add new field status for storage_operation_duration_seconds, so that we can know about all status storage operation latency. (#98332, @JornShen) [SIG Instrumentation and Storage]
The metric storage_operation_errors_total is not removed, but is marked deprecated, and the metric storage_operation_status_count is marked deprecated. In both cases the storage_operation_duration_seconds metric can be used to recover equivalent counts (using status=fail-unknown in the case of storage_operations_errors_total). (#99045, @mattcary)
ServiceNodeExclusion, NodeDisruptionExclusion and LegacyNodeRoleBehavior features have been promoted to GA. ServiceNodeExclusion and NodeDisruptionExclusion are now unconditionally enabled, while LegacyNodeRoleBehavior is unconditionally disabled. To prevent control plane nodes from being added to load balancers automatically, upgrade users need to add "node.kubernetes.io/exclude-from-external-load-balancers" label to control plane nodes. (#97543, @pacoxu)
Changes by Kind
Deprecation
Aborting the drain command in a list of nodes will be deprecated. The new behavior will make the drain command go through all nodes even if one or more nodes failed during the drain. For now, users can try such experience by enabling --ignore-errors flag. (#98203, @yuzhiquan)
Delete deprecated service.beta.kubernetes.io/azure-load-balancer-mixed-protocols mixed procotol annotation in favor of the MixedProtocolLBService feature (#97096, @nilo19) [SIG Cloud Provider]
Deprecate the topologyKeys field in Service. This capability will be replaced with upcoming work around Topology Aware Subsetting and Service Internal Traffic Policy. (#96736, @andrewsykim) [SIG Apps]
Kube-proxy: remove deprecated --cleanup-ipvs flag of kube-proxy, and make --cleanup flag always to flush IPVS (#97336, @maaoBit) [SIG Network]
Kubeadm: deprecated command "alpha selfhosting pivot" is now removed. (#97627, @knight42)
Kubeadm: graduate the command kubeadm alpha kubeconfig user to kubeadm kubeconfig user. The kubeadm alpha kubeconfig user command is deprecated now. (#97583, @knight42) [SIG Cluster Lifecycle]
Kubeadm: the "kubeadm alpha certs" command is removed now, please use "kubeadm certs" instead. (#97706, @knight42) [SIG Cluster Lifecycle]
Kubeadm: the deprecated kube-dns is no longer supported as an option. If "ClusterConfiguration.dns.type" is set to "kube-dns" kubeadm will now throw an error. (#99646, @rajansandeep) [SIG Cluster Lifecycle]
Kubectl: The deprecated kubectl alpha debug command is removed. Use kubectl debug instead. (#98111, @pandaamanda) [SIG CLI]
Official support to build kubernetes with docker-machine / remote docker is removed. This change does not affect building kubernetes with docker locally. (#97935, @adeniyistephen) [SIG Release and Testing]
Remove deprecated --generator, --replicas, --service-generator, --service-overrides, --schedule from kubectl run
Deprecate --serviceaccount, --hostport, --requests, --limits in kubectl run (#99732, @soltysh)
Remove the deprecated metrics "scheduling_algorithm_preemption_evaluation_seconds" and "binding_duration_seconds", suggest to use "scheduler_framework_extension_point_duration_seconds" instead. (#96447, @chendave) [SIG Cluster Lifecycle, Instrumentation, Scheduling and Testing]
Removing experimental windows container hyper-v support with Docker (#97141, @wawa0210) [SIG Node and Windows]
Rename metrics etcd_object_counts to apiserver_storage_object_counts and mark it as stable. The original etcd_object_counts metrics name is marked as "Deprecated" and will be removed in the future. (#99785, @erain) [SIG API Machinery, Instrumentation and Testing]
The GA TokenRequest and TokenRequestProjection feature gates have been removed and are unconditionally enabled. Remove explicit use of those feature gates in CLI invocations. (#97148, @wawa0210) [SIG Node]
The PodSecurityPolicy API is deprecated in 1.21, and will no longer be served starting in 1.25. (#97171, @deads2k) [SIG Auth and CLI]
The batch/v2alpha1 CronJob type definitions and clients are deprecated and removed. (#96987, @soltysh) [SIG API Machinery, Apps, CLI and Testing]
The export query parameter (inconsistently supported by API resources and deprecated in v1.14) is fully removed. Requests setting this query parameter will now receive a 400 status response. (#98312, @deads2k) [SIG API Machinery, Auth and Testing]
audit.k8s.io/v1beta1 and audit.k8s.io/v1alpha1 audit policy configuration and audit events are deprecated in favor of audit.k8s.io/v1, available since v1.13. kube-apiserver invocations that specify alpha or beta policy configurations with --audit-policy-file, or explicitly request alpha or beta audit events with --audit-log-version / --audit-webhook-version must update to use audit.k8s.io/v1 and accept audit.k8s.io/v1 events prior to v1.24. (#98858, @carlory) [SIG Auth]
discovery.k8s.io/v1beta1 EndpointSlices are deprecated in favor of discovery.k8s.io/v1, and will no longer be served in Kubernetes v1.25. (#100472, @liggitt)
diskformat storage class parameter for in-tree vSphere volume plugin is deprecated as of v1.21 release. Please consider updating storageclass and remove diskformat parameter. vSphere CSI Driver does not support diskformat storageclass parameter.
vSphere releases less than 67u3 are deprecated as of v1.21. Please consider upgrading vSphere to 67u3 or above. vSphere CSI Driver requires minimum vSphere 67u3.
VM Hardware version less than 15 is deprecated as of v1.21. Please consider upgrading the Node VM Hardware version to 15 or above. vSphere CSI Driver recommends Node VM's Hardware version set to at least vmx-15.
Multi vCenter support is deprecated as of v1.21. If you have a Kubernetes cluster spanning across multiple vCenter servers, please consider moving all k8s nodes to a single vCenter Server. vSphere CSI Driver does not support Kubernetes deployment spanning across multiple vCenter servers.
Support for these deprecations will be available till Kubernetes v1.24. (#98546, @divyenpatel)
API Change
PodAffinityTerm includes a namespaceSelector field to allow selecting eligible namespaces based on their labels.
A new CrossNamespacePodAffinity quota scope API that allows restricting which namespaces allowed to use PodAffinityTerm with corss-namespace reference via namespaceSelector or namespaces fields. (#98582, @ahg-g) [SIG API Machinery, Apps, Auth and Testing]
Add Probe-level terminationGracePeriodSeconds field (#99375, @ehashman) [SIG API Machinery, Apps, Node and Testing]
Added .spec.completionMode field to Job, with accepted values NonIndexed (default) and Indexed. This is an alpha field and is only honored by servers with the IndexedJob feature gate enabled. (#98441, @alculquicondor) [SIG Apps and CLI]
Adds support for endPort field in NetworkPolicy (#97058, @rikatz) [SIG Apps and Network]
CSIServiceAccountToken graduates to Beta and enabled by default. (#99298, @zshihang)
Cluster admins can now turn off /debug/pprof and /debug/flags/v endpoint in kubelet by setting enableProfilingHandler and enableDebugFlagsHandler to false in the Kubelet configuration file. Options enableProfilingHandler and enableDebugFlagsHandler can be set to true only when enableDebuggingHandlers is also set to true. (#98458, @SaranBalaji90)
DaemonSets accept a MaxSurge integer or percent on their rolling update strategy that will launch the updated pod on nodes and wait for those pods to go ready before marking the old out-of-date pods as deleted. This allows workloads to avoid downtime during upgrades when deployed using DaemonSets. This feature is alpha and is behind the DaemonSetUpdateSurge feature gate. (#96441, @smarterclayton) [SIG Apps and Testing]
Enable SPDY pings to keep connections alive, so that kubectl exec and kubectl portforward won't be interrupted. (#97083, @knight42) [SIG API Machinery and CLI]
FieldManager no longer owns fields that get reset before the object is persisted (e.g. "status wiping"). (#99661, @kevindelgado) [SIG API Machinery, Auth and Testing]
Generic ephemeral volumes are beta. (#99643, @pohly) [SIG API Machinery, Apps, Auth, CLI, Node, Storage and Testing]
Hugepages request values are limited to integer multiples of the page size. (#98515, @lala123912) [SIG Apps]
Implement the GetAvailableResources in the podresources API. (#95734, @fromanirh) [SIG Instrumentation, Node and Testing]
IngressClass resource can now reference a resource in a specific namespace
for implementation-specific configuration (previously only Cluster-level resources were allowed).
This feature can be enabled using the IngressClassNamespacedParams feature gate. (#99275, @hbagdi)
Jobs API has a new .spec.suspend field that can be used to suspend and resume Jobs. This is an alpha field which is only honored by servers with the SuspendJob feature gate enabled. (#98727, @adtac)
Kubelet Graceful Node Shutdown feature graduates to Beta and enabled by default. (#99735, @bobbypage)
Kubernetes is now built using go1.15.7 (#98363, @cpanato) [SIG Cloud Provider, Instrumentation, Node, Release and Testing]
Namespace API objects now have a kubernetes.io/metadata.name label matching their metadata.name field to allow selecting any namespace by its name using a label selector. (#96968, @jayunit100) [SIG API Machinery, Apps, Cloud Provider, Storage and Testing]
One new field "InternalTrafficPolicy" in Service is added.
It specifies if the cluster internal traffic should be routed to all endpoints or node-local endpoints only.
"Cluster" routes internal traffic to a Service to all endpoints.
"Local" routes traffic to node-local endpoints only, and traffic is dropped if no node-local endpoints are ready.
The default value is "Cluster". (#96600, @maplain) [SIG API Machinery, Apps and Network]
PodDisruptionBudget API objects can now contain conditions in status. (#98127, @mortent) [SIG API Machinery, Apps, Auth, CLI, Cloud Provider, Cluster Lifecycle and Instrumentation]
PodSecurityPolicy only stores "generic" as allowed volume type if the GenericEphemeralVolume feature gate is enabled (#98918, @pohly) [SIG Auth and Security]
Promote CronJobs to batch/v1 (#99423, @soltysh) [SIG API Machinery, Apps, CLI and Testing]
Promote Immutable Secrets/ConfigMaps feature to Stable. This allows to set immutable field in Secret or ConfigMap object to mark their contents as immutable. (#97615, @wojtek-t) [SIG Apps, Architecture, Node and Testing]
Remove support for building Kubernetes with bazel. (#99561, @BenTheElder) [SIG API Machinery, Apps, Architecture, Auth, Autoscaling, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation, Network, Node, Release, Scalability, Scheduling, Storage, Testing and Windows]
Scheduler extender filter interface now can report unresolvable failed nodes in the new field FailedAndUnresolvableNodes of ExtenderFilterResult struct. Nodes in this map will be skipped in the preemption phase. (#92866, @cofyc) [SIG Scheduling]
Services can specify loadBalancerClass to use a custom load balancer (#98277, @XudongLiuHarold)
Storage capacity tracking (= the CSIStorageCapacity feature) graduates to Beta and enabled by default, storage.k8s.io/v1alpha1/VolumeAttachment and storage.k8s.io/v1alpha1/CSIStorageCapacity objects are deprecated (#99641, @pohly)
Support for Indexed Job: a Job that is considered completed when Pods associated to indexes from 0 to (.spec.completions-1) have succeeded. (#98812, @alculquicondor) [SIG Apps and CLI]
The BoundServiceAccountTokenVolume feature has been promoted to beta, and enabled by default.
This changes the tokens provided to containers at /var/run/secrets/kubernetes.io/serviceaccount/token to be time-limited, auto-refreshed, and invalidated when the containing pod is deleted.
Clients should reload the token from disk periodically (once per minute is recommended) to ensure they continue to use a valid token. k8s.io/client-go version v11.0.0+ and v0.15.0+ reload tokens automatically.
By default, injected tokens are given an extended lifetime so they remain valid even after a new refreshed token is provided. The metric serviceaccount_stale_tokens_total can be used to monitor for workloads that are depending on the extended lifetime and are continuing to use tokens even after a refreshed token is provided to the container. If that metric indicates no existing workloads are depending on extended lifetimes, injected token lifetime can be shortened to 1 hour by starting kube-apiserver with --service-account-extend-token-expiration=false. (#95667, @zshihang) [SIG API Machinery, Auth, Cluster Lifecycle and Testing]
The EndpointSlice Controllers are now GA. The EndpointSliceController will not populate the deprecatedTopology field and will only provide topology information through the zone and nodeName fields. (#99870, @swetharepakula)
The Endpoints controller will now set the endpoints.kubernetes.io/over-capacity annotation to "warning" when an Endpoints resource contains more than 1000 addresses. In a future release, the controller will truncate Endpoints that exceed this limit. The EndpointSlice API can be used to support significantly larger number of addresses. (#99975, @robscott) [SIG Apps and Network]
The PodDisruptionBudget API has been promoted to policy/v1 with no schema changes. The only functional change is that an empty selector ({}) written to a policy/v1 PodDisruptionBudget now selects all pods in the namespace. The behavior of the policy/v1beta1 API remains unchanged. The policy/v1beta1 PodDisruptionBudget API is deprecated and will no longer be served in 1.25+. (#99290, @mortent) [SIG API Machinery, Apps, Auth, Autoscaling, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation, Scheduling and Testing]
The EndpointSlice API is now GA. The EndpointSlice topology field has been removed from the GA API and will be replaced by a new per Endpoint Zone field. If the topology field was previously used, it will be converted into an annotation in the v1 Resource. The discovery.k8s.io/v1alpha1 API is removed. (#99662, @swetharepakula)
The controller.kubernetes.io/pod-deletion-cost annotation can be set to offer a hint on the cost of deleting a Pod compared to other pods belonging to the same ReplicaSet. Pods with lower deletion cost are deleted first. This is an alpha feature. (#99163, @ahg-g)
The kube-apiserver now resets managedFields that got corrupted by a mutating admission controller. (#98074, @kwiesmueller)
Topology Aware Hints are now available in alpha and can be enabled with the TopologyAwareHints feature gate. (#99522, @robscott) [SIG API Machinery, Apps, Auth, Instrumentation, Network and Testing]
Users might specify the kubectl.kubernetes.io/default-exec-container annotation in a Pod to preselect container for kubectl commands. (#97099, @pacoxu) [SIG CLI]
Feature
A client-go metric, rest_client_exec_plugin_call_total, has been added to track total calls to client-go credential plugins. (#98892, @ankeesler) [SIG API Machinery, Auth, Cluster Lifecycle and Instrumentation]
A new histogram metric to track the time it took to delete a job by the TTLAfterFinished controller (#98676, @ahg-g)
AWS cloud provider supports auto-discovering subnets without any kubernetes.io/cluster/<clusterName> tags. It also supports additional service annotation service.beta.kubernetes.io/aws-load-balancer-subnets to manually configure the subnets. (#97431, @kishorj)
Aborting the drain command in a list of nodes will be deprecated. The new behavior will make the drain command go through all nodes even if one or more nodes failed during the drain. For now, users can try such experience by enabling --ignore-errors flag. (#98203, @yuzhiquan)
Add --permit-address-sharing flag to kube-apiserver to listen with SO_REUSEADDR. While allowing to listen on wildcard IPs like 0.0.0.0 and specific IPs in parallel, it avoids waiting for the kernel to release socket in TIME_WAIT state, and hence, considerably reducing kube-apiserver restart times under certain conditions. (#93861, @sttts)
Add csi_operations_seconds metric on kubelet that exposes CSI operations duration and status for node CSI operations. (#98979, @Jiawei0227) [SIG Instrumentation and Storage]
Add migrated field into storage_operation_duration_seconds metric (#99050, @Jiawei0227) [SIG Apps, Instrumentation and Storage]
Add flag --lease-reuse-duration-seconds for kube-apiserver to config etcd lease reuse duration. (#97009, @lingsamuel) [SIG API Machinery and Scalability]
Add metric etcd_lease_object_counts for kube-apiserver to observe max objects attached to a single etcd lease. (#97480, @lingsamuel) [SIG API Machinery, Instrumentation and Scalability]
Add support to generate client-side binaries for new darwin/arm64 platform (#97743, @dims) [SIG Release and Testing]
Added ephemeral_volume_controller_create[_failures]_total counters to kube-controller-manager metrics (#99115, @pohly) [SIG API Machinery, Apps, Cluster Lifecycle, Instrumentation and Storage]
Added support for installing arm64 node artifacts. (#99242, @liu-cong)
Adds alpha feature VolumeCapacityPriority which makes the scheduler prioritize nodes based on the best matching size of statically provisioned PVs across multiple topologies. (#96347, @cofyc) [SIG Apps, Network, Scheduling, Storage and Testing]
Adds the ability to pass --strict-transport-security-directives to the kube-apiserver to set the HSTS header appropriately. Be sure you understand the consequences to browsers before setting this field. (#96502, @249043822) [SIG Auth]
Adds two new metrics to cronjobs, a histogram to track the time difference when a job is created and the expected time when it should be created, as well as a gauge for the missed schedules of a cronjob (#99341, @alaypatel07)
Alpha implementation of Kubectl Command Headers: SIG CLI KEP 859 enabled when KUBECTL_COMMAND_HEADERS environment variable set on the client command line. (#98952, @seans3)
Base-images: Update to debian-iptables:buster-v1.4.0
CRIContainerLogRotation graduates to GA and unconditionally enabled. (#99651, @umohnani8)
Component owner can configure the allowlist of metric label with flag '--allow-metric-labels'. (#99385, @YoyinZyc) [SIG API Machinery, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation and Release]
Component owner can configure the allowlist of metric label with flag '--allow-metric-labels'. (#99738, @YoyinZyc) [SIG API Machinery, Cluster Lifecycle and Instrumentation]
EmptyDir memory backed volumes are sized as the the minimum of pod allocatable memory on a host and an optional explicit user provided value. (#100319, @derekwaynecarr) [SIG Node]
Enables Kubelet to check volume condition and log events to corresponding pods. (#99284, @fengzixu) [SIG Apps, Instrumentation, Node and Storage]
EndpointSliceNodeName graduates to GA and thus will be unconditionally enabled -- NodeName will always be available in the v1beta1 API. (#99746, @swetharepakula)
Export NewDebuggingRoundTripper function and DebugLevel options in the k8s.io/client-go/transport package. (#98324, @atosatto)
Kube-proxy iptables: new metric sync_proxy_rules_iptables_total that exposes the number of rules programmed per table in each iteration (#99653, @aojea) [SIG Instrumentation and Network]
Kube-scheduler now logs plugin scoring summaries at --v=4 (#99411, @damemi) [SIG Scheduling]
Kubeadm now includes CoreDNS v1.8.0. (#96429, @rajansandeep) [SIG Cluster Lifecycle]
Kubeadm: IPv6DualStack feature gate graduates to Beta and enabled by default (#99294, @pacoxu)
Kubeadm: a warning to user as ipv6 site-local is deprecated (#99574, @pacoxu) [SIG Cluster Lifecycle and Network]
Kubeadm: add support for certificate chain validation. When using kubeadm in external CA mode, this allows an intermediate CA to be used to sign the certificates. The intermediate CA certificate must be appended to each signed certificate for this to work correctly. (#97266, @robbiemcmichael) [SIG Cluster Lifecycle]
Kubeadm: amend the node kernel validation to treat CGROUP_PIDS, FAIR_GROUP_SCHED as required and CFS_BANDWIDTH, CGROUP_HUGETLB as optional (#96378, @neolit123) [SIG Cluster Lifecycle and Node]
Kubeadm: apply the "node.kubernetes.io/exclude-from-external-load-balancers" label on control plane nodes during "init", "join" and "upgrade" to preserve backwards compatibility with the lagacy LB mode where nodes labeled as "master" where excluded. To opt-out you can remove the label from a node. See #97543 and the linked KEP for more details. (#98269, @neolit123) [SIG Cluster Lifecycle]
Kubeadm: if the user has customized their image repository via the kubeadm configuration, pass the custom pause image repository and tag to the kubelet via --pod-infra-container-image not only for Docker but for all container runtimes. This flag tells the kubelet that it should not garbage collect the image. (#99476, @neolit123) [SIG Cluster Lifecycle]
Kubeadm: perform pre-flight validation on host/node name upon kubeadm init and kubeadm join, showing warnings on non-compliant names (#99194, @pacoxu)
Kubectl version changed to write a warning message to stderr if the client and server version difference exceeds the supported version skew of +/-1 minor version. (#98250, @brianpursley) [SIG CLI]
Kubectl: Add --use-protocol-buffers flag to kubectl top pods and nodes. (#96655, @serathius)
Kubectl: kubectl get will omit managed fields by default now. Users could set --show-managed-fields to true to show managedFields when the output format is either json or yaml. (#96878, @knight42) [SIG CLI and Testing]
Kubectl: a Pod can be preselected as default container using kubectl.kubernetes.io/default-container annotation (#99833, @mengjiao-liu)
Kubectl: add bash-completion for comma separated list on kubectl get (#98301, @phil9909)
Kubernetes is now built using go1.15.8 (#98834, @cpanato) [SIG Cloud Provider, Instrumentation, Release and Testing]
Kubernetes is now built with Golang 1.16 (#98572, @justaugustus) [SIG API Machinery, Auth, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation, Node, Release and Testing]
Kubernetes is now built with Golang 1.16.1 (#100106, @justaugustus) [SIG Cloud Provider, Instrumentation, Release and Testing]
Metrics can now be disabled explicitly via a command line flag (i.e. '--disabled-metrics=metric1,metric2') (#99217, @logicalhan)
New admission controller DenyServiceExternalIPs is available. Clusters which do not need the Service externalIPs feature should enable this controller and be more secure. (#97395, @thockin)
Overall, enable the feature of PreferNominatedNode will improve the performance of scheduling where preemption might frequently happen, but in theory, enable the feature of PreferNominatedNode, the pod might not be scheduled to the best candidate node in the cluster. (#93179, @chendave) [SIG Scheduling and Testing]
Persistent Volumes formatted with the btrfs filesystem will now automatically resize when expanded. (#99361, @Novex) [SIG Storage]
Port the devicemanager to Windows node to allow device plugins like directx (#93285, @aarnaud) [SIG Node, Testing and Windows]
Removes cAdvisor JSON metrics (/stats/container, /stats//, /stats////) from the kubelet. (#99236, @pacoxu)
Rename metrics etcd_object_counts to apiserver_storage_object_counts and mark it as stable. The original etcd_object_counts metrics name is marked as "Deprecated" and will be removed in the future. (#99785, @erain) [SIG API Machinery, Instrumentation and Testing]
Sysctls graduates to General Availability and thus unconditionally enabled. (#99158, @wgahnagl)
The Kubernetes pause image manifest list now contains an image for Windows Server 20H2. (#97322, @claudiubelu) [SIG Windows]
The NodeAffinity plugin implements the PreFilter extension, offering enhanced performance for Filter. (#99213, @AliceZhang2016) [SIG Scheduling]
The CronJobControllerV2 feature flag graduates to Beta and set to be enabled by default. (#98878, @soltysh)
The EndpointSlice mirroring controller mirrors endpoints annotations and labels to the generated endpoint slices, it also ensures that updates on any of these fields are mirrored.
The well-known annotation endpoints.kubernetes.io/last-change-trigger-time is skipped and not mirrored. (#98116, @aojea)
The RunAsGroup feature has been promoted to GA in this release. (#94641, @krmayankk) [SIG Auth and Node]
The ServiceAccountIssuerDiscovery feature has graduated to GA, and is unconditionally enabled. The ServiceAccountIssuerDiscovery feature-gate will be removed in 1.22. (#98553, @mtaufen) [SIG API Machinery, Auth and Testing]
The TTLAfterFinished feature flag is now beta and enabled by default (#98678, @ahg-g)
The apimachinery util/net function used to detect the bind address ResolveBindAddress() takes into consideration global IP addresses on loopback interfaces when 1) the host has default routes, or 2) there are no global IPs on those interfaces in order to support more complex network scenarios like BGP Unnumbered RFC 5549 (#95790, @aojea) [SIG Network]
The feature gate RootCAConfigMap graduated to GA in v1.21 and therefore will be unconditionally enabled. This flag will be removed in v1.22 release. (#98033, @zshihang)
The pause image upgraded to v3.4.1 in kubelet and kubeadm for both Linux and Windows. (#98205, @pacoxu)
Update pause container to run as pseudo user and group 65535:65535. This implies the release of version 3.5 of the container images. (#97963, @saschagrunert) [SIG CLI, Cloud Provider, Cluster Lifecycle, Node, Release, Security and Testing]
Update the latest validated version of Docker to 20.10 (#98977, @neolit123) [SIG CLI, Cluster Lifecycle and Node]
Upgrade node local dns to 1.17.0 for better IPv6 support (#99749, @pacoxu) [SIG Cloud Provider and Network]
Upgrades IPv6Dualstack to Beta and turns it on by default. New clusters or existing clusters are not be affected until an actor starts adding secondary Pods and service CIDRS CLI flags as described here: IPv4/IPv6 Dual-stack (#98969, @khenidak)
Users might specify the kubectl.kubernetes.io/default-container annotation in a Pod to preselect container for kubectl commands. (#99581, @mengjiao-liu) [SIG CLI]
When downscaling ReplicaSets, ready and creation timestamps are compared in a logarithmic scale. (#99212, @damemi) [SIG Apps and Testing]
When the kubelet is watching a ConfigMap or Secret purely in the context of setting environment variables
for containers, only hold that watch for a defined duration before cancelling it. This change reduces the CPU
and memory usage of the kube-apiserver in large clusters. (#99393, @chenyw1990) [SIG API Machinery, Node and Testing]
WindowsEndpointSliceProxying feature gate has graduated to beta and is enabled by default. This means kube-proxy will read from EndpointSlices instead of Endpoints on Windows by default. (#99794, @robscott) [SIG Network]
kubectl wait ensures that observedGeneration >= generation to prevent stale state reporting. An example scenario can be found on CRD updates. (#97408, @KnicKnic)
Documentation
Azure file migration graduates to beta, with CSIMigrationAzureFile flag off by default
as it requires installation of AzureFile CSI Driver. Users should enable CSIMigration and
CSIMigrationAzureFile features and install the AzureFile CSI Driver
to avoid disruption to existing Pod and PVC objects at that time. Azure File CSI driver does not support using same persistent
volume with different fsgroups. When CSI migration is enabled for azurefile driver, such case is not supported.
(there is a case we support where volume is mounted with 0777 and then it readable/writable by everyone) (#96293, @andyzhangx)
Official support to build kubernetes with docker-machine / remote docker is removed. This change does not affect building kubernetes with docker locally. (#97935, @adeniyistephen) [SIG Release and Testing]
Set kubelet option --volume-stats-agg-period to negative value to disable volume calculations. (#96675, @pacoxu) [SIG Node]
Failing Test
Escape the special characters like [, ] and that exist in vsphere windows path (#98830, @liyanhui1228) [SIG Storage and Windows]
Kube-proxy: fix a bug on UDP NodePort Services where stale connection tracking entries may blackhole the traffic directed to the NodePort (#98305, @aojea)
Kubelet: fixes a bug in the HostPort dockershim implementation that caused the conformance test "HostPort validates that there is no conflict between pods with same hostPort but different hostIP and protocol" to fail. (#98755, @aojea) [SIG Cloud Provider, Network and Node]
Bug or Regression
AcceleratorStats will be available in the Summary API of kubelet when cri_stats_provider is used. (#96873, @ruiwen-zhao) [SIG Node]
All data is no longer automatically deleted when a failure is detected during creation of the volume data file on a CSI volume. Now only the data file and volume path is removed. (#96021, @huffmanca)
Clean ReplicaSet by revision instead of creation timestamp in deployment controller (#97407, @waynepeking348) [SIG Apps]
Cleanup subnet in frontend IP configs to prevent huge subnet request bodies in some scenarios. (#98133, @nilo19) [SIG Cloud Provider]
Client-go exec credential plugins will pass stdin only when interactive terminal is detected on stdin. This fixes a bug where previously it was checking if stdout is an interactive terminal. (#99654, @ankeesler)
Cloud-controller-manager: routes controller should not depend on --allocate-node-cidrs (#97029, @andrewsykim) [SIG Cloud Provider and Testing]
Cluster Autoscaler version bump to v1.20.0 (#97011, @towca)
Creating a PVC with DataSource should fail for non-CSI plugins. (#97086, @xing-yang) [SIG Apps and Storage]
EndpointSlice controller is now less likely to emit FailedToUpdateEndpointSlices events. (#99345, @robscott) [SIG Apps and Network]
EndpointSlice controllers are less likely to create duplicate EndpointSlices. (#100103, @robscott) [SIG Apps and Network]
EndpointSliceMirroring controller is now less likely to emit FailedToUpdateEndpointSlices events. (#99756, @robscott) [SIG Apps and Network]
Ensure all vSphere nodes are are tracked by volume attach-detach controller (#96689, @gnufied)
Ensure empty string annotations are copied over in rollbacks. (#94858, @waynepeking348)
Ensure only one LoadBalancer rule is created when HA mode is enabled (#99825, @feiskyer) [SIG Cloud Provider]
Ensure that client-go's EventBroadcaster is safe (non-racy) during shutdown. (#95664, @DirectXMan12) [SIG API Machinery]
Explicitly pass KUBE_BUILD_CONFORMANCE=y in package-tarballs to reenable building the conformance tarballs. (#100571, @puerco)
Fix Azure file migration e2e test failure when CSIMigration is turned on. (#97877, @andyzhangx)
Fix CSI-migrated inline EBS volumes failing to mount if their volumeID is prefixed by aws:// (#96821, @wongma7) [SIG Storage]
Fix CVE-2020-8555 for Gluster client connections. (#97922, @liggitt) [SIG Storage]
Fix NPE in ephemeral storage eviction (#98261, @wzshiming) [SIG Node]
Fix PermissionDenied issue on SMB mount for Windows (#99550, @andyzhangx)
Fix bug that would let the Horizontal Pod Autoscaler scale down despite at least one metric being unavailable/invalid (#99514, @mikkeloscar) [SIG Apps and Autoscaling]
Fix cgroup handling for systemd with cgroup v2 (#98365, @odinuge) [SIG Node]
Fix counting error in service/nodeport/loadbalancer quota check (#97451, @pacoxu) [SIG API Machinery, Network and Testing]
Fix errors when accessing Windows container stats for Dockershim (#98510, @jsturtevant) [SIG Node and Windows]
Fix kube-proxy container image architecture for non amd64 images. (#98526, @saschagrunert)
Fix nil VMSS name when setting service to auto mode (#97366, @nilo19) [SIG Cloud Provider]
Fix privileged config of Pod Sandbox which was previously ignored. (#96877, @xeniumlee)
Fix the panic when kubelet registers if a node object already exists with no Status.Capacity or Status.Allocatable (#95269, @SataQiu) [SIG Node]
Fix the regression with the slow pods termination. Before this fix pods may take an additional time to terminate - up to one minute. Reversing the change that ensured that CNI resources cleaned up when the pod is removed on API server. (#97980, @SergeyKanzhelev) [SIG Node]
Fix to recover CSI volumes from certain dangling attachments (#96617, @yuga711) [SIG Apps and Storage]
Fix: azure file latency issue for metadata-heavy workloads (#97082, @andyzhangx) [SIG Cloud Provider and Storage]
Fixed FibreChannel volume plugin corrupting filesystems on detach of multipath volumes. (#97013, @jsafrane) [SIG Storage]
Fixed a bug in kubelet that will saturate CPU utilization after containerd got restarted. (#97174, @hanlins) [SIG Node]
Fixed a bug that causes smaller number of conntrack-max being used under CPU static policy. (#99225, @xh4n3) (#99613, @xh4n3) [SIG Network]
Fixed a bug that on k8s nodes, when the policy of INPUT chain in filter table is not ACCEPT, healthcheck nodeport would not work.
Added iptables rules to allow healthcheck nodeport traffic. (#97824, @hanlins) [SIG Network]
Fixed a bug that the kubelet cannot start on BtrfS. (#98042, @gjkim42) [SIG Node]
Fixed a race condition on API server startup ensuring previously created webhook configurations are effective before the first write request is admitted. (#95783, @roycaihw) [SIG API Machinery]
Fixed an issue with garbage collection failing to clean up namespaced children of an object also referenced incorrectly by cluster-scoped children (#98068, @liggitt) [SIG API Machinery and Apps]
Fixed authentication_duration_seconds metric scope. Previously, it included whole apiserver request duration which yields inaccurate results. (#99944, @marseel)
Fixed bug in CPUManager with race on container map access (#97427, @klueska) [SIG Node]
Fixed bug that caused cAdvisor to incorrectly detect single-socket multi-NUMA topology. (#99315, @iwankgb) [SIG Node]
Fixed cleanup of block devices when /var/lib/kubelet is a symlink. (#96889, @jsafrane) [SIG Storage]
Fixed no effect namespace when exposing deployment with --dry-run=client. (#97492, @masap) [SIG CLI]
Fixed provisioning of Cinder volumes migrated to CSI when StorageClass with AllowedTopologies was used. (#98311, @jsafrane) [SIG Storage]
Fixes a bug of identifying the correct containerd process. (#97888, @pacoxu)
Fixes add-on manager leader election to use leases instead of endpoints, similar to what kube-controller-manager does in 1.20 (#98968, @liggitt)
Fixes connection errors when using --volume-host-cidr-denylist or --volume-host-allow-local-loopback (#98436, @liggitt) [SIG Network and Storage]
Fixes problem where invalid selector on PodDisruptionBudget leads to a nil pointer dereference that causes the Controller manager to crash loop. (#98750, @mortent)
Fixes spurious errors about IPv6 in kube-proxy logs on nodes with IPv6 disabled. (#99127, @danwinship)
Fixing a bug where a failed node may not have the NoExecute taint set correctly (#96876, @howieyuen) [SIG Apps and Node]
GCE Internal LoadBalancer sync loop will now release the ILB IP address upon sync failure. An error in ILB forwarding rule creation will no longer leak IP addresses. (#97740, @prameshj) [SIG Cloud Provider and Network]
Ignore update pod with no new images in alwaysPullImages admission controller (#96668, @pacoxu) [SIG Apps, Auth and Node]
Improve speed of vSphere PV provisioning and reduce number of API calls (#100054, @gnufied) [SIG Cloud Provider and Storage]
KUBECTL_EXTERNAL_DIFF now accepts equal sign for additional parameters. (#98158, @dougsland) [SIG CLI]
Kube-apiserver: an update of a pod with a generic ephemeral volume dropped that volume if the feature had been disabled since creating the pod with such a volume (#99446, @pohly) [SIG Apps, Node and Storage]
Kube-proxy: remove deprecated --cleanup-ipvs flag of kube-proxy, and make --cleanup flag always to flush IPVS (#97336, @maaoBit) [SIG Network]
Kubeadm installs etcd v3.4.13 when creating cluster v1.19 (#97244, @pacoxu)
Kubeadm: Fixes a kubeadm upgrade bug that could cause a custom CoreDNS configuration to be replaced with the default. (#97016, @rajansandeep) [SIG Cluster Lifecycle]
Kubeadm: Some text in the kubeadm upgrade plan output has changed. If you have scripts or other automation that parses this output, please review these changes and update your scripts to account for the new output. (#98728, @stmcginnis) [SIG Cluster Lifecycle]
Kubeadm: fix a bug in the host memory detection code on 32bit Linux platforms (#97403, @abelbarrera15) [SIG Cluster Lifecycle]
Kubeadm: fix a bug where "kubeadm join" would not properly handle missing names for existing etcd members. (#97372, @ihgann) [SIG Cluster Lifecycle]
Kubeadm: fix a bug where "kubeadm upgrade" commands can fail if CoreDNS v1.8.0 is installed. (#97919, @neolit123) [SIG Cluster Lifecycle]
Kubeadm: fix a bug where external credentials in an existing admin.conf prevented the CA certificate to be written in the cluster-info ConfigMap. (#98882, @kvaps) [SIG Cluster Lifecycle]
Kubeadm: get k8s CI version markers from k8s infra bucket (#98836, @hasheddan) [SIG Cluster Lifecycle and Release]
Kubeadm: skip validating pod subnet against node-cidr-mask when allocate-node-cidrs is set to be false (#98984, @SataQiu) [SIG Cluster Lifecycle]
Kubectl logs: --ignore-errors is now honored by all containers, maintaining consistency with parallelConsumeRequest behavior. (#97686, @wzshiming)
Kubectl-convert: Fix no kind "Ingress" is registered for version error (#97754, @wzshiming)
Kubectl: Fixed panic when describing an ingress backend without an API Group (#100505, @lauchokyip) [SIG CLI]
Kubelet now cleans up orphaned volume directories automatically (#95301, @lorenz) [SIG Node and Storage]
Kubelet.exe on Windows now checks that the process running as administrator and the executing user account is listed in the built-in administrators group. This is the equivalent to checking the process is running as uid 0. (#96616, @perithompson) [SIG Node and Windows]
Kubelet: Fix kubelet from panic after getting the wrong signal (#98200, @wzshiming) [SIG Node]
Kubelet: Fixed the bug of getting the number of cpu when the number of cpu logical processors is more than 64 in windows (#97378, @hwdef) [SIG Node and Windows]
Limits lease to have 1000 maximum attached objects. (#98257, @lingsamuel)
Mitigate CVE-2020-8555 for kube-up using GCE by preventing local loopback folume hosts. (#97934, @mattcary) [SIG Cloud Provider and Storage]
On single-stack configured (IPv4 or IPv6, but not both) clusters, Services which are both headless (no clusterIP) and selectorless (empty or undefined selector) will report ipFamilyPolicy RequireDualStack and will have entries in ipFamilies[] for both IPv4 and IPv6. This is a change from alpha, but does not have any impact on the manually-specified Endpoints and EndpointSlices for the Service. (#99555, @thockin) [SIG Apps and Network]
Performance regression #97685 has been fixed. (#97860, @MikeSpreitzer) [SIG API Machinery]
Pod Log stats for windows now reports metrics (#99221, @jsturtevant) [SIG Node, Storage, Testing and Windows]
Pod status updates faster when reacting on probe results. The first readiness probe will be called faster when startup probes succeeded, which will make Pod status as ready faster. (#98376, @matthyx)
Readjust kubelet_containers_per_pod_count buckets to only show metrics greater than 1. (#98169, @wawa0210)
Remove CSI topology from migrated in-tree gcepd volume. (#97823, @Jiawei0227) [SIG Cloud Provider and Storage]
Requests with invalid timeout parameters in the request URL now appear in the audit log correctly. (#96901, @tkashem) [SIG API Machinery and Testing]
Resolve a "concurrent map read and map write" crashing error in the kubelet (#95111, @choury) [SIG Node]
Resolves spurious Failed to list *v1.Secret or Failed to list *v1.ConfigMap messages in kubelet logs. (#99538, @liggitt) [SIG Auth and Node]
ResourceQuota of an entity now inclusively calculate Pod overhead (#99600, @gjkim42)
Return zero time (midnight on Jan. 1, 1970) instead of negative number when reporting startedAt and finishedAt of the not started or a running Pod when using dockershim as a runtime. (#99585, @Iceber)
Reverts breaking change to inline AzureFile volumes; referenced secrets are now searched for in the same namespace as the pod as in previous releases. (#100563, @msau42)
Scores from InterPodAffinity have stronger differentiation. (#98096, @leileiwan) [SIG Scheduling]
Specifying the KUBE_TEST_REPO environment variable when e2e tests are executed will instruct the test infrastructure to load that image from a location within the specified repo, using a predefined pattern. (#93510, @smarterclayton) [SIG Testing]
Static pods will be deleted gracefully. (#98103, @gjkim42) [SIG Node]
Sync node status during kubelet node shutdown.
Adds an pod admission handler that rejects new pods when the node is in progress of shutting down. (#98005, @wzshiming) [SIG Node]
The calculation of pod UIDs for static pods has changed to ensure each static pod gets a unique value - this will cause all static pod containers to be recreated/restarted if an in-place kubelet upgrade from 1.20 to 1.21 is performed. Note that draining pods before upgrading the kubelet across minor versions is the supported upgrade path. (#87461, @bboreham) [SIG Node]
The maximum number of ports allowed in EndpointSlices has been increased from 100 to 20,000 (#99795, @robscott) [SIG Network]
Truncates a message if it hits the NoteLengthLimit when the scheduler records an event for the pod that indicates the pod has failed to schedule. (#98715, @carlory)
Updated k8s.gcr.io/ingress-gce-404-server-with-metrics-amd64 to a version that serves /metrics endpoint on a non-default port. (#97621, @vbannai) [SIG Cloud Provider]
Updates the commands `
kubectl kustomize {arg}
kubectl apply -k {arg}
`to use same code as kustomize CLI v4.0.5 (#98946, @monopole)
Use force unmount for NFS volumes if regular mount fails after 1 minute timeout (#96844, @gnufied) [SIG Storage]
Use network.Interface.VirtualMachine.ID to get the binded VM
Skip standalone VM when reconciling LoadBalancer (#97635, @nilo19) [SIG Cloud Provider]
Using exec auth plugins with kubectl no longer results in warnings about constructing many client instances from the same exec auth config. (#97857, @liggitt) [SIG API Machinery and Auth]
When a CNI plugin returns dual-stack pod IPs, kubelet will now try to respect the
"primary IP family" of the cluster by picking a primary pod IP of the same family
as the (primary) node IP, rather than assuming that the CNI plugin returned the IPs
in the order the administrator wanted (since some CNI plugins don't allow
configuring this). (#97979, @danwinship) [SIG Network and Node]
When dynamically provisioning Azure File volumes for a premium account, the requested size will be set to 100GB if the request is initially lower than this value to accommodate Azure File requirements. (#99122, @huffmanca) [SIG Cloud Provider and Storage]
When using Containerd on Windows, the C:\Windows\System32\drivers\etc\hosts file will now be managed by kubelet. (#83730, @claudiubelu)
VolumeBindingArgs now allow BindTimeoutSeconds to be set as zero, while the value zero indicates no waiting for the checking of volume binding operation. (#99835, @chendave) [SIG Scheduling and Storage]
kubectl exec and kubectl attach now honor the --quiet flag which suppresses output from the local binary that could be confused by a script with the remote command output (all non-failure output is hidden). In addition, print inline with exec and attach the list of alternate containers when we default to the first spec.container. (#99004, @smarterclayton) [SIG CLI]
Other (Cleanup or Flake)
APIs for kubelet annotations and labels from k8s.io/kubernetes/pkg/kubelet/apis are now moved under k8s.io/kubelet/pkg/apis/ (#98931, @michaelbeaumont)
Apiserver_request_duration_seconds is promoted to stable status. (#99925, @logicalhan) [SIG API Machinery, Instrumentation and Testing]
Bump github.com/Azure/go-autorest/autorest to v0.11.12 (#97033, @patrickshan) [SIG API Machinery, CLI, Cloud Provider and Cluster Lifecycle]
Clients required to use go1.15.8+ or go1.16+ if kube-apiserver has the goaway feature enabled to avoid unexpected data race condition. (#98809, @answer1991)
Delete deprecated service.beta.kubernetes.io/azure-load-balancer-mixed-protocols mixed procotol annotation in favor of the MixedProtocolLBService feature (#97096, @nilo19) [SIG Cloud Provider]
EndpointSlice generation is now incremented when labels change. (#99750, @robscott) [SIG Network]
Featuregate AllowInsecureBackendProxy graduates to GA and unconditionally enabled. (#99658, @deads2k)
Increase timeout for pod lifecycle test to reach pod status=ready (#96691, @hh)
Increased CSINodeIDMaxLength from 128 bytes to 192 bytes. (#98753, @Jiawei0227)
Kube-apiserver: The OIDC authenticator no longer waits 10 seconds before attempting to fetch the metadata required to verify tokens. (#97693, @enj) [SIG API Machinery and Auth]
Kube-proxy: Traffic from the cluster directed to ExternalIPs is always sent directly to the Service. (#96296, @aojea) [SIG Network and Testing]
Kubeadm: change the default image repository for CI images from 'gcr.io/kubernetes-ci-images' to 'gcr.io/k8s-staging-ci-images' (#97087, @SataQiu) [SIG Cluster Lifecycle]
Kubectl: The deprecated kubectl alpha debug command is removed. Use kubectl debug instead. (#98111, @pandaamanda) [SIG CLI]
Kubelet command line flags related to dockershim are now showing deprecation message as they will be removed along with dockershim in future release. (#98730, @dims)
Official support to build kubernetes with docker-machine / remote docker is removed. This change does not affect building kubernetes with docker locally. (#97618, @jherrera123) [SIG Release and Testing]
Process start time on Windows now uses current process information (#97491, @jsturtevant) [SIG API Machinery, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation and Windows]
Resolves flakes in the Ingress conformance tests due to conflicts with controllers updating the Ingress object (#98430, @liggitt) [SIG Network and Testing]
The AttachVolumeLimit feature gate (GA since v1.17) has been removed and now unconditionally enabled. (#96539, @ialidzhikov)
The CSINodeInfo feature gate that is GA since v1.17 is unconditionally enabled, and can no longer be specified via the --feature-gates argument. (#96561, @ialidzhikov) [SIG Apps, Auth, Scheduling, Storage and Testing]
The apiserver_request_total metric is promoted to stable status and no longer has a content-type dimensions, so any alerts/charts which presume the existence of this will fail. This is however, unlikely to be the case since it was effectively an unbounded dimension in the first place. (#99788, @logicalhan)
The default delegating authorization options now allow unauthenticated access to healthz, readyz, and livez. A system:masters user connecting to an authz delegator will not perform an authz check. (#98325, @deads2k) [SIG API Machinery, Auth, Cloud Provider and Scheduling]
The deprecated feature gates CSIDriverRegistry, BlockVolume and CSIBlockVolume are now unconditionally enabled and can no longer be specified in component invocations. (#98021, @gavinfish) [SIG Storage]
The deprecated feature gates RotateKubeletClientCertificate, AttachVolumeLimit, VolumePVCDataSource and EvenPodsSpread are now unconditionally enabled and can no longer be specified in component invocations. (#97306, @gavinfish) [SIG Node, Scheduling and Storage]
The e2e suite can be instructed not to wait for pods in kube-system to be ready or for all nodes to be ready by passing --allowed-not-ready-nodes=-1 when invoking the e2e.test program. This allows callers to run subsets of the e2e suite in scenarios other than perfectly healthy clusters. (#98781, @smarterclayton) [SIG Testing]
The feature gates WindowsGMSA and WindowsRunAsUserName that are GA since v1.18 are now removed. (#96531, @ialidzhikov) [SIG Node and Windows]
The new -gce-zones flag on the e2e.test binary instructs tests that check for information about how the cluster interacts with the cloud to limit their queries to the provided zone list. If not specified, the current behavior of asking the cloud provider for all available zones in multi zone clusters is preserved. (#98787, @smarterclayton) [SIG API Machinery, Cluster Lifecycle and Testing]
Windows nodes on GCE will take longer to start due to dependencies installed at node creation time. (#98284, @pjh) [SIG Cloud Provider]
apiserver_storage_objects (a newer version of etcd_object_counts) is promoted and marked as stable. (#100082, @logicalhan)
Uncategorized
GCE L4 Loadbalancers now handle > 5 ports in service spec correctly. (#99595, @prameshj) [SIG Cloud Provider]
The DownwardAPIHugePages feature is beta. Users may use the feature if all workers in their cluster are min 1.20 version. The feature will be enabled by default in all installations in 1.22. (#99610, @derekwaynecarr) [SIG Node]
(No, really, you MUST read this before you upgrade)
Migrated pkg/kubelet/cm/cpuset/cpuset.go to structured logging. Exit code changed from 255 to 1. (#100007, @utsavoza) [SIG Instrumentation and Node]
Changes by Kind
API Change
Add Probe-level terminationGracePeriodSeconds field (#99375, @ehashman) [SIG API Machinery, Apps, Node and Testing]
CSIServiceAccountToken is Beta now (#99298, @zshihang) [SIG Auth, Storage and Testing]
Discovery.k8s.io/v1beta1 EndpointSlices are deprecated in favor of discovery.k8s.io/v1, and will no longer be served in Kubernetes v1.25. (#100472, @liggitt) [SIG Network]
FieldManager no longer owns fields that get reset before the object is persisted (e.g. "status wiping"). (#99661, @kevindelgado) [SIG API Machinery, Auth and Testing]
Generic ephemeral volumes are beta. (#99643, @pohly) [SIG API Machinery, Apps, Auth, CLI, Node, Storage and Testing]
Implement the GetAvailableResources in the podresources API. (#95734, @fromanirh) [SIG Instrumentation, Node and Testing]
The Endpoints controller will now set the endpoints.kubernetes.io/over-capacity annotation to "warning" when an Endpoints resource contains more than 1000 addresses. In a future release, the controller will truncate Endpoints that exceed this limit. The EndpointSlice API can be used to support significantly larger number of addresses. (#99975, @robscott) [SIG Apps and Network]
The PodDisruptionBudget API has been promoted to policy/v1 with no schema changes. The only functional change is that an empty selector ({}) written to a policy/v1 PodDisruptionBudget now selects all pods in the namespace. The behavior of the policy/v1beta1 API remains unchanged. The policy/v1beta1 PodDisruptionBudget API is deprecated and will no longer be served in 1.25+. (#99290, @mortent) [SIG API Machinery, Apps, Auth, Autoscaling, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation, Scheduling and Testing]
Topology Aware Hints are now available in alpha and can be enabled with the TopologyAwareHints feature gate. (#99522, @robscott) [SIG API Machinery, Apps, Auth, Instrumentation, Network and Testing]
Feature
Add e2e test to validate performance metrics of volume lifecycle operations (#94334, @RaunakShah) [SIG Storage and Testing]
EmptyDir memory backed volumes are sized as the the minimum of pod allocatable memory on a host and an optional explicit user provided value. (#100319, @derekwaynecarr) [SIG Node]
Enables Kubelet to check volume condition and log events to corresponding pods. (#99284, @fengzixu) [SIG Apps, Instrumentation, Node and Storage]
Introduce a churn operator to scheduler perf testing framework. (#98900, @Huang-Wei) [SIG Scheduling and Testing]
Kubernetes is now built with Golang 1.16.1 (#100106, @justaugustus) [SIG Cloud Provider, Instrumentation, Release and Testing]
Migrated pkg/kubelet/cm/devicemanager to structured logging (#99976, @knabben) [SIG Instrumentation and Node]
Migrated pkg/kubelet/cm/memorymanager to structured logging (#99974, @knabben) [SIG Instrumentation and Node]
Migrated pkg/kubelet/cm/topologymanager to structure logging (#99969, @knabben) [SIG Instrumentation and Node]
Rename metrics etcd_object_counts to apiserver_storage_object_counts and mark it as stable. The original etcd_object_counts metrics name is marked as "Deprecated" and will be removed in the future. (#99785, @erain) [SIG API Machinery, Instrumentation and Testing]
Update pause container to run as pseudo user and group 65535:65535. This implies the release of version 3.5 of the container images. (#97963, @saschagrunert) [SIG CLI, Cloud Provider, Cluster Lifecycle, Node, Release, Security and Testing]
Users might specify the kubectl.kubernetes.io/default-exec-container annotation in a Pod to preselect container for kubectl commands. (#99833, @mengjiao-liu) [SIG CLI]
Bug or Regression
Add ability to skip OpenAPI handler installation to the GenericAPIServer (#100341, @kevindelgado) [SIG API Machinery]
Count pod overhead against an entity's ResourceQuota (#99600, @gjkim42) [SIG API Machinery and Node]
EndpointSlice controllers are less likely to create duplicate EndpointSlices. (#100103, @robscott) [SIG Apps and Network]
Ensure only one LoadBalancer rule is created when HA mode is enabled (#99825, @feiskyer) [SIG Cloud Provider]
Fixed a race condition on API server startup ensuring previously created webhook configurations are effective before the first write request is admitted. (#95783, @roycaihw) [SIG API Machinery]
Fixed authentication_duration_seconds metric. Previously it included whole apiserver request duration. (#99944, @marseel) [SIG API Machinery, Instrumentation and Scalability]
Fixes issue where inline AzueFile secrets could not be accessed from the pod's namespace. (#100563, @msau42) [SIG Storage]
Improve speed of vSphere PV provisioning and reduce number of API calls (#100054, @gnufied) [SIG Cloud Provider and Storage]
Kubectl: Fixed panic when describing an ingress backend without an API Group (#100505, @lauchokyip) [SIG CLI]
Kubectl: fix case of age column in describe node (#96963, @bl-ue) (#96963, @bl-ue) [SIG CLI]
Kubelet.exe on Windows now checks that the process running as administrator and the executing user account is listed in the built-in administrators group. This is the equivalent to checking the process is running as uid 0. (#96616, @perithompson) [SIG Node and Windows]
Kubelet: Fixed the bug of getting the number of cpu when the number of cpu logical processors is more than 64 in windows (#97378, @hwdef) [SIG Node and Windows]
Pass KUBE_BUILD_CONFORMANCE=y to the package-tarballs to reenable building the conformance tarballs. (#100571, @puerco) [SIG Release]
Pod Log stats for windows now reports metrics (#99221, @jsturtevant) [SIG Node, Storage, Testing and Windows]
Other (Cleanup or Flake)
A new storage E2E testsuite covers CSIStorageCapacity publishing if a driver opts into the test. (#100537, @pohly) [SIG Storage and Testing]
Convert cmd/kubelet/app/server.go to structured logging (#98334, @wawa0210) [SIG Node]
If kube-apiserver enabled goaway feature, clients required golang 1.15.8 or 1.16+ version to avoid un-expected data race issue. (#98809, @answer1991) [SIG API Machinery]
Increased CSINodeIDMaxLength from 128 bytes to 192 bytes. (#98753, @Jiawei0227) [SIG Apps and Storage]
Migrate pkg/kubelet/pluginmanager to structured logging (#99885, @qingwave) [SIG Node]
Migrate pkg/kubelet/preemption/preemption.go and pkg/kubelet/logs/container_log_manager.go to structured logging (#99848, @qingwave) [SIG Node]
Migrate pkg/kubelet/(volume,container) to structured logging (#98850, @yangjunmyfm192085) [SIG Node]
Migrate pkg/kubelet/kubelet_node_status.go to structured logging (#98154, @yangjunmyfm192085) [SIG Node and Release]
Migrate pkg/kubelet/lifecycle,oom to structured logging (#99479, @mengjiao-liu) [SIG Instrumentation and Node]
Migrate cmd/kubelet/+ pkg/kubelet/cadvisor/cadvisor_linux.go + pkg/kubelet/cri/remote/util/util_unix.go + pkg/kubelet/images/image_manager.go to structured logging (#99994, @AfrouzMashayekhi) [SIG Instrumentation and Node]
Migrate pkg/kubelet/cm/container_manager_linux.go and pkg/kubelet/cm/container_manager_stub.go to structured logging (#100001, @shiyajuan123) [SIG Instrumentation and Node]
Migrate pkg/kubelet/cm/cpumanage/{topology/togit pology.go, policy_none.go, cpu_assignment.go} to structured logging (#100163, @lala123912) [SIG Instrumentation and Node]
Migrate pkg/kubelet/cm/cpumanager/state to structured logging (#99563, @jmguzik) [SIG Instrumentation and Node]
Migrate pkg/kubelet/config to structured logging (#100002, @AfrouzMashayekhi) [SIG Instrumentation and Node]
Migrate pkg/kubelet/kubelet.go to structured logging (#99861, @navidshaikh) [SIG Instrumentation and Node]
Migrate pkg/kubelet/kubeletconfig to structured logging (#100265, @ehashman) [SIG Node]
Migrate pkg/kubelet/kuberuntime to structured logging (#99970, @krzysiekg) [SIG Instrumentation and Node]
Migrate pkg/kubelet/prober to structured logging (#99830, @krzysiekg) [SIG Instrumentation and Node]
Migrate pkg/kubelet/winstats to structured logging (#99855, @hexxdump) [SIG Instrumentation and Node]
Migrate probe log messages to structured logging (#97093, @aldudko) [SIG Instrumentation and Node]
Migrate remaining kubelet files to structured logging (#100196, @ehashman) [SIG Instrumentation and Node]
apiserver_storage_objects (a newer version of `etcd_object_counts) is promoted and marked as stable. (#100082, @logicalhan) [SIG API Machinery, Instrumentation and Testing]
(No, really, you MUST read this before you upgrade)
Kubeadm: during "init" an empty cgroupDriver value in the KubeletConfiguration is now always set to "systemd" unless the user is explicit about it. This requires existing machine setups to configure the container runtime to use the "systemd" driver. Documentation on this topic can be found here: https://kubernetes.io/docs/setup/production-environment/container-runtimes/. When upgrading existing clusters / nodes using "kubeadm upgrade" the old cgroupDriver value is preserved, but in 1.22 this change will also apply to "upgrade". For more information on migrating to the "systemd" driver or remaining on the "cgroupfs" driver see: https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/configure-cgroup-driver/. (#99471, @neolit123) [SIG Cluster Lifecycle]
Migrate pkg/kubelet/(dockershim, network) to structured logging
Exit code changed from 255 to 1 (#98939, @yangjunmyfm192085) [SIG Network and Node]
Migrate pkg/kubelet/certificate to structured logging
Exit code changed from 255 to 1 (#98993, @SataQiu) [SIG Auth and Node]
Newly provisioned PVs by EBS plugin will no longer use the deprecated "failure-domain.beta.kubernetes.io/zone" and "failure-domain.beta.kubernetes.io/region" labels. It will use "topology.kubernetes.io/zone" and "topology.kubernetes.io/region" labels instead. (#99130, @ayberk) [SIG Cloud Provider, Storage and Testing]
Newly provisioned PVs by OpenStack Cinder plugin will no longer use the deprecated "failure-domain.beta.kubernetes.io/zone" and "failure-domain.beta.kubernetes.io/region" labels. It will use "topology.kubernetes.io/zone" and "topology.kubernetes.io/region" labels instead. (#99719, @jsafrane) [SIG Cloud Provider and Storage]
OpenStack Cinder CSI migration is on by default, Clinder CSI driver must be installed on clusters on OpenStack for Cinder volumes to work. (#98538, @dims) [SIG Storage]
Package pkg/kubelet/server migrated to structured logging
Exit code changed from 255 to 1 (#99838, @adisky) [SIG Node]
Pkg/kubelet/kuberuntime/kuberuntime_manager.go migrated to structured logging
Exit code changed from 255 to 1 (#99841, @adisky) [SIG Instrumentation and Node]
Changes by Kind
Deprecation
Kubeadm: the deprecated kube-dns is no longer supported as an option. If "ClusterConfiguration.dns.type" is set to "kube-dns" kubeadm will now throw an error. (#99646, @rajansandeep) [SIG Cluster Lifecycle]
Remove deprecated --generator --replicas --service-generator --service-overrides --schedule from kubectl run
Deprecate --serviceaccount --hostport --requests --limits in kubectl run (#99732, @soltysh) [SIG CLI and Testing]
audit.k8s.io/v1beta1 and audit.k8s.io/v1alpha1 audit policy configuration and audit events are deprecated in favor of audit.k8s.io/v1, available since v1.13. kube-apiserver invocations that specify alpha or beta policy configurations with --audit-policy-file, or explicitly request alpha or beta audit events with --audit-log-version / --audit-webhook-version must update to use audit.k8s.io/v1 and accept audit.k8s.io/v1 events prior to v1.24. (#98858, @carlory) [SIG Auth]
diskformat stroage class parameter for in-tree vSphere volume plugin is deprecated as of v1.21 release. Please consider updating storageclass and remove diskformat parameter. vSphere CSI Driver does not support diskformat storageclass parameter.
vSphere releases less than 67u3 are deprecated as of v1.21. Please consider upgrading vSphere to 67u3 or above. vSphere CSI Driver requires minimum vSphere 67u3.
VM Hardware version less than 15 is deprecated as of v1.21. Please consider upgrading the Node VM Hardware version to 15 or above. vSphere CSI Driver recommends Node VM's Hardware version set to at least vmx-15.
Multi vCenter support is deprecated as of v1.21. If you have a Kubernetes cluster spanning across multiple vCenter servers, please consider moving all k8s nodes to a single vCenter Server. vSphere CSI Driver does not support Kubernetes deployment spanning across multiple vCenter servers.
Support for these deprecations will be available till Kubernetes v1.24. (#98546, @divyenpatel) [SIG Cloud Provider and Storage]
API Change
PodAffinityTerm includes a namespaceSelector field to allow selecting eligible namespaces based on their labels.
A new CrossNamespacePodAffinity quota scope API that allows restricting which namespaces allowed to use PodAffinityTerm with corss-namespace reference via namespaceSelector or namespaces fields. (#98582, @ahg-g) [SIG API Machinery, Apps, Auth and Testing]
Add a default metadata name labels for selecting any namespace by its name. (#96968, @jayunit100) [SIG API Machinery, Apps, Cloud Provider, Storage and Testing]
Added .spec.completionMode field to Job, with accepted values NonIndexed (default) and Indexed (#98441, @alculquicondor) [SIG Apps and CLI]
DaemonSets accept a MaxSurge integer or percent on their rolling update strategy that will launch the updated pod on nodes and wait for those pods to go ready before marking the old out-of-date pods as deleted. This allows workloads to avoid downtime during upgrades when deployed using DaemonSets. This feature is alpha and is behind the DaemonSetUpdateSurge feature gate. (#96441, @smarterclayton) [SIG Apps and Testing]
EndpointSlice API is now GA. The EndpointSlice topology field has been removed from the GA API and will be replaced by a new per Endpoint Zone field. If the topology field was previously used, it will be converted into an annotation in the v1 Resource. The discovery.k8s.io/v1alpha1 API is removed. (#99662, @swetharepakula) [SIG API Machinery, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation, Network and Testing]
EndpointSlice Controllers are now GA. The EndpointSlice Controller will not populate the deprecatedTopology field and will only provide topology information through the zone and nodeName fields. (#99870, @swetharepakula) [SIG API Machinery, Apps, Auth, Network and Testing]
IngressClass resource can now reference a resource in a specific namespace
for implementation-specific configuration(previously only Cluster-level resources were allowed).
This feature can be enabled using the IngressClassNamespacedParams feature gate. (#99275, @hbagdi) [SIG API Machinery, CLI and Network]
Introduce conditions for PodDisruptionBudget (#98127, @mortent) [SIG API Machinery, Apps, Auth, CLI, Cloud Provider, Cluster Lifecycle and Instrumentation]
Jobs API has a new .spec.suspend field that can be used to suspend and resume Jobs (#98727, @adtac) [SIG API Machinery, Apps, Node, Scheduling and Testing]
Kubelet Graceful Node Shutdown feature is now beta. (#99735, @bobbypage) [SIG Node]
Limit the quest value of hugepage to integer multiple of page size. (#98515, @lala123912) [SIG Apps]
One new field "InternalTrafficPolicy" in Service is added.
It specifies if the cluster internal traffic should be routed to all endpoints or node-local endpoints only.
"Cluster" routes internal traffic to a Service to all endpoints.
"Local" routes traffic to node-local endpoints only, and traffic is dropped if no node-local endpoints are ready.
The default value is "Cluster". (#96600, @maplain) [SIG API Machinery, Apps and Network]
PodSecurityPolicy only stores "generic" as allowed volume type if the GenericEphemeralVolume feature gate is enabled (#98918, @pohly) [SIG Auth and Security]
Promote CronJobs to batch/v1 (#99423, @soltysh) [SIG API Machinery, Apps, CLI and Testing]
Remove support for building Kubernetes with bazel. (#99561, @BenTheElder) [SIG API Machinery, Apps, Architecture, Auth, Autoscaling, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation, Network, Node, Release, Scalability, Scheduling, Storage, Testing and Windows]
Setting loadBalancerClass in load balancer type of service is available with this PR.
Users who want to use a custom load balancer can specify loadBalancerClass to achieve it. (#98277, @XudongLiuHarold) [SIG API Machinery, Apps, Cloud Provider and Network]
Storage capacity tracking (= the CSIStorageCapacity feature) is beta, storage.k8s.io/v1alpha1/VolumeAttachment and storage.k8s.io/v1alpha1/CSIStorageCapacity objects are deprecated (#99641, @pohly) [SIG API Machinery, Apps, Auth, Scheduling, Storage and Testing]
Support for Indexed Job: a Job that is considered completed when Pods associated to indexes from 0 to (.spec.completions-1) have succeeded. (#98812, @alculquicondor) [SIG Apps and CLI]
The apiserver now resets managedFields that got corrupted by a mutating admission controller. (#98074, @kwiesmueller) [SIG API Machinery and Testing]
controller.kubernetes.io/pod-deletion-cost annotation can be set to offer a hint on the cost of deleting a pod compared to other pods belonging to the same ReplicaSet. Pods with lower deletion cost are deleted first. This is an alpha feature. (#99163, @ahg-g) [SIG Apps]
Feature
A client-go metric, rest_client_exec_plugin_call_total, has been added to track total calls to client-go credential plugins. (#98892, @ankeesler) [SIG API Machinery, Auth, Cluster Lifecycle and Instrumentation]
Add --use-protocol-buffers flag to kubectl top pods and nodes (#96655, @serathius) [SIG CLI]
Add support to generate client-side binaries for new darwin/arm64 platform (#97743, @dims) [SIG Release and Testing]
Added ephemeral_volume_controller_create[_failures]_total counters to kube-controller-manager metrics (#99115, @pohly) [SIG API Machinery, Apps, Cluster Lifecycle, Instrumentation and Storage]
Adds alpha feature VolumeCapacityPriority which makes the scheduler prioritize nodes based on the best matching size of statically provisioned PVs across multiple topologies. (#96347, @cofyc) [SIG Apps, Network, Scheduling, Storage and Testing]
Adds two new metrics to cronjobs, a histogram to track the time difference when a job is created and the expected time when it should be created, and a gauge for the missed schedules of a cronjob (#99341, @alaypatel07) [SIG Apps and Instrumentation]
Alpha implementation of Kubectl Command Headers: SIG CLI KEP 859 enabled when KUBECTL_COMMAND_HEADERS environment variable set on the client command line.
To enable: export KUBECTL_COMMAND_HEADERS=1; kubectl ... (#98952, @seans3) [SIG API Machinery and CLI]
Component owner can configure the allowlist of metric label with flag '--allow-metric-labels'. (#99738, @YoyinZyc) [SIG API Machinery, Cluster Lifecycle and Instrumentation]
Disruption controller only sends one event per PodDisruptionBudget if scale can't be computed (#98128, @mortent) [SIG Apps]
EndpointSliceNodeName will always be enabled, so NodeName will always be available in the v1beta1 API. (#99746, @swetharepakula) [SIG Apps and Network]
Graduate CRIContainerLogRotation feature gate to GA. (#99651, @umohnani8) [SIG Node and Testing]
Kube-proxy iptables: new metric sync_proxy_rules_iptables_total that exposes the number of rules programmed per table in each iteration (#99653, @aojea) [SIG Instrumentation and Network]
Kube-scheduler now logs plugin scoring summaries at --v=4 (#99411, @damemi) [SIG Scheduling]
Kubeadm: a warning to user as ipv6 site-local is deprecated (#99574, @pacoxu) [SIG Cluster Lifecycle and Network]
Kubeadm: apply the "node.kubernetes.io/exclude-from-external-load-balancers" label on control plane nodes during "init", "join" and "upgrade" to preserve backwards compatibility with the lagacy LB mode where nodes labeled as "master" where excluded. To opt-out you can remove the label from a node. See #97543 and the linked KEP for more details. (#98269, @neolit123) [SIG Cluster Lifecycle]
Kubeadm: if the user has customized their image repository via the kubeadm configuration, pass the custom pause image repository and tag to the kubelet via --pod-infra-container-image not only for Docker but for all container runtimes. This flag tells the kubelet that it should not garbage collect the image. (#99476, @neolit123) [SIG Cluster Lifecycle]
Kubectl version changed to write a warning message to stderr if the client and server version difference exceeds the supported version skew of +/-1 minor version. (#98250, @brianpursley) [SIG CLI]
Kubernetes is now built with Golang 1.16 (#98572, @justaugustus) [SIG API Machinery, Auth, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation, Node, Release and Testing]
Persistent Volumes formatted with the btrfs filesystem will now automatically resize when expanded. (#99361, @Novex) [SIG Storage]
Remove cAdvisor json metrics api collected by Kubelet (#99236, @pacoxu) [SIG Node]
Sysctls is now GA and locked to default (#99158, @wgahnagl) [SIG Node]
The NodeAffinity plugin implements the PreFilter extension, offering enhanced performance for Filter. (#99213, @AliceZhang2016) [SIG Scheduling]
The endpointslice mirroring controller mirrors endpoints annotations and labels to the generated endpoint slices, it also ensures that updates on any of these fields are mirrored.
The well-known annotation endpoints.kubernetes.io/last-change-trigger-time is skipped and not mirrored. (#98116, @aojea) [SIG Apps, Network and Testing]
Update the latest validated version of Docker to 20.10 (#98977, @neolit123) [SIG CLI, Cluster Lifecycle and Node]
Upgrade node local dns to 1.17.0 for better IPv6 support (#99749, @pacoxu) [SIG Cloud Provider and Network]
Users might specify the kubectl.kubernetes.io/default-exec-container annotation in a Pod to preselect container for kubectl commands. (#99581, @mengjiao-liu) [SIG CLI]
When downscaling ReplicaSets, ready and creation timestamps are compared in a logarithmic scale. (#99212, @damemi) [SIG Apps and Testing]
When the kubelet is watching a ConfigMap or Secret purely in the context of setting environment variables
for containers, only hold that watch for a defined duration before cancelling it. This change reduces the CPU
and memory usage of the kube-apiserver in large clusters. (#99393, @chenyw1990) [SIG API Machinery, Node and Testing]
WindowsEndpointSliceProxying feature gate has graduated to beta and is enabled by default. This means kube-proxy will read from EndpointSlices instead of Endpoints on Windows by default. (#99794, @robscott) [SIG Network]
Bug or Regression
Creating a PVC with DataSource should fail for non-CSI plugins. (#97086, @xing-yang) [SIG Apps and Storage]
EndpointSlice controller is now less likely to emit FailedToUpdateEndpointSlices events. (#99345, @robscott) [SIG Apps and Network]
EndpointSliceMirroring controller is now less likely to emit FailedToUpdateEndpointSlices events. (#99756, @robscott) [SIG Apps and Network]
Fix --ignore-errors does not take effect if multiple logs are printed and unfollowed (#97686, @wzshiming) [SIG CLI]
Fix bug that would let the Horizontal Pod Autoscaler scale down despite at least one metric being unavailable/invalid (#99514, @mikkeloscar) [SIG Apps and Autoscaling]
Fix cgroup handling for systemd with cgroup v2 (#98365, @odinuge) [SIG Node]
Fix smb mount PermissionDenied issue on Windows (#99550, @andyzhangx) [SIG Cloud Provider, Storage and Windows]
Fixed a bug that causes smaller number of conntrack-max being used under CPU static policy. (#99225, @xh4n3) (#99613, @xh4n3) [SIG Network]
Fixed bug that caused cAdvisor to incorrectly detect single-socket multi-NUMA topology. (#99315, @iwankgb) [SIG Node]
Improved update time of pod statuses following new probe results. (#98376, @matthyx) [SIG Node and Testing]
Kube-apiserver: an update of a pod with a generic ephemeral volume dropped that volume if the feature had been disabled since creating the pod with such a volume (#99446, @pohly) [SIG Apps, Node and Storage]
Kubeadm: skip validating pod subnet against node-cidr-mask when allocate-node-cidrs is set to be false (#98984, @SataQiu) [SIG Cluster Lifecycle]
On single-stack configured (IPv4 or IPv6, but not both) clusters, Services which are both headless (no clusterIP) and selectorless (empty or undefined selector) will report ipFamilyPolicy RequireDualStack and will have entries in ipFamilies[] for both IPv4 and IPv6. This is a change from alpha, but does not have any impact on the manually-specified Endpoints and EndpointSlices for the Service. (#99555, @thockin) [SIG Apps and Network]
Resolves spurious Failed to list *v1.Secret or Failed to list *v1.ConfigMap messages in kubelet logs. (#99538, @liggitt) [SIG Auth and Node]
Return zero time (midnight on Jan. 1, 1970) instead of negative number when reporting startedAt and finishedAt of the not started or a running Pod when using dockershim as a runtime. (#99585, @Iceber) [SIG Node]
Stdin is now only passed to client-go exec credential plugins when it is detected to be an interactive terminal. Previously, it was passed to client-go exec plugins when *stdout- was detected to be an interactive terminal. (#99654, @ankeesler) [SIG API Machinery and Auth]
The maximum number of ports allowed in EndpointSlices has been increased from 100 to 20,000 (#99795, @robscott) [SIG Network]
Updates the commands
kubectl kustomize {arg}
kubectl apply -k {arg}
to use same code as kustomize CLI v4.0.5
When a CNI plugin returns dual-stack pod IPs, kubelet will now try to respect the
"primary IP family" of the cluster by picking a primary pod IP of the same family
as the (primary) node IP, rather than assuming that the CNI plugin returned the IPs
in the order the administrator wanted (since some CNI plugins don't allow
configuring this). (#97979, @danwinship) [SIG Network and Node]
When using Containerd on Windows, the "C:\Windows\System32\drivers\etc\hosts" file will now be managed by kubelet. (#83730, @claudiubelu) [SIG Node and Windows]
VolumeBindingArgs now allow BindTimeoutSeconds to be set as zero, while the value zero indicates no waiting for the checking of volume binding operation. (#99835, @chendave) [SIG Scheduling and Storage]
kubectl exec and kubectl attach now honor the --quiet flag which suppresses output from the local binary that could be confused by a script with the remote command output (all non-failure output is hidden). In addition, print inline with exec and attach the list of alternate containers when we default to the first spec.container. (#99004, @smarterclayton) [SIG CLI]
Other (Cleanup or Flake)
Apiserver_request_duration_seconds is promoted to stable status. (#99925, @logicalhan) [SIG API Machinery, Instrumentation and Testing]
Apiserver_request_total is promoted to stable status and no longer has a content-type dimensions, so any alerts/charts which presume the existence of this will fail. This is however, unlikely to be the case since it was effectively an unbounded dimension in the first place. (#99788, @logicalhan) [SIG API Machinery, Instrumentation and Testing]
EndpointSlice generation is now incremented when labels change. (#99750, @robscott) [SIG Network]
Featuregate AllowInsecureBackendProxy is promoted to GA (#99658, @deads2k) [SIG API Machinery]
Migrate pkg/kubelet/cloudresource to structured logging (#98999, @sladyn98) [SIG Node]
Migrate pkg/kubelet/cri/remote logs to structured logging (#98589, @chenyw1990) [SIG Node]
Migrate pkg/kubelet/kuberuntime/kuberuntime_container.go logs to structured logging (#96973, @chenyw1990) [SIG Instrumentation and Node]
Migrate pkg/kubelet/status to structured logging (#99836, @navidshaikh) [SIG Instrumentation and Node]
Migrate pkg/kubelet/token to structured logging (#99264, @palnabarun) [SIG Auth, Instrumentation and Node]
Migrate pkg/kubelet/util to structured logging (#99823, @navidshaikh) [SIG Instrumentation and Node]
Migrate proxy/userspace/proxier.go logs to structured logging (#97837, @JornShen) [SIG Network]
Migrate some kubelet/metrics log messages to structured logging (#98627, @jialaijun) [SIG Instrumentation and Node]
Process start time on Windows now uses current process information (#97491, @jsturtevant) [SIG API Machinery, CLI, Cloud Provider, Cluster Lifecycle, Instrumentation and Windows]
Uncategorized
Migrate pkg/kubelet/stats to structured logging (#99607, @krzysiekg) [SIG Node]
The DownwardAPIHugePages feature is beta. Users may use the feature if all workers in their cluster are min 1.20 version. The feature will be enabled by default in all installations in 1.22. (#99610, @derekwaynecarr) [SIG Node]
(No, really, you MUST read this before you upgrade)
The metric storage_operation_errors_total is not removed, but is marked deprecated, and the metric storage_operation_status_count is marked deprecated. In both cases the storage_operation_duration_seconds metric can be used to recover equivalent counts (using status=fail-unknown in the case of storage_operations_errors_total). (#99045, @mattcary) [SIG Instrumentation and Storage]
Changes by Kind
Deprecation
The batch/v2alpha1 CronJob type definitions and clients are deprecated and removed. (#96987, @soltysh) [SIG API Machinery, Apps, CLI and Testing]
API Change
Cluster admins can now turn off /debug/pprof and /debug/flags/v endpoint in kubelet by setting enableProfilingHandler and enableDebugFlagsHandler to false in their kubelet configuration file. enableProfilingHandler and enableDebugFlagsHandler can be set to true only when enableDebuggingHandlers is also set to true. (#98458, @SaranBalaji90) [SIG Node]
The BoundServiceAccountTokenVolume feature has been promoted to beta, and enabled by default.
This changes the tokens provided to containers at /var/run/secrets/kubernetes.io/serviceaccount/token to be time-limited, auto-refreshed, and invalidated when the containing pod is deleted.
Clients should reload the token from disk periodically (once per minute is recommended) to ensure they continue to use a valid token. k8s.io/client-go version v11.0.0+ and v0.15.0+ reload tokens automatically.
By default, injected tokens are given an extended lifetime so they remain valid even after a new refreshed token is provided. The metric serviceaccount_stale_tokens_total can be used to monitor for workloads that are depending on the extended lifetime and are continuing to use tokens even after a refreshed token is provided to the container. If that metric indicates no existing workloads are depending on extended lifetimes, injected token lifetime can be shortened to 1 hour by starting kube-apiserver with --service-account-extend-token-expiration=false. (#95667, @zshihang) [SIG API Machinery, Auth, Cluster Lifecycle and Testing]
Feature
A new histogram metric to track the time it took to delete a job by the ttl-after-finished controller (#98676, @ahg-g) [SIG Apps and Instrumentation]
AWS cloudprovider supports auto-discovering subnets without any kubernetes.io/cluster/ tags. It also supports additional service annotation service.beta.kubernetes.io/aws-load-balancer-subnets to manually configure the subnets. (#97431, @kishorj) [SIG Cloud Provider]
Add --permit-address-sharing flag to kube-apiserver to listen with SO_REUSEADDR. While allowing to listen on wildcard IPs like 0.0.0.0 and specific IPs in parallel, it avoid waiting for the kernel to release socket in TIME_WAIT state, and hence, considably reducing kube-apiserver restart times under certain conditions. (#93861, @sttts) [SIG API Machinery]
Add csi_operations_seconds metric on kubelet that exposes CSI operations duration and status for node CSI operations. (#98979, @Jiawei0227) [SIG Instrumentation and Storage]
Add migrated field into storage_operation_duration_seconds metric (#99050, @Jiawei0227) [SIG Apps, Instrumentation and Storage]
Add bash-completion for comma separated list on kubectl get (#98301, @phil9909) [SIG CLI]
Added support for installing arm64 node artifacts. (#99242, @liu-cong) [SIG Cloud Provider]
Feature gate RootCAConfigMap is graduated to GA in 1.21 and will be removed in 1.22. (#98033, @zshihang) [SIG API Machinery and Auth]
Kubeadm: during "init" and "join" perform preflight validation on the host / node name and throw warnings if a name is not compliant (#99194, @pacoxu) [SIG Cluster Lifecycle]
Kubectl: kubectl get will omit managed fields by default now. Users could set --show-managed-fields to true to show managedFields when the output format is either json or yaml. (#96878, @knight42) [SIG CLI and Testing]
Metrics can now be disabled explicitly via a command line flag (i.e. '--disabled-metrics=bad_metric1,bad_metric2') (#99217, @logicalhan) [SIG API Machinery, Cluster Lifecycle and Instrumentation]
TTLAfterFinished is now beta and enabled by default (#98678, @ahg-g) [SIG Apps and Auth]
The RunAsGroup feature has been promoted to GA in this release. (#94641, @krmayankk) [SIG Auth and Node]
Turn CronJobControllerV2 on by default. (#98878, @soltysh) [SIG Apps]
UDP protocol support for Agnhost connect subcommand (#98639, @knabben) [SIG Testing]
Fix ALPHA stability level reference link (#98641, @Jeffwan) [SIG Auth, Cloud Provider, Instrumentation and Storage]
Failing Test
Escape the special characters like [, ] and that exist in vsphere windows path (#98830, @liyanhui1228) [SIG Storage and Windows]
Kube-proxy: fix a bug on UDP NodePort Services where stale conntrack entries may blackhole the traffic directed to the NodePort. (#98305, @aojea) [SIG Network]
Bug or Regression
Add missing --kube-api-content-type in kubemark hollow template (#98911, @Jeffwan) [SIG Scalability and Testing]
Avoid duplicate error messages when running kubectl edit quota (#98201, @pacoxu) [SIG API Machinery and Apps]
Cleanup subnet in frontend IP configs to prevent huge subnet request bodies in some scenarios. (#98133, @nilo19) [SIG Cloud Provider]
Fix errors when accessing Windows container stats for Dockershim (#98510, @jsturtevant) [SIG Node and Windows]
Fixes spurious errors about IPv6 in kube-proxy logs on nodes with IPv6 disabled. (#99127, @danwinship) [SIG Network and Node]
In the method that ensures that the docker and containerd are in the correct containers with the proper OOM score set up, fixed the bug of identifying containerd process. (#97888, @pacoxu) [SIG Node]
Kubelet now cleans up orphaned volume directories automatically (#95301, @lorenz) [SIG Node and Storage]
When dynamically provisioning Azure File volumes for a premium account, the requested size will be set to 100GB if the request is initially lower than this value to accommodate Azure File requirements. (#99122, @huffmanca) [SIG Cloud Provider and Storage]
Other (Cleanup or Flake)
APIs for kubelet annotations and labels from k8s.io/kubernetes/pkg/kubelet/apis are now available under k8s.io/kubelet/pkg/apis/ (#98931, @michaelbeaumont) [SIG Apps, Auth and Node]
Migrate pkg/kubelet/(pod, pleg) to structured logging (#98990, @gjkim42) [SIG Instrumentation and Node]
Migrate pkg/kubelet/nodestatus to structured logging (#99001, @QiWang19) [SIG Node]
Migrate pkg/kubelet/server logs to structured logging (#98643, @chenyw1990) [SIG Node]
Migrate proxy/winkernel/proxier.go logs to structured logging (#98001, @JornShen) [SIG Network and Windows]
Migrate scheduling_queue.go to structured logging (#98358, @tanjing2020) [SIG Scheduling]
Several flags related to the deprecated dockershim which are present in the kubelet command line are now deprecated. (#98730, @dims) [SIG Node]
The deprecated feature gates CSIDriverRegistry, BlockVolume and CSIBlockVolume are now unconditionally enabled and can no longer be specified in component invocations. (#98021, @gavinfish) [SIG Storage]
(No, really, you MUST read this before you upgrade)
Newly provisioned PVs by gce-pd will no longer have the beta FailureDomain label. gce-pd volume plugin will start to have GA topology label instead. (#98700, @Jiawei0227) [SIG Cloud Provider, Storage and Testing]
Remove alpha CSIMigrationXXComplete flag and add alpha InTreePluginXXUnregister flag. Deprecate CSIMigrationvSphereComplete flag and it will be removed in 1.22. (#98243, @Jiawei0227) [SIG Node and Storage]
Changes by Kind
API Change
Adds support for portRange / EndPort in Network Policy (#97058, @rikatz) [SIG Apps and Network]
Fixes using server-side apply with APIService resources (#98576, @kevindelgado) [SIG API Machinery, Apps and Testing]
Kubernetes is now built using go1.15.7 (#98363, @cpanato) [SIG Cloud Provider, Instrumentation, Node, Release and Testing]
Scheduler extender filter interface now can report unresolvable failed nodes in the new field FailedAndUnresolvableNodes of ExtenderFilterResult struct. Nodes in this map will be skipped in the preemption phase. (#92866, @cofyc) [SIG Scheduling]
Feature
A lease can only attach up to 10k objects. (#98257, @lingsamuel) [SIG API Machinery]
Add ignore-errors flag for drain, support none-break drain in group (#98203, @yuzhiquan) [SIG CLI]
Base-images: Update to debian-iptables:buster-v1.4.0
Export NewDebuggingRoundTripper function and DebugLevel options in the k8s.io/client-go/transport package. (#98324, @atosatto) [SIG API Machinery]
Kubectl wait ensures that observedGeneration >= generation if applicable (#97408, @KnicKnic) [SIG CLI]
Kubernetes is now built using go1.15.8 (#98834, @cpanato) [SIG Cloud Provider, Instrumentation, Release and Testing]
New admission controller "denyserviceexternalips" is available. Clusters which do not *need- the Service "externalIPs" feature should enable this controller and be more secure. (#97395, @thockin) [SIG API Machinery]
Overall, enable the feature of PreferNominatedNode will improve the performance of scheduling where preemption might frequently happen, but in theory, enable the feature of PreferNominatedNode, the pod might not be scheduled to the best candidate node in the cluster. (#93179, @chendave) [SIG Scheduling and Testing]
Pause image upgraded to 3.4.1 in kubelet and kubeadm for both Linux and Windows. (#98205, @pacoxu) [SIG CLI, Cloud Provider, Cluster Lifecycle, Node, Testing and Windows]
The ServiceAccountIssuerDiscovery feature has graduated to GA, and is unconditionally enabled. The ServiceAccountIssuerDiscovery feature-gate will be removed in 1.22. (#98553, @mtaufen) [SIG API Machinery, Auth and Testing]
Documentation
Feat: azure file migration go beta in 1.21. Feature gates CSIMigration to Beta (on by default) and CSIMigrationAzureFile to Beta (off by default since it requires installation of the AzureFile CSI Driver)
The in-tree AzureFile plugin "kubernetes.io/azure-file" is now deprecated and will be removed in 1.23. Users should enable CSIMigration + CSIMigrationAzureFile features and install the AzureFile CSI Driver (https://github.com/kubernetes-sigs/azurefile-csi-driver) to avoid disruption to existing Pod and PVC objects at that time.
Users should start using the AzureFile CSI Driver directly for any new volumes. (#96293, @andyzhangx) [SIG Cloud Provider]
Failing Test
Kubelet: the HostPort implementation in dockershim was not taking into consideration the HostIP field, causing that the same HostPort can not be used with different IP addresses.
This bug causes the conformance test "HostPort validates that there is no conflict between pods with same hostPort but different hostIP and protocol" to fail. (#98755, @aojea) [SIG Cloud Provider, Network and Node]
Bug or Regression
Fix NPE in ephemeral storage eviction (#98261, @wzshiming) [SIG Node]
Fixed a bug that on k8s nodes, when the policy of INPUT chain in filter table is not ACCEPT, healthcheck nodeport would not work.
Added iptables rules to allow healthcheck nodeport traffic. (#97824, @hanlins) [SIG Network]
Fixed kube-proxy container image architecture for non amd64 images. (#98526, @saschagrunert) [SIG API Machinery, Release and Testing]
Fixed provisioning of Cinder volumes migrated to CSI when StorageClass with AllowedTopologies was used. (#98311, @jsafrane) [SIG Storage]
Fixes a panic in the disruption budget controller for PDB objects with invalid selectors (#98750, @mortent) [SIG Apps]
Fixes connection errors when using --volume-host-cidr-denylist or --volume-host-allow-local-loopback (#98436, @liggitt) [SIG Network and Storage]
If the user specifies an invalid timeout in the request URL, the request will be aborted with an HTTP 400.
in cases where the client specifies a timeout in the request URL, the overall request deadline is shortened now since the deadline is setup as soon as the request is received by the apiserver. (#96901, @tkashem) [SIG API Machinery and Testing]
Kubeadm: Some text in the kubeadm upgrade plan output has changed. If you have scripts or other automation that parses this output, please review these changes and update your scripts to account for the new output. (#98728, @stmcginnis) [SIG Cluster Lifecycle]
Kubeadm: fix a bug where external credentials in an existing admin.conf prevented the CA certificate to be written in the cluster-info ConfigMap. (#98882, @kvaps) [SIG Cluster Lifecycle]
Kubeadm: fix bad token placeholder text in "config print *-defaults --help" (#98839, @Mattias-) [SIG Cluster Lifecycle]
Kubeadm: get k8s CI version markers from k8s infra bucket (#98836, @hasheddan) [SIG Cluster Lifecycle and Release]
Mitigate CVE-2020-8555 for kube-up using GCE by preventing local loopback folume hosts. (#97934, @mattcary) [SIG Cloud Provider and Storage]
Remove CSI topology from migrated in-tree gcepd volume. (#97823, @Jiawei0227) [SIG Cloud Provider and Storage]
Sync node status during kubelet node shutdown.
Adds an pod admission handler that rejects new pods when the node is in progress of shutting down. (#98005, @wzshiming) [SIG Node]
Truncates a message if it hits the NoteLengthLimit when the scheduler records an event for the pod that indicates the pod has failed to schedule. (#98715, @carlory) [SIG Scheduling]
We will no longer automatically delete all data when a failure is detected during creation of the volume data file on a CSI volume. Now we will only remove the data file and volume path. (#96021, @huffmanca) [SIG Storage]
Other (Cleanup or Flake)
Fix the description of command line flags that can override --config (#98254, @changshuchao) [SIG Scheduling]
Migrate staging/src/k8s.io/apiserver/pkg/admission logs to structured logging (#98138, @lala123912) [SIG API Machinery]
Resolves flakes in the Ingress conformance tests due to conflicts with controllers updating the Ingress object (#98430, @liggitt) [SIG Network and Testing]
The default delegating authorization options now allow unauthenticated access to healthz, readyz, and livez. A system:masters user connecting to an authz delegator will not perform an authz check. (#98325, @deads2k) [SIG API Machinery, Auth, Cloud Provider and Scheduling]
The e2e suite can be instructed not to wait for pods in kube-system to be ready or for all nodes to be ready by passing --allowed-not-ready-nodes=-1 when invoking the e2e.test program. This allows callers to run subsets of the e2e suite in scenarios other than perfectly healthy clusters. (#98781, @smarterclayton) [SIG Testing]
The feature gates WindowsGMSA and WindowsRunAsUserName that are GA since v1.18 are now removed. (#96531, @ialidzhikov) [SIG Node and Windows]
The new -gce-zones flag on the e2e.test binary instructs tests that check for information about how the cluster interacts with the cloud to limit their queries to the provided zone list. If not specified, the current behavior of asking the cloud provider for all available zones in multi zone clusters is preserved. (#98787, @smarterclayton) [SIG API Machinery, Cluster Lifecycle and Testing]
(No, really, you MUST read this before you upgrade)
Remove storage metrics storage_operation_errors_total, since we already have storage_operation_status_count.And add new field status for storage_operation_duration_seconds, so that we can know about all status storage operation latency. (#98332, @JornShen) [SIG Instrumentation and Storage]
Changes by Kind
Deprecation
Remove the TokenRequest and TokenRequestProjection feature gates (#97148, @wawa0210) [SIG Node]
Removing experimental windows container hyper-v support with Docker (#97141, @wawa0210) [SIG Node and Windows]
The export query parameter (inconsistently supported by API resources and deprecated in v1.14) is fully removed. Requests setting this query parameter will now receive a 400 status response. (#98312, @deads2k) [SIG API Machinery, Auth and Testing]
API Change
Enable SPDY pings to keep connections alive, so that kubectl exec and kubectl port-forward won't be interrupted. (#97083, @knight42) [SIG API Machinery and CLI]
Documentation
Official support to build kubernetes with docker-machine / remote docker is removed. This change does not affect building kubernetes with docker locally. (#97935, @adeniyistephen) [SIG Release and Testing]
Set kubelet option --volume-stats-agg-period to negative value to disable volume calculations. (#96675, @pacoxu) [SIG Node]
Bug or Regression
Clean ReplicaSet by revision instead of creation timestamp in deployment controller (#97407, @waynepeking348) [SIG Apps]
Ensure that client-go's EventBroadcaster is safe (non-racy) during shutdown. (#95664, @DirectXMan12) [SIG API Machinery]
Fix kubelet from panic after getting the wrong signal (#98200, @wzshiming) [SIG Node]
Fix repeatedly acquire the inhibit lock (#98088, @wzshiming) [SIG Node]
Fixed a bug that the kubelet cannot start on BtrfS. (#98042, @gjkim42) [SIG Node]
Fixed an issue with garbage collection failing to clean up namespaced children of an object also referenced incorrectly by cluster-scoped children (#98068, @liggitt) [SIG API Machinery and Apps]
Fixed no effect namespace when exposing deployment with --dry-run=client. (#97492, @masap) [SIG CLI]
Fixing a bug where a failed node may not have the NoExecute taint set correctly (#96876, @howieyuen) [SIG Apps and Node]
Indentation of Resource Quota block in kubectl describe namespaces output gets correct. (#97946, @dty1er) [SIG CLI]
KUBECTL_EXTERNAL_DIFF now accepts equal sign for additional parameters. (#98158, @dougsland) [SIG CLI]
Kubeadm: fix a bug where "kubeadm join" would not properly handle missing names for existing etcd members. (#97372, @ihgann) [SIG Cluster Lifecycle]
Kubelet should ignore cgroup driver check on Windows node. (#97764, @pacoxu) [SIG Node and Windows]
Make podTopologyHints protected by lock (#95111, @choury) [SIG Node]
Readjust kubelet_containers_per_pod_count bucket (#98169, @wawa0210) [SIG Instrumentation and Node]
Scores from InterPodAffinity have stronger differentiation. (#98096, @leileiwan) [SIG Scheduling]
Specifying the KUBE_TEST_REPO environment variable when e2e tests are executed will instruct the test infrastructure to load that image from a location within the specified repo, using a predefined pattern. (#93510, @smarterclayton) [SIG Testing]
Static pods will be deleted gracefully. (#98103, @gjkim42) [SIG Node]
Use network.Interface.VirtualMachine.ID to get the binded VM
Skip standalone VM when reconciling LoadBalancer (#97635, @nilo19) [SIG Cloud Provider]
Other (Cleanup or Flake)
Kubeadm: change the default image repository for CI images from 'gcr.io/kubernetes-ci-images' to 'gcr.io/k8s-staging-ci-images' (#97087, @SataQiu) [SIG Cluster Lifecycle]
Migrate generic_scheduler.go and types.go to structured logging. (#98134, @tanjing2020) [SIG Scheduling]
Migrate proxy/winuserspace/proxier.go logs to structured logging (#97941, @JornShen) [SIG Network]
Migrate staging/src/k8s.io/apiserver/pkg/audit/policy/reader.go logs to structured logging. (#98252, @lala123912) [SIG API Machinery and Auth]
Migrate staging\src\k8s.io\apiserver\pkg\endpoints logs to structured logging (#98093, @lala123912) [SIG API Machinery]
Node (#96552, @pandaamanda) [SIG Apps, Cloud Provider, Node and Scheduling]
The kubectl alpha debug command was scheduled to be removed in v1.21. (#98111, @pandaamanda) [SIG CLI]
(No, really, you MUST read this before you upgrade)
Kube-proxy's IPVS proxy mode no longer sets the net.ipv4.conf.all.route_localnet sysctl parameter. Nodes upgrading will have net.ipv4.conf.all.route_localnet set to 1 but new nodes will inherit the system default (usually 0). If you relied on any behavior requiring net.ipv4.conf.all.route_localnet, you must set ensure it is enabled as kube-proxy will no longer set it automatically. This change helps to further mitigate CVE-2020-8558. (#92938, @lbernail) [SIG Network and Release]
Changes by Kind
Deprecation
Deprecate the topologyKeys field in Service. This capability will be replaced with upcoming work around Topology Aware Subsetting and Service Internal Traffic Policy. (#96736, @andrewsykim) [SIG Apps]
Kubeadm: graduate the command kubeadm alpha kubeconfig user to kubeadm kubeconfig user. The kubeadm alpha kubeconfig user command is deprecated now. (#97583, @knight42) [SIG Cluster Lifecycle]
Kubeadm: the "kubeadm alpha certs" command is removed now, please use "kubeadm certs" instead. (#97706, @knight42) [SIG Cluster Lifecycle]
Remove the deprecated metrics "scheduling_algorithm_preemption_evaluation_seconds" and "binding_duration_seconds", suggest to use "scheduler_framework_extension_point_duration_seconds" instead. (#96447, @chendave) [SIG Cluster Lifecycle, Instrumentation, Scheduling and Testing]
The PodSecurityPolicy API is deprecated in 1.21, and will no longer be served starting in 1.25. (#97171, @deads2k) [SIG Auth and CLI]
API Change
Change the APIVersion proto name of BoundObjectRef from aPIVersion to apiVersion. (#97379, @kebe7jun) [SIG Auth]
Promote Immutable Secrets/ConfigMaps feature to Stable.
This allows to set Immutable field in Secrets or ConfigMap object to mark their contents as immutable. (#97615, @wojtek-t) [SIG Apps, Architecture, Node and Testing]
Feature
Add flag --lease-max-object-size and metric etcd_lease_object_counts for kube-apiserver to config and observe max objects attached to a single etcd lease. (#97480, @lingsamuel) [SIG API Machinery, Instrumentation and Scalability]
Add flag --lease-reuse-duration-seconds for kube-apiserver to config etcd lease reuse duration. (#97009, @lingsamuel) [SIG API Machinery and Scalability]
Adds the ability to pass --strict-transport-security-directives to the kube-apiserver to set the HSTS header appropriately. Be sure you understand the consequences to browsers before setting this field. (#96502, @249043822) [SIG Auth]
Kubeadm now includes CoreDNS v1.8.0. (#96429, @rajansandeep) [SIG Cluster Lifecycle]
Kubeadm: add support for certificate chain validation. When using kubeadm in external CA mode, this allows an intermediate CA to be used to sign the certificates. The intermediate CA certificate must be appended to each signed certificate for this to work correctly. (#97266, @robbiemcmichael) [SIG Cluster Lifecycle]
Kubeadm: amend the node kernel validation to treat CGROUP_PIDS, FAIR_GROUP_SCHED as required and CFS_BANDWIDTH, CGROUP_HUGETLB as optional (#96378, @neolit123) [SIG Cluster Lifecycle and Node]
The Kubernetes pause image manifest list now contains an image for Windows Server 20H2. (#97322, @claudiubelu) [SIG Windows]
The apimachinery util/net function used to detect the bind address ResolveBindAddress()
takes into consideration global ip addresses on loopback interfaces when:
- the host has default routes
- there are no global IPs on those interfaces.
in order to support more complex network scenarios like BGP Unnumbered RFC 5549 (#95790, @aojea) [SIG Network]
Bug or Regression
Changelog
General
Fix priority expander falling back to a random choice even though there is a higher priority option to choose
Clone kubernetes/kubernetes in update-vendor.sh shallowly, instead of fetching all revisions
Speed up binpacking by reducing the number of PreFilter calls (call once per pod instead of #pods*#nodes times)
Speed up finding unneeded nodes by 5x+ in very large clusters by reducing the number of PreFilter calls
Expose --max-nodes-total as a metric
Errors in IncreaseSize changed from type apiError to cloudProviderError
Make build-in-docker and test-in-docker work on Linux systems with SELinux enabled
Fix an error where existing nodes were not considered as destinations while finding place for pods in scale-down simulations
Remove redundant log lines and reduce severity around parsing kubeEnv
Don't treat nodes created by virtual kubelet as nodes from non-autoscaled node groups
Remove redundant logging around calculating node utilization
Add configurable --network and --rm flags for docker in Makefile
Subtract DaemonSet pods' requests from node allocatable in the denominator while computing node utilization
Include taints by condition when determining if a node is unready/still starting
Fix update-vendor.sh to work on OSX and zsh
Add best-effort eviction for DaemonSet pods while scaling down non-empty nodes
Add build support for ARM64
AliCloud
Add missing daemonsets and replicasets to ALI example cluster role
Apache CloudStack
Add support for Apache CloudStack
AWS
Regenerate list of EC2 instances
Fix pricing endpoint in AWS China Region
Azure
Add optional jitter on initial VMSS VM cache refresh, keep the refreshes spread over time
Serve from cache for the whole period of ongoing throttling
Fix unwanted VMSS VMs cache invalidations
Enforce setting the number of retries if cloud provider backoff is enabled
Don't update capacity if VMSS provisioning state is updating
Support allocatable resources overrides via VMSS tags
Add missing stable labels in template nodes
Proactively set instance status to deleting on node deletions
Cluster API
Migrate interaction with the API from using internal types to using Unstructured
Improve tests to work better with constrained resources
Add support for node autodiscovery
Add support for --cloud-config
Update group identifier to use for Cluster API annotations
Exoscale
Add support for Exoscale
GCE
Decrease the number of GCE Read Requests made while deleting nodes
Base pricing of custom instances on their instance family type
Add pricing information for missing machine types
Add pricing information for different GPU types
Ignore the new topology.gke.io/zone label when comparing groups
Add missing stable labels to template nodes
HuaweiCloud
Add auto scaling group support
Implement node group by AS
Implement getting desired instance number of node group
Implement increasing node group size
Implement TemplateNodeInfo
Implement caching instances
IONOS
Add support for IONOS
Kubemark
Skip non-kubemark nodes while computing node infos for node groups.
Magnum
Add Magnum support in the Cluster Autoscaler helm chart
Fix nil VMSS name when setting service to auto mode (#97366, @nilo19) [SIG Cloud Provider]
Fix the panic when kubelet registers if a node object already exists with no Status.Capacity or Status.Allocatable (#95269, @SataQiu) [SIG Node]
Fix the regression with the slow pods termination. Before this fix pods may take an additional time to terminate - up to one minute. Reversing the change that ensured that CNI resources cleaned up when the pod is removed on API server. (#97980, @SergeyKanzhelev) [SIG Node]
Fix to recover CSI volumes from certain dangling attachments (#96617, @yuga711) [SIG Apps and Storage]
Fix: azure file latency issue for metadata-heavy workloads (#97082, @andyzhangx) [SIG Cloud Provider and Storage]
Fixed FibreChannel volume plugin corrupting filesystems on detach of multipath volumes. (#97013, @jsafrane) [SIG Storage]
Fixed a bug in kubelet that will saturate CPU utilization after containerd got restarted. (#97174, @hanlins) [SIG Node]
Fixed bug in CPUManager with race on container map access (#97427, @klueska) [SIG Node]
Fixed cleanup of block devices when /var/lib/kubelet is a symlink. (#96889, @jsafrane) [SIG Storage]
GCE Internal LoadBalancer sync loop will now release the ILB IP address upon sync failure. An error in ILB forwarding rule creation will no longer leak IP addresses. (#97740, @prameshj) [SIG Cloud Provider and Network]
Ignore update pod with no new images in alwaysPullImages admission controller (#96668, @pacoxu) [SIG Apps, Auth and Node]
Kubeadm now installs version 3.4.13 of etcd when creating a cluster with v1.19 (#97244, @pacoxu) [SIG Cluster Lifecycle]
Kubeadm: avoid detection of the container runtime for commands that do not need it (#97625, @pacoxu) [SIG Cluster Lifecycle]
Kubeadm: fix a bug in the host memory detection code on 32bit Linux platforms (#97403, @abelbarrera15) [SIG Cluster Lifecycle]
Kubeadm: fix a bug where "kubeadm upgrade" commands can fail if CoreDNS v1.8.0 is installed. (#97919, @neolit123) [SIG Cluster Lifecycle]
Remove deprecated --cleanup-ipvs flag of kube-proxy, and make --cleanup flag always to flush IPVS (#97336, @maaoBit) [SIG Network]
The current version of the container image publicly exposed IP serving a /metrics endpoint to the Internet. The new version of the container image serves /metrics endpoint on a different port. (#97621, @vbannai) [SIG Cloud Provider]
Use force unmount for NFS volumes if regular mount fails after 1 minute timeout (#96844, @gnufied) [SIG Storage]
Users will see increase in time for deletion of pods and also guarantee that removal of pod from api server would mean deletion of all the resources from container runtime. (#92817, @kmala) [SIG Node]
Using exec auth plugins with kubectl no longer results in warnings about constructing many client instances from the same exec auth config. (#97857, @liggitt) [SIG API Machinery and Auth]
Warning about using a deprecated volume plugin is logged only once. (#96751, @jsafrane) [SIG Storage]
Other (Cleanup or Flake)
Bump github.com/Azure/go-autorest/autorest to v0.11.12 (#97033, @patrickshan) [SIG API Machinery, CLI, Cloud Provider and Cluster Lifecycle]
Kube-proxy: Traffic from the cluster directed to ExternalIPs is always sent directly to the Service. (#96296, @aojea) [SIG Network and Testing]
Kubeadm: fix a whitespace issue in the output of the "kubeadm join" command shown as the output of "kubeadm init" and "kubeadm token create --print-join-command" (#97413, @SataQiu) [SIG Cluster Lifecycle]
Kubeadm: improve the error messaging when the user provides an invalid discovery token CA certificate hash. (#97290, @neolit123) [SIG Cluster Lifecycle]
Migrate log messages in pkg/scheduler/{scheduler.go,factory.go} to structured logging (#97509, @aldudko) [SIG Scheduling]
Migrate proxy/iptables/proxier.go logs to structured logging (#97678, @JornShen) [SIG Network]
Migrate some scheduler log messages to structured logging (#97349, @aldudko) [SIG Scheduling]
NetworkPolicy validation framework optimizations for rapidly verifying CNI's work correctly across several pods and namespaces (#91592, @jayunit100) [SIG Network, Storage and Testing]
Official support to build kubernetes with docker-machine / remote docker is removed. This change does not affect building kubernetes with docker locally. (#97618, @jherrera123) [SIG Release and Testing]
Scheduler plugin validation now provides all errors detected instead of the first one. (#96745, @lingsamuel) [SIG Node, Scheduling and Testing]
Storage related e2e testsuite redesign & cleanup (#96573, @Jiawei0227) [SIG Storage and Testing]
The OIDC authenticator no longer waits 10 seconds before attempting to fetch the metadata required to verify tokens. (#97693, @enj) [SIG API Machinery and Auth]
The AttachVolumeLimit feature gate that is GA since v1.17 is now removed. (#96539, @ialidzhikov) [SIG Storage]
The CSINodeInfo feature gate that is GA since v1.17 is unconditionally enabled, and can no longer be specified via the --feature-gates argument. (#96561, @ialidzhikov) [SIG Apps, Auth, Scheduling, Storage and Testing]
The deprecated feature gates RotateKubeletClientCertificate, AttachVolumeLimit, VolumePVCDataSource and EvenPodsSpread are now unconditionally enabled and can no longer be specified in component invocations. (#97306, @gavinfish) [SIG Node, Scheduling and Storage]
ServiceNodeExclusion, NodeDisruptionExclusion and LegacyNodeRoleBehavior(locked to false) features have been promoted to GA.
To prevent control plane nodes being added to load balancers automatically, upgrade users need to add "node.kubernetes.io/exclude-from-external-load-balancers" label to control plane nodes. (#97543, @pacoxu) [SIG API Machinery, Apps, Cloud Provider and Network]
Uncategorized
Adding Brazilian Portuguese translation for kubectl (#61595, @cpanato) [SIG CLI]
error: failed to run Kubelet: failed to create kubelet:
misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"
# kubectl get pods
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
在某些情况下 kubectl logs 和 kubectl run 命令或许会返回以下错误,即便除此之外集群一切功能正常:
Error from server: Get https://10.19.0.41:10250/containerLogs/default/mysql-ddc65b868-glc5m/mysql: dial tcp 10.19.0.41:10250: getsockopt: no route to host
这或许是由于 Kubernetes 使用的 IP 无法与看似相同的子网上的其他 IP 进行通信的缘故,
可能是由机器提供商的政策所导致的。
Digital Ocean 既分配一个共有 IP 给 eth0,也分配一个私有 IP 在内部用作其浮动 IP 功能的锚点,
然而 kubelet 将选择后者作为节点的 InternalIP 而不是公共 IP
使用 ip addr show 命令代替 ifconfig 命令去检查这种情况,因为 ifconfig 命令
不会显示有问题的别名 IP 地址。或者指定的 Digital Ocean 的 API 端口允许从 droplet 中
查询 anchor IP:
在云环境场景中,可能出现在云控制管理器完成节点地址初始化之前,kube-proxy 就被调度到新节点了。
这会导致 kube-proxy 无法正确获取节点的 IP 地址,并对管理负载平衡器的代理功能产生连锁反应。
在 kube-proxy Pod 中可以看到以下错误:
server.go:610] Failed to retrieve node IP: host IP unknown; known addresses: []
proxier.go:340] invalid nodeIP, initializing kube-proxy with 127.0.0.1 as nodeIP
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a Pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
/docs/concepts/cluster-administration/addons/
You can now join any number of machines by running the following on each node
as root:
kubeadm join <control-plane-host>:<control-plane-port> --token <token> --discovery-token-ca-cert-hash sha256:<hash>
[preflight] Running pre-flight checks
... (log output of join workflow) ...
Node join complete:
* Certificate signing request sent to control-plane and response
received.
* Kubelet informed of new secure connection details.
Run 'kubectl get nodes' on control-plane to see this machine join.
几秒钟后,当你在控制平面节点上执行 kubectl get nodes,你会注意到该节点出现在输出中。
...
You can now join any number of control-plane node by running the following command on each as a root:
kubeadm join 192.168.0.200:6443 --token 9vr73a.a8uxyaju799qwdjv --discovery-token-ca-cert-hash sha256:7c2e69131a36ae2a042a339b33381c6d0d43887e2de83720eff5359e26aec866 --control-plane --certificate-key f8902e114ef118304e561c3ecd4d0b543adc226b7a07f675f56564185ffe0c07
Please note that the certificate-key gives access to cluster sensitive data, keep it secret!
As a safeguard, uploaded-certs will be deleted in two hours; If necessary, you can use kubeadm init phase upload-certs to reload certs afterward.
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.0.200:6443 --token 9vr73a.a8uxyaju799qwdjv --discovery-token-ca-cert-hash sha256:7c2e69131a36ae2a042a339b33381c6d0d43887e2de83720eff5359e26aec866
Kubespray 提供了一种使用
Netchecker
验证 Pod 间连接和 DNS 解析的方法。
Netchecker 确保 netchecker-agents pod 可以解析。
DNS 请求并在默认名称空间内对每个请求执行 ping 操作。
这些 Pods 模仿其余工作负载的类似行为,并用作集群运行状况指示器。
在很多组织中,其服务和应用的很大比例是 Windows 应用。
Windows 容器提供了一种对进程和包依赖关系
进行封装的现代方式,这使得用户更容易采用 DevOps 实践,令 Windows 应用同样遵从
云原生模式。
Kubernetes 已经成为事实上的标准容器编排器,Kubernetes 1.14 发行版本中包含了将
Windows 容器调度到 Kubernetes 集群中 Windows 节点上的生产级支持,从而使得巨大
的 Windows 应用生态圈能够充分利用 Kubernetes 的能力。
对于同时投入基于 Windows 应用和 Linux 应用的组织而言,他们不必寻找不同的编排系统
来管理其工作负载,其跨部署的运维效率得以大幅提升,而不必关心所用操作系统。
kubernetes 中的 Windows 容器
若要在 Kubernetes 中启用对 Windows 容器的编排,可以在现有的 Linux 集群中
包含 Windows 节点。在 Kubernetes 上调度 Pods
中的 Windows 容器与调用基于 Linux 的容器类似。
为了运行 Windows 容器,你的 Kubernetes 集群必须包含多个操作系统,控制面
节点运行 Linux,工作节点则可以根据负载需要运行 Windows 或 Linux。
Windows Server 2019 是唯一被支持的 Windows 操作系统,在 Windows 上启用
Kubernetes 节点
支持(包括 kubelet, 容器运行时、
以及 kube-proxy)。关于 Windows 发行版渠道的详细讨论,可参见
Microsoft 文档。
说明: Kubernetes 控制面,包括主控组件,
继续在 Linux 上运行。
目前没有支持完全是 Windows 节点的 Kubernetes 集群的计划。
说明: 在本文中,当我们讨论 Windows 容器时,我们所指的是具有进程隔离能力的 Windows
容器。具有 Hyper-V 隔离能力
的 Windows 容器计划在将来发行版本中推出。
支持的功能与局限性
支持的功能
Windows 操作系统版本支持
参考下面的表格,了解 Kubernetes 中支持的 Windows 操作系统。
同一个异构的 Kubernetes 集群中可以同时包含 Windows 和 Linux 工作节点。
Windows 容器仅能调度到 Windows 节点,Linux 容器则只能调度到 Linux 节点。
Pod 是 Kubernetes 中最基本的构造模块,是 Kubernetes 对象模型中你可以创建或部署的
最小、最简单元。你不可以在同一 Pod 中部署 Windows 和 Linux 容器。
Pod 中的所有容器都会被调度到同一节点(Node),而每个节点代表的是一种特定的平台
和体系结构。Windows 容器支持 Pod 的以下能力、属性和事件:
Windows 支持五种不同的网络驱动/模式:二层桥接(L2bridge)、二层隧道(L2tunnel)、
覆盖网络(Overlay)、透明网络(Transparent)和网络地址转译(NAT)。
在一个包含 Windows 和 Linux 工作节点的异构集群中,你需要选择一种对 Windows 和
Linux 兼容的联网方案。下面是 Windows 上支持的一些树外插件及何时使用某种
CNI 插件的建议:
网络驱动
描述
容器报文更改
网络插件
网络插件特点
L2bridge
容器挂接到外部 vSwitch 上。容器挂接到下层网络之上,但由于容器的 MAC
地址在入站和出站时被重写,物理网络不需要这些地址。
MAC 地址被重写为宿主系统的 MAC 地址,IP 地址也可能依据 HNS OutboundNAT
策略重写为宿主的 IP 地址。
当(比如出于安全原因)期望虚拟容器网络与下层宿主网络隔离时,
应该使用 win-overlay。如果你的数据中心可用 IP 地址受限,
覆盖网络允许你在不同的网络中复用 IP 地址(每个覆盖网络有不同的 VNID 标签)。
这一选项要求在 Windows Server 2009 上安装
[KB4489899](https://support.microsoft.com/help/4489899) 补丁。
对 Windows 而言,在 Kubernetes 中使用 IPv6 需要
Windows Server 2004 (内核版本 10.0.19041.610)或更高版本。
目前 Windows 上的覆盖网络(VXLAN)还不支持双协议栈联网。
局限性
在 Kubernetes 架构和节点阵列中仅支持将 Windows 作为工作节点使用。
这意味着 Kubernetes 集群必须总是包含 Linux 主控节点,零个或者多个 Linux
工作节点以及零个或者多个 Windows 工作节点。
资源处理
Linux 上使用 Linux 控制组(CGroups)作为 Pod 的边界,以实现资源控制。
容器都创建于这一边界之内,从而实现网络、进程和文件系统的隔离。
控制组 CGroups API 可用来收集 CPU、I/O 和内存的统计信息。
与此相比,Windows 为每个容器创建一个带有系统名字空间过滤设置的 Job 对象,
以容纳容器中的所有进程并提供其与宿主系统间的逻辑隔离。
没有现成的名字空间过滤设置是无法运行 Windows 容器的。
这也意味着,系统特权无法在宿主环境中评估,因而 Windows 上也就不存在特权容器。
归咎于独立存在的安全账号管理器(Security Account Manager,SAM),容器也不能
获得宿主系统上的任何身份标识。
资源预留
内存预留
Windows 不像 Linux 一样有一个内存耗尽(Out-of-memory)进程杀手(Process
Killer)机制。Windows 总是将用户态的内存分配视为虚拟请求,页面文件(Pagefile)
是必需的。这一差异的直接结果是 Windows 不会像 Linux 那样出现内存耗尽的状况,
系统会将进程内存页面写入磁盘而不会因内存耗尽而终止进程。
当内存被过量使用且所有物理内存都被用光时,系统的换页行为会导致性能下降。
为了统计 Windows、Docker 和其他 Kubernetes 宿主进程的 CPU 用量,建议
预留一定比例的 CPU,以便对事件作出相应。此值需要根据 Windows 节点上
CPU 核的个数来调整,要确定此百分比值,用户需要为其所有节点确定 Pod
密度的上线,并监控系统服务的 CPU 用量,从而选择一个符合其负载需求的值。
使用 kubelet 参数 --kubelet-reserve 与/或 -system-reserve 可以统计
节点上的 CPU 用量(各容器之外),进而可能将 CPU 用量限制在一个合理的范围,。
这样做会减少节点可分配 CPU
(NodeAllocatable)。
功能特性限制
终止宽限期(Termination Grace Period):未实现
单文件映射:将用 CRI-ContainerD 来实现
终止消息(Termination message):将用 CRI-ContainerD 来实现
特权容器:Windows 容器当前不支持
巨页(Huge Pages):Windows 容器当前不支持
现有的节点问题探测器(Node Problem Detector)仅适用于 Linux,且要求使用特权容器。
一般而言,我们不设想此探测器能用于 Windows 节点,因为 Windows 不支持特权容器。
在 Windows 节点上存在一个额外的参数用来设置 kubelet 进程的优先级,称作
--windows-priorityclass。此参数允许 kubelet 进程获得与 Windows 宿主上
其他进程相比更多的 CPU 时间片。
关于可用参数值及其含义的进一步信息可参考
Windows Priority Classes。
为了让 kubelet 总能够获得足够的 CPU 周期,建议将此参数设置为
ABOVE_NORMAL_PRIORITY_CLASS 或更高。
存储
Windows 上包含一个分层的文件系统来挂载容器的分层,并会基于 NTFS 来创建一个
拷贝文件系统。容器中的所有文件路径都仅在该容器的上下文内完成解析。
Windows 宿主联网服务和虚拟交换机实现了名字空间隔离,可以根据需要为 Pod 或容器
创建虚拟的网络接口(NICs)。不过,很多类似 DNS、路由、度量值之类的配置数据都
保存在 Windows 注册表数据库中而不是像 Linux 一样保存在 /etc/... 文件中。
Windows 为容器提供的注册表与宿主系统的注册表是分离的,因此类似于将 /etc/resolv.conf
文件从宿主系统映射到容器中的做法不会产生与 Linux 系统相同的效果。
这些信息必须在容器内部使用 Windows API 来配置。
因此,CNI 实现需要调用 HNS,而不是依赖文件映射来将网络细节传递到 Pod
或容器中。
Windows 节点不支持以下联网功能:
Windows Pod 不能使用宿主网络模式;
从节点本地访问 NodePort 会失败(但从其他节点或外部客户端可访问)
Windows Server 的未来版本中会支持从节点访问服务的 VIP;
每个服务最多支持 64 个后端 Pod 或独立的目标 IP 地址;
kube-proxy 的覆盖网络支持是 Beta 特性。此外,它要求在 Windows Server 2019 上安装
KB4482887 补丁;
非 DSR(保留目标地址)模式下的本地流量策略;
连接到覆盖网络的 Windows 容器不支持使用 IPv6 协议栈通信。
要使得这一网络驱动支持 IPv6 地址需要在 Windows 平台上开展大量的工作,
还需要在 Kubernetes 侧修改 kubelet、kube-proxy 以及 CNI 插件。
不支持 DNS 的 ClusterFirstWithHostNet 配置。Windows 将所有包含 “.” 的名字
视为全限定域名(FQDN),因而不会对其执行部分限定域名(PQDN)解析。
在 Linux 上,你可以有一个 DNS 后缀列表供解析部分限定域名时使用。
在 Windows 上,我们只有一个 DNS 后缀,即与该 Pod 名字空间相关联的 DNS
后缀(例如 mydns.svc.cluster.local)。
Windows 可以解析全限定域名、或者恰好可用该后缀来解析的服务名称。
例如,在 default 名字空间中生成的 Pod 会获得 DNS 后缀
default.svc.cluster.local。在 Windows Pod 中,你可以解析
kubernetes.default.svc.cluster.local 和 kubernetes,但无法解析二者
之间的形式,如 kubernetes.default 或 kubernetes.default.svc。
在 Windows 上,可以使用的 DNS 解析程序有很多。由于这些解析程序彼此之间
会有轻微的行为差别,建议使用 Resolve-DNSName 工具来完成名字查询解析。
IPv6
Windows 上的 Kubernetes 不支持单协议栈的“只用 IPv6”联网选项。
不过,系统支持在 IPv4/IPv6 双协议栈的 Pod 和节点上运行单协议家族的服务。
更多细节可参阅 IPv4/IPv6 双协议栈联网一节。
会话亲和性
不支持使用 service.spec.sessionAffinityConfig.clientIP.timeoutSeconds 来为
Windows 服务设置最大会话粘滞时间。
安全性
Secret 以明文形式写入节点的卷中(而不是像 Linux 那样写入内存或 tmpfs 中)。
这意味着客户必须做以下两件事:
信号(Signal) - Windows 交互式应用以不同方式来处理终止事件,并可实现以下方式之一或组合:
UI 线程处理包含 WM_CLOSE 在内的良定的消息
控制台应用使用控制处理程序来处理 Ctrl-C 或 Ctrl-Break
服务会注册服务控制处理程序,接受 SERVICE_CONTROL_STOP 控制代码
退出代码遵从相同的习惯,0 表示成功,非 0 值表示失败。
特定的错误代码在 Windows 和 Linux 上可能会不同。不过,从 Kubernetes 组件
(kubelet、kube-proxy)所返回的退出代码是没有变化的。
v1.Container.ResourceRequirements.limits.cpu 和
v1.Container.ResourceRequirements.limits.memory - Windows
不对 CPU 分配设置硬性的限制。与之相反,Windows 使用一个份额(share)系统。
基于毫核(millicores)的现有字段值会被缩放为相对的份额值,供 Windows 调度器使用。
参见 kuberuntime/helpers_windows.go 和
Microsoft 文档中关于资源控制的部分。
Windows 容器运行时中没有实现巨页支持,因此相关特性不可用。
巨页支持需要判定用户的特权
而这一特性无法在容器级别配置。
v1.PodSecurityContext.sysctls - 这些是 Linux sysctl 接口的一部分;Windows 上
没有等价机制。
操作系统版本限制
Windows 有着严格的兼容性规则,宿主 OS 的版本必须与容器基准镜像 OS 的版本匹配。
目前仅支持容器操作系统为 Windows Server 2019 的 Windows 容器。
对于容器的 Hyper-V 隔离、允许一定程度上的 Windows 容器镜像版本向后兼容性等等,
都是将来版本计划的一部分。
在一个 Kubernetes Pod 中,一个基础设施容器,或称 "pause" 容器,会被首先创建出来,
用以托管容器端点。属于同一 Pod 的容器,包括基础设施容器和工作容器,会共享相同的
网络名字空间和端点(相同的 IP 和端口空间)。我们需要 pause 容器来工作容器崩溃或
重启的状况,以确保不会丢失任何网络配置。
说明: 由于当前平台对 Windows 网络堆栈的限制,Windows 容器主机无法访问在其上调度的服务的 IP。只有 Windows pods 才能访问服务 IP。
可观测性
抓取来自工作负载的日志
日志是可观测性的重要一环;使用日志用户可以获得对负载运行状况的洞察,
因而日志是故障排查的一个重要手法。
因为 Windows 容器中的 Windows 容器和负载与 Linux 容器的行为不同,
用户很难收集日志,因此运行状态的可见性很受限。
例如,Windows 工作负载通常被配置为将日志输出到 Windows 事件跟踪
(Event Tracing for Windows,ETW),或者将日志条目推送到应用的事件日志中。
LogMonitor
是 Microsoft 提供的一个开源工具,是监视 Windows 容器中所配置的日志源
的推荐方式。
LogMonitor 支持监视时间日志、ETW 提供者模块以及自定义的应用日志,
并使用管道的方式将其输出到标准输出(stdout),以便 kubectl logs <pod>
这类命令能够读取这些数据。
从 Kubernetes v1.16 开始,可以为 Windows 容器配置与其镜像默认值不同的用户名
来运行其入口点和进程。
此能力的实现方式和 Linux 容器有些不同。
在此处
可了解更多信息。
使用组托管服务帐户管理工作负载身份
从 Kubernetes v1.14 开始,可以将 Windows 容器工作负载配置为使用组托管服务帐户(GMSA)。
组托管服务帐户是 Active Directory 帐户的一种特定类型,它提供自动密码管理,
简化的服务主体名称(SPN)管理以及将管理委派给跨多台服务器的其他管理员的功能。
配置了 GMSA 的容器可以访问外部 Active Directory 域资源,同时携带通过 GMSA 配置的身份。
在此处了解有关为
Windows 容器配置和使用 GMSA 的更多信息。
污点和容忍度
目前,用户需要将 Linux 和 Windows 工作负载运行在各自特定的操作系统的节点上,
因而需要结合使用污点和节点选择算符。 这可能仅给 Windows 用户造成不便。
推荐的方法概述如下,其主要目标之一是该方法不应破坏与现有 Linux 工作负载的兼容性。
确保特定操作系统的工作负载落在适当的容器主机上
用户可以使用污点和容忍度确保 Windows 容器可以调度在适当的主机上。目前所有 Kubernetes 节点都具有以下默认标签:
kubernetes.io/os = [windows|linux]
kubernetes.io/arch = [amd64|arm64|...]
如果 Pod 规范未指定诸如 "kubernetes.io/os": windows 之类的 nodeSelector,则该 Pod
可能会被调度到任何主机(Windows 或 Linux)上。
这是有问题的,因为 Windows 容器只能在 Windows 上运行,而 Linux 容器只能在 Linux 上运行。
最佳实践是使用 nodeSelector。
但是,我们了解到,在许多情况下,用户都有既存的大量的 Linux 容器部署,以及一个现成的配置生态系统,
例如社区 Helm charts,以及程序化 Pod 生成案例,例如 Operators。
在这些情况下,您可能会不愿意更改配置添加 nodeSelector。替代方法是使用污点。
由于 kubelet 可以在注册期间设置污点,因此可以轻松修改它,使其仅在 Windows 上运行时自动添加污点。
例如:--register-with-taints='os=windows:NoSchedule'
向所有 Windows 节点添加污点后,Kubernetes 将不会在它们上调度任何负载(包括现有的 Linux Pod)。
为了使某 Windows Pod 调度到 Windows 节点上,该 Pod 既需要 nodeSelector 选择 Windows,
也需要合适的匹配的容忍度设置。
例如,使用托管的负载均衡器时,你可以配置负载均衡器发送源自故障区域 A 中的 kubelet 和 Pod 的流量,
并将该流量仅定向到也位于区域 A 中的控制平面主机。
如果单个控制平面主机或端点故障区域 A 脱机,则意味着区域 A 中的节点的所有控制平面流量现在都在区域之间发送。
在每个区域中运行多个控制平面主机能降低出现这种结果的可能性。
[1]: 用来连接到集群的不同 IP 或 DNS 名
(就像 kubeadm 为负载均衡所使用的固定
IP 或 DNS 名,kubernetes、kubernetes.default、kubernetes.default.svc、
kubernetes.default.svc.cluster、kubernetes.default.svc.cluster.local)。