DC/OS Version 1.12.4 was released on July 2, 2019.
DC/OS 1.12.4 includes the following components:
- Apache Mesos 1.7.3 change log.
- Marathon 1.7.203 change log.
- Metronome 0.6.23 change log.
Release summary
DC/OS is a distributed operating system that enables you to manage resources, application deployment, data services, networking, and security in an on-premise, cloud, or hybrid cluster environment.
Issues fixed in DC/OS 1.12.4
The issues that have been fixed in DC/OS 1.12.4 are grouped by feature, functional area, or component. Most change descriptions include one or more issue tracking identifiers for reference.
Admin Router
-
Changes the maximum size allowed for uploads to a service through Admin Router (DCOS-52768).
This release increases the maximum size allowed for uploading packages from 1GB to 16GB. This change enables you to upload larger packages to a registry service without timing out the upload connection.
Cluster operations
-
Provides improved clean-up operations to eliminate issues caused by unused volumes for removed containers (DCOS_OSS-1502).
Previously, there have been issues caused by orphan volumes that were not removed when their associated containers were removed. In general, you can run
docker prune
commands to avoid issues with unused volumes. However, with this release, you can also usedcos-docker-gc
to remove unused volumes associated with removed containers.The
dcos-docker-gc
program is asystemd
-controlled service that runs hourly on every cluster node to perform Docker-related clean-up operations on each cluster.
Exhibitor
-
Changes the security setting for the configuration parameters
aws_secret_access_key
andexhibitor_azure_account_key
so that these values are not visible in theuser.config.yaml
file on cluster nodes (DCOS-51751).These configuration parameters are used for Exhibitor. They are now marked as secret. The values set for these configuration parameters can only be viewed in the
user.config.full.yaml
file. Theuser.config.full.yaml
file has stricter read permissions than theuser.config.yaml
file and is not included in DC/OS diagnostics bundles.
GUI
- Includes updates to the DC/OS frontend GUI to improve the user experience (COPS-4796, COPS-4804, COPS-4857, DCOS-53836, DCOS-54034, DCOS-54039, DCOS-54041).
Installation
- Corrects the output returned when running the
dcos_generate_config.sh
ordcos_generate_config.ee.sh
script with the--validate-config
option so that it doesn’t display warning or error messages about missing deprecated configuration settings such asssh_user
andssh_key_path
(COPS-4282, DCOS_OSS-4613, DCOS_OSS-5152).
Job management
-
Enables you to report the task IDs for finished jobs (DCOS_OSS-5258, DCOS_OSS-5273).
With this release, you can query run details for jobs using the
embed=history
argument to return task IDs in the job history for both successful and failed finished jobs. -
Improves the validation performed for secrets when running jobs (COPS-4706, DCOS_OSS-5019).
Previously, if a job had a secret defined but not used in its Environment configuration section, the job submission would fail with an error indicating that
Secret names are different from provided secrets
.In this release, the validation logic and error messages have changed to indicate the issue detected and how to proceed before resubmitting the job.
For example, if a secret is defined but not referenced, the validation error specifies that the secret should be removed:
The following secrets are defined, but not referenced: "foo". Please remove them from the secrets field.
If a job references a secret that isn’t defined, the validation error specifies that the secret should be added:
The following secrets are referenced, but undefined: "foo". Please add them to the secrets field.
Marathon
-
Introduces a watcher loop process to monitor and, if necessary, re-register the Marathon leader after reelection (COPS-3554).
-
Fixes an issue that prevented services managed by Marathon from restarting (COPS-3593, DCOS_OSS-4193).
In previous releases, you might have services managed by Marathon that are unable to restart if the container crashes or under certain DNS failure conditions. For example, restarting services might fail if the first ZooKeeper node or first DC/OS master is unreachable.
Because this problem affects high availability for Marathon, a workaround (ping zk-1) was introduced for DC/OS 1.11.5 and 1.11.6 to address the issue. In this release, the underlying issue is resolved and you can safely remove the workaround if you have it deployed. For background information about the issue and the steps to remove the workaround, see Removing the patch for Marathon failing to start if the first DC/OS is not available.
Mesos platform and containerization
-
Adds support for Docker registry manifest specification v2_schema2 images (COPS-3889, COPS-4296, COPS-4628, DCOS-49982).
In previous releases, you might see errors if you attempted to use the DC/OS Universal Container Runtime (UCR) with a Docker image downloaded from Docker Hub or from the Nexus 3 Docker registry configured as a proxy for the Docker Hub.
DC/OS Universal Container Runtime (UCR) now fully supports Docker images that are formatted using the Docker v2_schema2 specification. The DC/OS Universal Container Runtime (UCR) also continues to support Docker images that use the v2_schema1 format. If you also have Docker images that use Docker registry v2_schema1 format, you should consider updating those images because the v2_schema1 format is no longer supported by Docker and is likely to be deprecated for other registries in the near future.
For more information about using Docker images with DC/OS Universal Container Runtime (UCR), see Universal Container Runtime.
-
Changes the behavior for tasks when there are failed resource providers (ASF-2839).
With this release, the resource manager will no longer ask all resource providers to publish all allocated resources. With this change, a failed resource provider would only fail tasks that wanted to use the failed resource provider’s resources.
Metrics
-
Corrects the service endpoint values and service address-based statistics that are returned when the
statsd
metrics input plugin is enabled (COPS-3279, COPS-3576, DCOS-37703, DCOS-39703). -
Adds HTTP request time and HTTP request size metrics for Metronome-scheduled jobs (DCOS_OSS-5020).
-
Enables framework names to be properly decoded in metric tags (DCOS_OSS-5039).
Mesos masters allow spaces in framework names by using percent-encoding (%20) of the framework name. This release updates the Telegraf plugin to enable it to decode the framework name and export metrics with the correct tags.
-
Fixes issues that caused some DC/OS components to crash when the
/tmp
directory is mounted using thenoexec
option (DCOS-53077). -
Modifies the Telegraf configuration settings to improve how metrics are collected on busy agents (DCOS-50994).
Networking
-
Adds a timeout to the Mesos network overlay module to prevent the overlay master from getting stuck in RECOVERING mode (COPS-4167, COPS-4747, DCOS_OSS-4575, DCOS-47930).
-
Includes a fix that prevents duplicate virtual IP addresses (VIP) from showing up after the manual removal of an agent or framework initiated by operator intervention (COPS-4703, DCOS_OSS-4963, DCOS-51157).
Previously, if an operator manual removed an agent or framework while performing routine maintenance or by issuing the framework TEARDOWN call, the Mesos agent might return information about terminated tasks in a response field normally reserved for running tasks. This incorrect response in turn resulted in the
dcos-net
networking component adding a duplicate VIP entry. The fix in this release prevents thedcos-net
networking component from adding a duplicate virtual IP address when an agent or framework is removed. -
Changes the behavior for IP routing rules to resolve a conflict between the port used for virtual IP address traffic and the port mapping used for containers (DCOS_OSS-5061).
Previously, if a container was configured to use port mapping (for example, because the container uses bridge networking) and there is a virtual IP address listening on the same port, the routing of the virtual IP traffic would fail. This failure was caused by a conflict between the
iptable
rules for theportmapper
and theiptable
rules for the virtual IP routing.With this release, the
iptable
rules have been modified so that they are not applied to the traffic that is being routed to a virtual IP address. -
Returns canonical name (CNAME) records before address (A or AAAA) records in DNS responses (COPS-4761, DCOS_OSS-5108). Enterprise
For most DNS clients, the order in which records are returned has no effect. However, there are some DNS clients that require CNAME records to be listed before A records. This change resolves issues for DNS clients that have this requirement.
If you have DNS clients deployed for the DC/OS Enterprise cluster that are not able to resolve the
registry.component.thisdcos.directory
DNS name, the root cause of the issue is a bug in the DNS clientglibc
library. Theglibc
bug requires canonical name (CNAME) records to be retrieved before address (A or AAAA) records. With this release, the DC/OS DNS server sorts DNS results to ensure that the CNAME records appear first to resolve this issue. -
Adds round-robin DNS support so that DNS requests do not always return address (A) records in the same order (DCOS_OSS-5118).
Package management
-
Adds tarball package validation and support for retying a failed package download (COPS-2861).
Previously, you might encounter issues if the DC/OS package installer (
pkgpanda
) attempted to extract a package tarball into the/opt/mesosphere/packages
directory but detected that the tarball had only been partially downloaded. If downloading the entire tarball failed, the DC/OS installation process would also fail. With this release, the package installer validates tarballs and can retry the download if the package was not successfully downloaded to the bootstrap computer.
Enterprise
Security-
Fixes a problem with the
dcos-iam-ldap-sync
service failing to start correctly after a system reboot (COPS-4455, COPS-4814, DCOS-48107, DCOS-53420).With this release, the DC/OS identity and access management LDAP synchronization
systemd
unit no longer relies on the/opt/mesosphere
directory being available when thesystemd
configuration is loaded.
Storage
-
Updates Beta Rex-Ray to support NVMe (non-volatile memory express) EBS volumes (COPS-3961, DCOS_OSS-4316, DCOS-49828, DCOS-50047).
REX-Ray is a container storage orchestration engine that enables persistence for cloud-native workloads. With Rex-Ray, you can manage native Docker Volume Driver operations through a command-line interface (CLI).
Amazon Elastic Block Store (Amazon EBS) provides block-level storage volumes for Amazon Elastic Cloud (EC2) instances. Amazon EBS volumes can be attached to any running EC2 instance hosted in the same Amazon availability zone to provide persistent storage that is independent of the deployed instance. EBS storage volumes can be exposed using NVMe (non-volatile memory express) as a host controller interface and storage protocol. NVMe devices enable you to accelerate the transfer of data between nodes and solid-state drives (SSDs) over a computer’s connection gateway.
With this release, DC/OS updates REX-Ray to support NVMe storage when the DC/OS cluster runs on an Amazon instance. To work with NVMe devices, however, you must provide your own
udev
rules andnvme-cli
package. For more information about using Rex-Ray, see the REX-Ray website and github repository.
Third-party updates and compatibility
-
Updates the ZooKeeper package for DC/OS to release version 3.4.14 (DCOS_OSS-4988).
-
Updates support for OpenSSL to release version 1.0.2r (DCOS_OSS-4868).
-
Updates the
urllib3
to version 1.24.2 to address the security vulnerability identified for Python when working with CA certificates in CVE-2019-11324 (DCOS-52210).
Known issues and limitations
This section covers any known issues or limitations that don’t necessarily affect all customers, but might require changes to your environment to address specific scenarios. The issues are grouped by feature, functional area, or component. Where applicable, issue descriptions include one or more issue tracking identifiers.
About DC/OS 1.12
DC/OS 1.12 includes many new features and capabilities. The key features and enhancements focus on:
- Mesosphere Kubernetes engine
- Mesosphere Jupyter service
- Observability and metrics
- Private package registry
- Installation and upgrade improvements
- LDAP and networking enhancements
Mesosphere Kubernetes engine
- Introduced High Density Multi-Kubernetes (HDMK) that allows operators to take advantage of intelligent resource pooling when running multiple Kubernetes clusters on DC/OS. Compared with other Kubernetes distributions that run a single Kubernetes node per virtual machine, Mesosphere HDMK uses its intelligent resource pooling to pack multiple Kubernetes nodes onto the same server for bare metal, virtual machine, and public cloud instances, driving significant cost savings and resource efficiencies. Learn more about Kubernetes on DC/OS.
Mesosphere Jupyter service (MJS)
- Delivered secure, cloud-native Jupyter Notebooks-as-a-Service to empower data scientists to perform analytics and distributed machine learning on elastic GPU-pools with access to big and fast data services.
- Secured connectivity to data lakes and data sets on S3 and (Kerberized) HDFS.
- Included GPU-enabled Spark and distributed TensorFlow.
- Provided OpenID connect authentication and authorization with support for Windows Integrated Authentication (WIA) and Active Directory Federation Services (ADFS).
Observability and metrics
- Introduced a flexible and configurable metrics pipeline with multiple output formats.
- Enhanced support for application metric types including histograms, counters, timers, and gauges.
- Provided support for sample rates and multi-metrics packets.
- Introduced Mesos framework metrics.
- No longer require modifications when collecting metrics via Prometheus endpoint in 1.11.
Enterprise
Private package registry- Enabled on-premise package distribution and management.
- Enabled air-gapped Virtual Private Cloud package management.
- Simplified package artifact management.
- Introduced package-specific controls for adding/removing/updating packages within a cluster.
- Introduced package management CLI.
Installation and upgrade
- Provided full support for installing and operating a cluster with a Security-Enhanced Linux (SELinux) operating system, allowing you to use SELinux in targeted-enforcing mode for all hardened non-DC/OS components.
- Introduced a unified Terraform-based open source tool for provisioning, deploying, installing, upgrading, and decommissioning DC/OS on AWS, GCP, and Azure.
- Introduced an intuitive, streamlined installation with a quick start process - Spin up a DC/OS cluster with a few easy steps in 10-15 minutes.
- Officially recommended as a Mesosphere supported installation method with best practices built-in (i.e sequential masters & parallel agents in upgrade).
- Restructured Mesosphere installation documentation to organize Mesosphere supported installation methods and Community supported installation methods.
- Expanded DC/OS upgrade paths enable Mesosphere to skip specific upgrade paths within a supported patch version of DC/OS (i.e upgrade from 1.11.1 => 1.11.5 in one move) and to skip upgrade paths between supported major to major versions of DC/OS (for example, enabling you to upgrade from 1.11.7 to 1.12.1 in one move).
Enterprise
LDAP and networking enhancements- Introduced anonymous LDAP bind complies with standardized Enterprise LDAP integration pattern without a dedicated DC/OS integration LDAP user.
- Provided dynamic LDAP synchronization to synchronize LDAP user account groups automatically without manual synchronization of LDAP directory with accounts imported into DC/OS.
- Enhanced networking component with 150+ bug fixes with limited logging for visibility.
- Improved DNS convergence time (sub-sec) performance.
- Configured MTU for Overlay networks.
- Provided reusable IP addresses for new agents in the cluster.
- Mitigation of networking stuck-state due to SSL deadlock in Erlang library.
- Provided TLS 1.2 support.
- Provided support for per container network Metrics.
- Leveraged persistent connections in Edge-LB for L7 load-balancing. Enterprise
- Improved logging in Edge-LB. Enterprise