JFokus 5th February 2020 (Day 2)

The second day started with breakfast and coffee in the Waterfront, Stockholm, Sweden.

CloudState—Towards Stateful Serverless

This session was presented by Jonas Bonér, Lightbend Inc. (jboner).

The Serverless experience is revolutionary and will grow to dominate the future of Cloud Computing. Function-as-a-Service (FaaS), however, with its ephemeral, stateless, and short-lived functions, is only the first step. FaaS is great for processing-intensive, parallelizable workloads that move data from A to B while providing enrichment and transformation along the way. But it is quite limited and constrained in the use-cases it addresses well, which makes it very hard and inefficient to implement general-purpose application development and distributed systems protocols.

What’s needed is a next-generation Serverless platform and programming model for general-purpose application development in the new world of real-time data and event-driven systems. What is missing are ways to manage distributed state in a scalable and available fashion, support for long-lived virtual stateful services, ways to physically co-locate data and processing, and options for choosing the right data consistency model for the job.

Cloudstate is an open-source project licensed under Apache 2.0. It is a specification, protocol, and reference implementation for providing distributed state management patterns suitable for Serverless computing. The currently supported and envisioned patterns include:

  • Event Sourcing
  • Conflict-Free Replicated Data Types (CRDTs)
  • Key-Value storage
  • P2P messaging
  • CQRS read side projections

Cloudstate makes stateful serverless applications easy and lets the user focus on the business logic, data model, and workflow.

  • Cloudstate is polyglot: services can be written in any language that supports gRPC, and language-specific libraries are provided that allow idiomatic use of the patterns in each language. Cloudstate can be used by itself or in combination with a Service Mesh, and it is envisioned that it will be integrated with other Serverless technologies.
  • Cloudstate is polystate: it is based on powerful state models, namely Event Sourcing, CRDTs, and Key-Value.
  • Cloudstate is polyDB: it supports SQL, NoSQL, NewSQL, and in-memory replication.
  • Cloudstate leverages Akka, gRPC, Knative, and GraalVM, running on Kubernetes.

In short, Cloudstate manages:

  • Complexities of Distributed and Concurrent systems
  • Distributed State—Consistency, Replication, Persistence
  • Databases, Service Meshes, and other infrastructure
  • Message Routing, Scalability, Fail-over & Recovery
  • Running & Operating your application
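To make the first of these patterns concrete, here is a minimal event-sourcing sketch in plain Java. It is my own illustration of the pattern, not the actual Cloudstate API, and the entity and event names are invented:

    import java.util.ArrayList;
    import java.util.List;

    // State is never stored directly; it is rebuilt by replaying the
    // events (facts) that produced it.
    class ShoppingCart {
        record ItemAdded(String productId, int quantity) {}

        private final List<ItemAdded> journal = new ArrayList<>();
        private int totalItems = 0; // derived state

        // Command handler: validates, then persists and applies an event.
        void addItem(String productId, int quantity) {
            if (quantity <= 0) throw new IllegalArgumentException("quantity must be positive");
            ItemAdded event = new ItemAdded(productId, quantity);
            journal.add(event); // persist the fact
            apply(event);       // update derived state
        }

        // Event handler: the only place state changes; also used on replay.
        private void apply(ItemAdded event) {
            totalItems += event.quantity();
        }

        // Recovery: rebuild state from the journal, e.g. after a restart.
        static ShoppingCart replay(List<ItemAdded> events) {
            ShoppingCart cart = new ShoppingCart();
            cart.journal.addAll(events);
            events.forEach(cart::apply);
            return cart;
        }

        int totalItems() { return totalItems; }
    }

In Cloudstate the journaling, snapshotting, and recovery around such an entity are handled by the proxy sidecar rather than by user code.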

High Level Architecture

The Cloudstate reference implementation is built on top of Kubernetes, Knative, GraalVM, gRPC, and Akka, with a growing set of client API libraries for different languages. Inbound and outbound communication always goes through the sidecars over a gRPC channel, using a constrained and well-defined protocol: commands and events go in, command replies and events come out. Communicating over gRPC allows the user code to be implemented in different languages (JavaScript, Java, Go, Scala, Python, etc.) [Reference].

[Figure: Serving stateful functions]

The stateful service is backed by an Akka cluster of Akka actors. The user, however, does not have to deal with these complexities: the Akka sidecars shield the user code while connecting it to the backend state and cluster management.

[Figure: Powered by gRPC and Akka sidecars]

The Hacker’s Guide to JWT Security

This session was conducted by Patrycja Wegrzynowicz, Yon Labs (yonlabs).

JSON Web Token (JWT) is an open standard for creating tokens that assert some number of claims like a logged-in user and his/her roles. JWT is widely used in modern applications as a stateless authentication mechanism. Therefore, it is important to understand JWT security risks, especially when broken authentication is among the most prominent security vulnerabilities according to the OWASP Top 10 list.

JSON Web Token (JWT) is an open standard (RFC 7519) that defines a compact and self-contained way for securely transmitting information between parties as a JSON object. This information can be verified and trusted because it is digitally signed. JWTs can be signed using a secret (with the HMAC algorithm) or a public/private key pair using RSA or ECDSA [Reference].

This session was built around four demos, each showing how a JWT can be hacked when a different algorithm is in play. The demos explained various security risks of JWT, including confidentiality problems, vulnerabilities in algorithms and libraries, token cracking, token sidejacking, and more. They also showed common mistakes and vulnerabilities, along with best practices for implementing JWT authentication and using the available JWT libraries.

Recommendations based on those demos:

  1. Know your JWT library
  2. Always use a specific algorithm and key during verification (see the sketch after this list)
  3. Always set an expiration time
  4. Use an algorithm with a higher bit size
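The following sketch illustrates recommendations 2–4 in Java using the auth0 java-jwt library (my choice for illustration; the talk did not mandate a specific library). The secret and issuer are placeholders:

    import java.util.Date;

    import com.auth0.jwt.JWT;
    import com.auth0.jwt.JWTVerifier;
    import com.auth0.jwt.algorithms.Algorithm;
    import com.auth0.jwt.exceptions.JWTVerificationException;
    import com.auth0.jwt.interfaces.DecodedJWT;

    public class JwtDemo {
        public static void main(String[] args) {
            // Placeholder secret; load it from a secret store in production and
            // make it long enough for the chosen HMAC bit size (here 512 bits).
            Algorithm hmac512 = Algorithm.HMAC512("change-me-to-a-long-random-secret");

            // Recommendation 3: always set an expiration time.
            String token = JWT.create()
                    .withIssuer("my-issuer")
                    .withSubject("alice")
                    .withExpiresAt(new Date(System.currentTimeMillis() + 15 * 60 * 1000))
                    .sign(hmac512);

            // Recommendation 2: pin the algorithm and key at verification time;
            // never trust the "alg" header of an incoming token.
            JWTVerifier verifier = JWT.require(hmac512)
                    .withIssuer("my-issuer")
                    .build();
            try {
                DecodedJWT jwt = verifier.verify(token); // checks signature and expiry
                System.out.println("Verified subject: " + jwt.getSubject());
            } catch (JWTVerificationException e) {
                System.out.println("Rejected token: " + e.getMessage());
            }
        }
    }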

Living on the Cloud’s Edge

This session was presented by Rustam Mehmandarov, Computas AS (rmehmandarov) and Tannaz N. Roshandel, University of Oslo (tannaznvr).

Edge computing helps to break beyond the limitations imposed by now-traditional cloud solutions. Some of the reasons are privacy concerns, reducing the need for heavy processing resources, and reducing the amount of data that is sent over the network, just to mention a few.

The IoT Edge is a layer between the fog and the IoT devices; the fog is the network that transfers the data. Edge computing is computing done at or near the source of the data, instead of relying on the cloud at one of a dozen data centers to do all the work.

In the live demo, they used a system built on the Google Coral IoT Edge device. It processed video streams on the Edge and was designed to keep people’s privacy intact, without leaking their faces and identities to a third party.

Organisation Refactoring and Culture Hacking – Lessons from Software

This session was presented by Andrew Harmel-Law, ThoughtWorks, (al94781).

This session was based on a case study from the presenter’s own career, in which he was promoted to manager of a large group of people. In the beginning he struggled with the workload, as his inbox filled with queries and requests to attend courses and conferences. He decided to forward all such mails to his HR department, but soon he started receiving queries back from HR about the technical courses and conferences that required a manager’s approval, which brought him back to the original problem.

According to the presenter:

“Hacking teams is almost as much fun as hacking code”

  1. The organization structure is best served by being in a constant state of (incremental) change.
  2. The best people to drive these changes are those closest to the action – us, the makers.
  3. Our existing maker skills are ideally suited for this work.

Refactoring and hacking are based on five steps:

I. Map the Human Architecture: Group different roles and skills in a circle based on purpose and perspective. Don’t overestimate your existing understanding of how the org works; the map is one of existing power and influence. Openness builds trust, so be transparent and open with the staff.

II. Read the Dynamic System: Add a default response for queries or requests that follow the same pattern. As an example, the presenter auto-approved requests for certain courses and conferences to reduce the load of replying to each query.

III. Make the Right Change: Sometimes, a small change can have a huge impact. Observe the whole dynamic system and maintain the quality. Always watch out for feedback.

IV. Kill Consensus: To kill consensus, let anyone make any decision, provided they first seek advice from everyone who will be meaningfully affected and from those with relevant expertise, even if that expertise was gained long ago.

V. Beyond Delegation: This is based on the Toyota way of working. The manager should put responsibility on the staff being managed; not disappear, but start disappearing, to let people take responsibility. It is all about power: the more decision-making power is given to employees, the more responsibility they will take. Devolution beats delegation, meaning you share the problem with the staff rather than impose the solution on them.

Power is normally shared between managers and leaders; they should mentor people so they can make decisions on their own, which builds their confidence. It is safest to transfer power in a trustworthy way. Making the right changes to the end-to-end dynamic system is a matter of tiny, incremental improvements, and these improvements are meant for everyone, not for any specific individual.

To improve roles and groups, invite co-owners and new owners, and show confidence by letting them make changes. Let the hierarchy emerge as, where, and when required.

Refactoring or hack, it doesn’t matter, as long as things are improving.

Globally Distributed SQL Databases FTW

This session was presented by Henrik Engström, Kindred Group, (h3nk3).

When Google published the paper “F1: A Distributed SQL Database That Scales” in 2013, it set off a new type of database referred to as “Distributed SQL Databases”. The premise was to be able to use ACID transactions in a truly distributed database, something that was considered a pipedream before then. The main driver for F1, which has served as a model for several on-prem and cloud-based offspring, was that Google realized the systems their engineers built were error-prone and overly complex when they relied on eventual consistency.

“Personally, I have invested a non-trivial portion of my career as a strong advocate for the implementation and use of platforms providing guarantees of global serializability.”

Life beyond Distributed Transactions: an Apostate’s Opinion [2007]

The evolution from strong to eventual consistency can be framed through two readings of the ACID acronym.

In strong consistency, consistency is associated with RDBMSs, and ACID is:
  • Atomicity – all or nothing
  • Consistency – no violating constraints
  • Isolation – exclusive access
  • Durability – committed data survives crashes

In eventual consistency, consistency is associated with NoSQL, and ACID (sometimes called “ACID 2.0”, illustrated in the sketch after this list) is:
  • Associative – Set().add(1).add(2) === Set().add(2).add(1)
  • Commutative – Math.max(1,2) === Math.max(2,1)
  • Idempotent – Map().put("a",1).put("a",1) === Map().put("a",1)
  • Distributed – included mostly symbolically, to complete the acronym
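These merge-friendly properties are what allow replicas to exchange state in any order, any number of times, and still converge. A tiny sketch of my own in Java, using a grow-only set whose merge (set union) is associative, commutative, and idempotent:

    import java.util.HashSet;
    import java.util.Set;

    public class GSetDemo {
        // merge(a, b) is set union: associative, commutative, and idempotent.
        static Set<String> merge(Set<String> a, Set<String> b) {
            Set<String> result = new HashSet<>(a);
            result.addAll(b);
            return result;
        }

        public static void main(String[] args) {
            Set<String> replica1 = Set.of("x", "y");
            Set<String> replica2 = Set.of("y", "z");

            // Commutative: merge order does not matter.
            System.out.println(merge(replica1, replica2).equals(merge(replica2, replica1))); // true
            // Idempotent: re-delivering the same state changes nothing.
            System.out.println(merge(replica1, replica1).equals(replica1)); // true
        }
    }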

The presenter used CockroachDB, which is inspired by F1/Spanner and used at Kindred, to show how one can implement systems on a globally distributed database that simultaneously provides developers with ACID properties (see the JDBC sketch after the list below).

  1. CockroachDB scales horizontally without reconfiguration or need for a massive architectural overhaul. Simply add a new node to the cluster and CockroachDB takes care of the underlying complexity.
  2. CockroachDB allows you to deploy a database on-prem, in the cloud or even across clouds, all as a single store. It is a simple and straightforward bridge to your future, cloud-based data architecture.
  3. CockroachDB delivers an always-on, available database designed so that the loss of nodes is absorbed without impact on availability. It creates and manages replicas of your data to ensure reliability.
  4. CockroachDB is the only database in the world that enables you to attach ‘location’ to your data at the row level. This capability allows you to regulate the distance between your users and their data.
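Since CockroachDB speaks the PostgreSQL wire protocol, the standard PostgreSQL JDBC driver is enough to try these ACID properties from Java. A sketch against a local insecure cluster; the connection string, table, and amounts are assumptions:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    public class TransferDemo {
        public static void main(String[] args) throws SQLException {
            // 26257 is CockroachDB's default SQL port; credentials are placeholders.
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost:26257/bank?sslmode=disable", "root", "")) {
                conn.setAutoCommit(false); // group both updates into one ACID transaction
                try (PreparedStatement debit = conn.prepareStatement(
                             "UPDATE accounts SET balance = balance - ? WHERE id = ?");
                     PreparedStatement credit = conn.prepareStatement(
                             "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
                    debit.setLong(1, 100); debit.setLong(2, 1); debit.executeUpdate();
                    credit.setLong(1, 100); credit.setLong(2, 2); credit.executeUpdate();
                    conn.commit(); // CockroachDB runs transactions at SERIALIZABLE isolation
                } catch (SQLException e) {
                    conn.rollback(); // on write contention the client is expected to retry
                    throw e;
                }
            }
        }
    }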

Performance

This session was conducted by Chris Thalinger, Twitter (christhalinger), and was a quickie, i.e. a 15-minute talk.

In today’s software development world, the number one demand from employers is to deliver features as soon as possible. Everything else is secondary. That means engineers are doing only one thing: writing new code, debugging, and writing new code again. And most of the time this code is running in one of the very convenient clouds. Rarely does anyone ever stop and think about performance as a whole. If performance is an issue, the go-to solution is to throw more money at it, which usually means buying more computing power in the cloud. Adding more computing power is wasteful and environmentally unfriendly; we should instead find out how to optimize the code and avoid this waste.

The presenter showed how different companies, such as Google, Microsoft, Amazon, and Twitter, use renewable energy and try to reduce waste.

STOP EVERY NOW AND THEN AND THINK ABOUT THE IMPACT OF YOUR WORK

Lessons Learned from the 737 Max

This session was presented by Ken Sipe, D2iQ (kensipe).

There were two fatal crashes of the Boeing 737 Max in the fall of 2018 and the spring of 2019, grounding the airplane worldwide and raising the question: why? In the end it comes down to software, but there is much more to that story. The presenter was in the unique position of being both an instrument-rated private pilot and a software engineer with experience working with remote teams, and both perspectives provided insight into the lessons we can learn as the details of these tragic events are peeled back.

In this session, the presenter talked about aircraft types and how they affect airline-industry decisions, from pilot scheduling and plane schedules to innovation and profits. An airplane design from 1994 caused challenges in 2018-2019 that resulted in a software solution to a hardware design problem. The presenter described rules and regulations under which the US FAA relinquished quality oversight to Boeing because of man-power and costs. The session also focused on what a pilot does and expects, and on what the MCAS system did by design.

The lessons learned from this study are:

  1. Fail-safe: fail-safe is better than foolproof. Failing safe allows users to undo when they do something wrong. Provide cross-checks, which were missing in the case of the 737 Max.
  2. Provide all necessary warnings to the end-user. In this case, warnings were disabled and the pilots were not able to see them.
  3. Reduce workload: a high workload was one reason for the faulty software, and it may result in dropped tasks or degraded performance.
  4. Safety is assumed: make it part of your requirements.
  5. Politics: be aware when requirements are not technical.
  6. Documentation: documentation is essential; give it the importance it deserves.
  7. Cheap is expensive: the development was done at a very cheap rate and no senior developer was involved. To get reliable software, don’t go only for the cheapest developers.

JFokus 4th February 2020 (Day 1)

JFokus is Sweden’s largest developer conference. This year JFokus was held in Stockholm, Sweden on 2020-02-04 and 2020-02-05 at the Stockholm Waterfront Congress Centre. JFokus is all about developers: Java, Frontend & Web, Continuous Delivery & DevOps, Internet of Things & Artificial Intelligence, Cloud & Big Data, Future & Trends, alternative JVM languages like Scala and Kotlin, and Agile development.

The chosen theme of this year’s conference was Star Wars, and it opened with a very good Star Wars animation.


Keynote: Connecting Developers and Digital Media

The conference started with a keynote by Danielle Banks, weather.com (daniellebankstv). She discussed many environmental changes going on in the world at the moment. One example was the hurricane of 2019-09-02; on that day there was a record number of hits on the weather.com website.

[Figure: The flow of the presentation]

In her keynote, she mentioned that weather.com posts not only weather updates but also new scientific studies, animal stories, and human-interest stories. There is a great team of developers involved in the back end to keep weather.com efficient and fast. The keynote explained the connection between the ever-growing world of digital media and developers: they are the virtual, as well as the literal, machinery that drives the delivery of news and weather content to the world.

Weather.com also provides the opportunity for the developers to innovate and join the Challenge Community. “Call for Code” is an incredible initiative to create projects that prepare communities for natural disasters. Meanwhile, the Global High-Resolution Atmospheric Forecasting System (GRAF) is paving the way for everyone in the world to have access to the best weather data.

Keynote: Graphs from Malmö to Panama and Beyond

Another keynote, by Emil Eifrem, Neo4j (emileifrem), was about how a Swedish-born open-source data technology was used in one of the biggest global financial crime news stories in journalistic history, involving 2.6 terabytes of data in 11.5 million files, spanning 40 years of records on more than 210 K companies.

The Panama Papers leak embarrassed many prime ministers and famous personalities from different fields of life. More than 370 journalists, more than 100 media organizations, and 80 countries were involved in the discovery, which took almost one year to compile. Remarkably, only three data journalists and three data scientists worked on the back end. The leak’s impact might not have been as big had it been released 10 years earlier, given the advances in media and technology since then. Neo4j is a graph-based database, and graph algorithms were used in the analysis.

Neo4j is a highly scalable native graph database, purpose-built to leverage not only data but also data relationships.

Getting the best out of Spring Cloud, Kubernetes, and Istio

This session was presented by Magnus Larsson, Callista (callistaent). It answered the question of why three different tools are required: developing, testing, deploying, and managing a system landscape of cooperating microservices can be challenging, to say the least. The reasons behind using these tools are:

  • Ease of scale
  • Faster releases
  • Form a distributed system
  • Using stateless architecture.

There are multiple challenges involved in this distributed architecture. The main challenge is finding a way to keep track of all the changes.

To tackle the challenges mentioned above, open source comes to the rescue. No single tool can tackle all of the challenges, so more than one tool is required, and the tools may have some overlapping features.

Spring Cloud provides tools for developers to quickly build some of the common patterns in distributed systems (e.g. configuration management, service discovery, circuit breakers, intelligent routing, micro-proxy, a control bus, one-time tokens, global locks, leadership election, distributed sessions, cluster state) [Reference].
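As a small illustration of two of those patterns, service discovery and client-side load balancing, here is a hedged Spring Cloud sketch; the application class and the logical service name catalog-service are assumptions, and a discovery backend (e.g. Eureka) is assumed to be on the classpath:

    import org.springframework.boot.SpringApplication;
    import org.springframework.boot.autoconfigure.SpringBootApplication;
    import org.springframework.cloud.client.discovery.EnableDiscoveryClient;
    import org.springframework.cloud.client.loadbalancer.LoadBalanced;
    import org.springframework.context.annotation.Bean;
    import org.springframework.web.client.RestTemplate;

    @SpringBootApplication
    @EnableDiscoveryClient // register with whatever discovery backend is on the classpath
    public class StoreApplication {

        public static void main(String[] args) {
            SpringApplication.run(StoreApplication.class, args);
        }

        // A load-balanced RestTemplate resolves logical service names,
        // e.g. http://catalog-service/products, via the discovery server.
        @Bean
        @LoadBalanced
        RestTemplate restTemplate() {
            return new RestTemplate();
        }
    }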

Kubernetes is a portable, extensible, open-source platform for managing containerized workloads and services, that facilitates both declarative configuration and automation [Reference].

Istio makes it easy to create a network of deployed services with load balancing, service-to-service authentication, monitoring, and more, with few or no code changes in service code [Reference].

[Figure: Overlapping capabilities of Spring Cloud, Kubernetes, and Istio]

How to migrate an existing application to serverless?

This session by Marcia Villalba, AWS (mavi888uy) was about migrating an existing application to serverless when you don’t know where to start. This was personally one of my favorite sessions, because it was based on real use cases the speaker has faced in the field. First of all, what is serverless? Jeremy Daly defines it as follows:

“Serverless is a methodology for planning, building and deploying software in a way that maximizes value by minimizing undifferentiated heavy lifting.”

https://www.jeremydaly.com/serverless/

For serverless Functions-as-a-Service (FaaS), AWS provides a tool, AWS Lambda, which lets you run code without provisioning or managing servers. You pay only for the compute time you consume.

[Figure: How AWS Lambda works]
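For the JVM crowd, a minimal handler sketch in Java using the official aws-lambda-java-core library; the class name and greeting logic are made up:

    import com.amazonaws.services.lambda.runtime.Context;
    import com.amazonaws.services.lambda.runtime.RequestHandler;

    // Deployed as a Lambda function, this code runs only when invoked and
    // is billed per invocation; there is no server to provision or manage.
    public class HelloHandler implements RequestHandler<String, String> {
        @Override
        public String handleRequest(String name, Context context) {
            context.getLogger().log("Invoked with: " + name);
            return "Hello, " + name;
        }
    }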

Previously, serverless meant only Backend-as-a-Service (BaaS), which includes services such as AWS S3 and SNS.

Marcia presented a use case of migrating an existing monolith application to serverless at a company. A number of steps were discussed for this migration. First and foremost comes finding the unknowns: the limitations of the technology, and whether it will solve your problem at all. To handle this well, hire an expert consultant who already has experience in a similar domain. Then comes the foundational work, which includes picking the language, developer tools, and deployment frameworks.

Follow the infrastructure-as-code (IaC) process for managing and provisioning computer data centers through machine-readable definition files. The biggest benefit of IaC is that it minimizes risk and makes infrastructural changes repeatable and predictable. With IaC, releases use the same tooling as code changes. AWS provides a very good open-source framework for this, known as SAM (the Serverless Application Model).

Also, set up a good CI/CD approach; there are very good tools from AWS available for this. For monitoring and observability there are AWS tools that are event-driven, have a life of their own, and provide good log watching (for more info check the AWS Summit page).

[Figure: Migration steps]

The migration strategy says: don’t rewrite big chunks of code, and always work in short cycles. Follow the Strangler pattern: first put the full monolith application in Lambda, then look for the seams in the code (a sketch of a seam follows below). A seam is a part of the code that can be treated in isolation and can work without impacting the rest of the codebase. Then decompose the repository layer into microservices, making sure that each service can run on its own. Always start with the least critical part of the system.

[Figure: Strangler pattern]
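To make the idea of a seam concrete, here is a sketch in Java with invented names: callers in the monolith depend only on an interface, so the implementation can move to the new Lambda-backed service without touching them.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    // The seam: callers in the monolith depend only on this interface.
    interface BillingService {
        String charge(String customerId, long amountCents);
    }

    // Original in-process implementation, kept during the migration.
    class LocalBillingService implements BillingService {
        @Override
        public String charge(String customerId, long amountCents) {
            return "receipt-local-" + customerId;
        }
    }

    // New implementation calling the extracted serverless endpoint; swapping
    // it in "strangles" the old code path without changing any caller.
    class RemoteBillingService implements BillingService {
        private final HttpClient http = HttpClient.newHttpClient();

        @Override
        public String charge(String customerId, long amountCents) {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://api.example.com/billing/charge")) // placeholder URL
                    .POST(HttpRequest.BodyPublishers.ofString(customerId + ":" + amountCents))
                    .build();
            try {
                return http.send(request, HttpResponse.BodyHandlers.ofString()).body();
            } catch (Exception e) {
                throw new RuntimeException("billing call failed", e);
            }
        }
    }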

Machine Learning using Kubeflow and Kubernetes

This session by Arun Gupta, AWS, (arungupta) was about deployment of machine learning (ML) workflows on Kubernetes using Kubeflow.

According to machine learning 101, you perform training using training data, then check accuracy and optimize using test data. Once the model is trained, you run inference on input data and verify how good the predictions are. All of these predictions are based on the training data that was used to train the model.

[Figure: Machine learning 101]

The AWS ML stack provides ML frameworks and infrastructure at the bottom. The actual ML code in a system is very small, but the heavy-duty tasks run in this layer, which is managed by data scientists.

The containerized ML layer provides the container services; Kubernetes is used to containerize the ML workloads. The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable [Reference]. Check eksworkshop.com.

Then there are ML and AI services running over this.

The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. It shares data resources and can be used by multiple data scientists. This application supports multiple languages.

Kubeflow Fairing is a Python package that streamlines the process of building, training and deploying machine learning (ML) models in a hybrid cloud environment.

Katib is a Kubernetes-native system for hyperparameter tuning and neural architecture search. The system is inspired by Google Vizier and supports multiple ML/DL frameworks (e.g. TensorFlow, MXNet, and PyTorch).

KFServing provides a Kubernetes Custom Resource Definition for serving machine learning (ML) models on arbitrary frameworks.

RSocket – Future Reactive Protocol

This was a quickie from Oleh Dokuka, Netifi, (OlehDokuka). RSocket is a new application-level protocol capable of reactive streaming that can be used to simplify the way enterprises build and operate cloud-native applications. It enables traditional enterprise developers to build sophisticated, cloud-native, distributed applications. RSocket is transport agnostic and can be used on top of any transport protocol like TCP/Aeron or even on top of HTTP/2 or WebSocket. RSocket has various interaction models so it can satisfy the needs of today’s applications. The protocol is agnostic when it comes to programming languages, message formats, and API architecture. Any developer can use RSocket to meet all business requirements. RSocket simplifies life for any startup or enterprise, whether it is used in a protocol implementation like Java, C#, C++, JavaScript or deployed in an RPC framework.

RSocket can be used for stable microservices and binary messaging; it establishes a single connection and then multiplexes a number of streams over it. It provides flexibility, as it can run on top of any layer and use any reliable transport.
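A minimal request/response sketch with the rsocket-java library, using the API as it looked in early 2020 (newer releases replace RSocketFactory with RSocketConnector); host, port, and payload are placeholders:

    import io.rsocket.RSocket;
    import io.rsocket.RSocketFactory;
    import io.rsocket.transport.netty.client.TcpClientTransport;
    import io.rsocket.util.DefaultPayload;

    public class RSocketClientDemo {
        public static void main(String[] args) {
            // One TCP connection; all streams are multiplexed over it.
            RSocket rsocket = RSocketFactory.connect()
                    .transport(TcpClientTransport.create("localhost", 7000))
                    .start()
                    .block();

            // Request/response is one of RSocket's interaction models,
            // alongside fire-and-forget, request/stream, and channel.
            String response = rsocket.requestResponse(DefaultPayload.create("hello"))
                    .map(payload -> payload.getDataUtf8())
                    .block();
            System.out.println(response);

            rsocket.dispose(); // close the connection
        }
    }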

JVMs in Containers – Best Practices

This session was by David Delabassee, Oracle (delabassee).

A container packages software into standardized units that make development, shipment, and deployment easy, whereas a VM requires more memory and computational resources. The main examples of container technologies are Docker, CRI-O, LXC, rkt, runC, systemd-nspawn, OpenVZ, etc.

This session was very well explained with a small demo of a hello-world program in a container. The container image for this small project was around 450 MB. The main reason for the large image size is Java itself, so always choose a Java base image with a smaller size.

A container image is a stack of three layers:

  1. Java application and its dependencies
    This contains the application that should be containerized, with all of its dependencies.
  2. Java Runtime Layer
    In the demo this was a serverless Java function. Since Java 9 the standalone JRE has been removed, and the module system makes it possible to ship a smaller runtime, reducing the size of the JDK.

By reducing the number of modules that are part of the Java runtime, the size of the image can be reduced. The presenter showed the effect of various jlink options on an openjdk:13 baseline:

  • openjdk:13 (12 modules): 88 MB baseline
  • --strip-debug --strip-java-debug-attributes: -14 MB
  • --compress=1: -18 MB
  • --compress=2: -31 MB
  • --no-header-files --no-man-pages: ±0 MB

  3. Operating System Layer
    This layer is the operating system that is containerized along with the application and Java. There are a number of very slim operating-system images, such as Alpine (which uses musl libc).

The latest Java versions also have better startup times.

Class-data sharing (CDS) is another way to improve containers, available since Java 5. It reduces the memory footprint when running multiple JVMs by sharing common class metadata, and it improves startup time. CDS:

  • loads classes from a JAR file into a private internal representation,
  • dumps it to a shared archive, and
  • when a JVM (re)starts, memory-maps the archive to share read-only JVM metadata for these classes among multiple JVMs.

GraalVM is a High-performance Polyglot Virtual Machine.

GraalVM is a universal virtual machine for running applications written in JavaScript, Python, Ruby, R, JVM-based languages like Java, Scala, Groovy, Kotlin, Clojure, and LLVM-based languages such as C and C++.

GraalVM removes the isolation between programming languages and enables interoperability in a shared runtime. It can run either standalone or in the context of OpenJDK, Node.js or Oracle Database.

Cubes, Hexagons, Triangles, and More: Understanding Microservices

This session was by Chris Richardson, Chris Richardson Consulting, Inc. (crichardson).

This session described the essential characteristics of microservices and explained how a successful microservice architecture consists of loosely coupled services with stable APIs that communicate asynchronously. It also covered strategies for effectively testing microservices. The microservice architecture is becoming increasingly important, and the goal of this presentation was to lay out its essential characteristics.

Software is eating the world, and the marketplace is volatile and changing at a rapid pace. This requires delivery that is both quick and reliable, which leads to high-velocity, highly reliable software. Successful applications live a long time in the industry, and as technology changes they must be quick to change and easy to modernize. The success triangle focuses on process (DevOps and CI/CD), organization (small and autonomous teams), and an architecture efficient enough to build a product that can cope with fast changes and survive in the market for a long time.

The scale cube focuses on the -ilities of software engineering: testability, deployability, maintainability, etc. Successful applications have a habit of growing, which results in team growth and codebase growth. In such a situation a monolithic application becomes harder to maintain, and a microservice-based architecture is the better fit. In this architecture the system is broken into small services, each maintained by a small team and backed by a dedicated database. A microservice is highly maintainable, loosely coupled, independently deployable, and owned by a small team. The development of a microservice-based system starts with one service per team.

The major drawbacks of a microservice environment are its complexity, since each service needs its own deployment and detailed testing; identifying the boundaries, as two services should not have the same functionality; and migrating a monolith application to the microservice architecture.

A hexagonal architecture is also known as a ports-and-adapters architecture. It provides a layered approach where an API is accessible to the outside world and the business logic is implemented at the back end.
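A compact Java sketch of the ports-and-adapters idea, with all names invented: the domain defines the port, the core logic depends only on it, and adapters on the outside plug in.

    import java.util.HashMap;
    import java.util.Map;

    // Port: the domain's requirement on persistence, defined by the domain.
    interface OrderRepository {
        void save(String orderId, long totalCents);
    }

    // Core business logic: depends only on the port, not on any framework.
    class OrderService {
        private final OrderRepository repository;

        OrderService(OrderRepository repository) {
            this.repository = repository;
        }

        void placeOrder(String orderId, long totalCents) {
            if (totalCents <= 0) throw new IllegalArgumentException("total must be positive");
            repository.save(orderId, totalCents);
        }
    }

    // Adapter: one implementation of the port; a JDBC- or JPA-backed
    // version could be swapped in without touching the core logic.
    class InMemoryOrderRepository implements OrderRepository {
        private final Map<String, Long> store = new HashMap<>();

        @Override
        public void save(String orderId, long totalCents) {
            store.put(orderId, totalCents);
        }
    }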

A microservice architecture is based on multiple services collaborating with each other, and these services should be loosely coupled. Coupling can be runtime coupling, where a service depends on the response of another service, or design-time coupling, where a change in one service requires a change in another. Design-time coupling requires communication between teams, which can slow down development.

Iceberg-style service design exposes small APIs to the outside world while hiding the large, complex implementation, with a separate database per service. Avoid sharing a database between multiple services.

Runtime coupling comes from synchronous communication, which reduces availability. A self-contained service can handle synchronous requests without waiting for any other service, and using asynchronous messaging improves the availability of a service. A number of messaging technologies can be used for asynchronous messaging, such as Kafka, JMS queues, and ActiveMQ (a producer sketch follows below).
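Here is the producer sketch: a minimal example of asynchronous messaging with the Kafka Java client, where the broker address, topic, and event payload are made up. The producer publishes and moves on; the consuming service can be down right now and still process the event later.

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class OrderEventPublisher {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Publish and continue; no synchronous wait on the consumer.
                producer.send(new ProducerRecord<>("order-events", "order-123", "ORDER_CREATED"));
            }
        }
    }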

Microservices enable DevOps, which requires automated testing. Using a microservice environment without automated tests is self-defeating and very risky. There is a very close relationship between testing and the microservice architecture: it demands a good level of automated testing and a separate deployment pipeline for each service. This deployment pipeline must contain automated tests that exercise the service at different levels. Prefer testing at the service level and avoid end-to-end testing, because it can become a bottleneck in the system.

Consumer-driven contract testing is a very good way to test the communication between different services in isolation. The assumption in contract testing is that if consumers and providers are tested separately against the same contract, then they will be able to communicate.

Testing in production can be challenging and slow; therefore separate deployment from release, and test the deployed code before releasing it. Automate deployment, rollback, and roll-forward. To test code in production, test the deployed version before release using the following steps:

  1. Deploy the newer version alongside the old version
  2. Test the newer version
  3. Route test traffic to the newer version
  4. Release to a small number of users in production
  5. Monitor the tests
  6. Release to production if it works as expected
  7. Roll back if it fails

End of day one. Please go through the day-two activities in a separate post.