JFokus 5th February 2020 (Day 2)

The second day started with breakfast and coffee at the Waterfront in Stockholm, Sweden.

CloudState—Towards Stateful Serverless

This session was presented by Jonas Bonér, Lightbend Inc. (jboner).

The Serverless experience is revolutionary and will grow to dominate the future of Cloud Computing. Function-as-a-Service (FaaS), however, with its ephemeral, stateless, and short-lived functions, is only the first step. FaaS is great for processing-intensive, parallelizable workloads that move data from A to B, providing enrichment and transformation along the way. But it is quite limited and constrained in the use-cases it addresses well, which makes it very hard and inefficient to use for general-purpose application development and distributed systems protocols.

What is needed is a next-generation Serverless platform and programming model for general-purpose application development in the new world of real-time data and event-driven systems. What is missing are ways to manage distributed state in a scalable and available fashion, support for long-lived virtual stateful services, ways to physically co-locate data and processing, and options for choosing the right data consistency model for the job.

Cloudstate is an open-source project released under the Apache 2.0 license. It is a specification, protocol, and reference implementation for providing distributed state management patterns suitable for Serverless computing. The currently supported and envisioned patterns include:

  • Event Sourcing
  • Conflict-Free Replicated Data Types (CRDTs)
  • Key-Value storage
  • P2P messaging
  • CQRS read side projections

Cloudstate makes stateful serverless applications easy and lets the user focus on the business logic, data model, and workflow.

  • Cloudstate is polyglot: services can be written in any language that supports gRPC, with language-specific libraries provided so that the patterns can be used idiomatically in each language. Cloudstate can be used by itself, in combination with a Service Mesh, and it is envisioned that it will be integrated with other Serverless technologies
  • Cloudstate is polystate: it is based on powerful state models (Event Sourcing, CRDTs, Key-Value)
  • Cloudstate is polyDB: it supports SQL, NoSQL, NewSQL, and in-memory replication
  • Cloudstate leverages Akka, gRPC, Knative, and GraalVM, running on Kubernetes

In short, Cloudstate manages:

  • Complexities of Distributed and Concurrent systems
  • Distributed State—Consistency, Replication, Persistence
  • Databases, Service Meshes, and other infrastructure
  • Message Routing, Scalability, Fail-over & Recovery
  • Running & Operating your application

High Level Architecture

The Cloudstate reference implementation is built on top of Kubernetes, Knative, GraalVM, gRPC, and Akka, with a growing set of client API libraries for different languages. Inbound and outbound communication always goes through the sidecars over a gRPC channel using a constrained and well-defined protocol, in which the user defines commands in, events in, command replies out, and events out. Communicating over gRPC allows the user code to be implemented in different languages (JavaScript, Java, Go, Scala, Python, etc.) [Reference].

[Figure: serving_stateful_functions.png]

The stateful service is backed by an Akka cluster of Akka actors. The user, however, does not have to deal with these complexities: the Akka sidecars shield the user code while connecting it to the backend state and cluster management.

Powered by gRPC and Akka sidecars
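
To make the command-in/event-out protocol concrete, here is a minimal sketch of the shape of an event-sourced entity behind a sidecar. The names are illustrative only, not the actual Cloudstate API:

// Hypothetical, simplified sketch (not the real Cloudstate API) of an
// event-sourced entity: commands come in, events and replies go out,
// and state changes only by applying events (the same path used on replay).
final case class AddItem(name: String)   // command (in)
final case class ItemAdded(name: String) // event (out, persisted)

final class ShoppingCart {
  private var items: List[String] = Nil  // entity state

  // Command handler: validate the command and decide which event to emit.
  def handleCommand(cmd: AddItem): ItemAdded =
    ItemAdded(cmd.name)

  // Event handler: the only place state mutates; replayed on recovery.
  def handleEvent(evt: ItemAdded): Unit =
    items = evt.name :: items

  def currentItems: List[String] = items
}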

The Hacker’s Guide to JWT Security

This session was conducted by Patrycja Wegrzynowicz, Yon Labs (yonlabs).

JSON Web Token (JWT) is an open standard for creating tokens that assert some number of claims like a logged-in user and his/her roles. JWT is widely used in modern applications as a stateless authentication mechanism. Therefore, it is important to understand JWT security risks, especially when broken authentication is among the most prominent security vulnerabilities according to the OWASP Top 10 list.

JSON Web Token (JWT) is an open standard (RFC 7519) that defines a compact and self-contained way for securely transmitting information between parties as a JSON object. This information can be verified and trusted because it is digitally signed. JWTs can be signed using a secret (with the HMAC algorithm) or a public/private key pair using RSA or ECDSA [Reference].

This session was built around four demos, each showing how a JWT can be hacked and abused by others under different algorithms. The demos explained various security risks of JWT, including confidentiality problems, vulnerabilities in algorithms and libraries, token cracking, token sidejacking, and more. They also showed common mistakes and vulnerabilities, along with best practices for implementing JWT authentication and for using the available JWT libraries.

Recommendations based on those demos:

  1. Know your JWT library
  2. Always use a specific algorithm and a key during verification
  3. Always set an expiration time
  4. Use an algorithm with a higher bit size
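
As one way to apply recommendations 2-4, here is a small sketch using the auth0 java-jwt library from Scala (one library option among many; the issuer, secret source, and expiry below are illustrative assumptions):

import java.util.Date
import com.auth0.jwt.JWT
import com.auth0.jwt.algorithms.Algorithm

object JwtSketch extends App {
  val secret    = sys.env("JWT_SECRET")     // assumed env var; never hard-code keys
  val algorithm = Algorithm.HMAC512(secret) // recommendation 4: higher bit size

  // Recommendation 3: always set an expiration time when issuing a token.
  val token = JWT.create()
    .withIssuer("my-app")                   // illustrative issuer
    .withSubject("user-42")
    .withExpiresAt(new Date(System.currentTimeMillis() + 15 * 60 * 1000))
    .sign(algorithm)

  // Recommendation 2: pin the algorithm and key during verification; never
  // let the token's own header decide (this blocks alg-confusion attacks).
  val verifier = JWT.require(algorithm).withIssuer("my-app").build()
  val decoded  = verifier.verify(token)     // throws JWTVerificationException if invalid
  println(decoded.getSubject)
}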

Living on the Cloud’s Edge

This session was presented by Rustam Mehmandarov, Computas AS (rmehmandarov), and Tannaz N. Roshandel, University of Oslo (tannaznvr).

Edge computing helps break beyond the limitations imposed by now-traditional cloud solutions. Some of the motivations are privacy concerns, reducing the need for heavy processing resources, and reducing the amount of data sent over the network, just to mention a few.

The IoT Edge is a layer between the Fog and the IoT devices, where the Fog is the network that transfers the data. Edge computing is computing done at or near the source of the data, instead of relying on the cloud in one of a dozen data centers to do all the work.

In the live demo, they used a system built on the Google Coral edge device. It processed video streams on the Edge and was designed to keep people's privacy intact, without leaking their faces and identities to a third party.

Organisation Refactoring and Culture Hacking – Lessons from Software

This session was presented by Andrew Harmel-Law, ThoughtWorks, (al94781).

This session was based on a case study from the presenter's own career, in which he was promoted to a managerial level to manage a large group of people. In the beginning he struggled with the workload, because queries and requests to attend various courses and conferences kept arriving via email. He decided to forward all such mails to his HR department. Soon after, he started receiving queries back from the HR department about technical courses and conferences that required a manager's approval, which brought him back to the original problem.

According to the presenter:

“Hacking teams is almost as much fun as hacking code”

  1. An organization's structure is best served by being in a constant state of (incremental) change.
  2. The best people to drive these changes are those closest to the action – us, the makers.
  3. Our existing maker skills are ideally suited for this work.

Refactoring and hacking an organization is based on five steps:

I. Map the Human Architecture: Group different roles and skills in circles based on purpose and perspective. Don't overestimate the existing understanding of how the org works; the map is a map of existing power and influence. Openness builds trust, so be transparent and open with the staff.

II. Read the Dynamic System: Add a default response for queries or requests that follow the same pattern. Here the presenter gave the example of auto-approving requests for certain courses and conferences to reduce the load of replying to each query individually.

III. Make the Right Change: Sometimes, a small change can have a huge impact. Observe the whole dynamic system and maintain the quality. Always watch out for feedback.

IV. Kill Consensus: Anyone can make any decision, after seeking advice from everyone who will be meaningfully affected and from those with expertise in the matter, even if they gained that expertise a long time ago.

V. Beyond Delegation: This is based on the Toyota way of working. The manager should put responsibility on the staff he/she is managing. The manager should not disappear at once, but start disappearing gradually to let people take responsibility. It is all about power: the more decision-making power is given to the employees, the more responsibility they will take. Devolution beats delegation, which means sharing the problem with the staff rather than imposing a solution on them.

Power is normally shared between managers and leaders; they should mentor people so that they can make decisions on their own, which helps build their confidence. It is safer to transfer power in a trustworthy way. The right changes to the end-to-end dynamic system are what make the many tiny improvements possible, and these improvements are meant for everyone, not for any specific individual.

To improve the roles and groups, invite co-owners and new owners, and show confidence by letting them make changes. Let the hierarchy emerge as, where, and when required.

Refactoring or hacking: it doesn't matter, as long as things are improving.

Globally Distributed SQL Databases FTW

This session was presented by Henrik Engström, Kindred Group, (h3nk3).

When Google published the paper “F1: A Distributed SQL Database That Scales” in 2013, it set off a new type of database referred to as “Distributed SQL Databases”. The premise was to be able to use ACID transactions in a truly distributed database, something that was considered a pipe dream before then. The main driver for F1, which has served as a model for several on-prem and cloud-based offspring, was that Google realized that the systems its engineers built were error-prone and overly complex when based on eventual consistency.

“Personally, I have invested a non-trivial portion of my career as a strong advocate for the implementation and use of platforms providing guarantees of global serializability.”

– Pat Helland, Life beyond Distributed Transactions: an Apostate’s Opinion (2007)

The evolution from strong consistency to eventual consistency can be described through two readings of the ACID acronym.

In strong consistency, consistency is associated with RDBMSs, and ACID stands for:
– Atomicity – all or nothing
– Consistency – no violated constraints
– Isolation – exclusive access
– Durability – committed data survives crashes

In eventual consistency, consistency is associated with NoSQL, and ACID (2.0) stands for:
– Associative – Set().add(1).add(2) === Set().add(2).add(1)
– Commutative – Math.max(1,2) === Math.max(2,1)
– Idempotent – Map().put("a",1).put("a",1) === Map().put("a",1)
– Distributed – included mostly for its symbolic value, completing the acronym
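
These ACID 2.0 properties are exactly what make state-based CRDTs converge. A minimal Scala sketch using a grow-only set, the simplest CRDT:

object GSetDemo extends App {
  // The merge of a grow-only set is plain set union.
  def merge(a: Set[String], b: Set[String]): Set[String] = a union b

  val replica1 = Set("x", "y")
  val replica2 = Set("y", "z")

  // Replicas can exchange state in any order, any number of times,
  // and still converge on the same value.
  assert(merge(replica1, replica2) == merge(replica2, replica1))                   // commutative
  assert(merge(merge(replica1, replica2), replica2) == merge(replica1, replica2))  // idempotent
  assert(merge(merge(replica1, replica2), Set("w")) ==
         merge(replica1, merge(replica2, Set("w"))))                               // associative
}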

The presenter used CockroachDB, inspired by F1/Spanner and used at Kindred, to show how one can implement systems on a globally distributed database that simultaneously provides developers with ACID properties.

  1. CockroachDB scales horizontally without reconfiguration or need for a massive architectural overhaul. Simply add a new node to the cluster and CockroachDB takes care of the underlying complexity.
  2. CockroachDB allows you to deploy a database on-prem, in the cloud or even across clouds, all as a single store. It is a simple and straightforward bridge to your future, cloud-based data architecture.
  3. CockroachDB delivers an always-on, available database designed so that the loss of nodes is absorbed without impact on availability. It creates and manages replicas of your data to ensure reliability.
  4. CockroachDB is the only database in the world that enables you to attach ‘location’ to your data at the row level. This capability allows you to regulate the distance between your users and their data.
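
Since CockroachDB speaks the PostgreSQL wire protocol, a Scala application can talk to it with the stock PostgreSQL JDBC driver. A minimal sketch (the address, database, and table are illustrative assumptions for a local, insecure test cluster):

import java.sql.DriverManager

object CockroachSketch extends App {
  // CockroachDB's default SQL port is 26257; the pg JDBC driver just works.
  val conn = DriverManager.getConnection(
    "jdbc:postgresql://localhost:26257/bank?sslmode=disable", "root", "")
  val stmt = conn.createStatement()

  // Ordinary SQL with ACID guarantees, even though rows may be replicated
  // across nodes or regions behind the scenes.
  stmt.execute("CREATE TABLE IF NOT EXISTS accounts (id INT PRIMARY KEY, balance DECIMAL)")
  stmt.execute("UPSERT INTO accounts VALUES (1, 100.50)")

  val rs = stmt.executeQuery("SELECT id, balance FROM accounts")
  while (rs.next()) println(s"${rs.getInt("id")} -> ${rs.getBigDecimal("balance")}")
  conn.close()
}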

Performance

This session was conducted by Chris Thalinger, Twitter (christhalinger), and it was a quickie, i.e. a 15-minute session.

In today's software development world, the number one demand from employers is to deliver features as soon as possible; everything else is secondary. That means engineers are doing only one thing: writing new code, debugging, and writing new code again. And most of the time this code is running in one of the very convenient clouds. Rarely does anyone stop and think about performance as a whole. If performance is an issue, the go-to solution is to throw more money at it, which usually means buying more computing power in the cloud. Adding more computing power is wasteful and environmentally unfriendly; we should instead find out how to optimize the code and avoid this waste.

The presenter showed how different companies, like Google, Microsoft, Amazon, and Twitter, use renewable energy and try to reduce waste.

STOP EVERY NOW AND THEN AND THINK ABOUT THE IMPACT OF YOUR WORK

Lessons Learned from the 737 Max

This session was presented by Ken Sipe, D2iQ (kensipe).

There were two fatal crashes of the Boeing 737 Max, in the fall of 2018 and the spring of 2019, grounding the airplane worldwide and begging the question: why? In the end it comes down to software, but there is much more to that story. [Redacted], the presenter of this session, was in the unique position of being both an instrument-rated private pilot and a software engineer with experience working with remote teams; both perspectives provided insight into the lessons to be learned as the details of these tragic events were peeled back.

In this session, the presenter talked about aircraft types and how they affect decisions in the airline industry, from pilot scheduling and plane schedules to innovation and profits. An airplane design from 1994 caused challenges in 2018-2019 that resulted in a software solution to a hardware design problem. The presenter described different rules and regulations from the USA's FAA relinquishing quality standards to Boeing because of man-power and costs. The session also focused on what a pilot does and expects, and what the MCAS system did by design.

The lessons learned from this study are:

  1. Fail-safe: Fail-safe is better than foolproof. Failing safe allows users to undo their action when they do something wrong. Provide cross-checks, which were missing in the case of the 737 Max.
  2. Provide all necessary warnings to the end-user. In this case, warnings were disabled and the pilots were not able to see them.
  3. Reduce workload: Workload was one of the reasons behind the faulty software. A high workload may result in dropped tasks or reduced task performance.
  4. Safety is assumed: Make it part of your requirements.
  5. Politics: Be aware when requirements are not technical.
  6. Documentation: Documentation is essential; it is very important to give it due weight.
  7. Cheap is expensive: The development was done at a very cheap rate and no senior developer was involved. To get reliable software, don't go only for the cheapest developers.

Scala Days, Day 2

Some updates about Scala Days, day 2: a really long and very informative day. The conference sessions actually started on this day; before that, there were a welcome party and training sessions. The organizers defined four tracks, but anyone could switch tracks according to his/her field of interest in a particular session.

Keynote: Tools for verified Scala

Day 2 started with the keynote by Viktor Kuncak. He spoke about Stainless, a tool for code verification, synthesis, and repair that his research group at EPFL has developed over the past years.
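
To give a flavor of the style of code Stainless verifies, here is a minimal sketch (an invented example, assuming the stainless.lang library is available); the postcondition in ensuring is proved statically rather than checked at runtime:

import stainless.lang._

object MaxSpec {
  // Stainless proves the postcondition for all inputs; BigInt is used
  // because unbounded integers avoid machine-overflow side conditions.
  def max(x: BigInt, y: BigInt): BigInt = {
    if (x >= y) x else y
  } ensuring (res => res >= x && res >= y && (res == x || res == y))
}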

8 Akka anti-patterns you’d better be aware of

Akka is a toolkit for building highly concurrent and distributed applications on the JVM using the actor model. Given the prevalence of frameworks over toolkits and models in the industry, it is easy to forget that the latter will not prevent you from using them in any way you please, including ways that are possibly suboptimal or perhaps even harmful.

Taming Distribution: Formal Protocols for Akka Typed

Cloud computing, reactive systems, microservices: distributed programming has become the norm. But while the shift to loosely coupled message-based systems has manifest benefits in terms of resilience and elasticity, our tools for ensuring correct behavior have not grown at the same pace. Statically typed languages like Java and Scala allow us to exclude large classes of programming errors before the first test is run. Unfortunately, these guarantees are limited to the local behavior within a single process; the compiler cannot tell us that we are sending the wrong JSON structure to a given web service. Distribution therefore comes at the cost of having to write large test suites, with timing-dependent non-determinism.

  • Distribution is based on two parts: concurrency and partial failure
  • Actors are built for distribution, and that implies non-determinism; this makes both concurrency and distribution non-deterministic
  • Non-determinism can only be tamed by imposing as much ordering (causality) as possible, but even that is not enough
  • Implement a cluster receptionist that can be addressed by FQCN
  • In the actor model, each message handler should return the new behavior (see the sketch below)
    • A side effect of this is that some errors still cannot be caught by the compiler
  • Use HStack
  • This add-on is available at: Akka Typed Session
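
As noted above, each handler returns the next behavior. Here is a minimal Akka Typed sketch (the Counter protocol is invented for illustration), where the protocol's state machine is explicit and the compiler rejects out-of-protocol messages:

import akka.actor.typed.{ActorRef, ActorSystem, Behavior}
import akka.actor.typed.scaladsl.Behaviors

object Counter {
  // The protocol: only these messages can ever be sent to the actor.
  sealed trait Command
  final case class Increment(by: Int) extends Command
  final case class GetValue(replyTo: ActorRef[Int]) extends Command

  // Each handler returns the next behavior; state lives in the parameter.
  def apply(value: Int): Behavior[Command] =
    Behaviors.receiveMessage {
      case Increment(by) =>
        Counter(value + by)
      case GetValue(replyTo) =>
        replyTo ! value
        Behaviors.same
    }
}

object Main extends App {
  val system: ActorSystem[Counter.Command] = ActorSystem(Counter(0), "counter")
  system ! Counter.Increment(2) // system ! "hello" would not compile
}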

Compiling like a boss

We all love Scala, but the one aspect we have a hard time accepting is long compile times. It is not uncommon for a project to experience compilation times of a handful of minutes, if not worse. On top of that, compilation times are unpredictable, depending on a combination of language features, external libraries, and type annotations. A single line change may increase compilation times tenfold.
What can we do? It is paramount that we gain greater insight into the tools and libraries we use. There are also established (anti-)patterns that you should know about if you want to keep compilation times to a minimum. And why not utilize all cores when compiling? The stock Scala compiler can't do it, but Triplequote Hydra is here to change that. Sit tight and let's cut down on compilation time!

Scala Days, Day 1

The Scala Days conference started on the 31st of May 2017 in Copenhagen. Before this, there were two days of training sessions, which we were not supposed to attend.

We reached Copenhagen from Karlskrona by train at 2:15 pm Swedish time. It was then easy to locate the hotel, which is not far from the central station. After checking in, we left for the conference at 3:00 pm, arrived around 3:45 pm, and picked up our badges and the conference package. It looked quite crowded, and the number of participants showed the popularity of the conference. It was also very windy in Copenhagen.

We had some refreshments there and then went around to look at the companies' booths and their presentations.

Keynote:

The keynote session started at 5:00 pm, with Jon Pretty as the host. His way of hosting the event was really lively and kept everyone interested in the session. He explained how he came up with the idea of ‘the cake pattern’, which has now been replaced by ‘the jigsaw pattern’.

Then came Martin Odersky, the founder of Scala, a man who has reportedly written more lines of Java and Scala code than anyone else in the world. It was worth listening to him. He explained that Dotty will soon be released for professional projects; its pre-release versions are already available. Scala versions 2.14 and 2.15 will be released in 2018, and then Dotty will be merged into Scala to release Scala 3.0.

Dotty will introduce many key features in the Scala and many more languages are moving towards that space as well.

Dotty is founded on the DOT calculus, and it will support intersection and union types in programming using the ‘&’ and ‘|’ symbols. Dotty will introduce changes focused on:

  • Types
  • Enums
  • Traits
  • Implicits

These are just some of the areas of updates; Dotty will introduce much more as well.

Changes in Types:
  1. Removed existential types and type projections, like T forSome { type X } and T # A. Both are unsound.
  2. Replaced compound types with intersection types and union types (see the sketch below),

    e.g. T with U -> T & U, and T | U

  3. Introduced type lambdas, i.e. [X] => T. The previous emulation using structural types is awful and uses general type projection (#), which has been eliminated [Odersky]
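
A small sketch of what the intersection and union types of item 2 look like (the traits and values are invented for illustration, using the syntax as it landed in Dotty):

trait Resettable { def reset(): Unit }
trait Growable  { def grow(): Unit }

// Intersection type: the argument must implement both traits.
// Unlike the old T with U, T & U is commutative.
def restart(x: Resettable & Growable): Unit = {
  x.reset()
  x.grow()
}

// Union type: the value is one of the alternatives, inspected by a match.
def describe(v: Int | String): String = v match {
  case i: Int    => s"number $i"
  case s: String => s"text $s"
}
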
Changes in Traits:

Traits will be updated with support for passing parameters to them.

class C extends { val x = E } with T

will be replaced with

trait T(x: Int)
class C extends T(22)

One restriction that applies here is that only a class can pass arguments to a trait; a trait extending a parameterized trait cannot.

Changes in Enums:

Enums will have new features:

  • One construct supports enumerations and ADTs
  • Maps transparently to classes/objects/vals

A simple enum will look like this (using the Color example below):
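
enum Color {
  case Red, Green, Blue
}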

and can be expanded by the compiler roughly like this (a simplified sketch of the desugaring into classes, objects, and vals):
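
sealed abstract class Color
object Color {
  // Each case becomes a value of the sealed class (simplified; the real
  // expansion also attaches an ordinal tag to every value).
  val Red: Color   = new Color {}
  val Green: Color = new Color {}
  val Blue: Color  = new Color {}
}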

Enums can also have parameters:

enum Color(val rgb: Int) {
  case Red   extends Color(0xFF0000)
  case Green extends Color(0x00FF00)
  case Blue  extends Color(0x0000FF)
}

Enum tags will be added to differentiate between enum values.

Changes in Implicits:

Implicits in Scala are currently very puzzling, and there is too much repetition; it is really hard to understand what calls what. In Dotty, only implicit methods are eligible for conversions, and a new class, ImplicitConverter, allows abstracting over implicit conversions.

Dotty will also reduce repetition. The idea behind this is that context abstraction is just parameter passing; making these parameters implicit avoids the tedium. There are two typing rules for implicit function types:

  1. Implicit functions get implicit arguments just like implicit methods

    val f: implicit A => B
    implicit val a: A

    f expands to f(a)

  2. Implicit functions get created on demand. If the expected type of b is implicit A => B, then

    b expands to implicit (_: A) => b

Efficiency can also be improved by optimizing implicit function result types rather than creating a closure like

def f(x: T) = { implicit y: U => body }

Just create a curried function like

def f(x: T)(implicit y: U) = body

For more details about the implicit changes and new features, listen to the keynote.