“Where shall I begin, please your Majesty?” the White Rabbit asked.
“Begin at the beginning,” the King said gravely, “and go on till you come to the end: then stop.”
— Lewis Carroll, Alice's Adventures in Wonderland (1865)
We are going to share some details about what lies under the hood of Marionette.
Who Should Read This
This document is aimed at intermediate-to-advanced developers, particularly web application engineers, DevOps specialists and site reliability engineers, as well as product owners. It seeks to meet clients' and users' need for a practical demonstration of the cloud-native design principles and structure of Marionette.
Layered Architecture of the Marionette Stack
In the following we describe the layered architecture of the Marionette cryptocurrency trading platform, which, from the bottom to the top layer, consists of the following components:
Hardware Layer: The hardware layer provides the physical infrastructure, including the computing and storage devices and the network equipment required to run the cloud services and applications.
OS Layer: The operating system layer consists of a distribution based on Debian GNU/Linux. The OS layer is responsible for managing the hardware as virtualised resources. A number of support services run on top of the OS layer.
Docker Layer: The Docker engine runs on the host operating system, and includes the necessary binary packages and libraries for executing applications within containers.
Middleware Layer: The middleware layer brings together the resources, providing an integrated and consistent view of the cloud services.
Services Layer: The Services layer integrates useful services and applications. These services are logically grouped into currency exchange services, trading services and enterprise services.
Front-end Layer: The front-end layer provides the different types of interfaces for interacting with the infrastructure of the services, along with tools that assist in administering services and applications.
Microservices
Microservices are an architectural style in which components of a system are designed as standalone and independently deployable applications. This definition emphasizes the fact that microservices are applications that run independently of each other yet can collaborate in the performance of their tasks. Martin Fowler provides a more detailed definition: he describes the microservice architectural style as “an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API”. This definition emphasizes the autonomy of the services by stating that they run in independent processes. If you’re relatively new to microservices, you’ll definitely want to read on. Even if you’re somewhat comfortable with microservices, you might want to skim this chapter: there will be a gem or two in here for you. If you’re a seasoned veteran of microservice development, you can go ahead and move on to the next chapter (or read it ironically and judge me).
With Marionette composed of multiple, collaborating microservices, we can decide to use different technologies inside each one. This allows us to pick the right tool for each job rather than having to select a more standardized, one-size-fits-all approach that often ends up being the lowest common denominator. With microservices, we are also able to more quickly adopt technologies and to understand how new advancements might help us.
Microservices collaborate through APIs, and this document will give you a taste of the design of the GraphQL APIs of Marionette microservices. It is important to understand that an API is just a layer on top of an application, and that there are different types of interfaces. The most challenging aspect of working with APIs is ensuring that both the API client and the API server follow the API specification. Here we are going to explain foundational patterns and principles for building Marionette microservices and driving their integrations with APIs.
The advantages of microservices are many and varied. Many of these benefits can be laid at the door of any distributed system. Microservices, however, tend to achieve these benefits to a greater degree primarily because they take a more opinionated stance in the way service boundaries are defined. By combining the concepts of information hiding and domain-driven design with the power of distributed systems, microservices help us deliver significant gains over other forms of distributed architectures. Microservices may well give us the option for each microservice to be written in a different programming language, to run on a different runtime, or to use a different database - but these are options only.
GraphQL APIs
Now that we have defined what an API is, we will explain the defining features of a web API. A web API is an API that uses the Hypertext Transfer Protocol (HTTP) to transport data. Web APIs are implemented using technologies such as SOAP, REST, GraphQL, gRPC, and others.
When a resource is represented by a large payload, fetching it from the server translates into a large amount of data transfer. With the emergence of API clients running on mobile devices with restricted network access and limited storage and memory capacity, exchanging large payloads often results in unreliable communication. In 2012, Facebook was acutely aware of these problems, and it developed a new technology that allows API clients to run granular data queries on the server. This technology was released in 2015 under the name GraphQL.
GraphQL is one of the most popular protocols for building web APIs. It’s a suitable choice for driving integrations between microservices and for building integrations with frontend applications. GraphQL gives API consumers full control over the data they want to fetch from the server and how they want to fetch it. GraphQL is a query language for APIs. Instead of fetching full representations of resources, GraphQL allows us to fetch one or more properties of a resource, such as the status of an order. With GraphQL, we can also model the relationships between different objects, which allows us to retrieve, in a single request, the properties of various related resources from the server, such as an order’s details together with associated objects. In contrast, with other types of APIs, such as REST, you get a full list of details for each object. Therefore, whenever it’s important to give the client full control over how it fetches data from the server, GraphQL is a great choice.
For example, the trading service owns data about trades as well as their orders. Each trade and order contains a rich list of properties that describe its features. However, when a client requests a list of trades, it is most likely interested in fetching only a few details about each trade. A client (frontend) may also be interested in traversing the relationships between trades, orders, and other objects owned by the trading service. For these reasons, GraphQL is an excellent choice for building the service API. As we describe the specification for the trading API and others, you’ll learn about GraphQL’s scalar types, the design of custom object types, and queries and mutations.
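For instance, a client that only needs a couple of fields per trade, together with each trade’s orders, might issue a query like the following sketch (the field names are illustrative, not Marionette’s actual schema):
query {
  trades {
    id
    status
    orders {
      id
      status
    }
  }
}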
Just as we can use SQL to define schemas for our database tables, we can use GraphQL to write specifications that describe the type of data that can be queried from our servers. A GraphQL API specification is called a schema, and it’s written in a standard called Schema Definition Language (SDL).
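As an illustration, here is a minimal SDL sketch describing an order type and a query to fetch it (hypothetical, but consistent with the userOrder example shown later in this document):
type Order {
  id: ID!
  status: String!
  type: String!
}

type Query {
  userOrder(id: ID!): Order
}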
Containers
We run each microservice instance in isolation. This ensures that issues in one microservice can’t affect another microservice - for example, by gobbling up all the CPU. Virtualization is one way to create isolated execution environments on existing hardware, but normal virtualization techniques can be quite heavy when we consider the size of our microservices. Containers, on the other hand, provide a much more lightweight way to provision isolated execution for service instances, resulting in faster spin-up times for new container instances, along with being much more cost effective for many architectures.
A microservice instance runs as a separate container on a virtual or physical machine. That container runtime may be managed by a container orchestration tool like Kubernetes.
Containers work wonderfully well for microservices, and Docker made them significantly more viable as a concept. We get our isolation but at a manageable cost. We also hide the underlying technology, allowing us to mix different tech stacks. When it comes to implementing concepts like desired state management, though, we’ll need something like Kubernetes to handle it for us.
After we began experimenting with containers, we also realized that we needed something to help us manage these containers across lots of underlying machines. Container orchestration platforms like Kubernetes do exactly that, allowing us to distribute container instances so as to provide the robustness and throughput our services need, while making efficient use of the underlying machines. The work in this direction is proceeding in full conformity with the adjusted schedule, but we do not feel the need to rush into adopting Kubernetes. It absolutely offers significant advantages over more traditional deployment techniques, but its adoption is difficult to justify if we have only a few microservices. As we gradually increase the complexity of the Marionette microservice architecture, we will look to introduce new technology as we need it; we don’t need a Kubernetes cluster while we have about three dozen services. Once the overhead of managing deployment begins to become a significant headache, we will start using Kubernetes. If we do end up doing that, we will do our best to ensure that someone else runs the Kubernetes cluster for us, perhaps by making use of a managed service on a public cloud provider, since running our own Kubernetes cluster can be a significant amount of work.
Note also that with smaller services, we can scale just those services that need scaling, allowing us to run other parts of the system on smaller, less powerful hardware. We can make a change to a single service and deploy it independently of the rest of the system, which allows us to get our code deployed more quickly. If a problem does occur, it can be quickly isolated to an individual service, making fast rollback easy to achieve. It also means that we can get new functionality out to customers more quickly.
Public cloud providers, or more specifically the main providers - Google Cloud and Amazon Web Services (AWS) - offer a huge array of managed services and deployment options for managing Marionette. As our microservice architecture grows, more and more work will be pushed into the operational space. Public cloud providers offer a host of managed services, from managed database instances or Kubernetes clusters to message brokers or distributed filesystems. By making use of these managed services, we are offloading a large amount of this work to a third party that is arguably better able to deal with these tasks.
Containers have become a dominant concept in server-side software deployment, and for many they are the de facto choice for packaging and running microservice architectures. The container concept, popularized by Docker and allied with a supporting container orchestration platform like Kubernetes, has become many people’s go-to choice for running microservice architectures at scale.
By deploying one service per container, as in the figure, we get a degree of isolation from other containers and can do so far more cost-effectively than would be possible if we ran each service in its own VM.
You should view containers as a great way of isolating the execution of trusted software. The Docker toolchain handles much of the work around containers: Docker manages container provisioning, handles some of the networking problems for us, and even provides its own registry concept that allows you to store and distribute Docker images. Before Docker, we didn’t have the concept of an “image” for containers; this aspect, along with a much nicer set of tools for working with containers, helped containers become much easier to use. The Docker image abstraction is a useful one for us, as the details of how our microservice is implemented are hidden. We have the builds for our microservice create a Docker image as a build artifact and store the image in the Docker registry, and away we go. When you launch an instance of a Docker image, you have a generic set of tools for managing that instance, no matter the underlying technology used: microservices written in Node.js or Go, or whatever else, can all be treated the same.
When Docker first emerged, its scope was limited to managing containers on one machine. This was of limited use: what if we wanted to manage containers across multiple machines? This is essential if we want to maintain system health when a machine dies on us, or if we just want to run enough containers to handle the system’s load. Docker came out with two totally different products of its own to solve this problem, confusingly called “Docker Swarm” and “Docker Swarm Mode”. Really, though, when it comes to managing lots of containers across many machines, Kubernetes is king, even if we might use the Docker toolchain for building and managing individual containers.
Testing
The question is how to effectively and efficiently test our code’s functionality when it spans a distributed system. Unit testing is a methodology where units of code are tested in isolation from the rest of the application. A unit test might test a particular function, object, class, or module. But unit tests don’t test whether or not units work together when they’re composed to form a whole application. For that, we use a set of full end-to-end functional tests of the whole running application (aka system testing). Eventually, we need to launch Marionette and see what happens when all the parts are put together.
Which way is right for us? Behaviour Driven Development (BDD) uses human-readable descriptions of software user requirements as the basis for software tests. Like Domain Driven Design, an early step in BDD is the definition of a shared vocabulary between stakeholders, domain experts, and engineers. This process involves defining the entities, events, and outputs that the users care about, and giving them names that everybody can agree on. Our testers then use that vocabulary to create a domain-specific language (known as predicates in our ecosystem) they can use to encode system tests such as User Acceptance Tests. Each test is based on a user story written in the formally specified ubiquitous language (a vocabulary shared by all stakeholders) based on English. Notice that this language is focused exclusively on the business value that a customer should get from the software rather than describing the user interface of the software, or how the software should accomplish the goals. Our testers use tools such as Cucumber to create and maintain their own custom domain-specific language.
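To give a flavour of the style, a user story encoded this way might look like the following Cucumber/Gherkin sketch (the feature and steps are hypothetical, not taken from Marionette’s actual test suite):
Feature: Placing a market order
  Scenario: A trader places a valid market order
    Given a trader with a funded account
    When the trader places a market order for 1 BTC
    Then the order status should be "filled"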
Security
Often when the topic of Marionette microservices security comes up, our clients want to start talking about reasonably sophisticated technological issues like the use of JWT tokens or the need for mutual TLS (topics we will explain later). Oh my! However, the problem with security is that you’re only as secure as your least secure aspect. To use an analogy, if you’re looking to secure your home, it would be a mistake to focus all your efforts on having a front door that is pick resistant, with lights and cameras to deter malicious parties, if you leave your back door open.
Our microservice architecture consists of lots of communication between things. Human users interact with our system via user interfaces. These user interfaces in turn make calls to microservices, and microservices end up calling yet more microservices.
Credentials give a person or computer access to some form of restricted resource. This could be a database, a computer, a user account, or something else. We have a number of humans involved, and we have lots of credentials in the mix representing microservices, (virtual) machines, databases, and the like. We break the topic of credentials down into two key areas. Firstly, we have the credentials of the users (and operators) of our system. Secondly, we consider secrets: pieces of information that are critical to running our microservices. Across both sets of credentials, we consider the issues of rotation, revocation, and limiting scope. User credentials, such as email and password combinations, remain essential to how many of our users work with our software, but they are also a potential weak spot when it comes to our system being accessed by malicious parties. Our credentials also extend to managing things like API keys for third-party systems, such as accounts with our public cloud provider.
Critical pieces of information:
Certificates for TLS
SSH keys
Public/private API keypairs
Credentials for accessing databases
etc.
In the context of security, authentication is the process by which we confirm that a party is who they say they are. We typically authenticate a human user by having them type in their username and password. We assume that only the actual user has access to this information, and therefore the person entering this information must be them. Ease of use is important - we want to make it easy for our users to access our system. Our approach to authentication is to use some sort of single sign-on (SSO) solution to ensure that a user has to authenticate themselves only once per session, even if during that session they may end up interacting with multiple services.
Authorization is the mechanism by which we map from a principal (generally, when we’re talking abstractly about who or what is being authenticated, we refer to that party as the principal) to the action we are allowing them to do. When a principal is authenticated, we will be given information about them that will help us decide what we should let them do.
The Marionette authorization service uses JSON Web Tokens (JWT). A JWT defines a compact and self-contained way of securely transmitting information between parties as a JSON object. Once the user is logged in, each subsequent request will include the JWT, allowing the user to access routes, services, and resources that are permitted with that token. A token can be obtained using the Marionette GraphQL interface:
mutation {
  login(email: "user@domain.io", password: "mypasswd") {
    token
  }
}
When the user successfully logs in using their credentials, a JSON Web Token is returned. The output is three Base64-URL strings separated by dots that can easily be passed around in HTML and HTTP environments:
{
  "data": {
    "login": {
      "token": "eyJhbGcxxx1NiJ9.eyJxxxpudWxsfQ.TE2ehxfNuxxx"
    }
  }
}
Whenever the user wants to access a protected route or resource, the user agent should send the JWT, typically in the Authorization header using the Bearer schema. The content of the header should look like the following:
Authorization: Bearer <token>
This can be, in certain cases, a stateless authorization mechanism. The protected routes of the server will check for a valid JWT in the Authorization header, and if it's present, the user will be allowed to access protected resources.
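As a minimal sketch of such a check, assuming an Express-style gateway and the jsonwebtoken package (both assumptions; Marionette’s actual middleware may differ):
const jwt = require("jsonwebtoken");

// Middleware guarding protected routes: extracts and verifies the
// bearer token from the Authorization header.
function requireAuth(req, res, next) {
  const header = req.headers.authorization || "";
  const [scheme, token] = header.split(" ");
  if (scheme !== "Bearer" || !token) {
    return res.status(401).json({ error: "Missing bearer token" });
  }
  try {
    // verify() checks the signature and expiry; it throws on failure.
    req.user = jwt.verify(token, process.env.JWT_SECRET);
    return next();
  } catch (err) {
    return res.status(401).json({ error: "Invalid or expired token" });
  }
}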
The following steps show how a JWT is obtained and used to access Marionette APIs or resources:
The application or client makes a one-time authorization request to the authorization server. This is performed through one of the different authorization flows; a web application goes through the /login endpoint using the authorization code flow.
The server validates the credentials and, if everything is correct, returns to the client a JSON document with a token that encodes data about the user logged into the system; that is, when authorization is granted, the authorization server returns an access token to the application.
After receiving the token, the client should store it in whatever way it prefers, whether in localStorage, sessionStorage, an HttpOnly cookie, or another client-side storage mechanism.
Every time the client accesses a route that requires authentication, it simply sends this token to the API to authenticate the request and obtain the requested data.
The application uses the access token to access a protected resource (API). The server always validates this token before allowing or blocking a client request.
As an example, you can extract information about a certain order using the previously received token:
query {
  userOrder(id: "9dac9971-b947-421a-983e-33b22047a18c") {
    id
    status
    type
  }
}
HTTP header:
{
  "Authorization": "Bearer eyJhbGcxxx1NiJ9.eyJxxxpudWxsfQ.TE2ehxfNuxxx"
}
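Putting the query and the header together, a client call might look like the following sketch (the https://marionette.example/graphql endpoint URL is an assumption):
// Sends the authenticated GraphQL query from above using fetch().
async function fetchOrder(token) {
  const res = await fetch("https://marionette.example/graphql", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // The previously obtained JWT, passed with the Bearer schema.
      "Authorization": `Bearer ${token}`,
    },
    body: JSON.stringify({
      query: `query {
        userOrder(id: "9dac9971-b947-421a-983e-33b22047a18c") {
          id
          status
          type
        }
      }`,
    }),
  });
  const { data } = await res.json();
  return data.userOrder;
}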
There are several options for storing tokens, each with its own costs and benefits. Briefly, the options are: in JavaScript memory, in sessionStorage, in localStorage, or in a cookie. The main trade-off is security: any information stored outside of the current application's memory is vulnerable to Cross-Site Scripting (XSS) attacks. Marionette uses HttpOnly cookies as the acceptable way to keep client state (HttpOnly is an additional flag included in a Set-Cookie HTTP response header). Using the HttpOnly flag when generating a cookie helps mitigate the risk of client-side script accessing the protected cookie. If the HttpOnly flag is included in the HTTP response header, the cookie cannot be accessed through client-side script. As a result, even if a cross-site scripting flaw exists and a user accidentally accesses a link that exploits it, the browser will not reveal the cookie to a third party.
Briefly speaking, the HttpOnly flag is always set, and your browser should not allow a client-side script to access the session cookie.
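For example, a response setting the session cookie with this flag might include a header like the following (the cookie name is illustrative):
Set-Cookie: token=eyJhbGcxxx1NiJ9.eyJxxxpudWxsfQ.TE2ehxfNuxxx; HttpOnly; Secure; Path=/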
Re-defining the Service Layer of the Marionette Stack
Let's dive into the technicalities of how the Marionette back-end is constructed. A microservice is a JavaScript module containing some part of the Marionette application. It is isolated and self-contained, meaning that even if it goes offline or crashes, the remaining services are unaffected. Inside a service there are definitions of actions and subscriptions to events. From an architectural point of view, the Marionette back-end can be seen as a composition of two independent parts: the set of core services and the gateway service. The former is responsible for business logic, while the latter simply receives users' requests and conveys them to the other services. To ensure that the Marionette back-end is resilient to failures, the core and gateway services run on dedicated nodes. Running services on dedicated nodes means that a transporter module is required for inter-service communication. Most of the transporters supported by the framework rely on a message broker for inter-service communication. Overall, the internal architecture of the Marionette back-end is represented in the figure below.
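To make this concrete, here is a minimal sketch of a core service, assuming a Moleculer-style framework (the document describes brokers, transporters, actions, and events, which match Moleculer's concepts, but the concrete framework and transporter are our assumptions):
const { ServiceBroker } = require("moleculer");

// NATS is one example of a message-broker transporter; the actual
// transporter used by Marionette is not specified here.
const broker = new ServiceBroker({ transporter: "NATS" });

broker.createService({
  name: "markets",
  actions: {
    // The "list all markets" action invoked via the gateway.
    async list(ctx) {
      return [{ pair: "BTC/USD" }, { pair: "ETH/USD" }]; // placeholder data
    },
  },
});

broker.start();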
Now, assuming that the back-end services are up and running, the back-end can serve users' requests. So let’s see what actually happens with a request to list all available markets. First, the request (GET /markets) is received by the HTTP server running on the gateway node. The incoming request is passed from the HTTP server to the Gateway service, which does all the processing and mapping. In this particular case, the user's request is mapped to the “list all markets” action of the Markets service. Next, the request is passed to the broker, which checks whether the Markets service is a local or a remote service. In this case, the Markets service is remote, so the broker needs to use the transporter module to deliver the request. The transporter simply grabs the request and sends it through the communication bus. Since all nodes are connected to the same communication bus (the message broker), the request is successfully delivered to the Markets service node. Upon reception, the service broker of the Markets service node parses the incoming request and forwards it to the Markets service. Finally, the Markets service invokes the “list all markets” action and returns the list of all available markets. The response is then forwarded back to the end user.
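On the gateway side, the mapping described above ultimately boils down to a broker call along these lines (the same Moleculer-style assumption as before; the action name markets.list is hypothetical):
// Hypothetical gateway-side handling of GET /markets: the broker
// (and, for remote nodes, the transporter) locates the Markets
// service and delivers the request.
broker.call("markets.list").then((markets) => {
  // The list of markets is handed back to the HTTP layer and
  // returned to the end user.
  console.log(markets);
});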