I’m amped by @yaamehn’s opinionated recent talk, Microservices A Reference Architecture, at the Microservices Australia Meetup, which has got me thinking about the many interesting possibilities Microservices stirs up. The talk touches on many useful techniques, regardless of whether you actually adopt Microservices.
On the surface Microservices doesn’t appear to be anything groundbreakingly new. It does however push tried-and-true concepts (e.g. abstraction, decoupling, modularisation, continuous delivery) to extreme levels. Fused with innovations in the operations space, such as containerisation with Docker, this can lead to some very powerful outcomes, such as efficient continuous delivery (on a micro scale), decoupling of domains allowing for best-of-breed technologies, and so on.
Monoliths (i.e. software that is bundled up as a single atomic distribution) are often criticised when compared with Microservices architectures, yet they continue to exhibit a number of advantages over this modern approach. For example, monoliths are free to make extensive use of in-process communication between modules, as opposed to facing the many challenges that come with having conversations with a myriad of distributed software systems. Although not as relevant today, the fallacies of distributed computing, coined by L Peter Deutsch at Sun Microsystems all the way back in 1994, still ring true.
Essentially everyone, when they first build a distributed application, makes the following eight assumptions. All prove to be false in the long run and all cause big trouble and painful learning experiences.
- The network is reliable
- Latency is zero
- Bandwidth is infinite
- The network is secure
- Topology doesn’t change
- There is one administrator
- Transport cost is zero
- The network is homogeneous
Anyway, below are some (random and orderless) notes from @yaamehn’s fascinating talk.
Things talking to other things. Lots of concerns such as transports, formats, serialisation, discovery and so on.
- Seriously. Just use HTTP.
- Never go full REST. The maturity model. Hypermedia can be cool, and is interesting from an academic standpoint, but has no place in the real world.
- Have a standard data format. JSON API. XSD. Pick a couple.
- Specification and portal. Swagger. API Blueprint. RAML. Things like documentation and client libraries should be generated from the spec. Keeping things DRY.
RESTful API Modeling Language (RAML) is a simple and succinct way of describing practically-RESTful APIs. It encourages reuse, enables discovery and pattern-sharing, and aims for merit-based emergence of best practices.
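To make the "generate everything from the spec" idea above concrete, here is a minimal sketch. The `SPEC` dictionary is a hypothetical, heavily simplified stand-in for a real Swagger, RAML or API Blueprint document, and the names `render_docs` and `build_client` are illustrative assumptions, not part of any of those tools.

```python
from types import SimpleNamespace

# Hypothetical, minimal API description; real spec formats are far richer.
SPEC = {
    "service": "customers",
    "endpoints": [
        {"name": "get_customer", "method": "GET",
         "path": "/customers/{id}", "summary": "Fetch a single customer."},
        {"name": "list_customers", "method": "GET",
         "path": "/customers", "summary": "List all customers."},
    ],
}


def render_docs(spec):
    """Render plain-text documentation from the spec."""
    lines = [f"API: {spec['service']}"]
    for ep in spec["endpoints"]:
        lines.append(f"{ep['method']} {ep['path']} - {ep['summary']}")
    return "\n".join(lines)


def build_client(spec, base_url):
    """Build a client whose methods return the (method, URL) pair to call."""
    def make_call(ep):
        def call(**params):
            return ep["method"], base_url + ep["path"].format(**params)
        return call

    return SimpleNamespace(
        **{ep["name"]: make_call(ep) for ep in spec["endpoints"]}
    )


client = build_client(SPEC, "http://customers.local")
print(render_docs(SPEC))
print(client.get_customer(id=42))
```

Both the documentation and the client come from the same source of truth, so they cannot drift apart.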
- Register. Registrator runs on all Docker hosts. When Docker containers start or stop, it automatically detects this and publishes/unpublishes their services.
- Lookup (DNS).
  - Host lookups. Given a name, give me an IP.
  - Service lookups. Goes deeper. Ports etc.
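The register and lookup steps above can be sketched with a toy in-memory registry. This is only an illustration of the idea: it is not how Registrator or DNS are implemented, and `ServiceRegistry` and its method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instance:
    host: str
    port: int


class ServiceRegistry:
    """Toy registry: containers publish on start and unpublish on stop."""

    def __init__(self):
        self._services = {}  # service name -> set of Instance

    def register(self, name, host, port):
        self._services.setdefault(name, set()).add(Instance(host, port))

    def unregister(self, name, host, port):
        self._services.get(name, set()).discard(Instance(host, port))

    def lookup_hosts(self, name):
        """Host lookup: given a name, return IPs only."""
        return sorted({i.host for i in self._services.get(name, set())})

    def lookup_service(self, name):
        """Service lookup: goes deeper and returns (host, port) pairs."""
        return sorted(
            (i.host, i.port) for i in self._services.get(name, set())
        )


registry = ServiceRegistry()
registry.register("orders", "10.0.0.5", 8080)
registry.register("orders", "10.0.0.6", 8081)
print(registry.lookup_hosts("orders"))
print(registry.lookup_service("orders"))
```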
On the client side there’s lots of plumbing to take care of:
- Flow control. Backoff.
- Circuit breakers.
- Versioning and Postel’s Law.
- Serialisation and deserialisation.
- Understanding documentation (hopefully without misinterpretation).
- Ideally, automatically generate the code and publish a jar, a gem, or whatever module suits the technology stacks you are supporting.
- Go with a framework like Finagle to take this burden on.
- Standardise a way of creating client libs for each service that takes care of the above problems.
Finagle is an extensible RPC system for the JVM, used to construct high-concurrency servers. Finagle implements uniform client and server APIs for several protocols, and is designed for high performance and concurrency.
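Two of the client-side concerns listed above, backoff and circuit breakers, can be sketched in a few lines. This is a simplified illustration under assumed defaults (thresholds, timeouts and all names are hypothetical); it is not how Finagle implements either technique.

```python
import time


def backoff_delays(base=0.1, factor=2.0, max_delay=5.0, attempts=5):
    """Return capped exponential backoff delays (seconds), one per retry."""
    return [min(base * factor ** n, max_delay) for n in range(attempts)]


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is refused."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock          # injectable so it can be tested
        self._failures = 0
        self._opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            half_open = self._opened_at is not None
            if half_open or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # (re)open the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A generated client library would typically wrap every remote call in `breaker.call(...)` and sleep for the next backoff delay between retries, so a struggling downstream service is given room to recover.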
Events are the “jewel” of Microservices. It’s commonplace to focus most attention on “services”, and one can easily get caught up thinking in terms of a synchronous nest of request/response conversations. This is a disaster waiting to happen. Events need to be treated as first-class citizens.
What data goes into an event? More specifically, what structure and metadata make up an event? Give this careful consideration: depending on the problem you’re solving, the right choice can provide flexibility tomorrow that you may not realise you need today.
Three common approaches:
- Snapshot. The entity (e.g. customer), at a point in time.
- Callback. The fact something interesting just happened (e.g. customer updated), with a reference URI to callback on.
- Delta. Both before and after versions of the entity, including differences.
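The three event shapes above might look something like the following. The `Customer` entity, the event type strings and the field names are all assumptions made for illustration; the talk did not prescribe a schema.

```python
from dataclasses import dataclass, asdict


@dataclass
class Customer:
    id: int
    name: str
    email: str


def snapshot_event(customer):
    """Snapshot: the whole entity at a point in time."""
    return {"type": "customer.snapshot", "entity": asdict(customer)}


def callback_event(customer, base_uri):
    """Callback: only the fact that something happened, plus a URI."""
    return {
        "type": "customer.updated",
        "uri": f"{base_uri}/customers/{customer.id}",
    }


def delta_event(before, after):
    """Delta: before and after versions, plus the fields that differ."""
    b, a = asdict(before), asdict(after)
    changed = {k: {"from": b[k], "to": a[k]} for k in b if b[k] != a[k]}
    return {"type": "customer.delta", "before": b, "after": a,
            "changed": changed}


old = Customer(1, "Ada", "ada@example.com")
new = Customer(1, "Ada", "ada@example.org")
print(delta_event(old, new)["changed"])
```

The trade-off is roughly payload size against consumer convenience: a callback is tiny but forces a round trip, while a delta is the largest but lets consumers react to exactly what changed.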
Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.
Kafka is a convenient place to centralise logs. Absolutely everything. Not only will Kafka easily handle it, but downstream consumers can subscribe to the logs/messages of interest, catering for both real-time and batch-based consumers.
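The commit log idea is what lets real-time and batch consumers coexist: the log is append-only, and each consumer simply tracks its own offset. The toy below is an in-memory illustration of that concept only; it does not use or mimic the actual Kafka client API, and `CommitLog` and `Consumer` are hypothetical names.

```python
class CommitLog:
    """Append-only list of records, addressed by offset."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read(self, offset, max_records=None):
        end = None if max_records is None else offset + max_records
        return self._records[offset:end]


class Consumer:
    """Each consumer owns its position, so consumers never interfere."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self, max_records=None):
        records = self._log.read(self.offset, max_records)
        self.offset += len(records)
        return records


log = CommitLog()
realtime, batch = Consumer(log), Consumer(log)

log.append("web-1 GET /orders 200")
print(realtime.poll())   # the real-time consumer reads as records arrive
log.append("web-2 GET /orders 500")
print(realtime.poll())
print(batch.poll())      # the batch consumer catches up on everything later
```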
Docker is a no brainer. Just use it. Materialise isolated, predictable environments fast. Maximise dev/prod parity.
Container-specific configuration with templating
There are lots of ways to tackle container configuration. A simple, ubiquitous and effective method is to use environment variables. Yes, environment variables. Consistent with The Twelve-Factor App way of thinking…supported absolutely everywhere, they just work.
Env vars are easy to change between deploys without changing any code; unlike config files, there is little chance of them being checked into the code repo accidentally; and unlike custom config files, or other config mechanisms such as Java System Properties, they are a language and OS agnostic standard.
The code running in a container may have various configuration files, for example a `*.properties` file that is part of some Java software. Use environment variables to provide container-specific state. On startup the container should template all configuration based on its environment, and then start the service.
Who cares really. The point is to keep it simple.
- bash with sed.
- Node.js with Jade.
- Mustache.java with Java.
- Scala with Scalate.
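Whichever engine you pick from the list above, the startup step amounts to "render the config templates from the environment, then start the service". A minimal sketch using only Python’s standard library follows; the variable names `DB_HOST` and `DB_PORT` and the property keys are hypothetical.

```python
import os
from string import Template


def render_config(template_text, env=None):
    """Fill ${VAR} placeholders from the environment.

    Raises KeyError if a referenced variable is missing, so a
    misconfigured container fails at startup rather than at runtime.
    """
    env = os.environ if env is None else env
    return Template(template_text).substitute(env)


# Hypothetical template for a Java-style properties file.
TEMPLATE = "db.host=${DB_HOST}\ndb.port=${DB_PORT}\n"

print(render_config(TEMPLATE, {"DB_HOST": "10.0.0.5", "DB_PORT": "5432"}))
```

In a container entrypoint this would run once, write the rendered text to the real config path, and then hand over to the service process.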
For configuration that is less static in nature, where dynamic reloads are appropriate:
`consul-template` queries a Consul instance and updates any number of specified templates on the filesystem.
Managing lots of Docker containers becomes a problem in itself. Docker orchestration systems are built to address that problem.
Take a container. Express the desired management characteristics of this container, such as the number of instances, the target specification of machines on which it should run, what to do in the event of failure (e.g. alert someone, attempt to restart X times), which other containers it in turn depends on, and whether it is stateless.
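The "express the desired state, let the orchestrator work out the rest" idea can be sketched as follows. The `ContainerSpec` fields and both helper functions are hypothetical simplifications, not the configuration format or scheduling logic of any real orchestration system.

```python
from dataclasses import dataclass, field


@dataclass
class ContainerSpec:
    name: str
    image: str
    instances: int = 1
    max_restarts: int = 3
    depends_on: list = field(default_factory=list)
    stateless: bool = True


def reconcile(spec, running):
    """Return the actions needed to get from `running` to the desired count."""
    diff = spec.instances - running
    if diff > 0:
        return [("start", spec.name)] * diff
    if diff < 0:
        return [("stop", spec.name)] * -diff
    return []


def start_order(specs):
    """Order containers so dependencies start first.

    Simple depth-first sort; assumes the dependency graph has no cycles.
    """
    by_name = {s.name: s for s in specs}
    ordered, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in by_name[name].depends_on:
            visit(dep)
        ordered.append(name)

    for s in specs:
        visit(s.name)
    return ordered


specs = [
    ContainerSpec("web", "example/web", instances=3, depends_on=["api"]),
    ContainerSpec("api", "example/api", depends_on=["db"]),
    ContainerSpec("db", "example/db", stateless=False),
]
print(start_order(specs))
print(reconcile(specs[0], running=1))
```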
Some example implementation options (note this is an exploding field right now, expect lots of change, standardisation and consistency improvements over the next year):
- Apache Mesos
- [Google Kubernetes](https://github.com/GoogleCloudPlatform/kubernetes)
- Spotify Helios
A no brainer. In an ecosystem of interconnected containers and Microservices. A potential ops nightmare.
While the above solution is great, it doesn’t help us understand the hierarchy of calls/logs, that is, when services in turn consume other services. A layman’s approach might be to go down the synthetic correlation identifier path…unfortunately this just ends up flattening out the hierarchy. It’s actually an interesting dilemma. Luckily some clever Google and Twitter engineers have done some thinking for us: Google’s Dapper and Twitter’s Zipkin are both good places to start.
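The reason a single correlation identifier flattens things is that it says which request a log line belongs to, but not which call caused which. Tracing systems in the Dapper and Zipkin style address this by also recording a span identifier and its parent. The sketch below illustrates just that idea; the field and function names are illustrative and do not reproduce either system’s actual data format.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Span:
    trace_id: str             # shared by every call in one request
    span_id: str              # unique to this call
    parent_id: Optional[str]  # the caller's span_id; None at the root
    name: str


def new_id():
    return uuid.uuid4().hex[:16]


def start_trace(name):
    return Span(new_id(), new_id(), None, name)


def child_span(parent, name):
    """Propagate the trace id and record who called us."""
    return Span(parent.trace_id, new_id(), parent.span_id, name)


def render_tree(spans):
    """Rebuild the call hierarchy from a flat list of spans."""
    children = {}
    for s in spans:
        children.setdefault(s.parent_id, []).append(s)
    lines = []

    def walk(parent_id, depth):
        for s in children.get(parent_id, []):
            lines.append("  " * depth + s.name)
            walk(s.span_id, depth + 1)

    walk(None, 0)
    return lines


root = start_trace("gateway")
orders = child_span(root, "orders")
inventory = child_span(orders, "inventory")
billing = child_span(root, "billing")
print("\n".join(render_tree([root, orders, inventory, billing])))
```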
All Microservices expose a `/stats` endpoint that returns a standardised JSON structure with health statistics. Periodically `collectd` scoops up all `/stats` endpoints, enriches this health data with more host-related data from the Docker daemon, and pushes this bundle of health data to a monitoring service (e.g. Circonus).
Tip: use environment variables such as `STATS_URI` to signal the monitoring capabilities of a container to the outside world, allowing for introspection on a container-by-container basis.
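A `/stats` endpoint along these lines could be sketched with Python’s standard library as below. The JSON field names and the `COUNTERS` dictionary are assumptions for illustration; the talk did not specify the standardised structure. Only `STATS_URI` comes from the tip above.

```python
import json
import os
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

START = time.monotonic()
# Hypothetical counters the service would update as it handles requests.
COUNTERS = {"requests": 0, "errors": 0}


def build_stats(counters, now=None):
    """Build the health payload (field names are illustrative)."""
    now = time.monotonic() if now is None else now
    return {
        "status": "ok" if counters["errors"] == 0 else "degraded",
        "uptime_seconds": round(now - START, 3),
        "requests": counters["requests"],
        "errors": counters["errors"],
    }


class StatsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/stats":
            self.send_error(404)
            return
        body = json.dumps(build_stats(COUNTERS)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Blocking; intended to be called from the service's entry point."""
    HTTPServer(("0.0.0.0", port), StatsHandler).serve_forever()


def advertised_stats_uri(env=None):
    """Read the STATS_URI a container advertises, or None if unset."""
    env = os.environ if env is None else env
    return env.get("STATS_URI")
```

A collector can then read `STATS_URI` from each container’s environment to discover where, and whether, that container can be polled.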