
Diagrams as Code in Python

Creating cloud system architecture diagrams with Python

Photo by MARIOLA GROBELSKA on Unsplash

Over the last few years I have used many different tools for drawing system architecture diagrams for data platforms and cloud designs, including draw.io and Excalidraw.

Even though these platforms offer a wide range of features for drawing the diagram you want, I have always struggled with a few things. Firstly, it was not easy to share diagrams with other people in the organisation in a way that allowed them to update or modify them. Secondly, it was almost impossible to version control my diagrams: in most cases I had to save a file containing metadata that could later be used to reload an old diagram.

Lastly, my diagrams lacked consistency which, in my opinion, is quite important given that you are expected to create several different diagrams and present them to users and colleagues. By consistency I mean being able to use the various diagram components, including edges and nodes, in the same way across diagrams.

Quite recently, I came across a Python package that lets you draw cloud system architecture in Python code without any design tools. In other words, it offers Diagrams as Code: you can draw diagrams programmatically whilst being able to version control them.


The Diagrams package

Diagrams is a Python package for creating cloud system architecture diagrams. It supports six major providers: Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform (GCP), Kubernetes, Alibaba Cloud and Oracle Cloud. It also supports other commonly used technologies such as specific programming languages, frameworks and chat systems, among many other nodes. You also have the option to create custom nodes to serve your specific use-cases.

The package requires Python 3.6 or higher, so before giving it a go, make sure that you have a compatible Python version on your host machine. You also have to install Graphviz, the open source graph visualization software that the Diagrams package uses to render the diagrams.

macOS users with Homebrew installed can install Graphviz via brew install graphviz. Similarly, Windows users with Chocolatey installed can run choco install graphviz.
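Both prerequisites can be checked from Python itself before installing the package. The snippet below is a minimal sketch that uses only the standard library. The helper name find_problems is hypothetical, and the sketch assumes that your Graphviz install exposes its dot executable on the PATH.

```python
import shutil
import sys

MIN_PYTHON = (3, 6)  # minimum version stated for the Diagrams package


def find_problems(python_version, dot_path):
    """Return a list of unmet prerequisites (hypothetical helper).

    python_version: a tuple such as sys.version_info
    dot_path: path to Graphviz's `dot` executable, or None if it was not found
    """
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.6 or higher is required")
    if dot_path is None:
        problems.append("Graphviz 'dot' executable was not found on the PATH")
    return problems


# Check the current interpreter and look up `dot` on the PATH
problems = find_problems(sys.version_info, shutil.which("dot"))
print(problems or "Python and Graphviz look ready")
```

An empty list means that both requirements appear to be met on the current machine.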

  • Diagrams Documentation

Now that you have double checked your Python version and installed Graphviz you can go ahead and install the package through pip:

$ pip install diagrams

Creating Cloud Architecture Diagrams in Python

Before creating our first Diagram as Code, let’s explore some of the most fundamental components of this package.

The first one is Diagram, which is the primary object representing a global diagram context. You can use Diagram as a context manager:

from diagrams import Diagram
from diagrams.gcp.analytics import BigQuery

# Everything created inside the with block belongs to this diagram
with Diagram('My Diagram'):
    BigQuery('Data Warehouse')

The above code snippet will create a diagram consisting of a single BigQuery node which is the managed Data Warehouse service on Google Cloud Platform.

The second fundamental component of the package is Node, an abstract concept that represents a single system component object. A typical Node consists of three basic parts: provider, resource type and name. For example, the BigQuery node that we used in the previous code snippet is provided by the gcp provider and is of the analytics resource type (and BigQuery corresponds to the name of the node). Note that you can even create your own custom node:

from diagrams import Diagram, Node
with Diagram('My Diagram'):
    Node('This is a custom node')
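The provider and resource type of a node are visible in its import path, which follows the pattern diagrams.<provider>.<resource type>. As a small sketch, the hypothetical helper provider_and_type below reads both parts back from a node class. It relies on the module layout of the package staying as it is today, so treat it as an illustration and not as a supported API.

```python
from diagrams.aws.compute import EC2
from diagrams.gcp.analytics import Dataflow
from diagrams.onprem.database import MongoDB


def provider_and_type(node_class):
    """Split a node class's module path into (provider, resource type).

    Hypothetical helper: assumes the module path has the form
    diagrams.<provider>.<resource type>.
    """
    _, provider, resource_type = node_class.__module__.split(".")
    return provider, resource_type


# Print the module path next to the parts extracted from it
for node_class in (EC2, Dataflow, MongoDB):
    print(node_class.__module__, provider_and_type(node_class))
```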

The third component that is crucial for creating diagrams with this library is Cluster, which lets you group multiple Nodes together so that they are isolated from any Nodes not contained in the cluster.

As an example, consider the following Diagram consisting of three nodes: one for Google Cloud Storage (a managed object storage service on Google Cloud Platform), a Cloud SQL node (a managed relational database service on GCP) and an on-prem MongoDB.

from diagrams import Cluster, Diagram
from diagrams.gcp.database import SQL
from diagrams.gcp.storage import GCS
from diagrams.onprem.database import MongoDB

with Diagram('My Diagram', direction='TB'):
    gcs = GCS('Google Cloud Storage')

    # Nodes created inside the Cluster block are drawn as one group
    with Cluster('Databases'):
        cloud_sql = SQL('Cloud SQL')
        mongodb = MongoDB('MongoDB')
Example Diagram with a Cluster of Databases - Source: Author

Finally, the last fundamental component of a Diagram is the Edge, an object that represents a connection between two Node objects and has three properties: label, color and style.

from diagrams import Diagram, Edge, Node
with Diagram('My Diagram', direction='TB'):
    n1 = Node('n1')
    n2 = Node('n2')
    n3 = Node('n3')
    n4 = Node('n4')
    n5 = Node('n5')
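    n6 = Node('n6')

    # Directed edge from n1 to n2
    n1 >> n2
    # Undirected edge between n3 and n4
    n3 - n4
    # Directed edge with a custom label and color
    n5 >> Edge(label='This is a label', color='red') >> n6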
Example with different types of Edges - Source: Author

Creating an Architecture Diagram

Now that we’ve learned about the fundamental objects that make up a Diagram in Python, let’s create a more realistic flow using the components we mentioned earlier.

In the following example (taken from the official documentation) we create a diagram that corresponds to a Message Collecting System on Google Cloud Platform.

from diagrams import Cluster, Diagram
from diagrams.gcp.analytics import BigQuery, Dataflow, PubSub
from diagrams.gcp.compute import AppEngine, Functions
from diagrams.gcp.database import BigTable
from diagrams.gcp.iot import IotCore
from diagrams.gcp.storage import GCS

with Diagram("Message Collecting", show=False):
    pubsub = PubSub("pubsub")

    # Three IoT sources publish to the same Pub/Sub node
    with Cluster("Source of Data"):
        [IotCore("core1"),
         IotCore("core2"),
         IotCore("core3")] >> pubsub

    # Clusters can be nested to group related targets
    with Cluster("Targets"):
        with Cluster("Data Flow"):
            flow = Dataflow("data flow")

        with Cluster("Data Lake"):
            flow >> [BigQuery("bq"),
                     GCS("storage")]

        with Cluster("Event Driven"):
            with Cluster("Processing"):
                flow >> AppEngine("engine") >> BigTable("bigtable")

            with Cluster("Serverless"):
                flow >> Functions("func") >> AppEngine("appengine")

    pubsub >> flow
Message Collecting Diagram on Google Cloud Platform - Source: Documentation

Final Thoughts

Architecture diagrams are quite important as they give a clear picture of how various components work across an organisation. Therefore, it’s important to get them right and also present them in a nice, intuitive and consistent way.

Additionally, it’s also important to be able to share these diagrams in such a way that various people can easily modify them whilst keeping them under version control.

Diagrams as Code is an approach that can help you move in this direction when it comes to drawing and sharing architecture diagrams. In today’s tutorial, we showcased how to take advantage of the diagrams package in order to programmatically create diagrams with Python.

