Getting a Sankey chart right starts with formatting the data correctly, and Sankey data formatting is one of the most common questions we get from users.
Read on for a breakdown of how a dataset needs to be formatted to work in a Sankey.
How to format your data to build a Sankey
To build a Sankey, you need to wrangle your data into a long format, that is, one row per record.
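If your data starts out as a wide cross-table (one row per source, one column per target), it has to be reshaped before it fits this layout. The sketch below shows one way to do that in plain Python. The `wide` table, its numbers, and the column names `Source`, `Target`, and `Value` are made up for illustration and are not taken from any real dataset.

```python
# Hypothetical wide table: one entry per source, one nested entry per target.
# All numbers are invented for illustration.
wide = {
    "Afghanistan": {"United Kingdom": 15, "Canada": 7},
    "Syria": {"United Kingdom": 0, "Canada": 3},
}


def wide_to_long(wide_table):
    """Turn a wide source-by-target table into one row per (source, target) pair."""
    long_rows = []
    for source, targets in wide_table.items():
        for target, value in targets.items():
            if value:  # skip empty or zero cells, which carry no flow
                long_rows.append(
                    {"Source": source, "Target": target, "Value": value}
                )
    return long_rows


long_rows = wide_to_long(wide)
for row in long_rows:
    print(row)
```

Each resulting row describes a single flow, which is the one-row-per-record shape described above.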
1. Make sure that in the data sheet, you have one row per record. The data may look repetitive, but this is okay.
This is an example of what your data should look like.
In this dataset, we are looking at the number of refugees resettled in 2020. Notice how we have multiple rows with the same Source ("Country of origin") and Target ("Country of asylum") values, but with different counts for "Cases".
This isn't a problem, because the template will aggregate these rows and adjust the width of the flow accordingly.
2. Next, bind your Source and Target columns to the correct data bindings. In our example, "Country of origin" is our Source column and "Country of asylum" is our Target column.
Usually, you'll also want to bind a column to Values to size your links, though this isn't required. If you don't add a column of values, your Sankey will size links based on the number of rows in the data.
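To make the sizing rule concrete, here is a minimal sketch of how repeated Source and Target pairs could be collapsed into link sizes, either by summing a value column or by counting rows. It only mimics the behaviour described above and is not the template's actual implementation. The function `link_sizes` and the sample numbers are hypothetical.

```python
from collections import defaultdict


def link_sizes(records, value_column=None):
    """Collapse repeated (source, target) rows into one size per link.

    With a value column, sizes are the sum of that column.
    Without one, each row counts as 1, so sizes are row counts.
    """
    sizes = defaultdict(int)
    for record in records:
        key = (record["Country of origin"], record["Country of asylum"])
        sizes[key] += record[value_column] if value_column else 1
    return dict(sizes)


# Invented sample rows in long format (one row per record).
records = [
    {"Country of origin": "Afghanistan", "Country of asylum": "United Kingdom", "Cases": 10},
    {"Country of origin": "Afghanistan", "Country of asylum": "United Kingdom", "Cases": 5},
    {"Country of origin": "Syria", "Country of asylum": "Canada", "Cases": 3},
]

by_value = link_sizes(records, value_column="Cases")  # sized by summed "Cases"
by_rows = link_sizes(records)                         # sized by number of rows
print(by_value)
print(by_rows)
```

In this sample the two Afghanistan to United Kingdom rows merge into one link, sized 15 when "Cases" is used and 2 when only rows are counted.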
The resulting Sankey would look like this:
How to format your data to build an alluvial diagram
Alluvial diagrams represent discrete flows between elements, meaning that the flow has ordered stages or steps. This makes the data wrangling slightly different from that for a Sankey diagram.
1. First, you'll need to make sure your dataset is in a long format (one row per record), just like with a Sankey diagram.
The easiest way to do this is by identifying your Source and Target nodes, which determine where your data is coming from and where it is going. Even though we are working with steps, all records need to be added to these two columns.
In the example above, we first plot the flow from Afghanistan to Europe, and then the flow from Europe to the United Kingdom, all in the same two columns.
TIP: Need more information on how to change your dataset? You can find a host of resources on how to transform your data here.
2. Then you have to specify the steps. Unlike Sankeys, alluvial diagrams follow a specific order, so we need to tell the template what that order is.
Steps are determined by numeric values: either dates or simple numbers that specify the steps. In this example, we are just using 1, 2, and 3 because there are only three steps in this flow: from "Country of origin" (step 1), to "Region of asylum" (step 2) and, lastly, to "Country of asylum" (step 3).
3. In the Data tab, you'll need to bind your Source and Target columns, as well as your Step from and Step to columns.
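The steps above can be sketched in code. The example below assumes your records start as full paths (origin, region, country of asylum) and splits each path into consecutive Source and Target rows with numbered steps. The function `to_alluvial_rows`, the "Cases" figure, and the exact column names are illustrative assumptions, so adjust them to match your own data sheet.

```python
def to_alluvial_rows(paths):
    """Split each multi-stage path into one row per consecutive pair of stages.

    `paths` is a list of (stages, value) tuples, where `stages` is an ordered
    sequence such as (country of origin, region of asylum, country of asylum).
    Steps are numbered from 1 in the order the stages appear.
    """
    rows = []
    for stages, value in paths:
        for i in range(len(stages) - 1):
            rows.append(
                {
                    "Source": stages[i],
                    "Target": stages[i + 1],
                    "Step from": i + 1,
                    "Step to": i + 2,
                    "Cases": value,
                }
            )
    return rows


# Invented sample path: the value 10 is made up for illustration.
paths = [(("Afghanistan", "Europe", "United Kingdom"), 10)]

alluvial_rows = to_alluvial_rows(paths)
for row in alluvial_rows:
    print(row)
```

The single path becomes two rows, Afghanistan to Europe (step 1 to 2) and Europe to the United Kingdom (step 2 to 3), both stored in the same Source and Target columns.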
The resulting alluvial diagram would look like this: