Tutorial: Visual analytics of heterogeneous data
In this tutorial, we will explore how Curio can facilitate visual analytics of heterogeneous data by integrating various data sources such as raster data, sensor data, and geospatial data to analyze and visualize urban microclimate in Milan. Here is the overview of the entire dataflow pipeline:
Before you begin, please familiarize yourself with Curio’s main concepts and functionalities by reading our quick start guide.
The data for this tutorial can be found here.
For completeness, we also include the template code in each dataflow step.
Step 1: Load high-resolution mean radiant temperature data
The icons on the left-hand side can be used to instantiate different nodes, including data loading nodes. Let’s start by instantiating a data loading node and changing its view to Code. Then, we load the high-resolution mean radiant temperature data:
import rasterio
timestamp = 12
src = rasterio.open(f'Milan_Tmrt_2022_203_{timestamp:02d}00D.tif')
return src
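The file name encodes the hour of day as a zero-padded, two-digit field. The sketch below shows how the `{timestamp:02d}` format specifier behaves. The helper `tmrt_filename` is hypothetical, and it assumes that rasters for other hours follow the same naming pattern as the single file used above, which may not hold for the tutorial data.

```python
def tmrt_filename(hour: int) -> str:
    """Build the raster file name for a given hour of day (0-23).

    Hypothetical helper: the pattern is copied from the one file name used
    in the tutorial; other hours are only assumed to follow it.
    """
    if not 0 <= hour <= 23:
        raise ValueError(f"hour must be in 0..23, got {hour}")
    # ':02d' pads single-digit hours with a leading zero (9 -> "09").
    return f"Milan_Tmrt_2022_203_{hour:02d}00D.tif"


print(tmrt_filename(12))
print(tmrt_filename(9))
```

Keeping the hour in one variable, as the node does with `timestamp`, means a single edit is enough to switch to another time of day.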
Step 2: Loading meteorological data
Using a Data loading / file node, we load air temperature (Td), wind speed (Wind), and relative humidity (RH) data from the ERA5 hourly meteorological dataset.
import pandas as pd
sensor = pd.read_csv('Milan_22.07.2022_Weather_File_UMEP_CSV.csv', delimiter=';')
return sensor
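To see what the `delimiter=';'` argument does without the real file, here is a small sketch that parses an in-memory sample. Only the column names (`it`, `Td`, `Wind`, `RH`) come from the tutorial code; the values are made up, and the real file may contain additional columns.

```python
import io

import pandas as pd

# Made-up sample rows; only the column names are taken from the tutorial.
sample = "it;Td;Wind;RH\n11;30.1;1.8;42\n12;31.4;2.0;38\n13;32.0;2.3;35\n"

# The file uses ';' as the field separator, so it must be passed explicitly.
sensor = pd.read_csv(io.StringIO(sample), delimiter=";")

# Pick the row for one hour by matching the 'it' (hour) column.
row = sensor[sensor["it"] == 12]
tdb = row["Td"].values[0]
print(tdb)
```

Without the explicit delimiter, pandas would split on commas and read each line as a single column.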
Step 2.5: Merging raster and meteorological data
As an intermediate step, let’s merge the dataflows from Steps 1 and 2.
Step 3: Compute universal thermal climate index (UTCI)
In this step, we compute the Universal Thermal Climate Index (UTCI), a human biometeorology parameter used to assess human well-being in outdoor environments. The UTCI computation takes raster data as input, processes it, and produces another raster dataset as output. This output contains the UTCI value for each corresponding location in the grid.
To do that, we connect the loaded data (raster and tabular) with a custom analysis & modeling node that computes the UTCI.
import xarray as xr
from pythermalcomfort import models
import numpy as np
from rasterio.warp import Resampling
src = arg[0]
sensor = arg[1]
timestamp = 12
upscale_factor = 0.25  # values below 1 reduce the resolution (here to a quarter per axis)
dataset = src
data = dataset.read(
    out_shape=(
        dataset.count,
        int(dataset.height * upscale_factor),
        int(dataset.width * upscale_factor)
    ),
    resampling=Resampling.nearest,
    masked=True
)
data.data[data.data==src.nodatavals[0]] = np.nan
sensor = sensor[sensor['it']==timestamp]
tdb = sensor['Td'].values[0]
v = sensor['Wind'].values[0]
rh = sensor['RH'].values[0]
def xutci(tdb, tr, v, rh, units='SI'):
    return xr.apply_ufunc(
        models.utci,
        tdb,
        tr,
        v,
        rh,
        units
    )
utci = xutci(tdb, data[0], v, rh)
return (utci.tolist(), [data.shape[-1], data.shape[-2]])
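Two details of the node above are easy to miss: a factor below 1 shrinks the grid despite the name `upscale_factor`, and no-data cells are turned into NaN before the UTCI is computed. The sketch below reproduces both with NumPy only. The helpers `resampled_shape` and `nodata_to_nan`, the grid size, and the no-data value are all made up for illustration.

```python
import numpy as np


def resampled_shape(count, height, width, factor):
    """Return the (bands, rows, cols) shape requested from the raster reader."""
    return (count, int(height * factor), int(width * factor))


def nodata_to_nan(band, nodata):
    """Return a float copy of `band` with no-data cells replaced by NaN."""
    out = np.asarray(band, dtype="float64").copy()
    out[out == nodata] = np.nan
    return out


# A hypothetical 1000 x 2000 single-band raster read at a quarter resolution.
shape = resampled_shape(1, 1000, 2000, 0.25)
print(shape)

# A made-up 2 x 2 band in which -9999 marks missing cells.
band = nodata_to_nan([[35.0, -9999.0], [41.5, 38.2]], nodata=-9999.0)
print(band)
```

The node returns the grid size as `[width, height]` (`data.shape[-1]`, `data.shape[-2]`), which is the order Step 5 relies on when it rescales the raster transform.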
Step 4: Loading sociodemographic data
To study the relationship between UTCI and vulnerable populations, we create a new data node that loads sociodemographic data on the population older than 65 at the neighborhood level.
import geopandas as gpd
gdf = gpd.read_file('R03_21-11_WGS84_P_SocioDemographics_MILANO_Selected.shp')
return gdf
Step 5: Merge data
Now, we want to spatially join the UTCI raster with the sociodemographic data loaded in the previous step. To do that, we create another analysis & modeling node and run the following:
import numpy as np
from rasterstats import zonal_stats
dataset = arg[0]
utci = np.array(arg[1][0])
shape = arg[1][1]
gdf = arg[2]
transform = dataset.transform * dataset.transform.scale(
    (dataset.width / shape[0]),
    (dataset.height / shape[1])
)
joined = zonal_stats(gdf, utci, stats=['min','max','mean','median'], affine=transform)
gdf['mean'] = [d['mean'] for d in joined]
return gdf.loc[:, [gdf.geometry.name, 'mean', "gt_65"]]
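The transform is rescaled because the UTCI grid has fewer pixels than the original raster: each coarse pixel covers several original pixels, so the pixel size in the affine transform has to grow by the same ratio while the origin stays fixed. Here is a minimal pure-Python sketch of that arithmetic, with the affine written as a plain tuple `(a, b, c, d, e, f)`. The helper names, the grid sizes, and the coordinates are made up, and the real code uses the `Affine` object that rasterio provides.

```python
def scale_affine(transform, sx, sy):
    """Right-multiply an affine (a, b, c, d, e, f) by a scaling (sx, sy).

    The affine maps pixel indices to coordinates:
        x = a * col + b * row + c
        y = d * col + e * row + f
    """
    a, b, c, d, e, f = transform
    return (a * sx, b * sy, c, d * sx, e * sy, f)


def pixel_to_xy(transform, col, row):
    """Map a (col, row) pixel corner to map coordinates."""
    a, b, c, d, e, f = transform
    return (a * col + b * row + c, d * col + e * row + f)


# Made-up north-up grid with 1 m pixels, 2000 columns by 1000 rows.
fine = (1.0, 0.0, 500000.0, 0.0, -1.0, 5040000.0)
fine_width, fine_height = 2000, 1000

# Size of the resampled grid, in the [width, height] order used by Step 3.
coarse_width, coarse_height = 500, 250

coarse = scale_affine(fine, fine_width / coarse_width, fine_height / coarse_height)
print(coarse)

# The far corner of both grids lands on the same map coordinate.
print(pixel_to_xy(fine, fine_width, fine_height))
print(pixel_to_xy(coarse, coarse_width, coarse_height))
```

If the transform were passed to `zonal_stats` unscaled, the coarse raster would be read as covering only a fraction of the original extent, and the zones would be matched against the wrong pixels.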
We then filter the resulting GeoDataFrame to keep only the neighborhoods whose mean UTCI is greater than zero. Let’s create a new data cleaning node connected to the previous node and store the result in a data node:
import geopandas as gpd
gdf = arg
filtered_gdf = gdf.set_crs(32632)
filtered_gdf = filtered_gdf.to_crs(3395)
filtered_gdf = filtered_gdf[filtered_gdf['mean']>0]
filtered_gdf.metadata = {
    'name': 'census'
}
return filtered_gdf
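One side effect of the `mean > 0` filter is that it also drops rows whose mean is missing, because a comparison with NaN is always false. The sketch below illustrates this with a plain pandas DataFrame and made-up values. It assumes that a zone with no valid raster pixels ends up with `None` as its mean, which is worth checking against the rasterstats version in use.

```python
import pandas as pd

# Made-up values; None stands in for a zone that overlaps no valid pixels.
df = pd.DataFrame({"gt_65": [120, 85, 240]})
df["mean"] = [39.1, None, 40.2]

# NaN > 0 evaluates to False, so the row with the missing mean is removed too.
filtered = df[df["mean"] > 0]
print(filtered)
```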
Step 6: Create a visualization map
We can visualize the result of the previous operations by adding a UTK map. The grammar for the map is automatically populated once it receives an input from a previous node.
Step 7: Create a linked scatterplot
In this step, we create a linked scatterplot through a Vega-Lite node connected to the output of the data node in Step 5.
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "params": [
    {"name": "clickSelect", "select": "interval"}
  ],
  "mark": {
    "type": "point",
    "cursor": "pointer"
  },
  "encoding": {
    "x": {"field": "gt_65", "type": "quantitative"},
    "y": {"field": "mean", "type": "quantitative", "scale": {"domain": [37, 42]}},
    "fillOpacity": {
      "condition": {"param": "clickSelect", "value": 1},
      "value": 0.3
    },
    "color": {
      "field": "interacted",
      "type": "nominal",
      "condition": {"test": "datum.interacted === '1'", "value": "red", "else": "blue"}
    }
  },
  "config": {
    "scale": {
      "bandPaddingInner": 0.2
    }
  }
}
Step 8: Create linked boxplot
To create a box plot, we first create a “Data Cleaning” node (connected to the data node of Step 5) that drops the attributes we are not interested in and keeps only the population older than 65 (gt_65).
gdf = arg
return gdf.loc[:, ["gt_65"]]
Finally, we create a Vega-Lite node connected to the data cleaning node:
{
  "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
  "transform": [
    {
      "fold": ["gt_65"],
      "as": ["Variable", "Value"]
    }
  ],
  "mark": {
    "type": "boxplot",
    "size": 60
  },
  "encoding": {
    "x": {"field": "Variable", "type": "nominal", "title": "Variable"},
    "y": {"field": "Value", "type": "quantitative", "title": "Value"}
  }
}
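The `fold` transform in the specification above reshapes the data from wide to long form: each listed column becomes a pair of fields, one holding the column name (`Variable`) and one holding the value (`Value`). As a rough Python analogy, not part of the Curio dataflow, `DataFrame.melt` in pandas does a similar reshaping. The values below are made up.

```python
import pandas as pd

# Made-up counts of residents older than 65 for three neighborhoods.
df = pd.DataFrame({"gt_65": [120, 85, 240]})

# Wide -> long: one row per (column name, value) pair.
long_df = df.melt(value_vars=["gt_65"], var_name="Variable", value_name="Value")
print(long_df)
```

One difference is that `fold` keeps the original fields alongside the new ones, whereas `melt` keeps only the columns listed in `id_vars`. With a single folded column the box plot has one box; adding more column names to `fold` would place one box per variable on the x axis.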
Step 9: Link map and scatterplot
The map, scatterplot, and boxplot are linked through interaction edges (red ones) connected to the data node, allowing for the analysis of outliers of concern, i.e., regions that have a large population of older adults and high UTCI.
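The same kind of outlier can also be flagged programmatically as a cross-check on what the linked views show. The sketch below marks rows where both the older population and the mean UTCI are at or above a chosen quantile. The helper `flag_concern`, the 0.75 quantile, and the sample values are illustrative assumptions rather than part of the tutorial's dataflow.

```python
import pandas as pd


def flag_concern(df, q=0.75):
    """Add a boolean 'concern' column for rows that are high on both axes.

    Hypothetical helper: the quantile cut-off is an arbitrary choice.
    """
    pop_cut = df["gt_65"].quantile(q)
    utci_cut = df["mean"].quantile(q)
    out = df.copy()
    out["concern"] = (out["gt_65"] >= pop_cut) & (out["mean"] >= utci_cut)
    return out


# Made-up neighborhoods: older population count and mean UTCI in degrees C.
df = pd.DataFrame({
    "gt_65": [50, 80, 120, 300, 310],
    "mean": [38.0, 38.5, 39.0, 41.0, 38.2],
})

flagged = flag_concern(df)
print(flagged[flagged["concern"]])
```

Only the fourth neighborhood is flagged here: the fifth has a large older population but a comparatively low mean UTCI, and the third has a high UTCI but fewer older residents.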
Final result
This tutorial demonstrates how Curio can be used for visual analytics involving heterogeneous data sources. By integrating raster, tabular, and geospatial data, we can conduct comprehensive analyses of urban microclimate and visualize the results effectively. The linkage between different types of data and interactive visualization enables a deeper understanding of the relationships and potential areas of concern.