Tutorial: Expert-in-the-loop urban accessibility analyses

In this tutorial, we are going to learn how Curio can facilitate expert-in-the-loop inspection of a computer vision model for sidewalk surface material classification. Here is an overview of the whole dataflow pipeline:

Before you begin, please familiarize yourself with Curio’s main concepts and functionalities by reading our quick start guide.

The data for this tutorial can be found here.

For completeness, we also include the template code in each dataflow step.

Step 0: Initializing Curio

In order to run this tutorial, make sure you satisfy the requirements from CitySurfaces.
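The code in this tutorial relies on PyTorch, segmentation_models_pytorch, Albumentations, the Cityscapes helper scripts, pandas, and GeoPandas. As a quick sanity check, you can verify that your environment can import them and that a GPU is visible (a minimal sketch; the package list is simply inferred from the imports used in the steps below):

# Minimal environment check (package list inferred from the imports used in this tutorial)
import torch
import albumentations
import segmentation_models_pytorch
import cityscapesscripts
import pandas
import geopandas

print('CUDA available:', torch.cuda.is_available())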

After initializing Curio, you will see a blank canvas.

Step 1: Loading the model training node

The icons on the left-hand side of the interface can be used to instantiate different nodes, including analysis & modeling nodes. Let’s start by instantiating an Analysis & Modeling node and changing its view to Code. Then, we set up the training procedure for our model.

For simplicity, we are not loading the exact segmentation model used in CitySurfaces since it is a resource-intensive model. Instead, we use a lighter version for demonstration purposes:

After hitting run, the node's Python return value (here, a confirmation that the trained model was saved) is passed along to the next node. Curio's provenance feature also allows the expert to keep track of and compare several versions of their training procedure.

# computation analysis - clear

import os
from PIL import Image
from torch.utils.data import Dataset, DataLoader
from albumentations.pytorch import ToTensorV2
from cityscapesscripts.helpers.labels import trainId2label as t2l
import segmentation_models_pytorch as smp
from torch import nn, optim

import albumentations as A  
import torch
import glob
import numpy as np

IMG_DIR = './dataset/city-surfaces'
IMAGE_WIDTH   = 320  
IMAGE_HEIGHT  = 320
BATCH_SIZE    = 8
NUM_CLASSES   = 3 # 10
LEARNING_RATE = 0.002
DEVICE  = "cuda" # if torch.cuda.is_available() else "cpu"

class SegmentationDataset(Dataset):
    def __init__(self, img_dir, transform=None):
        self.img_dir  = img_dir
        self.transform  = transform
        self.images = glob.glob('%s/*.png'%(self.img_dir))

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        img_path    = self.images[index]
        mask_path   = self.images[index].replace('images','annotations')
        image       = np.array(Image.open(img_path).convert("RGB"), dtype=np.float32) / 255.0
        y           = np.array(Image.open(mask_path).convert("L"))
        y = y - 1
        # Collapse the original 10 CitySurfaces classes into NUM_CLASSES = 3:
        # every sidewalk surface material -> 0, road -> 1, background -> 2
        y[y==0]=0 # concrete
        y[y==1]=0 # bricks
        y[y==2]=0 # granite
        y[y==3]=0 # asphalt
        y[y==4]=0 # mixed
        y[y==5]=1 # road
        y[y==6]=2 # background
        y[y==7]=0
        y[y==8]=0
        y[y==9]=0
        
        if self.transform is not None:
            augmentations = self.transform(image=image, mask=y)
            image   = augmentations["image"].to(torch.float32)
            y       = augmentations["mask"].type(torch.LongTensor)
        
        return image, y

train_transform = A.Compose(
    [
        A.Resize(height=IMAGE_HEIGHT, width=IMAGE_WIDTH),
        A.ColorJitter(p=0.2),
        A.HorizontalFlip(p=0.5),
        ToTensorV2(),
    ],
)

val_transform = A.Compose(
    [
        A.Resize(height=IMAGE_HEIGHT, width=IMAGE_WIDTH),
        ToTensorV2(),
    ],
)

def get_loaders(img_dir, batch_size, train_transform, val_transform):

    train_ds     = SegmentationDataset(img_dir + '/train/images/', transform=train_transform)
    train_loader = DataLoader(train_ds, batch_size=batch_size, shuffle=True)

    val_ds       = SegmentationDataset(img_dir + '/val/images/', transform=val_transform)
    val_loader   = DataLoader(val_ds, batch_size=batch_size, shuffle=True)

    return train_loader, val_loader

train_loader, val_loader = get_loaders(IMG_DIR, BATCH_SIZE, train_transform, val_transform)

def check_accuracy(loader, model, device="cuda"):
    iou_score   = 0
    num_batches = 0
    model.eval()

    with torch.no_grad():
        for image, mask in loader:
            image = image.to(device)
            mask  = mask.to(device)
            predictions = model(image)
            pred_labels = torch.argmax(predictions, dim=1)

            cpred_labels = pred_labels.cpu().detach().numpy()
            cmask = mask.cpu().detach().numpy()
            intersection_per_class = np.zeros(NUM_CLASSES)
            union_per_class = np.zeros(NUM_CLASSES)
            for class_idx in range(NUM_CLASSES):
                ccpred_labels = cpred_labels==class_idx
                ccmask = cmask==class_idx

                intersection_per_class[class_idx] = np.sum(np.logical_and(ccpred_labels, ccmask))
                union_per_class[class_idx] = np.sum(np.logical_or(ccpred_labels, ccmask))

            # Mean IoU over classes for this batch
            iou_per_class = intersection_per_class / (union_per_class + 1e-10)
            iou_score += np.mean(iou_per_class)
            num_batches += 1

    print('Validation mean IoU: %.4f' % (iou_score / max(num_batches, 1)))
    model.train()


model = smp.Unet(encoder_name='efficientnet-b3', in_channels=3, classes=NUM_CLASSES, activation='softmax2d').to(DEVICE)
loss_fn = nn.CrossEntropyLoss(ignore_index=255)
optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)

NUM_EPOCHS    = 10

def train_fn(loader, model, optimizer, loss_fn):

    for batch_idx, (image, mask) in enumerate(loader):
        image   = image.to(device=DEVICE)
        mask    = mask.to(device=DEVICE)

        # forward
        predictions = model(image)
        loss = loss_fn(predictions, mask)

        # backward
        model.zero_grad()
        loss.backward()
        optimizer.step()

for epoch in range(NUM_EPOCHS):

    train_fn(train_loader, model, optimizer, loss_fn)
    
    # check accuracy
    check_accuracy(val_loader, model, device=DEVICE)
    # break

torch.save(model.state_dict(), 'model.pth')

return "Model saved in model.pth"

Step 2: Creating the Boston physical layer

Next, we create a Data Loading node and change its view to Code. We load a sample of 100 unlabeled, unseen images.

import pandas as pd
df = pd.read_csv('./dataset/gsv/boston_gsv.csv', names=['status','id','lat','lon'])
sample = df[df['status']=='OK'].sample(100, random_state=42)
return sample

Step 3: Computing prediction uncertainty

Now, we create an Analysis & Modeling node to calculate the prediction uncertainty of the model on the new set of unseen data and connect it to the previous node. Uncertainty is measured from the margin between the two highest per-pixel class probabilities in the softmax output: the smaller the margin, the more uncertain the prediction.

import torch
import segmentation_models_pytorch as smp
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from io import BytesIO
import base64

sample = arg

def compute_uncertainty(predictions):
   sorted_probs = np.sort(predictions, axis=1)
   highest_prob = sorted_probs[:, -1, :, :]  # Highest probability for each pixel
   second_highest_prob = sorted_probs[:, -2, :, :]  # Second highest probability
   uncertainty_margin = highest_prob - second_highest_prob
   return 1.0-uncertainty_margin

# classes must match the NUM_CLASSES (3) used during training, otherwise load_state_dict will fail
model = smp.Unet(encoder_name='efficientnet-b3', in_channels=3, classes=3, activation='softmax2d').to('cuda')
model.load_state_dict(torch.load('model.pth'))
model.eval()

color_map = {
   0: (68, 1, 84, 255),
   1: (64, 67, 135, 255),
   2: (41, 120, 142, 255),
   3: (34, 167, 132, 255),
   4: (121, 209, 81, 255),
   5: (253, 231, 36, 255),
}

lats = []
lons = []
uncerts = []
images = []
predicted_images = []
uncert_images = []
for index, row in sample.iterrows():
   image_path = './dataset/gsv/boston/%s_left.jpg' % row['id']

   pil_image = Image.open(image_path).convert("RGB").resize((320,320))

   image = np.array(pil_image, dtype=np.float32) / 255.0
   predictions = model(torch.from_numpy(image.reshape(1,320,320,3)).permute((0,3,1,2)).to('cuda'))

   pred_labels = torch.argmax(predictions, dim=1)
   pred_array = pred_labels.cpu().numpy()
   pred_array = pred_array.reshape((320, 320))
   pred_pil = Image.new("RGB", (pred_array.shape[1], pred_array.shape[0]))
   for i in range(pred_array.shape[0]):
       for j in range(pred_array.shape[1]):
           pred_pil.putpixel((j, i), color_map[pred_array[i, j]])

   buffered = BytesIO()
   pred_pil.save(buffered, format="PNG")
   pred_str = base64.b64encode(buffered.getvalue()).decode('utf-8')

   uncertainty_margin = compute_uncertainty(predictions.cpu().detach().numpy())

   uncertainty_array = np.uint8(uncertainty_margin * 255)
   uncertainty_array = np.transpose(uncertainty_array, (1, 2, 0))
   uncertainty_array = np.squeeze(uncertainty_array, axis=2)
   uncertainty_pil = Image.fromarray(uncertainty_array)

   buffered = BytesIO()
   uncertainty_pil.save(buffered, format="PNG")
   uncertainty_str = base64.b64encode(buffered.getvalue()).decode('utf-8')

   lats.append(row['lat'])
   lons.append(row['lon'])
   uncerts.append(float(np.average(uncertainty_margin)))

   buffered = BytesIO()
   pil_image.save(buffered, format="PNG")
   img_str = base64.b64encode(buffered.getvalue()).decode('utf-8')

   images.append(img_str)
   predicted_images.append(pred_str)
   uncert_images.append(uncertainty_str)

return (lats, lons, uncerts, images, predicted_images, uncert_images)

Now, let’s connect this node to a new Computation Analysis node that builds a GeoDataFrame in which each image and its associated uncertainty is represented as a geospatial point feature, and then connect its output to a data node.

import geopandas as gpd
import pandas as pd

lats = arg[0]
lons = arg[1]
uncerts = arg[2]
original_images = arg[3]
predicted_images = arg[4]
uncert_images = arg[5]

image_content = list(zip(original_images, predicted_images, uncert_images))

gdf = pd.DataFrame({'lat': lats, 'lon': lons, 'uncertainty': uncerts, 'image_content': image_content})

gdf['image_id'] = gdf.index

gdf = gpd.GeoDataFrame(
   gdf, geometry=gpd.points_from_xy(gdf.lon, gdf.lat), crs="EPSG:4326"
)

gdf = gdf.sort_values(by='image_id', ascending=True)

return gdf

Step 4: Filtering most uncertain images

We then filter the most uncertain images by connecting a Computation Analysis node to the previous data node. The interacted column is populated by Curio when the user selects features in a linked visualization node; here we keep only the selected images and sort them from most to least uncertain.

import pandas as pd

df = pd.DataFrame(arg.drop(columns=arg.geometry.name))

df = df[df['interacted'] == '1']

df = df.sort_values(by='uncertainty', ascending=False)

return df.head(20)

Step 5: Visualizing the images

We then visualize the filtered images by adding an Image node and connecting it to the previous node.
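The images flowing through the pipeline are base64-encoded PNG strings produced in Step 3. If you want to sanity-check one of these payloads outside Curio, the following minimal sketch decodes and saves one of them (it assumes the GeoDataFrame built in Step 3 is available locally as gdf; the output filename is arbitrary):

# Minimal sketch: decode one of the base64-encoded PNGs produced in Step 3
# (assumes the GeoDataFrame built in Step 3 is available locally as `gdf`)
import base64
from io import BytesIO
from PIL import Image

original_b64, predicted_b64, uncertainty_b64 = gdf.iloc[0]['image_content']
Image.open(BytesIO(base64.b64decode(predicted_b64))).save('predicted_preview.png')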

Step 6: Loading neighborhood data

Next, we create a Data Loading node and change its view to Code. We load the physical layer describing neighborhoods in Boston:

import geopandas as gpd
# Load neighborhood data
boston = gpd.read_file('Census2020_BlockGroups.shp').to_crs('EPSG:4326')
return boston

Step 7: Merging the data

We now merge the uncertainty data from Step 3 with the neighborhood data from Step 6, to help us determine the optimal neighborhood from which to sample our next set of images.

To do that, we create a new Computation Analysis node, change its view to Code, and run the following:

import geopandas as gpd

boston = arg[0]
gdf = arg[1]

def agg_to_list(series):
   return list(series)

# Spatial join: assign each image point to the block group that contains it,
# then aggregate the mean uncertainty and the list of image ids per block group
joined = gpd.sjoin(boston, gdf).groupby('GEOID20').agg({'uncertainty': 'mean', 'image_id': agg_to_list})
boston = boston.set_index('GEOID20')
boston.loc[joined.index,'uncertainty'] = joined['uncertainty']
boston.loc[joined.index,'image_id'] = joined['image_id']

filtered_boston = boston.loc[joined.index]

filtered_boston = filtered_boston.rename(columns={'image_id': 'linked'})

return filtered_boston

Now, let’s clean the filtered_boston GeoDataFrame by creating a new Data Cleaning node and connecting it to the previous one.

import geopandas as gpd

filtered_boston = arg

filtered_boston = filtered_boston.loc[:, [filtered_boston.geometry.name, 'uncertainty', 'linked']]

filtered_boston = filtered_boston.set_crs(4326)
filtered_boston = filtered_boston.to_crs(3395)

filtered_boston.metadata = {
   'name': 'boston'
}

return filtered_boston

Step 8: Visualizing prediction uncertainty

In this step, we want to create a spatial map showing the distribution of prediction uncertainties over neighborhoods of Boston.

To achieve that, let’s create a data node and a UTK visualization node connected to the neighborhood data node. UTK’s grammar is automatically populated once an input is received.
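If you would like a quick local preview of the same distribution outside Curio and UTK, the following minimal sketch plots the per-block-group mean uncertainty with GeoPandas and Matplotlib (it assumes the cleaned filtered_boston GeoDataFrame from Step 7 is available locally; the output filename is arbitrary):

# Minimal sketch: preview per-block-group mean uncertainty with GeoPandas/Matplotlib
# (assumes `filtered_boston` is the cleaned GeoDataFrame returned by the Data Cleaning node)
import matplotlib.pyplot as plt

ax = filtered_boston.plot(column='uncertainty', cmap='viridis', legend=True, figsize=(8, 8))
ax.set_axis_off()
plt.savefig('uncertainty_preview.png', dpi=150, bbox_inches='tight')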

Step 9: Analyzing and identifying shortcomings

These nodes are then used to identify potential shortcomings of the model that call for new labeled data. The mosaic of images, sorted by uncertainty, helps reveal recurring patterns in the cases the model had the most difficulty classifying, signaling the need to sample more images with similar lighting, shadow, and built-environment conditions:

Final result

The final visualization shows the prediction uncertainties overlaid on the map of Boston’s neighborhoods. Given that densely labeling images is an expensive endeavor, this supports a more targeted approach to labeling, in which the conditions under which the model struggles are identified and used to guide the selection of new images to label.
