How could I get the TP, FP, FN, TN?

hubutui · June 1, 2019, 10:10am

I’m trying to evaluate my segmentation result. I can use sitk.LabelOverlapMeasuresImageFilter to get Dice, FPR, and FNR. Is there a filter that computes the confusion matrix, so that I can calculate any metric from it?

zivy · June 2, 2019, 12:22am

Hello @hubutui,

TP, FP, FN, and TN are not standard measures for evaluating segmentation, so we don’t have a turnkey solution. You can compute them yourself using logical operators between images. The common measures are overlap (Dice, Jaccard) and surface distances (Hausdorff, mean surface, etc.). This Jupyter notebook illustrates these common segmentation evaluation measures.

Hopefully these will work for you.
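
For readers who want to try the logical-operator approach mentioned above, here is a minimal sketch for the binary case. It works on NumPy arrays rather than on SimpleITK images directly (an array can be obtained with sitk.GetArrayFromImage). The function name binary_confusion_counts and the toy arrays are made up for illustration, and it assumes that any nonzero voxel counts as foreground.

```python
import numpy as np


def binary_confusion_counts(reference, prediction):
    """Return (TP, FP, FN, TN) voxel counts for two binary masks.

    Hypothetical helper: any nonzero value is treated as foreground.
    """
    ref = np.asarray(reference).astype(bool)
    pred = np.asarray(prediction).astype(bool)
    tp = int(np.count_nonzero(ref & pred))    # foreground in both
    fp = int(np.count_nonzero(~ref & pred))   # predicted only
    fn = int(np.count_nonzero(ref & ~pred))   # reference only
    tn = int(np.count_nonzero(~ref & ~pred))  # background in both
    return tp, fp, fn, tn


# Toy 2x3 masks, just to show the call.
reference = np.array([[0, 1, 1], [0, 1, 0]])
prediction = np.array([[0, 1, 0], [1, 1, 0]])
print(binary_confusion_counts(reference, prediction))  # (2, 1, 1, 2)
```

For a multi-label segmentation, the same function could be applied once per label by comparing each image against that label value first.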

hubutui · June 2, 2019, 3:31am

Thank you. As this paper says, overlap metrics are calculated from TP, FP, TN, and FN. Exposing these counts would let users calculate other metrics based on them.
sklearn.metrics is also a good choice.
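
To make the relationship concrete, here is a small sketch that derives a few metrics from the four counts using the usual textbook definitions. The function name overlap_metrics is hypothetical, and the definitions may not match exactly what a particular filter reports, so treat the numbers as illustrative.

```python
def overlap_metrics(tp, fp, fn, tn):
    """Derive common metrics from binary confusion counts.

    Hypothetical helper; a ratio with a zero denominator is
    returned as NaN instead of raising.
    """
    def ratio(num, den):
        return num / den if den else float("nan")

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": ratio(tp, tp + fp + fn),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "fpr": ratio(fp, fp + tn),          # false positive rate
        "fnr": ratio(fn, fn + tp),          # false negative rate
    }


metrics = overlap_metrics(tp=2, fp=1, fn=1, tn=2)
print(metrics["dice"], metrics["jaccard"])
```

Note that Dice and Jaccard do not use TN at all, which is one reason they are popular for segmentations where background dominates.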

dyoll · September 24, 2021, 8:25am

I know this thread is a bit old, but just to share my experience with sklearn.metrics.confusion_matrix on 3D labelfields. I found the implementation to be quite slow (slower than loading the data and doing inference), so I evaluated a few different options:

import numpy as np
from numba import njit  # generated_jit and types were unused here; generated_jit is gone from recent numba releases
from sklearn.metrics import confusion_matrix as sk_confusion_matrix
import pandas
from timeit import default_timer as timer


def compute_confusion_naive(a: np.ndarray, b: np.ndarray, num_classes: int = 0):
    if num_classes < 1:
        num_classes = max(np.max(a), np.max(b)) + 1
    cm = np.zeros((num_classes, num_classes))
    for i in range(a.shape[0]):
        cm[a[i], b[i]] += 1
    return cm


def compute_confusion_zip(a: np.ndarray, b: np.ndarray, num_classes: int = 0):
    if num_classes < 1:
        num_classes = max(np.max(a), np.max(b)) + 1
    cm = np.zeros((num_classes, num_classes))
    for ai, bi in zip(a, b):
        cm[ai, bi] += 1
    return cm


@njit
def compute_confusion_numba(a: np.ndarray, b: np.ndarray, num_classes: int = 0):
    if num_classes < 1:
        num_classes = max(np.max(a), np.max(b)) + 1
    cm = np.zeros((num_classes, num_classes))
    for i in range(a.shape[0]):
        cm[a[i], b[i]] += 1
    return cm


def compute_confusion_sklearn(a: np.ndarray, b: np.ndarray):
    return sk_confusion_matrix(a, b)


def compute_confusion_pandas(a: np.ndarray, b: np.ndarray):
    return pandas.crosstab(pandas.Series(a), pandas.Series(b))


if __name__ == "__main__":
    A = np.random.randint(15, size=310*310*310)
    B = np.random.randint(15, size=310*310*310)

    start = timer()
    cm1 = compute_confusion_naive(A, B)
    end = timer()
    print("Naive: %g s" % (end-start))

    start = timer()
    cm1 = compute_confusion_zip(A, B)
    end = timer()
    print("Naive-Zip: %g s" % (end-start))

    start = timer()
    cm1 = compute_confusion_numba(A, B)
    end = timer()
    print("Numba: %g s" % (end-start))

    start = timer()
    cm1 = compute_confusion_sklearn(A, B)
    end = timer()
    print("sklearn: %g s" % (end-start))

    start = timer()
    cm1 = compute_confusion_pandas(A, B)
    end = timer()
    print("pandas: %g s" % (end-start))

The results look like:

Naive: 18.6546 s
Naive-Zip: 17.86 s
Numba: 0.674944 s
sklearn: 18.5911 s
pandas: 5.81173 s

The timing for the numba implementation can be reduced further (roughly by half) if num_classes is known in advance, e.g. by using dispatch via generated_jit to skip computing the max of a and b.
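
When num_classes is known up front, another option worth trying is a vectorized NumPy version based on np.bincount. This is not one of the variants timed above and I have not benchmarked it here, so take it as a sketch. It assumes all labels are integers in the range [0, num_classes), with rows indexed by a and columns by b as in the functions above. The names compute_confusion_bincount and per_class_counts are made up for illustration. The second helper shows how per-class TP, FP, FN, and TN fall out of the matrix.

```python
import numpy as np


def compute_confusion_bincount(a, b, num_classes):
    """Confusion matrix via a single bincount over paired label indices.

    Assumes every label is an integer in [0, num_classes).
    """
    a = np.asarray(a, dtype=np.int64).ravel()
    b = np.asarray(b, dtype=np.int64).ravel()
    # Encode each (a, b) pair as one flat index into a num_classes^2 table.
    flat = a * num_classes + b
    counts = np.bincount(flat, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


def per_class_counts(cm):
    """One-vs-rest TP, FP, FN, TN per class (rows: a, columns: b)."""
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp  # b says class k, a says something else
    fn = cm.sum(axis=1) - tp  # a says class k, b says something else
    tn = cm.sum() - tp - fp - fn
    return tp, fp, fn, tn


a = np.array([0, 0, 1, 1, 2, 2])
b = np.array([0, 1, 1, 1, 2, 0])
cm = compute_confusion_bincount(a, b, num_classes=3)
tp, fp, fn, tn = per_class_counts(cm)
print(cm)
print(tp, fp, fn, tn)
```

If a label falls outside the assumed range, the reshape will fail or counts will land in the wrong cell, so it is worth validating the inputs first on real data.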