Friday, June 28, 2024

ISIC 2024 - Skin cancer detection with 3D-TBP

At a time when we're taking more pictures of ourselves than ever before, it's no surprise that some of these pictures may end up serving another purpose.

While most selfies are uploaded to social media, what if we could upload them to hospitals or doctors' offices as a preventative measure against cancer?

In this competition, you'll develop AI algorithms that can identify histologically confirmed skin cancer, including melanoma, basal cell carcinoma, and squamous cell carcinoma, among images of skin lesions.

Your work will help accurately identify skin cancer, helping with early diagnosis and treatment.


Overview

In this competition, you'll develop image-based algorithms to identify histologically confirmed skin cancer cases with single-lesion crops from 3D total body photos (TBP). The image quality resembles close-up smartphone photos, which are regularly submitted for telehealth purposes. Your binary classification algorithm could be used in settings without access to specialized care and improve triage for early skin cancer detection.


Description

Skin cancer can be deadly if not caught early, but many populations lack specialized dermatologic care. Over the past several years, dermoscopy-based AI algorithms have been shown to benefit clinicians in diagnosing melanoma, basal cell, and squamous cell carcinoma. However, determining which individuals should see a clinician in the first place has great potential impact. Triaging applications have a significant potential to benefit underserved populations and improve early skin cancer detection, the key factor in long-term patient outcomes.

Dermatoscope images reveal morphologic features not visible to the naked eye, but these images are typically captured only in dermatology clinics. Algorithms that benefit people in primary care or non-clinical settings must be adept at evaluating lower-quality images. This competition leverages 3D TBP to present a novel dataset of every single lesion from thousands of patients across three continents, with images resembling cell phone photos.

This competition challenges you to develop AI algorithms that differentiate a patient's histologically confirmed malignant skin lesions from benign ones. Your work will help improve early diagnosis and disease prognosis by extending the benefits of automated skin cancer detection to broader populations and settings.


Evaluation

Primary Scoring Metric

Submissions are evaluated on partial area under the ROC curve (pAUC) above 80% true positive rate (TPR) for binary classification of malignant examples. (See the implementation in the notebook ISIC pAUC-aboveTPR.)

The receiver operating characteristic (ROC) curve illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied. However, there are regions of ROC space where the TPR is unacceptable in clinical practice. Systems that aid in diagnosing cancer must be highly sensitive, so this metric considers only the area that lies under the ROC curve AND above 80% TPR. Because that band spans a TPR height of 0.2 across the full FPR width of 1.0, scores range from 0.0 to 0.2.
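Put differently, the score is the integral over FPR of max(TPR - 0.8, 0) along the ROC curve. The following is a minimal, dependency-free sketch of that quantity for local validation, not the official scorer: the function name pauc_above_tpr and its tie handling (linear interpolation between ROC points for tied scores) are assumptions, and the notebook ISIC pAUC-aboveTPR remains the reference implementation.

```python
from typing import Sequence


def pauc_above_tpr(labels: Sequence[int], scores: Sequence[float], min_tpr: float = 0.80) -> float:
    """Area under the ROC curve and above the horizontal line TPR = min_tpr.

    labels: 1 = malignant (positive), 0 = benign.
    scores: predicted malignancy probabilities; higher means more likely malignant.
    Hypothetical helper; tied scores produce one diagonal ROC step (an assumption).
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one malignant and one benign example")

    # Build ROC points by sweeping the threshold from high to low scores,
    # emitting one point per distinct score so ties form a single step.
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            points.append((fp / neg, tp / pos))

    # Integrate max(TPR - min_tpr, 0) over FPR, one linear ROC segment at a time.
    area = 0.0
    for (f0, t0), (f1, t1) in zip(points, points[1:]):
        if f1 == f0 or t1 <= min_tpr:
            continue  # vertical step, or segment entirely at/below the cutoff
        if t0 >= min_tpr:
            # Whole segment above the cutoff: trapezoid between curve and cutoff.
            area += (f1 - f0) * ((t0 - min_tpr) + (t1 - min_tpr)) / 2
        else:
            # Segment crosses the cutoff: keep only the triangle above it.
            f_cross = f0 + (min_tpr - t0) / (t1 - t0) * (f1 - f0)
            area += (f1 - f_cross) * (t1 - min_tpr) / 2
    return area
```

A perfect ranking returns 0.2, a fully inverted one returns 0.0, and a constant prediction (every score tied) returns 0.02 under this interpolation. Note that scikit-learn's roc_auc_score(max_fpr=...) is not a drop-in replacement: it bounds FPR rather than TPR and returns a McClish-standardized value.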

The shaded regions in the following example represent the pAUC of two arbitrary algorithms (Ca and Cb) at an arbitrary minimum TPR:


"pAUC defined by constraining TPR" by ProfGigio is licensed under CC-BY-SA-4.0

Submission File

For each image (isic_id) in the test set, you must predict the probability (target) that the lesion is malignant. The file should contain a header and have the following format:

isic_id,target
ISIC_0015657,0.7
ISIC_0015729,0.9
ISIC_0015740,0.8
etc.
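A minimal sketch of writing this file with Python's standard csv module; write_submission is a hypothetical helper, not part of the competition kit. It also rejects any target outside [0, 1] (the comparison also fails for NaN) and any repeated isic_id, two mistakes that are cheap to catch before submitting.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_submission(rows: Iterable[Tuple[str, float]], out: TextIO) -> int:
    """Write (isic_id, target) rows as 'isic_id,target' CSV with a header line.

    Returns the number of data rows written. Raises ValueError if a target is
    not a probability in [0, 1] (NaN included) or if an isic_id repeats.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["isic_id", "target"])
    seen = set()
    for isic_id, target in rows:
        if not 0.0 <= target <= 1.0:  # also False for NaN, so NaN is rejected
            raise ValueError(f"{isic_id}: target {target} is outside [0, 1]")
        if isic_id in seen:
            raise ValueError(f"duplicate isic_id {isic_id}")
        seen.add(isic_id)
        writer.writerow([isic_id, target])
    return len(seen)


if __name__ == "__main__":
    # Demo with IDs from the example above; real use would pass an open file
    # (opened with newline="") instead of an in-memory buffer.
    buf = io.StringIO()
    write_submission([("ISIC_0015657", 0.7), ("ISIC_0015729", 0.9)], buf)
    print(buf.getvalue(), end="")
```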


Dataset Description

What should I expect the data format to be?

The dataset consists of diagnostically labelled images with additional metadata. The images are JPEGs. The associated .csv file contains a binary diagnostic label (target), potential input variables (e.g. age_approx, sex, anatom_site_general, etc.), and additional attributes (e.g. image source and precise diagnosis).

What am I predicting?

In this challenge you are differentiating benign from malignant cases. For each image (isic_id), you assign the probability (target), in the range [0, 1], that the case is malignant.

The SLICE-3D dataset - skin lesion image crops extracted from 3D TBP for skin cancer detection

To mimic non-dermoscopic images, this competition uses standardized cropped images of lesions from 3D Total Body Photography (TBP). Vectra WB360, a 3D TBP product from Canfield Scientific, captures the complete visible cutaneous surface area in one macro-quality resolution tomographic image. AI-based software then identifies individual lesions on a given 3D capture. This allows every lesion on a patient to be imaged and identified; each is exported as an individual cropped photo with a 15x15 mm field of view. The dataset contains every lesion from a subset of thousands of patients seen between 2015 and 2024 across nine institutions and three continents.

The following are examples from the training set. 'Strongly-labelled tiles' are those whose labels were derived through histopathology assessment. 'Weak-labelled tiles' are those that were not biopsied and were considered 'benign' by a doctor.



Files

train-image/ - image files for the training set (provided for train only)

train-image.hdf5 - training image data contained in a single hdf5 file, with the isic_id as key

train-metadata.csv - metadata for the training set

test-image.hdf5 - test image data contained in a single hdf5 file, with the isic_id as key. This contains 3 test examples to ensure your inference pipeline works correctly. When the submitted notebook is rerun, this file is swapped with the full hidden test set, which contains approximately 500k images.

test-metadata.csv - metadata for the test subset

sample_submission.csv - a sample submission file in the correct format
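A minimal sketch for pulling one image out of train-image.hdf5 or test-image.hdf5. It assumes each isic_id key holds the encoded JPEG as a scalar byte string, so that store[isic_id][()] reads it; that layout is an assumption to verify against the files. read_lesion_jpeg is a hypothetical helper. Reading one key at a time, rather than loading the whole file, keeps memory bounded when the hidden test set (roughly 500k images) is swapped in.

```python
def read_lesion_jpeg(store, isic_id: str) -> bytes:
    """Return the raw JPEG bytes stored under isic_id.

    `store` is typically an open h5py.File; any mapping whose values support
    `[()]` indexing works the same way. Assumed layout: one scalar byte-string
    dataset per isic_id.
    """
    data = bytes(store[isic_id][()])
    if not data.startswith(b"\xff\xd8"):  # JPEG start-of-image marker
        raise ValueError(f"{isic_id}: stored payload is not a JPEG")
    return data


# Example usage (requires the third-party h5py and Pillow packages):
#   import io
#   import h5py
#   from PIL import Image
#   with h5py.File("test-image.hdf5", "r") as f:
#       for isic_id in f:
#           img = Image.open(io.BytesIO(read_lesion_jpeg(f, isic_id)))
#           print(isic_id, img.size)
```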


Columns included only in train-metadata.csv

field name - description
target - Binary class {0: benign, 1: malignant}.
lesion_id - Unique lesion identifier. Present in lesions that were manually tagged as a lesion of interest.
iddx_full - Fully classified lesion diagnosis.
iddx_1 - First level lesion diagnosis.
iddx_2 - Second level lesion diagnosis.
iddx_3 - Third level lesion diagnosis.
iddx_4 - Fourth level lesion diagnosis.
iddx_5 - Fifth level lesion diagnosis.
mel_mitotic_index - Mitotic index of invasive malignant melanomas.
mel_thick_mm - Thickness in depth of melanoma invasion.
tbp_lv_dnn_lesion_confidence - Lesion confidence score (0-100 scale).+

+ D’Alessandro, B. "Methods and apparatus for identifying skin features of interest." US Patent #11,164,670. (2021).
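Because the columns above exist only in train-metadata.csv, a model that consumes metadata can only rely on columns that also appear in test-metadata.csv. A minimal sketch of selecting them from the two headers; the helper names and the NON_FEATURE_COLUMNS exclusion set (identifier and licensing fields) are illustrative assumptions, not part of the dataset definition.

```python
import csv
from typing import List, Sequence

# Assumed exclusions: present in both files, but identifiers or licensing
# information rather than lesion features. Adjust to taste.
NON_FEATURE_COLUMNS = frozenset({"isic_id", "patient_id", "attribution", "copyright_license"})


def read_header(path: str) -> List[str]:
    """Return the header row of a CSV file."""
    with open(path, newline="") as f:
        return next(csv.reader(f))


def usable_feature_columns(train_header: Sequence[str], test_header: Sequence[str]) -> List[str]:
    """Train columns that also exist at test time, minus non-feature columns.

    Train-only columns such as target, lesion_id, iddx_* and mel_* drop out
    automatically because they are absent from the test header.
    """
    test_cols = set(test_header)
    return [c for c in train_header if c in test_cols and c not in NON_FEATURE_COLUMNS]


# Example (file names from the Files section):
#   cols = usable_feature_columns(read_header("train-metadata.csv"),
#                                 read_header("test-metadata.csv"))
```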

Columns in both train-metadata.csv and test-metadata.csv

field name - description
isic_id - Unique case identifier.
patient_id - Unique patient identifier.
age_approx - Approximate age of patient at time of imaging.
sex - Sex of the person.
anatom_site_general - Location of the lesion on the patient's body.
clin_size_long_diam_mm - Maximum diameter of the lesion (mm).+
image_type - Structured field of the ISIC Archive for image type.
tbp_tile_type - Lighting modality of the 3D TBP source image.
tbp_lv_A - A inside lesion.+
tbp_lv_Aext - A outside lesion.+
tbp_lv_B - B inside lesion.+
tbp_lv_Bext - B outside lesion.+
tbp_lv_C - Chroma inside lesion.+
tbp_lv_Cext - Chroma outside lesion.+
tbp_lv_H - Hue inside the lesion; calculated as the angle of A* and B* in LAB* color space. Typical values range from 25 (red) to 75 (brown).+
tbp_lv_Hext - Hue outside lesion.+
tbp_lv_L - L inside lesion.+
tbp_lv_Lext - L outside lesion.+
tbp_lv_areaMM2 - Area of lesion (mm^2).+
tbp_lv_area_perim_ratio - Border jaggedness, the ratio between the lesion's perimeter and area. Circular lesions will have low values; irregularly shaped lesions will have higher values. Values range 0-10.+
tbp_lv_color_std_mean - Color irregularity, calculated as the variance of colors within the lesion's boundary.
tbp_lv_deltaA - Average A contrast (inside vs. outside lesion).+
tbp_lv_deltaB - Average B contrast (inside vs. outside lesion).+
tbp_lv_deltaL - Average L contrast (inside vs. outside lesion).+
tbp_lv_deltaLBnorm - Contrast between the lesion and its immediate surrounding skin. Low-contrast lesions, such as freckles, tend to be faintly visible; high-contrast lesions tend to be those with darker pigment. Calculated as the average delta LB of the lesion relative to its immediate background in LAB* color space. Typical values range from 5.5 to 25.+
tbp_lv_eccentricity - Eccentricity.+
tbp_lv_location - Classification of anatomical location; divides arms & legs into upper & lower and the torso into thirds.+
tbp_lv_location_simple - Classification of anatomical location, simple.+
tbp_lv_minorAxisMM - Smallest lesion diameter (mm).+
tbp_lv_nevi_confidence - Nevus confidence score (0-100 scale): the probability, estimated by a convolutional neural network classifier, that the lesion is a nevus. The neural network was trained on approximately 57,000 lesions that were classified and labeled by a dermatologist.+,++
tbp_lv_norm_border - Border irregularity (0-10 scale); the normalized average of border jaggedness and asymmetry.+
tbp_lv_norm_color - Color variation (0-10 scale); the normalized average of color asymmetry and color irregularity.+
tbp_lv_perimeterMM - Perimeter of lesion (mm).+
tbp_lv_radial_color_std_max - Color asymmetry, a measure of asymmetry of the spatial distribution of color within the lesion. This score is calculated by looking at the average standard deviation in LAB* color space within concentric rings originating from the lesion center. Values range 0-10.+
tbp_lv_stdL - Standard deviation of L inside lesion.+
tbp_lv_stdLExt - Standard deviation of L outside lesion.+
tbp_lv_symm_2axis - Border asymmetry; a measure of asymmetry of the lesion's contour about an axis perpendicular to the lesion's most symmetric axis. Lesions with two axes of symmetry will therefore have low scores (more symmetric), while lesions with only one or zero axes of symmetry will have higher scores (less symmetric). This score is calculated by comparing opposite halves of the lesion contour over many degrees of rotation. The angle where the halves are most similar identifies the principal axis of symmetry, while the second axis of symmetry is perpendicular to the principal axis. Border asymmetry is reported as the asymmetry value about this second axis. Values range 0-10.+
tbp_lv_symm_2axis_angle - Lesion border asymmetry angle.+
tbp_lv_x - X-coordinate of the lesion on 3D TBP.+
tbp_lv_y - Y-coordinate of the lesion on 3D TBP.+
tbp_lv_z - Z-coordinate of the lesion on 3D TBP.+
attribution - Image attribution, synonymous with image source.
copyright_license - Copyright license.

+ D’Alessandro, B. "Methods and apparatus for identifying skin features of interest." US Patent #11,164,670. (2021).
++ Betz-Stablein, B., et al. Reproducible naevus counts using 3D total body photography and convolutional neural networks. Dermatology 238, 4–11 (2021).
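The descriptions of tbp_lv_C (chroma) and tbp_lv_H (the angle of A* and B* in LAB* color space) match the standard CIELAB to LCh conversion, C* = sqrt(a*^2 + b*^2) and h = atan2(b*, a*). Whether the A/B/C/H columns follow exactly this convention is an assumption, not something stated above; the sketch below recomputes chroma and hue from tbp_lv_A and tbp_lv_B and reports the largest gap to the stored values, a quick check before building color features. Both helper names are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple


def chroma_and_hue(a: float, b: float) -> Tuple[float, float]:
    """Standard CIELAB -> LCh: chroma C* and hue angle h in degrees, in [0, 360)."""
    chroma = math.hypot(a, b)                     # C* = sqrt(a*^2 + b*^2)
    hue = math.degrees(math.atan2(b, a)) % 360.0  # h = atan2(b*, a*)
    return chroma, hue


def max_lch_discrepancy(rows: Iterable[Dict[str, str]],
                        a_col: str = "tbp_lv_A", b_col: str = "tbp_lv_B",
                        c_col: str = "tbp_lv_C", h_col: str = "tbp_lv_H") -> Tuple[float, float]:
    """Largest |recomputed - reported| chroma and hue over metadata rows.

    `rows` are dicts of strings, e.g. from csv.DictReader. Rows with a blank
    value in any of the four columns are skipped.
    """
    worst_c = worst_h = 0.0
    for row in rows:
        if not all(row.get(col) for col in (a_col, b_col, c_col, h_col)):
            continue
        c, h = chroma_and_hue(float(row[a_col]), float(row[b_col]))
        worst_c = max(worst_c, abs(c - float(row[c_col])))
        dh = abs(h - float(row[h_col])) % 360.0
        worst_h = max(worst_h, min(dh, 360.0 - dh))  # compare angles on the circle
    return worst_c, worst_h


# Example (file name from the Files section):
#   import csv
#   with open("train-metadata.csv", newline="") as f:
#       print(max_lch_discrepancy(csv.DictReader(f)))
```

Positive a* (red) and b* (yellow) give hue angles between 0 and 90 degrees, which is consistent with the typical tbp_lv_H range of 25 (red) to 75 (brown).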

Dataset Citation

Please cite the SLICE-3D dataset under CC BY-NC 4.0 with the following attribution:

International Skin Imaging Collaboration. SLICE-3D 2024 Challenge Dataset. International Skin Imaging Collaboration https://doi.org/10.34970/2024-slice-3d (2024).

Creative Commons Attribution-NonCommercial 4.0 International License.

The dataset was generated by the International Skin Imaging Collaboration (ISIC) and images are from the following sources: Hospital Clínic de Barcelona, Memorial Sloan Kettering Cancer Center, Hospital of Basel, FNQH Cairns, The University of Queensland, Melanoma Institute Australia, Monash University and Alfred Health, University of Athens Medical School, and Medical University of Vienna.