UCI ML Hackathon: Challenge Datasets

California Wildfires

California Wildfires

Donated by: Casey Graff (email)

About 50k instances in an average year and about 350k instances across all years. Each instance is a multi-channel image in which each channel corresponds to a different feature (ground vegetation, observed fire detections, weather). The target is a single-channel image of the next day’s fire detections.

Possible Applications: Image-to-image classification (binary classification per pixel of the output image, similar to image segmentation).
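Because the target is a per-pixel binary mask, predictions can be scored pixel by pixel. The following is a minimal sketch of one way to do that, assuming masks are plain 2D grids of 0/1 values; the function names, the choice of intersection over union (IoU) as the metric, and the toy 3x3 masks are illustrative assumptions and are not part of the dataset.

```python
from typing import Dict, Sequence

Mask = Sequence[Sequence[int]]  # 2D grid of 0/1 values, one per pixel (assumed layout)


def pixel_counts(pred: Mask, target: Mask) -> Dict[str, int]:
    """Count true/false positives and negatives over all pixels."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                counts["tp"] += 1      # fire predicted and observed
            elif p:
                counts["fp"] += 1      # fire predicted, none observed
            elif t:
                counts["fn"] += 1      # fire observed, none predicted
            else:
                counts["tn"] += 1      # no fire in either mask
    return counts


def iou(pred: Mask, target: Mask) -> float:
    """Intersection over union of the fire class; 1.0 if both masks are empty."""
    c = pixel_counts(pred, target)
    union = c["tp"] + c["fp"] + c["fn"]
    return c["tp"] / union if union else 1.0


# Toy 3x3 masks with made-up values, not real data.
predicted = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]
observed = [[0, 1, 0], [0, 1, 1], [0, 0, 0]]
print(pixel_counts(predicted, observed))
print(iou(predicted, observed))
```

Fire pixels are likely to be far rarer than non-fire pixels, so a metric such as IoU that ignores true negatives may be more informative than plain pixel accuracy.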

Galaxy Spiral Structure

Spiral Structure Analysis of 1 Million Sloan Galaxies

Donated by: Wayne Hayes (email)

CSV file describing 856,734 galaxies (plus a header line), with 338 features each.

Possible Applications: Measuring the structure and evolution of galaxies; evolution of the Cosmos at large.

GPA

Geometric Pose Affordance

Donated by: Zhe Wang (email)

Given the 2D location of a person in image space, how do we recover the person’s 3D location in root-relative coordinates (xyz location relative to the pelvis joint)? Number of instances: 222,514 training, 8,000 validation, 82,378 testing. Type of data: human keypoint positions extracted from images, in 2D (input) and 3D (output), plus geometry information (multi-layer depth maps) extracted from the 2D joint locations (input).

Possible Applications: Motion retargeting in Adobe (https://sites.google.com/umich.edu/nik); action recognition and explanation; Hollywood motion capture (3D avatars) and animation; sports analysis (NBA, FIFA); health care (autism, Parkinson’s, anthropometry, physical rehabilitation); robot learning (action anticipation, affordance learning, assisted living); self-driving cars (motion prediction); virtual reality (HoloLens 2, Facebook Reality Labs); Amazon Go; scene understanding and proxemics recognition.
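To make "root-relative coordinates" concrete, the sketch below subtracts the pelvis joint from every 3D joint and computes a mean per-joint distance between two poses. It is only an illustration: the joint ordering (pelvis at index 0), the function names, and the toy coordinates are assumptions, and the dataset's actual layout may differ.

```python
import math
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def to_root_relative(joints: List[Point3D], root_index: int = 0) -> List[Point3D]:
    """Express each joint relative to the root (pelvis) joint.

    root_index=0 is an assumption; check the dataset's joint order.
    """
    rx, ry, rz = joints[root_index]
    return [(x - rx, y - ry, z - rz) for x, y, z in joints]


def mean_joint_error(pred: List[Point3D], target: List[Point3D]) -> float:
    """Mean Euclidean distance between corresponding joints."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of joints")
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)


# Toy pose with three joints (made-up values, not real data).
pose = [(1.0, 2.0, 3.0), (1.5, 2.0, 3.0), (1.0, 4.0, 2.0)]
relative = to_root_relative(pose)
print(relative)  # the root joint becomes (0.0, 0.0, 0.0)

# Compare against a hypothetical ground-truth pose in the same frame.
truth = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (3.0, 2.0, 3.0)]
print(mean_joint_error(relative, truth))
```

Working in root-relative coordinates removes the person's absolute position from the target, so a model only has to predict the pose itself.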

Amyloid Positivity

Clinical memory assessments and biomarkers associated with Alzheimer’s Disease and Related Disorders

Donated by: Michael Lee (email)

Tabular data with 939 cases and 19 variables. Each case is a clinical test of a patient. Variables cover demographic information (age, gender, years of education), protocol information (time since baseline test), memory test outcomes (free recall scores, recognition scores), biomarkers (APOE genotype, beta amyloid), and diagnosis of memory impairment (cognitively normal or impaired).

Possible Applications: Prediction of amyloid status. Prediction of progression to cognitive impairment. Visualization of relationship between memory test performance, biomarkers, and demographics.

DNS network captures

DNS network captures

Donated by: Zhou Li (email)

DNS data is often captured and used by security companies to find cyber-attacks. There are two pcap files consisting of millions of packets of DNS queries. Some of the queries are benign, while others are malicious (e.g., directed to domains owned by cyber-attackers). The first file contains various kinds of DNS attacks. The second contains DNS queries to many algorithmically generated domains (produced by domain generation algorithms, DGAs) from various families. DGA domains are often used by malware as rendezvous points linked to command-and-control servers.

Possible Applications: The first dataset could be used to build a detection system that identifies various kinds of network attacks based on DNS communication patterns. The second dataset could be used to build a detection system for DGA domains.
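As a starting point for DGA detection, one simple approach is to compute lexical features of each queried domain name. The sketch below is an assumption, not a method prescribed by the dataset: it computes the length, digit ratio, and character-level Shannon entropy of the leftmost label. The function names and example domains are hypothetical, and extracting query names from the pcap files is not shown.

```python
import math
from collections import Counter
from typing import Dict, Union


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; 0.0 for an empty string."""
    if not text:
        return 0.0
    total = len(text)
    return sum((n / total) * math.log2(total / n) for n in Counter(text).values())


def domain_features(domain: str) -> Dict[str, Union[str, int, float]]:
    """Lexical features of the leftmost label of a domain name.

    Using only the leftmost label is a simplification; a real system
    would likely separate the registered domain from its public suffix.
    """
    label = domain.lower().rstrip(".").split(".")[0]
    digits = sum(ch.isdigit() for ch in label)
    return {
        "label": label,
        "length": len(label),
        "digit_ratio": digits / len(label) if label else 0.0,
        "entropy": shannon_entropy(label),
    }


# Hypothetical domain names for illustration only.
for name in ["google.com", "xkqzvbnw.net", "a1b2.net"]:
    print(domain_features(name))
```

Random-looking labels tend to have higher entropy than dictionary-based names, but entropy alone is a weak signal; features like these would normally feed a trained classifier rather than a fixed threshold.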

Satellite Imagery of Cambodia

Satellite imagery of Chbar Mon, Kampong Speu, Cambodia

Donated by: Daniel Parker (email)

These are raster data (satellite images).

Possible Applications: Classify the images according to some simple land types, including: urban, rice fields, other agricultural fields, water, buildings, houses, etc.

UCI Clinical Data

UCI OMOP DeID database

Donated by: Alessandro Ghigi, Zhaoxian Hu, Wu Fu (email)

De-identified (DeID) clinical data for 800,000 patients and 15,000,000 visits. Available clinical information: encounters, conditions (diagnoses), procedures, measurements (lab tests and vital signs), drugs, and observations.

Possible Applications: Feasibility studies and clinical projects that can run against DeID data.
