• Hackathon: May 18 - May 29, 2022
  • Registration closes: May 16, 2022

Opening Ceremony: 3-5pm on Wed, May 18 @ DBH 6011 + 6th Floor Patio

Eligibility: University of California, Irvine students of all degrees, majors, backgrounds, and interests are more than welcome to register! That being said, we highly recommend having some solid experience in machine learning in order to succeed in this event.


Today, artificial intelligence and machine learning are widely used for important discoveries and pattern recognition across many fields. Tracking economies, predicting natural disasters, and training robots are just a few of the fascinating things being done with machine learning right now. In an age of rapid technological development, it only makes sense for our students to grow their understanding of these tools.

This virtual hackathon aims to encourage direct student involvement in real-world machine learning applications through the UCI ML Repository. The repository is used by millions worldwide and contains hundreds of datasets used in machine learning research and teaching. The goal of every participant or team is to use or develop machine learning approaches to build features for the repository. Features available for participants to work on include, among others, visualization of datasets, automatic statistical analysis of datasets and ML methods, an evaluation pipeline, a dataset recommendation system, and natural language processing. We hope that students' final submissions will be useful additions to the services the UCI ML Repository provides to the machine learning community. The most interesting submissions, as judged by the repository research team, will receive monetary awards.

Successful teams may be encouraged to continue developing their projects after the event together with the UCI ML Repository research team. Discoveries made during the Machine Learning Hackathon can be applied to other relevant datasets or can spark a call to action to solve an existing problem. All participants will create open-source work, allowing other teams or researchers to build on their initial progress.

Whether it be a simple, fascinating discovery or significant breakthrough, we are beyond excited to see what YOU can come up with!


This hackathon will last twelve days. During that time, students will work individually or in teams of up to four people to use or develop machine learning approaches to build features for the UCI ML Repository, which contains hundreds of datasets used in machine learning research.

Correspondence and announcements throughout the event will happen primarily through Slack, where we will also provide additional contact information if necessary. We will arrange check-ins, meetings with UCI ML Repository team members, and office hours for anyone who needs guidance or assistance. Prizes will be offered to groups that submit particularly useful or creative deliverables. If their submissions are incorporated into the repository website, award winners will also be credited as contributors to the UCI ML Repository. We encourage students of any academic background or interest to participate!


Overviews of the currently available projects are shown below. Required materials will become available to registered participants at the start of the hackathon. More projects may also be added, so be on the lookout!

Visualization of Datasets

Create an interactive, visual way for users to browse the datasets in the UCI ML Repository. This should help users find datasets relevant to their needs, including useful datasets they might not have found otherwise.

Automatic Statistician

For each dataset in the UCI ML Repository, produce results on several metrics that examine the quality and attributes of the dataset. Determine the most helpful metrics based on variability within the datasets as well as how well the metrics identify artificial data. Possible features to consider include the variance in performance between model types, feature correlations, and the balance of labels and features, among others.
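As one illustration of the kind of metric this project might compute, the sketch below scores label balance as the normalized Shannon entropy of the class distribution: 1.0 for perfectly balanced labels, 0.0 when only one class is present. This is an assumed choice of metric, not one prescribed by the repository team, and `label_balance` is a hypothetical helper name.

```python
import math
from collections import Counter


def label_balance(labels):
    """Hypothetical balance metric: normalized Shannon entropy of the labels.

    Returns 1.0 when every class appears equally often and 0.0 when there is
    at most one distinct class (including an empty input).
    """
    counts = Counter(labels)
    n = sum(counts.values())
    k = len(counts)
    if k <= 1:
        return 0.0
    # Shannon entropy of the empirical class distribution (natural log).
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    # Divide by the maximum possible entropy, log(k), to land in [0, 1].
    return entropy / math.log(k)
```

A skewed column such as `["a", "a", "a", "b"]` scores strictly between 0 and 1, so the value can be compared across datasets with different numbers of classes.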

Evaluation of Datasets

Develop a fully automatic black-box model for classification and regression problems on tabular datasets. This could include automatically inferring the problem type of a dataset (binary classification, regression, etc.), handling numerical and categorical variables and applying suitable preprocessing, training multiple models on each dataset and displaying the results in tabular form, hyperparameter search, and so on.
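A fully automatic pipeline first has to guess the task type from the target column. The sketch below is one simple heuristic, assuming the target values are already parsed into Python values; the function name `infer_problem_type` and the `max_classes=20` cutoff are hypothetical choices, and real datasets (for example, integer-valued regression targets such as ages) will need more careful rules.

```python
def infer_problem_type(target, max_classes=20):
    """Hypothetical heuristic guess of the task type from target values.

    Assumes values are already parsed (ints, floats, strings, bools);
    missing values are represented as None and ignored.
    """
    values = [v for v in target if v is not None]
    if not values:
        raise ValueError("target column has no non-missing values")
    unique = set(values)
    # Exactly two distinct values: treat as binary classification.
    if len(unique) == 2:
        return "binary classification"
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    # Non-numeric labels (e.g. strings) with more than two values: multiclass.
    if not numeric:
        return "multiclass classification"
    # Few distinct whole-number values usually encode class labels.
    if all(float(v).is_integer() for v in values) and len(unique) <= max_classes:
        return "multiclass classification"
    # Otherwise assume a continuous target.
    return "regression"
```

The cutoff is the main design knob: a lower `max_classes` pushes more integer targets toward regression.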

Recommendation System

For each dataset in the UCI ML Repository, recommend other datasets in the repository that are similar to it. All available features for each of the datasets will be provided. Additional datasets from other repositories may also be included for training and testing purposes.
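One common starting point for finding similar datasets is cosine similarity between per-dataset feature vectors. The sketch below assumes each dataset has already been converted into a numeric vector (for example, from its provided features); the vectors, the `catalog` dictionary, and the `recommend` function are hypothetical and only illustrate the ranking step, not how the repository team intends the system to work.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0  # an all-zero vector is treated as similar to nothing
    return dot / (nu * nv)


def recommend(query, catalog, k=3):
    """Return names of the k datasets most similar to `query`, excluding itself.

    `catalog` is a hypothetical dict mapping dataset name -> feature vector.
    """
    scores = [
        (cosine(catalog[query], vec), name)
        for name, vec in catalog.items()
        if name != query
    ]
    # Stable sort on score only, so ties keep catalog order.
    scores.sort(key=lambda s: s[0], reverse=True)
    return [name for _, name in scores[:k]]
```

Cosine similarity ignores vector magnitude, so in practice the features would likely need scaling (for example, log-transforming instance counts) before this ranking is meaningful.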

NLP for Dataset Parsing

Use NLP methods to extract information from papers that introduce or use datasets from the UCI ML Repository. Relevant information to identify in these papers could include the target variable, main results, and funding sources.


Tamanna Hossain

PhD Student, Computer Science

Faculty Mentors

  • Sameer Singh
  • Padhraic Smyth
  • Philip Papadopoulos

Contact Us


Slack (for registered participants only):



RCIC is generously providing computational resources for the hackathon. Go to the page above to see how to access the machines and run your jobs on the HPC. If you have any questions about the computing resources, post in the #computing channel on Slack.

Frequently Asked Questions

  • I already have a team in mind. How do we register together?
    Every team member needs to register. We will contact you about team formation closer to the event.

  • Will there be prizes?
    Yes, the most interesting submissions will receive monetary prizes. Their authors will also be credited as contributors to the UCI ML Repository if their submissions are incorporated into the repository website.

  • Will computational resources be provided for the hackathon?
    Yes, we will be providing access to a cluster computing environment for registered hackathon participants. Instructions on how to access resources will be released at the start of the hackathon.

  • When will we be able to access project details?
    Project overviews have been released on this page. Required materials will become available to registered participants at the start of the hackathon. More projects might also be added closer to the hackathon, so be on the lookout.