Capstone Highlights

COMP 4449: Data Science Capstone

Data Science Capstone provides students an opportunity to demonstrate their expertise as data scientists. Students are expected to integrate prior knowledge and skills to design, develop, test, and present ‘full-cycle’ data science products, and apply them in real-world contexts. This includes assessing and communicating their value to decision-making. 

Students complete two sets of challenges of increasing complexity: a midterm project and a term project. They implement, document, test, and present both projects individually or in two-person teams.

Capstone Project Examples

Capstone projects vary widely by student interests, career goals and industry needs. The following are examples of previous capstone products developed by DataScience@Denver graduates.

Data Visualization of Subsea Cable Performance

The goal of this project was to extract topology and performance information for hundreds of subsea cables by scraping it from the internet. The resulting datasets could be used to determine the best routing based on throughput and latency metrics. All results were presented interactively on an internal website.
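As a rough illustration of how such a dataset could support routing decisions, the sketch below finds the lowest-latency path through a small cable graph using Dijkstra's algorithm. The station names and latency figures are invented for illustration and are not taken from the student's project, which may have used a different method.

```python
import heapq

# Hypothetical cable segments: (station_a, station_b, latency_ms).
# These values are made up for illustration only.
CABLES = [
    ("A", "B", 40.0),
    ("B", "C", 35.0),
    ("A", "C", 90.0),
    ("C", "D", 20.0),
]

def build_graph(cables):
    """Build an undirected adjacency map from cable segments."""
    graph = {}
    for a, b, latency in cables:
        graph.setdefault(a, []).append((b, latency))
        graph.setdefault(b, []).append((a, latency))
    return graph

def lowest_latency_route(graph, source, target):
    """Return (total_latency, path) via Dijkstra, or (inf, []) if unreachable."""
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        latency, node, path = heapq.heappop(queue)
        if node == target:
            return latency, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, edge_latency in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(
                    queue, (latency + edge_latency, neighbor, path + [neighbor])
                )
    return float("inf"), []

graph = build_graph(CABLES)
total, route = lowest_latency_route(graph, "A", "D")
print(total, route)
```

The same search could rank routes by throughput instead by changing the edge weight, for example to the inverse of each segment's capacity.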

Heart Health Algorithm

This project involved pulling heart MRI imaging and video data from Kaggle and using Matplotlib animation to visualize left ventricle contractions. The student computed the systole and diastole cross-sectional area ratios from Canny edge detection and Hough circle detection. The student also employed a binary classification algorithm to model and predict healthy and unhealthy heart function with more than 99% accuracy.
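The area-ratio step might look something like the sketch below, assuming a circle-detection step has already produced one left-ventricle radius per frame. The radii are invented values, and treating the cross-section as a perfect circle is a simplifying assumption; the student's actual computation is not described in detail here.

```python
import math

# Hypothetical per-frame left-ventricle radii in pixels (illustrative values
# standing in for the output of a circle-detection step).
radii_px = [30.0, 27.0, 22.0, 18.0, 21.0, 26.0, 30.0]

def circle_area(radius):
    """Area of a circle with the given radius."""
    return math.pi * radius ** 2

def systole_diastole_ratio(radii):
    """Ratio of the smallest (end-systole) to the largest (end-diastole) area."""
    areas = [circle_area(r) for r in radii]
    return min(areas) / max(areas)

ratio = systole_diastole_ratio(radii_px)
print(f"systole/diastole area ratio: {ratio:.2f}")
```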

Data Accuracy in Movie Tweets

In this project, a student scraped the web for 1.6 million movie-related tweets and performed natural language processing (NLP) sentiment analysis with a convolutional neural network (CNN), achieving 92% accuracy. The student built the CNN with Keras and TensorFlow; it was composed of four hidden layers of neurons and was trained for five epochs.

Data Mapping Network Connections

Using a fictional Kronos kidnapping as the scenario, a student built a graph of the connections between individuals who communicated with one another and predicted which of them were likely to be colluding in the kidnapping. The student used text processing to determine word frequency and estimated the composition of the criminal group.
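A minimal sketch of the two ideas mentioned here, a communication graph and word-frequency counts, is shown below. The names and messages are invented, and counting each person's contacts is only one simple way to flag well-connected individuals; it is not necessarily the method the student used.

```python
from collections import Counter, defaultdict

# Hypothetical messages: (sender, recipient, text). Names and text are invented.
MESSAGES = [
    ("ana", "ben", "meet at the dock tonight"),
    ("ben", "cy", "bring the van to the dock"),
    ("ana", "cy", "the van is ready"),
    ("dee", "ana", "lunch tomorrow"),
]

def build_network(messages):
    """Map each person to the set of people they exchanged messages with."""
    network = defaultdict(set)
    for sender, recipient, _ in messages:
        network[sender].add(recipient)
        network[recipient].add(sender)
    return network

def word_frequency(messages):
    """Count how often each lowercase word appears across all message texts."""
    counts = Counter()
    for _, _, text in messages:
        counts.update(text.lower().split())
    return counts

network = build_network(MESSAGES)
# Degree = number of distinct contacts per person.
degree = {person: len(contacts) for person, contacts in network.items()}
words = word_frequency(MESSAGES)

print(sorted(degree.items(), key=lambda item: -item[1]))
print(words.most_common(3))
```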

Movement Tracking of Amusement Park Guests

In this project, a student sought to solve a fictional crime at an amusement park utilizing information from the fictional DinoFun World to track the movement of park guests throughout a three-day period. The data included time stamps and positions associated with park ride check-ins.
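To give a sense of how time-stamped check-ins can be turned into movement traces, the sketch below orders each guest's check-ins by time and lists who was at a given position during a time window. The guest IDs, coordinates, and timestamps are invented and do not come from the DinoFun World data.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical check-in records: (guest_id, ISO timestamp, x, y). Invented values.
CHECKINS = [
    ("g1", "2024-01-05T10:05:00", 10, 20),
    ("g2", "2024-01-05T09:30:00", 42, 37),
    ("g1", "2024-01-05T09:00:00", 5, 5),
    ("g1", "2024-01-05T11:15:00", 42, 37),
    ("g2", "2024-01-05T11:20:00", 10, 20),
]

def guest_paths(checkins):
    """Return each guest's check-in positions in chronological order."""
    visits = defaultdict(list)
    for guest_id, stamp, x, y in checkins:
        visits[guest_id].append((datetime.fromisoformat(stamp), (x, y)))
    return {
        guest: [position for _, position in sorted(records)]
        for guest, records in visits.items()
    }

def guests_at(checkins, position, start, end):
    """Guest IDs that checked in at `position` between start and end, inclusive."""
    start_dt = datetime.fromisoformat(start)
    end_dt = datetime.fromisoformat(end)
    return sorted(
        {
            guest_id
            for guest_id, stamp, x, y in checkins
            if (x, y) == position
            and start_dt <= datetime.fromisoformat(stamp) <= end_dt
        }
    )

paths = guest_paths(CHECKINS)
suspects = guests_at(
    CHECKINS, (42, 37), "2024-01-05T11:00:00", "2024-01-05T12:00:00"
)
print(paths)
print(suspects)
```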

YouTube Country of Origin Prediction Algorithm

In this project, a student used the Python requests library to scrape YouTube for metadata with thumbnail color composition sorted by country. The software retrieved JavaScript Object Notation (JSON) objects from the web, wrote them out as comma-separated values (CSV), and then read the data back in. The student used the most frequent colors as predictive features, pruned them based on a best-features report, and predicted the country of origin with a decision-tree classification algorithm.
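The JSON-to-CSV round trip and the most-frequent-color step could be sketched as follows. The field names and records are hypothetical and do not reflect any real YouTube response format; an in-memory buffer stands in for a file on disk, and the decision-tree step is omitted.

```python
import csv
import io
import json
from collections import Counter

# Hypothetical JSON payload; field names are assumptions for illustration.
RAW_JSON = """
[
  {"video_id": "v1", "country": "US", "dominant_color": "red"},
  {"video_id": "v2", "country": "US", "dominant_color": "red"},
  {"video_id": "v3", "country": "US", "dominant_color": "blue"},
  {"video_id": "v4", "country": "JP", "dominant_color": "white"},
  {"video_id": "v5", "country": "JP", "dominant_color": "white"}
]
"""

FIELDS = ["video_id", "country", "dominant_color"]

def json_to_csv(raw_json):
    """Parse a JSON array of records and serialize it as CSV text."""
    records = json.loads(raw_json)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

def most_frequent_color_by_country(csv_text):
    """Read the CSV back in and return the most common color per country."""
    counts = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts.setdefault(row["country"], Counter())[row["dominant_color"]] += 1
    return {
        country: colors.most_common(1)[0][0]
        for country, colors in counts.items()
    }

csv_text = json_to_csv(RAW_JSON)
top_colors = most_frequent_color_by_country(csv_text)
print(top_colors)
```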

Data Classification of Tweet Sentiments

For this project, a student pulled tweets from the internet and, using a short list of search terms, determined whether the opinions expressed within the tweets were favorable, neutral or unfavorable. This resulted in approximately 800 tweets in each category. An XGBoost classifier predicted which search term produced a given tweet, achieving 76% accuracy.

Data Scientist Job Title Predictions

Using a Kaggle dataset pulled from an Indeed search for data science job listings, this student performed NLP to discover the most-listed job title search words, including “data scientist,” “machine learning engineer” and “data analyst.” Visualizations were rendered for companies hiring and for specific job requirements. The student used a stochastic gradient descent (SGD) classifier to predict the position from the text of the job description.

Consumer Purchase Data Segmentation

To complete this project, the student performed exploratory data analysis on multinational consumer purchase data across 542,000 transactions. The analysis included NLP, k-means clustering of products, principal component analysis and visualizations, among other techniques, and was meant to predict which of five clusters each transaction belonged to. An XGBoost model achieved 89% accuracy.
