# Required Courses

**COMP 4441, Introduction to Probability and Statistics for Data Science**

The course introduces fundamentals of probability for data science. Students will survey data visualization methods and summary statistics, develop models for data, and apply statistical techniques to assess the validity of the models. The techniques will include parametric and non-parametric methods for parameter estimation and hypothesis testing for a single sample mean and two sample means, for proportions, and for simple linear regression. Students will acquire sound theoretical footing for the methods where practical, and will apply them to real-world data, primarily using R.

**COMP 3006, Python Software Development**

This accelerated course covers advanced Python programming for data scientists. Course Objectives: name and demonstrate proficiency using advanced Python programming techniques for data science; analyze a programming task and create a development plan and high-level software design that accomplishes the task; relate common portions of the Python standard library to specific programming tasks; understand and apply aspects of the Python scientific programming ecosystem to achieve a data-science analysis goal; collaborate with another data scientist to develop a software program that completes a given data-science task.

**COMP 4581, Algorithms for Data Science**

This course introduces the design and analysis of algorithms within the context of data science. Topics include data structures, asymptotic complexity, and algorithm design techniques such as incremental, divide and conquer, dynamic programming, randomization, greedy algorithms, and advanced sorting techniques. Examples to illustrate techniques are drawn from multi-dimensional clustering (k-means and probabilistic), regression, decision trees, order statistics, data mining using Apriori algorithms, and algorithms for generating combinatorial objects. This course is not to be used for the MS Computer Science.

**COMP 4447, Data Science Tools 1**

Organizations are using data science to extract actionable insight from data. To highlight the hidden patterns in the data, this course equips students with essential skills for data collection, cleanup, transformation, feature engineering, summarization, and visualization. Students will do assignments and a final project. This is a hands-on course. Students will use Python libraries, Linux commands, and various data sets to perform these activities.

**COMP 4442, Advanced Probability and Statistics for Data Science**

This course builds on material in Probability and Statistics 1. Students will carry out model fitting and diagnostics for multiple regression, ANOVA, ANCOVA, and generalized linear models. Dimension reduction techniques such as PCA and Lasso are introduced, as are techniques for handling dependent data. The course introduces the principles of resampling and Bayesian analysis. Students will acquire sound theoretical footing for the methods where practical, and will apply them to real-world data, primarily using R.

**COMP 3421, Database Organization & Management I**

An introductory class in databases explaining what a database is and how to use one. Topics include database design, ER modeling, database normalization, relational algebra, SQL, and B-trees. Each student will design, load, query, and update a nontrivial database using a relational database management system (RDBMS). An introduction to a NoSQL database will be included.

**COMP 4448, Data Science Tools 2**

Building a successful predictive model is a multi-faceted process. This course focuses on hypothesis testing and the development of predictive models. Students will also learn how to perform graph-based modeling and optimization. Students will do assignments and a final project. This is a hands-on course. Students will use Python libraries, Linux commands, and various data sets to perform these activities.

**COMP 4431, Data Mining**

Data mining is the process of extracting useful information implicitly hidden in large databases. Various techniques from statistics and artificial intelligence are used here to discover hidden patterns in massive collections of data. This course is an introduction to these techniques and their underlying mathematical principles. Topics covered include: basic data analysis, frequent pattern mining, clustering, classification, and model assessment.

**COMP 4432, Machine Learning**

This course will give an overview of machine learning techniques, their strengths and weaknesses, and the problems they are designed to solve. This will include the broad differences between supervised/unsupervised and reinforcement learning as well as associated learning problems such as classification and regression. Techniques covered, at the discretion of the instructor, may include approaches such as linear and logistic regression, neural networks, support vector machines, kNN, decision trees, random forests, Naive Bayes, EM, k-Means, and PCA. After course completion, students will have a working knowledge of these approaches and experience applying them to learning problems.

**COMP 4705, Data Visualization**

This course explores visualization techniques and theory. The course covers how to use visualization tools to effectively present data as part of quantitative statements within a publication/report and as an interactive system. Both design principles (color, layout, scale, and psychology of vision) and technical visualization tools/languages will be covered.

**COMP 4449, Data Science Capstone**

Students identify and fill a demand for an innovative data science product, such as a database tool, analytical software, or domain-specific analysis. The product is defined, implemented, documented, tested, and presented by the student or student team, with the instructor and other stakeholders acting as project supervisors to verify that goals are met through the 10-week development process.

**COMP 4433, Parallel and Distributed Computing**

Current techniques for effective use of parallel processing and large-scale distributed systems for data science. Programming assignments will give students experience in the use of these techniques. Specific topics will vary from year to year to incorporate recent developments. This course is not to be used for the MS Computer Science.
