Course Description

DATA 201 Probability for Data Science 3-0-3

An introduction to probability, emphasizing the combined use of mathematics and programming. Discrete and continuous families of distributions. Bounds and approximations. Transforms and convergence. Markov chains and Markov Chain Monte Carlo. Dependence, conditioning, Bayesian methods. The multivariate normal, random permutations, symmetry, and order statistics. Use of numerical computation, graphics, simulation, and computer algebra.

Pre-requisites: STAT 201, MATH 208 or MATH 225

DATA 211 Introduction to Data Science 3-0-3

An overview of the data-driven approach and the data analytics lifecycle. Basic statistics: variance, covariance, correlation, confidence intervals, and histograms. Data frames, series, slicing, sorting. Relational databases with primary and foreign keys. SQL implementation in Python. Data acquisition, cleaning, scrubbing, and manipulation. Correlation analysis, PCA, linear regression, gradient descent, Bayesian classifiers, decision trees, K-means clustering, hierarchical clustering, big data, and high-dimensional data. Overview of MapReduce and Hadoop.

Pre-requisites: MATH 102 or MATH 106, ICS 104

DATA 301 Data, Inference, and Decisions 3-0-3

This course covers the probabilistic foundations of inference in data science. Key topics include frequentist and Bayesian decision-making, maximum likelihood estimation, statistical inference and hypothesis testing, false discovery rate control with ROC analysis, Bayesian hierarchical models, rejection and Gibbs sampling, robust methods like bootstrap confidence intervals and permutation-based hypothesis tests, nonparametric methods like kernel density estimation and k-nearest neighbors, and machine learning fundamentals including decision trees and ensemble methods. The course equips students with essential skills for data-driven decision-making.

Pre-requisites: DATA 201

DATA 311 Data Engineering 3-0-3

Data lineage lifecycle, including question formulation, data collection and cleaning, and exploratory data analysis (EDA) and visualization. Introduction to statistical concepts such as measurement error. Techniques for scalable data processing. Concepts in data architecture and data stores (databases, warehousing, data lakes, data streams). Data ingestion and ETL (Extract, Transform, Load) processes. Batch vs. real-time data processing. Construction of data processing pipelines to support analytics and machine learning workflows. Workflow orchestration, automation, and the scheduling and managing of end-to-end data processing pipelines. Data observability and monitoring. Introduction to Infrastructure as Code (IaC) for data engineering. Alignment of data governance and security practices, including privacy and compliance. End-to-end, hands-on data projects integrating diverse concepts, leveraging cloud platforms, tools, and techniques to design, build, and deploy data processing pipelines.

Pre-requisites: DATA 211, ICS 202, MATH 208 or MATH 225

DATA 321 Matrix Theory for Data Science 3-0-3

Matrix Operations. Matrix Inverses, Smith Normal Form. LU-Factorization, PLU-Factorization. Determinant and Invertibility, Cramer’s Rule. Eigenvalues and Eigenvectors. Diagonalization, Multiplicity Theorems. Subspaces and Spanning, Null Space, Image Space, Eigenspace. Independence and Dimension. Orthogonality, Expansion Theorem. Rank of a Matrix, Nullity, Rank-Nullity Theorem. Similarity and Diagonalization, Symmetric Matrices. Best Approximation and Least Squares. Orthogonal Diagonalization, Principal Axes Theorem. Positive Definite Matrices, Cholesky Factorization. QR-Factorization, Power Method. Singular Value Decomposition, Pseudoinverse, Penrose Theorem. Unitary Diagonalization, Schur’s Theorem, Spectral Theorem.

Pre-requisites: MATH 208 or MATH 225

Note: Not to be taken for credit with MATH 432

DATA 322 Mathematical Modeling for Data Science 3-0-3

Introduction to mathematical modeling in data science. Classification of mathematical models into linear, nonlinear, and regularized models. Exploration of models for supervised learning (regression, classification) and unsupervised learning (dimension reduction, clustering). Tree-based models and ensemble techniques such as random forests and boosting. Case studies on ridge regression, lasso regression, and support vector machines, with practical applications and insights into model selection and evaluation.

Pre-requisites: DATA 211, MATH 208

DATA 341 Statistical Methods for Data Science 3-0-3

Statistical methods used to solve data problems. Topics include group comparisons and ANOVA, standard parametric statistical models, multiple linear regression, robust regression, logistic regression and classification, bias and variance, and the bootstrap method. An important focus of the course is on statistical computing and reproducible statistical analysis. The course includes hands-on experience in analyzing real-world data from the social, life, and physical sciences. The R language (or a similar language like Python or Julia) is used.

Pre-requisites: STAT 201 or STAT 214

DATA 351 Human Contexts and Ethics of Data 3-0-3

Introduction to the ethical, societal, and contextual issues surrounding data collection, analysis, and use. Examples from real-world applications; implications of data for privacy, equity, accountability, and societal trust. Case studies, discussions, and projects to enhance the critical thinking skills needed to analyze and address ethical dilemmas in data practices.

Pre-requisites: COE 292

DATA 361 Fundamentals of Database Systems 3-0-3

Fundamental database concepts, relational data manipulation, data modeling, capturing business rules, normalization, database system development process, transaction processing, distributed processing, data warehouses, and databases on the web. Database system implementation using a host programming language is to be done as a term project.

Pre-requisites: Junior Standing or ICS 202

Note: Not to be taken for credit with ICS 321

DATA 399 Summer Training 0-0-1

A continuous period of 8 weeks spent as a normal employee in industry, business, or government agencies with the purpose of familiarizing students with the real world of work and enabling them to integrate their classroom learning into a real work environment. During this period, a student is exposed to real-life work in the field. Students are required to submit progress reports during the work period. Students are also required to give a presentation and submit a final report on their experience and the knowledge they gained during their training.

Pre-requisites: ENGL 214, DATA 311, Completion of at least 85 hours, Major and Cumulative GPA of at least 2.0

DATA 411 Senior Design Project I

This course serves as the initial phase of a two-semester senior-year capstone project in data science. Student teams will leverage the knowledge acquired from previous courses to tackle real-world problems through the development of comprehensive data solutions. The course will focus on the inception and planning stages, including the development of project plans and software requirements specifications. In this first part, students will work on outlining project objectives and requirements, laying the groundwork for subsequent development phases. Following this, student teams will have the option to proceed with the creation of a complete design document or opt for an agile-like methodology.

Pre-requisites: Department Approval

DATA 412 Senior Design Project II 0-6-3

This is the second part of a two-semester senior-year capstone project. Student teams employ knowledge gained from courses throughout the program to develop a data solution to a real-world problem from conception to completion. In this part, students review and refine the documents prepared in DATA 411, finalize the design, complete the implementation of the application, test their code, and evaluate their final product.

Pre-requisites: DATA 411

DATA 421 Optimization for Data Science 3-0-3

Convex Optimization Problems, Coordinate Descent, Steepest Descent, Improving Directions Methods, Newton, Quasi-Newton, and Conjugate-Gradient methods. Stochastic Gradient Descent. Metaheuristics such as Evolutionary Algorithms and Particle Swarm. Applications to Data Science and Machine Learning problems.

Pre-requisites: MATH 201, DATA 311

DATA 471 Big Data Analytics 3-0-3

Introduction to Big Data Engineering, practical fundamentals of mining massive data and machine learning with theory to aid intuition building; Introduction to theories, concepts, practical contexts, and algorithms for analyzing very large amounts of data; Emphasis on hands-on, contemporary big data engineering skills applicable in research and industry.

Pre-requisites: MATH 101 or MATH 106, STAT 319