200 Level Courses
DATA 201 Probability for Data Science 3-0-3
An introduction to probability, emphasizing the combined use of mathematics and programming. Discrete and continuous families of distributions. Bounds and approximations. Transforms and convergence. Markov chains and Markov Chain Monte Carlo. Dependence, conditioning, Bayesian methods. The multivariate normal, random permutations, symmetry, and order statistics. Use of numerical computation, graphics, simulation, and computer algebra.
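The following is an illustration of one listed topic, Markov Chain Monte Carlo. It is a minimal random-walk Metropolis sampler whose target is the standard normal distribution. The function name, the step size, and the choice of target are illustrative assumptions, not material taken from the course.

```python
import math
import random


def metropolis_normal(n_samples, step=1.0, seed=0):
    """Random-walk Metropolis sampler targeting the standard normal N(0, 1).

    The target density is needed only up to a constant, so the sampler uses
    log p(x) = -x**2 / 2 and ignores the normalizing factor.
    """
    rng = random.Random(seed)
    x = 0.0
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        # Accept with probability min(1, p(proposal) / p(x)).
        log_ratio = (x * x - proposal * proposal) / 2.0
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = proposal
        samples.append(x)  # a rejected move repeats the current state
    return samples


if __name__ == "__main__":
    draws = metropolis_normal(50_000)
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / len(draws)
    print(f"sample mean {mean:.3f}, sample variance {var:.3f}")
```

The chain uses only a ratio of densities, so the normalizing constant never appears. This is what makes MCMC practical for Bayesian posteriors.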
DATA 211 Introduction to Data Science 3-0-3
An overview of the data-driven approach and the data analytics lifecycle. Basic statistics: variance, covariance, correlation, confidence intervals, and histograms. Data frames and series; slicing and sorting. Relational databases with primary and foreign keys. SQL implementation in Python. Data acquisition, cleaning, scrubbing, and manipulation. Correlation analysis, PCA, linear regression, gradient descent, Bayesian classifiers, decision trees, k-means clustering, hierarchical clustering, big data, and high-dimensional data. Overview of MapReduce and Hadoop.
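The following illustrates one listed topic, k-means clustering. It is a minimal sketch of Lloyd's algorithm on 2-D points. The function name is hypothetical. The initialization, which takes the first k points as the starting centroids, is a simplifying assumption. Libraries usually default to k-means++ seeding instead.

```python
import math


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm for k-means on a list of (x, y) tuples."""
    centroids = [tuple(p) for p in points[:k]]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, labels
```

Each iteration alternates an assignment step and an update step. Neither step can increase the within-cluster sum of squared distances, so the loop terminates at a local optimum.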
300 Level Courses
DATA 301 Data, Inference, and Decisions 3-0-3
This course covers the probabilistic foundations of inference in data science. Key topics include frequentist and Bayesian decision-making; maximum likelihood estimation; statistical inference and hypothesis testing; false discovery rate control and ROC analysis; Bayesian hierarchical models; rejection and Gibbs sampling; robust methods such as bootstrap confidence intervals and permutation-based hypothesis tests; nonparametric methods such as kernel density estimation and k-nearest neighbors; and machine learning fundamentals, including decision trees and ensemble methods. The course equips students with essential skills for data-driven decision-making.
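The following illustrates one listed topic, bootstrap confidence intervals. It is a minimal percentile-bootstrap sketch using only the Python standard library. The function name, the fixed 95% level, and the resample count are illustrative assumptions.

```python
import random
import statistics


def bootstrap_ci_95(data, stat=statistics.mean, n_boot=2000, seed=0):
    """95% percentile-bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Recompute the statistic on n_boot resamples drawn with replacement.
    boot_stats = [stat(rng.choices(data, k=n)) for _ in range(n_boot)]
    # quantiles(..., n=40) returns the 2.5%, 5%, ..., 97.5% cut points.
    cuts = statistics.quantiles(boot_stats, n=40)
    return cuts[0], cuts[-1]
```

You can pass any statistic through `stat`, for example `statistics.median`. This flexibility is the main appeal of the bootstrap when no closed-form standard error is available.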
DATA 311 Data Engineering 3-0-3
The data lineage lifecycle, including question formulation, data collection and cleaning, exploratory data analysis (EDA), and visualization. Introduction to statistical concepts such as measurement error. Techniques for scalable data processing. Concepts in data architecture and data stores (databases, data warehouses, data lakes, data streams). Data ingestion and ETL (Extract, Transform, Load) processes. Batch vs. real-time data processing. Construction of data processing pipelines to support analytics and machine learning workflows. Workflow orchestration, automation, and the scheduling and management of end-to-end data processing pipelines. Data observability and monitoring. Introduction to Infrastructure as Code (IaC) for data engineering. Alignment with data governance and security practices, including privacy and compliance. Hands-on, end-to-end data projects that integrate diverse concepts and leverage cloud platforms, tools, and techniques to design, build, and deploy data processing pipelines.
DATA 321 Matrix Theory for Data Science 3-0-3
Matrix Operations. Matrix Inverses, Smith Normal Form. LU-Factorization, PLU-Factorization. Determinant and Invertibility, Cramer’s Rule. Eigenvalues and Eigenvectors. Diagonalization, Multiplicity Theorems. Subspaces and Spanning, Null Space, Image Space, Eigenspace. Independence and Dimension. Orthogonality, Expansion Theorem. Rank of a Matrix, Nullity, Rank-Nullity Theorem. Similarity and Diagonalization, Symmetric Matrices. Best Approximation and Least Squares. Orthogonal Diagonalization, Principal Axes Theorem. Positive Definite Matrices, Cholesky Factorization. QR-Factorization, Power Method. Singular Value Decomposition, Pseudoinverse, Penrose Theorem. Unitary Diagonalization, Schur’s Theorem, Spectral Theorem.
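The following illustrates one listed topic, the power method. It is a minimal pure-Python sketch that estimates the dominant eigenvalue with the Rayleigh quotient. The function name and the all-ones starting vector are assumptions. The method also assumes a unique eigenvalue of largest magnitude and a starting vector that is not orthogonal to its eigenvector.

```python
import math


def power_method(A, n_iter=1000, tol=1e-12):
    """Estimate the dominant eigenvalue and a unit eigenvector of square matrix A (list of rows)."""
    n = len(A)
    v = [1.0] * n  # assumed starting vector
    eigenvalue = 0.0
    for _ in range(n_iter):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]  # w = A v
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalize to avoid overflow/underflow
        # Rayleigh quotient v^T A v (v has unit length).
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_eigenvalue = sum(v[i] * Av[i] for i in range(n))
        if abs(new_eigenvalue - eigenvalue) < tol:
            return new_eigenvalue, v
        eigenvalue = new_eigenvalue
    return eigenvalue, v
```

The convergence rate depends on the ratio of the two largest eigenvalue magnitudes. The closer that ratio is to 1, the slower the iteration.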
DATA 322 Mathematical Modeling for Data Science 3-0-3
Introduction to mathematical modeling in data science. Classification of mathematical models into linear, nonlinear, and regularized models. Exploration of models for supervised learning (regression, classification) and unsupervised learning (dimension reduction, clustering). Tree-based models and ensemble techniques such as random forests and boosting. Case studies on ridge regression, lasso regression, and support vector machines, with practical applications and insights into model selection and evaluation.
DATA 341 Statistical Methods for Data Science 3-0-3
Statistical methods used to solve data problems. Topics include group comparisons and ANOVA, standard parametric statistical models, multiple linear regression, robust regression, logistic regression and classification, bias and variance, and the bootstrap method. An important focus of the course is statistical computing and reproducible statistical analysis. The course includes hands-on experience analyzing real-world data from the social, life, and physical sciences. The R language (or a similar language such as Python or Julia) is used.
DATA 361 Fundamentals of Database Systems 3-0-3
Fundamental database concepts, relational data manipulation, data modeling, capturing business rules, normalization, the database system development process, transaction processing, distributed processing, data warehouses, and databases on the web. A database system is implemented in a host programming language as a term project.
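The following illustrates several listed topics: relational data manipulation, normalization with primary and foreign keys, and transactions. It is a minimal sketch that uses Python's built-in `sqlite3` module as the host language interface. The schema, table names, and sample rows are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical normalized two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        "CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL, "
        "dept_id INTEGER NOT NULL REFERENCES department(dept_id))"
    )
    with conn:  # one transaction: committed on success, rolled back on error
        conn.executemany("INSERT INTO department VALUES (?, ?)",
                         [(1, "Data Science"), (2, "Mathematics")])
        conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                         [(10, "Ali", 1), (11, "Sara", 1), (12, "Omar", 2)])
    return conn


def students_per_department(conn):
    """Join the two tables and count students in each department."""
    query = (
        "SELECT d.name, COUNT(s.student_id) FROM department d "
        "LEFT JOIN student s ON s.dept_id = d.dept_id "
        "GROUP BY d.dept_id ORDER BY d.dept_id"
    )
    return conn.execute(query).fetchall()
```

Department names are stored once and referenced by key. A row that points at a nonexistent department is rejected with `sqlite3.IntegrityError`.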
DATA 391 Human Contexts and Ethics of Data 3-0-3
Introduction to the ethical, societal, and contextual issues surrounding data collection, analysis, and use. Examples from real-world applications; implications of data use for privacy, equity, accountability, and societal trust. Case studies, discussions, and projects that build critical thinking skills for analyzing and addressing ethical dilemmas in data practices.
DATA 399 Summer Training 0-0-1
A continuous period of 8 weeks spent as a regular employee in industry, business, or a government agency, with the purpose of familiarizing students with the real world of work and enabling them to integrate their classroom learning into a real work environment. During this period, the student is exposed to real-life work in the field. Students are required to submit progress reports during the work period. They are also required to give a presentation and submit a final report on their experience and the knowledge they gained during their training.
400 Level Courses
DATA 411 Senior Design Project I 0-1-0
This course serves as the initial phase of a two-semester senior-year capstone project in data science. Student teams will leverage the knowledge acquired from previous courses to tackle real-world problems through the development of comprehensive data solutions. The course will focus on the inception and planning stages, including the development of project plans and software requirements specifications. In this first part, students will work on outlining project objectives and requirements, laying the groundwork for subsequent development phases. Following this, student teams will have the option to proceed with the creation of a complete design document or opt for an agile-like methodology.
DATA 412 Senior Design Project II 0-6-3
This is the second part of a two-semester senior-year capstone project. Student teams employ knowledge gained from courses throughout the program to develop a data solution to a real-world problem from conception to completion. In this part, students review and refine the documents prepared in DATA 411, finalize the design, complete the implementation of the application, test their code, and evaluate their final product.
DATA 421 Optimization for Data Science 3-0-3
Convex Optimization Problems, Coordinate Descent, Steepest Descent, Improving Directions Methods, Newton and Quasi-Newton Methods, Conjugate Gradient. Stochastic Gradient Descent. Metaheuristics such as Evolutionary Algorithms and Particle Swarm Optimization. Applications to Data Science and Machine Learning problems.
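The following illustrates one listed topic, stochastic gradient descent. It is a minimal sketch that fits a one-variable least-squares line by updating on one example at a time in shuffled order. The function name, learning rate, and epoch count are illustrative assumptions.

```python
import random


def sgd_linear_regression(xs, ys, lr=0.05, epochs=1000, seed=0):
    """Fit y ~ w*x + b by stochastic gradient descent on the squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)  # visit the examples in a fresh random order each epoch
        for i in idx:
            err = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5 * err**2 with respect to w and b.
            w -= lr * err * xs[i]
            b -= lr * err
    return w, b
```

Each update uses the gradient of a single example instead of the full objective. This makes each step cheap, at the cost of noisier progress on data that the model cannot fit exactly.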
DATA 441 Large Language Models 3-0-3
Theory and practice of Large Language Models including the underlying architectures (transformers, attention mechanisms), training methodologies (pretraining, fine-tuning, instruction-tuning, RLHF), prompt engineering, LLM-based agents, multimodal LLMs, evaluation metrics, deployment, and the ethical and societal challenges of LLMs.
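The following illustrates the attention mechanism listed above. It is a minimal, unbatched pure-Python sketch of scaled dot-product attention, softmax(QKᵀ/√d_k)V. It omits masking, multiple heads, and learned projections, and the function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)  # subtract the max before exponentiating
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for lists of row vectors (no batching, no mask)."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Output row is the attention-weighted average of the value rows.
        output.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return output
```

Dividing by √d_k keeps the dot products from growing with the key dimension. Large scores would otherwise push the softmax toward a near one-hot distribution with very small gradients.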
DATA 471 Big Data Analytics 3-0-3
Introduction to big data engineering. Practical fundamentals of mining massive datasets and of machine learning, with enough theory to aid intuition building. Theories, concepts, practical contexts, and algorithms for analyzing very large amounts of data. Emphasis on hands-on, contemporary big data engineering skills applicable in research and industry.