Projects

Predicting Baseball Statistics

Skills
Data Science, Machine Learning, Supervised Learning, Unsupervised Learning, Predictive Modeling, Exploratory Data Analysis, Linear Regression, Regularization, Cross-Validation, Python, Logistic Regression, Data Visualization, Oversampling, Decision Trees, Random Forests, scikit-learn, Clustering
Company
GitHub

• Cleaned, explored, and visualized a Major League Baseball dataset with over 2 million records and 91 parameters
• Built and trained predictive models of baseball statistics; for example, used oversampling to compensate for class imbalance and predicted home runs with 87% accuracy
• Compared performance of classification (logistic regression, classification tree, random forest classification) and regression (linear regression, regression tree, random forest regression) techniques
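
The oversampling and classifier comparison described above can be illustrated with a minimal sketch. It is not the project's actual code: the data is a synthetic stand-in generated with scikit-learn, and the helper name oversample_minority, the 5% positive rate, and the model settings are all assumptions made for illustration. Only the training split is oversampled, so the test split keeps its original class balance.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Synthetic stand-in for the baseball data (assumption): about 5% positives,
# loosely mimicking a rare event such as a home run.
X, y = make_classification(
    n_samples=5000, n_features=10, weights=[0.95, 0.05], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def oversample_minority(X, y, random_state=0):
    """Hypothetical helper: duplicate random minority rows until classes match."""
    classes, counts = np.unique(y, return_counts=True)
    minority = classes[np.argmin(counts)]
    n_extra = counts.max() - counts.min()
    X_min, y_min = X[y == minority], y[y == minority]
    # Sample with replacement from the minority class only.
    X_extra, y_extra = resample(
        X_min, y_min, replace=True, n_samples=n_extra, random_state=random_state
    )
    return np.vstack([X, X_extra]), np.concatenate([y, y_extra])


# Balance the training data only; the test data stays untouched.
X_bal, y_bal = oversample_minority(X_train, y_train)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_bal, y_bal)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "recall": recall_score(y_test, pred),
    }
    print(name, results[name])
```

Recall is reported next to accuracy because, with imbalanced classes, a model can reach high accuracy while missing most of the rare positive cases.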