Email | LinkedIn | GitHub | Resume
I am currently pursuing my Master of Data Science at UCI and received my BS in Data Science from UCSD. Through my various internships, I have gained experience with everything from data engineering to machine learning.

We’ve all had those games where a loss felt completely out of our control. Sometimes even an easy win isn’t satisfying, and that frustration is what inspired me to take a step toward better matchmaking balance. To predict the outcome of a League of Legends match before it starts, I analyzed player and champion stats to calculate win probabilities and identify the key factors that contribute to a team’s success.
This project is a first step toward machine-learning-enhanced matchmaking balance, which in turn can create a more enjoyable player experience by making games feel more competitive. It was also inspired by the DraftRec paper (DraftRec), which includes the current best match prediction model; I adapted its approach with modifications tailored to win prediction rather than draft recommendations.
✅ Linear relationships matter – Logistic Regression outperformed many non-linear models.
✅ Role-based skill differences are important – Player strength in specific roles significantly impacts outcomes.
✅ More complexity isn’t always better – The most complex transformer models underperformed.
✅ Hybrid models are effective – Combining neural networks with traditional models improved accuracy.
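The logistic-regression baseline that outperformed the non-linear models can be sketched from scratch in a few lines. The features and weights below are invented for illustration, not the ones the project actually learned:

```python
import math

def win_probability(features, weights, bias):
    """Logistic regression: sigmoid of a weighted sum of pre-game features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical pre-game features: rank difference, role win-rate difference,
# champion-mastery difference (all blue side minus red side)
features = [0.3, -0.1, 0.2]
weights = [1.2, 0.8, 0.5]   # invented coefficients, standing in for fitted ones
p_blue_win = win_probability(features, weights, bias=0.0)
```

With all-zero features the model is indifferent and returns 0.5, which makes it easy to sanity-check before training.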
🔹 Start simple – Establishing a baseline made it easier to measure real improvements.
🔹 Plan data collection carefully – Fetching match history scaled rapidly with the number of players, making early prototyping crucial.
🔹 Evaluate models beyond a single metric – Looking at raw precision, recall, and accuracy helped identify true performance gains.
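The precision/recall/accuracy evaluation mentioned above reduces to counting the confusion-matrix cells; a minimal sketch (the sample labels are made up):

```python
def precision_recall_accuracy(y_true, y_pred):
    """Compute the three metrics from binary labels (1 = blue-side win)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = correct / len(y_true)
    return precision, recall, accuracy

p, r, a = precision_recall_accuracy([1, 0, 1, 1, 0, 1], [1, 0, 0, 1, 1, 1])
```

Looking at all three together catches models that inflate accuracy by always predicting the majority side.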
✅ Technical Skills: Machine Learning, Neural Networks, Data Cleaning, Feature Engineering
✅ Tools Used: Python, PyTorch, Riot API, AWS EC2, Scikit-Learn
For more details, refer to the PowerPoint! 🚀
Project overview: In this project, I worked with a 35GB Steam reviews dataset to sharpen my big data and NLP skills using PySpark. The primary focus was on practicing essential PySpark concepts like partitioning, groupbys, joins, and efficient computation techniques to handle large-scale data. I aimed to optimize calculations and improve processing speed, gaining valuable experience in managing big datasets.
I also explored natural language processing using PySpark NLP and John Snow Labs’ NLP library. This involved cleaning the review data, performing entity recognition using pre-trained models, and conducting unsupervised sentiment analysis using TextBlob and clustering. These tasks helped me practice working with complex text data and extracting meaningful insights from unstructured reviews.
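The unsupervised sentiment step works by scoring each review's polarity on a [-1, 1] scale. A minimal sketch of the idea with a tiny hand-rolled lexicon (the words and weights below are invented, not TextBlob's actual lexicon):

```python
# Toy polarity lexicon (invented weights, standing in for TextBlob's scoring)
LEXICON = {"great": 0.8, "fun": 0.6, "buggy": -0.6, "broken": -0.8, "boring": -0.5}

def polarity(review: str) -> float:
    """Average the polarity of known words; unknown words are ignored."""
    scores = [LEXICON[w] for w in review.lower().split() if w in LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

def sentiment_label(review: str) -> str:
    p = polarity(review)
    return "positive" if p > 0 else "negative" if p < 0 else "neutral"
```

The real pipeline runs this per review inside PySpark and then clusters the scores, but the labeling logic itself is this simple.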
One of the main challenges I encountered was setting up PySpark locally due to compatibility issues with various package versions. After overcoming these versioning hurdles, I successfully ran the analysis and achieved my goal of applying both big data techniques and NLP in a real-world context.
Technical Knowledge: Big Data Processing, Natural Language Processing (NLP), Entity Recognition, Sentiment Analysis
Tools: Python, PySpark, PySpark NLP, John Snow Labs NLP, Steam Reviews Dataset (35GB)
Project overview: In our final-year project, my team of two classmates and I (supervised by two professors) explored how soft decision trees (SDTs) can improve machine learning model interpretability while maintaining high performance. Unlike traditional decision trees, which make binary splits, SDTs use probabilistic decisions to handle ambiguity more effectively. Our research focused on applying feature learning techniques and leveraging the Neural Feature Matrix (NFM) to visualize and interpret the features learned by the model.
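The probabilistic split at the heart of an SDT can be sketched in a few lines: instead of a hard threshold, an inner node routes the input right with a sigmoid gating probability and blends the two leaf distributions accordingly (the weights, bias, and leaf distributions below are invented for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def soft_split(x, w, b, left_leaf, right_leaf):
    """One SDT inner node: blend the leaf class distributions by the gate."""
    p_right = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))
    return [(1 - p_right) * l + p_right * r
            for l, r in zip(left_leaf, right_leaf)]

# Hypothetical 2-feature input; leaves are distributions over 2 classes
dist = soft_split([1.0, -0.5], w=[2.0, 1.0], b=0.0,
                  left_leaf=[0.9, 0.1], right_leaf=[0.2, 0.8])
```

Because the output is a convex combination of valid distributions, it is itself a valid distribution, and the gate weights `w` are exactly what the Neural Feature Matrix visualizes.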
We tested our approach using datasets like MNIST and CelebA, demonstrating that SDTs could not only match the accuracy of neural networks but also offer insights into the decision-making process. On MNIST, we successfully replicated prior work, showing that SDTs could accurately classify digits and explain the decision paths. For the more complex CelebA dataset, the model identified key facial features, further validating SDTs’ ability to handle real-world data. Finally, we combined our previous neural network with features learned from the SDTs and found the neural network improved its accuracy by 8% within the same number of training epochs.
Our findings highlight the potential of SDTs to combine the flexibility of neural networks with the transparency of decision trees, improving overall accuracy in both regards. By making decision-making processes clearer, SDTs are promising for applications requiring interpretable AI, such as medical diagnoses or financial models.
Technical Knowledge: Matrix Calculus, Neural Networks, Soft Decision Trees, Feature Learning, Data Visualization, Neural Feature Matrix (NFM)
Tools: Python, PyTorch, MNIST, CelebA
Project overview: In my Fantasy League of Legends project, I aimed to integrate various skills learned in class and through personal study. I built a full-stack application where I used Go to develop a CRUD backend that facilitates API requests between the frontend and a prediction model. The frontend, designed with Next.js, was deployed on Firebase as an interactive single-page application for users to select players and predict game outcomes.
For the prediction model, I collected data from the Riot API and trained a PyTorch model in Python. I then deployed this model to an AWS Lambda API endpoint to predict match winners based on the selected players. To ensure reliability, I wrote test cases using Cypress for the frontend, Testify for the backend, and PyTest for the Lambda API endpoint. This project brought together everything from backend and frontend development to machine learning deployment, offering a full end-to-end experience.
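The Lambda endpoint plus its PyTest checks can be sketched as below; the handler name, payload shape, and stubbed model call are all hypothetical, not the project's actual code:

```python
import json

def lambda_handler(event, context=None):
    """Hypothetical Lambda entry point: parse selected players, return a win probability."""
    body = json.loads(event["body"])
    players = body.get("players", [])
    if len(players) != 10:
        return {"statusCode": 400, "body": json.dumps({"error": "need 10 players"})}
    prob = 0.5  # stand-in for the PyTorch model inference
    return {"statusCode": 200, "body": json.dumps({"blue_win_prob": prob})}

# PyTest-style checks for the endpoint contract
def test_rejects_partial_lobby():
    resp = lambda_handler({"body": json.dumps({"players": ["p1"]})})
    assert resp["statusCode"] == 400

def test_full_lobby_returns_probability():
    resp = lambda_handler({"body": json.dumps({"players": [f"p{i}" for i in range(10)]})})
    assert resp["statusCode"] == 200
    assert 0.0 <= json.loads(resp["body"])["blue_win_prob"] <= 1.0
```

Testing the request/response contract this way keeps the frontend, Go backend, and Lambda endpoint honest with each other without needing the real model loaded.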
Technical Knowledge: Data Collection (Riot API), Machine Learning, API Development, Frontend/Backend Development
Tools: Python, PyTorch, AWS Lambda, Firebase, Next.js, Go, Cypress, PyTest, Google Cloud Run
Page template forked from evanca