I am a Year 2 Data Science and Analytics undergraduate at the National University of Singapore, with analytical and conceptual knowledge of data analytics and machine learning. My past internship roles revolved around data analytics, training language-based models for accurate text summarisation, and web development. I have also worked on numerous personal projects, which include building neural networks and statistical models for classification problems.
As a self-motivated individual, Ernest took it upon himself to pick up web development skills, and I gave him the opportunity to help with front-end web development for our Pill Spectroscopy project. He developed the front-end infrastructure, saw it through to completion, and did a good job!
I got to know Ernest when he took CS2040 Data Structures and Algorithms during the first half of 2019. He has shown very good aptitude with the material and with coding in general, and thus will not have a problem academically or in any technical computer science work.
During the period of my mentorship with Ernest, he created wireframes and the backend infrastructure for his web application. From our interactions, I observed that he is a very passionate, responsible and knowledgeable person with an insatiable hunger to learn. He takes it in his stride to learn beyond the classroom and has shown determination in pursuing his interest.
I built an RNN with 2 LSTM layers and 1 embedding layer that predicts a book's genre from its description. I did manual clipping and padding and implemented an embedding layer from scratch. I then tuned the hyperparameters of my model several times, comparing the outcomes to achieve the highest validation accuracy.
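A minimal sketch of what the manual clipping and padding step could look like, assuming descriptions have already been converted to lists of integer token ids. The names `clip_and_pad`, `batch_clip_and_pad` and the padding id `0` are illustrative assumptions, not taken from the project itself.

```python
from typing import List, Sequence

PAD_ID = 0  # assumed padding token id (hypothetical; not stated in the project)


def clip_and_pad(token_ids: Sequence[int], max_len: int, pad_id: int = PAD_ID) -> List[int]:
    """Return exactly max_len ids: truncate long inputs, right-pad short ones."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    clipped = list(token_ids[:max_len])                   # clipping
    return clipped + [pad_id] * (max_len - len(clipped))  # padding


def batch_clip_and_pad(batch: Sequence[Sequence[int]], max_len: int,
                       pad_id: int = PAD_ID) -> List[List[int]]:
    """Apply clip_and_pad to every sequence so the batch is rectangular."""
    return [clip_and_pad(seq, max_len, pad_id) for seq in batch]
```

Fixing every description to the same length lets a batch be stacked into a single rectangular array before it reaches the embedding layer.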
I used K-Means clustering to segment customers on an e-commerce platform into categories based on their Revenue, Frequency and Recency on the platform. This shows how different marketing strategies should be employed for different categories of customers to reduce churn and increase retention.
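The idea can be sketched in plain Python under a few assumptions: orders arrive as hypothetical `(customer_id, day_number, amount)` tuples, features are min-max scaled, and the centroids are naively initialised from the first `k` customers. A library implementation such as scikit-learn's `KMeans` uses k-means++ initialisation by default, so this is an illustration of the technique rather than the project's actual code.

```python
from collections import defaultdict
from math import dist  # Python 3.8+


def rfm_table(orders, today):
    """Map each customer to (recency, frequency, revenue).

    orders: iterable of (customer_id, day_number, amount) tuples (assumed layout).
    """
    last_day = {}
    frequency = defaultdict(int)
    revenue = defaultdict(float)
    for customer, day, amount in orders:
        last_day[customer] = max(day, last_day.get(customer, day))
        frequency[customer] += 1
        revenue[customer] += amount
    return {c: (today - last_day[c], frequency[c], revenue[c]) for c in last_day}


def min_max_scale(rows):
    """Scale every column to [0, 1] so no single feature dominates distances."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [tuple((v - lo) / s for v, lo, s in zip(row, lows, spans)) for row in rows]


def kmeans(points, k, iterations=50):
    """Lloyd's algorithm with naive initialisation from the first k points."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centroids[j])  # keep an empty cluster in place
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids
```

Customers that share a label after clustering can then be treated as one segment when choosing a marketing strategy.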
I compared the validation accuracy of Logistic Regression and Multinomial Naive Bayes models for predicting whether a food review is positive or negative. Techniques explored in this project include random under-sampling, TF-IDF vectorizers, and in-depth data visualization with Bokeh and WordCloud.
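Random under-sampling, one of the techniques mentioned above, can be sketched as follows. This is a standard-library illustration under the assumption that reviews and labels are parallel lists; the function name `random_undersample` and the `seed` parameter are hypothetical and not taken from the project.

```python
import random
from collections import defaultdict


def random_undersample(samples, labels, seed=0):
    """Randomly drop examples from larger classes until all classes are equal in size."""
    if not samples:
        return [], []
    rng = random.Random(seed)  # seeded for reproducible splits
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    # Every class is cut down to the size of the rarest class.
    target = min(len(items) for items in by_label.values())
    paired = []
    for label, items in by_label.items():
        paired.extend((sample, label) for sample in rng.sample(items, target))

    rng.shuffle(paired)  # avoid class-ordered output
    return [s for s, _ in paired], [l for _, l in paired]
```

Balancing the classes this way keeps a classifier from scoring well simply by predicting the majority sentiment, at the cost of discarding some majority-class reviews.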
The aim of this project was to compare the performance of the Accelerated Proximal Gradient and Proximal Gradient methods in finding the optimal solution of an L1-regularised logistic regression. This is crucial when implementing a recommender system or any system that deals with sparse input matrices.
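A rough sketch of the two methods, assuming labels in {-1, +1}, a mean logistic loss, and a fixed step size derived from a Frobenius-norm bound on the Lipschitz constant. The accelerated variant here uses the common FISTA-style momentum update, which is an assumption about the formulation rather than a record of what the project implemented; all function names are illustrative.

```python
import math


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry towards zero by t."""
    return [x - t if x > t else x + t if x < -t else 0.0 for x in v]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def objective(X, y, w, lam):
    """Mean logistic loss plus lam * ||w||_1."""
    loss = 0.0
    for xi, yi in zip(X, y):
        m = yi * sum(a * b for a, b in zip(xi, w))
        loss += max(-m, 0.0) + math.log1p(math.exp(-abs(m)))  # log(1 + exp(-m))
    return loss / len(X) + lam * sum(abs(v) for v in w)


def gradient(X, y, w):
    """Gradient of the smooth part (mean logistic loss) at w."""
    g = [0.0] * len(w)
    for xi, yi in zip(X, y):
        m = yi * sum(a * b for a, b in zip(xi, w))
        coeff = -yi * sigmoid(-m) / len(X)
        for j, a in enumerate(xi):
            g[j] += coeff * a
    return g


def proximal_gradient(X, y, lam, iterations=200, accelerated=False):
    """Run (accelerated) proximal gradient; return weights and objective history."""
    n = len(X)
    # Step 1/L with L <= ||X||_F^2 / (4n); assumes X is not all zeros.
    step = 4.0 * n / sum(v * v for row in X for v in row)
    w = [0.0] * len(X[0])
    z, t, history = w[:], 1.0, []
    for _ in range(iterations):
        g = gradient(X, y, z)
        w_next = soft_threshold([zj - step * gj for zj, gj in zip(z, g)], step * lam)
        if accelerated:
            # Momentum: extrapolate along the direction of the last move.
            t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
            z = [wn + (t - 1.0) / t_next * (wn - wo) for wn, wo in zip(w_next, w)]
            t = t_next
        else:
            z = w_next
        w = w_next
        history.append(objective(X, y, w, lam))
    return w, history
```

Comparing the two `history` lists for the same data and `lam` shows how quickly each method drives the objective down; the soft-thresholding step is what sets small coefficients exactly to zero and produces sparse solutions.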
Getting data is the first step in data analytics. Being able to scrape data from any kind of static or dynamic website is an important skill. My web scraping ability is showcased across numerous projects, such as scraping book data and food reviews using BeautifulSoup and Selenium. I hope to explore Scrapy in the future.