====== Machine Learning ======

**with Ed Borasky (@znmeb on Twitter)**

==== What are the benefits of machine learning? ====

We can let the machine do its own work. It's a hands-off approach.

==== Our interests in machine learning: ====

  * Natural Language Processing
  * Data Hacks
  * Genetic Algorithms
  * K-Nearest Neighbor Algorithms
  * Clustering
  * Support Vector Machines
  * Scalability (a huge problem, since some algorithms take O(n^3) time or worse)

==== Projects we are working on or interested in: ====

  * Categorizing articles in RSS feeds to make a daily paper from the blogs you read.
  * Finding new, eye-opening news sources and having them brought to us.
  * Sentiment analysis: commercial applications of determining whether someone has a positive or negative opinion of a product they are talking about. This is a difficult problem, complicated by sarcasm and other factors of language use.
  * How can machines bridge the gap between correlation and causality?

==== Applications to Twitter: ====

  * Latent Semantic Analysis: use singular value decomposition to reduce the last 200 tweets to simple commonalities, such as shared subjects. We treat each person's tweets as a single document and build a matrix of the terms they used. This can be very slow in R.
  * Bayesian classifiers to filter out annoying tweets. Every tweet is run through a constant-time calculation to determine its class.
  * RSS vs. Twitter as a data source: blogs are more focused on specific topics.

==== What were the ideas behind the Netflix contest? ====

The benefit to Netflix is in the hundreds of millions of dollars.

==== Next Steps: ====

  * Come see the "Write Your Own Bayesian Classifier" talk by John Meleski at Open Source Bridge.
  * Possibility of an R language school that would meet twice: the first day on how to install and set up R, the second day on doing some modeling and data analysis.

==== Further Reading: ====

  * Toby Segaran's book "Programming Collective Intelligence" (O'Reilly, 2007).
  * ?ADD that blog here?
==== Tools: ====

  * R has a comprehensive NLP library that supports clustering and other techniques.
  * Python helper libraries for faster numeric computing: SciPy and NumPy.
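The Latent Semantic Analysis idea from the Twitter notes (one document per person, a term matrix, then singular value decomposition) can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the method used in the talk. It uses naive whitespace tokenization and raw term counts (real LSA setups often apply TF-IDF weighting first). The function names and the three "users" are hypothetical illustration data.

```python
import numpy as np

def term_document_matrix(docs):
    """Build a terms-by-documents count matrix (rows = terms, columns = documents)."""
    # Assumption: naive lowercase + whitespace tokenization, raw counts (no TF-IDF)
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    index = {word: i for i, word in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for word in doc.lower().split():
            counts[index[word], j] += 1
    return vocab, counts

def lsa(counts, k):
    """Truncated SVD: keep only the k largest singular values ("concepts")."""
    # counts == U @ np.diag(s) @ Vt, with s sorted largest first
    U, s, Vt = np.linalg.svd(counts, full_matrices=False)
    # Column j of doc_coords is document j expressed in k-dimensional concept space
    doc_coords = np.diag(s[:k]) @ Vt[:k, :]
    return U[:, :k], s[:k], doc_coords

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical data: each string stands in for one person's tweets joined together
users = [
    "python numpy scipy python data",
    "numpy data python analysis",
    "coffee portland bikes coffee",
]
vocab, counts = term_document_matrix(users)
term_vectors, strengths, coords = lsa(counts, k=2)
print(cosine(coords[:, 0], coords[:, 1]))  # near 1.0: same dominant concept
print(cosine(coords[:, 0], coords[:, 2]))  # near 0.0: no shared vocabulary
```

A full SVD of an n-by-n matrix takes O(n^3) time, which ties back to the scalability concern. For a large, sparse term matrix, `scipy.sparse.linalg.svds` computes only the top k singular triplets instead of the whole decomposition.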
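For the Bayesian tweet filter, here is a sketch of one common variant: multinomial naive Bayes with add-one (Laplace) smoothing. Treat it as an assumption, not necessarily the formulation from the "Write Your Own Bayesian Classifier" talk. The class name `NaiveBayesFilter`, the labels and the training tweets are all hypothetical. Classifying a tweet costs time proportional to its length times the number of labels. That cost does not grow with the amount of training data, which is the sense in which the per-tweet calculation is effectively constant time.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesFilter:
    """Multinomial naive Bayes over words, with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.word_totals = Counter()             # label -> total words seen
        self.doc_counts = Counter()              # label -> number of training tweets
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.word_totals[label] += len(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        words = text.lower().split()
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            # Work in log space: log P(label) + sum of log P(word | label)
            score = math.log(n_docs / total_docs)
            denom = self.word_totals[label] + len(self.vocab)
            for w in words:
                # Counter returns 0 for unseen words; +1 keeps the probability nonzero
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical training tweets
f = NaiveBayesFilter()
f.train("buy followers now cheap", "annoying")
f.train("cheap followers click now", "annoying")
f.train("great talk on bayesian classifiers", "ok")
f.train("slides from the r meetup", "ok")
print(f.classify("cheap followers"))  # annoying
```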
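K-nearest neighbor, one of the interests listed above, fits in a few lines of plain Python. This brute-force sketch assumes Euclidean distance and majority voting. The function name `knn_classify` and the 2-D example vectors are hypothetical. Each query measures its distance to every stored example. That full scan is exactly why k-NN runs into the scalability problem as the data grows.

```python
import heapq
import math
from collections import Counter

def knn_classify(query, examples, k=3):
    """Return the majority label among the k examples closest to `query`.

    `examples` is a list of (feature_vector, label) pairs; distance is Euclidean.
    Brute force: the distance to every example is computed for each query.
    """
    nearest = heapq.nsmallest(k, examples, key=lambda ex: math.dist(query, ex[0]))
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors with two labels
examples = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
print(knn_classify((1.1, 1.0), examples))  # a
print(knn_classify((4.9, 5.0), examples))  # b
```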
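For clustering, for example grouping RSS articles into sections of a daily paper, one possible choice is Lloyd's k-means, sketched below in plain Python. The notes do not say which clustering method the group uses, so k-means here is a swapped-in example. The points would be document vectors (for instance LSA coordinates), and the six 2-D points below are hypothetical. The function name `kmeans` and its `iterations` and `seed` parameters are also illustrative assumptions.

```python
import math
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means: alternately assign each point to its nearest centroid,
    then move each centroid to the mean of the points assigned to it."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its old centroid
        new_centroids = [
            tuple(sum(axis) / len(c) for axis in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # assignments have stabilised
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D document vectors for six articles
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (8.0, 8.0), (8.1, 7.9), (7.9, 8.2)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # [3, 3]
```

Lloyd's algorithm only finds a local optimum that depends on the starting centroids. In practice it is common to run it from several seeds and keep the result with the smallest within-cluster distance.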