Finding data sets Part 1: General data sources

I often encounter interesting algorithms or R packages which I want to test. The nice ones provide data for testing but often it is only dummy data. To get a good understanding of the method and its limitations real data might be required. Sometimes I would also like to explore data I have not used … Continue reading Finding data sets Part 1: General data sources

Deriving the Predicted Residual Sum of Squares Statistic

Recently I was looking into measures to evaluate a regularized least squares model. One thing I would have liked was cross-validation to be able to compare different models. When researching possibilities, I discovered PRESS (Predicted Residual Sum of Squares Statistic). The main resources I used to study the subject are: [1] Adrien Bartoli: Maximizing the … Continue reading Deriving the Predicted Residual Sum of Squares Statistic

Cat tracking data collected with my Tractive GPS Pet Tracker

For about one year I have used the Tractive Pet Tracker to track my families (and my) cats, my grandmothers cats and our road trip to Sweden (German). Now I want to share that cat tracking data and some additional data with you. In this post I will Share the raw data with you Filter … Continue reading Cat tracking data collected with my Tractive GPS Pet Tracker

Testing SegNet on real world road scenes

Recently I stumbled over SegNet, a tool which uses a convolutional decoder-encoder (type of neural net) to distinguish between 12 categories (like road, pedestrian, …) that a self-driving car might encounter. I tried this tool with several different pictures of roads and want to share the results with you. The big question is: Would I … Continue reading Testing SegNet on real world road scenes

Adding basemap.at tiles to an R leaflet plot

Recently I wanted to visualise some data in a map of Austria. R Leaflet provides a pretty good looking map by default (openstreetmap.org) but I wanted to use basemap.at, which is a map for Austria and therefore probably the most accurate map available for Austria. Actually it is not very difficult but it was the … Continue reading Adding basemap.at tiles to an R leaflet plot

Use R to connect to twitter and create a wordcloud of your tweets

Recently I wanted to create a wordcloud of my tweets and do further analysis. In this post I am going to show you how to connect to twitter in R and how to make a wordcloud from your tweets. To follow this tutorial, you need a Twitter account. First steps in R Install required libraries … Continue reading Use R to connect to twitter and create a wordcloud of your tweets

Use rvest to scrape NFL weather data

If you are following my progress in the Data Science Learning Club you might know that I am using NFL data for the tasks. For predicting sports events I think it is not only important to have statistics about the players, teams and previous games but also about the weather. From when I was a … Continue reading Use rvest to scrape NFL weather data

Learning Club 01: Find and explore a dataset

The first activity of the data science learning club I am participating in is to find and explore a dataset. I already described the data I found and will use in the last post. You can follow all my learning club related activities here. The tasks of this activity are (quoted from the thread above): … Continue reading Learning Club 01: Find and explore a dataset

Learning Club 00.b: Setup your development environment (Get started with python package nfldb)

So, just a few days ago I posted Learning Club 00: Set up your development environment (Getting started with R). There I made a mistake and decided to use R without thinking about the data set I would use. I am still happy I wrote the post because it can give all the R users … Continue reading Learning Club 00.b: Setup your development environment (Get started with python package nfldb)

Learning Club 00: Set up your development environment (Getting started with R)

A few weeks ago I became aware of Renee’s (owner of the blog Becoming a data scientist) plan to start a data science learning club and I thought it was a cool idea. In the learning club she will post activities and the first one was about setting up your development environment: Activity 00: Set … Continue reading Learning Club 00: Set up your development environment (Getting started with R)