Finding data sets Part 2: TV, music, book ratings and sports data

The first part gave a general overview of where to get data. This part lists specific data sources for topics such as sports, movies and books. Over the next couple of weeks you’ll find these posts on my blog: General data sources; TV, music, book ratings and sports data …

Finding data sets Part 1: General data sources

I often come across interesting algorithms or R packages that I want to test. The nice ones provide data for testing, but often it is only dummy data. To get a good understanding of a method and its limitations, real data may be required. Sometimes I would also like to explore data I have not used …

Easy and efficient way to log overwriting of a directory in SQL Server Integration Services (SSIS)

Recently I had a File System Task that moved a file, but whenever the file already existed at the destination, the package failed. So I set OverwriteDestination to TRUE. But then I lost track of which files were simply moved and which ones overwrote an already existing directory. My desired result …
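The post solves this inside SSIS. What follows is only a language-neutral illustration of the underlying idea, not the SSIS solution from the post. It is a minimal Python sketch built around a hypothetical helper, `move_and_log`. The helper checks whether the destination already exists *before* moving with overwrite, and it records whether each file was simply moved or overwrote an existing one.

```python
import os
import tempfile


def move_and_log(src: str, dest_dir: str, log: list) -> str:
    """Move src into dest_dir, overwriting a same-named file if one exists.

    Hypothetical helper for illustration: appends ("moved", path) or
    ("overwritten", path) to `log`, so overwrites are no longer silent.
    """
    dest = os.path.join(dest_dir, os.path.basename(src))
    # Check *before* moving: once the file is replaced, this information is lost.
    existed = os.path.exists(dest)
    # os.replace overwrites an existing destination file; it only works when
    # src and dest are on the same file system.
    os.replace(src, dest)
    log.append(("overwritten" if existed else "moved", dest))
    return dest


if __name__ == "__main__":
    # Demo: move a file called data.csv into the same target folder twice.
    with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as dest_dir:
        log = []
        for round_no in (1, 2):
            path = os.path.join(src_dir, "data.csv")
            with open(path, "w") as f:
                f.write(f"round {round_no}\n")
            move_and_log(path, dest_dir, log)
        for action, dest in log:
            print(action, dest)  # first "moved", then "overwritten"
```

The existence check and the move are two separate steps, so another process could create the file in between. For a logging aid this is usually acceptable, but the log is not a strict guarantee.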

Workaround: Restore failed (MSSQL Server)

A database restore can fail for several reasons. I never figured out why it failed for me, but I found a workaround. Since the process is very similar, I will also show you how to copy a database on the same server under a different name. Idea 1: …

Use R to connect to twitter and create a wordcloud of your tweets

Recently I wanted to create a wordcloud of my tweets and do some further analysis. In this post I am going to show you how to connect to Twitter from R and how to make a wordcloud from your tweets. To follow this tutorial, you need a Twitter account. First steps in R: install the required libraries …
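The post does this in R, including the Twitter connection. Whatever the language and however the tweets are fetched, a wordcloud is driven by word frequencies. Below is a minimal Python sketch of that counting step. It assumes the tweets are already available as a list of strings. The stop-word set and the cleaning rules are illustrative assumptions, not the post's method: links, @mentions and the "RT" marker are dropped, and only ASCII letters are kept.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption); real analyses use a fuller list.
STOP_WORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "on",
              "is", "it", "i", "my", "for", "rt"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# ASCII letters only, optionally with one apostrophe ("don't"): a simplification.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def word_frequencies(tweets):
    """Return a Counter of words across all tweets.

    Links, @mentions and stop words are removed; '#' is not part of a
    word match, so '#rstats' counts as 'rstats'.
    """
    counts = Counter()
    for tweet in tweets:
        text = URL_RE.sub(" ", tweet.lower())  # drop links
        text = MENTION_RE.sub(" ", text)        # drop @mentions
        words = WORD_RE.findall(text)
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


if __name__ == "__main__":
    tweets = [
        "Learning R with the #rstats community https://example.com",
        "RT @someone: R and SQL Server make a good pair",
        "More R and data science today",
    ]
    # The most frequent words would be drawn largest in a wordcloud.
    for word, n in word_frequencies(tweets).most_common(5):
        print(word, n)
```

Keeping the cleaning and counting separate from fetching makes the function independent of the Twitter API. Its output is the kind of word/frequency table a wordcloud plotting function sizes the words by.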

Learning Club 01: Find and explore a dataset

The first activity of the data science learning club I am participating in is to find and explore a dataset. I already described the data I found and will use in my last post. You can follow all my learning-club-related activities here. The tasks of this activity are (quoted from the thread above): …

Learning Club 00: Set up your development environment (Getting started with R)

A few weeks ago I became aware that Renee (owner of the blog Becoming a data scientist) planned to start a data science learning club, and I thought it was a cool idea. In the learning club she will post activities, and the first one was about setting up your development environment: Activity 00: Set …

Downgrading an ETL project from SSIS 2014 to SSIS 2012

Recently I had an SSIS package, built with Visual Studio 2013 and SQL Server Data Tools (SSDT) 2014, that had to be deployed to SQL Server 2012. Running the packages from Visual Studio 2013 and inserting data into a database on SQL Server 2012 worked, and it was also possible to deploy …

[SSIS Errors] The process cannot access the file because it is being used by another process

This error message cost me a few hours because I could not find this particular solution on the internet and … I did not look properly at the SSIS output. So I’d like to share it with you.
Information: 0x402090DC at Load Observations, Flat File Destination [268]: The processing of file “D:\test\Log.txt” …