Welcome to rs_datasets

This tool allows you download, unpack and read recommender systems datasets into pandas.DataFrame as easy as data = Dataset().

Installation

pip install rs_datasets

Available datasets

The following datasets are available for automatic download and can be retrieved with this package.

Note

Check dataset license to know available usecases. Authors of this package are not affiliated with dataset contents in any way.

Dataset	Users	Items	Interactions
Movielens	162k	62k	up to 25m
Million Song Dataset	1m	385k	48m
Netflix	480k	17.7k	100m
Goodreads	800k	1.5m	225m
Last.fm	360k	290k	17.5m
Epinions	49k	140k	660k
Book Crossing	279k	271k	1.1m
Jester	73k	100	4.1m
Amazon	?¹	?¹	up to 32m
Rekko	100k, 500k	8k	500k, 9.6m
Steam	12k	5k	200k
Anime	73k	11k	7.8m
Retail Rocket	1.4m	235k	2.7m
YooChoose	9m	52k	33m, 1m
Diginetica	232k	184k	1.2m, 18k

Example of use

from rs_datasets import MovieLens
ml = MovieLens()
ml.info()

ratings
   user_id  item_id  rating  timestamp
0        1        1     4.0  964982703
1        1        3     4.0  964981247
2        1        6     4.0  964982224
items
   item_id  ...                                       genres
0        1  ...  Adventure|Animation|Children|Comedy|Fantasy
1        2  ...                   Adventure|Children|Fantasy
2        3  ...                               Comedy|Romance
[3 rows x 3 columns]
tags
   user_id  item_id              tag   timestamp
0        2    60756            funny  1445714994
1        2    60756  Highly quotable  1445714996
2        2    60756     will ferrell  1445714992
links
   item_id  imdb_id  tmdb_id
0        1   114709    862.0
1        2   113497   8844.0
2        3   113228  15602.0

Loaded DataFrames are available as class attributes.

Their download speed is extremely slow and I wasn't patient enough to download the biggest one to check this. ↩↩