Welcome to rs_datasets
This tool allows you to download, unpack and read recommender system datasets into a pandas.DataFrame as easily as data = Dataset().
Installation
pip install rs_datasets
Available datasets
The following datasets can be downloaded automatically and loaded with this package.
Note
Check each dataset's license to see which use cases it permits. The authors of this package are not affiliated with the datasets or their contents in any way.
Dataset | Users | Items | Interactions |
---|---|---|---|
MovieLens | 162k | 62k | up to 25m |
Million Song Dataset | 1m | 385k | 48m |
Netflix | 480k | 17.7k | 100m |
Goodreads | 800k | 1.5m | 225m |
Last.fm | 360k | 290k | 17.5m |
Epinions | 49k | 140k | 660k |
Book Crossing | 279k | 271k | 1.1m |
Jester | 73k | 100 | 4.1m |
Amazon | ?¹ | ?¹ | up to 32m |
Rekko | 100k, 500k | 8k | 500k, 9.6m |
Steam | 12k | 5k | 200k |
Anime | 73k | 11k | 7.8m |
Retail Rocket | 1.4m | 235k | 2.7m |
YooChoose | 9m | 52k | 33m, 1m |
Diginetica | 232k | 184k | 1.2m, 18k |
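The counts in the table also give a rough idea of how sparse each user-item matrix is: density is the number of interactions divided by users × items. The sketch below uses the rounded figures from the table, so the results are estimates only, and it assumes every interaction is a distinct user-item pair. That assumption is reasonable for explicit-rating datasets but not for datasets that log repeated events.

```python
# Approximate sizes copied from the table above. The figures are rounded,
# so the resulting densities are rough estimates only.
DATASETS = {
    "MovieLens (25m)": (162_000, 62_000, 25_000_000),
    "Netflix": (480_000, 17_700, 100_000_000),
    "Jester": (73_000, 100, 4_100_000),
    "Epinions": (49_000, 140_000, 660_000),
}


def density(users: int, items: int, interactions: int) -> float:
    """Fraction of the users x items matrix that has an observed interaction.

    Assumes each interaction is a distinct (user, item) pair.
    """
    return interactions / (users * items)


for name, (users, items, interactions) in DATASETS.items():
    print(f"{name:16s} {density(users, items, interactions):.4%}")
```

Jester stands out: with only 100 jokes, most users have rated a large share of the items, while the other datasets in this sketch are far sparser.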
Example of use
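```python
from rs_datasets import MovieLens

ml = MovieLens()  # downloads and unpacks the data if needed, then reads it
ml.info()         # prints the first rows of each loaded DataFrame
```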
```
ratings
   user_id  item_id  rating  timestamp
0        1        1     4.0  964982703
1        1        3     4.0  964981247
2        1        6     4.0  964982224

items
   item_id  ...                                       genres
0        1  ...  Adventure|Animation|Children|Comedy|Fantasy
1        2  ...                   Adventure|Children|Fantasy
2        3  ...                               Comedy|Romance

[3 rows x 3 columns]

tags
   user_id  item_id              tag   timestamp
0        2    60756            funny  1445714994
1        2    60756  Highly quotable  1445714996
2        2    60756     will ferrell  1445714992

links
   item_id  imdb_id  tmdb_id
0        1   114709    862.0
1        2   113497   8844.0
2        3   113228  15602.0
```
The loaded DataFrames are available as attributes of the dataset object: ml.ratings, ml.items, ml.tags and ml.links.
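Because the loaded tables are ordinary pandas DataFrames, standard pandas operations apply to them directly. The sketch below does not download anything. It rebuilds hand-made stand-ins for ml.ratings and ml.items from the sample rows printed by ml.info() above, keeping only the columns needed here. It then converts the Unix-second timestamps to datetimes and joins each rating to its item's genres.

```python
import pandas as pd

# Hand-built stand-ins for ml.ratings and ml.items, using only the sample
# rows printed by ml.info() above. With the real package you would use the
# attributes of a loaded MovieLens() object instead.
ratings = pd.DataFrame({
    "user_id": [1, 1, 1],
    "item_id": [1, 3, 6],
    "rating": [4.0, 4.0, 4.0],
    "timestamp": [964982703, 964981247, 964982224],
})
items = pd.DataFrame({
    "item_id": [1, 2, 3],
    "genres": [
        "Adventure|Animation|Children|Comedy|Fantasy",
        "Adventure|Children|Fantasy",
        "Comedy|Romance",
    ],
})

# MovieLens timestamps are seconds since the Unix epoch; convert them.
ratings["datetime"] = pd.to_datetime(ratings["timestamp"], unit="s")

# Attach genres to each rating. Item 6 is not among the sample item rows,
# so the left join keeps that rating with a missing (NaN) genre.
merged = ratings.merge(items, on="item_id", how="left")
print(merged[["user_id", "item_id", "rating", "datetime", "genres"]])
```

A left join is used so that no rating is dropped when its item is missing from the item table. With the full dataset every rated item should have a match.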