Netflix
Classic dataset from the famous Netflix Prize which took place 2006-2009.
No ratings available for test data.
Stats
- 480,189 users
- 17,770 movies
- 100m ratings on the scale from 1 to 5
Example
from rs_datasets import Netflix
netflix = Netflix()
netflix.info()
movies
item_id year title
0 1 2003.0 Dinosaur Planet
1 2 2004.0 Isle of Man TT 2004 Review
2 3 1997.0 Character
test
item_id user_id timestamp
0 1 1046323 2005-12-19
1 1 1080030 2005-12-23
2 1 1830096 2005-03-14
train
item_id user_id rating timestamp
0 373 643460 4 2005-01-26
1 373 349399 5 2002-11-06
2 373 1315469 2 2005-08-15
Warning
It is not recommended to read data without wrapper (df = pd.read_parquet
)
when using PyCharm scientific mode.
PyCharm tries to load all 100m rows to show DataFrame info,
which causes huge memory consumption and freezes.
When loading with a wrapper (as in using this class) it doesn't
load that until you specifically try to show it.