Hi, today I want to share my personal project notes on data analytics with Python.
This post is based on a dataset I got from the GitHub account of the ebook's author, at this URL. I used the sample data under the path ch02/movielens, which contains 3 datasets in DAT format: users.dat, ratings.dat, and movies.dat.
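Each .dat file stores one record per line, with `::` separating the fields. The sketch below shows how pandas parses that layout. It assumes a few made-up rows in the ratings.dat column order (user_id::movie_id::rating::timestamp) read from an in-memory string, so it runs without the real files. The values are invented for illustration only.

```python
import io

import pandas as pd

# Made-up rows in the ratings.dat layout: user_id::movie_id::rating::timestamp
sample = io.StringIO(
    "1::10::5::978300000\n"
    "1::20::3::978300100\n"
    "2::10::4::978300200\n"
)

rnames = ["user_id", "movie_id", "rating", "timestamp"]
# '::' is a multi-character separator, which only the Python parsing engine supports,
# so engine="python" is passed explicitly instead of letting pandas fall back with a warning
ratings = pd.read_table(sample, sep="::", header=None, names=rnames, engine="python")
print(ratings)
```

Because the files have no header row, `header=None` plus `names=...` is what gives the columns their labels.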
Below is my Python script history, which I hope will be useful.
# import the pandas library
import pandas as pd

# unames: column names for users.dat
unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
# '::' is a multi-character separator, so the Python parsing engine is used
users = pd.read_table('ch02/movielens/users.dat', sep='::', header=None, names=unames, engine='python')

# rnames: column names for ratings.dat
rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table('ch02/movielens/ratings.dat', sep='::', header=None, names=rnames, engine='python')

# mnames: column names for movies.dat
mnames = ['movie_id', 'title', 'genres']
movies = pd.read_table('ch02/movielens/movies.dat', sep='::', header=None, names=mnames, engine='python')

# data: the merge result of the ratings, users, and movies datasets
data = pd.merge(pd.merge(ratings, users), movies)

# I want the MEAN rating per movie, with Title as the row index and Gender as the columns,
# so I used the aggregate function 'mean'
mean_ratings = data.pivot_table('rating', index=['title'], columns='gender', aggfunc='mean')
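To see what the merge and pivot steps produce without downloading the dataset, here is a minimal sketch on tiny made-up tables that reuse the script's column names (only the columns needed for the join and the pivot are included). The titles "Movie A" and "Movie B" and all ratings are hypothetical.

```python
import pandas as pd

# Tiny made-up tables with the same key columns as users, ratings, and movies
users = pd.DataFrame({"user_id": [1, 2, 3], "gender": ["F", "M", "M"]})
ratings = pd.DataFrame({
    "user_id": [1, 2, 3, 1, 3],
    "movie_id": [10, 10, 10, 20, 20],
    "rating": [5, 3, 4, 2, 4],
})
movies = pd.DataFrame({"movie_id": [10, 20], "title": ["Movie A", "Movie B"]})

# With no 'on=' argument, pd.merge joins on the shared column names:
# ratings + users on user_id, then that result + movies on movie_id
data = pd.merge(pd.merge(ratings, users), movies)

# One row per title, one column per gender, each cell holding the mean rating
mean_ratings = data.pivot_table("rating", index="title", columns="gender", aggfunc="mean")
print(mean_ratings)
```

Here "Movie A" gets 5.0 for F and 3.5 for M (the mean of 3 and 4), and "Movie B" gets 2.0 for F and 4.0 for M. Passing `on='user_id'` and `on='movie_id'` explicitly would make the join keys visible instead of relying on matching column names.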