Data Analytics with Python and Its Aggregate Function

Hi, today I want to share my personal project notes on data analytics with Python.

This post is based on a data source I got from the GitHub account of the ebook's author, at this URL. I used the sample dataset in the path ch02/movielens, which contains three datasets in DAT format: users.dat, ratings.dat, and movies.dat. Each file stores one record per line, with its fields separated by "::".
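Because the fields are separated by "::" rather than a single character, it helps to see how pandas reads that layout before loading the real files. The sketch below parses a made-up two-line sample in the ratings.dat layout (user_id::movie_id::rating::timestamp) from an in-memory string. The sample values are illustrative only, not taken from the dataset. In pandas, a separator longer than one character is treated as a regular expression, which only the python parsing engine supports, so the sketch passes engine='python' explicitly.

```python
import io

import pandas as pd

# Made-up sample in the same "::"-separated layout as ratings.dat
# (user_id::movie_id::rating::timestamp); values are illustrative only.
sample = "1::1193::5::978300760\n1::661::3::978302109\n"

rnames = ['user_id', 'movie_id', 'rating', 'timestamp']

# '::' is longer than one character, so pandas treats it as a regex;
# only the python engine supports that, so request it explicitly.
ratings = pd.read_table(io.StringIO(sample), sep='::', header=None,
                        names=rnames, engine='python')

print(ratings)
```

Reading from io.StringIO keeps the example self-contained. With the real files you pass the file path instead, as in the script below.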

Below is the history of my Python script, which I hope will be useful.

# import the pandas library
import pandas as pd

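# unames: column names for users.dat
unames = ['user_id','gender','age','occupation','zip']
# users: load users.dat (it is merged with the other tables later, so it must be read too)
users = pd.read_table('ch02/movielens/users.dat',sep='::',header=None,names=unames,engine='python')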
# rnames: column names for ratings.dat
rnames = ['user_id','movie_id','rating','timestamp']
# the multi-character separator '::' needs the python parsing engine
ratings = pd.read_table('ch02/movielens/ratings.dat',sep='::',header=None,names=rnames,engine='python')
# mnames: column names for movies.dat
mnames = ['movie_id','title','genres']
movies = pd.read_table('ch02/movielens/movies.dat',sep='::',header=None,names=mnames,engine='python')

# data: the merged result of the ratings, users, and movies datasets
# pd.merge joins on the columns the frames share: first 'user_id', then 'movie_id'
data = pd.merge(pd.merge(ratings,users),movies)

# I want the MEAN rating for each movie title (rows/index), split by gender (columns),
# so I use the aggregate function 'mean'
mean_ratings = data.pivot_table(values='rating',index=['title'],columns='gender',aggfunc='mean')
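To see what the merge and the pivot table actually compute, here is a small sketch on tiny hypothetical frames shaped like the MovieLens tables. All values and titles ('Movie A', 'Movie B') are made up for illustration. It also rebuilds the same table with groupby(...).mean().unstack() to show that pivot_table with aggfunc='mean' is the mean rating for every (title, gender) pair.

```python
import pandas as pd

# Tiny hypothetical frames shaped like the MovieLens tables (made-up values).
users = pd.DataFrame({'user_id': [1, 2, 3],
                      'gender': ['F', 'M', 'M']})
ratings = pd.DataFrame({'user_id': [1, 2, 3, 1],
                        'movie_id': [10, 10, 10, 20],
                        'rating': [5, 3, 4, 2]})
movies = pd.DataFrame({'movie_id': [10, 20],
                       'title': ['Movie A', 'Movie B']})

# Without an 'on' argument, pd.merge joins on the shared column names:
# ratings + users on 'user_id', then that result + movies on 'movie_id'.
data = pd.merge(pd.merge(ratings, users), movies)

# Mean rating per title (rows) and gender (columns).
mean_ratings = data.pivot_table(values='rating', index='title',
                                columns='gender', aggfunc='mean')
print(mean_ratings)

# The same numbers via groupby + unstack, to show what pivot_table computes.
same = data.groupby(['title', 'gender'])['rating'].mean().unstack()
print(same)
```

In this toy data, 'Movie A' gets 5.0 for F and (3 + 4) / 2 = 3.5 for M. 'Movie B' has no rating from a male user, so its M cell is NaN.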