Data Analytics with Python and Its Aggregate Function

Hi, today I want to share my personal project notes on data analytics with Python.

This post is based on a data source that I got from the GitHub account of the ebook's author, at this URL. I used the sample dataset under the path ch02/movielens, which contains three datasets in DAT file format: users.dat, ratings.dat, and movies.dat.

Below is the Python script from my session history, which I hope will be useful.

#import pandas library
import pandas as pd

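#unames holds the column names for users.dat
unames = ['user_id','gender','age','occupation','zip']
#engine='python' is needed because the two-character separator '::' is not supported by the default C parser
users = pd.read_table('ch02/movielens/users.dat',sep='::',header=None,names=unames,engine='python')
#rnames holds the column names for ratings.dat
rnames = ['user_id','movie_id','rating','timestamp']
ratings = pd.read_table('ch02/movielens/ratings.dat',sep='::',header=None,names=rnames,engine='python')
#mnames holds the column names for movies.dat
mnames = ['movie_id','title','genres']
movies = pd.read_table('ch02/movielens/movies.dat',sep='::',header=None,names=mnames,engine='python')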

#data holds the result of merging the ratings, users, and movies datasets
data = pd.merge(pd.merge(ratings,users),movies)

#I want the mean rating of each movie, with one row per title and one column per gender,
#so I use the aggregate function 'mean'
mean_ratings = data.pivot_table('rating',index=['title'],columns='gender',aggfunc='mean')
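
To see what the merge and the pivot table do without downloading the dataset, here is a minimal sketch on three tiny tables. The rows, titles, and values are made up for illustration; only the column names follow the script above.

```python
import pandas as pd

# Hypothetical miniature tables that mimic the MovieLens column layout.
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "gender": ["F", "M", "F"],
})
movies = pd.DataFrame({
    "movie_id": [10, 20],
    "title": ["Movie A", "Movie B"],
})
ratings = pd.DataFrame({
    "user_id": [1, 2, 3, 1, 2],
    "movie_id": [10, 10, 10, 20, 20],
    "rating": [5, 3, 4, 2, 4],
})

# With no explicit key, merge() joins on the column names both frames share:
# user_id for the inner merge, movie_id for the outer one.
data = pd.merge(pd.merge(ratings, users), movies)

# One row per title, one column per gender, each cell the mean rating.
mean_ratings = data.pivot_table("rating", index="title", columns="gender", aggfunc="mean")
print(mean_ratings)
```

In this toy data, "Movie A" has ratings 5 and 4 from the two female users, so its F cell is 4.5, and a single rating of 3 from the male user, so its M cell is 3.0.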

Big Data and What Data Scientists Do

Hi, I am working on my personal project (again), this time about learning Big Data.
While googling, I found this presentation:

It opened my mind to approaches for manipulating data so that it can be turned into a data product. As an overview, big data processing by a data scientist consists of these steps:

  • Explore Data
  • Represent Data
  • Discover Data
  • Learn From Data
  • Data Product
  • Deliver Insight
  • Visualize Insight

To stay consistent with my personal project, I am writing it down in this post.

Thank you

Data Analysis & Visualization with Python (Preparation Phase)

Hi, I am working on my personal project, Data Analysis & Visualization with Python.
The information below is my notes from the project’s preparation.

Source: “Python for Data Analysis” by Wes McKinney, published by O’Reilly

Python libraries that are needed:
> pandas
This library can be installed from the macOS Terminal with the command “pip install pandas”.
A library that combines:
a) the high-performance array-computing features of the NumPy library
b) the flexible data manipulation capabilities of spreadsheets and relational databases
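
As a small sketch of both sides, the example below stores a NumPy array as a column and then summarizes it with a relational-style group-by. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a NumPy array stored as a DataFrame column.
frame = pd.DataFrame({
    "gender": ["F", "M", "F", "M"],
    "rating": np.array([5, 3, 4, 2]),
})

# Array-style computing: arithmetic applies to the whole column at once.
frame["rating_out_of_10"] = frame["rating"] * 2

# Relational-style manipulation: group rows by a key and aggregate,
# similar in spirit to SQL's GROUP BY.
by_gender = frame.groupby("gender")["rating"].mean()
print(by_gender)
```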

> NumPy (Numerical Python)
This library can be installed from the macOS Terminal with the command “pip install numpy”.
The foundational package for scientific computing in Python. Its contents include:
– a multidimensional array object called “ndarray”
– mathematical functions and many operations for working with arrays
– tools for reading and writing array-based data sets
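
The three points above can be sketched in a few lines. The values are arbitrary, and the array is written to an in-memory buffer instead of a file on disk, which is my own choice to keep the example self-contained.

```python
import io

import numpy as np

# A two-dimensional ndarray: 2 rows, 3 columns.
arr = np.array([[1.0, 2.0, 3.0],
                [4.0, 5.0, 6.0]])
print(arr.ndim, arr.shape)

# Mathematical functions and operations work on the whole array at once.
doubled = arr * 2
column_means = arr.mean(axis=0)

# Reading and writing array data: np.save and np.load accept file-like objects.
buffer = io.BytesIO()
np.save(buffer, arr)
buffer.seek(0)
restored = np.load(buffer)
```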

> matplotlib
This library can be installed from the macOS Terminal with the command “pip install matplotlib”.
A library that is useful for producing plots and 2D data visualizations.
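
A minimal sketch of a 2D plot is shown below. The titles and scores are invented, and selecting the "Agg" backend is an assumption on my side so that the script also runs without a display; in an interactive session you would normally call plt.show() instead of saving to a buffer.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window is opened
import matplotlib.pyplot as plt

# Hypothetical values for illustration only.
titles = ["Movie A", "Movie B"]
mean_scores = [4.5, 2.0]

fig, ax = plt.subplots()
bars = ax.bar(titles, mean_scores)
ax.set_xlabel("Title")
ax.set_ylabel("Mean rating")
ax.set_title("Mean rating per title (illustrative data)")

# Render the figure as PNG bytes in memory instead of writing a file.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)
```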

> IPython
This library can be installed from the macOS Terminal with the command “pip install ipython”.
A component of the standard scientific Python toolset that provides an enhanced interactive shell; IPython can be used together with matplotlib.

Thank you