Big Data, Bitcoin, and Blockchain

When I was learning about big data, I started wondering about its future. According to http://www.imovingtolondon.com, blockchain may be the next trend in big data. Why? Blockchain started with a cryptocurrency called Bitcoin, which launched in 2009. In the beginning it appealed to a specific crowd, but then interest grew. People made money investing in it, and there were many cases connected with it. However, the more interesting part of Bitcoin emerged recently, and it was not the currency itself but the technology (and the concept) it is built upon: no central ownership. The rebellious idea behind it as a currency turns out to be a genius idea when it comes to data. It almost seems as if the currency is just a byproduct of the amazing data architecture.

I got more curious after I signed up at coinstarter.com. Many people talk about cloud technology and all of the amazing things about it. It has a limitation, though: at the end of the day it is a case of hard drive ownership, because the data still sits somewhere, on hardware that someone owns. With blockchain, that somewhere and someone do not exist; everything is shared within the blocks that create the chain. The blockchain is the storage for Bitcoin transaction data, and it is designed to be an immutable, non-reversible, and non-forgeable database. Conventional databases, servers, and storage media are generally not tamper resistant in this way. The blockchain is an interesting type of data set, and not exactly a database, but more of a referral layer.
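To make the "blocks creating the chain" idea more concrete, here is a minimal sketch in Python. It is not how Bitcoin itself is implemented (there is no network, mining, or signatures here); it only assumes that each block stores the SHA-256 hash of the previous block, which is what makes tampering detectable. The function names and the sample transactions are made up for illustration.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, prev_hash, data):
    """Hash the block contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "data": data}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Link each record to the block before it through prev_hash."""
    chain = []
    prev_hash = GENESIS_PREV_HASH
    for index, data in enumerate(records):
        current = block_hash(index, prev_hash, data)
        chain.append(
            {"index": index, "prev_hash": prev_hash, "data": data, "hash": current}
        )
        prev_hash = current
    return chain


def is_valid(chain):
    """Recompute every hash and check that the links still match."""
    prev_hash = GENESIS_PREV_HASH
    for block in chain:
        if block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["prev_hash"], block["data"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical transactions, only for the demo
chain = build_chain(["alice pays bob 1", "bob pays carol 2", "carol pays dave 3"])
chain_ok_before = is_valid(chain)

# Changing old data breaks the stored hash, so the edit is detected
chain[1]["data"] = "bob pays carol 200"
chain_ok_after = is_valid(chain)
```

In this toy version, anyone holding a copy of the chain can rerun is_valid and notice the edit, which is the basic property the post describes as non-forgeable.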

Reference: imovinglondon.com

Data Analytics with Python and Its Aggregate Function

Hi, today I want to post my personal project notes about data analytics with Python.

This post is based on a data source that I got from the GitHub account of the ebook's author, at this URL. I used the sample dataset in the path ch02/movielens, which contains 3 datasets (in DAT file format): users.dat, ratings.dat, and movies.dat.

Below is the history of my Python script, which hopefully can be useful.

# Import the pandas library
import pandas as pd

# unames: column names for the users.dat frame
unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
# engine='python' is needed because the '::' separator is longer than one character
users = pd.read_table('ch02/movielens/users.dat', sep='::', header=None, names=unames, engine='python')
# rnames: column names for the ratings.dat frame
rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table('ch02/movielens/ratings.dat', sep='::', header=None, names=rnames, engine='python')
# mnames: column names for the movies.dat frame
mnames = ['movie_id', 'title', 'genres']
movies = pd.read_table('ch02/movielens/movies.dat', sep='::', header=None, names=mnames, engine='python')

# data: the result of merging the ratings, users, and movies datasets
# (pandas joins on the column names the frames share: user_id, then movie_id)
data = pd.merge(pd.merge(ratings, users), movies)

# I want the MEAN rating for each title (rows/index), split by the gender column,
# so I use the aggregate function 'mean'
mean_ratings = data.pivot_table('rating', index=['title'], columns='gender', aggfunc='mean')
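
To see what the merge and the pivot table do without downloading the files, here is a small sketch with made-up data. The column names follow the script above, but the movie titles and rating values are invented only for this example.

```python
import pandas as pd

# Tiny hand-made stand-ins for ratings.dat, users.dat, and movies.dat
ratings = pd.DataFrame({
    "user_id": [1, 2, 3, 1, 2],
    "movie_id": [10, 10, 10, 20, 20],
    "rating": [4, 3, 5, 2, 4],
})
users = pd.DataFrame({"user_id": [1, 2, 3], "gender": ["F", "M", "M"]})
movies = pd.DataFrame({"movie_id": [10, 20], "title": ["Movie A", "Movie B"]})

# Same nested merge as above: join on user_id first, then on movie_id
data = pd.merge(pd.merge(ratings, users), movies)

# One row per title, one column per gender, each cell is the mean rating
mean_ratings = data.pivot_table(
    values="rating", index="title", columns="gender", aggfunc="mean"
)
print(mean_ratings)
```

For "Movie A" the two male ratings (3 and 5) average to 4.0, and that is the value stored in the M column of that row.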

Big Data and What Data Scientists Do

Hi, I am on my personal project (again), about Big Data learning.
While I was googling, I found this presentation:

It opened my mind about how to approach manipulating data so that it can become a data product. As an overview, big data processing by a data scientist consists of these steps:

  • Explore Data
  • Represent Data
  • Discover Data
  • Learn From Data
  • Data Product
  • Deliver Insight
  • Visualize Insight

To keep myself consistent in my personal project, I am writing it down in this post.

Thank you

Data Analysis & Visualization with Python (Preparation Phase)

Hi, I am working on my personal project, which is Data Analysis & Visualization with Python.
The information below is my notes for the project's preparation.

Source: “Python for Data Analysis”, by Wes McKinney, published by O’Reilly

Python libraries that are needed:
> pandas
This library can be installed from the macOS Terminal with the command “pip install pandas”.
A library that combines:
a) the high-performance array computing features of the NumPy library
b) and the flexible data manipulation capabilities of spreadsheets and relational databases

> NumPy (Numerical Python)
This library can be installed from the macOS Terminal with the command “pip install numpy”.
The foundational library for scientific computing in Python. The package contents consist of:
– a multidimensional array object called “ndarray”
– mathematical functions and many operations for array management
– tools for reading and writing array-based data sets
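
As a quick check of the three points above, here is a minimal sketch. The array values are arbitrary, and I write to an in-memory buffer instead of a real file only to keep the example self-contained.

```python
import io

import numpy as np

# 1) ndarray: a 2 x 3 multidimensional array
arr = np.arange(6, dtype=float).reshape(2, 3)

# 2) mathematical functions and array operations work on whole arrays at once
roots = np.sqrt(arr)
col_sums = arr.sum(axis=0)

# 3) read-write tools: save the array and load it back
buffer = io.BytesIO()
np.save(buffer, arr)
buffer.seek(0)
restored = np.load(buffer)

print(arr.shape, col_sums, np.array_equal(arr, restored))
```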

> matplotlib
This library can be installed from the macOS Terminal with the command “pip install matplotlib”.
A library that is useful for producing plots and 2D data visualizations.
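
A minimal plotting sketch, to confirm that the installation works. The numbers are made up, and I assume the non-interactive "Agg" backend so that the script also runs without a display; in an interactive session that line can be dropped and plt.show() used instead.

```python
import io

import matplotlib

matplotlib.use("Agg")  # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt

# Hypothetical data: how many times each rating value was given
rating_values = [1, 2, 3, 4, 5]
rating_counts = [2, 5, 9, 14, 7]

fig, ax = plt.subplots()
ax.bar(rating_values, rating_counts)
ax.set_xlabel("rating")
ax.set_ylabel("count")
ax.set_title("Ratings (made-up data)")

# Write the figure as PNG into memory; a file path also works with savefig
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
png_bytes = buffer.getvalue()
plt.close(fig)
```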

> IPython
This library can be installed from the macOS Terminal with the command “pip install ipython”.
An enhanced interactive Python shell that is part of the standard scientific Python toolset. IPython can be used in combination with matplotlib.

Thank you