I am learning microservices architecture and trying to map those ideas to "reporting needs".
While learning about big data, I also looked into where it is heading. According to http://www.imovingtolondon.com, blockchain may be a future trend in big data. Why? Blockchain started with a cryptocurrency called Bitcoin, around 2009. In the beginning it appealed to a specific crowd, but then interest grew: people made money investing in it, and many use cases formed around it. The more interesting part about Bitcoin, however, came recently, and it was not the currency itself but the technology (and the concept) it is built upon: no central ownership. The rebellious idea behind it as a currency turns out to be a genius idea when it comes to data. It almost seems like the currency is just a byproduct of an amazing data architecture.
That was before I signed up at coinstarter.com. Many people talk about cloud technology and all of its amazing features, but it has a limitation: at the end of the day it is a case of hard-drive ownership, with the data living somewhere on someone's machine. With blockchain, that "somewhere" and "someone" do not exist; everything is shared among the blocks that make up the chain. Blockchain is the storage of Bitcoin's transaction data, and it is an immutable, non-reversible, non-forgeable database. Other databases, servers, and storage media are not tamper-resistant in this way. The blockchain is an interesting type of data set: not exactly a database, but more of a referral layer.
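To make the "immutable, tamper-resistant" idea concrete, here is a minimal sketch of the chaining mechanism in Python. This is my own toy illustration, not Bitcoin's actual implementation: each block stores the hash of the previous block, so changing any historical block breaks every hash after it.

```python
import hashlib
import json

def block_hash(block):
    # Hash the block's contents (data plus the previous block's hash)
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def add_block(chain, data):
    # Each new block commits to the hash of the block before it
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    block = {"data": data, "prev_hash": prev_hash}
    block["hash"] = block_hash({"data": data, "prev_hash": prev_hash})
    chain.append(block)
    return chain

def is_valid(chain):
    # Tampering with any block invalidates it and every block after it
    for i, block in enumerate(chain):
        expected = block_hash({"data": block["data"],
                               "prev_hash": block["prev_hash"]})
        if block["hash"] != expected:
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

chain = []
add_block(chain, "alice pays bob 1 BTC")
add_block(chain, "bob pays carol 0.5 BTC")
print(is_valid(chain))                       # True
chain[0]["data"] = "alice pays bob 100 BTC"  # tamper with history
print(is_valid(chain))                       # False
```

Any ordinary database lets an administrator rewrite a row silently; here, a rewrite is immediately detectable because the hashes no longer line up.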
Hi, today I want to post my personal project notes about data analytics with Python.
This post is based on a data source I got from the ebook author's GitHub account at this URL. I used the sample dataset under the path ch02/movielens, which contains three datasets (in DAT file format): users.dat, ratings.dat, and movies.dat.
Below is my Python script history, which I hope can be useful.
```python
# Import the pandas library
import pandas as pd

# unames holds the column names for users.dat
unames = ['user_id', 'gender', 'age', 'occupation', 'zip']
users = pd.read_table('ch02/movielens/users.dat', sep='::',
                      header=None, names=unames, engine='python')

# rnames holds the column names for ratings.dat
rnames = ['user_id', 'movie_id', 'rating', 'timestamp']
ratings = pd.read_table('ch02/movielens/ratings.dat', sep='::',
                        header=None, names=rnames, engine='python')

# mnames holds the column names for movies.dat
mnames = ['movie_id', 'title', 'genres']
movies = pd.read_table('ch02/movielens/movies.dat', sep='::',
                       header=None, names=mnames, engine='python')

# data holds the merge of the ratings, users, and movies datasets
data = pd.merge(pd.merge(ratings, users), movies)

# I want a MEAN view of my data, by the gender column with title as the index,
# using the MEAN aggregate function
mean_ratings = data.pivot_table('rating', index='title',
                                columns='gender', aggfunc='mean')
```
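To show what `pivot_table` produces without needing the MovieLens files on disk, here is the same call on a tiny made-up frame that mimics the shape of the merged `data` above (the titles and ratings are my own toy values):

```python
import pandas as pd

# Toy frame mimicking the merged ratings/users/movies data (made-up values)
data = pd.DataFrame({
    "title":  ["Toy Story", "Toy Story", "Heat", "Heat", "Heat"],
    "gender": ["F", "M", "F", "M", "M"],
    "rating": [4, 5, 3, 5, 4],
})

# Mean rating per title, split by gender
mean_ratings = data.pivot_table("rating", index="title",
                                columns="gender", aggfunc="mean")
print(mean_ratings)

# A common next step: how much do the genders disagree on each title?
mean_ratings["diff"] = mean_ratings["M"] - mean_ratings["F"]
print(mean_ratings.sort_values("diff", ascending=False))
```

The result is a DataFrame indexed by title with one column per gender, which makes follow-up analysis like the rating-difference column above a one-liner.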
Hi, I am working on my personal project (again), about Big Data learning.
Then, while googling, I found this presentation:
It opened my mind to approaches for manipulating data so that it becomes more suitable for data products. As an overview, a data scientist processes big data through steps consisting of:
- Explore Data
- Represent Data
- Discover Data
- Learn From Data
- Data Product
- Deliver Insight
- Visualize Insight
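The steps above can be sketched as a simple pipeline of functions. This is only my own illustration of the flow, with trivial stand-ins (the function names and the toy dataset are made up, not from the presentation):

```python
def explore(raw):
    # Explore Data: basic profiling of the raw input
    return {"rows": len(raw)}

def represent(raw):
    # Represent Data: clean and encode into a usable numeric form
    return [float(x) for x in raw]

def learn(values):
    # Learn From Data: fit a (trivial) model, here just the mean
    return sum(values) / len(values)

def deliver_insight(model):
    # Deliver Insight: turn the model's output into a human-readable finding
    return f"average value is {model:.1f}"

raw = ["1", "2", "3", "4"]
print(explore(raw))
values = represent(raw)
print(deliver_insight(learn(values)))
```

Each step consumes the previous step's output, which is the point of the workflow: the data product at the end is just the composition of these stages.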
To keep my personal project consistent, I am writing it down in this post.