Data Analysis with Pandas
Pandas has become the de facto package for data analysis. In this workshop, we are going to use the basics of pandas to analyze the interests of today's group. We are going to use meetup.com's API to fetch the interests listed in each of our meetup.com profiles. We will compute which interests are common, which are uncommon, and find out which two members have the most similar interests. Let's get started by importing the essentials.
You will need meetup.com's Python API client and pandas installed.
import meetup.api
import pandas as pd
from IPython.display import Image, display, HTML
from itertools import combinations
Next we need your meetup.com API key. You will find it at https://secure.meetup.com/meetup_api/key/. We also need today's event id. The event id created under Chicago Pythonistas is 233460758 and the one under Chicago Python user group is 236205125. Use the one that has the higher number of RSVPs so that you get more data points. As an additional exercise, you can try merging the two sets of RSVPs, but that is not needed for the workshop.
API_KEY = ''
event_id = ''
The following functions use the API to load the data into a pandas data frame. Note that we are a bit sloppy both in style and in how we load the data. In actual production code, we should add adequate logging with well-defined exceptions to indicate what is going wrong.
def get_members(event_id):
    # Fetch the RSVPs for the event, then look up the profile of each member.
    client = meetup.api.Client(API_KEY)
    rsvps = client.GetRsvps(event_id=event_id, urlname='_ChiPy_')
    member_id = ','.join(str(i['member']['member_id']) for i in rsvps.results)
    return client.GetMembers(member_id=member_id)


def get_topics(members):
    # Collect the distinct topic names across all members.
    topics = set()
    for member in members.results:
        # A profile without a 'topics' entry contributes nothing.
        for t in member.get('topics', []):
            topics.add(t['name'])
    return list(topics)


def df_topics(event_id):
    members = get_members(event_id=event_id)
    topics = get_topics(members)
    columns = ['name', 'id', 'thumb_link'] + topics
    data = []
    for member in members.results:
        # One 0/1 entry per topic, in the same order as the topic columns.
        topic_vector = [0] * len(topics)
        for topic in member.get('topics', []):
            # Mark the column that corresponds to this topic.
            topic_vector[topics.index(topic['name'])] = 1
        try:
            data.append([member['name'], member['id'], member['photo']['thumb_link']] + topic_vector)
        except KeyError:
            # Skip members that lack a name, id, or photo.
            pass
    return pd.DataFrame(data=data, columns=columns)

#df.to_csv('output.csv', sep=";")
Call the df_topics function with the event id and it will give you back a pandas dataframe containing basic information about each member, along with one column for every possible interest. If the member has indicated an interest, that column will hold a one; if not, it will hold a zero.
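To make that layout concrete without calling the API, here is a minimal sketch of a frame with the same shape. The member names, ids, photo links, and topics are all invented for illustration; real data will have many more rows and columns.

```python
import pandas as pd

# Hypothetical members and topics, invented purely for illustration.
toy = pd.DataFrame(
    data=[
        ["Ada", 1, "http://example.com/ada.jpg", 1, 1, 0],
        ["Grace", 2, "http://example.com/grace.jpg", 1, 0, 1],
        ["Linus", 3, "http://example.com/linus.jpg", 0, 0, 1],
    ],
    columns=["name", "id", "thumb_link", "Python", "Data Science", "Linux"],
)

# Everything after the three identifying columns is a 0/1 topic indicator.
topic_columns = toy.columns[3:]
print(toy[topic_columns].sum())
```

Summing a topic column counts the members who listed that topic, which is the basis for most of the questions that follow.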
Load data from meetup.com into a dataframe by calling df_topics with the event id as parameter
What do the first and last 10 rows of the dataset look like?
What are the column names?
Additional Exercise: Can you merge the data for the two events into one data frame and remove the duplicates?
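One possible approach to the merge, sketched on two small invented frames rather than real event data: stack them with pd.concat, fill the topic columns that only one event has with zeros, and drop repeated members by id. The frames df_a and df_b and their contents are assumptions standing in for the two df_topics results.

```python
import pandas as pd

id_cols = ["name", "id", "thumb_link"]

# Hypothetical stand-ins for the two df_topics results.
df_a = pd.DataFrame(
    [["Ada", 1, "a.jpg", 1, 0], ["Grace", 2, "g.jpg", 0, 1]],
    columns=id_cols + ["Python", "Linux"],
)
df_b = pd.DataFrame(
    [["Grace", 2, "g.jpg", 1, 0, 0], ["Linus", 3, "l.jpg", 0, 1, 1]],
    columns=id_cols + ["Linux", "Python", "Go"],
)

# Stack the rows; topics missing from one event show up as NaN.
merged = pd.concat([df_a, df_b], ignore_index=True, sort=False)

# Treat a missing topic as "not interested" and restore integer 0/1 values.
topic_cols = [c for c in merged.columns if c not in id_cols]
merged[topic_cols] = merged[topic_cols].fillna(0).astype(int)

# A member who RSVPed to both events appears twice; keep the first row.
merged = merged.drop_duplicates(subset="id").reset_index(drop=True)
print(merged)
```

Deduplicating on id rather than on the whole row is a design choice: it still works if a member's name or photo link differs between the two fetches.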
What are the top 10 most common interests of today's attendees?
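As a hint, the popularity of each topic is just its column sum. The sketch below uses an invented three-topic frame; with the real data the same steps would apply to the frame returned by df_topics.

```python
import pandas as pd

# Hypothetical data in the same layout as the df_topics result.
toy = pd.DataFrame(
    [
        ["Ada", 1, "a.jpg", 1, 1, 0],
        ["Grace", 2, "g.jpg", 1, 1, 0],
        ["Linus", 3, "l.jpg", 1, 0, 1],
    ],
    columns=["name", "id", "thumb_link", "Python", "Data Science", "Linux"],
)

# Drop the identifying columns, count members per topic, most popular first.
topic_counts = (
    toy.drop(columns=["name", "id", "thumb_link"])
    .sum()
    .sort_values(ascending=False)
)

# head(10) returns fewer rows when there are fewer than 10 topics.
top = topic_counts.head(10)
print(top)
```

Note that sort_values does not say anything about ties, so topics with equal counts can appear in either order.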
What are the third most popular and third least popular topics of interest? Are there ties?
Which members have the third most popular interest?
Which members have the highest number of topics of interest?
What is the average number of topics of interest?
Which two members have the largest overlap of interests?
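One way to approach the pairwise question is with itertools.combinations, which was imported at the top. The sketch below counts shared topics for every pair in a small invented frame indexed by member name; the names and topics are assumptions for illustration only.

```python
from itertools import combinations

import pandas as pd

# Hypothetical 0/1 topic matrix, indexed by member name.
toy = pd.DataFrame(
    [["Ada", 1, 1, 0], ["Grace", 1, 1, 1], ["Linus", 0, 0, 1]],
    columns=["name", "Python", "Data Science", "Linux"],
).set_index("name")

# Number of topics that both members of each unordered pair have listed.
overlap = {
    (a, b): int((toy.loc[a] & toy.loc[b]).sum())
    for a, b in combinations(toy.index, 2)
}

# The pair with the largest overlap (the first such pair if there is a tie).
best_pair = max(overlap, key=overlap.get)
print(best_pair, overlap[best_pair])
```

The same dictionary also shows which pairs share nothing at all: those are the entries whose value is zero.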
How many members have no overlaps at all?
Given a member, which other member(s) share the most interests with them?
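For the per-member lookup, an alternative to looping over pairs is a matrix product: multiplying the 0/1 topic matrix by its transpose gives the number of shared topics for every pair at once. The helper most_similar below is a hypothetical name, and the data is invented for illustration.

```python
import pandas as pd

# Hypothetical 0/1 topic matrix, indexed by member name.
toy = pd.DataFrame(
    [["Ada", 1, 1, 0], ["Grace", 1, 1, 1], ["Linus", 0, 0, 1]],
    columns=["name", "Python", "Data Science", "Linux"],
).set_index("name")

# shared.loc[a, b] is the number of topics that a and b both list.
shared = toy.dot(toy.T)


def most_similar(member):
    """Return every other member tied for the most shared topics."""
    others = shared.loc[member].drop(member)
    return others[others == others.max()].index.tolist()


print(most_similar("Ada"))
```

Returning a list keeps ties visible, so several members can come back when they share the same number of topics with the given member.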