pandas Plot area by specifying index column

My data looks like:
Club Count
0 AC Milan 2
1 Ajax 1
2 FC Barcelona 4
3 Bayern Munich 2
4 Chelsea 1
5 Dortmund 1
6 FC Porto 1
7 Inter Milan 1
8 Juventus 1
9 Liverpool 2
10 Man U 2
11 Real Madrid 7
I'm trying to draw an area plot with Club on the x-axis. When I plot all the data the shape looks correct, but the x-axis shows the index rather than the clubs.
When I set Club as the index (index=x), the x-axis is correct, but the y-axis runs from 0 to 0.05. I assume that is why nothing is displayed, since the counts range from 1 to 7. Any suggestions?
Code used:
data.columns = ['Club', 'Count']
x=data.Club
y=data.Count
print(data)
ax.margins(0, 10)
data.plot.area()
df = pd.DataFrame(y,index=x)
df.plot.area()

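pd.DataFrame(y, index=x) reindexes y, which is labelled 0 to 11, by the club names, so every value becomes NaN and nothing is drawn. Passing a Series as the data has the same effect, so pass the raw values instead:
df = pd.Series(y.values, index=x)
df.plot.area()
or set Club as the index directly:
data.set_index('Club')['Count'].plot.area()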
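A minimal sketch of the index alignment involved here, using a few rows from the question. When a Series is passed as data together with an index, pandas reindexes it by label rather than by position. The names misaligned, aligned and by_club are illustrative only.

```python
import pandas as pd

data = pd.DataFrame({
    "Club": ["AC Milan", "Ajax", "FC Barcelona", "Real Madrid"],
    "Count": [2, 1, 4, 7],
})
x = data.Club
y = data.Count

# y is labelled 0..3, so reindexing it by club name finds no matching labels.
misaligned = pd.Series(y, index=x)

# Passing the raw values keeps the counts and only replaces the labels.
aligned = pd.Series(y.values, index=x)

# set_index reaches the same result without building a new object by hand.
by_club = data.set_index("Club")["Count"]

print(misaligned.isna().all())
print(aligned)
```

With matplotlib installed, aligned.plot.area() or by_club.plot.area() should then draw the counts with the club names on the x-axis.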

Related

I want to create a new column territory based on the city column

Data Frame :
city Temperature
0 Chandigarh 15
1 Delhi 22
2 Kanpur 20
3 Chennai 26
4 Manali -2
0 Bengalaru 24
1 Coimbatore 35
2 Srirangam 36
3 Pondicherry 39
I need to create another column in the data frame that contains a boolean value for each city, indicating whether it is a union territory. Chandigarh, Pondicherry and Delhi are the only 3 union territories here.
I have written the code below:
import numpy as np
conditions = [df3['city'] == 'Chandigarh',df3['city'] == 'Pondicherry',df3['city'] == 'Delhi']
values =[1,1,1]
df3['territory'] = np.select(conditions, values)
Is there an easier or more efficient way to write this?

You can use isin:
union_terrs = ["Chandigarh", "Pondicherry", "Delhi"]
df3["territory"] = df3["city"].isin(union_terrs).astype(int)
isin checks each entry in the city column and gives True if it is in union_terrs and False otherwise. astype(int) then converts True/False to 1/0, giving:
city Temperature territory
0 Chandigarh 15 1
1 Delhi 22 1
2 Kanpur 20 0
3 Chennai 26 0
4 Manali -2 0
0 Bengalaru 24 0
1 Coimbatore 35 0
2 Srirangam 36 0
3 Pondicherry 39 1
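Since the question asks for a boolean column, here is a small sketch that simply leaves out the astype(int) step. It uses a subset of the cities from the question.

```python
import pandas as pd

df3 = pd.DataFrame({
    "city": ["Chandigarh", "Delhi", "Kanpur", "Pondicherry"],
    "Temperature": [15, 22, 20, 39],
})

union_terrs = ["Chandigarh", "Pondicherry", "Delhi"]

# isin already returns a boolean Series, so no conversion is needed.
df3["territory"] = df3["city"].isin(union_terrs)

print(df3)
```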

How to get minimum number of occurrences of value in pandas groupby

home_team_name home_team_goal_count
0 Bayern München 2
1 Bayern München 2
2 Bayern München 1
3 Köln 2
4 Köln 2
I group the data by the variable home_team_name.
df.groupby("home_team_name")
The values of home_team_goal_count can only be 2 or 1. I want to get the minimum number of occurrences of these values in each group.
The result I want is 1 for Bayern München and 0 for Köln. To illustrate: Bayern München has the value 2 twice and the value 1 once, so the minimum is 1. Köln has the value 2 twice and the value 1 zero times, so the minimum is 0.

First count the values with SeriesGroupBy.value_counts, then reshape with unstack, filling missing combinations of 1 and 2 with 0, and finally take the row-wise minimum with min:
s = (df.groupby("home_team_name")['home_team_goal_count']
.value_counts()
.unstack(fill_value=0)
.min(axis=1))
print (s)
home_team_name
Bayern München 1
Köln 0
dtype: int64
Details:
print (df.groupby("home_team_name")['home_team_goal_count']
.value_counts()
.unstack(fill_value=0))
home_team_goal_count 1 2
home_team_name
Bayern München 1 2
Köln 0 2
If the input data might contain only 1s or only 2s, add reindex so that both columns always exist:
s = (df.groupby("home_team_name")['home_team_goal_count']
.value_counts()
.unstack(fill_value=0)
.reindex([1, 2], axis=1, fill_value=0)
.min(axis=1))
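A self-contained sketch of why the reindex step matters. The helper name min_occurrences and the second frame, in which every value is 2, are made up for illustration.

```python
import pandas as pd

def min_occurrences(df):
    """Per team, the smallest count among the goal values 1 and 2."""
    return (df.groupby("home_team_name")["home_team_goal_count"]
              .value_counts()
              .unstack(fill_value=0)
              # Guarantee both columns exist even if one value never occurs.
              .reindex([1, 2], axis=1, fill_value=0)
              .min(axis=1))

df = pd.DataFrame({
    "home_team_name": ["Bayern München"] * 3 + ["Köln"] * 2,
    "home_team_goal_count": [2, 2, 1, 2, 2],
})

# Hypothetical input where the value 1 never appears at all.
only_twos = pd.DataFrame({
    "home_team_name": ["Köln", "Köln"],
    "home_team_goal_count": [2, 2],
})

print(min_occurrences(df))
print(min_occurrences(only_twos))
```

Without the reindex, the second frame would have only a column for 2, and the minimum would come out as 2 rather than 0.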

Let's try using pd.crosstab:
pd.crosstab(df['home_team_name'], df['home_team_goal_count'])\
.reindex([1, 2], axis=1, fill_value=0).min(1)
Result:
home_team_name
Bayern München 1
Köln 0
dtype: int64

import pandas as pd
import numpy as np
list1=['Bayern Munchen','Bayern Munchen','Bayern Munchen','FC Koln','FC Koln']
list2=[2,2,1,2,2]
d={'Home Team Name':list1,'Home Team Goal Count':list2}
data=pd.DataFrame(d)
data['Name']= data['Home Team Name'] +" "+ data['Home Team Goal Count'].astype(str)
data['Name']
Out[39]:
0 Bayern Munchen 2
1 Bayern Munchen 2
2 Bayern Munchen 1
3 FC Koln 2
4 FC Koln 2
name,count=np.unique(data['Name'].tolist(),return_counts=True)
name=[' '.join(x.split(' ')[:-1]) for x in name]
name
Out[99]: ['Bayern Munchen', 'Bayern Munchen', 'FC Koln']
min_val=pd.DataFrame({"Name":name,"Count":count})
name=[]
min_val_count=[]
for x in min_val.Name.unique():
    name.append(min_val[min_val.Name!=x].min()[0])
    if min_val[min_val.Name!=x].min()[1]==2:
        min_val_count.append(0)
    else:
        min_val_count.append(min_val[min_val.Name!=x].min()[1])
minimum_val_dict=dict(zip(name,min_val_count))
minimum_val_dict
Out[104]: {'FC Koln': 0, 'Bayern Munchen': 1}
A slightly longer version as compared to the answers above.

Yet another way to do this is to use a categorical variable, since there is a finite set of states. So:
(
df
.astype({"home_team_goal_count": "category"})
.groupby("home_team_name")["home_team_goal_count"]
.apply(lambda x: x.value_counts().min())
)
If you want to know which value occurred the least, you can call .idxmin() instead of .min().
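A runnable sketch of the categorical approach. One assumption is added here: the categories are declared explicitly with pd.CategoricalDtype, so that both 1 and 2 are counted even if one of them never appears in the data. With a plain "category" cast, the categories are inferred from the values that are present.

```python
import pandas as pd

df = pd.DataFrame({
    "home_team_name": ["Bayern München"] * 3 + ["Köln"] * 2,
    "home_team_goal_count": [2, 2, 1, 2, 2],
})

# Assumption: the only possible goal counts are 1 and 2, as stated in the question.
goal_dtype = pd.CategoricalDtype(categories=[1, 2])

grouped = (
    df
    .astype({"home_team_goal_count": goal_dtype})
    .groupby("home_team_name")["home_team_goal_count"]
)

# value_counts on a categorical Series also reports categories with zero hits.
min_counts = grouped.apply(lambda s: s.value_counts().min())
least_common = grouped.apply(lambda s: s.value_counts().idxmin())

print(min_counts)
print(least_common)
```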

How do I get my line graph to add to the previous data point?

I'm trying to create a line graph that shows the total breweries in a city over a period of time. The graph isn't adding on to the previous data point, it's just starting over from each new date.
This is the data set I'm working with:
Year Opened City Brewery Name
2012 Charlottesville 1
Fredericksburg 1
Norfolk 1
2013 Leesburg 1
Manassas 2
Richmond 2
2014 Fredericksburg 1
Purcellville 3
Richmond 4
Roanoke 3
Virginia Beach 3
2015 Fredericksburg 2
Leesburg 1
Manassas 1
Norfolk 1
Richmond 1
Sterling 1
Virginia Beach 2
Here is the graph code:
# plot data
fig, ax = plt.subplots(figsize=(15,7))
# use unstack()
top10brew_df.unstack().plot(kind='line', y="Brewery Name", ax=ax)

Use reindex and fill missing values with 0. Note that range excludes its upper bound, so range(2012, 2016) is needed to keep 2015:
df = df.pivot(index='Year', columns='Opened City',values='Brewery Name').reindex(range(2012,2016)).fillna(0)
Plot the results
fig, ax = plt.subplots(figsize=(15,7))
plt.xticks(df.index.values.astype(int))
df.plot.line(ax=ax)
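To make each point add to the previous one, as the question asks, one option is a cumulative sum over the pivoted table. This is a sketch under assumptions: the column names Year, City and Opened are hypothetical, since the original column layout is ambiguous, and only a few rows from the question are used.

```python
import pandas as pd

openings = pd.DataFrame({
    "Year": [2012, 2013, 2014, 2014, 2015, 2015],
    "City": ["Norfolk", "Richmond", "Richmond", "Fredericksburg",
             "Richmond", "Norfolk"],
    "Opened": [1, 2, 4, 1, 1, 1],
})

# One row per year, one column per city; years without openings become 0.
per_year = (openings.pivot(index="Year", columns="City", values="Opened")
                    .reindex(range(2012, 2016))
                    .fillna(0))

# cumsum turns yearly openings into a running total per city.
running_total = per_year.cumsum()

print(running_total)
```

Calling running_total.plot.line(ax=ax) in place of df.plot.line(ax=ax) should then give lines that never drop back down.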

Retrieve the numbers from the file corresponding to the given regions specified in the file

The below is my dataframe :
Sno Name Region Num
0 1 Rubin Indore 79744001550
1 2 Rahul Delhi 89824304549
2 3 Rohit Noida 91611611478
3 4 Chirag Delhi 85879761557
4 5 Shan Bharat 95604535786
5 6 Jordi Russia 80777784005
6 7 El Russia 70008700104
7 8 Nino Spain 87707101233
8 9 Mark USA 98271377772
9 10 Pattinson Hawk Eye 87888888889
Retrieve the numbers and store it region wise from the given CSV file.
delhi_list = []
for i in range(len(data)):
    if data.loc[i]['Region'] == 'Delhi':
        delhi_list.append(data.loc[i]['Num'])
I am getting the results, but I would like to store the data per region in a Python dictionary. Is that possible?

IIUC, you can use groupby, apply the list aggregation then use to_dict:
data.groupby('Region')['Num'].apply(list).to_dict()
[out]
{'Bharat': [95604535786],
'Delhi': [89824304549, 85879761557],
'Hawk Eye': [87888888889],
'Indore': [79744001550],
'Noida': [91611611478],
'Russia': [80777784005, 70008700104],
'Spain': [87707101233],
'USA': [98271377772]}
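For comparison, the same dictionary can be built with an explicit loop and collections.defaultdict, which may be closer to what the question had in mind. This is a sketch on a subset of the rows; the name nums_by_region is illustrative.

```python
from collections import defaultdict

import pandas as pd

data = pd.DataFrame({
    "Name": ["Rubin", "Rahul", "Chirag", "Jordi", "El"],
    "Region": ["Indore", "Delhi", "Delhi", "Russia", "Russia"],
    "Num": [79744001550, 89824304549, 85879761557, 80777784005, 70008700104],
})

# Each new region starts with an empty list, so no membership check is needed.
nums_by_region = defaultdict(list)
for region, num in zip(data["Region"], data["Num"]):
    nums_by_region[region].append(num)

nums_by_region = dict(nums_by_region)
print(nums_by_region["Delhi"])
```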

create separate columns whose titles are based on values in a column

I am trying to create values for each location of data. I have:
Portafolio Zona Region COM PROV Type of Housing
654738 1 2 3 21 compuesto
65344 3 8 4 22 error
I want to make a new column for each type of housing, and for its values I want to count how many there are in total for each combination of portafolio, zona, region, com and prov. I have struggled with this for 2 days and I am new to Python and pandas. It should look like this:
Zona Region COM PROV Compuesto Error
1 2 3 21 24 444
3 8 4 22 34 32

You want pd.pivot_table, specifying size as the aggregation function:
df1 = pd.pivot_table(df, index=['Zona', 'Region', 'COM', 'PROV'],
columns='Type of Housing',
aggfunc='size').reset_index()
df1.columns.name=None
Output: df1
Zona Region COM PROV compuesto error
0 1 2 3 21 1.0 NaN
1 3 8 4 22 NaN 1.0
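If integer counts with 0 instead of NaN are preferred, one alternative is groupby with size followed by unstack(fill_value=0). This is a different route from the pivot_table call above, not a correction of it. The third row below is made up so that one count exceeds 1.

```python
import pandas as pd

df = pd.DataFrame({
    "Portafolio": [654738, 65344, 654739],  # the last row is hypothetical
    "Zona": [1, 3, 1],
    "Region": [2, 8, 2],
    "COM": [3, 4, 3],
    "PROV": [21, 22, 21],
    "Type of Housing": ["compuesto", "error", "compuesto"],
})

keys = ["Zona", "Region", "COM", "PROV"]

# Count rows per location and housing type, then spread the types into columns.
counts = (df.groupby(keys + ["Type of Housing"])
            .size()
            .unstack("Type of Housing", fill_value=0)
            .reset_index())
counts.columns.name = None

print(counts)
```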
