My data looks like:
Club Count
0 AC Milan 2
1 Ajax 1
2 FC Barcelona 4
3 Bayern Munich 2
4 Chelsea 1
5 Dortmund 1
6 FC Porto 1
7 Inter Milan 1
8 Juventus 1
9 Liverpool 2
10 Man U 2
11 Real Madrid 7
I'm trying to draw an area plot with Club on the x axis. When I plot all the data, the plot looks correct, but the x axis shows the index instead of the clubs.
When I set Club as the index (index=x), the x axis is correct, but the y axis runs from 0 to 0.05 and nothing is drawn. I assume that is because the counts range from 1 to 7. Any suggestions?
Code used:
data.columns = ['Club', 'Count']
x=data.Club
y=data.Count
print(data)
ax.margins(0, 10)
data.plot.area()
df = pd.DataFrame(y,index=x)
df.plot.area()
Results: [plot images not shown]
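When a Series is passed as the data, pandas aligns it to the new index, so no labels match and every value becomes NaN. Pass the underlying values instead:
df = pd.Series(y.values, index=x)
df.plot.area()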
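A minimal sketch of the index alignment that can leave the plot empty, assuming a shortened version of the club data from the question. The names misaligned, by_values and by_index are illustrative and not part of the original code.

```python
import pandas as pd

# Shortened, hypothetical copy of the question's data
data = pd.DataFrame({
    "Club": ["AC Milan", "Ajax", "FC Barcelona", "Real Madrid"],
    "Count": [2, 1, 4, 7],
})
x = data.Club
y = data.Count

# y is a Series labelled 0..3; giving it a new index makes pandas
# reindex it by club name, so no label matches and all values are NaN
misaligned = pd.Series(y, index=x)

# Passing the raw values skips the alignment step
by_values = pd.Series(y.values, index=x)

# Moving Club into the index gives the same result
by_index = data.set_index("Club")["Count"]

print(misaligned)
print(by_values)
```

Calling .plot.area() on by_values or by_index should then put the club names on the x axis.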
DataFrame:
city Temperature
0 Chandigarh 15
1 Delhi 22
2 Kanpur 20
3 Chennai 26
4 Manali -2
0 Bengalaru 24
1 Coimbatore 35
2 Srirangam 36
3 Pondicherry 39
I need to add another column to the DataFrame that holds a boolean value for each city, indicating whether it is a union territory. Chandigarh, Pondicherry and Delhi are the only 3 union territories here.
I have written the code below:
import numpy as np
conditions = [df3['city'] == 'Chandigarh',df3['city'] == 'Pondicherry',df3['city'] == 'Delhi']
values =[1,1,1]
df3['territory'] = np.select(conditions, values)
Is there an easier or more efficient way to write this?
You can use isin:
union_terrs = ["Chandigarh", "Pondicherry", "Delhi"]
df3["territory"] = df3["city"].isin(union_terrs).astype(int)
This checks each entry in the city column and gives True if it is in union_terrs and False otherwise. The astype(int) call then converts True/False to 1/0,
to get
city Temperature territory
0 Chandigarh 15 1
1 Delhi 22 1
2 Kanpur 20 0
3 Chennai 26 0
4 Manali -2 0
0 Bengalaru 24 0
1 Coimbatore 35 0
2 Srirangam 36 0
3 Pondicherry 39 1
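As a rough check that isin matches the np.select version from the question, here is a small sketch on a shortened, hypothetical copy of the city data. The variable names is_territory, as_int and via_select are illustrative only.

```python
import numpy as np
import pandas as pd

# Shortened, hypothetical copy of the question's data
df3 = pd.DataFrame({
    "city": ["Chandigarh", "Delhi", "Kanpur", "Pondicherry"],
    "Temperature": [15, 22, 20, 39],
})
union_terrs = ["Chandigarh", "Pondicherry", "Delhi"]

# isin alone already yields the boolean column the question asks for
is_territory = df3["city"].isin(union_terrs)

# astype(int) is only needed if 1/0 is preferred over True/False
as_int = is_territory.astype(int)

# The question's np.select approach, for comparison (default value is 0)
conditions = [df3["city"] == name for name in union_terrs]
via_select = np.select(conditions, [1, 1, 1])

print(is_territory.tolist())
print(as_int.tolist())
```

If a true boolean column is wanted, the astype(int) step can simply be dropped.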
home_team_name home_team_goal_count
0 Bayern München 2
1 Bayern München 2
2 Bayern München 1
3 Köln 2
4 Köln 2
I group the data by the variable home_team_name.
df.groupby("home_team_name")
The values of home_team_goal_count can only be 2 or 1. For each group, I want the minimum number of occurrences of these values.
The result I want is 1 for Bayern München and 0 for Köln. To illustrate: Bayern München has the value 2 twice and the value 1 once, so the minimum is 1. Köln has the value 2 twice and the value 1 zero times, so the minimum is 0.
First count the values with SeriesGroupBy.value_counts, then reshape with unstack, filling missing combinations of 1 and 2 with 0, and finally take the row-wise minimum with min:
s = (df.groupby("home_team_name")['home_team_goal_count']
.value_counts()
.unstack(fill_value=0)
.min(axis=1))
print (s)
home_team_name
Bayern München 1
Köln 0
dtype: int64
Details:
print (df.groupby("home_team_name")['home_team_goal_count']
.value_counts()
.unstack(fill_value=0))
home_team_goal_count 1 2
home_team_name
Bayern München 1 2
Köln 0 2
If the input data might contain only 1s or only 2s, reindex is also necessary:
s = (df.groupby("home_team_name")['home_team_goal_count']
.value_counts()
.unstack(fill_value=0)
.reindex([1, 2], axis=1, fill_value=0)
.min(axis=1))
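To see why the reindex step matters, here is a small sketch with hypothetical input in which the value 1 never occurs. The names counts, without_reindex and with_reindex are illustrative.

```python
import pandas as pd

# Hypothetical input where no team has a goal count of 1
df = pd.DataFrame({
    "home_team_name": ["Bayern München", "Bayern München", "Köln"],
    "home_team_goal_count": [2, 2, 2],
})

counts = (df.groupby("home_team_name")["home_team_goal_count"]
            .value_counts()
            .unstack(fill_value=0))

# Only a column for the value 2 exists, so the minimum ignores the value 1
without_reindex = counts.min(axis=1)

# Forcing both columns adds a column of zeros for the missing value 1
with_reindex = counts.reindex([1, 2], axis=1, fill_value=0).min(axis=1)

print(without_reindex)
print(with_reindex)
```

Without the reindex, the minimum is taken over the observed values only, which overstates the result whenever one of the two values is absent from the whole input.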
Let's try using pd.crosstab:
pd.crosstab(df['home_team_name'], df['home_team_goal_count'])\
.reindex([1, 2], axis=1, fill_value=0).min(1)
Result:
home_team_name
Bayern München 1
Köln 0
dtype: int64
import pandas as pd
import numpy as np
list1=['Bayern Munchen','Bayern Munchen','Bayern Munchen','FC Koln','FC Koln']
list2=[2,2,1,2,2]
d={'Home Team Name':list1,'Home Team Goal Count':list2}
data=pd.DataFrame(d)
data['Name']= data['Home Team Name'] +" "+ data['Home Team Goal Count'].astype(str)
data['Name']
Out[39]:
0 Bayern Munchen 2
1 Bayern Munchen 2
2 Bayern Munchen 1
3 FC Koln 2
4 FC Koln 2
name,count=np.unique(data['Name'].tolist(),return_counts=True)
name=[' '.join(x.split(' ')[:-1]) for x in name]
name
Out[99]: ['Bayern Munchen', 'Bayern Munchen', 'FC Koln']
min_val=pd.DataFrame({"Name":name,"Count":count})
name=[]
min_val_count=[]
for x in min_val.Name.unique():
    name.append(min_val[min_val.Name!=x].min()[0])
    if min_val[min_val.Name!=x].min()[1]==2:
        min_val_count.append(0)
    else:
        min_val_count.append(min_val[min_val.Name!=x].min()[1])
minimum_val_dict=dict(zip(name,min_val_count))
minimum_val_dict
Out[104]: {'FC Koln': 0, 'Bayern Munchen': 1}
A slightly longer version as compared to the answers above.
Yet another way to do this is to use a categorical variable, since there is a finite set of states. So:
(
df
.astype({"home_team_goal_count": "category"})
.groupby("home_team_name")["home_team_goal_count"]
.apply(lambda x: x.value_counts().min())
)
If you want to know which value occurred the least, you can call .idxmin() instead of .min().
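A short sketch of the categorical approach with both .min() and .idxmin(), assuming the sample data from the question. The names goals, least_count and least_value are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "home_team_name": ["Bayern München"] * 3 + ["Köln"] * 2,
    "home_team_goal_count": [2, 2, 1, 2, 2],
})

# With a categorical column, value_counts also reports categories
# that never occur in a group, with a count of 0
goals = (df.astype({"home_team_goal_count": "category"})
           .groupby("home_team_name")["home_team_goal_count"])

# How often the rarest value occurs in each group
least_count = goals.apply(lambda x: x.value_counts().min())

# Which value is the rarest in each group
least_value = goals.apply(lambda x: x.value_counts().idxmin())

print(least_count)
print(least_value)
```

Note that the categories are inferred from the data here, so a value that never appears anywhere in the input would still be missing. Passing an explicit pd.CategoricalDtype([1, 2]) is one way to guard against that.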
I'm trying to create a line graph that shows the total number of breweries in each city over a period of time. The graph doesn't add on to the previous data point; it starts over at each new date.
This is the data set I'm working with:
Year Opened City Brewery Name
2012 Charlottesville 1
Fredericksburg 1
Norfolk 1
2013 Leesburg 1
Manassas 2
Richmond 2
2014 Fredericksburg 1
Purcellville 3
Richmond 4
Roanoke 3
Virginia Beach 3
2015 Fredericksburg 2
Leesburg 1
Manassas 1
Norfolk 1
Richmond 1
Sterling 1
Virginia Beach 2
Here is the graph code:
# plot data
fig, ax = plt.subplots(figsize=(15,7))
# use unstack()
top10brew_df.unstack().plot(kind='line', y="Brewery Name", ax=ax)
[Image: what my graph looks like now]
Use reindex and fill the missing values with 0. Note that range excludes its end point, so it must stop at 2016 to include 2015:
df = df.pivot(index='Year', columns='Opened City',values='Brewery Name').reindex(range(2012,2016)).fillna(0)
Plot the results
fig, ax = plt.subplots(figsize=(15,7))
plt.xticks(df.index.values.astype(int))
df.plot.line(ax=ax)
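Since the question asks for a running total, a cumulative sum after filling the gaps may be what is needed. The following is only a sketch: the column names Year, City and Breweries and the frame name openings are assumptions, and the rows are a small subset of the question's data.

```python
import pandas as pd

# Hypothetical long-format subset of the brewery counts
openings = pd.DataFrame({
    "Year": [2012, 2013, 2014, 2014, 2015, 2015],
    "City": ["Norfolk", "Richmond", "Richmond", "Roanoke", "Norfolk", "Richmond"],
    "Breweries": [1, 2, 4, 3, 1, 1],
})

# One column per city, one row per year, 0 where nothing opened
wide = (openings.pivot(index="Year", columns="City", values="Breweries")
                .reindex(range(2012, 2016))
                .fillna(0))

# cumsum adds each year's openings to the totals of the previous years
running_total = wide.cumsum().astype(int)

print(running_total)
```

Plotting running_total with .plot.line() should then give lines that never decrease, instead of restarting at each year.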
Below is my DataFrame:
Sno Name Region Num
0 1 Rubin Indore 79744001550
1 2 Rahul Delhi 89824304549
2 3 Rohit Noida 91611611478
3 4 Chirag Delhi 85879761557
4 5 Shan Bharat 95604535786
5 6 Jordi Russia 80777784005
6 7 El Russia 70008700104
7 8 Nino Spain 87707101233
8 9 Mark USA 98271377772
9 10 Pattinson Hawk Eye 87888888889
I want to retrieve the numbers from the given CSV file and store them by region.
delhi_list = []
for i in range(len(data)):
    if data.loc[i]['Region'] == 'Delhi':
        delhi_list.append(data.loc[i]['Num'])
I am getting the results, but I want to store the data in a Python dictionary instead. Can I do that?
IIUC, you can use groupby, apply the list aggregation then use to_dict:
data.groupby('Region')['Num'].apply(list).to_dict()
[out]
{'Bharat': [95604535786],
'Delhi': [89824304549, 85879761557],
'Hawk Eye': [87888888889],
'Indore': [79744001550],
'Noida': [91611611478],
'Russia': [80777784005, 70008700104],
'Spain': [87707101233],
'USA': [98271377772]}
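As a sketch of how the resulting dictionary replaces the per-region lists, assuming a shortened copy of the data from the question. The names nums_by_region and manual are illustrative, and the plain loop is included only for comparison.

```python
import pandas as pd

# Shortened, hypothetical copy of the question's data
data = pd.DataFrame({
    "Name": ["Rahul", "Chirag", "Jordi", "El", "Nino"],
    "Region": ["Delhi", "Delhi", "Russia", "Russia", "Spain"],
    "Num": [89824304549, 85879761557, 80777784005, 70008700104, 87707101233],
})

# One dictionary holds the list for every region
nums_by_region = data.groupby("Region")["Num"].apply(list).to_dict()

# Looking up a region replaces the hand-written delhi_list loop
delhi_list = nums_by_region["Delhi"]

# The same dictionary built with a plain Python loop
manual = {}
for region, num in zip(data["Region"], data["Num"]):
    manual.setdefault(region, []).append(num)

print(delhi_list)
```

The groupby version avoids row-by-row .loc lookups, which tend to be slow on larger frames.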
I am trying to create values for each location in my data. I have:
Portafolio Zona Region COM PROV Type of Housing
654738 1 2 3 21 compuesto
65344 3 8 4 22 error
I want to create a new column for each type of housing, whose values count how many rows of that type there are in each Portafolio, Zona, Region, COM and PROV. I have struggled with this for 2 days and I am new to pandas. It should look like this:
Zona Region COM PROV Compuesto Error
1 2 3 21 24 444
3 8 4 22 34 32
You want pd.pivot_table, specifying size as the aggregation function:
df1 = pd.pivot_table(df, index=['Zona', 'Region', 'COM', 'PROV'],
columns='Type of Housing',
aggfunc='size').reset_index()
df1.columns.name=None
Output (df1):
Zona Region COM PROV compuesto error
0 1 2 3 21 1.0 NaN
1 3 8 4 22 NaN 1.0
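The NaN entries and float counts in that output come from combinations that never occur. One way to get integer counts with zeros instead is groupby with size and unstack(fill_value=0), which is a different route from pivot_table but should produce the same table shape. This is a sketch on hypothetical rows, not the asker's real data.

```python
import pandas as pd

# Hypothetical rows using the question's column names
df = pd.DataFrame({
    "Zona": [1, 1, 1, 3],
    "Region": [2, 2, 2, 8],
    "COM": [3, 3, 3, 4],
    "PROV": [21, 21, 21, 22],
    "Type of Housing": ["compuesto", "compuesto", "error", "error"],
})

# Count rows per location and housing type, then spread the types
# into columns, writing 0 for combinations that never occur
counts = (df.groupby(["Zona", "Region", "COM", "PROV", "Type of Housing"])
            .size()
            .unstack(fill_value=0)
            .reset_index())
counts.columns.name = None

print(counts)
```

Because the zeros are filled in during unstack, the count columns stay integers instead of being converted to floats.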