How to use a pivot table with non-numeric values? - python

I am working with the pivot function in pandas.
My input table is:
POI_Entity_ID State
ADD_Q319_143936 Rajasthan
Polyline-Kot-2089 New Delhi
Q111267412 Rajasthan
EL_Q113_32573 Rajasthan
RCE_UDZ_10979 New Delhi
I want my output to be:
State counts of POI_Entity_ID
Rajasthan 3
New Delhi 2

You can count the number of rows in each group using pandas.Series.value_counts().
In your case it would look something like:
import pandas as pd
d = {'State': ['Rajasthan','New Delhi','Rajasthan','Rajasthan','New Delhi'], 'POI_Entity_ID': ['ADD_Q319_143936','Polyline-Kot-2089','Q111267412','EL_Q113_32573','RCE_UDZ_10979']}
df = pd.DataFrame(data=d)
df['State'].value_counts()
The last line produces the following Series:
Rajasthan 3
New Delhi 2
You can also use pandas.DataFrame.groupby() combined with the count() method:
df.groupby('State').count()
yields the following table:
POI_Entity_ID
State
New Delhi 2
Rajasthan 3
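Both results above keep State as the index rather than as an ordinary column. If you want the two-column layout from the question, a minimal sketch is to count per group and then call reset_index; the column name 'counts of POI_Entity_ID' is only an illustrative choice matching the requested header.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'POI_Entity_ID': ['ADD_Q319_143936', 'Polyline-Kot-2089', 'Q111267412',
                      'EL_Q113_32573', 'RCE_UDZ_10979'],
    'State': ['Rajasthan', 'New Delhi', 'Rajasthan', 'Rajasthan', 'New Delhi'],
})

# Count IDs per State, move State back into a regular column,
# and sort so the largest count comes first.
counts = (
    df.groupby('State')['POI_Entity_ID']
      .count()
      .reset_index(name='counts of POI_Entity_ID')
      .sort_values('counts of POI_Entity_ID', ascending=False, ignore_index=True)
)
print(counts)
```

The same two columns can also be obtained from the value_counts() result with df['State'].value_counts().rename_axis('State').reset_index(name='counts of POI_Entity_ID').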

You can use a pivot table with count as the aggregation function, keeping 'State' as the index.
import pandas as pd
d = {'POI_Entity_ID': ['ADD_Q319_143936', 'Polyline-Kot-2089', 'Q111267412', 'EL_Q113_32573', 'RCE_UDZ_10979'],
     'State': ['Rajasthan', 'New Delhi', 'Rajasthan', 'Rajasthan', 'New Delhi']}
df = pd.DataFrame(data=d)
pivotdf = pd.pivot_table(data=df, index='State', values='POI_Entity_ID', aggfunc='count')
gives you a table like:
POI_Entity_ID
State
New Delhi 2
Rajasthan 3
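Since the values being pivoted are strings, the aggregation has to be one that does not need numbers. A small sketch comparing 'count' with 'nunique' (a swapped-in alternative, not part of the answer above) on the same sample data: 'count' counts non-null IDs, while 'nunique' counts distinct IDs, so they agree here because every ID is unique.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'POI_Entity_ID': ['ADD_Q319_143936', 'Polyline-Kot-2089', 'Q111267412',
                      'EL_Q113_32573', 'RCE_UDZ_10979'],
    'State': ['Rajasthan', 'New Delhi', 'Rajasthan', 'Rajasthan', 'New Delhi'],
})

# 'count' counts the non-null IDs in each State.
count_df = pd.pivot_table(data=df, index='State', values='POI_Entity_ID', aggfunc='count')

# 'nunique' counts distinct IDs; it would only differ from 'count'
# if the same ID appeared more than once within a State.
nunique_df = pd.pivot_table(data=df, index='State', values='POI_Entity_ID', aggfunc='nunique')

print(count_df)
print(nunique_df)
```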

Related

How to extract unique values from pandas column where values are in list

I want to extract the unique cities from the City column of a pandas dataframe. Each value in the City column is a list. How would I get the frequency of each city, like:
Lahore 3
Karachi 2
Sydney 1
etc.
Sample dataframe:
Name Age City
a jack 34 [Sydney,Delhi]
b Riti 31 [Lahore,Delhi]
c Aadi 16 [New York, Karachi, Lahore]
d Mohit 32 [Peshawar,Delhi, Karachi]
Thank you
Try explode + value_counts:
out = df.City.explode().value_counts()
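A self-contained version of that one-liner, rebuilding the sample dataframe from the question. It assumes the City column holds real Python lists; if the lists were read in as strings such as "[Sydney,Delhi]", they would need to be parsed into lists first.

```python
import pandas as pd

# Sample dataframe from the question (City values are Python lists).
df = pd.DataFrame(
    {
        'Name': ['jack', 'Riti', 'Aadi', 'Mohit'],
        'Age': [34, 31, 16, 32],
        'City': [
            ['Sydney', 'Delhi'],
            ['Lahore', 'Delhi'],
            ['New York', 'Karachi', 'Lahore'],
            ['Peshawar', 'Delhi', 'Karachi'],
        ],
    },
    index=['a', 'b', 'c', 'd'],
)

# explode() puts each list element on its own row (repeating the index label);
# value_counts() then tallies how often each city appears.
out = df.City.explode().value_counts()
print(out)
```

With this data Delhi appears three times, Karachi and Lahore twice each, and the other cities once.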

How to aggregate rows in a pandas dataframe

I have a dataframe, shown in image 1. It is a sample of pubs in London, UK (3337 pubs/rows), and the geometry is at the LSOA level. Some LSOAs contain more than one pub. I want my dataframe to summarise the number of pubs in every LSOA. I already have this information from
psdf['lsoa11nm'].value_counts()
prints out:
City of London 001F 103
City of London 001G 40
Westminster 013B 36
Westminster 018A 36
Westminster 013E 30
...
Lambeth 005A 1
Croydon 043C 1
Hackney 002E 1
Merton 022D 1
Bexley 008B 1
Name: lsoa11nm, Length: 1630, dtype: int64
I can't use this as a new dataframe because it is a Series (the LSOA names form the index, with a single column of counts) rather than two columns, one for lsoa11nm and the other for the pub count.
Does anyone know how to group the dataframe so that there is only one row for every LSOA, stating how many pubs are in it?
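One common way to get that two-column result is groupby(...).size() followed by reset_index. The sketch below uses a hypothetical toy frame (the 'pub' column and the row values are invented for illustration); only lsoa11nm comes from the question, and pub_count is an illustrative column name.

```python
import pandas as pd

# Hypothetical toy data: one row per pub, with its LSOA in 'lsoa11nm'.
psdf = pd.DataFrame({
    'pub': ['pub_1', 'pub_2', 'pub_3', 'pub_4', 'pub_5'],
    'lsoa11nm': ['Westminster 013B', 'Westminster 013B', 'Lambeth 005A',
                 'Westminster 013B', 'Lambeth 005A'],
})

# size() counts rows per LSOA; reset_index(name=...) turns the resulting
# Series into a DataFrame with two columns: lsoa11nm and pub_count.
pub_counts = psdf.groupby('lsoa11nm').size().reset_index(name='pub_count')
print(pub_counts)
```

Starting from the value_counts() call in the question, psdf['lsoa11nm'].value_counts().rename_axis('lsoa11nm').reset_index(name='pub_count') gives the same two columns, sorted by count instead of by name.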

Drop duplicate rows in a dataframe based on a particular column

I have a dataframe like the following:
Districtname pincode
0 central delhi 110001
1 central delhi 110002
2 central delhi 110003
3 central delhi 110004
4 central delhi 110005
How can I drop duplicate rows based on the Districtname column and keep only the first occurrence?
The output I want:
Districtname pincode
0 central delhi 110001
Duplicate rows can be dropped using pandas.DataFrame.drop_duplicates(), which by default keeps the first occurrence. In your case df.drop_duplicates(subset="Districtname") should work. To update the same DataFrame in place, use df.drop_duplicates(subset="Districtname", inplace=True). Docs: https://pandas.pydata.org/pandas-docs/version/0.17/generated/pandas.DataFrame.drop_duplicates.html
Use drop_duplicates with inplace=True:
df.drop_duplicates('Districtname',inplace=True)
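A runnable sketch with the sample data from the question, using the non-inplace form so the original frame stays unchanged for comparison:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'Districtname': ['central delhi'] * 5,
    'pincode': [110001, 110002, 110003, 110004, 110005],
})

# keep='first' is the default: the first row for each Districtname survives.
deduped = df.drop_duplicates(subset='Districtname', keep='first')
print(deduped)
```

Writing df = df.drop_duplicates(subset='Districtname') has the same effect as inplace=True and is often preferred because it reads as an ordinary assignment.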

Sum using groupby not giving the expected result

I need to sum the values of one column, grouped by another column, and overwrite the dataframe with the result.
I have tried:
df.groupby('S/T name')['Age group (Years)Total Persons'].sum()
The dataframe to sum:
S/T code S/T name city name population
1 NSW Greater sydney 1000
1 NSW rest of nsw 100
1 NSW rest of nsw 2000
2 Victoria Geelong 1200
2 Victoria Melbourne 1300
2 Victoria Melbourne 1000
Required output:
S/T code S/T name population
1 NSW 3100
2 Victoria 3500
You seem to be summing the wrong column in your example; switching to population would get you most of the way:
df.groupby('S/T name')['population'].sum()
Since you also want to keep the S/T code column, you can use agg, calling mean on the S/T code column and sum on the population column:
df.groupby('S/T name').agg({'S/T code': 'mean', 'population': 'sum'})
Output (mean returns a float, hence 1.0 and 2.0):
S/T name S/T code population
NSW 1.0 3100
Victoria 2.0 3500
Try the following code:
Solution 1
grouped_df = df.groupby('S/T name')['population'].sum()
print(grouped_df)
This groups the rows by the S/T name column and sums the population column for each group.
Solution 2
grouped_df1 = df.groupby('S/T name').agg({'S/T code': 'unique', 'population': 'sum'})
grouped_df1
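Another option, not used in either answer above, is to group on both identifying columns. Because S/T code is constant within each S/T name, it becomes part of the group key and needs no aggregation, and as_index=False returns both keys as ordinary columns. A minimal sketch with the sample data from the question:

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    'S/T code': [1, 1, 1, 2, 2, 2],
    'S/T name': ['NSW', 'NSW', 'NSW', 'Victoria', 'Victoria', 'Victoria'],
    'city name': ['Greater sydney', 'rest of nsw', 'rest of nsw',
                  'Geelong', 'Melbourne', 'Melbourne'],
    'population': [1000, 100, 2000, 1200, 1300, 1000],
})

# Group on both code and name so the code is kept unchanged;
# as_index=False keeps the group keys as regular columns.
result = df.groupby(['S/T code', 'S/T name'], as_index=False)['population'].sum()
print(result)
```

To overwrite the original dataframe as the question asks, assign the result back with df = result.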

How to avoid using iloc or hard-coded index numbers in pandas to dynamically split a single data frame into multiple subsets?

My dataframe looks like this:
country1 state1 city1 District1
india 36 20 40
china 27 21 35
honkong 34 21 38
london 32 21 38
company technology car brand population
adf java Ford 40
ydfh java Hyundai 19
klyu java Nissan 47
hy6g dotnet Toyota 20
rghtr dotnet Hyundai 30
htryr dotnet hummer 12
I want to create multiple subsets from a single dataframe. I don't want to use index numbers, the iloc function, or hard-coded index numbers, because the subsets would miss any new entry added either after the london row or after the last row.
Any new entry should also be captured. Any clues on how to do this in pandas or NumPy?
I hope this question is clear.
Assuming your data frame is saved as df, you can use groupby and store each group's sub-frame in a dictionary for later use.
d = {}
for group, frame in df.groupby('country1'):
    d[group] = frame
If you want to group by multiple columns, pass a list to groupby:
for group, frame in df.groupby(['country1', 'technology']):
    d[group] = frame
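The same dictionary can be built in one line with dict(tuple(df.groupby(...))), since iterating a groupby yields (key, sub-frame) pairs. Because groups are defined by column values rather than row positions, rows added later are picked up automatically the next time the dictionary is built. A sketch on a hypothetical frame (the rows are invented; only the column names come from the question):

```python
import pandas as pd

# Hypothetical frame reusing column names from the question.
df = pd.DataFrame({
    'country1': ['india', 'china', 'india', 'china'],
    'technology': ['java', 'java', 'dotnet', 'java'],
    'population': [40, 19, 47, 20],
})

# One sub-DataFrame per country, keyed by the country name.
subsets = dict(tuple(df.groupby('country1')))

# With several grouping columns, each key is a tuple of values.
multi = dict(tuple(df.groupby(['country1', 'technology'])))

print(subsets['india'])
print(multi[('china', 'java')])
```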
