returning rows within range in pandas MultiIndex - python

I have a dataframe that looks like:
               count
year person
2008 a.smith       1
     b.johns       2
     c.gilles      3
2009 a.smith       4
     b.johns       3
     c.gilles      2
Both year and person are part of the index. I'd like to return all rows with a.smith for all years. I can locate a count for a specific year with df.loc[(2008, 'a.smith')], which outputs 1. But if I try df.loc[(:, 'a.smith')], I get SyntaxError: invalid syntax.
How do I use df.loc for a range of index values in a MultiIndex?

Use pd.IndexSlice:
idx = pd.IndexSlice
df.loc[idx[:, 'a.smith'], :]
Out[200]:
              count
year person
2008 a.smith      1
2009 a.smith      4
Input data:
df
Out[211]:
               count
year person
2008 a.smith       1
     b.johns       2
     c.gilles      3
2009 a.smith       4
     b.johns       3
     c.gilles      2
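For reference, here is a minimal self-contained sketch of the selection above. Building the frame with MultiIndex.from_product is an assumption about how the data was created, and the names by_slice, by_tuple and by_xs are only illustrative. It also shows two equivalent spellings: slice(None), which is what ":" stands for and is legal inside a tuple, and DataFrame.xs.

```python
import pandas as pd

# Rebuild the example frame from the question (construction is assumed)
index = pd.MultiIndex.from_product(
    [[2008, 2009], ["a.smith", "b.johns", "c.gilles"]],
    names=["year", "person"],
)
df = pd.DataFrame({"count": [1, 2, 3, 4, 3, 2]}, index=index)

# IndexSlice lets you write ":" for "every value of this level"
idx = pd.IndexSlice
by_slice = df.loc[idx[:, "a.smith"], :]

# slice(None) is the spelled-out form of ":" and works inside a plain tuple
by_tuple = df.loc[(slice(None), "a.smith"), :]

# xs selects on a single level; drop_level=False keeps "person" in the index
by_xs = df.xs("a.smith", level="person", drop_level=False)

print(by_slice)
```

All three should return the same two rows for a.smith, one per year.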

Related

Need pandas groupby.count() or groupby.size.unstack() to output a dataframe I can use

So I need to count the number of occurrences of a value per year, per animal. I've managed to do it but it's outputting a single column kind of dataframe rather than the data being in workable cells.
I've used:
df.groupby(["Animal", "year"])["value"].count()
and:
df.groupby(["Animal", "year", "value"]).size().unstack(fill_value=0)
I also tried a pivot, but it raised an error. The message mentioned too much data, but the main error was "Index contains duplicate entries". I have about 13,000 rows of data.
I have no idea what to do with the result. I can't select the columns the way I would with a DataFrame. I want to be able to compute proportions in a DataFrame, for example value A is 10% and value B is 90% for animal 1 in 2020.
I tried calling to_frame() on the Series that count() returned, but it just created a one-column DataFrame.
The data is like:
Animal Year Value
1 2020 A
1 2020 A
1 2019 B
1 2019 B
2 2020 A
And I need it to be:
Animal Year A B
1 2020 2 0
1 2019 0 2
2 2020 1 0
But in a full proper Dataframe of 4 columns, not squished into one column. Then I can create the % proportions from there.
Try:
x = df.pivot_table(
    index=["Animal", "Year"], columns="Value", aggfunc="size", fill_value=0
).reset_index()
x.columns.name = None
print(x)
Prints:
Animal Year A B
0 1 2019 0 2
1 1 2020 2 0
2 2 2020 1 0
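The question also asks for proportions per animal and year. One possible sketch, using the sample data from the question: keep the pivoted counts indexed by Animal and Year, divide each row by its row total, and only then reset the index. The variable names counts, proportions and result are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {
        "Animal": [1, 1, 1, 1, 2],
        "Year": [2020, 2020, 2019, 2019, 2020],
        "Value": ["A", "A", "B", "B", "A"],
    }
)

# Count occurrences of each Value per (Animal, Year)
counts = df.pivot_table(
    index=["Animal", "Year"], columns="Value", aggfunc="size", fill_value=0
)

# Divide each row by its total to get per-(Animal, Year) proportions
proportions = counts.div(counts.sum(axis=1), axis=0)

# Turn the index levels back into ordinary columns
result = proportions.reset_index()
result.columns.name = None
print(result)
```

Multiply by 100 if percentages are preferred over fractions.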

How to crosstab or count dataframe rows by date in pandas

I am fairly new to working with pandas. I have a dataframe with individual entries like this:
dfImport:
id  date_created  date_closed
0   01-07-2020
1   02-09-2020    10-09-2020
2   07-03-2019    02-09-2020
I would like to aggregate it so that I get the total number of created and closed objects (counting ids), grouped by year, quarter, and month, like this:
dfInOut:
Year  Qrt  month      number_created  number_closed
2019  1    March      1               0
2020  3    July       1               0
           September  1               2
I guess I'd have to use some combination of crosstab or groupby. I have tried a lot of ideas and researched the problem, but I can't figure out a way. I suspect I'm misunderstanding something. Thanks in advance!
Use DataFrame.melt with crosstab:
df['date_created'] = pd.to_datetime(df['date_created'], dayfirst=True)
df['date_closed'] = pd.to_datetime(df['date_closed'], dayfirst=True)
df1 = df.melt(value_vars=['date_created','date_closed']).dropna()
df = (pd.crosstab([df1['value'].dt.year.rename('Year'),
                   df1['value'].dt.quarter.rename('Qrt'),
                   df1['value'].dt.month.rename('Month')], df1['variable'])
        [['date_created','date_closed']])
print (df)
variable        date_created  date_closed
Year Qrt Month
2019 1   3                 1            0
2020 3   7                 1            0
         9                 1            2
df = df.rename_axis(None, axis=1).reset_index()
print (df)
Year Qrt Month date_created date_closed
0 2019 1 3 1 0
1 2020 3 7 1 0
2 2020 3 9 1 2
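Here is the same approach as a self-contained sketch that starts from the sample rows in the question. The explicit format string, the intermediate name long_df, and the final renaming to number_created / number_closed are assumptions added for illustration, not part of the answer above.

```python
import pandas as pd

# Sample rows from the question; id 0 has no closing date
df = pd.DataFrame(
    {
        "id": [0, 1, 2],
        "date_created": ["01-07-2020", "02-09-2020", "07-03-2019"],
        "date_closed": [None, "10-09-2020", "02-09-2020"],
    }
)

# Parse day-month-year strings; missing values become NaT
for col in ["date_created", "date_closed"]:
    df[col] = pd.to_datetime(df[col], format="%d-%m-%Y")

# Long format: one row per (event type, date), dropping missing dates
long_df = df.melt(value_vars=["date_created", "date_closed"]).dropna()

# Count events per year / quarter / month, one column per event type
out = pd.crosstab(
    [
        long_df["value"].dt.year.rename("Year"),
        long_df["value"].dt.quarter.rename("Qrt"),
        long_df["value"].dt.month.rename("Month"),
    ],
    long_df["variable"],
)[["date_created", "date_closed"]]

# Use the column names requested in the question and flatten the index
out = (
    out.rename(columns={"date_created": "number_created",
                        "date_closed": "number_closed"})
    .rename_axis(None, axis=1)
    .reset_index()
)
print(out)
```

Melting first puts both date columns into a single value column, so one crosstab call can count created and closed events against the same calendar keys.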

Index duplicate rows in Python DataFrame

I am trying to add a column to index duplicate rows and order by another column.
Here's the example dataset:
df = pd.DataFrame({'Name': ['A','A','A','B','B','B','B'], 'Score': [9,10,10,8,7,8,8], 'Year': [2019,2018,2017,2019,2018,2017,2016]})
I want to use ['Name', 'Score'] to identify duplicates, then number the duplicates within each group in order of Year.
For example, rows 2 and 3 are duplicates because they have the same name and score, so I order them by year and number them.
Does anyone have a good idea for how to do this in Python? Thank you so much!
You are looking for cumcount:
df['Index'] = (df.sort_values('Year', ascending=False)
                 .groupby(['Name','Score'])
                 .cumcount() + 1
              )
Output:
Name Score Year Index
0 A 9 2019 1
1 A 10 2018 1
2 A 10 2017 2
3 B 8 2019 1
4 B 7 2018 1
5 B 8 2017 2
6 B 8 2016 3
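A runnable sketch of the same idea, using the data from the question. The numbering is computed on a frame sorted by Year, and the assignment back to df works because pandas aligns the result on the original row index.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Name": ["A", "A", "A", "B", "B", "B", "B"],
        "Score": [9, 10, 10, 8, 7, 8, 8],
        "Year": [2019, 2018, 2017, 2019, 2018, 2017, 2016],
    }
)

# Number rows within each (Name, Score) group, newest year first.
# cumcount starts at 0, so add 1 for a 1-based position.
df["Index"] = (
    df.sort_values("Year", ascending=False)
    .groupby(["Name", "Score"])
    .cumcount()
    + 1
)
print(df)
```

Use ascending=True instead if the oldest year should get position 1.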

Want to count values in a column when a different column value is fixed in python

I am having the following dataset:
Year Y Z
2018 A 1
2019 B 1
2019 A 1
2019 A 1
2019 A 1
2019 C 1
2020 A 1
Now I want to find the number of A values in the year 2019 alone using Python. How do I find it?
You can use the following, assuming df is the name of your DataFrame:
sum(df[df['Year']==2019]['Y']=='A')
This returns 3 for your example.
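An equivalent sketch combines both conditions into a single boolean mask and sums it, since True counts as 1. The names mask, count_a_2019 and counts_2019 are illustrative; value_counts is shown as one way to get the counts for every Y value in 2019 at once.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame(
    {
        "Year": [2018, 2019, 2019, 2019, 2019, 2019, 2020],
        "Y": ["A", "B", "A", "A", "A", "C", "A"],
        "Z": [1, 1, 1, 1, 1, 1, 1],
    }
)

# Rows where both conditions hold; summing a boolean Series counts the Trues
mask = (df["Year"] == 2019) & (df["Y"] == "A")
count_a_2019 = int(mask.sum())
print(count_a_2019)

# Counts of every Y value within 2019
counts_2019 = df.loc[df["Year"] == 2019, "Y"].value_counts()
print(counts_2019)
```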

Combine pandas DataFrames to give unique element counts

I have a few pandas DataFrames and I am trying to find a good way to calculate and plot the number of times each unique entry occurs across DataFrames. As an example if I had the 2 following DataFrames:
year month
0 1900 1
1 1950 2
2 2000 3
year month
0 1900 1
1 1975 2
2 2000 3
I was thinking maybe there is a way to combine them into a single DataFrame while using a new column counts to keep track of the number of times a unique combination of year + month occurred in any of the DataFrames. From there I figured I could just scatter plot the year + month combinations with their corresponding counts.
year month counts
0 1900 1 2
1 1950 2 1
2 2000 3 2
3 1975 2 1
Is there a good way to achieve this?
Use concat, then groupby with agg:
pd.concat([df1,df2]).groupby('year').month.agg(['count','first']).reset_index().rename(columns={'first':'month'})
Out[467]:
year count month
0 1900 2 1
1 1950 1 2
2 1975 1 2
3 2000 2 3
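Note that the one-liner above groups by year only and takes the first month it sees in each group. If the same year can appear with different months, a variant that groups on both columns may be closer to what the question describes. This is a sketch of that alternative, not the answer's method; the column name counts is taken from the question.

```python
import pandas as pd

df1 = pd.DataFrame({"year": [1900, 1950, 2000], "month": [1, 2, 3]})
df2 = pd.DataFrame({"year": [1900, 1975, 2000], "month": [1, 2, 3]})

# Stack the frames, then count how often each (year, month) pair occurs
counts = (
    pd.concat([df1, df2], ignore_index=True)
    .groupby(["year", "month"])
    .size()
    .reset_index(name="counts")
)
print(counts)
```

The result has one row per unique year and month combination, which can then be passed to a scatter plot.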
