groupby .sum() takes only one element in pandas dataframe - python

I have a Pandas dataframe with two columns:
I would like to group the numbers by the column Fee_Code. I do the following:
df.groupby('Fee_Code').sum()
However, the output for the row Management fees is 137651.03, which is just the first value. When I do:
df.groupby('Fee_Code').count()
I do see that Management fees has 2 observations. So why is .sum() not working?
EDITS:
df.groupby('Fee_Code').get_group('Management fees') returns:

Solved it. My column of values was not numeric, so it was just taking the first element.
To make it numeric I did the following:
df['Value'] = pd.to_numeric(df['Value'], downcast='float', errors='coerce')
And then .groupby(..).sum(..) worked perfectly fine.
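A minimal sketch of the fix described above, assuming the amounts were read in as strings. Only the column names and the 137651.03 figure come from the question; the other values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: only 137651.03 appears in the question, the rest is invented.
df = pd.DataFrame({
    "Fee_Code": ["Management fees", "Management fees", "Other fees"],
    "Value": ["137651.03", "2500.00", "100.50"],  # stored as text, not numbers
})

# Inspect the dtype first: a non-numeric dtype here explains the odd sum.
print(df["Value"].dtype)

# Convert to a numeric dtype; anything that cannot be parsed becomes NaN.
df["Value"] = pd.to_numeric(df["Value"], errors="coerce")

# With a numeric column, the groups are added up as expected.
totals = df.groupby("Fee_Code")["Value"].sum()
print(totals)
```

The sketch leaves out downcast='float' so the amounts stay in 64-bit floats; whether downcasting is acceptable depends on the precision you need.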

Related

How to check if two pandas dataframes have same values and concatenate those rows?

I have a DataFrame called "df" with 4 numerical columns [frame, id, x, y].
I made a loop that creates two dataframes called df1 and df2. Both df1 and df2 are subsets of the original dataframe.
What I want to do (and I don't understand how to do it) is this: I want to CHECK whether df1 and df2 have the same VALUES in the column called "id". If they do, I want to concatenate those rows of df2 (the ones with the same id values) to df1.
For example: if df1 has rows with the id values (1,6,4,8) and df2 has the id values (12,7,8,10), I want to concatenate the df2 rows that have the id value 8 to df1. That is all I need.
This is my code:
for i in range(0, max(df['frame']), 30):
    df1 = df[df['frame'].between(i, i+30)]
    df2 = df[df['frame'].between(i-30, i)]
There are several ways to accomplish what you need.
The simplest one is to get the slice of df2 that contains the values you need with .isin() and concatenate it with df1 in one line.
df3 = pd.concat([df1, df2[df2.id.isin(df1.id)]], axis = 0)
To gain more control and avoid any errors that might stem from updating df1 and df2 elsewhere, you may want to take this one-liner apart.
look_for_vals = set(df1['id'].tolist())
# do some stuff
need_ix = df2[df2["id"].isin(look_for_vals)].index
# do more stuff
df3 = pd.concat([df1, df2.loc[need_ix,:]], axis=0)
Instead of set() you may also use df1['id'].unique()
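As a quick check of the approach above, here is a small sketch using the id values from the question's example. The x column and its values are made up for illustration, and ignore_index=True is an optional addition to get a clean 0..n-1 index.

```python
import pandas as pd

# id values come from the question's example; the x values are invented.
df1 = pd.DataFrame({"id": [1, 6, 4, 8], "x": [0.0, 1.0, 2.0, 3.0]})
df2 = pd.DataFrame({"id": [12, 7, 8, 10], "x": [4.0, 5.0, 6.0, 7.0]})

# Rows of df2 whose id also occurs in df1 (here only id 8).
matching = df2[df2["id"].isin(df1["id"])]

# Append those rows to df1; ignore_index renumbers the combined rows.
df3 = pd.concat([df1, matching], axis=0, ignore_index=True)
print(df3)
```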

How should I filter one dataframe by entries from another one in pandas with isin?

I have two dataframes (df1, df2). The column names and indices are the same (they differ only in the column entries). Also, df2 has only 20 entries (which also exist in df1, as I said).
I want to filter df1 by df2's entries, but when I try to do it with isin, nothing happens.
df1.isin(df2) or df1.index.isin(df2.index)
Please tell me what I'm doing wrong and how I should do it.
First of all, the isin function in pandas returns booleans (a DataFrame of booleans in the case of df1.isin(df2)) and not the filtered result you want. So it makes sense that the commands you used did not work.
I am positive that the following post will help:
pandas - filter dataframe by another dataframe by row elements
If you want to select the entries in df1 with an index that is also present in df2, you should be able to do it with:
df1.loc[df2.index]
or if you really want to use isin:
df1[df1.index.isin(df2.index)]
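A small sketch of both options, with made-up frames standing in for df1 and df2 (the column name, labels, and values are hypothetical). It shows that isin only produces a boolean mask, which then has to be used to index df1.

```python
import pandas as pd

# Hypothetical frames: df2's index labels are a subset of df1's.
df1 = pd.DataFrame({"value": [10, 20, 30, 40]}, index=["a", "b", "c", "d"])
df2 = pd.DataFrame({"value": [21, 41]}, index=["b", "d"])

# isin on the index returns a boolean array, one flag per row of df1.
mask = df1.index.isin(df2.index)

# Use the mask to keep only the matching rows.
filtered_by_mask = df1[mask]

# Label-based selection; raises KeyError if a label is missing from df1.
filtered_by_label = df1.loc[df2.index]

print(filtered_by_mask)
```

The mask version keeps df1's row order and tolerates labels that are missing from df1, while .loc follows the order of df2.index.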

Pandas - How to combine duplicate items into one with several columns

I have the below DataFrame
As you can see, ItemNo 1 is duplicated three times, and each column has a value corresponding to it.
I am looking for a method to check against all columns, and if they match then put Price, Sales, and Stock as one entry, not three.
Any help will be greatly appreciated.
Simply remove all the NaN instances and redefine the column names
df = df1.apply(lambda x: pd.Series(x.dropna().values), axis=1)
df.columns = ['ItemNo','Category','SIZE','Model','Customer','Week Date','<New col name>']
To collapse the duplicates into one row per item, you can use groupby like this:
df.groupby('ItemNo', as_index=False).first()
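The question's table is not shown, so the sketch below assumes a layout where each duplicate row holds only one of Price, Sales, and Stock, with NaN elsewhere. All values are made up. It relies on groupby(...).first() returning the first non-null entry of each column within a group.

```python
import numpy as np
import pandas as pd

# Hypothetical layout: ItemNo 1 appears three times, each row filling one column.
df = pd.DataFrame({
    "ItemNo": [1, 1, 1, 2],
    "Price": [9.99, np.nan, np.nan, 4.50],
    "Sales": [np.nan, 120.0, np.nan, 30.0],
    "Stock": [np.nan, np.nan, 15.0, 7.0],
})

# first() picks the first non-null value per column within each ItemNo group.
combined = df.groupby("ItemNo", as_index=False).first()
print(combined)
```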

Pandas concat columns

I have two df-s:
I want to concatenate along the columns, i.e. get a 1000x61118 DataFrame, so I'm doing:
df_full = pd.concat([df_dev, df_temp2], axis=1)
df_full
This, however, yields a 2000x61118 df, and fills everything with NaNs... And I have no idea why. What could cause this behaviour?
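With axis=1, concat aligns rows by index label, so two DataFrames with different index values produce the union of both indices, padded with NaNs. Create default index values with DataFrame.reset_index and drop=True so that both DataFrames align correctly: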
df_full = pd.concat([df_dev.reset_index(drop=True), df_temp2.reset_index(drop=True)], axis=1)
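To illustrate why the row count doubles, here is a small sketch with tiny stand-ins for df_dev and df_temp2. The column names, values, and index labels are made up; the only assumption is that the two frames have non-overlapping indices.

```python
import pandas as pd

# Hypothetical stand-ins with non-overlapping index labels.
df_dev = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 2])
df_temp2 = pd.DataFrame({"b": [4, 5, 6]}, index=[3, 4, 5])

# Rows are matched by index label, so no row lines up: 6 rows, half NaN.
misaligned = pd.concat([df_dev, df_temp2], axis=1)
print(misaligned.shape)

# After resetting both indices, rows are matched by position instead.
aligned = pd.concat(
    [df_dev.reset_index(drop=True), df_temp2.reset_index(drop=True)],
    axis=1,
)
print(aligned.shape)
```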

Pandas: add column with the most recent values

I have two pandas dataframes, both indexed with datetime entries. df1 has non-unique time indices, whereas df2 has unique ones. I would like to add a column df2.a to df1 in the following way: for every row in df1 with timestamp ts, df1.a should contain the most recent value of df2.a whose timestamp is less than ts.
For example, let's say that df2 is sampled every minute, and there are rows with timestamps 08:00:15, 08:00:47, 08:02:35 in df1. In this case I would like the value from df2.a[08:00:00] to be used for the first two rows, and df2.a[08:02:00] for the third. How can I do this?
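You are describing an asof-join, which was added in pandas 0.19 as pd.merge_asof. Since both frames are indexed by timestamp, join on the indices (both must be sorted):
pd.merge_asof(df1, df2, left_index=True, right_index=True)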
Alternatively, reindex df2.a on the index of df1 with forward fill (df2's index must be sorted and unique):
df1['a'] = df2['a'].reindex(df1.index, method='ffill').values
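A minimal sketch of the as-of join with pandas' merge_asof function, using the timestamps from the question's example. The date, the event column, and the values of a are made up. Passing allow_exact_matches=False is an assumption based on the question asking for timestamps strictly less than ts.

```python
import pandas as pd

# Timestamps from the question's example; the date and 'event' column are invented.
df1 = pd.DataFrame(
    {"event": ["x", "y", "z"]},
    index=pd.to_datetime([
        "2016-01-01 08:00:15",
        "2016-01-01 08:00:47",
        "2016-01-01 08:02:35",
    ]),
)

# df2 is sampled once per minute; the values of 'a' are invented.
df2 = pd.DataFrame(
    {"a": [0, 1, 2, 3]},
    index=pd.to_datetime([
        "2016-01-01 08:00:00",
        "2016-01-01 08:01:00",
        "2016-01-01 08:02:00",
        "2016-01-01 08:03:00",
    ]),
)

# For each row of df1, take the last df2 row with an earlier timestamp.
# Both indices must be sorted for merge_asof to work.
merged = pd.merge_asof(
    df1, df2,
    left_index=True, right_index=True,
    allow_exact_matches=False,
)
print(merged)
```

The first two rows pick up the 08:00:00 value and the third picks up the 08:02:00 value, as the question asks.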
