Pandas: group by row values based on a condition - python

I need to find the event names, dates and values (of the column) which meet a condition in the data below, using Pandas. I want to find whether any event's processed value is less than its own average value for that particular event.
The condition is, for any event_name:
number_of_items_processed < avg(number_of_items_processed)
The output I am looking for is:

You can use this code:
# per-event mean, broadcast back to each row
df['m'] = df.groupby(by=['event_name'])['number_of_items_processed'].transform('mean')
# keep only the rows below their event's mean
df = df[df.number_of_items_processed < df.m]
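A minimal sketch of how this behaves, assuming a hypothetical frame with event_name, date and number_of_items_processed columns (the real data was only shown as an image):

import pandas as pd

df = pd.DataFrame({
    'event_name': ['a', 'a', 'a', 'b', 'b'],
    'date': pd.to_datetime(['2020-01-01', '2020-01-02', '2020-01-03',
                            '2020-01-01', '2020-01-02']),
    'number_of_items_processed': [10, 20, 30, 5, 15],
})

# per-event mean, broadcast back to every row of that event
df['m'] = df.groupby(by=['event_name'])['number_of_items_processed'].transform('mean')
# rows whose processed count is below their own event's mean
print(df[df.number_of_items_processed < df.m])

Here event 'a' averages 20 and event 'b' averages 10, so only the rows with 10 and 5 survive the filter.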

Related

Pandas: I want to create a column in a Time Series where the value depends on the previous row's value

I have a time series dataframe where I need to create a boolean-valued column which is:
True if the current value is more than 10% different from the previous row's value,
False otherwise.
As you have not provided any sample data, I am not 100% sure what exactly you want.
But here is what I think you need to do:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10, 1), columns=['first'])
df['prev'] = df['first'].shift(1)
# True where the current value differs from the previous one by more than 10%
(df['first'] - df['prev']).abs() > df['prev'].abs() * 0.1
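To keep that flag as a column (a sketch using the hypothetical column name changed_over_10pct), the comparison can simply be assigned back; the first row compares against NaN and therefore comes out False:

df['changed_over_10pct'] = (df['first'] - df['prev']).abs() > df['prev'].abs() * 0.1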

How to extract certain rows under a specific condition in pandas? (Sentiment analysis)

The picture is what my dataframe looks like. I have user_name, movie_name and time columns. I want to extract only the rows that fall on the first day of a given movie. For example, if movie a's first date in the time column is 2018-06-27, I want all of movie a's rows on that date, and if movie b's first date in the time column is 2018-06-12, I only want those rows. How would I do that with pandas?
I assume that the time column is of datetime type. If not, convert it by
calling pd.to_datetime.
Then run:
df.groupby('movie_name').apply(
    lambda grp: grp[grp.time.dt.date == grp.time.min().date()])
groupby splits the source DataFrame into groups, one per film.
Then grp.time.min().date() computes the minimal (first) date within the
current group.
Finally, the whole lambda function returns only the rows from this date
(again, within the current group).
The same happens for the other groups of rows (films).
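A minimal sketch with hypothetical data (the original frame was only shown as a picture), assuming columns user_name, movie_name and time; group_keys=False is added only to keep a flat index on the result:

import pandas as pd

df = pd.DataFrame({
    'user_name': ['u1', 'u2', 'u3', 'u4'],
    'movie_name': ['a', 'a', 'b', 'b'],
    'time': pd.to_datetime(['2018-06-27 10:00', '2018-06-28 09:00',
                            '2018-06-12 12:00', '2018-06-12 18:00']),
})

first_day_rows = df.groupby('movie_name', group_keys=False).apply(
    lambda grp: grp[grp.time.dt.date == grp.time.min().date()])
print(first_day_rows)

This keeps the u1 row (movie a's first day is 2018-06-27) and both u3 and u4 rows (movie b's first day is 2018-06-12).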

Pandas - select lowest value to date

I'm new to Pandas.
I've got a dataframe where I want to group by user and then find their lowest score up until that date in their speed column.
So I can't just use df.groupby(['user'])['speed'].transform('min'), as this would give the min of all values, not just from the current row back to the first.
What can I use to get what I need?
Without seeing your dataset it's hard to help you directly, but the problem boils down to the following: select the range of data you want to work with (rows for the date range, columns for the user/speed).
With a date index that would look something like x = df.loc["2-4-2018":"2-4-2019", ['user', 'speed']]
From there you could do a simple x['speed'].min() for the value, or x['speed'].idxmin() for the index of that value.
I haven't played around with DataFrames for a while, but you're looking for how to slice DataFrames.
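A different sketch, assuming hypothetical columns user, date and speed: if the frame is sorted by date within each user, the "lowest value to date" is a per-user running minimum, which groupby/cummin gives directly.

import pandas as pd

df = pd.DataFrame({
    'user': ['u1', 'u1', 'u1', 'u2', 'u2'],
    'date': pd.to_datetime(['2018-04-01', '2018-04-02', '2018-04-03',
                            '2018-04-01', '2018-04-02']),
    'speed': [10, 7, 9, 4, 6],
})

df = df.sort_values(['user', 'date'])
# cumulative minimum per user: the lowest speed seen up to and including each row
df['lowest_so_far'] = df.groupby('user')['speed'].cummin()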

How to calculate based on multiple conditions using Python data frames?

I have an Excel data file with thousands of rows and columns.
I am using Python and have started using pandas dataframes to analyze the data.
What I want to do in column D is to calculate the annual change for the values in column C, for each year and for each ID.
I can do this in Excel: if the org ID is the same as that in the prior row, calculate the annual change (leaving the cells highlighted in blue empty, because that's the first period for that particular ID). I don't know how to do this using Python. Can anyone help?
Assuming the dataframe is already sorted:
df.groupby('ID').Cash.pct_change()
However, you can speed things up if you rely on the assumption that things are sorted, because it isn't necessary to group in order to calculate the percentage change from one row to the next:
df.Cash.pct_change().mask(df.ID != df.ID.shift())
These produce the column values you are looking for. In order to add the column, you'll need to assign it to a column or create a new dataframe with the new column:
df['AnnChange'] = df.groupby('ID').Cash.pct_change()
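A small sketch of how the masking variant behaves, on a hypothetical ID/Cash frame (these column names are the answer's assumption, not from the original file):

import pandas as pd

df = pd.DataFrame({
    'ID': ['org1', 'org1', 'org1', 'org2', 'org2'],
    'Year': [2016, 2017, 2018, 2017, 2018],
    'Cash': [100.0, 110.0, 99.0, 50.0, 75.0],
})

# row-to-row percentage change, blanked out (NaN) whenever the ID changes
df['AnnChange'] = df.Cash.pct_change().mask(df.ID != df.ID.shift())
print(df)

The first row of each ID stays NaN, matching the blue "first period" cells described in the question.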

Pandas help on a particular request

I apologize for the uninformative title, but I need help with a pandas request that I could not summarize in a short title.
I have a dataframe of orders containing columns for
OrderId
ClientId
OrderDate
ReturnQuantity
I would like to add a boolean column HasReturnedBefore, which is True only if a customer with the same ClientId has made one or more previous orders (with an earlier OrderDate) that have a ReturnQuantity greater than 0.
I don't know how to approach this problem; I am not familiar enough with all the subtleties of pandas at the moment.
If I understand your question correctly, this is what you need:
df.sort_values(by=['ClientId', 'OrderDate']).assign(
    HasReturnedBefore=lambda x: (x['ClientId'] == x['ClientId'].shift(1))
                                & (x.groupby('ClientId')['ReturnQuantity'].transform(all)))
First you need to sort_values by the columns that you use to distinguish records - ClientId and OrderDate in this case.
Now you can use assign, which is used to add a new column to a dataframe.
In the documentation you can see how to use assign, but in this case what I did was:
check whether the ClientId is the same as the previous row's ClientId, and
check whether the client has had all ReturnQuantity values greater than 0.
The reason the first occurrence of a client with multiple orders is False is that it is treated as if it had no previous purchases (which it didn't); it could be set to True, but that would require additional editing.
Additional functions:
shift - moves all records by the given number of rows
groupby - groups the dataframe by the desired columns
transform - applies a function per group and returns the result aligned with the existing dataframe
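For the question as literally asked (True when the same client has an earlier order with ReturnQuantity > 0), here is a different hedged sketch using the column names from the question: count each client's earlier returns with a cumulative sum shifted by one row.

import pandas as pd

df = pd.DataFrame({
    'OrderId': [1, 2, 3, 4, 5],
    'ClientId': ['a', 'a', 'a', 'b', 'b'],
    'OrderDate': pd.to_datetime(['2020-01-01', '2020-02-01', '2020-03-01',
                                 '2020-02-15', '2020-03-20']),
    'ReturnQuantity': [0, 2, 0, 0, 1],
})

df = df.sort_values(['ClientId', 'OrderDate'])
# for each client, number of strictly earlier orders that had a return
prior_returns = (df.groupby('ClientId')['ReturnQuantity']
                   .transform(lambda s: s.gt(0).cumsum().shift(fill_value=0)))
df['HasReturnedBefore'] = prior_returns > 0

Only client a's 2020-03-01 order is flagged, because the 2020-02-01 order before it had a return; client b's only return is on their latest order, so nothing is flagged for b.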
