How to do for loops with conditions on a pandas DataFrame - python

I am currently trying to add 1 to an entire column if the value (int) is greater than 0. The code that I am currently using for it is like so:
for coldcloudy in final.coldcloudy:
    final.loc[final['coldcloudy'] > 0, coldcloudy] += 1
However, I keep getting a 'KeyError: 0' with it. Essentially, I want the code to go row by row in a particular column and add 1 if the integer is greater than zero. The values that get 1 added will go into another column. Can someone please help?

You don't need a for loop:
import pandas as pd
import numpy as np

final = pd.DataFrame({'coldcloudy': np.random.choice([0, 1], 20)})
final.loc[final.coldcloudy > 0, 'coldcloudy'] += 1
print(final)
Output:
coldcloudy
0 2
1 2
2 0
3 0
4 2
5 2
6 0
7 2
8 0
9 0
10 2
11 2
12 0
13 2
14 2
15 0
16 2
17 0
18 2
19 2

Related

Rewriting a column's cell values in a dataframe based on when the value changes, without using an if statement

I have a column with faulty values: it is supposed to count cycles, but the device the data comes from resets the count after 50, so I was left with, for example, [1,1,1,1,2,2,2,3,3,3,3,...,50,50,50,1,1,1,2,2,2,2,3,3,3,...,50,50,.....,50]
My attempted solution is below, and I can't even make it work (for simplicity I made the data reset after 10 cycles):
import pandas as pd

data = {'Cyc-Count': [1,1,2,2,2,3,4,5,6,7,7,7,8,9,10,1,1,1,2,3,3,3,3,
                      4,4,5,6,6,6,7,8,8,8,8,9,10]}
df = pd.DataFrame(data)
x = 0
count = 0
old_value = df.at[x, 'Cyc-Count']
for x in range(x, len(df) - 1):
    if df.at[x, 'Cyc-Count'] == df.at[x + 1, 'Cyc-Count']:
        old_value = df.at[x + 1, 'Cyc-Count']
        df.at[x + 1, 'Cyc-Count'] = count
    else:
        old_value = df.at[x + 1, 'Cyc-Count']
        count += 1
        df.at[x + 1, 'Cyc-Count'] = count
I need to fix this, preferably without even using if statements.
The desired output for the example above should be:
data = {'Cyc-Count': [1,1,2,2,2,3,4,5,6,7,7,7,8,9,10,11,11,11,12,13,13,13,13,
                      14,14,15,16,16,16,17,18,18,18,18,19,20]}
Hint: my method has a big issue in that the last indexed value is hard to change, since when comparing it with index+1, that index doesn't even exist.
IIUC, you want to continue the count when the counter decreases.
You can use vectorized code:
s = df['Cyc-Count'].shift()
df['Cyc-Count2'] = (df['Cyc-Count']
                    + s.where(s.gt(df['Cyc-Count']))
                       .fillna(0, downcast='infer')
                       .cumsum()
                    )
Or, to modify the column in place:
s = df['Cyc-Count'].shift()
df['Cyc-Count'] += (s.where(s.gt(df['Cyc-Count']))
                    .fillna(0, downcast='infer')
                    .cumsum()
                    )
Output:
Cyc-Count Cyc-Count2
0 1 1
1 1 1
2 1 1
3 1 1
4 2 2
5 2 2
6 2 2
7 3 3
8 3 3
9 3 3
10 3 3
11 4 4
12 5 5
13 5 5
14 5 5
15 1 6
16 1 6
17 1 6
18 2 7
19 2 7
20 2 7
21 2 7
22 3 8
23 3 8
24 3 8
25 4 9
26 5 10
27 5 10
28 1 11
29 2 12
30 2 12
31 3 13
32 4 14
33 5 15
34 5 15
Used input:
l = [1,1,1,1,2,2,2,3,3,3,3,4,5,5,5,1,1,1,2,2,2,2,3,3,3,4,5,5,1,2,2,3,4,5,5]
df = pd.DataFrame({'Cyc-Count': l})
You can use df.loc to access a group of rows and columns by label(s) or a boolean array.
Syntax: df.loc[df['column name'] condition, 'column name or the new one'] = 'value if condition is met'
for example:
import pandas as pd

numbers = {'set_of_numbers': [1,2,3,4,5,6,7,8,9,10,0,0]}
df = pd.DataFrame(numbers, columns=['set_of_numbers'])
print(df)
df.loc[df['set_of_numbers'] == 0, 'set_of_numbers'] = 999
df.loc[df['set_of_numbers'] == 5, 'set_of_numbers'] = 555
print(df)
Before: 'set_of_numbers': [1,2,3,4,5,6,7,8,9,10,0,0]
After: 'set_of_numbers': [1,2,3,4,555,6,7,8,9,10,999,999]

Apply a function if id equals X

I have a DataFrame like this
id subid a
1 1 1 2
2 1 1 10
3 1 1 20
4 1 2 30
5 1 2 35
6 1 2 36
7 1 2 40
8 2 2 20
9 2 2 29
10 2 2 30
And I want to calculate and save the mean of variable "a" for each id. For example, I want the mean of variable "a" where id=2, and then save that result in a list.
This is what I have so far:
for i in range(2):
    results = []
    if df.iloc[:, 3] == i:
        value = np.mean(df)
        results.append(value)
I think what you are trying to do is:
df.groupby('id')['a'].mean()
It will return the mean for both id 1 and id 2, but if you only want the mean for id 2 you can do this:
df.groupby('id')['a'].mean()[2]
By doing this you're only taking the mean of 'a' for the rows whose id is 2.
A few problems here:
results = [] should be outside the loop; otherwise, each time the loop runs, results resets to [].
I assume the column you're looking for is 'a', which is iloc[:, 2] (the third column); your code's iloc[:, 3] is out of range.
value = df['a'].mean()

Check if a value changed between groups of transactions

Given this df:
import pandas as pd

df = pd.DataFrame({"A": [1,0,0,0,0,1,0,1,0,0,1],
                   'B': ['enters','A','B','C','D','exit','walk','enters','Q','Q','exit'],
                   "Value": [4,4,4,4,5,6,6,6,6,6,6]})
A B Value
0 1 enters 4
1 0 A 4
2 0 B 4
3 0 C 4
4 0 D 5
5 1 exit 6
6 0 walk 6
7 1 enters 6
8 0 Q 6
9 0 Q 6
10 1 exit 6
There are 2 'transactions' here: when someone enters and leaves. So tx#1 is between rows 0 and 5, and tx#2 between rows 7 and 10.
My goal is to show whether the value changed. In tx#1 the value changed from 4 to 6; in tx#2 there was no change. Expected result:
index tx value_before value_after
0 1 4 6
7 2 6 6
I tried to fill the 0s between each tx with 1 and then group, but I get the whole A column as 1. I'm not sure how to define the group-by if each tx stands on its own.
Assign a new transaction number on each new 'enters' and pivot:
import numpy as np

df['tx'] = np.where(df.B.eq('enters'), 1, 0).cumsum()
df[df.B.isin(['enters', 'exit'])].pivot(index='tx', columns='B', values='Value')
Result:
B enters exit
tx
1 4 6
2 6 6
Not exactly what you want, but it has all the info:
df[df['B'].isin(['enters', 'exit'])].drop(['A'], axis=1).reset_index()
index B Value
0 0 enters 4
1 5 exit 6
2 7 enters 6
3 10 exit 6
You can get what you need with cumsum(), and pivot_table():
df['tx'] = np.where(df['B'] == 'enters', 1, 0).cumsum()
res = pd.pivot_table(df[df['B'].isin(['enters', 'exit'])],
                     index=['tx'], columns=['B'], values='Value'
                     ).reset_index().rename(columns={'enters': 'value_before',
                                                     'exit': 'value_after'})
Which prints:
res
tx value_before value_after
0 1 4 6
1 2 6 6
If you always have a sequence "enters - exit" you can create a new dataframe and assign certain values to each column:
result = pd.DataFrame({'tx': [x + 1 for x in range(df['B'].eq('enters').sum())],
                       'value_before': df['Value'].loc[df['B'] == 'enters'],
                       'value_after': list(df['Value'].loc[df['B'] == 'exit'])})
Output:
tx value_before value_after
0 1 4 6
7 2 6 6
You can add reset_index(drop=True) at the end if you don't want to keep the index from the original dataframe.
I wrapped 'value_after' in list() so that it concatenates by position instead of aligning by index.

"Drop random rows" from pandas dataframe

In a pandas dataframe, how can I drop a random subset of rows that obey a condition?
In other words, if I have a Pandas dataframe with a Label column, I'd like to drop 50% (or some other percentage) of rows where Label == 1, but keep all of the rest:
Label A -> Label A
0 1 0 1
0 2 0 2
0 3 0 3
1 10 1 11
1 11 1 12
1 12
1 13
I'd love to know the simplest and most pythonic/panda-ish way of doing this!
Edit: This question provides part of an answer, but it only talks about dropping rows by index, disregarding the row values. I'd still like to know how to drop only from rows that are labeled a certain way.
Use the frac argument
df.sample(frac=.5)
If you define the amount you want to drop in a variable n
n = .5
df.sample(frac=1 - n)
To include the condition, use drop
df.drop(df.query('Label == 1').sample(frac=.5).index)
Label A
0 0 1
1 0 2
2 0 3
4 1 11
6 1 13
Using drop with sample
df.drop(df[df.Label.eq(1)].sample(2).index)
Label A
0 0 1
1 0 2
2 0 3
3 1 10
5 1 12

Python random sampling in multiple indices

I have a data frame according to below:
id_1 id_2 value
1 0 1
1 1 2
1 2 3
2 0 4
2 1 1
3 0 5
3 1 1
4 0 5
4 1 1
4 2 6
4 3 7
11 0 8
11 1 14
13 0 10
13 1 9
I would like to take out a random sample of size n, without replacement, from this table based on id_1. Each sampled row needs to be unique with respect to id_1, i.e. each id_1 can occur only once.
End result something like:
id_1 id_2 value
1 1 2
2 0 4
4 3 7
13 0 10
I have tried to do a group-by and use the indices to take out a row through random.sample, but it doesn't go all the way.
Can someone give me a pointer on how to make this work? Code for the DF below!
As always, thanks for your time and input!
/swepab
import pandas as pd

df = pd.DataFrame({'id_1': [1,1,1,2,2,3,3,4,4,4,4,11,11,13,13],
                   'id_2': [0,1,2,0,1,0,1,0,1,2,3,0,1,0,1],
                   'value_col': [1,2,3,4,1,5,1,5,1,6,7,8,14,10,9]})
You can do this using vectorized functions (not loops):
import numpy as np

uniqued = df.id_1.reindex(np.random.permutation(df.index)).drop_duplicates()
df.loc[np.random.choice(uniqued.index, 1, replace=False)]
uniqued is created by a random shuffle + a choice of one unique element per id_1. Then a random sample (without replacement) is drawn from its index.
This samples one random row per id:
for id_ in sorted(set(df["id_1"])):
    print(df[df["id_1"] == id_].sample(1))
PS: the solution above translated into a Python list comprehension, returning a list of indices:
idx = [df[df["id_1"] == val].sample(1).index[0] for val in sorted(set(df["id_1"]))]
