Duplicate rows based on value with condition [duplicate] - python

This question already has answers here:
Pandas - Duplicate Row based on condition
(3 answers)
Closed 2 years ago.
I need to replicate some rows in a pandas DataFrame like this:
name times
A 2
B 1
C 3
D 20
...
What I need is to replicate each row only when the value in the times column is less than 20.
What I'm doing now is:
for t in df["times"]:
    if t < 20:
        df = df.loc[df.index.repeat(t)]
But the script keeps running and I have to stop it (I've been waiting a long time...).
Is there any way to improve this or doing it in another way?

Use:
# condition: lt means "less than" (<)
mask = df['times'].lt(20)
# filter by boolean indexing
df1 = df[mask].copy()
# repeat rows
df1 = df1.loc[df1.index.repeat(df1['times'])]
# add back the rows with values of 20 or more, sort by the original index and create a default index
df = pd.concat([df1, df[~mask]]).sort_index().reset_index(drop=True)
print(df)
name times
0 A 2
1 A 2
2 B 1
3 C 3
4 C 3
5 C 3
6 D 20
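For reference, the same result can be produced with a single repeat call by computing a per-row repeat count first, so the frame is only reindexed once. This is a sketch using the sample data above, not part of the original answer:
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['A', 'B', 'C', 'D'], 'times': [2, 1, 3, 20]})

# Rows with times < 20 are repeated `times` times; all other rows appear once.
repeats = np.where(df['times'].lt(20), df['times'], 1)
df = df.loc[df.index.repeat(repeats)].reset_index(drop=True)
print(df)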

Related

How to add a new pandas column whose value is conditioned on one column, but value depends on other columns? [duplicate]

This question already has answers here:
Pandas conditional creation of a series/dataframe column
(13 answers)
Closed 1 year ago.
I have a dataframe that looks like this:
idx group valA valB
-----------------------
0 A 10 5
1 A 22 7
2 B 9 0
3 B 6 1
I want to add a new column 'val' that takes 'valA' if group = 'A' and takes 'valB' if group = 'B'.
idx group valA valB val
---------------------------
0 A 10 5 10
1 A 22 7 22
2 B 9 0 0
3 B 6 1 1
How can I do this?
This should do the trick
df['val'] = df.apply(lambda x: x['valA'] if x['group'] == 'A' else x['valB'], axis=1)
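The row-wise apply works, but it calls the Python lambda once per row. For larger frames a vectorized alternative is usually faster; a sketch (not part of the original answer) using numpy.where on the sample data:
import numpy as np
import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'B', 'B'],
                   'valA': [10, 22, 9, 6],
                   'valB': [5, 7, 0, 1]})

# Take valA where group is 'A', otherwise take valB.
df['val'] = np.where(df['group'] == 'A', df['valA'], df['valB'])
print(df)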

Python: How to multiply a selection of a column of a dataframe (without using a for loop) [duplicate]

This question already has answers here:
Pandas conditional creation of a series/dataframe column
(13 answers)
Closed 2 years ago.
I am working with a pandas dataframe, and want to multiply a selection of a specific column. I want to select all the values of col2 less than 5, and multiply them by 10. How do I do this (without using a for loop)?
df = pandas.DataFrame({'col1':[2,3,5,7], 'col2':[1,2,30,4]})
col1 col2
0 2 1
1 3 2
2 5 30
3 7 4
I tried the following code, but this did not result in the dataframe I want:
df['col2'] = df.col2[df['col2']<5]*10
col1 col2
0 2 10.0
1 3 20.0
2 5 NaN
3 7 40.0
With a for loop, I was able to multiply each value smaller than 5 by 10, but the data set I am working on is quite large, so I would rather not use a for loop.
count = 0
for value in df.col2:
    if value < 5:
        df.col2[count] = df.col2[count]*10
    count += 1
Does someone know how to do this?
I managed to do this with a lambda expression:
df["col2"] = df["col2"].apply(lambda x: x*10 if x < 5 else x)
I assumed that numbers of 5 or more stay unchanged.
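An alternative that avoids apply entirely is boolean indexing with .loc, which multiplies only the selected rows in place. A sketch, assuming the df from the question:
import pandas as pd

df = pd.DataFrame({'col1': [2, 3, 5, 7], 'col2': [1, 2, 30, 4]})

# Multiply col2 by 10 only where col2 < 5; other rows stay unchanged.
df.loc[df['col2'] < 5, 'col2'] *= 10
print(df)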

Sort data frame in ascending order by means of another column [duplicate]

This question already has answers here:
How to sort a dataFrame in python pandas by two or more columns?
(3 answers)
Closed 3 years ago.
I have a data frame:
df =
ID Num
a 3
b 4
b 2
a 1
I want to sort Num in ascending order within each unique value of the ID column.
My try:
df.sort_values(by=['Num'])
But that sorts Num in ascending order while ignoring the ID column.
Desired output:
df =
ID Num
a 1
a 3
b 2
b 4
Just do:
df.sort_values(['ID', 'Num'])
Output
ID Num
3 a 1
0 a 3
2 b 2
1 b 4
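If a clean default 0..n-1 index is also wanted in the result (the desired output in the question shows no index, so this is an assumption), reset_index can be chained on:
import pandas as pd

df = pd.DataFrame({'ID': ['a', 'b', 'b', 'a'], 'Num': [3, 4, 2, 1]})

# Sort by ID first, then by Num within each ID, and rebuild a default index.
df = df.sort_values(['ID', 'Num']).reset_index(drop=True)
print(df)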

df.sort_values().groupby results in a column length issue [duplicate]

This question already has answers here:
How to move pandas data from index to column after multiple groupby
(4 answers)
How to convert index of a pandas dataframe into a column
(9 answers)
Closed 4 years ago.
This is the original table:
A B C E
0 1 1 5 4
1 1 1 1 1
2 3 3 8 2
I wanted to apply some aggregate functions to this table which I did with:
df.sort_values('C').groupby(['A', 'B'], sort=False).agg({'C': 'sum', 'E': 'last'})
My new table looks like this:
A B C E
1 1 6 4
3 3 8 2
When I measure the column length of the original vs. the modified table with len(df.columns), the results differ, though.
The original table returns 4 columns and the modified table returns 2 columns.
My question: Why did this happen and how can I get to return 4 columns with the modified table?
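The linked duplicates cover this: groupby moves the grouping keys A and B into the index of the result, so only C and E remain as columns and len(df.columns) drops from 4 to 2. A sketch of the usual fix, moving the index levels back into columns with reset_index (as_index=False on the groupby works similarly):
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 3], 'B': [1, 1, 3], 'C': [5, 1, 8], 'E': [4, 1, 2]})

out = (df.sort_values('C')
         .groupby(['A', 'B'], sort=False)
         .agg({'C': 'sum', 'E': 'last'})
         .reset_index())  # A and B become regular columns again

print(len(out.columns))  # 4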

Replace specific values in Pandas DataFrame [duplicate]

This question already has answers here:
Creating a new column depending on the equality of two other columns [duplicate]
(5 answers)
How to compare two columns of the same dataframe?
(3 answers)
Closed 4 years ago.
I have the following DataFrame in pandas:
data1 = pd.DataFrame(data=[[1, 10, 100], [2, 2, 200], [3, 3, 300], [4, 40, 400]],
                     columns=['A', 'B', 'C'])
Here it is:
A B C
0 1 10 100
1 2 2 200
2 3 3 300
3 4 40 400
What I want to do: find the rows where 'A' == 'B' and replace the column 'C' value in those rows.
So what I want to get:
A B C
0 1 10 100
1 2 2 -1
2 3 3 -1
3 4 40 400
What I already tried:
data1[data1['A']==data1['B']]
That finds the necessary rows. Now I try to replace the values in those rows:
data1[data1['A']==data1['B']]['C'] = -1
But data1 is unchanged! It looks like this chained indexing goes wrong, or the whole operation returns a copy of the DataFrame. And I can't save the result to a new DataFrame, because I used = in the last command; I just can't write newdf = data1[...] = -1.
I also found the replace function:
data1.replace(data1[data1['A']==data1['B']], "-1")
But it replaces all values in the row, when I need only the last column:
A B C
0 1 10 100
1 -1 -1 -1
2 -1 -1 -1
3 4 40 400
P.S. I know I can do it using a for loop, but I am trying to find a better (more elegant) solution.
Use DataFrame.loc:
mask = data1['A'] == data1['B']
data1.loc[mask, 'C'] = -1
You can use Series.mask:
data1.C = data1.C.mask(data1.A == data1.B, -1)
data1
Out[371]:
A B C
0 1 10 100
1 2 2 -1
2 3 3 -1
3 4 40 400
data1.loc[data1.A == data1.B, 'C'] = -1
data1['C'] = np.where(data1.A == data1.B, -1, data1.C)
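For completeness, the np.where answer assumes numpy has been imported; a minimal self-contained version of it, using the data1 from the question:
import numpy as np
import pandas as pd

data1 = pd.DataFrame(data=[[1, 10, 100], [2, 2, 200], [3, 3, 300], [4, 40, 400]],
                     columns=['A', 'B', 'C'])

# Where A equals B, set C to -1; otherwise keep the existing C value.
data1['C'] = np.where(data1['A'] == data1['B'], -1, data1['C'])
print(data1)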
