Replace the first and last number in a Python pandas DataFrame - python

I have table like this:
id a b
1 -10 -5
2 -5 0
3 0 5
4 5 10
I want to replace the first value of column a with -9999999 and the last value of column b with 9999999, like this:
id a b
1 -9999999 -5
2 -5 0
3 0 5
4 5 9999999
Is there any way to do that?

If id is a column, use DataFrame.iat to set the values by position: the first row of the second column, and the last row of the last column:
df.iat[0, 1] = -9999999
df.iat[-1, -1] = 9999999
print (df)
id a b
0 1 -9999999 -5
1 2 -5 0
2 3 0 5
3 4 5 9999999
If id is the index, column a is the first column, so use position 0 instead:
df.iat[0, 0] = -9999999
df.iat[-1, -1] = 9999999
print (df)
a b
id
1 -9999999 -5
2 -5 0
3 0 5
4 5 9999999
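A self-contained version of both cases. Building the DataFrame this way is an assumption, since the question only shows the printed table. iat is purely positional, so negative positions count from the end:

```python
import pandas as pd

# Case 1: id is an ordinary column, so column 'a' sits at position 1
df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'a': [-10, -5, 0, 5],
                   'b': [-5, 0, 5, 10]})
df.iat[0, 1] = -9999999     # first row, column 'a'
df.iat[-1, -1] = 9999999    # last row, last column ('b')
print(df)

# Case 2: id is the index, so column 'a' sits at position 0
df2 = pd.DataFrame({'a': [-10, -5, 0, 5], 'b': [-5, 0, 5, 10]},
                   index=pd.Index([1, 2, 3, 4], name='id'))
df2.iat[0, 0] = -9999999    # first row, column 'a'
df2.iat[-1, -1] = 9999999   # last row, last column ('b')
print(df2)
```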

Related

Set value when row is maximum in group by - Python Pandas

I am trying to create a column (is_max) that is 1 if the value in column B is the maximum within its group of column A, and 0 otherwise.
Example:
[Input]
A B
1 2
2 3
1 4
2 5
[Output]
A B is_max
1 2 0
2 3 0
1 4 1
2 5 1
What I'm trying:
df['is_max'] = 0
df.loc[df.reset_index().groupby('A')['B'].idxmax(),'is_max'] = 1
Fix your code by removing the reset_index call, so that idxmax returns labels from df's own index, which is what df.loc expects:
df['is_max'] = 0
df.loc[df.groupby('A')['B'].idxmax(),'is_max'] = 1
df
Out[39]:
A B is_max
0 1 2 0
1 2 3 0
2 1 4 1
3 2 5 1
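A minimal sketch of why dropping reset_index matters, assuming a hypothetical non-default index ('w' to 'z'). df.loc is label-based, and idxmax on df itself returns labels from that index. Grouping df.reset_index() instead returns labels of a fresh 0 to 3 RangeIndex, which are only valid in df when df already has a default index:

```python
import pandas as pd

# Hypothetical non-default index to show that labels, not positions, are used
df = pd.DataFrame({'A': [1, 2, 1, 2], 'B': [2, 3, 4, 5]},
                  index=['w', 'x', 'y', 'z'])

# For each group of A, idxmax gives the index label of the row with the largest B
max_labels = df.groupby('A')['B'].idxmax()

# Start with 0 everywhere, then flag the winning rows by label
df['is_max'] = 0
df.loc[max_labels, 'is_max'] = 1
print(df)
```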
Assuming A is your grouping column (you did not state it), you can also compare each B with its group's maximum:
df['is_max']=(df['B']==df.groupby('A')['B'].transform('max')).astype(int)
or
df['is_max'] = df.groupby('A', group_keys=False)['B'].apply(lambda x: x == x.max()).astype(int)
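The approaches differ when a group has tied maxima: idxmax flags only the first row holding the maximum, while comparing against transform('max') flags every tied row. A small sketch with hypothetical tied data:

```python
import pandas as pd

# Hypothetical data with a tie: both rows of group A=1 have B=4
df = pd.DataFrame({'A': [1, 1, 2], 'B': [4, 4, 3]})

# transform('max') broadcasts each group's maximum back onto every row
df['is_max_transform'] = (df['B'] == df.groupby('A')['B'].transform('max')).astype(int)

# idxmax returns one label per group: the first occurrence of the maximum
df['is_max_idxmax'] = 0
df.loc[df.groupby('A')['B'].idxmax(), 'is_max_idxmax'] = 1
print(df)
```

Pick whichever tie behaviour your problem needs.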

ApplyMap function on Multiple columns pandas

I have this dataframe
dd = pd.DataFrame({'a':[1,5,3],'b':[3,2,3],'c':[2,4,5]})
a b c
0 1 3 2
1 5 2 4
2 3 3 5
I want to replace the values in columns a and b that are smaller than the value of column c in the same row with 0. I want to do this row-wise.
I did this
dd.applymap(lambda x: 0 if x < x['c'] else x )
I get error
TypeError: 'int' object is not subscriptable
I understand x is an int, but how do I get the value of column c for that row?
I want this output
a b c
0 0 3 2
1 5 0 4
2 0 0 5
Use DataFrame.mask with DataFrame.lt:
df = dd.mask(dd.lt(dd['c'], axis=0), 0)
print (df)
a b c
0 0 3 2
1 5 0 4
2 0 0 5
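A self-contained run of the mask answer. DataFrame.lt with axis=0 matches the Series dd['c'] against the rows, so each row is compared with its own c value. No value is less than itself, so column c is left unchanged:

```python
import pandas as pd

dd = pd.DataFrame({'a': [1, 5, 3], 'b': [3, 2, 3], 'c': [2, 4, 5]})

# Boolean frame: True where a cell is smaller than that row's c value
mask = dd.lt(dd['c'], axis=0)

# mask() replaces the cells where the condition is True with 0
out = dd.mask(mask, 0)
print(out)
```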
Or set the values in place (this modifies dd itself) by comparing with column c through NumPy broadcasting:
dd[dd < dd['c'].to_numpy()[:, None]] = 0
print (dd)
a b c
0 0 3 2
1 5 0 4
2 0 0 5
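applymap (renamed DataFrame.map in pandas 2.1) passes one scalar at a time, which is why x['c'] fails: the function never sees the rest of the row. If the frame has other columns that must not be touched, here is a sketch that restricts the replacement to a and b explicitly, using np.where with the same broadcasting idea:

```python
import numpy as np
import pandas as pd

dd = pd.DataFrame({'a': [1, 5, 3], 'b': [3, 2, 3], 'c': [2, 4, 5]})

cols = ['a', 'b']                   # only these columns are candidates
vals = dd[cols].to_numpy()
c = dd['c'].to_numpy()[:, None]     # shape (3, 1), broadcasts across cols

# np.where writes 0 where the condition holds and keeps vals elsewhere
dd[cols] = np.where(vals < c, 0, vals)
print(dd)
```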

Subtraction of elements column-wise, in pandas

I have the following dataframe:
frame=pd.DataFrame({"col1":[1,5,9,4,7,3],"col2":[5,8,7,9,3,4],"col3":[3,4,2,7,9,1],
"col4":[2,4,7,4,9,0],"col5":[3,4,5,2,1,1],"col6":[8,7,5,4,1,2]})
It looks like this:
col1 col2 col3 col4 col5 col6
0 1 5 3 2 3 8
1 5 8 4 4 4 7
2 9 7 2 7 5 5
3 4 9 7 4 2 4
4 7 3 9 9 1 1
5 3 4 1 0 1 2
I want to create a new DataFrame containing the differences col1 - col2, col3 - col4 and col5 - col6.
The expected output is:
col1-col2 col3-col4 col5-col6
0 -4 1 -5
1 -3 0 -3
2 2 -5 0
3 -5 3 -2
4 4 0 0
5 -1 1 -1
Thanks in advance
dfr = pd.DataFrame({'col1-col2': frame.col1 - frame.col2,
'col3-col4': frame.col3 - frame.col4,
'col5-col6': frame.col5 - frame.col6})
If there are many columns, use a general solution: select the first and second column of each pair with iloc[:, ::2] and iloc[:, 1::2], convert them to NumPy arrays, subtract, and create a new DataFrame with the constructor:
#pandas 0.24+
arr = frame.iloc[:, ::2].to_numpy() - frame.iloc[:, 1::2].to_numpy()
#pandas below 0.24
#arr = frame.iloc[:, ::2].values - frame.iloc[:, 1::2].values
c = [f'{a}-{b}' for a, b in zip(frame.columns[::2], frame.columns[1::2])]
df = pd.DataFrame(arr, columns=c)
print (df)
col1-col2 col3-col4 col5-col6
0 -4 1 -5
1 -3 0 -3
2 2 -5 0
3 -5 3 -2
4 4 0 0
5 -1 1 -1
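A self-contained sketch of the general solution, assuming the columns come in adjacent pairs (an even number of columns). Converting to NumPy matters: subtracting the two DataFrames directly would align on column names, and since col1 never matches col2, every cell would come out NaN:

```python
import pandas as pd

frame = pd.DataFrame({"col1": [1, 5, 9, 4, 7, 3], "col2": [5, 8, 7, 9, 3, 4],
                      "col3": [3, 4, 2, 7, 9, 1], "col4": [2, 4, 7, 4, 9, 0],
                      "col5": [3, 4, 5, 2, 1, 1], "col6": [8, 7, 5, 4, 1, 2]})

left = frame.iloc[:, ::2]     # col1, col3, col5
right = frame.iloc[:, 1::2]   # col2, col4, col6

# Subtract the raw arrays so pandas does not align on column labels
names = [f'{a}-{b}' for a, b in zip(left.columns, right.columns)]
diff = pd.DataFrame(left.to_numpy() - right.to_numpy(),
                    columns=names, index=frame.index)
print(diff)
```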
If performance is important, convert the whole frame to a NumPy array once, store it in a variable, and slice that:
#pandas 0.24+
arr = frame.to_numpy()
#pandas below 0.24
#arr = frame.values
c = [f'{a}-{b}' for a, b in zip(frame.columns[::2], frame.columns[1::2])]
df = pd.DataFrame(arr[:, ::2] - arr[:, 1::2], columns=c)
Alternatively, with DataFrame.apply row by row (slower on large frames):
df = pd.DataFrame(frame.apply(lambda x: [x['col1']-x['col2'], x['col3']-x['col4'], x['col5']-x['col6']], axis=1).tolist())
df = df.rename({0: 'col1-col2', 1: 'col3-col4', 2: 'col5-col6'}, axis=1)
print (df)
col1-col2 col3-col4 col5-col6
0 -4 1 -5
1 -3 0 -3
2 2 -5 0
3 -5 3 -2
4 4 0 0
5 -1 1 -1

Get rows from DataFrame based on array of indices

I have an array with numbers which corresponds to the row numbers that need to be selected from a DataFrame.
For example, arr = np.array([0,0,1,1]) and the DataFrame is shown below. The values in arr are row positions, not index labels.
Index A B C D
3 10 0 0 0
4 5 2 0 0
Using arr I would like to produce a DataFrame that looks like this
Index A B C D
3 10 0 0 0
3 10 0 0 0
4 5 2 0 0
4 5 2 0 0
You can use iloc with integer indexing:
df.iloc[[0,0,1,1], :] # or df.iloc[arr, :]
# A B C D
#Index
#3 10 0 0 0
#3 10 0 0 0
#4 5 2 0 0
#4 5 2 0 0
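A self-contained version that rebuilds the question's frame with index labels 3 and 4. iloc treats arr as positions, which is why it works here, while loc would look for labels 0 and 1 (which do not exist in this frame):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [10, 5], 'B': [0, 2], 'C': [0, 0], 'D': [0, 0]},
                  index=pd.Index([3, 4], name='Index'))
arr = np.array([0, 0, 1, 1])

# Positional selection: repeated positions repeat the rows, labels are kept
out = df.iloc[arr]
print(out)
```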

replace rows in a pandas data frame

I want to start with an empty DataFrame and then add one row at a time.
I could also start with a DataFrame of zeros, data=pd.DataFrame(np.zeros(shape=(10,2)),columns=["a","b"]), and then replace one row at a time.
How can I do that?
Use .loc for label-based selection. It is important to understand how to slice properly: http://pandas.pydata.org/pandas-docs/stable/indexing.html#selection-by-label and why you should avoid chained assignment: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
In [14]:
data=pd.DataFrame(np.zeros(shape=(10,2)),columns=["a","b"])
data
Out[14]:
a b
0 0 0
1 0 0
2 0 0
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
[10 rows x 2 columns]
In [15]:
data.loc[2:2,'a':'b']=5,6
data
Out[15]:
a b
0 0 0
1 0 0
2 5 6
3 0 0
4 0 0
5 0 0
6 0 0
7 0 0
8 0 0
9 0 0
[10 rows x 2 columns]
If you are replacing an entire row, you can use just the row label and do not need row and column slices:
...
data.loc[2]=5,6
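Two sketches of filling a frame one row at a time; the row values are hypothetical. The first overwrites rows of a preallocated frame by label, as in the answer above. The second collects rows in a plain list and builds the DataFrame once at the end, which avoids repeatedly growing a frame (DataFrame.append was removed in pandas 2.0):

```python
import numpy as np
import pandas as pd

# Option 1: preallocate with zeros, then overwrite one row per step by label
data = pd.DataFrame(np.zeros(shape=(3, 2)), columns=["a", "b"])
for i in range(3):
    data.loc[i] = [i * 10, i * 10 + 1]   # hypothetical row values

# Option 2: collect rows in a list and construct the DataFrame once
rows = []
for i in range(3):
    rows.append({"a": i * 10, "b": i * 10 + 1})
built = pd.DataFrame(rows, columns=["a", "b"])

print(data)
print(built)
```

Note that the preallocated frame keeps the float64 dtype from np.zeros, while the list-built frame infers int64 from the values.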
