I have pasted part of my df below, but the actual df has more than 400 columns.
>>> df_final
c d name e f g h g h
0 0 0 aa 0 0 0 0 0 0
1 1 2 bb 1 2 1 2 1 2
2 2 4 cc 2 4 2 4 2 4
3 3 6 dd 3 6 3 6 3 6
4 4 8 ee 4 8 4 8 4 8
5 5 10 ff 5 10 5 10 5 10
6 6 12 gg 6 12 6 12 6 12
I want 'name' and 'c' in the first and second positions, but the order of the other columns doesn't matter. I would like to use
cols = ['name' , 'c']
col_position = [1 , 2]
How can I re-order the data frame using the lists cols and col_position?
How can I set the datatype to str for cols and to float for the other columns?
Thanks in advance
I think you need:
df1 = df[cols + np.setdiff1d(df.columns, cols).tolist()]
print (df1)
name c d e f g g.1 h h.1
0 aa 0 0 0 0 0 0 0 0
1 bb 1 2 1 2 1 1 2 2
2 cc 2 4 2 4 2 2 4 4
3 dd 3 6 3 6 3 3 6 6
4 ee 4 8 4 8 4 4 8 8
5 ff 5 10 5 10 5 5 10 10
6 gg 6 12 6 12 6 6 12 12
And:
c1 = df.columns[col_position].tolist()
df1 = df[c1 + np.setdiff1d(df.columns, c1).tolist()]
print (df1)
d name c e f g g.1 h h.1
0 0 aa 0 0 0 0 0 0 0
1 2 bb 1 1 2 1 1 2 2
2 4 cc 2 2 4 2 2 4 4
3 6 dd 3 3 6 3 3 6 6
4 8 ee 4 4 8 4 4 8 8
5 10 ff 5 5 10 5 5 10 10
6 12 gg 6 6 12 6 6 12 12
An alternative that selects by position:
c1 = np.arange(len(df.columns))
df1 = df.iloc[:, col_position + np.setdiff1d(c1, col_position).tolist()]
print (df1)
d name c e f g h g.1 h.1
0 0 aa 0 0 0 0 0 0 0
1 2 bb 1 1 2 1 2 1 2
2 4 cc 2 2 4 2 4 2 4
3 6 dd 3 3 6 3 6 3 6
4 8 ee 4 4 8 4 8 4 8
5 10 ff 5 5 10 5 10 5 10
6 12 gg 6 6 12 6 12 6 12
Construct a list to slice by
cols = ['name', 'c']
df[cols + df.columns.difference(cols).tolist()]
name c d e f g g.1 h h.1
0 aa 0 0 0 0 0 0 0 0
1 bb 1 2 1 2 1 1 2 2
2 cc 2 4 2 4 2 2 4 4
3 dd 3 6 3 6 3 3 6 6
4 ee 4 8 4 8 4 4 8 8
5 ff 5 10 5 10 5 5 10 10
6 gg 6 12 6 12 6 6 12 12
Slice, drop, and join
cols = ['name', 'c']
df[cols].join(df.drop(columns=cols))
name c d e f g h g.1 h.1
0 aa 0 0 0 0 0 0 0 0
1 bb 1 2 1 2 1 2 1 2
2 cc 2 4 2 4 2 4 2 4
3 dd 3 6 3 6 3 6 3 6
4 ee 4 8 4 8 4 8 4 8
5 ff 5 10 5 10 5 10 5 10
6 gg 6 12 6 12 6 12 6 12
Slice, drop, and concat
cols = ['name', 'c']
pd.concat([df[cols], df.drop(columns=cols)], axis=1)
name c d e f g h g.1 h.1
0 aa 0 0 0 0 0 0 0 0
1 bb 1 2 1 2 1 2 1 2
2 cc 2 4 2 4 2 4 2 4
3 dd 3 6 3 6 3 6 3 6
4 ee 4 8 4 8 4 8 4 8
5 ff 5 10 5 10 5 10 5 10
6 gg 6 12 6 12 6 12 6 12
By position with iloc
positions = df.columns.map({'name': 0, 'c': 1}.get).argsort()
df.iloc[:, positions]
name c d e f g h g.1 h.1
0 aa 0 0 0 0 0 0 0 0
1 bb 1 2 1 2 1 2 1 2
2 cc 2 4 2 4 2 4 2 4
3 dd 3 6 3 6 3 6 3 6
4 ee 4 8 4 8 4 8 4 8
5 ff 5 10 5 10 5 10 5 10
6 gg 6 12 6 12 6 12 6 12
Or with a focus on OP's vars
cols = ['name' , 'c']
col_position = [1 , 2]
m = dict(zip(cols, col_position))
positions = df.columns.map(m.get).argsort()
df.iloc[:, positions]
I tried this:
cols = ['name', 'c']
col_position = [1, 2]
l = df.columns.tolist()      # use a list; df.columns.values is an ndarray and has no remove/insert
for name, pos in zip(cols, col_position):
    l.remove(name)
    l.insert(pos, name)
df = df[l]
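A variant of the same idea that keeps the remaining columns in their original order (np.setdiff1d sorts them alphabetically) and treats the positions as 0-based (an assumption; shift them if 1-based positions are meant):
cols = ['name', 'c']
col_position = [0, 1]                               # 0-based: first and second positions

order = [c for c in df.columns if c not in cols]    # remaining columns, original order preserved
for name, pos in zip(cols, col_position):
    order.insert(pos, name)

df = df[order]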
Related
I have a dataframe generated by pandas, as follows:
NO CODE
1 a
2 a
3 a
4 a
5 a
6 a
7 b
8 b
9 a
10 a
11 a
12 a
13 b
14 a
15 a
16 a
I want to derive a NUM column from the CODE column. The rule, as the example shows, is that consecutive 'a' rows are numbered 1, 2, 3, ... (restarting after every 'b'), while 'b' rows keep the value 'b':
NO CODE NUM
1 a 1
2 a 2
3 a 3
4 a 4
5 a 5
6 a 6
7 b b
8 b b
9 a 1
10 a 2
11 a 3
12 a 4
13 b b
14 a 1
15 a 2
16 a 3
Thank you!
Try:
a_group = df.CODE.eq('a')
df['NUM'] = np.where(a_group,
                     df.groupby(a_group.ne(a_group.shift()).cumsum())
                       .CODE.cumcount() + 1,
                     df.CODE)
on
df = pd.DataFrame({'CODE':list('baaaaaabbaaaabbaa')})
yields
CODE NUM
-- ------ -----
0 b b
1 a 1
2 a 2
3 a 3
4 a 4
5 a 5
6 a 6
7 b b
8 b b
9 a 1
10 a 2
11 a 3
12 a 4
13 b b
14 b b
15 a 1
16 a 2
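The same logic, unpacked into named steps (a sketch only, using the 'a'/'b' pattern from the question rather than the sample frame above):
import numpy as np
import pandas as pd

df = pd.DataFrame({'CODE': list('aaaaaabbaaaabaaa')})

is_a = df['CODE'].eq('a')                    # True where CODE == 'a'
run_id = is_a.ne(is_a.shift()).cumsum()      # label each run of consecutive equal values
counter = df.groupby(run_id).cumcount() + 1  # 1-based position inside each run
df['NUM'] = np.where(is_a, counter, df['CODE'])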
IIUC
s = df.CODE.eq('b').cumsum()
df['NUM'] = df.CODE.where(df.CODE.eq('b'),
                          s[~df.CODE.eq('b')].groupby(s).cumcount() + 1)
df
Out[514]:
NO CODE NUM
0 1 a 1
1 2 a 2
2 3 a 3
3 4 a 4
4 5 a 5
5 6 a 6
6 7 b b
7 8 b b
8 9 a 1
9 10 a 2
10 11 a 3
11 12 a 4
12 13 b b
13 14 a 1
14 15 a 2
15 16 a 3
Here is code I wrote to generate a dataframe that contains 4 columns:
import pandas as pd
from random import randint  # assumed import: randint(0, 9) below matches random.randint

num_rows = 10
df = pd.DataFrame({'id_col': [x + 1 for x in range(num_rows)],
                   'c1': [randint(0, 9) for x in range(num_rows)],
                   'c2': [randint(0, 9) for x in range(num_rows)],
                   'c3': [randint(0, 9) for x in range(num_rows)]})
df
print(df) renders:
id_col c1 c2 c3
0 1 3 1 5
1 2 0 2 4
2 3 1 2 5
3 4 0 5 6
4 5 0 0 1
5 6 6 5 8
6 7 1 6 8
7 8 5 8 8
8 9 1 5 2
9 10 2 9 2
I've set the number of rows to be dynamically generated via the num_rows variable.
How can I dynamically generate 1000 columns, each prefixed with 'c', so that columns c1, c2, c3, ..., c1000 are generated and each column contains 10 rows?
For better performance I suggest creating the DataFrame with the numpy function numpy.random.randint, renaming the columns with a list comprehension, and inserting the new column at a given position with DataFrame.insert:
np.random.seed(458)
N = 15
M = 10
df = pd.DataFrame(np.random.randint(10, size=(M, N)))
df.columns = ['c{}'.format(x+1) for x in df.columns]
df.insert(0, 'idcol', np.arange(M))
print (df)
idcol c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15
0 0 8 2 1 6 2 1 0 9 7 8 0 5 5 6 0
1 1 0 2 5 0 0 2 5 2 9 2 1 0 0 5 0
2 2 5 1 3 5 4 5 3 0 2 1 7 8 9 5 4
3 3 8 7 7 0 1 3 6 7 5 8 8 9 8 5 5
4 4 2 8 1 7 3 7 4 6 0 7 0 9 4 0 4
5 5 9 2 1 6 1 9 5 6 7 4 6 1 7 3 7
6 6 1 9 3 9 7 7 2 7 9 8 2 7 2 5 5
7 7 7 6 6 6 4 2 9 0 6 5 7 0 0 4 9
8 8 6 4 2 1 3 1 7 0 4 3 0 5 4 7 7
9 9 1 3 5 7 2 2 1 5 6 1 9 5 9 6 3
Another solution uses numpy.hstack to stack the id column onto the 2d array first:
np.random.seed(458)
arr = np.hstack([np.arange(M)[:, None], np.random.randint(10, size=(M, N))])
df = pd.DataFrame(arr)
df.columns = ['idcol'] + ['c{}'.format(x) for x in df.columns[1:]]
print (df)
idcol c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15
0 0 8 2 1 6 2 1 0 9 7 8 0 5 5 6 0
1 1 0 2 5 0 0 2 5 2 9 2 1 0 0 5 0
2 2 5 1 3 5 4 5 3 0 2 1 7 8 9 5 4
3 3 8 7 7 0 1 3 6 7 5 8 8 9 8 5 5
4 4 2 8 1 7 3 7 4 6 0 7 0 9 4 0 4
5 5 9 2 1 6 1 9 5 6 7 4 6 1 7 3 7
6 6 1 9 3 9 7 7 2 7 9 8 2 7 2 5 5
7 7 7 6 6 6 4 2 9 0 6 5 7 0 0 4 9
8 8 6 4 2 1 3 1 7 0 4 3 0 5 4 7 7
9 9 1 3 5 7 2 2 1 5 6 1 9 5 9 6 3
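The same pattern scales to the size asked for in the question; a quick sketch with N = 1000 columns and 10 rows (numpy and pandas imported as above):
N, M = 1000, 10
df = pd.DataFrame(np.random.randint(10, size=(M, N)),
                  columns=['c{}'.format(x + 1) for x in range(N)])
df.insert(0, 'idcol', np.arange(M))
print(df.shape)  # (10, 1001)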
IIUC, use str.format and a dict comprehension:
from random import randint  # assumed import, as in the question

num_rows = 10
num_cols = 15
df = pd.DataFrame({'c{}'.format(n): [randint(0, 9) for x in range(num_rows)] for n in range(num_cols)},
                  index=[x + 1 for x in range(num_rows)])
c0 c1 c2 c3 c4 c5 c6 c7 c8 c9
1 1 6 2 1 3 1 8 8 2 0
2 2 6 2 2 5 7 4 1 6 2
3 1 2 6 8 7 5 5 7 2 2
4 5 5 3 3 4 7 8 1 8 6
5 7 2 8 6 5 6 2 0 0 4
6 8 2 4 4 6 3 0 1 0 2
7 5 6 8 5 1 0 4 8 4 7
8 1 5 4 5 2 4 4 6 2 7
9 5 7 7 8 5 0 2 7 3 2
10 4 8 5 3 3 7 5 1 5 1
You can use np.random.randint to create a full array of random values, f-strings (Python 3.6+) with a list comprehension for column naming, and pd.DataFrame.assign with np.arange for defining "id_col":
import pandas as pd, numpy as np

rows = 10
cols = 5
minval, maxval = 0, 10

df = pd.DataFrame(np.random.randint(minval, maxval, (rows, cols)),
                  columns=[f'c{i}' for i in range(1, cols + 1)])\
       .assign(id_col=np.arange(1, rows + 1))  # rows, not num_rows, is the variable defined here

print(df)
c1 c2 c3 c4 c5 id_col
0 8 4 6 0 8 1
1 8 3 5 9 0 2
2 1 3 3 6 2 3
3 6 4 1 1 7 4
4 3 7 0 9 5 5
5 4 6 8 8 6 6
6 0 3 9 9 7 7
7 0 6 1 2 4 8
8 3 7 1 2 0 9
9 6 6 0 5 8 10
I have a Pandas DataFrame question. I have a df whose index equals its columns (the frame is symmetric). It looks like below.
df:
DNA Cat2
Item A B C D E F F H I J .......
DNA Item
Cat2 A 812 62 174 0 4 46 46 7 2 15
B 62 427 27 0 0 12 61 2 4 11
C 174 27 174 0 0 13 22 5 2 4
D 0 0 0 0 0 0 0 0 0 0
E 4 0 0 0 130 10 57 33 4 5
F 46 12 13 0 10 187 4 5 0 0
......
In other words, df = df.transpose(). All I want to do is find a pandas (or numpy, for df.values) function that sets the values where index equals column, i.e. the diagonal, to zero. My ideal output would be below.
df:
DNA Cat2
Item A B C D E F F H I J .......
DNA Item
Cat2 A 0 62 174 0 4 46 46 7 2 15
B 62 0 27 0 0 12 61 2 4 11
C 174 27 0 0 0 13 22 5 2 4
D 0 0 0 0 0 0 0 0 0 0
E 4 0 0 0 0 10 57 33 4 5
F 46 12 13 0 10 0 4 5 0 0
......
Is there a Python function that makes this step very fast? I tried a for loop with df.iloc[i, i] = 0, but since my dataset is very big it takes a long time to finish. Thanks in advance!
Setup
np.random.seed([3,1415])
i = pd.MultiIndex.from_product(
    [['Cat2'], list('ABCDEFGHIJ')],
    names=['DNA', 'Item']
)
a = np.random.randint(5, size=(10, 10))
df = pd.DataFrame(a + a.T + 1, i, i)
df
DNA Cat2
Item A B C D E F G H I J
DNA Item
Cat2 A 1 6 6 7 7 7 4 4 8 2
B 6 1 3 6 1 6 6 4 8 5
C 6 3 9 8 9 6 7 8 4 9
D 7 6 8 1 6 9 4 5 4 3
E 7 1 9 6 9 7 3 7 2 6
F 7 6 6 9 7 9 3 4 6 6
G 4 6 7 4 3 3 9 4 5 5
H 4 4 8 5 7 4 4 5 4 5
I 8 8 4 4 2 6 5 4 9 7
J 2 5 9 3 6 6 5 5 7 3
Option 1
The simplest way is to multiply by 1 minus the identity matrix:
df * (1 - np.eye(len(df), dtype=int))
DNA Cat2
Item A B C D E F G H I J
DNA Item
Cat2 A 0 6 6 7 7 7 4 4 8 2
B 6 0 3 6 1 6 6 4 8 5
C 6 3 0 8 9 6 7 8 4 9
D 7 6 8 0 6 9 4 5 4 3
E 7 1 9 6 0 7 3 7 2 6
F 7 6 6 9 7 0 3 4 6 6
G 4 6 7 4 3 3 0 4 5 5
H 4 4 8 5 7 4 4 0 4 5
I 8 8 4 4 2 6 5 4 0 7
J 2 5 9 3 6 6 5 5 7 0
Option 2
However, we can also use pd.DataFrame.mask with np.eye. Masking is nice because the frame doesn't have to be numeric for it to work.
df.mask(np.eye(len(df), dtype=bool), 0)
DNA Cat2
Item A B C D E F G H I J
DNA Item
Cat2 A 0 6 6 7 7 7 4 4 8 2
B 6 0 3 6 1 6 6 4 8 5
C 6 3 0 8 9 6 7 8 4 9
D 7 6 8 0 6 9 4 5 4 3
E 7 1 9 6 0 7 3 7 2 6
F 7 6 6 9 7 0 3 4 6 6
G 4 6 7 4 3 3 0 4 5 5
H 4 4 8 5 7 4 4 0 4 5
I 8 8 4 4 2 6 5 4 0 7
J 2 5 9 3 6 6 5 5 7 0
Option 3
In the event the columns and indices are not identical, or they are out of order, we can use equality to tell us where to mask.
d = df.iloc[::-1]
d.mask(d.index == d.columns.values[:, None], 0)
DNA Cat2
Item A B C D E F G H I J
DNA Item
Cat2 J 2 5 9 3 6 6 5 5 7 0
I 8 8 4 4 2 6 5 4 0 7
H 4 4 8 5 7 4 4 0 4 5
G 4 6 7 4 3 3 0 4 5 5
F 7 6 6 9 7 0 3 4 6 6
E 7 1 9 6 0 7 3 7 2 6
D 7 6 8 0 6 9 4 5 4 3
C 6 3 0 8 9 6 7 8 4 9
B 6 0 3 6 1 6 6 4 8 5
A 0 6 6 7 7 7 4 4 8 2
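Another vectorized route, not shown above, is numpy's np.fill_diagonal. A sketch (pandas/numpy as imported in the setup) that works on an explicit copy of the underlying array, since df.values is not guaranteed to be a writable view:
arr = df.values.copy()        # detach from the DataFrame so the write is under our control
np.fill_diagonal(arr, 0)      # zero the diagonal in place on the array
df_zeroed = pd.DataFrame(arr, index=df.index, columns=df.columns)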
I would like to hold the first value in one column while another column equals zero. For Column B, values alternate between -1, 0, and 1. For Column C, values can be any integer. The objective is to hold the first value of Column C for as long as Column B equals zero. The current DataFrame is as follows:
A B C
1 8 1 9
2 2 1 1
3 3 0 7
4 9 0 8
5 5 0 9
6 6 0 1
7 1 1 9
8 6 1 10
9 3 0 4
10 8 0 8
11 5 0 9
12 6 0 10
The resulting DataFrame should be as follows:
A B C
1 8 1 9
2 2 1 1
3 3 0 7
4 9 0 7
5 5 0 7
6 6 0 7
7 1 1 9
8 6 1 10
9 3 0 4
10 8 0 4
11 5 0 4
12 6 0 4
13 3 1 9
You first need to create NaNs in column C by condition, and then fill the values forward with ffill:
mask = (df['B'].shift().fillna(False)).astype(bool) | (df['B'])
df['C'] = df.loc[mask, 'C']
df['C'] = df['C'].ffill().astype(int)
print (df)
A B C
1 8 1 9
2 2 1 1
3 3 0 7
4 9 0 7
5 5 0 7
6 6 0 7
7 1 1 9
8 6 1 10
9 3 0 4
10 8 0 4
11 5 0 4
12 6 0 4
13 3 1 9
Or use where, and if all values are integers, add astype:
mask = (df['B'].shift().fillna(False)).astype(bool) | (df['B'])
df['C'] = df['C'].where(mask).ffill().astype(int)
print (df)
A B C
1 8 1 9
2 2 1 1
3 3 0 7
4 9 0 7
5 5 0 7
6 6 0 7
7 1 1 9
8 6 1 10
9 3 0 4
10 8 0 4
11 5 0 4
12 6 0 4
13 3 1 9
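A different way to express "hold the first C value of each B == 0 run", sketched here only as an alternative to the answer above (assuming numpy is imported as np); it uses a groupby/transform rather than masking:
run_id = df['B'].ne(df['B'].shift()).cumsum()         # label consecutive runs of equal B
first_c = df.groupby(run_id)['C'].transform('first')  # first C value within each run
df['C'] = np.where(df['B'].eq(0), first_c, df['C'])   # hold it only while B == 0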
Suppose the following pandas dataframe
Wafer_Id v1 v2
0 0 9 6
1 0 7 8
2 0 1 5
3 1 6 6
4 1 0 8
5 1 5 0
6 2 8 8
7 2 2 6
8 2 3 5
9 3 5 1
10 3 5 6
11 3 9 8
I want to group it by Wafer_Id and I would like to get something like:
w
Out[60]:
Wafer_Id v1_1 v1_2 v1_3 v2_1 v2_2 v2_3
0 0 9 7 1 6 ... ...
1 1 6 0 5 6
2 2 8 2 3 8
3 3 5 5 9 1
I think I can obtain the result with the pivot function, but I am not sure how to do it.
Possible solution
oes = pd.DataFrame()
oes['Wafer_Id'] = [0,0,0,1,1,1,2,2,2,3,3,3]
oes['v1'] = np.random.randint(0, 10, 12)
oes['v2'] = np.random.randint(0, 10, 12)
oes['id'] = [0, 1, 2] * 4
oes.pivot(index='Wafer_Id', columns='id')
oes
Out[74]:
Wafer_Id v1 v2 id
0 0 8 7 0
1 0 3 3 1
2 0 8 0 2
3 1 2 5 0
4 1 4 1 1
5 1 8 8 2
6 2 8 6 0
7 2 4 7 1
8 2 4 3 2
9 3 4 6 0
10 3 9 2 1
11 3 7 1 2
oes.pivot(index='Wafer_Id', columns='id')
Out[75]:
v1 v2
id 0 1 2 0 1 2
Wafer_Id
0 8 3 8 7 3 0
1 2 4 8 5 1 8
2 8 4 4 6 7 3
3 4 9 7 6 2 1
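To get the flat v1_1 ... v2_3 column names from the desired output, the MultiIndex columns that pivot produces can be flattened afterwards; a sketch, assuming the id helper column from the possible solution above:
w = oes.pivot(index='Wafer_Id', columns='id')
w.columns = ['{}_{}'.format(val, i + 1) for val, i in w.columns]  # ('v1', 0) -> 'v1_1', etc.
w = w.reset_index()
print(w)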