Adding a column to a dataframe after every nth column


I have a dataframe of 9,000 columns and 100 rows. I want to insert a column after every 3rd column such that its value is equal to 50 for all rows.
Existing DataFrame
0 1 2 3 4 5 6 7 8 9....9000
0 a b c d e f g h i j ....x
1 k l m n o p q r s t ....x
.
.
100 u v w x y z aa bb cc....x
Desired DataFrame
0 1 2 3 4 5 6 7 8 9....12000
0 a b c 50 d e f 50 g h i j ....x
1 k l m 50 n o p 50 q r s t ....x
.
.
100 u v w 50 x y z 50 aa bb cc....x

Create a new DataFrame filled with 50 whose column labels are those of every 3rd column plus 0.5, so that each new column sorts directly after its group. Then concat it with the original and sort the columns:
df.columns = np.arange(len(df.columns))
df1 = pd.DataFrame(50, index=df.index, columns= df.columns[2::3] + .5)
df2 = pd.concat([df, df1], axis=1).sort_index(axis=1)
df2.columns = np.arange(len(df2.columns))
print (df2)
0 1 2 3 4 5 6 7 8 9 10 11 12
0 a b c 50 d e f 50 g h i 50 j
1 k l m 50 n o p 50 q r s 50 t
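The same idea can be wrapped in a small helper so the group size and fill value are parameters. This is only a sketch: the function name insert_constant_every and its arguments are made up for illustration and are not part of pandas.

```python
import numpy as np
import pandas as pd


def insert_constant_every(df, n, value):
    """Return a copy of df with a constant column after every n-th column.

    Hypothetical helper: the name and signature are illustrative only.
    """
    out = df.copy()
    # Relabel columns 0..k-1 so that fractional labels can be slotted in.
    out.columns = np.arange(out.shape[1])
    # Labels 2.5, 5.5, ... (for n=3) sort right after columns 2, 5, ...
    filler = pd.DataFrame(value, index=out.index,
                          columns=out.columns[n - 1::n] + 0.5)
    out = pd.concat([out, filler], axis=1).sort_index(axis=1)
    # Restore plain integer labels.
    out.columns = np.arange(out.shape[1])
    return out


df = pd.DataFrame(np.reshape([*'abcdefghijklmnopqrst'], (2, -1)))
df2 = insert_constant_every(df, n=3, value=50)
print(df2)
```

The original frame is left untouched because the helper works on a copy.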

NumPy
# How many columns to group
x = 3
# Get the shape of things
a = df.to_numpy()
m, n = a.shape
k = n // x
# Get only a multiple of x columns and reshape
b = a[:, :k * x].reshape(m, k, x)
# Get the other columns missed by b
c = a[:, k * x:]
# array of 50's that we'll append to the last dimension
_50 = np.ones((m, k, 1), np.int64) * 50
# append 50's and reshape back to 2D
d = np.append(b, _50, axis=2).reshape(m, k * (x + 1))
# Create DataFrame while appending the missing bit
pd.DataFrame(np.append(d, c, axis=1))
0 1 2 3 4 5 6 7 8 9 10 11 12
0 a b c 50 d e f 50 g h i 50 j
1 k l m 50 n o p 50 q r s 50 t
Setup
df = pd.DataFrame(np.reshape([*'abcdefghijklmnopqrst'], (2, -1)))
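A shorter route to the same result is np.insert, which accepts several insertion positions at once. This is a different technique from the reshape approach above, shown here as a sketch. It assumes an object-dtype array, so that the integer 50 is stored as-is rather than coerced to a string.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.reshape([*'abcdefghijklmnopqrst'], (2, -1)))

x = 3  # how many columns per group
# Force object dtype so the inserted 50 stays an integer.
a = df.to_numpy(dtype=object)
n = a.shape[1]

# Positions refer to the ORIGINAL array: insert before columns 3, 6, 9, ...
positions = np.arange(x, n + 1, x)
out = pd.DataFrame(np.insert(a, positions, 50, axis=1))
print(out)
```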

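Here is one solution, grouping the columns in blocks of three (note that it also appends a 50 column after the final, incomplete group):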
s=pd.concat([y.assign(new=50) for x, y in df.groupby(np.arange(df.shape[1])//3,axis=1)],axis=1)
s.columns=np.arange(s.shape[1])
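As far as I know, the axis=1 argument of groupby is deprecated in recent pandas releases (2.1 and later), so the one-liner above may warn or fail there. A sketch of the same block-wise idea without groupby, slicing the columns with iloc instead:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.reshape([*'abcdefghijklmnopqrst'], (2, -1)))

step = 3
# Take the columns in blocks of `step` and add a constant column to each block.
pieces = [df.iloc[:, i:i + step].assign(new=50)
          for i in range(0, df.shape[1], step)]
s = pd.concat(pieces, axis=1)
s.columns = np.arange(s.shape[1])
print(s)
```

Like the groupby version, this also adds a 50 column after the last, incomplete block, so the result has 14 columns rather than 13.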

Related

pd.DataFrame.update puts the result at the top of the dataframe

Lets say I have two dataframes like this:
n = {'x':['a','b','c','d','e'], 'y':['1','2','3','4','5'],'z':['0','0','0','0','0']}
nf = pd.DataFrame(n)
m = {'x':['b','d','e'], 'z':['10','100','1000']}
mf = pd.DataFrame(m)
I want to update the zeroes in the z column of nf with the values from the z column of mf, but only in the rows whose key in column x appears in mf.
When I call
nf.update(mf)
I get
x y z
b 1 10
d 2 100
e 3 1000
d 4 0
e 5 0
instead of the desired output
x y z
a 1 0
b 2 10
c 3 0
d 4 100
e 5 1000

Both dataframes need to share the same index, because update aligns rows on index labels rather than on the values in x. Set x as the index on both, update, then restore x as a column:
n = {'x':['a','b','c','d','e'], 'y':['1','2','3','4','5'],'z':['0','0','0','0','0']}
nf = pd.DataFrame(n).set_index('x')
m = {'x':['b','d','e'], 'z':['10','100','1000']}
mf = pd.DataFrame(m).set_index('x')
nf.update(mf)
nf = nf.reset_index()
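If you would rather keep x as an ordinary column throughout, a lookup with Series.map is an alternative to update. This is a different technique from the answer above, sketched on the same toy data:

```python
import pandas as pd

nf = pd.DataFrame({'x': ['a', 'b', 'c', 'd', 'e'],
                   'y': ['1', '2', '3', '4', '5'],
                   'z': ['0', '0', '0', '0', '0']})
mf = pd.DataFrame({'x': ['b', 'd', 'e'], 'z': ['10', '100', '1000']})

# Series keyed by x, used as a lookup table for the new z values.
lookup = mf.set_index('x')['z']
# Keys missing from the lookup give NaN, which falls back to the old z.
nf['z'] = nf['x'].map(lookup).fillna(nf['z'])
print(nf)
```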

Sorting a dataframe by another

I have an initial dataframe X:
x y z w
0 1 a b c
1 1 d e f
2 0 g h i
3 0 k l m
4 -1 n o p
5 -1 q r s
6 -1 t v à
with many columns and rows (this is a toy example). After applying some Machine Learning procedures, I get back a similar dataframe, but with the -1s changed to 0s or 1s and the rows sorted in a different way; for example:
x y z w
4 1 n o p
0 1 a b c
6 0 t v à
1 1 d e f
2 0 g h i
5 0 q r s
3 0 k l m
How can I sort the second dataframe so that its rows are in the same order as the first one? For example:
x y z w
0 1 a b c
1 1 d e f
2 0 g h i
3 0 k l m
4 1 n o p
5 0 q r s
6 0 t v à

If you can't rely on just sorting the index (e.g. if the first df's index is not sorted, or if you have something other than a RangeIndex), use loc with the first dataframe's index:
df2.loc[df.index]
x y z w
0 1 a b c
1 1 d e f
2 0 g h i
3 0 k l m
4 1 n o p
5 0 q r s
6 0 t v à
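A self-contained sketch of the loc approach on a cut-down version of the toy data (only the x and y columns are reproduced here):

```python
import pandas as pd

# First dataframe, in the original row order.
df = pd.DataFrame({'x': [1, 1, 0, 0, -1, -1, -1],
                   'y': list('adgknqt')})

# Second dataframe: same index labels, shuffled rows, -1s replaced.
df2 = pd.DataFrame({'x': [1, 1, 0, 1, 0, 0, 0],
                    'y': list('natdgqk')},
                   index=[4, 0, 6, 1, 2, 5, 3])

# Select the rows of df2 in the order given by df's index.
restored = df2.loc[df.index]
print(restored)
```

Note that loc raises a KeyError if df2 lacks any label from df.index. df2.reindex(df.index) tolerates missing labels and fills those rows with NaN instead.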

Use:
df.sort_index(inplace=True)
This restores the original order, provided that order is the sorted order of the index.

pandas apply and applymap functions are taking long time to run on large dataset

I have two functions applied on a dataframe
res = df.apply(lambda x:pd.Series(list(x)))
res = res.applymap(lambda x: x.strip('"') if isinstance(x, str) else x)
Update: the dataframe has almost 700,000 rows, and this takes a long time to run.
How can I reduce the running time?
Sample data :
A
----------
0 [1,4,3,c]
1 [t,g,h,j]
2 [d,g,e,w]
3 [f,i,j,h]
4 [m,z,s,e]
5 [q,f,d,s]
output:
A B C D E
-------------------------
0 [1,4,3,c] 1 4 3 c
1 [t,g,h,j] t g h j
2 [d,g,e,w] d g e w
3 [f,i,j,h] f i j h
4 [m,z,s,e] m z s e
5 [q,f,d,s] q f d s
The line res = df.apply(lambda x:pd.Series(list(x))) takes the items of each list and fills them one by one into separate columns, as shown above. There will be almost 38 columns.

I think:
res = df.apply(lambda x:pd.Series(list(x)))
should be changed to:
df1 = pd.DataFrame(df['A'].values.tolist())
print (df1)
0 1 2 3
0 1 4 3 c
1 t g h j
2 d g e w
3 f i j h
4 m z s e
5 q f d s
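Second, if no column mixes numeric and string values, strip the quotes column-wise with the vectorised .str accessor (res here is the expanded frame, i.e. df1 above):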
cols = res.select_dtypes(object).columns
res[cols] = res[cols].apply(lambda x: x.str.strip('"'))
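When a column may mix numbers and strings, one option is to strip the quotes in plain Python while building the rows, then construct the frame once. This is a sketch of an alternative, not the answer's method. The sample lists are invented and assume that each cell of column A already holds a Python list.

```python
import pandas as pd

# Invented sample: lists containing quoted strings and one integer.
df = pd.DataFrame({'A': [[1, '"4"', '3', 'c'],
                         ['t', 'g', '"h"', 'j']]})

# Strip quotes from strings only; leave other values untouched.
rows = [[v.strip('"') if isinstance(v, str) else v for v in row]
        for row in df['A']]

# Build the expanded frame in a single constructor call.
res = pd.DataFrame(rows, index=df.index)
print(res)
```

This avoids both apply and applymap, which call a Python function per row or per cell through pandas machinery.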

Random value for each row in pandas data Frame

Hi I have the following data frames:
import numpy as np
import pandas as pd
df = pd.DataFrame()
df['T1'] = ['A','B','C','D','E']
df['T2'] = ['G','H','I','J','K']
df['Match'] = df['T1'] +' Vs '+ df['T2']
Nsims = 5
df1 = pd.DataFrame(np.tile(df, (Nsims, 1)))
I created two new columns, T1_point and T2_point, each the sum of five random numbers.
When I do it as follows, I get the same number in every row:
Ninit = 5
df1['T1_point'] = np.sum(np.random.uniform(size=Ninit))
df1['T2_point'] = np.sum(np.random.uniform(size=Ninit))
What I want is a different random value in each row.
How can I do that?
Thanks
Zep.

What you are basically asking for is a random number in each row. Create a list of random numbers, one per row, and assign it to a new column of your dataframe:
import random
df1['RAND'] = [random.randint(1, 10000000) for k in df1.index]
print(df1)
0 1 RAND
0 A G 6850189
1 B H 3692984
2 C I 8062507
3 D J 6156287
4 E K 7037728
5 A G 7641046
6 B H 1884503
7 C I 7887030
8 D J 4089507
9 E K 4253742
10 A G 8947290
11 B H 8634259
12 C I 7172269
13 D J 4906697
14 E K 7040624
15 A G 4702362
16 B H 5267067
17 C I 3282320
18 D J 6185152
19 E K 9335186
20 A G 3448703
21 B H 6039862
22 C I 9884632
23 D J 4846228
24 E K 5510052
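To get what the question literally asks for, the sum of Ninit uniform draws per row, draw a (rows, Ninit) matrix and sum along axis 1. In the question's code, np.sum returns a single scalar, which pandas then broadcasts to every row. The sketch below assumes NumPy 1.17 or later for np.random.default_rng. The seed is arbitrary.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'T1': ['A', 'B', 'C', 'D', 'E'],
                   'T2': ['G', 'H', 'I', 'J', 'K']})
df['Match'] = df['T1'] + ' Vs ' + df['T2']

Nsims = 5
Ninit = 5
# Repeat the 5 matches Nsims times, keeping the column names.
df1 = pd.DataFrame(np.tile(df.to_numpy(), (Nsims, 1)), columns=df.columns)

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility
# One row of Ninit uniform draws per dataframe row, summed across columns.
df1['T1_point'] = rng.uniform(size=(len(df1), Ninit)).sum(axis=1)
df1['T2_point'] = rng.uniform(size=(len(df1), Ninit)).sum(axis=1)
print(df1.head())
```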

Concatenate strings along the off diagonals

Setup
import numpy as np
import pandas as pd
from string import ascii_uppercase
df = pd.DataFrame(np.array(list(ascii_uppercase[:25])).reshape(5, 5))
df
0 1 2 3 4
0 A B C D E
1 F G H I J
2 K L M N O
3 P Q R S T
4 U V W X Y
Question
How do I concatenate the strings along the off diagonals?
Expected Result
0 A
1 FB
2 KGC
3 PLHD
4 UQMIE
5 VRNJ
6 WSO
7 XT
8 Y
dtype: object
What I Tried
df.unstack().groupby(sum).sum()
This works fine. But @Zero's answer is far faster.

You could do
In [1766]: arr = df.values[::-1, :] # or np.flipud(df.values)
In [1767]: N = arr.shape[0]
In [1768]: [''.join(arr.diagonal(i)) for i in range(-N+1, N)]
Out[1768]: ['A', 'FB', 'KGC', 'PLHD', 'UQMIE', 'VRNJ', 'WSO', 'XT', 'Y']
In [1769]: pd.Series([''.join(arr.diagonal(i)) for i in range(-N+1, N)])
Out[1769]:
0 A
1 FB
2 KGC
3 PLHD
4 UQMIE
5 VRNJ
6 WSO
7 XT
8 Y
dtype: object
You may also do arr.diagonal(i).sum() but ''.join is more explicit.
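The same approach as a plain script, wrapped in a function that should also handle non-square frames by letting the offset run up to the number of columns. The function name antidiagonal_strings is made up for illustration.

```python
import numpy as np
import pandas as pd
from string import ascii_uppercase


def antidiagonal_strings(df):
    """Join the strings along each off diagonal (hypothetical helper)."""
    # Flip vertically so off diagonals become ordinary diagonals.
    arr = np.flipud(df.to_numpy())
    n_rows, n_cols = arr.shape
    # Offsets run from the bottom-left corner to the top-right corner.
    return pd.Series([''.join(arr.diagonal(i))
                      for i in range(-n_rows + 1, n_cols)])


df = pd.DataFrame(np.array(list(ascii_uppercase[:25])).reshape(5, 5))
result = antidiagonal_strings(df)
print(result)
```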
