Combine Multiple Pandas columns into a Single Column - python

Hi, I have a DataFrame in which I want to combine several columns into one, while the remaining columns are duplicated for each combined value. An example dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(10, size=60).reshape(6, 10))
df.columns = ['x1', 'x2', 'x3', 'x4', 'x5', 'y1', 'y2', 'y3', 'y4', 'y5']
x1 x2 x3 x4 x5 y1 y2 y3 y4 y5
0 2 6 9 4 3 8 6 1 0 7
1 1 4 8 7 3 0 5 7 3 1
2 6 7 4 8 1 5 7 7 8 5
3 6 3 4 8 0 8 7 2 3 8
4 8 5 6 1 6 3 2 1 1 4
5 1 3 7 5 1 6 5 3 8 5
I would like a nice way to produce the following DataFrame:
x1 x2 x3 x4 x5 y
0 2 6 9 4 3 8
1 1 4 8 7 3 0
2 6 7 4 8 1 5
3 6 3 4 8 0 8
4 8 5 6 1 6 3
5 1 3 7 5 1 6
6 2 6 9 4 3 6
7 1 4 8 7 3 5
8 6 7 4 8 1 7
9 6 3 4 8 0 7
10 8 5 6 1 6 2
11 1 3 7 5 1 5
12 2 6 9 4 3 1
13 1 4 8 7 3 7
14 6 7 4 8 1 7
15 6 3 4 8 0 2
16 8 5 6 1 6 1
17 1 3 7 5 1 3
18 2 6 9 4 3 0
19 1 4 8 7 3 3
20 6 7 4 8 1 8
21 6 3 4 8 0 3
22 8 5 6 1 6 1
23 1 3 7 5 1 8
24 2 6 9 4 3 7
25 1 4 8 7 3 1
26 6 7 4 8 1 5
27 6 3 4 8 0 8
28 8 5 6 1 6 4
29 1 3 7 5 1 5
Is there a nice way to produce this DataFrame with Pandas functions or is it more complicated?
Thanks

You can do this with df.melt().
df.melt(
    id_vars=['x1', 'x2', 'x3', 'x4', 'x5'],
    value_vars=['y1', 'y2', 'y3', 'y4', 'y5'],
    value_name='y'
).drop(columns='variable')
df.melt() adds a column called variable that records which original column each row came from (y1, y2, etc.). You don't need it here, so it is dropped as shown above.
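As a minimal sketch of the same idea on a smaller frame (the data and the names id_cols and long_df are made up for illustration), note that value_vars can be left out, in which case melt uses every column not listed in id_vars:

```python
import pandas as pd

# Hypothetical small frame: two columns to keep, three columns to stack.
df = pd.DataFrame({
    "x1": [1, 2],
    "x2": [3, 4],
    "y1": [10, 20],
    "y2": [30, 40],
    "y3": [50, 60],
})

id_cols = ["x1", "x2"]

# value_vars is omitted, so melt stacks every column that is not in id_vars.
long_df = (
    df.melt(id_vars=id_cols, value_name="y")
      .drop(columns="variable")
)
print(long_df)
```

The id columns are repeated once per stacked column, and the result gets a fresh 0..n-1 index, which matches the layout asked for in the question.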

Related

Generating new column w value 1 to n with n depending on another column in Pandas [duplicate]

Suppose I have the following dataframe
import pandas as pd
df = pd.DataFrame({'a': [1,1,1,2,2,2,2,2,3,3,3,3,4,4,4,4,4,4],
'b': [3,4,3,7,5,9,4,2,5,6,7,8,4,2,4,5,8,0]})
a b
0 1 3
1 1 4
2 1 3
3 2 7
4 2 5
5 2 9
6 2 4
7 2 2
8 3 5
9 3 6
10 3 7
11 3 8
12 4 4
13 4 2
14 4 4
15 4 5
16 4 8
17 4 0
And I would like to make a new column c with values 1 to n, where n depends on the value of column a, as follows:
a b c
0 1 3 1
1 1 4 2
2 1 3 3
3 2 7 1
4 2 5 2
5 2 9 3
6 2 4 4
7 2 2 5
8 3 5 1
9 3 6 2
10 3 7 3
11 3 8 4
12 4 4 1
13 4 2 2
14 4 4 3
15 4 5 4
16 4 8 5
17 4 0 6
While I can write it using a for loop, my data frame is huge and a loop is computationally costly. Is there an efficient way to generate such a column? Thanks.
Use GroupBy.cumcount, which numbers the rows within each group starting from 0, and add 1:
df['c'] = df.groupby('a').cumcount().add(1)
print(df)
# Output
a b c
0 1 3 1
1 1 4 2
2 1 3 3
3 2 7 1
4 2 5 2
5 2 9 3
6 2 4 4
7 2 2 5
8 3 5 1
9 3 6 2
10 3 7 3
11 3 8 4
12 4 4 1
13 4 2 2
14 4 4 3
15 4 5 4
16 4 8 5
17 4 0 6
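The counter does not rely on the groups being contiguous. A small sketch with made-up, unsorted data (an assumption, not the frame from the question) shows that cumcount numbers rows within each group in order of appearance:

```python
import pandas as pd

# Hypothetical frame where the values of 'a' are interleaved.
df = pd.DataFrame({"a": [2, 1, 2, 1, 2], "b": [7, 3, 5, 4, 9]})

# cumcount() starts at 0 within each group, so add 1 for a 1-based counter.
df["c"] = df.groupby("a").cumcount() + 1
print(df)
```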

Dynamic pandas dataframe generation

Here is code I wrote to generate a dataframe that contains 4 columns:
from random import randint
import pandas as pd
num_rows = 10
df = pd.DataFrame({ 'id_col' : [x+1 for x in range(num_rows)] , 'c1': [randint(0, 9) for x in range(num_rows)], 'c2': [randint(0, 9) for x in range(num_rows)], 'c3': [randint(0, 9) for x in range(num_rows)] })
print(df) renders:
id_col c1 c2 c3
0 1 3 1 5
1 2 0 2 4
2 3 1 2 5
3 4 0 5 6
4 5 0 0 1
5 6 6 5 8
6 7 1 6 8
7 8 5 8 8
8 9 1 5 2
9 10 2 9 2
I've set the number of rows to be dynamically generated via the num_rows variable.
How can I dynamically generate 1000 columns, each named with the prefix 'c', so that columns c1, c2, c3, ..., c1000 are created and each column contains 10 rows?
For better performance, I suggest creating the DataFrame with the NumPy function numpy.random.randint and then renaming the columns with a list comprehension. To add a new column at a given position, use DataFrame.insert:
np.random.seed(458)
N = 15
M = 10
df = pd.DataFrame(np.random.randint(10, size=(M, N)))
df.columns = ['c{}'.format(x+1) for x in df.columns]
df.insert(0, 'idcol', np.arange(M))
print (df)
idcol c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15
0 0 8 2 1 6 2 1 0 9 7 8 0 5 5 6 0
1 1 0 2 5 0 0 2 5 2 9 2 1 0 0 5 0
2 2 5 1 3 5 4 5 3 0 2 1 7 8 9 5 4
3 3 8 7 7 0 1 3 6 7 5 8 8 9 8 5 5
4 4 2 8 1 7 3 7 4 6 0 7 0 9 4 0 4
5 5 9 2 1 6 1 9 5 6 7 4 6 1 7 3 7
6 6 1 9 3 9 7 7 2 7 9 8 2 7 2 5 5
7 7 7 6 6 6 4 2 9 0 6 5 7 0 0 4 9
8 8 6 4 2 1 3 1 7 0 4 3 0 5 4 7 7
9 9 1 3 5 7 2 2 1 5 6 1 9 5 9 6 3
Another solution uses numpy.hstack to stack the id column in front of the 2d array:
np.random.seed(458)
arr = np.hstack([np.arange(M)[:, None], np.random.randint(10, size=(M, N))])
df = pd.DataFrame(arr)
df.columns = ['idcol'] + ['c{}'.format(x) for x in df.columns[1:]]
print (df)
idcol c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11 c12 c13 c14 c15
0 0 8 2 1 6 2 1 0 9 7 8 0 5 5 6 0
1 1 0 2 5 0 0 2 5 2 9 2 1 0 0 5 0
2 2 5 1 3 5 4 5 3 0 2 1 7 8 9 5 4
3 3 8 7 7 0 1 3 6 7 5 8 8 9 8 5 5
4 4 2 8 1 7 3 7 4 6 0 7 0 9 4 0 4
5 5 9 2 1 6 1 9 5 6 7 4 6 1 7 3 7
6 6 1 9 3 9 7 7 2 7 9 8 2 7 2 5 5
7 7 7 6 6 6 4 2 9 0 6 5 7 0 0 4 9
8 8 6 4 2 1 3 1 7 0 4 3 0 5 4 7 7
9 9 1 3 5 7 2 2 1 5 6 1 9 5 9 6 3
If I understand correctly, use str.format with a dict comprehension:
num_rows = 10
num_cols = 15
df = pd.DataFrame({ 'c{}'.format(n): [randint(0, 9) for x in range(num_rows)] for n in range(num_cols)},
index=[x+1 for x in range(num_rows)] , )
c0 c1 c2 c3 c4 c5 c6 c7 c8 c9
1 1 6 2 1 3 1 8 8 2 0
2 2 6 2 2 5 7 4 1 6 2
3 1 2 6 8 7 5 5 7 2 2
4 5 5 3 3 4 7 8 1 8 6
5 7 2 8 6 5 6 2 0 0 4
6 8 2 4 4 6 3 0 1 0 2
7 5 6 8 5 1 0 4 8 4 7
8 1 5 4 5 2 4 4 6 2 7
9 5 7 7 8 5 0 2 7 3 2
10 4 8 5 3 3 7 5 1 5 1
You can use np.random.randint to create a full array of random values, f-strings (Python 3.6+) with a list comprehension for column naming, and pd.DataFrame.assign with np.arange for defining "id_col":
import pandas as pd, numpy as np
rows = 10
cols = 5
minval, maxval = 0, 10
df = pd.DataFrame(np.random.randint(minval, maxval, (rows, cols)),
                  columns=[f'c{i}' for i in range(1, cols+1)])\
       .assign(id_col=np.arange(1, rows+1))
print(df)
c1 c2 c3 c4 c5 id_col
0 8 4 6 0 8 1
1 8 3 5 9 0 2
2 1 3 3 6 2 3
3 6 4 1 1 7 4
4 3 7 0 9 5 5
5 4 6 8 8 6 6
6 0 3 9 9 7 7
7 0 6 1 2 4 8
8 3 7 1 2 0 9
9 6 6 0 5 8 10
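To tie the answers back to the question as asked (10 rows, columns c1 to c1000, id column first), here is a minimal sketch. It uses np.random.default_rng, which is a swap for the np.random.randint calls above, and the seed 0 is an arbitrary choice:

```python
import numpy as np
import pandas as pd

num_rows, num_cols = 10, 1000

# Seeded generator so the example is repeatable; the seed value is arbitrary.
rng = np.random.default_rng(0)

# One call produces the whole block of random integers in [0, 10).
values = rng.integers(0, 10, size=(num_rows, num_cols))

df = pd.DataFrame(values, columns=[f"c{i}" for i in range(1, num_cols + 1)])

# Put the 1-based id column in front of c1.
df.insert(0, "id_col", np.arange(1, num_rows + 1))
print(df.iloc[:, :5])
```

Building the array in one call and naming the columns afterwards avoids creating 1000 Python lists one by one.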

How to merge blocks of rows to single rows in pandas?

Assume I have a dataframe like this, for example:
0 1 2 3 4 5 6 7 8 9
0 8 9 2 1 6 2 6 8 6 3
1 1 1 8 3 1 6 3 6 3 9
2 1 4 3 5 9 3 5 9 2 3
3 4 6 3 8 4 3 1 5 1 1
4 1 8 5 3 9 6 1 7 2 2
5 6 6 7 9 1 8 2 3 2 8
6 8 3 6 9 9 5 8 4 7 7
7 8 3 3 8 7 1 4 9 7 2
8 7 6 1 4 8 1 6 9 6 6
9 3 3 2 4 8 1 8 1 1 8
10 7 7 5 7 1 4 1 8 8 6
11 6 3 2 7 6 5 7 4 8 7
I would like to group rows into "blocks" of a given length and then flatten each block into a single row. So for example, if the block length is 3, the result here would be:
0 1 2 3 4 5 6 7 8 9 10 ... 19 20 21 22 23 24 25 26 27 28 29
2 8 9 2 1 6 2 6 8 6 3 1 ... 9 1 4 3 5 9 3 5 9 2 3
5 4 6 3 8 4 3 1 5 1 1 1 ... 2 6 6 7 9 1 8 2 3 2 8
8 8 3 6 9 9 5 8 4 7 7 8 ... 2 7 6 1 4 8 1 6 9 6 6
11 3 3 2 4 8 1 8 1 1 8 7 ... 6 6 3 2 7 6 5 7 4 8 7
How to achieve this?
I think you need reshape:
n_blocks = 3
df = pd.DataFrame(df.values.reshape(-1, n_blocks * df.shape[1]))
print (df)
0 1 2 3 4 5 6 7 8 9 ... 20 21 22 23 24 25 26 27 \
0 8 9 2 1 6 2 6 8 6 3 ... 1 4 3 5 9 3 5 9
1 4 6 3 8 4 3 1 5 1 1 ... 6 6 7 9 1 8 2 3
2 8 3 6 9 9 5 8 4 7 7 ... 7 6 1 4 8 1 6 9
3 3 3 2 4 8 1 8 1 1 8 ... 6 3 2 7 6 5 7 4
28 29
0 2 3
1 2 8
2 6 6
3 8 7
[4 rows x 30 columns]
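The reshape above resets the index to 0..3, whereas the desired output in the question is labelled 2, 5, 8, 11. A possible variation, sketched on made-up data (the names block_len and flat are hypothetical), keeps the label of the last row of each block and checks that the row count divides evenly:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 6 rows, 4 columns, values 0..23.
df = pd.DataFrame(np.arange(24).reshape(6, 4))
block_len = 3

# reshape only works when the rows split into whole blocks.
if len(df) % block_len != 0:
    raise ValueError("number of rows must be a multiple of block_len")

flat = pd.DataFrame(
    df.to_numpy().reshape(-1, block_len * df.shape[1]),
    # Label each flattened row with the index of the last row in its block.
    index=df.index[block_len - 1::block_len],
)
print(flat)
```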
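I found this solution, maybe someone comes up with a better one:
def toBlocks(df, blocklen):
    # Shift by blocklen-1, ..., 1, 0 so the rows of a block appear in their original order
    shifted = [df.shift(periods=p) for p in reversed(range(blocklen))]
    # Keep only the last row of each block
    return pd.concat(shifted, axis=1)[blocklen-1::blocklen]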

Reshaping dataframe in Pandas

Is there a quick pythonic way to transform this table
index = pd.date_range('2000-1-1', periods=36, freq='M')
df = pd.DataFrame(np.random.randn(36,4), index=index, columns=list('ABCD'))
In[1]: df
Out[1]:
A B C D
2000-01-31 H 1.368795 0.106294 2.108814
2000-02-29 -1.713401 0.557224 0.115956 -0.851140
2000-03-31 -1.454967 -0.791855 -0.461738 -0.410948
2000-04-30 1.688731 -0.216432 -0.690103 -0.319443
2000-05-31 -1.103961 0.181510 -0.600383 -0.164744
2000-06-30 0.216871 -1.018599 0.731617 -0.721986
2000-07-31 0.621375 0.790072 0.967000 1.347533
2000-08-31 0.588970 -0.360169 0.904809 0.606771
...
into this table
2001 2000
12 11 10 9 8 7 6 5 4 3 2 1 12 11 10 9 8 7 6 5 4 3 2 1
A H
B
C
D
Please excuse the missing values. I added the "H" manually. I hope it gets clear what I am looking for.
For easier check, I've created dataframe of the same shape but with integers as values.
The core of the solution is pandas.DataFrame.transpose, but you need to use index.year + index.month as a new index:
>>> df = pd.DataFrame(np.random.randint(10,size=(36, 4)), index=index, columns=list('ABCD'))
>>> df.set_index(keys=[df.index.year, df.index.month]).transpose()
2000 2001 2002
1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12
A 0 0 8 7 8 0 7 1 5 1 5 4 2 1 9 5 2 0 5 3 6 4 9 3 5 1 7 3 1 7 6 5 6 8 4 1
B 4 9 9 5 2 0 8 0 9 5 2 7 5 6 3 6 8 8 8 8 0 6 3 7 5 9 6 3 9 7 1 4 7 8 3 3
C 3 2 4 3 1 9 7 6 9 6 8 6 3 5 3 2 2 1 3 1 1 2 8 2 2 6 9 6 1 5 6 5 4 6 7 5
D 8 1 3 9 2 3 8 7 3 2 1 0 1 3 9 1 8 6 4 7 4 6 3 2 9 8 9 9 0 7 4 7 3 6 5 2
Of course, this will not work properly if you have more than one record per year+month. In that case you need to group your data first:
>>> i = pd.date_range('2000-1-1', periods=36, freq='W') # weekly index
>>> df = pd.DataFrame(np.random.randint(10,size=(36, 4)), index=i, columns=list('ABCD'))
>>> df.groupby(by=[df.index.year, df.index.month]).sum().transpose()
2000
1 2 3 4 5 6 7 8 9
A 12 13 15 23 9 21 21 31 7
B 33 24 19 30 15 19 20 7 4
C 20 24 26 24 15 18 29 17 4
D 23 29 14 30 19 12 12 11 5
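The table in the question lists the newest year and month first. One way to get that ordering, sketched here under the assumption that month-start timestamps (freq "MS") are acceptable since only year and month are used, is to sort the transposed columns in descending order. The name wide and the integer data are made up for illustration:

```python
import numpy as np
import pandas as pd

# 36 monthly stamps starting January 2000; 'MS' (month start) is an assumption.
index = pd.date_range("2000-01-01", periods=36, freq="MS")

# Deterministic integers so individual cells are easy to check.
df = pd.DataFrame(np.arange(36 * 4).reshape(36, 4), index=index, columns=list("ABCD"))

wide = (
    df.set_index([df.index.year, df.index.month])  # (year, month) MultiIndex
      .transpose()                                 # A..D become the rows
      .sort_index(axis=1, ascending=False)         # newest year and month first
)
print(wide.iloc[:, :3])
```

After the sort, the first column is December of the last year and the last column is January 2000.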

nested loops results bunched together Python

for j in range(10):
    for i in range(10):
        print(j,end=" ")
My results are bunched together and I need to have 10 numbers per line. I can't use a print("0123456789"). I have tried print(j,j,j,j,j,j,j,j,j) and I get the results that I'm looking for, but I'm sure this isn't the proper way to write the code.
If print(j,j,j,j,j,j,j,j,j) works then you simply need to add another print() after each iteration:
for j in range(10):
    for i in range(10):
        print(j,end=" ")
    print()
Output:
0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6 6
7 7 7 7 7 7 7 7 7 7
8 8 8 8 8 8 8 8 8 8
9 9 9 9 9 9 9 9 9 9
Or simply:
for j in range(10):
    print(" ".join(str(j) * 10))
0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6 6
7 7 7 7 7 7 7 7 7 7
8 8 8 8 8 8 8 8 8 8
9 9 9 9 9 9 9 9 9 9
Why are you using a nested for loop when you can use a single for loop:
for i in range(10):
    print('{} '.format(i) * 10)
This is similar to Malik Brahimi's solution, except it doesn't put a space after the last digit on each line:
for i in range(10):
    print(' '.join([str(i)]*10))
output
0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4
5 5 5 5 5 5 5 5 5 5
6 6 6 6 6 6 6 6 6 6
7 7 7 7 7 7 7 7 7 7
8 8 8 8 8 8 8 8 8 8
9 9 9 9 9 9 9 9 9 9
Just for fun, here's another way to do it with a single loop, this time using a format string with numbered fields.
fmt = ('{0} ' * 10)[:-1]
for i in range(10):
    print(fmt.format(i))
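If the row count or line width might change, the join approach can be wrapped in a small helper. This is only a sketch; the function name digit_rows and its parameters are made up for illustration:

```python
def digit_rows(count=10, width=10):
    """Return one string per row: the row number repeated `width` times."""
    # join puts a space between items only, so no trailing space is produced.
    return [" ".join([str(j)] * width) for j in range(count)]


lines = digit_rows()
print("\n".join(lines))
```

Because each number is converted with str before joining, this also works for row numbers with more than one digit.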
