Is there a quick pythonic way to transform this table
index = pd.date_range('2000-1-1', periods=36, freq='M')
df = pd.DataFrame(np.random.randn(36,4), index=index, columns=list('ABCD'))
In[1]: df
Out[1]:
A B C D
2000-01-31 H 1.368795 0.106294 2.108814
2000-02-29 -1.713401 0.557224 0.115956 -0.851140
2000-03-31 -1.454967 -0.791855 -0.461738 -0.410948
2000-04-30 1.688731 -0.216432 -0.690103 -0.319443
2000-05-31 -1.103961 0.181510 -0.600383 -0.164744
2000-06-30 0.216871 -1.018599 0.731617 -0.721986
2000-07-31 0.621375 0.790072 0.967000 1.347533
2000-08-31 0.588970 -0.360169 0.904809 0.606771
...
into this table
2001 2000
12 11 10 9 8 7 6 5 4 3 2 1 12 11 10 9 8 7 6 5 4 3 2 1
A H
B
C
D
Please excuse the missing values; I added the "H" manually. I hope it is clear what I am looking for.
For easier check, I've created dataframe of the same shape but with integers as values.
The core of the solution is pandas.DataFrame.transpose, but you first need to set index.year and index.month as a new (multi-level) index:
>>> df = pd.DataFrame(np.random.randint(10,size=(36, 4)), index=index, columns=list('ABCD'))
>>> df.set_index(keys=[df.index.year, df.index.month]).transpose()
2000 2001 2002
1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12
A 0 0 8 7 8 0 7 1 5 1 5 4 2 1 9 5 2 0 5 3 6 4 9 3 5 1 7 3 1 7 6 5 6 8 4 1
B 4 9 9 5 2 0 8 0 9 5 2 7 5 6 3 6 8 8 8 8 0 6 3 7 5 9 6 3 9 7 1 4 7 8 3 3
C 3 2 4 3 1 9 7 6 9 6 8 6 3 5 3 2 2 1 3 1 1 2 8 2 2 6 9 6 1 5 6 5 4 6 7 5
D 8 1 3 9 2 3 8 7 3 2 1 0 1 3 9 1 8 6 4 7 4 6 3 2 9 8 9 9 0 7 4 7 3 6 5 2
Of course, this will not work properly if you have more than one record per year+month. In that case you need to group your data first:
>>> i = pd.date_range('2000-1-1', periods=36, freq='W') # weekly index
>>> df = pd.DataFrame(np.random.randint(10,size=(36, 4)), index=i, columns=list('ABCD'))
>>> df.groupby(by=[df.index.year, df.index.month]).sum().transpose()
2000
1 2 3 4 5 6 7 8 9
A 12 13 15 23 9 21 21 31 7
B 33 24 19 30 15 19 20 7 4
C 20 24 26 24 15 18 29 17 4
D 23 29 14 30 19 12 12 11 5
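For comparison, the same reshaping can also be done in one call with pivot_table, which aggregates duplicates and builds the (year, month) column MultiIndex at once. This is my own sketch, not part of the answer above:

```python
import numpy as np
import pandas as pd

i = pd.date_range('2000-1-1', periods=36, freq='W')  # weekly index, as above
df = pd.DataFrame(np.random.randint(10, size=(36, 4)), index=i, columns=list('ABCD'))

# Melt to long form, derive year/month, then pivot: rows are the original
# columns, columns are (year, month), and duplicates are summed
long = df.reset_index().melt(id_vars='index')
long['year'] = long['index'].dt.year
long['month'] = long['index'].dt.month
out = long.pivot_table(index='variable', columns=['year', 'month'],
                       values='value', aggfunc='sum')
```

The result matches df.groupby([df.index.year, df.index.month]).sum().transpose() up to axis names.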
Hi, I have a DataFrame with several columns that I want to combine into one, while a number of other columns should be repeated alongside them. An example dataframe:
df = pd.DataFrame(np.random.randint(10, size=60).reshape(6, 10))
df.columns = ['x1', 'x2', 'x3', 'x4', 'x5', 'y1', 'y2', 'y3', 'y4', 'y5']
x1 x2 x3 x4 x5 y1 y2 y3 y4 y5
0 2 6 9 4 3 8 6 1 0 7
1 1 4 8 7 3 0 5 7 3 1
2 6 7 4 8 1 5 7 7 8 5
3 6 3 4 8 0 8 7 2 3 8
4 8 5 6 1 6 3 2 1 1 4
5 1 3 7 5 1 6 5 3 8 5
I would like a nice way to produce the following DataFrame:
x1 x2 x3 x4 x5 y
0 2 6 9 4 3 8
1 1 4 8 7 3 0
2 6 7 4 8 1 5
3 6 3 4 8 0 8
4 8 5 6 1 6 3
5 1 3 7 5 1 6
6 2 6 9 4 3 6
7 1 4 8 7 3 5
8 6 7 4 8 1 7
9 6 3 4 8 0 7
10 8 5 6 1 6 2
11 1 3 7 5 1 5
12 2 6 9 4 3 1
13 1 4 8 7 3 7
14 6 7 4 8 1 7
15 6 3 4 8 0 2
16 8 5 6 1 6 1
17 1 3 7 5 1 3
18 2 6 9 4 3 0
19 1 4 8 7 3 3
20 6 7 4 8 1 8
21 6 3 4 8 0 3
22 8 5 6 1 6 1
23 1 3 7 5 1 8
24 2 6 9 4 3 7
25 1 4 8 7 3 1
26 6 7 4 8 1 5
27 6 3 4 8 0 8
28 8 5 6 1 6 4
29 1 3 7 5 1 5
Is there a nice way to produce this DataFrame with Pandas functions or is it more complicated?
Thanks
You can do this with df.melt().
df.melt(
    id_vars=['x1', 'x2', 'x3', 'x4', 'x5'],
    value_vars=['y1', 'y2', 'y3', 'y4', 'y5'],
    value_name='y'
).drop(columns='variable')
df.melt() produces a variable column recording which original column each value came from (y1, y2, etc.), so you drop it as shown above.
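For reference, the same stacking can be built explicitly with pd.concat, renaming each y column in turn. This is a sketch of my own, not part of the answer above, but it makes the row ordering of melt visible:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(10, size=60).reshape(6, 10),
                  columns=['x1', 'x2', 'x3', 'x4', 'x5',
                           'y1', 'y2', 'y3', 'y4', 'y5'])

# One block per y column: keep the x columns, rename y_k -> y, then stack
blocks = [df[['x1', 'x2', 'x3', 'x4', 'x5', col]].rename(columns={col: 'y'})
          for col in ['y1', 'y2', 'y3', 'y4', 'y5']]
out = pd.concat(blocks, ignore_index=True)
```

The result is identical to the melt-and-drop version: all y1 rows first, then y2, and so on.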
I have a dataframe generated by pandas, as follows:
NO CODE
1 a
2 a
3 a
4 a
5 a
6 a
7 b
8 b
9 a
10 a
11 a
12 a
13 b
14 a
15 a
16 a
I want to convert the CODE column data to get the NUM column. The encoding rules are as follows:
NO CODE NUM
1 a 1
2 a 2
3 a 3
4 a 4
5 a 5
6 a 6
7 b b
8 b b
9 a 1
10 a 2
11 a 3
12 a 4
13 b b
14 a 1
15 a 2
16 a 3
thank you!
Try:
a_group = df.CODE.eq('a')
df['NUM'] = np.where(a_group,
                     df.groupby(a_group.ne(a_group.shift()).cumsum())
                       .CODE.cumcount() + 1,
                     df.CODE)
on
df = pd.DataFrame({'CODE':list('baaaaaabbaaaabbaa')})
yields
CODE NUM
-- ------ -----
0 b b
1 a 1
2 a 2
3 a 3
4 a 4
5 a 5
6 a 6
7 b b
8 b b
9 a 1
10 a 2
11 a 3
12 a 4
13 b b
14 b b
15 a 1
16 a 2
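The key trick in the answer above is a_group.ne(a_group.shift()).cumsum(), which gives every consecutive run of equal values its own group id; cumcount() + 1 then numbers the rows within each run. A minimal sketch of the idiom in isolation (my illustration, not from the answer):

```python
import pandas as pd

s = pd.Series(list('aaabbaa'))
is_a = s.eq('a')
# Wherever the flag differs from the previous row, a new run starts;
# cumsum over that "changed" mask assigns each run its own id
run_id = is_a.ne(is_a.shift()).cumsum()
print(run_id.tolist())   # [1, 1, 1, 2, 2, 3, 3]
```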
IIUC
s = df.CODE.eq('b').cumsum()
df['NUM'] = df.CODE.where(df.CODE.eq('b'),
                          s[~df.CODE.eq('b')].groupby(s).cumcount() + 1)
df
Out[514]:
NO CODE NUM
0 1 a 1
1 2 a 2
2 3 a 3
3 4 a 4
4 5 a 5
5 6 a 6
6 7 b b
7 8 b b
8 9 a 1
9 10 a 2
10 11 a 3
11 12 a 4
12 13 b b
13 14 a 1
14 15 a 2
15 16 a 3
Assume I have a dataframe like this, for example:
0 1 2 3 4 5 6 7 8 9
0 8 9 2 1 6 2 6 8 6 3
1 1 1 8 3 1 6 3 6 3 9
2 1 4 3 5 9 3 5 9 2 3
3 4 6 3 8 4 3 1 5 1 1
4 1 8 5 3 9 6 1 7 2 2
5 6 6 7 9 1 8 2 3 2 8
6 8 3 6 9 9 5 8 4 7 7
7 8 3 3 8 7 1 4 9 7 2
8 7 6 1 4 8 1 6 9 6 6
9 3 3 2 4 8 1 8 1 1 8
10 7 7 5 7 1 4 1 8 8 6
11 6 3 2 7 6 5 7 4 8 7
I would like to group the rows into "blocks" of a given length and then flatten each block into a single row. So, for example, if the block length were 3, the result here would be:
0 1 2 3 4 5 6 7 8 9 10 ... 19 20 21 22 23 24 25 26 27 28 29
2 8 9 2 1 6 2 6 8 6 3 1 ... 9 1 4 3 5 9 3 5 9 2 3
5 4 6 3 8 4 3 1 5 1 1 1 ... 2 6 6 7 9 1 8 2 3 2 8
8 8 3 6 9 9 5 8 4 7 7 8 ... 2 7 6 1 4 8 1 6 9 6 6
11 3 3 2 4 8 1 8 1 1 8 7 ... 6 6 3 2 7 6 5 7 4 8 7
How to achieve this?
I think you need reshape:
block_len = 3
df = pd.DataFrame(df.values.reshape(-1, block_len * df.shape[1]))
print(df)
0 1 2 3 4 5 6 7 8 9 ... 20 21 22 23 24 25 26 27 \
0 8 9 2 1 6 2 6 8 6 3 ... 1 4 3 5 9 3 5 9
1 4 6 3 8 4 3 1 5 1 1 ... 6 6 7 9 1 8 2 3
2 8 3 6 9 9 5 8 4 7 7 ... 7 6 1 4 8 1 6 9
3 3 3 2 4 8 1 8 1 1 8 ... 6 3 2 7 6 5 7 4
28 29
0 2 3
1 2 8
2 6 6
3 8 7
[4 rows x 30 columns]
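Note that reshape requires the row count to be an exact multiple of the block length; otherwise it raises a ValueError. One way to handle a remainder (my addition, not covered by the answer) is to truncate the trailing rows first:

```python
import numpy as np
import pandas as pd

block_len = 3
df = pd.DataFrame(np.arange(28).reshape(14, 2))  # 14 rows: 3 does not divide 14
n_full = len(df) // block_len * block_len        # 12 rows fill complete blocks
out = pd.DataFrame(df.values[:n_full].reshape(-1, block_len * df.shape[1]))
print(out.shape)   # (4, 6)
```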
I found this solution, maybe someone comes up with a better one:
def toBlocks(df, blocklen):
    # one shifted copy per position in the block, oldest row first
    shifted = [df.shift(periods=p) for p in range(blocklen - 1, -1, -1)]
    # step by blocklen so consecutive blocks do not overlap
    return pd.concat(shifted, axis=1)[blocklen - 1::blocklen]
This question already has answers here:
Set values on the diagonal of pandas.DataFrame
(8 answers)
Closed 5 years ago.
I have a Pandas Dataframe question. I have a df with index=column. It looks like below.
df:
DNA Cat2
Item A B C D E F F H I J .......
DNA Item
Cat2 A 812 62 174 0 4 46 46 7 2 15
B 62 427 27 0 0 12 61 2 4 11
C 174 27 174 0 0 13 22 5 2 4
D 0 0 0 0 0 0 0 0 0 0
E 4 0 0 0 130 10 57 33 4 5
F 46 12 13 0 10 187 4 5 0 0
......
In other words, df = df.transpose(). All I want is a pandas (or NumPy, operating on df.values) function that zeroes the cells where index == column. My ideal output would be below.
df:
DNA Cat2
Item A B C D E F F H I J .......
DNA Item
Cat2 A 0 62 174 0 4 46 46 7 2 15
B 62 0 27 0 0 12 61 2 4 11
C 174 27 0 0 0 13 22 5 2 4
D 0 0 0 0 0 0 0 0 0 0
E 4 0 0 0 0 10 57 33 4 5
F 46 12 13 0 10 0 4 5 0 0
......
Is there a Python function that makes this step very fast? I tried a for loop with df.iloc[i, i] = 0, but since my dataset is very big it takes a long time to finish. Thanks in advance!
Setup
np.random.seed([3,1415])
i = pd.MultiIndex.from_product(
[['Cat2'], list('ABCDEFGHIJ')],
names=['DNA', 'Item']
)
a = np.random.randint(5, size=(10, 10))
df = pd.DataFrame(a + a.T + 1, i, i)
df
DNA Cat2
Item A B C D E F G H I J
DNA Item
Cat2 A 1 6 6 7 7 7 4 4 8 2
B 6 1 3 6 1 6 6 4 8 5
C 6 3 9 8 9 6 7 8 4 9
D 7 6 8 1 6 9 4 5 4 3
E 7 1 9 6 9 7 3 7 2 6
F 7 6 6 9 7 9 3 4 6 6
G 4 6 7 4 3 3 9 4 5 5
H 4 4 8 5 7 4 4 5 4 5
I 8 8 4 4 2 6 5 4 9 7
J 2 5 9 3 6 6 5 5 7 3
Option 1
The simplest way is to multiply by one minus the identity matrix:
df * (1 - np.eye(len(df), dtype=int))
DNA Cat2
Item A B C D E F G H I J
DNA Item
Cat2 A 0 6 6 7 7 7 4 4 8 2
B 6 0 3 6 1 6 6 4 8 5
C 6 3 0 8 9 6 7 8 4 9
D 7 6 8 0 6 9 4 5 4 3
E 7 1 9 6 0 7 3 7 2 6
F 7 6 6 9 7 0 3 4 6 6
G 4 6 7 4 3 3 0 4 5 5
H 4 4 8 5 7 4 4 0 4 5
I 8 8 4 4 2 6 5 4 0 7
J 2 5 9 3 6 6 5 5 7 0
Option 2
However, we can also use pd.DataFrame.mask with np.eye. Masking is nice because it also works when the data is not numeric.
df.mask(np.eye(len(df), dtype=bool), 0)
DNA Cat2
Item A B C D E F G H I J
DNA Item
Cat2 A 0 6 6 7 7 7 4 4 8 2
B 6 0 3 6 1 6 6 4 8 5
C 6 3 0 8 9 6 7 8 4 9
D 7 6 8 0 6 9 4 5 4 3
E 7 1 9 6 0 7 3 7 2 6
F 7 6 6 9 7 0 3 4 6 6
G 4 6 7 4 3 3 0 4 5 5
H 4 4 8 5 7 4 4 0 4 5
I 8 8 4 4 2 6 5 4 0 7
J 2 5 9 3 6 6 5 5 7 0
Option 3
In the event the columns and indices are not identical, or are out of order, we can use equality to tell us where to mask.
d = df.iloc[::-1]
d.mask(d.index == d.columns.values[:, None], 0)
DNA Cat2
Item A B C D E F G H I J
DNA Item
Cat2 J 2 5 9 3 6 6 5 5 7 0
I 8 8 4 4 2 6 5 4 0 7
H 4 4 8 5 7 4 4 0 4 5
G 4 6 7 4 3 3 0 4 5 5
F 7 6 6 9 7 0 3 4 6 6
E 7 1 9 6 0 7 3 7 2 6
D 7 6 8 0 6 9 4 5 4 3
C 6 3 0 8 9 6 7 8 4 9
B 6 0 3 6 1 6 6 4 8 5
A 0 6 6 7 7 7 4 4 8 2
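Since the question is specifically about speed on a big frame, it may be worth adding (my note, not part of the answer above) that NumPy can zero the diagonal in place with np.fill_diagonal, avoiding any new allocation:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(1, 17).reshape(4, 4),
                  index=list('ABCD'), columns=list('ABCD'))
# df.values is a view when the frame holds a single dtype, so this
# zeroes the diagonal in place without copying the data
np.fill_diagonal(df.values, 0)
print(df.loc['A', 'A'], df.loc['B', 'A'])   # 0 5
```

This mutates df directly, so take a copy first if the original values are still needed.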
I know this is a very basic question, but I could not find the answer on Google.
I have a dataset that starts from 8am, and I want to rearrange it to start from another time.
A example dateset is like this:
df = pd.Series([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
I want to rearrange the data so that it runs from index 9->14 and then index 0->8.
How could I get it?
Desired output:
10
11
12
13
14
15
1
2
3
4
5
6
7
8
9
pd.concat((df[9:], df[:9]))
Out:
9 10
10 11
11 12
12 13
13 14
14 15
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
dtype: int64
Replace 9 with your cutpoint.
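Note that pd.concat keeps the original index (9, 10, ..., 8, as shown). If you want a fresh 0-based index for the rotated series, reset it afterwards (a small addition of mine to the answer above):

```python
import pandas as pd

df = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
# reset_index(drop=True) discards the rotated 9..14, 0..8 index
out = pd.concat((df[9:], df[:9])).reset_index(drop=True)
print(out.tolist())   # [10, 11, 12, 13, 14, 15, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```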
You can also just use iloc.
df.iloc[list(range(9, 15)) + list(range(0, 9))]
9 10
10 11
11 12
12 13
13 14
14 15
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
dtype: int64
I think you need reindex with numpy.r_ to concatenate the indices:
print (np.r_[9:len(df.index), 0:9])
[ 9 10 11 12 13 14 0 1 2 3 4 5 6 7 8]
print (df.reindex(np.r_[9:len(df.index), 0:9]))
9 10
10 11
11 12
12 13
13 14
14 15
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
dtype: int64
Also working:
print (df.loc[np.r_[9:len(df.index), 0:9]])
9 10
10 11
11 12
12 13
13 14
14 15
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
dtype: int64
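Another route (a sketch of my own, not from the answers above) is np.roll, which rotates the underlying values themselves and therefore also produces a clean 0-based index:

```python
import numpy as np
import pandas as pd

df = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
# A negative shift rotates left: the element at position 9 moves to the front
out = pd.Series(np.roll(df.values, -9))
print(out.tolist())   # [10, 11, 12, 13, 14, 15, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```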