How to rearrange rows by index range in python - python

I know this is a very basic question, but I could not find the answer on Google.
I have a dataset that starts at 8am, and I want to rearrange it so that it starts at a different time.
An example dataset looks like this:
df = pd.Series([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15])
I want to rearrange the data so that index 9->14 comes first, followed by index 0->8.
How can I do this?
Desired output:
10
11
12
13
14
15
1
2
3
4
5
6
7
8
9

pd.concat((df[9:], df[:9]))
Out:
9 10
10 11
11 12
12 13
13 14
14 15
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
dtype: int64
Replace 9 with your cutpoint.
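To make the cutpoint explicit, the same idea can be wrapped in a small helper. This is only a sketch: the name rotate_series and its cut parameter are made up for illustration, and it assumes you want to split by position rather than by index label.

```python
import pandas as pd

def rotate_series(s, cut):
    """Return `s` with the rows from position `cut` onward moved to the front."""
    # iloc slices by position, so this works regardless of the index labels.
    return pd.concat([s.iloc[cut:], s.iloc[:cut]])

df = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
rotated = rotate_series(df, 9)
print(rotated.tolist())

# Optional: renumber the index 0..n-1 after rotating.
renumbered = rotated.reset_index(drop=True)
```

The original index labels are kept by default, which is what the output above shows; reset_index(drop=True) is only needed if you want a fresh 0-based index.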

You can also just use iloc.
df.iloc[list(range(9, 15)) + list(range(0, 9))]
9 10
10 11
11 12
12 13
13 14
14 15
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
dtype: int64

I think you need reindex with numpy.r_ to concatenate the indices:
print (np.r_[9:len(df.index), 0:9])
[ 9 10 11 12 13 14 0 1 2 3 4 5 6 7 8]
print (df.reindex(np.r_[9:len(df.index), 0:9]))
9 10
10 11
11 12
12 13
13 14
14 15
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
dtype: int64
This also works:
print (df.loc[np.r_[9:len(df.index), 0:9]])
9 10
10 11
11 12
12 13
13 14
14 15
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
dtype: int64

Related

How to copy the current row and the next row value in a new dataframe using python?

The df looks like below:
A B C
1 8 23
2 8 22
3 8 45
4 9 45
5 6 12
6 8 10
7 11 12
8 9 67
I want to create a new df containing every row where 'B' is 8, plus the row that follows each such row.
New df:
A B C
1 8 23
2 8 22
3 8 45
4 9 45
6 8 10
7 11 12
Use boolean indexing: compare both the column and its shifted values with 8, and combine the two masks with | (bitwise OR):
df = df[df.B.shift().eq(8) | df.B.eq(8)]
print (df)
A B C
0 1 8 23
1 2 8 22
2 3 8 45
3 4 9 45
5 6 8 10
6 7 11 12
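For reference, here is a self-contained sketch of the same approach with the two masks named separately. The sample data is copied from the question; the names is_eight and follows_eight are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3, 4, 5, 6, 7, 8],
    'B': [8, 8, 8, 9, 6, 8, 11, 9],
    'C': [23, 22, 45, 45, 12, 10, 12, 67],
})

# Rows where B is 8.
is_eight = df['B'].eq(8)
# Rows whose previous row has B equal to 8 (shift moves values down by one;
# the first row becomes NaN, which compares as False).
follows_eight = df['B'].shift().eq(8)

result = df[is_eight | follows_eight]
print(result)
```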

How to merge blocks of rows to single rows in pandas?

Assume I have a dataframe like this, for example:
0 1 2 3 4 5 6 7 8 9
0 8 9 2 1 6 2 6 8 6 3
1 1 1 8 3 1 6 3 6 3 9
2 1 4 3 5 9 3 5 9 2 3
3 4 6 3 8 4 3 1 5 1 1
4 1 8 5 3 9 6 1 7 2 2
5 6 6 7 9 1 8 2 3 2 8
6 8 3 6 9 9 5 8 4 7 7
7 8 3 3 8 7 1 4 9 7 2
8 7 6 1 4 8 1 6 9 6 6
9 3 3 2 4 8 1 8 1 1 8
10 7 7 5 7 1 4 1 8 8 6
11 6 3 2 7 6 5 7 4 8 7
I would like to group rows into "blocks" of a given length and then flatten each block into a single row. For example, with a block length of 3, the result would be:
0 1 2 3 4 5 6 7 8 9 10 ... 19 20 21 22 23 24 25 26 27 28 29
2 8 9 2 1 6 2 6 8 6 3 1 ... 9 1 4 3 5 9 3 5 9 2 3
5 4 6 3 8 4 3 1 5 1 1 1 ... 2 6 6 7 9 1 8 2 3 2 8
8 8 3 6 9 9 5 8 4 7 7 8 ... 2 7 6 1 4 8 1 6 9 6 6
11 3 3 2 4 8 1 8 1 1 8 7 ... 6 6 3 2 7 6 5 7 4 8 7
How to achieve this?
I think you need reshape:
n_blocks = 3
df = pd.DataFrame(df.values.reshape(-1, n_blocks * df.shape[1]))
print (df)
0 1 2 3 4 5 6 7 8 9 ... 20 21 22 23 24 25 26 27 \
0 8 9 2 1 6 2 6 8 6 3 ... 1 4 3 5 9 3 5 9
1 4 6 3 8 4 3 1 5 1 1 ... 6 6 7 9 1 8 2 3
2 8 3 6 9 9 5 8 4 7 7 ... 7 6 1 4 8 1 6 9
3 3 3 2 4 8 1 8 1 1 8 ... 6 3 2 7 6 5 7 4
28 29
0 2 3
1 2 8
2 6 6
3 8 7
[4 rows x 30 columns]
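A smaller sketch of the reshape approach that also restores the row labels from the desired output (the label of the last row in each block). It assumes the number of rows is an exact multiple of the block length; block_len and flat are illustrative names.

```python
import numpy as np
import pandas as pd

# Small stand-in frame: 6 rows, 2 columns, values 0..11.
df = pd.DataFrame(np.arange(12).reshape(6, 2))
block_len = 3

# Each output row holds block_len consecutive input rows laid end to end.
flat = pd.DataFrame(df.to_numpy().reshape(-1, block_len * df.shape[1]))

# Label each flattened row with the index of the last row in its block.
flat.index = df.index[block_len - 1::block_len]
print(flat)
```

If the row count is not divisible by the block length, reshape raises a ValueError, so any leftover rows would have to be dropped or padded first.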
I found this solution; maybe someone can come up with a better one:
def toBlocks(df, blocklen):
    shifted = [df.shift(periods=p) for p in range(blocklen)]
    return pd.concat(shifted, axis=1)[blocklen-1:]

islice and cycle with multiple levels

UPDATE:
Added the required pattern, as asked.
I have 2 lists, and the expected output is different from last time:
Numberset1 = [10,11,12]
Numberset2 = [1,2,3,4,5]
I want to produce output by manipulating the lists. The expected output is:
10 1 1
10 1 2
10 1 3
10 1 4
10 1 5
10 2 2
10 2 3
10 2 4
10 2 5
10 2 1
10 3 3
10 3 4
10 3 5
10 3 1
10 3 2
10 4 4
10 4 5
10 4 1
10 4 2
10 4 3
10 5 5
10 5 1
10 5 2
10 5 3
10 5 4
11 2 2
11 2 3
11 2 4
11 2 5
11 2 1
11 3 3
11 3 4
11 3 5
11 3 1
11 3 2
11 4 4
11 4 5
11 4 1
11 4 2
11 4 3
11 5 5
11 5 1
11 5 2
11 5 3
11 5 4
11 1 1
11 1 2
11 1 3
11 1 4
11 1 5
12 3 3
12 3 4
12 3 5
12 3 1
12 3 2
12 4 4
12 4 5
12 4 1
12 4 2
12 4 3
12 5 5
12 5 1
12 5 2
12 5 3
12 5 4
12 1 1
12 1 2
12 1 3
12 1 4
12 1 5
12 2 2
12 2 3
12 2 4
12 2 5
12 2 1
The code I have tried is as follows. It was suggested in a previous question, and I tried extending it to the next level of looping, but I could not get the desired output:
Numberset1 = [10,11,12]
Numberset2 = [1,2,3,4,5]
from itertools import cycle, islice
it = cycle(Numberset2)
for i in Numberset1:
    for a in Numberset2:
        for j in islice(it, len(Numberset2)):
            print(i, a, j)
        skipped1 = next(it)
    skipped1 = next(it)
The output I am getting is
10 1 1
10 1 2
10 1 3
10 1 4
10 1 5
10 2 2
10 2 3
10 2 4
10 2 5
10 2 1
10 3 3
10 3 4
10 3 5
10 3 1
10 3 2
10 4 4
10 4 5
10 4 1
10 4 2
10 4 3
10 5 5
10 5 1
10 5 2
10 5 3
10 5 4
11 1 2
11 1 3
11 1 4
11 1 5
11 1 1
11 2 3
11 2 4
11 2 5
11 2 1
11 2 2
11 3 4
11 3 5
11 3 1
11 3 2
11 3 3
11 4 5
11 4 1
11 4 2
11 4 3
11 4 4
11 5 1
11 5 2
11 5 3
11 5 4
11 5 5
12 1 3
12 1 4
12 1 5
12 1 1
12 1 2
12 2 4
12 2 5
12 2 1
12 2 2
12 2 3
12 3 5
12 3 1
12 3 2
12 3 3
12 3 4
12 4 1
12 4 2
12 4 3
12 4 4
12 4 5
12 5 2
12 5 3
12 5 4
12 5 5
12 5 1
Please note how this differs from the expected output once 11 starts in the first column.
How can we use cycle and islice for multiple levels?
Pattern:
The first column should follow the order of the numbers in Numberset1. For the first number in Numberset1, the second column should follow the order of the numbers in Numberset2. The third column should also follow the order of Numberset2, but whenever the second column changes, the third column should change too and start from that same number (for example, from the 2nd number in Numberset2 when the second column shows the 2nd number), and so on.
Here's a version that accomplishes the task using cycle and islice. To make the code cleaner I've created a generator function aligned_cycle which cycles through the items yielded by cycle until we get the one we want to start the current cycle with.
This updated version can cope with Numberset1 having greater length than Numberset2.
from itertools import cycle, islice
def aligned_cycle(seq, start_item):
    ''' Make a generator that cycles over the items in `seq`.
        The first item yielded equals `start_item`.
    '''
    if start_item not in seq:
        raise ValueError("{} not in {}".format(start_item, seq))
    it = cycle(seq)
    # Advance the cycle until we reach the requested starting item.
    for u in it:
        if u == start_item:
            break
    yield u
    yield from it
Numberset1 = [10, 11, 12]
Numberset2 = [1, 2, 3, 4, 5]
cycle_length = len(Numberset2)
for i, u in zip(Numberset1, cycle(Numberset2)):
    for j in islice(aligned_cycle(Numberset2, u), cycle_length):
        for k in islice(aligned_cycle(Numberset2, j), cycle_length):
            print(i, j, k)
output
10 1 1
10 1 2
10 1 3
10 1 4
10 1 5
10 2 2
10 2 3
10 2 4
10 2 5
10 2 1
10 3 3
10 3 4
10 3 5
10 3 1
10 3 2
10 4 4
10 4 5
10 4 1
10 4 2
10 4 3
10 5 5
10 5 1
10 5 2
10 5 3
10 5 4
11 2 2
11 2 3
11 2 4
11 2 5
11 2 1
11 3 3
11 3 4
11 3 5
11 3 1
11 3 2
11 4 4
11 4 5
11 4 1
11 4 2
11 4 3
11 5 5
11 5 1
11 5 2
11 5 3
11 5 4
11 1 1
11 1 2
11 1 3
11 1 4
11 1 5
12 3 3
12 3 4
12 3 5
12 3 1
12 3 2
12 4 4
12 4 5
12 4 1
12 4 2
12 4 3
12 5 5
12 5 1
12 5 2
12 5 3
12 5 4
12 1 1
12 1 2
12 1 3
12 1 4
12 1 5
12 2 2
12 2 3
12 2 4
12 2 5
12 2 1
Jon Clements has written a more robust and more efficient version of aligned_cycle:
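from itertools import cycle, tee
def aligned_cycle(iterable, start_item):
    a, b = tee(iterable)
    b = cycle(b)
    # Walk the original items once; stop as soon as start_item is found.
    for u, v in zip(a, b):
        if u == start_item:
            break
    else:
        # start_item never appeared, so yield nothing.
        return
    yield u
    yield from b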
Thanks, Jon!
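If you don't specifically need cycle and islice, the same pattern can be sketched with plain list slicing instead. This swaps in a different technique (rotating a list by slicing), uses a hypothetical helper named rotate, and assumes the items in Numberset2 are unique so that list.index finds the right starting position.

```python
def rotate(seq, start):
    """Return a copy of `seq` that begins at position `start` and wraps around."""
    return seq[start:] + seq[:start]

Numberset1 = [10, 11, 12]
Numberset2 = [1, 2, 3, 4, 5]

rows = []
for pos, i in enumerate(Numberset1):
    # The second column starts one position later for each item of Numberset1.
    for j in rotate(Numberset2, pos % len(Numberset2)):
        # The third column starts from the current second-column value.
        for k in rotate(Numberset2, Numberset2.index(j)):
            rows.append((i, j, k))

for row in rows:
    print(*row)
```

Collecting the rows in a list rather than printing directly makes it easier to check the result against the expected output.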

Reshaping dataframe in Pandas

Is there a quick pythonic way to transform this table
index = pd.date_range('2000-1-1', periods=36, freq='M')
df = pd.DataFrame(np.random.randn(36,4), index=index, columns=list('ABCD'))
In[1]: df
Out[1]:
A B C D
2000-01-31 H 1.368795 0.106294 2.108814
2000-02-29 -1.713401 0.557224 0.115956 -0.851140
2000-03-31 -1.454967 -0.791855 -0.461738 -0.410948
2000-04-30 1.688731 -0.216432 -0.690103 -0.319443
2000-05-31 -1.103961 0.181510 -0.600383 -0.164744
2000-06-30 0.216871 -1.018599 0.731617 -0.721986
2000-07-31 0.621375 0.790072 0.967000 1.347533
2000-08-31 0.588970 -0.360169 0.904809 0.606771
...
into this table
2001 2000
12 11 10 9 8 7 6 5 4 3 2 1 12 11 10 9 8 7 6 5 4 3 2 1
A H
B
C
D
Please excuse the missing values. I added the "H" manually. I hope it is clear what I am looking for.
For easier checking, I've created a dataframe of the same shape but with integers as values.
The core of the solution is pandas.DataFrame.transpose, but you need to use the pair (index.year, index.month) as the new index:
>>> df = pd.DataFrame(np.random.randint(10,size=(36, 4)), index=index, columns=list('ABCD'))
>>> df.set_index(keys=[df.index.year, df.index.month]).transpose()
2000 2001 2002
1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12 1 2 3 4 5 6 7 8 9 10 11 12
A 0 0 8 7 8 0 7 1 5 1 5 4 2 1 9 5 2 0 5 3 6 4 9 3 5 1 7 3 1 7 6 5 6 8 4 1
B 4 9 9 5 2 0 8 0 9 5 2 7 5 6 3 6 8 8 8 8 0 6 3 7 5 9 6 3 9 7 1 4 7 8 3 3
C 3 2 4 3 1 9 7 6 9 6 8 6 3 5 3 2 2 1 3 1 1 2 8 2 2 6 9 6 1 5 6 5 4 6 7 5
D 8 1 3 9 2 3 8 7 3 2 1 0 1 3 9 1 8 6 4 7 4 6 3 2 9 8 9 9 0 7 4 7 3 6 5 2
Of course, this will not work properly if you have more than one record per year+month. In this case you need to group your data first:
>>> i = pd.date_range('2000-1-1', periods=36, freq='W') # weekly index
>>> df = pd.DataFrame(np.random.randint(10,size=(36, 4)), index=i, columns=list('ABCD'))
>>> df.groupby(by=[df.index.year, df.index.month]).sum().transpose()
2000
1 2 3 4 5 6 7 8 9
A 12 13 15 23 9 21 21 31 7
B 33 24 19 30 15 19 20 7 4
C 20 24 26 24 15 18 29 17 4
D 23 29 14 30 19 12 12 11 5
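The table in the question lists the newest year and month first. A minimal sketch of how that ordering could be obtained, assuming a descending sort on both column levels is what is wanted; the deterministic sample data and the name wide are only for illustration.

```python
import numpy as np
import pandas as pd

# Two years of monthly timestamps with predictable integer values.
index = pd.to_datetime(
    [f"{y}-{m:02d}-01" for y in (2000, 2001) for m in range(1, 13)]
)
df = pd.DataFrame(np.arange(96).reshape(24, 4), index=index, columns=list("ABCD"))

# Replace the datetime index with (year, month) and transpose, as above.
wide = df.set_index([df.index.year, df.index.month]).transpose()

# Sort the (year, month) column levels in descending order.
wide = wide.sort_index(axis=1, ascending=False)
print(wide)
```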

create categorical variables by condition in python with pandas or statsmodels

I want to create categorical variables from my data with this method:
cat.var condition
1 x > 10
2 x == 10
3 x < 10
I tried using the C() method from patsy, but it doesn't work. I know that in Stata I would use the code below, but after searching I couldn't find a clean way to do this in Python:
generate mpg3 = .
(74 missing values generated)
replace mpg3 = 1 if (mpg <= 18)
(27 real changes made)
replace mpg3 = 2 if (mpg >= 19) & (mpg <=23)
(24 real changes made)
replace mpg3 = 3 if (mpg >= 24) & (mpg <.)
(23 real changes made)
You can do it this way (we will do it just for column a):
In [36]: df
Out[36]:
a b c
0 10 12 6
1 12 8 8
2 10 5 8
3 14 7 7
4 7 12 11
5 14 11 8
6 7 7 14
7 11 9 11
8 5 14 9
9 9 12 9
10 7 8 8
11 13 9 8
12 13 14 6
13 9 7 13
14 12 7 5
15 6 9 8
16 6 12 12
17 7 12 13
18 7 7 6
19 8 13 9
df.loc[df.a < 10, 'a'] = 3
df.loc[df.a == 10, 'a'] = 2
df.loc[df.a > 10, 'a'] = 1
In [40]: df
Out[40]:
a b c
0 2 12 6
1 1 8 8
2 2 5 8
3 1 7 7
4 3 12 11
5 1 11 8
6 3 7 14
7 1 9 11
8 3 14 9
9 3 12 9
10 3 8 8
11 1 9 8
12 1 14 6
13 3 7 13
14 1 7 5
15 3 9 8
16 3 12 12
17 3 12 13
18 3 7 6
19 3 13 9
In [41]: df.a = df.a.astype('category')
In [42]: df.dtypes
Out[42]:
a category
b int32
c int32
dtype: object
I'm using this df as a sample.
>>> df
A
0 3
1 13
2 10
3 31
You could use .loc like this (.ix has been removed from pandas):
df['CAT'] = np.nan
df.loc[df.A > 10, 'CAT'] = 1
df.loc[df.A == 10, 'CAT'] = 2
df.loc[df.A < 10, 'CAT'] = 3
Or define a function to do the job, like this:
def do_the_job(x):
    ret = 3
    if (x > 10):
        ret = 1
    elif (x == 10):
        ret = 2
    return ret
And finally run this over the right Series in your df, like this:
>>> df['CAT'] = df.A.apply(do_the_job)
>>> df
A CAT
0 3 3
1 13 1
2 10 2
3 31 1
I hope this helps!
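Another option, sketched here as an alternative rather than as part of either answer above, is numpy.select, which maps a list of conditions to a list of values in one step. The sample frame is the small one used above; the names conditions and codes are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [3, 13, 10, 31]})

# Conditions are checked in order; each row takes the code of the first match.
conditions = [df['A'] > 10, df['A'] == 10, df['A'] < 10]
codes = [1, 2, 3]

# Wrap the result in a Categorical so the column gets the category dtype.
df['CAT'] = pd.Categorical(np.select(conditions, codes), categories=codes)
print(df)
```

Rows matching none of the conditions (for example missing values) would get the default value 0 from np.select, which is not one of the categories and so becomes NaN in the Categorical.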
