UPDATE:
Added the required pattern, as asked.
I have two lists, and the expected output is different from last time:
Numberset1 = [10,11,12]
Numberset2 = [1,2,3,4,5]
I want to produce output by combining the lists. The expected output is:
10 1 1
10 1 2
10 1 3
10 1 4
10 1 5
10 2 2
10 2 3
10 2 4
10 2 5
10 2 1
10 3 3
10 3 4
10 3 5
10 3 1
10 3 2
10 4 4
10 4 5
10 4 1
10 4 2
10 4 3
10 5 5
10 5 1
10 5 2
10 5 3
10 5 4
11 2 2
11 2 3
11 2 4
11 2 5
11 2 1
11 3 3
11 3 4
11 3 5
11 3 1
11 3 2
11 4 4
11 4 5
11 4 1
11 4 2
11 4 3
11 5 5
11 5 1
11 5 2
11 5 3
11 5 4
11 1 1
11 1 2
11 1 3
11 1 4
11 1 5
12 3 3
12 3 4
12 3 5
12 3 1
12 3 2
12 4 4
12 4 5
12 4 1
12 4 2
12 4 3
12 5 5
12 5 1
12 5 2
12 5 3
12 5 4
12 1 1
12 1 2
12 1 3
12 1 4
12 1 5
12 2 2
12 2 3
12 2 4
12 2 5
12 2 1
The code I have tried is below. It was suggested in a previous question, and I tried to extend it to one more level of looping, but I could not get the desired output:
Numberset1 = [10,11,12]
Numberset2 = [1,2,3,4,5]
from itertools import cycle, islice
it = cycle(Numberset2)
for i in Numberset1:
    for a in Numberset2:
        for j in islice(it, len(Numberset2)):
            print(i, a,j)
        skipped1 = next(it)
    skipped1 = next(it)
The output I am getting is:
10 1 1
10 1 2
10 1 3
10 1 4
10 1 5
10 2 2
10 2 3
10 2 4
10 2 5
10 2 1
10 3 3
10 3 4
10 3 5
10 3 1
10 3 2
10 4 4
10 4 5
10 4 1
10 4 2
10 4 3
10 5 5
10 5 1
10 5 2
10 5 3
10 5 4
11 1 2
11 1 3
11 1 4
11 1 5
11 1 1
11 2 3
11 2 4
11 2 5
11 2 1
11 2 2
11 3 4
11 3 5
11 3 1
11 3 2
11 3 3
11 4 5
11 4 1
11 4 2
11 4 3
11 4 4
11 5 1
11 5 2
11 5 3
11 5 4
11 5 5
12 1 3
12 1 4
12 1 5
12 1 1
12 1 2
12 2 4
12 2 5
12 2 1
12 2 2
12 2 3
12 3 5
12 3 1
12 3 2
12 3 3
12 3 4
12 4 1
12 4 2
12 4 3
12 4 4
12 4 5
12 5 2
12 5 3
12 5 4
12 5 5
12 5 1
Please note how this differs from the expected output once the first column reaches 11.
How can we use cycle and islice for multiple levels of looping?
Pattern:
The first column follows the order of the numbers in Numberset1. For the first number in Numberset1, the second column follows the order of the numbers in Numberset2. The third column also cycles through Numberset2, but it starts from the current value of the second column: when the second column changes to 2, the third column prints 2, 3, 4, 5, 1, and so on. As the expected output shows, for each following number in Numberset1 the second column starts one position later in Numberset2.
Here's a version that accomplishes the task using cycle and islice. To make the code cleaner I've created a generator function aligned_cycle which cycles through the items yielded by cycle until we get the one we want to start the current cycle with.
This updated version can cope with Numberset1 having greater length than Numberset2.
from itertools import cycle, islice
def aligned_cycle(seq, start_item):
    ''' Make a generator that cycles over the items in `seq`.
        The first item yielded equals `start_item`.
    '''
    if start_item not in seq:
        raise ValueError("{} not in {}".format(start_item, seq))
    it = cycle(seq)
    for u in it:
        if u == start_item:
            break
    yield u
    yield from it
Numberset1 = [10, 11, 12]
Numberset2 = [1, 2, 3, 4, 5]
cycle_length = len(Numberset2)
for i, u in zip(Numberset1, cycle(Numberset2)):
    for j in islice(aligned_cycle(Numberset2, u), cycle_length):
        for k in islice(aligned_cycle(Numberset2, j), cycle_length):
            print(i, j, k)
output
10 1 1
10 1 2
10 1 3
10 1 4
10 1 5
10 2 2
10 2 3
10 2 4
10 2 5
10 2 1
10 3 3
10 3 4
10 3 5
10 3 1
10 3 2
10 4 4
10 4 5
10 4 1
10 4 2
10 4 3
10 5 5
10 5 1
10 5 2
10 5 3
10 5 4
11 2 2
11 2 3
11 2 4
11 2 5
11 2 1
11 3 3
11 3 4
11 3 5
11 3 1
11 3 2
11 4 4
11 4 5
11 4 1
11 4 2
11 4 3
11 5 5
11 5 1
11 5 2
11 5 3
11 5 4
11 1 1
11 1 2
11 1 3
11 1 4
11 1 5
12 3 3
12 3 4
12 3 5
12 3 1
12 3 2
12 4 4
12 4 5
12 4 1
12 4 2
12 4 3
12 5 5
12 5 1
12 5 2
12 5 3
12 5 4
12 1 1
12 1 2
12 1 3
12 1 4
12 1 5
12 2 2
12 2 3
12 2 4
12 2 5
12 2 1
Jon Clements has written a more robust and more efficient version of aligned_cycle:
from itertools import cycle, tee
def aligned_cycle(iterable, start_item):
    a, b = tee(iterable)
    b = cycle(b)
    for u, v in zip(a, b):
        if u == start_item:
            break
    else:
        # start_item never appeared in the iterable: yield nothing
        return
    yield u
    yield from b
Thanks, Jon!
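For comparison, here is a sketch that produces the same rows with index arithmetic and list slicing instead of cycle and islice. The helper names rotated and triples are made up for this example. It assumes Numberset2 is a list (something sliceable), and unlike aligned_cycle it tracks positions rather than values, so duplicate values in Numberset2 would behave differently.

```python
def rotated(seq, start_index):
    """Return a copy of seq that starts at start_index and wraps around."""
    k = start_index % len(seq)
    return seq[k:] + seq[:k]


def triples(outer, inner):
    """Yield (i, j, k) rows in the order described in the question."""
    for n, i in enumerate(outer):
        # The second column starts one position later for each outer item.
        for j_pos in range(len(inner)):
            j_index = (n + j_pos) % len(inner)
            j = inner[j_index]
            # The third column starts at the current second-column item.
            for k in rotated(inner, j_index):
                yield i, j, k


Numberset1 = [10, 11, 12]
Numberset2 = [1, 2, 3, 4, 5]
result = list(triples(Numberset1, Numberset2))

for row in result[:6]:
    print(*row)
```

Because the outer offset is taken modulo len(inner), this also copes with Numberset1 being longer than Numberset2.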
I have a data set, redacted sample below. My goal is linear regression. My question is: Have I created unintended results, due to how I structured the df, using concat and/or div?
For example, predicting:
(2nd time rating) minus (base time rating) with
the ratio of (percent #1) over (percent #2 ).
From the df below:
((4 wk nprs rating)-(base nprs rating))
with independent variable: ((active modalities)/(passive modalities))
I've created dataframes, in hopes of efficiency, and run OLS, all below.
Thank you for your insight.
What I've tried:
import pandas as pd
ctr = pd.read_csv('file_path/CTrial Data.csv') #CTR trial
ctr = ctr.apply(pd.to_numeric, errors='coerce') #converting all to numeric, except NaN where can't
ctr = ctr.fillna(0) #replacing NaN with 0's, dropping results in too much loss
Now the difference in pain perception over some time periods.
#OSWESTRY Pain Scale (lower back function)
#THIS scale is 0-50
#using absolute value leads to more than max, eg, 4-(-4)=8, not using abs results in negative values
ctr['Trial_4wk_diff'] = (ctr['osw_4wk'] - ctr['osw_base']).abs() #calculating the difference
ctr['Trial_12wk_diff'] = (ctr['osw_12wk'] - ctr['osw_base']).abs()
ctr['Trial_1yr_diff'] = (ctr['osw_1yr'] - ctr['osw_base']).abs()
Grouping the treating modalities by "active" and "passive" then
calculating the ratio. These ratings are specifically the perceived
benefit of each modality.
#perceived benefit of active treatment modalities over perceived benefit of passive modalities
ctr['active'] = pd.concat((ctr['perceived_ben_aerobic_ex'],ctr['perceived_ben_strength_ex'],ctr['perceived_ben_rom_ex']),ignore_index=True)
# 0 NaN
ctr['passive'] = pd.concat((ctr['perceived_ben_meds'],ctr['perceived_ben_rest'], ctr['perceived_ben_surgery'],ctr['perceived_ben_massage'],ctr['perceived_ben_manip'],ctr['perceived_ben_traction'],ctr['perceived_ben_work_restrict']), ignore_index = True)
# 0 NaN
ctr['mods'] = ctr['active'].div(ctr['passive']) #ratio of active vs passive perceived benefit #Here I simply used .div()
ctr['mods'] = add_constant(ctr['mods'].fillna(0))
#ctr['mods'].isna().sum()
#4
ctr['mods'] = ctr['mods'].fillna(0) #these steps remove the NaN
#ctr['mods'].isna().sum()
#0
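On the concat question specifically, here is a minimal sketch with a toy frame. The column names x and y and their values are invented for illustration and are not taken from the trial data. pd.concat along the default axis with ignore_index=True stacks the columns end to end, so the result is longer than the frame, and assigning it back aligns on the index and keeps only the first block. If the intent is one score per row, a row-wise aggregate such as mean(axis=1) is one option; whether a mean is the right aggregate for these ratings is an assumption here.

```python
import pandas as pd

# Toy data standing in for two "perceived benefit" columns (hypothetical values).
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# Stacks x on top of y: six values with index 0..5.
stacked = pd.concat((df["x"], df["y"]), ignore_index=True)

# Column assignment aligns on the index, so only rows 0..2 (the x values) survive.
df["stacked"] = stacked

# One row-wise alternative: average the columns within each row.
df["row_mean"] = df[["x", "y"]].mean(axis=1)

print(len(stacked))
print(df["stacked"].tolist())
print(df["row_mean"].tolist())
```

In this toy case the stacked column ends up identical to x, which suggests checking whether ctr['active'] is simply a copy of the first column passed to concat.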
Now running OLS.
#OSW 4 week difference with average ratio
import statsmodels.api as sm
from statsmodels.tools.tools import add_constant
X = ctr['mods'] #modality
y = ctr['nprs_4wk'].abs() #pain scale
model = sm.OLS(y, X).fit()
predictions = model.predict(X)
model.summary()
output
OLS Regression Results
Dep. Variable:              nprs_4wk   R-squared:               -0.000
Model:                           OLS   Adj. R-squared:          -0.000
Method:                Least Squares   F-statistic:                nan
Date:               Thu, 26 Jan 2023   Prob (F-statistic):         nan
Time:                       11:14:45   Log-Likelihood:         -260.12
No. Observations:                119   AIC:                      522.2
Df Residuals:                    118   BIC:                      525.0
Df Model:                          0
Covariance Type:           nonrobust
                coef   std err         t     P>|t|    [0.025    0.975]
active_avg    0.7351     0.059    12.549     0.000     0.619     0.851
Omnibus:                      12.434   Durbin-Watson:            2.044
Prob(Omnibus):                 0.002   Jarque-Bera (JB):        14.152
Skew:                          0.833   Prob(JB):              0.000845
Kurtosis:                      2.718   Cond. No.                  1.00
>>> fig, ax = plt.subplots()
>>> fig = sm.graphics.plot_fit(model, 0, ax=ax)
>>> ax.set_ylabel("NPRS 4 wk diff")
>>> ax.set_xlabel("ratio of perceived ben active vs passive")
>>> ax.set_title("4 wk pain scale vs perceived ben ratio")
Data below, with original missing values:
pb_meds
pb_rest
pb_surgery
pb_massage
pb_manip
pb_traction
pb_aerobic_ex
pb_rom_ex
pb_strength_ex
pb_work_restrict
osw_base
osw_4wk
osw_12wk
osw_1yr
4
4
1
4
4
3
4
4
4
4
14
7
19
5
2
4
1
4
4
2
1
2
2
5
35
13
0
0
4
4
4
3
3
3
2
2
2
3
18
19
14
18
1
3
1
5
5
5
5
5
5
1
13
6
8
14
4
5
1
5
3
1
3
5
3
4
18
16
0
0
3
2
1
4
4
5
4
4
4
1
11
8
8
5
4
4
2
4
1
3
2
4
4
2
10
3
3
0
4
4
3
3
2
2
2
4
5
3
15
15
8
4
4
2
1
4
4
2
2
4
4
2
15
9
4
6
4
4
1
4
3
3
3
4
4
2
10
15
9
3
3
3
3
4
4
3
5
4
4
1
10
7
2
0
4
3
4
5
4
5
4
4
10
11
22
17
4
3
1
3
4
4
4
4
5
4
19
24
16
16
4
3
1
4
4
4
4
4
4
4
22
16
17
22
4
4
3
4
4
3
3
3
4
4
16
9
9
0
3
4
2
4
1
3
3
4
4
3
19
17
24
27
2
4
1
5
4
2
3
3
4
2
10
0
0
0
3
4
2
4
3
4
3
3
4
3
18
11
16
20
5
5
1
1
1
1
1
1
1
5
28
0
0
0
3
4
1
5
2
1
3
3
3
5
23
21
15
14
3
3
1
4
3
3
3
3
3
2
15
6
1
6
3
5
2
5
3
3
4
5
3
5
24
30
22
13
4
4
1
4
3
3
4
5
4
10
0
10
8
1
5
1
5
1
1
1
4
4
4
12
16
14
20
1
4
3
2
3
3
4
2
2
4
14
13
16
11
3
3
3
2
3
3
1
3
3
3
22
0
0
22
4
4
3
3
3
3
2
2
3
3
13
22
15
22
4
4
1
4
3
3
2
4
2
5
15
6
2
2
5
5
3
4
1
3
2
3
3
5
27
0
0
0
3
1
3
4
4
3
1
1
1
1
15
15
16
14
3
4
3
4
4
3
2
3
2
4
25
9
0
0
4
4
3
3
3
3
5
5
5
5
33
26
22
19
4
4
3
4
4
3
3
4
4
4
10
0
0
0
4
4
1
3
1
1
3
3
5
4
18
12
10
15
4
3
4
4
4
3
3
3
4
4
17
14
8
18
4
4
4
4
4
4
4
4
4
4
12
9
12
13
3
3
2
2
2
3
3
3
3
10
13
0
13
4
4
1
4
4
1
4
4
4
4
12
11
8
9
4
4
2
4
4
3
3
4
3
3
10
11
5
11
4
4
5
5
5
3
3
3
3
5
18
18
13
0
4
5
3
5
3
2
5
5
5
4
17
11
10
19
3
4
5
3
1
3
3
2
4
5
17
16
28
28
4
5
1
4
1
1
4
4
4
4
12
11
0
0
4
4
4
3
3
2
3
4
4
5
18
16
16
0
3
3
3
4
5
3
4
5
5
1
14
9
9
10
2
4
3
2
3
4
3
4
3
3
24
0
17
25
4
4
3
4
4
3
4
4
4
1
32
24
26
20
4
4
3
4
2
3
3
2
3
4
18
0
0
0
4
3
2
5
5
2
4
4
3
19
11
10
5
4
4
1
4
1
1
4
4
4
3
26
0
11
0
5
5
3
3
5
5
5
5
5
1
15
15
13
11
4
4
3
4
18
8
0
0
4
1
1
4
3
3
4
4
4
4
15
18
18
0
5
5
1
5
4
3
4
4
5
18
0
4
17
4
4
3
4
3
3
2
4
4
3
10
6
11
7
4
3
3
4
4
4
4
4
4
4
12
0
15
12
4
4
3
5
4
3
4
4
4
13
11
0
0
1
2
3
5
5
3
5
5
5
4
14
0
5
0
3
3
3
3
2
3
2
4
4
3
18
8
10
15
4
3
3
4
4
3
2
4
3
3
18
15
11
12
3
4
1
5
4
4
2
5
5
4
12
7
19
14
3
4
1
5
4
3
2
3
2
3
11
0
20
0
2
4
1
3
4
4
4
4
4
5
11
3
11
4
3
2
1
4
1
3
4
4
4
2
12
12
9
15
2
5
1
4
4
3
4
5
5
1
10
0
3
4
4
4
3
5
3
3
4
4
4
2
14
0
8
14
2
3
2
4
3
3
3
3
3
4
21
19
24
25
4
5
1
5
5
3
4
5
4
4
19
18
18
25
4
3
1
4
4
3
4
5
5
4
10
0
0
0
3
3
3
4
4
3
4
4
4
3
22
4
1
0
3
4
2
4
4
3
3
4
3
4
18
5
3
0
4
4
3
3
3
3
1
3
3
4
12
20
18
10
4
4
2
4
4
4
4
4
4
4
21
18
23
0
4
4
3
4
2
3
1
2
4
5
13
12
0
7
4
4
1
4
4
3
4
4
4
3
23
24
24
23
4
4
1
4
2
1
5
5
5
3
13
9
0
18
1
1
3
2
5
1
1
4
4
3
13
6
0
0
4
5
1
5
5
5
4
5
5
3
18
1
2
1
4
4
3
4
3
3
3
4
3
3
22
31
0
0
3
3
3
3
3
3
3
3
3
3
23
5
5
10
4
2
1
5
4
1
4
4
4
4
12
5
2
2
5
5
5
3
3
3
4
4
4
4
23
8
2
3
4
4
4
3
4
3
3
3
3
5
12
6
0
0
3
2
3
3
3
3
2
3
3
2
14
11
9
0
3
4
4
2
3
4
4
4
4
16
15
10
9
3
1
1
3
3
3
1
2
2
3
17
16
23
0
4
4
2
4
4
3
4
4
4
3
11
8
3
14
12
5
4
4
1
1
3
3
3
1
3
3
3
16
2
4
7
5
4
5
4
3
3
3
4
5
4
12
11
19
12
30
4
0
0
4
3
3
4
3
3
2
2
4
3
21
1
9
8
11
13
19
15
3
3
2
3
2
2
3
3
4
3
16
19
26
30
4
4
5
3
5
5
3
4
4
4
35
17
4
12
3
3
2
3
3
3
3
3
3
16
13
10
23
4
5
4
4
5
5
15
8
0
0
4
4
1
4
4
1
1
4
5
3
16
20
4
15
3
4
3
4
3
3
3
3
3
3
18
17
18
0
4
4
3
2
2
3
3
4
4
4
17
5
1
1
3
5
4
3
2
3
1
1
1
4
25
18
29
22
4
3
2
5
2
2
5
5
5
1
10
8
6
0
2
2
4
5
4
5
5
5
4
11
0
0
0
3
2
1
5
5
3
4
4
5
3
10
7
4
15
18
22
19
0
4
4
4
2
4
4
2
3
4
5
12
0
9
0
4
5
3
4
3
3
3
4
4
5
12
11
0
0
2
4
2
4
5
3
3
4
3
4
11
20
14
0
4
4
4
4
4
4
4
4
4
4
21
14
3
0
4
4
3
3
3
3
4
4
3
5
14
20
2
0
4
4
1
1
1
1
1
3
3
3
11
4
3
5
3
3
3
3
3
3
3
3
3
3
25
0
0
0
5
5
3
5
5
3
5
5
5
5
13
16
9
0
4
4
3
4
4
5
4
5
5
3
18
8
6
0
3
3
3
3
3
3
3
3
3
3
16
11
0
0
3
3
3
3
4
4
4
4
5
2
10
5
7
2
3
4
2
4
3
3
4
4
4
3
12
9
8
0
4
5
3
3
2
2
4
4
4
5
26
17
16
0
4
4
1
4
3
3
4
4
4
4
14
16
0
Suppose I have the following dataframe
import pandas as pd
df = pd.DataFrame({'a': [1,1,1,2,2,2,2,2,3,3,3,3,4,4,4,4,4,4],
                   'b': [3,4,3,7,5,9,4,2,5,6,7,8,4,2,4,5,8,0]})
a b
0 1 3
1 1 4
2 1 3
3 2 7
4 2 5
5 2 9
6 2 4
7 2 2
8 3 5
9 3 6
10 3 7
11 3 8
12 4 4
13 4 2
14 4 4
15 4 5
16 4 8
17 4 0
I would like to make a new column c with values 1 to n, where the count restarts each time the value of column a changes, as follows:
a b c
0 1 3 1
1 1 4 2
2 1 3 3
3 2 7 1
4 2 5 2
5 2 9 3
6 2 4 4
7 2 2 5
8 3 5 1
9 3 6 2
10 3 7 3
11 3 8 4
12 4 4 1
13 4 2 2
14 4 4 3
15 4 5 4
16 4 8 5
17 4 0 6
While I can write it using a for loop, my data frame is huge and that is computationally costly. Is there an efficient way to generate such a column? Thanks.
Use groupby followed by cumcount:
df['c'] = df.groupby('a').cumcount().add(1)
print(df)
# Output
a b c
0 1 3 1
1 1 4 2
2 1 3 3
3 2 7 1
4 2 5 2
5 2 9 3
6 2 4 4
7 2 2 5
8 3 5 1
9 3 6 2
10 3 7 3
11 3 8 4
12 4 4 1
13 4 2 2
14 4 4 3
15 4 5 4
16 4 8 5
17 4 0 6
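As a rough picture of what cumcount computes, here is a plain-Python equivalent of the running count. The helper name running_count is made up for this sketch; on a large frame the pandas call is what you would actually use.

```python
from collections import defaultdict


def running_count(keys):
    """Return, for each key, how many times it has been seen so far (1-based)."""
    seen = defaultdict(int)
    counts = []
    for key in keys:
        seen[key] += 1
        counts.append(seen[key])
    return counts


a = [1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4]
c = running_count(a)
print(c)
```

Like cumcount, this counts per key, so it does not require the rows of one group to be adjacent.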
I am trying to conduct a mixed model analysis but would like to only include individuals who have data in all timepoints available. Here is an example of what my dataframe looks like:
import pandas as pd
ids = [1,1,1,1,1,1,2,2,2,2,2,2,3,3,3,4,4,4,4,4,4]
timepoint = [1,2,3,4,5,6,1,2,3,4,5,6,1,2,4,1,2,3,4,5,6]
outcome = [2,3,4,5,6,7,3,4,1,2,3,4,5,4,5,8,4,5,6,2,3]
df = pd.DataFrame({'id':ids,
                   'timepoint':timepoint,
                   'outcome':outcome})
print(df)
id timepoint outcome
0 1 1 2
1 1 2 3
2 1 3 4
3 1 4 5
4 1 5 6
5 1 6 7
6 2 1 3
7 2 2 4
8 2 3 1
9 2 4 2
10 2 5 3
11 2 6 4
12 3 1 5
13 3 2 4
14 3 4 5
15 4 1 8
16 4 2 4
17 4 3 5
18 4 4 6
19 4 5 2
20 4 6 3
I want to only keep individuals in the id column who have all 6 timepoints. I.e. IDs 1, 2, and 4 (and cut out all of ID 3's data).
Here's the ideal output:
id timepoint outcome
0 1 1 2
1 1 2 3
2 1 3 4
3 1 4 5
4 1 5 6
5 1 6 7
6 2 1 3
7 2 2 4
8 2 3 1
9 2 4 2
10 2 5 3
11 2 6 4
12 4 1 8
13 4 2 4
14 4 3 5
15 4 4 6
16 4 5 2
17 4 6 3
Any help much appreciated.
You can count the number of unique timepoints in the data, and then filter your dataframe with transform('nunique') and loc, keeping only the IDs that have all 6 of them:
t = len(set(timepoint))
res = df.loc[df.groupby('id')['timepoint'].transform('nunique').eq(t)]
Prints:
id timepoint outcome
0 1 1 2
1 1 2 3
2 1 3 4
3 1 4 5
4 1 5 6
5 1 6 7
6 2 1 3
7 2 2 4
8 2 3 1
9 2 4 2
10 2 5 3
11 2 6 4
15 4 1 8
16 4 2 4
17 4 3 5
18 4 4 6
19 4 5 2
20 4 6 3
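The same filter, written without pandas to show the logic. This is a sketch, and the names complete_ids and rows are illustrative. It compares each id's set of timepoints with the full set rather than comparing counts, which gives the same answer for this data.

```python
from collections import defaultdict

ids = [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4, 4, 4, 4]
timepoint = [1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 4, 1, 2, 3, 4, 5, 6]
outcome = [2, 3, 4, 5, 6, 7, 3, 4, 1, 2, 3, 4, 5, 4, 5, 8, 4, 5, 6, 2, 3]

# Every timepoint that occurs anywhere in the data.
all_timepoints = set(timepoint)

# Collect the timepoints observed for each id.
seen = defaultdict(set)
for i, t in zip(ids, timepoint):
    seen[i].add(t)

# Keep only ids that were observed at every timepoint.
complete_ids = {i for i, ts in seen.items() if ts == all_timepoints}
rows = [(i, t, o) for i, t, o in zip(ids, timepoint, outcome) if i in complete_ids]

print(sorted(complete_ids))
print(len(rows))
```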
Here is my code, which prints a particular number pattern. I want the number pattern to be in a perfect triangular arrangement:
a = int(input('Enter number: '))
base = a
while base > 0:
    for j in range(1, a + 1):
        print(' ' * (2 * j - 2), end = '')
        for i in range(1, base + 1):
            print(str(i), end = ' ')
        print()
        base -= 1
The output:
Enter number: 5
1 2 3 4 5
  1 2 3 4
    1 2 3
      1 2
        1
Enter number: 7
1 2 3 4 5 6 7
  1 2 3 4 5 6
    1 2 3 4 5
      1 2 3 4
        1 2 3
          1 2
            1
The program works fine for numbers below 10, but when I input a number of 10 or more it gives a distorted pattern.
For example:
Enter number: 15
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  1 2 3 4 5 6 7 8 9 10 11 12 13 14
    1 2 3 4 5 6 7 8 9 10 11 12 13
      1 2 3 4 5 6 7 8 9 10 11 12
        1 2 3 4 5 6 7 8 9 10 11
          1 2 3 4 5 6 7 8 9 10
            1 2 3 4 5 6 7 8 9
              1 2 3 4 5 6 7 8
                1 2 3 4 5 6 7
                  1 2 3 4 5 6
                    1 2 3 4 5
                      1 2 3 4
                        1 2 3
                          1 2
                            1
So is there a way to make the pattern right?
If you want the same result for two-digit numbers, you have to format each number to a fixed width. Each number then occupies three characters (two for the right-aligned number plus the trailing space), so the indentation has to grow by three per row as well, not two. Here is how it also works for two-digit numbers:
a = int(input('Enter number: '))
base = a
while base > 0:
    for j in range(1, a + 1):
        print(' ' * (3 * j - 3), end = '')
        for i in range(1, base + 1):
            print('{0:>2}'.format(str(i)), end = ' ')
        print()
        base -= 1
Result for 15:
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
    1  2  3  4  5  6  7  8  9 10 11 12 13 14
       1  2  3  4  5  6  7  8  9 10 11 12 13
          1  2  3  4  5  6  7  8  9 10 11 12
             1  2  3  4  5  6  7  8  9 10 11
                1  2  3  4  5  6  7  8  9 10
                   1  2  3  4  5  6  7  8  9
                      1  2  3  4  5  6  7  8
                         1  2  3  4  5  6  7
                            1  2  3  4  5  6
                               1  2  3  4  5
                                  1  2  3  4
                                     1  2  3
                                        1  2
                                           1
Some adjustments and str.rjust will do the trick:
a = base = 15
while base > 0:
    for j in range(a):
        print(' ' * 3 * j, end='')
        for i in range(base):
            print(str(i+1).rjust(3), end='')
        print()
        base -= 1
  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
     1  2  3  4  5  6  7  8  9 10 11 12 13 14
        1  2  3  4  5  6  7  8  9 10 11 12 13
           1  2  3  4  5  6  7  8  9 10 11 12
              1  2  3  4  5  6  7  8  9 10 11
                 1  2  3  4  5  6  7  8  9 10
                    1  2  3  4  5  6  7  8  9
                       1  2  3  4  5  6  7  8
                          1  2  3  4  5  6  7
                             1  2  3  4  5  6
                                1  2  3  4  5
                                   1  2  3  4
                                      1  2  3
                                         1  2
                                            1
You might use the rjust method of str. Its documentation says:
Return a right-justified string of length width. Padding is done using
the specified fill character (default is a space).
Simple example usage:
numbers = [1, 10, 10, 1000, 10000]
for n in numbers:
    print(str(n).rjust(5))
Output:
    1
   10
   10
 1000
10000
Note that rjust requires at least one argument, width. If the original str is shorter than width, leading spaces (or another fill character, if specified) are added to produce a str of length width; otherwise the original str is returned unchanged.
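Building on rjust, here is a sketch that derives the field width from the largest number instead of hard-coding 3, so the triangle stays aligned for any input size. The function name triangle_lines and the choice of one separating space are assumptions of this example, not part of the answers above.

```python
def triangle_lines(n):
    """Return the rows of the inverted, right-aligned number triangle for n."""
    # Widest number plus one separating space.
    width = len(str(n)) + 1
    lines = []
    for row in range(n):
        count = n - row
        # Indent by one full field per row so every row ends in the same column.
        indent = ' ' * (width * row)
        body = ''.join(str(i).rjust(width) for i in range(1, count + 1))
        lines.append(indent + body)
    return lines


for line in triangle_lines(12):
    print(line)
```

Indenting by exactly one field width per row is what keeps the right edge straight; any mismatch between the two skews the shape.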
Assume I have a dataframe like this, for example:
    0  1  2  3  4  5  6  7  8  9
0   8  9  2  1  6  2  6  8  6  3
1   1  1  8  3  1  6  3  6  3  9
2   1  4  3  5  9  3  5  9  2  3
3   4  6  3  8  4  3  1  5  1  1
4   1  8  5  3  9  6  1  7  2  2
5   6  6  7  9  1  8  2  3  2  8
6   8  3  6  9  9  5  8  4  7  7
7   8  3  3  8  7  1  4  9  7  2
8   7  6  1  4  8  1  6  9  6  6
9   3  3  2  4  8  1  8  1  1  8
10  7  7  5  7  1  4  1  8  8  6
11  6  3  2  7  6  5  7  4  8  7
I would like to group the rows into "blocks" of a given length and then flatten each block into a single row. For example, if the block length were 3, the result here would be:
    0  1  2  3  4  5  6  7  8  9  10 ...  19  20  21  22  23  24  25  26  27  28  29
2   8  9  2  1  6  2  6  8  6  3   1 ...   9   1   4   3   5   9   3   5   9   2   3
5   4  6  3  8  4  3  1  5  1  1   1 ...   2   6   6   7   9   1   8   2   3   2   8
8   8  3  6  9  9  5  8  4  7  7   8 ...   2   7   6   1   4   8   1   6   9   6   6
11  3  3  2  4  8  1  8  1  1  8   7 ...   6   6   3   2   7   6   5   7   4   8   7
How can I achieve this?
I think you need reshape:
n_blocks = 3
df = pd.DataFrame(df.values.reshape(-1, n_blocks * df.shape[1]))
print(df)
   0  1  2  3  4  5  6  7  8  9 ...  20  21  22  23  24  25  26  27  \
0  8  9  2  1  6  2  6  8  6  3 ...   1   4   3   5   9   3   5   9
1  4  6  3  8  4  3  1  5  1  1 ...   6   6   7   9   1   8   2   3
2  8  3  6  9  9  5  8  4  7  7 ...   7   6   1   4   8   1   6   9
3  3  3  2  4  8  1  8  1  1  8 ...   6   3   2   7   6   5   7   4
   28  29
0   2   3
1   2   8
2   6   6
3   8   7
[4 rows x 30 columns]
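To make the reshape idea concrete without pandas, here is a plain-Python sketch of the same grouping. The function name flatten_blocks and the small input are invented for illustration. It also makes explicit a condition that reshape enforces too: the number of rows must be a multiple of the block length.

```python
def flatten_blocks(rows, block_len):
    """Concatenate every block_len consecutive rows into one long row."""
    if len(rows) % block_len != 0:
        raise ValueError("number of rows must be a multiple of block_len")
    return [
        # Join the values of the rows in this block, in their original order.
        [value for row in rows[start:start + block_len] for value in row]
        for start in range(0, len(rows), block_len)
    ]


rows = [[1, 2], [3, 4], [5, 6], [7, 8]]
flat = flatten_blocks(rows, 2)
print(flat)
```

If the row count is not a multiple of the block length, one option is to drop or pad the trailing rows before flattening; which of the two is appropriate depends on the data.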
I found this solution; maybe someone can come up with a better one:
def toBlocks(df, blocklen):
    shifted = [df.shift(periods=p) for p in range(blocklen)]
    return pd.concat(shifted, axis=1)[blocklen-1:]