Pandas: row by row operations on multiple columns - python

I have two scalar variables, A and B, and a DataFrame df1 with 1000 columns and 86,400 rows. The table below shows just 10 columns for simplicity:
0 1 2 3 4 5 6 7 8 9 f
0 4.000000 23.000000 6.000000 36.000000 37.000000 33.000000 22.000000 28.000000 8.000000 14.000000 50.135
1 4.002361 23.002361 6.002361 36.002361 37.002361 33.002361 22.002361 28.002361 8.002361 14.002361 50.130
2 4.004722 23.004722 6.004722 36.004722 37.004722 33.004722 22.004722 28.004722 8.004722 14.004722 50.120
3 4.007083 23.007083 6.007083 36.007083 37.007083 33.007083 22.007083 28.007083 8.007083 14.007083 50.112
4 4.009444 23.009444 6.009444 36.009444 37.009444 33.009444 22.009444 28.009444 8.009444 14.009444 50.102
5 4.011806 23.011806 6.011806 36.011806 37.011806 33.011806 22.011806 28.011806 8.011806 14.011806 50.097
... ... ... ... ... ... ... ... ... ... ... ...
86387 207.969306 226.969306 209.969306 239.969306 240.969306 236.969306 225.969306 231.969306 211.969306 217.969306 49.920
86388 207.971667 226.971667 209.971667 239.971667 240.971667 236.971667 225.971667 231.971667 211.971667 217.971667 49.920
86389 207.974028 226.974028 209.974028 239.974028 240.974028 236.974028 225.974028 231.974028 211.974028 217.974028 49.920
86390 207.976389 226.976389 209.976389 239.976389 240.976389 236.976389 225.976389 231.976389 211.976389 217.976389 49.920
86391 207.978750 226.978750 209.978750 239.978750 240.978750 236.978750 225.978750 231.978750 211.978750 217.978750 49.917
86392 207.981111 226.981111 209.981111 239.981111 240.981111 236.981111 225.981111 231.981111 211.981111 217.981111 49.917
86393 207.983472 226.983472 209.983472 239.983472 240.983472 236.983472 225.983472 231.983472 211.983472 217.983472 49.915
86394 207.985833 226.985833 209.985833 239.985833 240.985833 236.985833 225.985833 231.985833 211.985833 217.985833 49.915
86395 207.988194 226.988194 209.988194 239.988194 240.988194 236.988194 225.988194 231.988194 211.988194 217.988194 49.915
86396 207.990556 226.990556 209.990556 239.990556 240.990556 236.990556 225.990556 231.990556 211.990556 217.990556 49.912
86397 207.992917 226.992917 209.992917 239.992917 240.992917 236.992917 225.992917 231.992917 211.992917 217.992917 49.915
86398 207.995278 226.995278 209.995278 239.995278 240.995278 236.995278 225.995278 231.995278 211.995278 217.995278 49.917
86399 207.997639 226.997639 209.997639 239.997639 240.997639 236.997639 225.997639 231.997639 211.997639 217.997639 49.917
I would like to perform a row-by-row operation:
when f > 50: add C = A/B/3600 to the values in columns 1-999;
when f < 50: subtract C = A/B/3600 from the values in columns 1-999.
cols = df1.columns[df1.columns.isin(range(0, 999))]
df1[cols] = np.where(df1[cols] > 50,
                     df1[cols].values - np.arange(len(df1))[:, None] * C,
                     df1[cols].values + np.arange(len(df1))[:, None] * C)
As can be seen, the value keeps increasing even when f < 50.
Any suggestions?
Thank you in advance

Use numpy.where and add or subtract numpy.arange by condition if performance is important:
cols = df.columns[df.columns.isin(range(1, 1000))]
df[cols] = np.where(df[cols] > 50,
                    df[cols].values - np.arange(len(df))[:, None],
                    df[cols].values + np.arange(len(df))[:, None])
print (df)
0 1 2 3 4 5 6 7 8 9 991 992 993 994 995 996 \
0 18 8 9 5 38 11 26 25 2 30 23 18 34 1 29 34
1 18 9 10 6 39 12 27 26 3 31 24 19 35 2 30 35
2 18 10 11 7 40 13 28 27 4 32 25 20 36 3 31 36
3 18 11 12 8 41 14 29 28 5 33 26 21 37 4 32 37
4 18 12 13 9 42 15 30 29 6 34 27 22 38 5 33 38
5 18 13 14 10 43 16 31 30 7 35 28 23 39 6 34 39
6 18 14 15 11 44 17 32 31 8 36 29 24 40 7 35 40
86393 18 15 16 12 45 18 33 32 9 37 30 25 41 8 36 41
86394 18 16 17 13 46 19 34 33 10 38 31 26 42 9 37 42
86395 18 17 18 14 47 20 35 34 11 39 32 27 43 10 38 43
86396 18 18 19 15 48 21 36 35 12 40 33 28 44 11 39 44
86397 18 19 20 16 49 22 37 36 13 41 34 29 45 12 40 45
86398 18 20 21 17 50 23 38 37 14 42 35 30 46 13 41 46
86399 18 21 22 18 51 24 39 38 15 43 36 31 47 14 42 47
86400 18 22 23 19 52 25 40 39 16 44 37 32 48 15 43 48
997 998 999 f
0 25 15 2 50.135
1 26 16 3 50.130
2 27 17 4 50.120
3 28 18 5 50.112
4 29 19 6 50.102
5 30 20 7 50.097
6 31 21 8 50.095
86393 32 22 9 49.915
86394 33 23 10 49.915
86395 34 24 11 49.915
86396 35 25 12 49.912
86397 36 26 13 49.915
86398 37 27 14 49.917
86399 38 28 15 49.917
86400 39 29 16 49.915
EDIT:
A = 360000
B = 5
C = A/B/3600

cols = df.columns[df.columns.isin(range(1, 1000))]
mask = df[cols] > 50
df[cols] = np.where(mask,
                    df[cols].values - mask.cumsum().sub(1) * C,
                    df[cols].values + (~mask).cumsum().sub(1) * C)
print (df)
0 1 2 3 4 5 \
0 4.000000 23.000000 6.000000 36.000000 37.000000 33.000000
1 4.002361 43.002361 26.002361 56.002361 57.002361 53.002361
2 4.004722 63.004722 46.004722 76.004722 77.004722 73.004722
3 4.007083 83.007083 66.007083 96.007083 97.007083 93.007083
4 4.009444 103.009444 86.009444 116.009444 117.009444 113.009444
5 4.011806 123.011806 106.011806 136.011806 137.011806 133.011806
86387 207.969306 226.969306 209.969306 239.969306 240.969306 236.969306
86388 207.971667 206.971667 189.971667 219.971667 220.971667 216.971667
86389 207.974028 186.974028 169.974028 199.974028 200.974028 196.974028
86390 207.976389 166.976389 149.976389 179.976389 180.976389 176.976389
86391 207.978750 146.978750 129.978750 159.978750 160.978750 156.978750
86392 207.981111 126.981111 109.981111 139.981111 140.981111 136.981111
86393 207.983472 106.983472 89.983472 119.983472 120.983472 116.983472
86394 207.985833 86.985833 69.985833 99.985833 100.985833 96.985833
86395 207.988194 66.988194 49.988194 79.988194 80.988194 76.988194
86396 207.990556 46.990556 29.990556 59.990556 60.990556 56.990556
86397 207.992917 26.992917 9.992917 39.992917 40.992917 36.992917
86398 207.995278 6.995278 -10.004722 19.995278 20.995278 16.995278
86399 207.997639 -13.002361 -30.002361 -0.002361 0.997639 -3.002361
6 7 8 9 f
0 22.000000 28.000000 8.000000 14.000000 50.135
1 42.002361 48.002361 28.002361 34.002361 50.130
2 62.004722 68.004722 48.004722 54.004722 50.120
3 82.007083 88.007083 68.007083 74.007083 50.112
4 102.009444 108.009444 88.009444 94.009444 50.102
5 122.011806 128.011806 108.011806 114.011806 50.097
86387 225.969306 231.969306 211.969306 217.969306 49.920
86388 205.971667 211.971667 191.971667 197.971667 49.920
86389 185.974028 191.974028 171.974028 177.974028 49.920
86390 165.976389 171.976389 151.976389 157.976389 49.920
86391 145.978750 151.978750 131.978750 137.978750 49.917
86392 125.981111 131.981111 111.981111 117.981111 49.917
86393 105.983472 111.983472 91.983472 97.983472 49.915
86394 85.985833 91.985833 71.985833 77.985833 49.915
86395 65.988194 71.988194 51.988194 57.988194 49.915
86396 45.990556 51.990556 31.990556 37.990556 49.912
86397 25.992917 31.992917 11.992917 17.992917 49.915
86398 5.995278 11.995278 -8.004722 -2.004722 49.917
86399 -14.002361 -8.002361 -28.002361 -22.002361 49.917
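Note that the code above conditions on the data columns themselves, while the question's stated rule conditions on f. A minimal sketch of that variant, using toy data and the scalars assumed from the EDIT:

```python
import numpy as np
import pandas as pd

# Toy frame; the column layout and values are assumed for illustration only.
df = pd.DataFrame({0: [1.0, 2.0, 3.0],
                   1: [4.0, 5.0, 6.0],
                   'f': [50.2, 49.9, 50.1]})
A, B = 360000, 5
C = A / B / 3600  # 20.0

cols = [c for c in df.columns if c != 'f']
# Per-row sign broadcast over the data columns: +C when f > 50, -C when f < 50
sign = np.where(df['f'] > 50, 1.0, -1.0)
df[cols] = df[cols].values + sign[:, None] * C
```

The `sign[:, None]` reshape makes the per-row sign broadcast across all selected columns at once.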

Related

how to split an integer value from one column to two columns in text file using pandas or numpy (python)

I have a text file which has a number of integer values like this.
20180701 20180707 52 11 1 2 4 1 0 0 10 7 1 3 1 0 4 5 2
20180708 20180714 266 8 19 3 2 9 7 25 20 17 12 9 9 27 34 54 11
20180715 20180721 654 52 34 31 20 16 12 25 84 31 38 37 38 69 66 87 14
20180722 201807281017 110 72 46 52 29 29 22 204 41 46 51 57 67 82 92 17
20180729 201808041106 276 37 11 87 20 10 8 284 54 54 72 38 49 41 53 12
20180805 20180811 624 78 19 15 55 16 8 9 172 15 31 35 38 47 29 36 21
20180812 20180818 488 63 17 7 26 10 9 7 116 17 14 39 31 34 27 64 7
20180819 20180825 91 4 7 0 4 5 1 3 16 3 4 5 10 10 7 11 1
20180826 20180901 49 2 2 1 0 4 0 1 2 0 1 4 8 2 6 6 10
I have to build one file by merging several files like this, but you can see the problem with this data:
in lines 4 and 5, the first values, 1017 and 1106, run directly into the period index.
When I try to read these two lines, I always get the result below.
The first value in each of those rows is not recognized as a value at all.
In [14]: fw.iloc[80,:]
Out[14]:
3 72.0
4 46.0
5 52.0
6 29.0
7 29.0
8 22.0
9 204.0
10 41.0
11 46.0
12 51.0
13 57.0
14 67.0
15 82.0
16 92.0
17 17.0
18 NaN
Name: (20180722, 201807281017), dtype: float64
I tried to fix it with indexing but failed.
The desired result is:
In [14]: fw.iloc[80,:]
Out[14]:
2 1017.0
3 110.0
4 72.0
5 46.0
6 52.0
7 29.0
8 29.0
9 22.0
10 204.0
11 41.0
12 46.0
13 51.0
14 57.0
15 67.0
16 82.0
17 92.0
18 17.0
Name: (20180722, 201807281017), dtype: float64
How can I solve this problem?
I used this code to read the file:
fw = pd.read_csv('warm_patient.txt', index_col=[0,1], header=None, delim_whitespace=True)
A better fit for this would be pandas.read_fwf. For your example:
df = pd.read_fwf(filename, index_col=[0,1], header=None, widths=2*[10]+17*[4])
I don't know if the column widths can be inferred for all your data or need to be hardcoded.
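A small self-contained illustration of the read_fwf approach; the fixed-width text and the widths below are assumed for this toy sample, not measured from the real file:

```python
import io
import pandas as pd

# Two of the problematic rows, rebuilt as fixed-width text:
# a 9-char first date, an 8-char second date, then 4-char values.
text = ("20180722 201807281017 110  72  46\n"
        "20180729 201808041106 276  37  11\n")
df = pd.read_fwf(io.StringIO(text), header=None,
                 widths=[9, 8, 4, 4, 4, 4], index_col=[0, 1])
# The fused 1017 is now parsed as its own column.
print(df.loc[(20180722, 20180728)].tolist())
```

Because read_fwf cuts on character positions rather than whitespace, the fused `201807281017` splits cleanly into the date index and the first value.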
One possibility would be to manually construct the dataframe; this way we can parse the text by splitting the values every 4 characters.
from textwrap import wrap
import pandas as pd

def read_file(f_name):
    data = []
    with open(f_name) as f:
        for line in f.readlines():
            idx1 = line[0:8]
            idx2 = line[10:18]
            points = map(lambda x: int(x.replace(" ", "")), wrap(line.rstrip()[18:], 4))
            data.append([idx1, idx2, *points])
    return pd.DataFrame(data).set_index([0, 1])
It could be made somewhat more efficient (especially for a very long text file), but here's one solution.
fw = pd.read_csv('test.txt', header=None, delim_whitespace=True)
for i in fw[pd.isna(fw.iloc[:,-1])].index:
    num_str = str(fw.iat[i,1])
    a,b = map(int,[num_str[:-4],num_str[-4:]])
    fw.iloc[i,3:] = fw.iloc[i,2:-1]
    fw.iloc[i,:3] = [fw.iat[i,0],a,b]
fw = fw.set_index([0,1])
The result of print(fw) from there is
2 3 4 5 6 7 8 9 10 11 12 13 14 15 \
0 1
20180701 20180707 52 11 1 2 4 1 0 0 10 7 1 3 1 0
20180708 20180714 266 8 19 3 2 9 7 25 20 17 12 9 9 27
20180715 20180721 654 52 34 31 20 16 12 25 84 31 38 37 38 69
20180722 20180728 1017 110 72 46 52 29 29 22 204 41 46 51 57 67
20180729 20180804 1106 276 37 11 87 20 10 8 284 54 54 72 38 49
20180805 20180811 624 78 19 15 55 16 8 9 172 15 31 35 38 47
20180812 20180818 488 63 17 7 26 10 9 7 116 17 14 39 31 34
20180819 20180825 91 4 7 0 4 5 1 3 16 3 4 5 10 10
20180826 20180901 49 2 2 1 0 4 0 1 2 0 1 4 8 2
16 17 18
0 1
20180701 20180707 4 5 2.0
20180708 20180714 34 54 11.0
20180715 20180721 66 87 14.0
20180722 20180728 82 92 17.0
20180729 20180804 41 53 12.0
20180805 20180811 29 36 21.0
20180812 20180818 27 64 7.0
20180819 20180825 7 11 1.0
20180826 20180901 6 6 10.0
Here's the result of the print after applying your initial solution of fw = pd.read_csv('test.txt', index_col=[0,1], header=None, delim_whitespace=True) for comparison.
2 3 4 5 6 7 8 9 10 11 12 13 14 \
0 1
20180701 20180707 52 11 1 2 4 1 0 0 10 7 1 3 1
20180708 20180714 266 8 19 3 2 9 7 25 20 17 12 9 9
20180715 20180721 654 52 34 31 20 16 12 25 84 31 38 37 38
20180722 201807281017 110 72 46 52 29 29 22 204 41 46 51 57 67
20180729 201808041106 276 37 11 87 20 10 8 284 54 54 72 38 49
20180805 20180811 624 78 19 15 55 16 8 9 172 15 31 35 38
20180812 20180818 488 63 17 7 26 10 9 7 116 17 14 39 31
20180819 20180825 91 4 7 0 4 5 1 3 16 3 4 5 10
20180826 20180901 49 2 2 1 0 4 0 1 2 0 1 4 8
15 16 17 18
0 1
20180701 20180707 0 4 5 2.0
20180708 20180714 27 34 54 11.0
20180715 20180721 69 66 87 14.0
20180722 201807281017 82 92 17 NaN
20180729 201808041106 41 53 12 NaN
20180805 20180811 47 29 36 21.0
20180812 20180818 34 27 64 7.0
20180819 20180825 10 7 11 1.0
20180826 20180901 2 6 6 10.0
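The core of that repair is the string slice that splits the fused cell; in isolation (hypothetical literal, assuming the trailing count is always 4 digits):

```python
# Hypothetical fused cell: an 8-digit date with a 4-digit count run into it
num_str = "201807281017"
date_part, count = int(num_str[:-4]), int(num_str[-4:])
print(date_part, count)  # 20180728 1017
```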

Create bi-weekly and monthly labels with week numbers in pandas

I have a dataframe with profit values, IDs, and week values. It looks a little like this
ID
Week
Profit
A
1
2
A
2
2
A
3
0
A
4
0
I want to create two new columns called "Bi-Weekly" and "Monthly". Week 1 would be labeled 2, week 2 would also be labeled 2, week 3 would be labeled 4, week 4 would be labeled 4, and they would all be labeled month 1, so I could group by weekly, bi-weekly, or monthly profit as needed. Right now I've created two functions that work, but the weeks will go up to a year (52 weeks), so I was wondering if there's a more efficient way. My bi-weekly function is below.
def biweek(prof_calc):
    if (prof_calc['week']==2):
        return 2
    elif (prof_calc['week']==3):
        return 2
    elif (prof_calc['week']==4):
        return 4
    elif (prof_calc['week']==5):
        return 4
    elif (prof_calc['week']==6):
        return 6
    elif (prof_calc['week']==7):
        return 6
    elif (prof_calc['week']==8):
        return 8
    elif (prof_calc['week']==9):
        return 8
    elif (prof_calc['week']==10):
        return 10
    elif (prof_calc['week']==11):
        return 10

prof_calc['BiWeek'] = prof_calc.apply(biweek, axis=1)
IIUC, you could try:
df["Biweekly"] = (df["Week"]-1)//2+1
df["Monthly"] = (df["Week"]-1)//4+1
>>> df
ID Week Profit Biweekly Monthly
0 A 1 42 1 1
1 A 2 69 1 1
2 A 3 53 2 1
3 A 4 63 2 1
4 A 5 56 3 2
5 A 6 57 3 2
6 A 7 86 4 2
7 A 8 23 4 2
8 A 9 35 5 3
9 A 10 10 5 3
10 A 11 25 6 3
11 A 12 21 6 3
12 A 13 39 7 4
13 A 14 82 7 4
14 A 15 76 8 4
15 A 16 20 8 4
16 A 17 97 9 5
17 A 18 67 9 5
18 A 19 21 10 5
19 A 20 22 10 5
20 A 21 88 11 6
21 A 22 67 11 6
22 A 23 33 12 6
23 A 24 38 12 6
24 A 25 8 13 7
25 A 26 67 13 7
26 A 27 16 14 7
27 A 28 49 14 7
28 A 29 3 15 8
29 A 30 17 15 8
30 A 31 79 16 8
31 A 32 19 16 8
32 A 33 21 17 9
33 A 34 9 17 9
34 A 35 56 18 9
35 A 36 83 18 9
36 A 37 1 19 10
37 A 38 53 19 10
38 A 39 66 20 10
39 A 40 55 20 10
40 A 41 85 21 11
41 A 42 90 21 11
42 A 43 34 22 11
43 A 44 3 22 11
44 A 45 9 23 12
45 A 46 28 23 12
46 A 47 58 24 12
47 A 48 14 24 12
48 A 49 42 25 13
49 A 50 69 25 13
50 A 51 76 26 13
51 A 52 49 26 13
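Once the label columns exist, the grouping the question asks about is a one-liner; a small sketch with toy profits (data assumed for illustration):

```python
import pandas as pd

# Toy data; the label formulas are the integer-division ones from the answer
df = pd.DataFrame({"ID": ["A"] * 4,
                   "Week": [1, 2, 3, 4],
                   "Profit": [2, 2, 0, 0]})
df["Biweekly"] = (df["Week"] - 1) // 2 + 1
df["Monthly"] = (df["Week"] - 1) // 4 + 1

# Aggregate at any granularity by grouping on the matching label column
biweekly = df.groupby(["ID", "Biweekly"])["Profit"].sum()
print(biweekly.tolist())  # [4, 0]
```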

How do you correctly format multiple columns of integers in python?

I have some code here:
for i in range(self.size):
    print('{:6d}'.format(self.data[i], end=' '))
    if (i + 1) % NUMBER_OF_COLUMNS == 0:
        print()
Right now this prints as:
1
1
1
1
1
2
3
3
3
3
(whitespace)
3
3
3
etc.
It creates a new line when it hits 10 digits, but it doesn't print the initial 10 in a row...
This is what I want-
1 1 1 1 1 1 1 2 2 3
3 3 3 3 3 4 4 4 4 5
However when it hits two digit numbers it gets messed up -
8 8 8 8 8 9 9 9 9 10
10 10 10 10 10 10 etc.
I want it to be right-aligned like this-
8 8 8 8 8 9
10 10 10 10 11 12 etc.
When I remove the format piece it will print the rows out, but there won't be the extra spacing, of course!
You can align strings by "padding" values using a string's .rjust method. Using some dummy data:
NUMBER_OF_COLUMNS = 10
for i in range(100):
    print("{}".format(i//2).rjust(3), end=' ')
    # print("{:3}".format(i//2), end=' ')  # edit: this also works. Thanks AChampion
    if (i + 1) % NUMBER_OF_COLUMNS == 0:
        print()
#Output:
0 0 1 1 2 2 3 3 4 4
5 5 6 6 7 7 8 8 9 9
10 10 11 11 12 12 13 13 14 14
15 15 16 16 17 17 18 18 19 19
20 20 21 21 22 22 23 23 24 24
25 25 26 26 27 27 28 28 29 29
30 30 31 31 32 32 33 33 34 34
35 35 36 36 37 37 38 38 39 39
40 40 41 41 42 42 43 43 44 44
45 45 46 46 47 47 48 48 49 49
Another approach is to just chunk up the data into rows and print each row, e.g.:
def chunk(iterable, n):
    return zip(*[iter(iterable)]*n)

for row in chunk(self.data, NUMBER_OF_COLUMNS):
    print(' '.join(str(data).rjust(6) for data in row))
e.g:
In []:
for row in chunk(range(100), 10):
    print(' '.join(str(data//2).rjust(3) for data in row))
Out[]:
0 0 1 1 2 2 3 3 4 4
5 5 6 6 7 7 8 8 9 9
10 10 11 11 12 12 13 13 14 14
15 15 16 16 17 17 18 18 19 19
20 20 21 21 22 22 23 23 24 24
25 25 26 26 27 27 28 28 29 29
30 30 31 31 32 32 33 33 34 34
35 35 36 36 37 37 38 38 39 39
40 40 41 41 42 42 43 43 44 44
45 45 46 46 47 47 48 48 49 49
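On Python 3.6+, an f-string with a width specifier gives the same right-alignment; a sketch with dummy data (the width of 3 is assumed):

```python
NUMBER_OF_COLUMNS = 10
data = [x // 2 for x in range(20)]
for i, value in enumerate(data):
    # width 3; numbers are right-aligned by default
    print(f"{value:3d}", end=" ")
    if (i + 1) % NUMBER_OF_COLUMNS == 0:
        print()
```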

BFS to find all fixed-length cycles

I have to find all cycles of length 3 in a given graph. I've implemented it using BFS, but so far it's only practical for relatively small inputs. It still gives the correct answer for bigger ones, but the time it takes is extremely high. Is there any way to improve the following code to make it more efficient?
num_res = 0
adj_list = []
cycles_list = []

def bfs_cycles(start):
    queue = [(start, [start])]
    depth = 0
    while queue and depth <= 3:
        (vertex, path) = queue.pop(0)
        current_set = set(adj_list[vertex]) - set(path)
        if start in set(adj_list[vertex]):
            current_set = current_set.union([start])
        depth = len(path)
        for node in current_set:
            if node == start:
                if depth == 3 and sorted(path) not in cycles_list:
                    cycles_list.append(sorted(path))
                    yield path + [node]
            else:
                queue.append((node, path + [node]))

if __name__ == "__main__":
    num_towns, num_pairs = [int(x) for x in input().split()]
    adj_list = [[] for x in range(num_towns)]
    adj_matrix = [[0 for x in range(num_towns)] for x in range(num_towns)]
    # EDGE LIST TO ADJACENCY LIST
    for i in range(num_pairs):
        cur_start, cur_end = [int(x) for x in input().split()]
        adj_list[cur_start].append(cur_end)
        adj_list[cur_end].append(cur_start)
    num_cycles = 0
    for i in range(num_towns):
        my_list = list(bfs_cycles(i))
        num_cycles += len(my_list)
    print(num_cycles)
Examples of inputs:
6 15
5 4
2 0
3 1
5 1
4 1
5 3
1 0
4 0
4 3
5 2
2 1
3 0
3 2
5 0
4 2
(output: 20; works ok)
52 1051
48 5
41 28
12 4
33 27
12 5
1 0
15 12
50 8
33 8
38 28
26 10
13 7
39 18
31 11
48 19
41 19
40 25
47 45
27 16
46 25
42 6
5 4
51 2
30 21
41 27
26 25
33 11
45 26
16 7
23 15
17 6
45 22
32 6
29 8
36 20
30 1
36 25
41 6
46 4
46 40
18 8
38 1
28 5
43 22
21 11
39 14
31 29
18 9
50 35
32 17
48 27
49 40
16 1
49 47
41 12
30 28
33 14
48 12
37 20
49 20
48 8
48 6
27 17
46 44
31 12
17 9
32 27
14 11
40 23
36 19
38 10
42 2
35 22
26 23
29 23
30 11
11 7
47 12
30 13
38 34
48 11
46 8
42 31
30 4
35 17
50 2
51 1
12 10
44 25
47 17
45 24
25 2
45 11
39 21
39 31
9 6
16 3
10 6
15 11
37 2
23 6
41 40
34 26
45 33
35 23
45 36
11 4
38 7
36 6
10 3
33 12
39 12
41 24
47 8
33 5
44 18
45 8
48 41
44 37
11 3
16 6
21 10
20 0
44 36
29 4
43 33
48 4
46 35
33 6
42 12
45 19
12 8
37 15
43 41
36 11
12 11
50 37
9 7
51 30
36 0
33 17
36 35
50 36
49 37
50 16
46 21
36 22
49 15
46 28
50 27
20 10
23 0
36 29
35 33
42 17
31 16
48 47
48 23
17 2
40 14
10 5
45 7
48 42
39 32
51 4
42 8
38 19
34 10
50 5
51 36
46 26
42 38
20 12
44 32
34 4
49 6
50 45
37 10
45 41
38 11
42 30
21 20
43 23
42 26
33 1
17 7
26 6
16 12
44 16
21 9
36 30
39 24
26 4
47 10
18 7
36 12
26 17
28 13
18 11
23 7
44 4
43 26
26 16
22 21
37 0
36 28
34 5
22 17
41 20
31 8
27 25
12 2
42 11
29 28
39 33
34 12
30 2
22 8
40 15
42 9
28 7
44 41
41 35
44 17
12 7
13 10
23 20
48 38
43 12
32 19
43 30
50 1
10 1
17 12
32 2
26 14
29 12
32 5
7 6
36 16
49 7
31 1
45 17
33 29
28 11
32 0
49 32
42 36
16 4
45 20
21 14
39 15
34 18
13 8
27 15
19 11
37 36
36 14
28 4
36 13
17 11
38 13
35 28
50 10
39 28
40 2
35 8
32 24
47 34
45 27
41 21
21 4
47 27
48 1
35 30
21 5
20 14
27 26
17 1
28 17
43 7
31 6
20 3
34 21
8 2
21 1
32 9
29 1
45 43
50 39
19 15
22 12
48 7
46 18
45 35
50 42
51 17
37 6
24 23
29 3
39 20
51 50
38 6
50 11
38 14
25 24
14 7
45 44
28 14
50 49
42 28
36 7
35 25
13 4
46 1
48 21
51 11
39 11
17 5
31 0
49 36
40 4
37 21
35 1
23 4
43 4
46 36
38 20
37 27
30 0
44 34
49 10
48 14
48 45
38 31
47 29
40 16
51 20
34 17
51 19
24 9
24 5
5 1
15 13
26 2
19 12
50 14
42 7
35 14
46 20
43 28
8 3
38 37
28 1
21 0
51 5
17 16
38 17
34 30
46 12
17 14
50 9
16 13
30 27
45 0
41 16
41 32
48 18
30 8
51 47
11 8
40 13
34 32
23 11
51 28
42 35
36 2
13 11
28 8
15 10
39 35
27 1
50 7
41 23
46 39
38 9
44 10
46 38
6 4
44 27
36 21
35 9
45 30
44 7
37 1
44 28
9 1
32 31
39 16
4 0
44 13
24 0
17 15
15 1
32 8
39 22
42 34
24 6
49 18
36 1
51 42
38 5
14 12
33 3
51 45
24 18
37 32
46 6
44 12
23 10
32 12
50 26
29 20
41 30
6 0
48 31
39 8
21 19
47 6
47 16
18 3
46 27
11 10
36 3
47 2
17 10
43 6
36 8
4 1
14 9
42 1
44 1
46 22
44 23
40 26
30 17
21 17
42 29
45 16
49 45
11 6
35 7
46 42
14 10
26 13
49 44
19 18
26 12
46 2
50 41
43 20
38 24
48 30
34 29
25 19
32 11
46 16
30 25
38 15
50 38
51 23
47 28
14 5
40 12
21 8
47 36
38 32
32 15
28 21
45 10
44 8
34 0
32 14
43 25
32 21
38 2
27 2
24 17
33 31
49 26
22 13
13 1
32 20
43 0
46 0
45 29
40 32
48 44
45 34
29 2
39 27
14 8
26 3
40 19
45 38
40 11
34 6
43 39
40 8
35 0
18 0
47 25
21 18
24 8
18 4
25 14
20 11
18 17
24 14
27 23
47 15
38 21
19 2
6 1
46 11
51 38
6 3
31 17
3 0
13 2
41 1
51 14
19 5
39 2
41 22
16 9
22 3
13 0
42 21
24 16
44 31
51 25
40 33
46 29
47 31
51 35
35 18
43 1
47 22
20 18
48 29
39 23
31 25
32 25
22 10
46 24
32 3
46 13
24 15
34 13
50 18
41 4
41 2
43 27
29 10
30 20
32 7
50 20
42 10
42 24
15 7
48 25
41 39
32 1
40 36
20 7
32 13
27 3
34 7
48 34
47 39
39 36
40 5
19 0
25 20
38 12
27 14
44 3
36 4
37 4
33 28
37 23
34 9
46 45
25 9
30 16
34 14
46 37
28 26
26 22
18 5
16 0
36 27
45 42
38 33
37 22
27 0
44 15
49 42
34 23
29 11
30 12
17 8
48 28
10 4
36 15
44 14
23 19
43 18
27 5
40 1
18 12
34 20
50 23
9 3
35 4
46 15
37 11
27 4
19 3
45 1
47 1
48 17
9 2
39 26
33 10
38 30
45 25
48 24
29 17
37 28
34 31
51 21
43 8
31 4
20 16
39 25
31 13
24 3
50 43
13 9
32 23
40 18
45 40
37 35
47 38
42 13
51 26
43 31
49 23
18 15
15 0
43 9
7 2
48 46
35 11
42 23
47 40
3 1
25 6
46 3
42 19
28 9
15 3
43 3
35 10
42 41
51 46
9 4
46 34
28 0
6 5
45 14
26 11
48 13
33 23
40 9
23 21
18 16
28 12
43 29
35 31
30 14
36 34
49 38
49 22
24 11
23 14
45 13
49 21
48 16
51 10
39 4
50 46
50 48
43 17
31 18
38 23
2 0
41 0
30 19
20 1
29 19
48 32
30 15
40 22
51 12
50 40
24 4
39 10
31 20
7 0
40 17
41 31
37 29
33 32
30 3
40 6
51 15
46 19
31 28
34 22
31 5
33 7
29 14
34 24
44 6
24 2
44 40
35 6
37 18
47 0
43 42
49 30
49 25
19 1
25 3
49 5
40 10
25 21
48 15
35 19
50 6
36 17
44 33
21 13
15 4
36 32
28 6
49 35
47 9
49 46
47 14
25 4
44 29
38 25
23 12
51 41
20 5
39 34
15 6
47 23
21 6
47 11
22 7
41 29
34 2
43 38
6 2
3 2
40 20
40 24
37 16
32 26
49 31
49 16
50 13
31 2
26 1
5 0
19 16
45 32
42 40
16 5
15 8
38 27
12 6
47 4
39 6
31 19
26 9
47 18
42 32
4 2
42 20
46 10
27 6
41 7
49 2
49 28
20 9
46 33
16 11
14 4
34 1
33 2
30 6
47 44
41 8
23 17
33 25
23 5
24 13
33 20
44 35
47 46
47 7
41 25
45 5
28 23
31 15
31 10
39 9
40 7
45 6
43 11
35 26
51 34
44 38
45 3
24 19
51 22
47 42
34 15
37 33
29 9
49 3
14 3
23 2
39 7
46 23
40 31
33 16
44 43
41 36
37 17
43 40
32 18
46 32
26 18
4 3
39 5
44 11
28 20
44 21
41 26
39 38
36 5
7 3
39 0
27 18
26 20
18 2
50 28
37 26
40 27
17 4
50 3
39 30
32 29
50 34
18 1
20 4
36 23
25 15
49 0
45 39
39 1
37 5
23 16
47 20
27 20
38 4
46 43
34 27
15 5
31 23
39 29
46 7
38 35
41 14
45 9
25 22
10 9
35 21
19 14
37 8
47 35
9 0
35 13
21 16
50 32
37 7
19 8
22 5
51 24
51 9
29 0
51 39
44 19
42 5
31 9
40 30
51 37
25 12
26 0
32 16
25 1
41 13
47 43
25 18
35 29
50 44
45 23
44 20
50 47
22 2
45 4
34 19
48 33
34 16
18 10
29 18
37 13
45 2
43 14
48 10
15 2
28 22
29 16
45 15
19 17
35 16
46 9
9 5
35 27
30 5
49 39
32 28
42 3
48 37
43 32
44 30
37 30
14 2
47 32
20 8
18 13
25 5
44 5
29 15
49 11
42 14
30 29
42 27
19 6
51 49
51 13
12 1
40 34
23 13
27 11
51 43
27 24
19 13
26 19
16 10
23 1
46 5
35 15
30 10
48 3
19 9
25 23
16 14
23 3
34 11
27 9
32 30
39 19
50 33
45 21
50 12
13 3
50 15
25 16
49 14
41 17
47 19
43 36
13 12
30 7
49 48
14 0
24 7
49 27
30 26
47 21
14 6
30 22
22 9
29 5
23 22
51 40
42 37
29 6
8 5
51 29
22 4
28 19
21 3
45 12
47 26
43 35
48 43
20 2
24 21
33 22
24 20
41 5
35 3
43 15
43 34
19 10
47 41
49 8
29 21
51 31
43 19
50 17
47 24
(output: 11061; takes around 10 seconds)
A few problems in your code:
the operation sorted(path) not in cycles_list has O(n) complexity, where n is the size of cycles_list
queue.pop(0) has O(n) complexity, where n is the size of the queue; you should use a collections.deque here, not a list.
As a general note, unless you really need to solve the question specifically with BFS (e.g. because someone asked you to use this method), a simple combination of loops would do the job better. Pseudocode:
num_loops = 0
for a in nodes:
    for b in neighbors(a):
        if b > a:
            for c in neighbors(b):
                if c > b and a in neighbors(c):
                    num_loops += 1
The b > a and c > b checks are added to count each loop only once.
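A runnable version of that pseudocode, using an adjacency-set dict; the edges are the first sample input from the question (a complete graph on 6 nodes, so every triple of nodes forms a cycle):

```python
def count_triangles(adj):
    count = 0
    for a in adj:
        for b in adj[a]:
            if b > a:                       # visit each unordered pair once
                for c in adj[b]:
                    if c > b and a in adj[c]:   # closing edge => triangle a-b-c
                        count += 1
    return count

edges = [(5, 4), (2, 0), (3, 1), (5, 1), (4, 1), (5, 3), (1, 0), (4, 0),
         (4, 3), (5, 2), (2, 1), (3, 0), (3, 2), (5, 0), (4, 2)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(count_triangles(adj))  # 20, matching the expected output
```

Because each triangle is enumerated only in the order a < b < c, no deduplication list is needed at all.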
For a small number of steps like 3, you can just check for each node if you can walk away from and back to the node within 3 steps.
This works reasonably fast:
import fileinput

graph = {}

# Recursive function to find a goal in a number of steps
def count_unique_walks(start, goal, length, visited=[]):
    if length == 0:
        # Out of steps
        return 1 if start == goal else 0
    if start in visited:
        # Already been here
        return 0
    result = 0
    for neighbor in graph[start]:
        if neighbor < start and neighbor != goal:
            # Count only unique cycles
            continue
        result += count_unique_walks(neighbor, goal, length-1, visited+[start])
    return result

# Read input
for line in fileinput.input():
    a, b = map(int, line.split())
    if a not in graph:
        graph[a] = set()
    graph[a].add(b)
    if b not in graph:
        graph[b] = set()
    graph[b].add(a)

# Sum up the cycles of each node
result = 0
for node in graph:
    result += count_unique_walks(node, node, 3)
print(result)

performing differences between rows in pandas based on columns values

I have this dataframe and I'm trying to create a new column to store the difference in products sold, based on code and date.
For example, this is the starting dataframe:
date code sold
0 20150521 0 47
1 20150521 12 39
2 20150521 16 39
3 20150521 20 38
4 20150521 24 38
5 20150521 28 37
6 20150521 32 36
7 20150521 4 43
8 20150521 8 43
9 20150522 0 47
10 20150522 12 37
11 20150522 16 36
12 20150522 20 36
13 20150522 24 36
14 20150522 28 35
15 20150522 32 31
16 20150522 4 42
17 20150522 8 41
18 20150523 0 50
19 20150523 12 48
20 20150523 16 46
21 20150523 20 46
22 20150523 24 46
23 20150523 28 45
24 20150523 32 42
25 20150523 4 49
26 20150523 8 49
27 20150524 0 39
28 20150524 12 33
29 20150524 16 30
... ... ... ...
150 20150606 32 22
151 20150606 4 34
152 20150606 8 33
153 20150607 0 31
154 20150607 12 30
155 20150607 16 30
156 20150607 20 29
157 20150607 24 28
158 20150607 28 26
159 20150607 32 24
160 20150607 4 30
161 20150607 8 30
162 20150608 0 47
I think this could be a solution...
full_df1 = full_df[full_df.date == '20150609'].reset_index(drop=True)
full_df1['code'] = full_df1['code'].astype(float)
full_df1 = full_df1.sort(['code'], ascending=[False])
code date sold
8 32 20150609 33
7 28 20150609 36
6 24 20150609 37
5 20 20150609 39
4 16 20150609 42
3 12 20150609 46
2 8 20150609 49
1 4 20150609 49
0 0 20150609 50
full_df1.set_index('code')['sold'].diff().reset_index()
that gives me back this output for a single date 20150609 :
code difference
0 32 NaN
1 28 3
2 24 1
3 20 2
4 16 3
5 12 4
6 8 3
7 4 0
8 0 1
Is there a better solution to get the same result in a more pythonic way?
I would like to create a new column [difference] and store the data there, ending up with four columns [date, code, sold, difference].
This is exactly the kind of thing that pandas' groupby functionality is built for, and I highly recommend reading and working through the groupby documentation.
This code replicates what you are asking for, but for every date.
df = pd.DataFrame({'date':['Mon','Mon','Mon','Tue','Tue','Tue'],'code':[10,21,30,10,21,30], 'sold':[12,13,34,10,15,20]})
df['difference'] = df.groupby('date')['sold'].diff()
df
code date sold difference
0 10 Mon 12 NaN
1 21 Mon 13 1
2 30 Mon 34 21
3 10 Tue 10 NaN
4 21 Tue 15 5
5 30 Tue 20 5
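To reproduce the question's descending-code order across every date at once, sort first and then diff within each group; a sketch with toy data (the values are assumed, shaped like the question's date/code/sold columns):

```python
import pandas as pd

# Toy data mirroring the question's layout
df = pd.DataFrame({"date": [20150609] * 3 + [20150610] * 3,
                   "code": [0, 4, 8, 0, 4, 8],
                   "sold": [50, 49, 49, 47, 42, 41]})
# Descending code within each date, as in the single-date attempt in the question
df = df.sort_values(["date", "code"], ascending=[True, False])
# Difference of consecutive rows, restarting (NaN) at each new date
df["difference"] = df.groupby("date")["sold"].diff()
print(df["difference"].tolist())
```

The first row of every date group gets NaN, exactly as in the single-date output shown above.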
