Trouble swapping and assigning min and max elements - Python

IMPORTANT UPDATE AT THE END!
The existing code does not work for all cases.
def myfunc(x):
    a = [int(i) for i in x.split()]
    a[a.index(min(a))], a[a.index(max(a))] = a[a.index(max(a))], a[a.index(min(a))]
    a = [str(i) for i in a]
    return ' '.join(a)
print(myfunc("1 5 4 3 2"))
It works for 3 4 5 2 1 but doesn't work for 1 5 4 3 2.
Why?
!!!UPDATE: I made some changes and the results look very strange.
I tried the two lines below separately (with one of them commented out), and the program gives different results in some cases. BUT THE MOST INTERESTING PART: when I used both of them, uncommented, the program doesn't return the input string?
# a[a.index(min(a))], a[a.index(max(a))] = a[a.index(max(a))], a[a.index(min(a))]
a[a.index(max(a))], a[a.index(min(a))] = a[a.index(min(a))], a[a.index(max(a))]
Test cases I use:
#print(myfunc("5 1 4 3 2"))
#print(myfunc("1 5 4 3 2"))
#print(myfunc("3 4 5 2 1"))
#print(myfunc("-30000 30000"))
#print(myfunc("2147483647 -2147483648"))
#print(myfunc("1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 17 16 15 14"))
#print(myfunc("1 2 3 4 5 6 7 8 9 10"))
#print(myfunc("1 9 8 7 6 5 4 3 2 10"))
UPD+=1: Guys, I changed the code to:
minind = a.index(min(a))
maxind = a.index(max(a))
a[minind], a[maxind] = a[maxind], a[minind]
Now it works for all cases. But the question about the previous versions is still open.
Please help. I've spent about 2 hours trying to find an explanation for this...

It doesn't work because the assignments are executed sequentially, left to right. When you write:
a[a.index(min(a))], a[a.index(max(a))] = a[a.index(max(a))], a[a.index(min(a))]
it's essentially equivalent to:
tempmax, tempmin = a[a.index(max(a))], a[a.index(min(a))]
a[a.index(min(a))] = tempmax
a[a.index(max(a))] = tempmin
But notice that after the tempmax assignment, a.index(max(a)) can change. index() returns the earliest index, so if the minimum element was before the maximum element, it now returns the original minimum element's location (because that slot now contains the maximum element), and tempmin is assigned right back to it.
Your code assumes that the indexes to be assigned are computed before any of the assignments are done, but that's not how it actually works.
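A quick way to see this ordering for yourself is to log every item assignment. The sketch below uses a hypothetical TracingList subclass (not part of either post) that records each (index, value) pair passed to __setitem__, so the sequence of writes made by the one-line swap becomes visible.

```python
class TracingList(list):
    """A list that records every item assignment (illustration only)."""

    def __init__(self, *args):
        super().__init__(*args)
        self.log = []  # (index, value) pairs, in the order they were written

    def __setitem__(self, index, value):
        self.log.append((index, value))
        super().__setitem__(index, value)


# Minimum before maximum: both writes hit index 0, so nothing changes.
a = TracingList([1, 5, 4, 3, 2])
a[a.index(min(a))], a[a.index(max(a))] = a[a.index(max(a))], a[a.index(min(a))]
print(a.log)    # [(0, 5), (0, 1)]
print(list(a))  # [1, 5, 4, 3, 2]

# Maximum before minimum: the second index lookup still finds the right slot.
b = TracingList([5, 1, 4, 3, 2])
b[b.index(min(b))], b[b.index(max(b))] = b[b.index(max(b))], b[b.index(min(b))]
print(b.log)    # [(1, 5), (0, 1)]
print(list(b))  # [1, 5, 4, 3, 2]
```

The right-hand tuple is built first, but each target's index expression is evaluated only when that target is about to be assigned, which is why computing both indices up front (as in the updated code) fixes it.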

Your code doesn't work if the minimum is located before the maximum.
For example:
>>> s = "1 5 4 3 2"  # this doesn't work
>>> myfunc(s)
'1 5 4 3 2'
>>> s = "5 1 4 3 2"  # this works
>>> myfunc(s)
'1 5 4 3 2'
But, as you noticed, if you compute the indices before swapping, everything works fine.
def myfunc(x):
    a = [int(i) for i in x.split()]
    mn = a.index(min(a))
    mx = a.index(max(a))
    a[mn], a[mx] = a[mx], a[mn]
    a = [str(i) for i in a]
    return ' '.join(a)

>>> s = "1 5 4 3 2"
>>> myfunc(s)
'5 1 4 3 2'
I'm waiting for someone more enlightened to explain why.

Is there a way to reference a previous value in Pandas column efficiently?

I want to do some complex calculations in pandas while referencing previous values (basically I'm calculating row by row). However the loops take forever and I wanted to know if there was a faster way. Everybody keeps mentioning using shift but I don't understand how that would even work.
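import pandas as pd

df = pd.DataFrame(index=range(500))
df["A"] = 2.0  # float, because the recurrence produces non-integer values
df["B"] = 5
df.loc[0, "A"] = 1
# Row-by-row recurrence: each A depends on the previous row's A and B
for i in range(1, len(df)):
    df.loc[i, "A"] = (df.loc[i - 1, "A"] / 3) - df.loc[i - 1, "B"] + 25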
numpy_ext can be used for expanding calculations (see the question "pandas rolling apply using multiple columns" for reference).
I have also included a simpler calculation to demonstrate the behaviour in a simpler way.
import numpy as np
import pandas as pd
import numpy_ext as npe

df = pd.DataFrame(index=range(5000))
df["A"] = 2
df["B"] = 5
df.loc[0, "A"] = 1
# for i in range(1, len(df)):
#     df.loc[i, "A"] = (df.loc[i - 1, "A"] / 3) - df.loc[i - 1, "B"] + 25

# SO example - function of previous values in A and B
# (this sums the original A and B values in the window; it is not the
# same recursive formula as the loop above)
def f(A, B):
    r = np.sum(A[:-1] / 3) - np.sum(B[:-1] + 25) if len(A) > 1 else A[0]
    return r

# much simpler example, sum of previous values
def g(A):
    return np.sum(A[:-1])

df["AB_combo"] = npe.expanding_apply(f, 1, df["A"].values, df["B"].values)
df["A_running"] = npe.expanding_apply(g, 1, df["A"].values)
print(df.head(10).to_markdown())
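Sample output:
|    |   A |   B |   AB_combo |   A_running |
|---:|----:|----:|-----------:|------------:|
|  0 |   1 |   5 |     1      |           0 |
|  1 |   2 |   5 |   -29.6667 |           1 |
|  2 |   2 |   5 |   -59      |           3 |
|  3 |   2 |   5 |   -88.3333 |           5 |
|  4 |   2 |   5 |  -117.667  |           7 |
|  5 |   2 |   5 |  -147      |           9 |
|  6 |   2 |   5 |  -176.333  |          11 |
|  7 |   2 |   5 |  -205.667  |          13 |
|  8 |   2 |   5 |  -235      |          15 |
|  9 |   2 |   5 |  -264.333  |          17 |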
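On the question of why shift() "doesn't seem to work": shift() only helps when the right-hand side uses columns whose values are already known. In the question's loop each new A depends on the A just computed, so one shifted expression only performs a single step. A minimal sketch, not taken from either post, that contrasts the two and runs the true recurrence over plain NumPy arrays instead of indexing the DataFrame element by element (typically much faster, though still a Python loop). The helper name recurrence_numpy is hypothetical.

```python
import numpy as np
import pandas as pd


def recurrence_numpy(a0, b):
    """A[0] = a0; A[i] = A[i-1] / 3 - b[i-1] + 25 for i >= 1 (hypothetical helper)."""
    b = np.asarray(b, dtype=float)
    out = np.empty(len(b), dtype=float)
    out[0] = a0
    # Still sequential, but avoids per-element df.loc / df[...][i] overhead.
    for i in range(1, len(b)):
        out[i] = out[i - 1] / 3 - b[i - 1] + 25
    return out


df = pd.DataFrame(index=range(500))
df["B"] = 5
df["A"] = recurrence_numpy(1.0, df["B"].to_numpy())

# What a single shift() expression computes: ONE step applied to the
# initial column (1, 2, 2, ...), not the full recurrence.
initial_A = pd.Series(2.0, index=df.index)
initial_A.iloc[0] = 1.0
one_step = initial_A.shift() / 3 - df["B"].shift() + 25

print(df["A"].head(3).tolist())
print(one_step.head(3).tolist())
```

From row 2 onward the two disagree, because the shifted version never feeds its own results back in. With a constant B = 5 the recurrence converges towards 30.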

Index and save last N points from a list that meets conditions from dataframe Python

I have a DataFrame that contains gas concentrations and the corresponding valve number. This data was taken continuously where we switched the valves back and forth (valves=1 or 2) for a certain amount of time to get 10 cycles for each valve value (20 cycles total). A snippet of the data looks like this (I have 2,000+ points and each valve stayed on for about 90 seconds each cycle):
gas1 valveW time
246.9438 2 1
247.5367 2 2
246.7167 2 3
246.6770 2 4
245.9197 1 5
245.9518 1 6
246.9207 1 7
246.1517 1 8
246.9015 1 9
246.3712 2 10
247.0826 2 11
... ... ...
My goal is to save the last N points of each valve's cycle. For example, the first cycle where valve=1, I want to index and save the last N points from the end before the valve switches to 2. I would then save the last N points and average them to find one value to represent that first cycle. Then I want to repeat this step for the second cycle when valve=1 again.
I am currently converting from Matlab to Python so here is the Matlab code that I am trying to translate:
% NOAA high
n2o_noaaHigh = [];
co2_noaaHigh = [];
co_noaaHigh = [];
h2o_noaaHigh = [];
ind_noaaHigh_end = zeros(1,length(t_c));
numPoints = 40;
for i = 1:length(valveW_c)-1
    if (valveW_c(i) == 1 && valveW_c(i+1) ~= 1)
        test = (i-numPoints):i;
        ind_noaaHigh_end(test) = 1;
        n2o_noaaHigh = [n2o_noaaHigh mean(n2o_c(test))];
        co2_noaaHigh = [co2_noaaHigh mean(co2_c(test))];
        co_noaaHigh = [co_noaaHigh mean(co_c(test))];
        h2o_noaaHigh = [h2o_noaaHigh mean(h2o_c(test))];
    end
end
ind_noaaHigh_end = logical(ind_noaaHigh_end);
This is what I have so far for Python:
# NOAA high
n2o_noaaHigh = []
co2_noaaHigh = []
co_noaaHigh = []
h2o_noaaHigh = []
t_c_High = []  # time
for i in range(len(valveW_c)):
    # NOAA HIGH
    if valveW_c[i] == 1:
        t_c_High.append(t_c[i])
        n2o_noaaHigh.append(n2o_c[i])
        co2_noaaHigh.append(co2_c[i])
        co_noaaHigh.append(co_c[i])
        h2o_noaaHigh.append(h2o_c[i])
Thanks in advance!
I'm not sure if I understood correctly, but I guess this is what you are looking for:
# First we create a column to show cycles:
df['cycle'] = (df.valveW.diff() != 0).cumsum()
print(df)
gas1 valveW time cycle
0 246.9438 2 1 1
1 247.5367 2 2 1
2 246.7167 2 3 1
3 246.677 2 4 1
4 245.9197 1 5 2
5 245.9518 1 6 2
6 246.9207 1 7 2
7 246.1517 1 8 2
8 246.9015 1 9 2
9 246.3712 2 10 3
10 247.0826 2 11 3
Now you can use groupby method to get the average for the last n points of each cycle:
n = 3 #we assume this is n
df.groupby('cycle').apply(lambda x: x.iloc[-n:, 0].mean())
Output:
cycle 0
1 246.9768
2 246.6579
3 246.7269
Let's call your DataFrame df; then you could do:
results = {}
for k, v in df.groupby((df['valveW'].shift() != df['valveW']).cumsum()):
    results[k] = v
    print(f'[group {k}]')
    print(v)
shift(), as the name suggests, shifts the valve column down by one row, which lets us detect where the sequence of values changes. Then cumsum() gives a unique number to each run of identical values. We can then do a groupby() on this new column (which was not possible before, because grouping on valveW directly only gives two groups: all the ones and all the twos!).
which gives, for your data snippet (saved in results):
[group 1]
gas1 valveW time
0 246.9438 2 1
1 247.5367 2 2
2 246.7167 2 3
3 246.6770 2 4
[group 2]
gas1 valveW time
4 245.9197 1 5
5 245.9518 1 6
6 246.9207 1 7
7 246.1517 1 8
8 246.9015 1 9
[group 3]
gas1 valveW time
9 246.3712 2 10
10 247.0826 2 11
Then, to get the mean for each cycle, you could do:
df.groupby((df['valveW'].shift() != df['valveW']).cumsum()).mean()
which gives (again for your code snippet):
gas1 valveW time
valveW
1 246.96855 2.0 2.5
2 246.36908 1.0 7.0
3 246.72690 2.0 10.5
where you probably care about the gas1 mean rather than the time mean!
Then, based on results, you could do:
import numpy as np

n = 3
mean_n_last = []
for k, v in results.items():
    if len(v) < n:
        mean_n_last.append(np.nan)  # this cycle has fewer than n points
    else:
        mean_n_last.append(np.nanmean(v.iloc[len(v) - n:, 0]))
which gives [246.9768, 246.65796666666665, nan] for n = 3!
If your dataframe is sorted by time you could get the last N records for each valve like this.
N=2
valve1 = df[df['valveW']==1].iloc[-N:,:]
valve2 = df[df['valveW']==2].iloc[-N:,:]
If it isn't currently sorted you could easily sort it like this.
df = df.sort_values(by=['time'])  # sort_values returns a new DataFrame
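Combining the run-labelling idea from the first two answers with the goal of the MATLAB code (average the last N points of every valve-1 cycle), a minimal sketch might look like the following. It uses only the gas1 column from the sample; further gas columns from the MATLAB code (n2o, co2, ...) would be added to the agg() call the same way, and those column names are assumptions. Unlike the MATLAB loop, it averages exactly n points (the MATLAB range (i-numPoints):i spans numPoints + 1 points) and it also keeps a cycle that is still running at the end of the data.

```python
import pandas as pd

# Sample data in the shape shown in the question (gas1, valveW, time).
df = pd.DataFrame({
    "gas1": [246.9438, 247.5367, 246.7167, 246.6770, 245.9197, 245.9518,
             246.9207, 246.1517, 246.9015, 246.3712, 247.0826],
    "valveW": [2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2],
    "time": list(range(1, 12)),
})

n = 3  # points to keep from the end of each cycle (numPoints in the MATLAB code)

# A new cycle starts wherever the valve value differs from the previous row.
df["cycle"] = (df["valveW"] != df["valveW"].shift()).cumsum()

# Last n rows of every cycle, then one summary row per cycle.
summary = (
    df.groupby("cycle").tail(n)
      .groupby("cycle")
      .agg(valveW=("valveW", "first"), gas1_mean=("gas1", "mean"))
)

# Keep only the cycles where valve 1 was open ("NOAA high").
noaa_high = summary[summary["valveW"] == 1]
print(noaa_high)
```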

Python - gap in output sequence, error when comparing two-digit numbers

I enter numbers in no particular order, remove the repeated ones while maintaining the order, and strip the [ ' ] , characters of the list from the output (is there a more Pythonic way to do this?). I don't know where I'm going wrong: it fails to compare two-digit numbers, and an extra space appears in the output.
a = 1 1 4 4 4 8 8 2 14 14 11 11
expected output b = 1 4 8 2 14 11
wrong output
b = 1 4 8 2
def repeated(s):
    t = []
    [t.append(item) for item in s if not t.count(item)]
    return t

def remove(s, to_remove):
    for x in to_remove:
        s = s.replace(x, '')
    return s

def main():
    a = input('a = ')
    print('b = ', (remove(str(repeated(a)), "['],")))

main()
exit()
Better to use sets:
a = set(input().split())
print(' '.join(a))
Sets can contain only unique values, so you don't need to remove anything: all values are unique by default.
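The wrong output in the question comes from repeated(a) receiving the raw input string: iterating over a string yields single characters, so "14" and "11" are seen as "1" and "4" (already collected), and the space character is kept as one more "unique" item. A set removes duplicates but does not preserve the input order, which the question asks for. A minimal order-preserving sketch (the helper name unique_in_order is hypothetical) splits the input first and relies on dict keys keeping insertion order, which is guaranteed since Python 3.7:

```python
def unique_in_order(text):
    """Split on whitespace and drop repeats, keeping first occurrences in order."""
    # Splitting first means "14" is one token, not the characters "1" and "4".
    tokens = text.split()
    # dict.fromkeys keeps the first occurrence of each key, in insertion order.
    return list(dict.fromkeys(tokens))


print("b =", " ".join(unique_in_order("1 1 4 4 4 8 8 2 14 14 11 11")))
# b = 1 4 8 2 14 11
```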

What's the cleanest way to print an equally-spaced list in python?

Please close if this is a duplicate, but this answer does not answer my question as I would like to print a list, not elements from a list.
For example, the below does not work:
mylist = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
print(%3s % mylist)
Desired output:
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
Basically, if all items in the list are n digits or less, equal spacing would give each item n+1 spots in the printout, like setw in C++. Assume n is known.
If I have missed a similar SO question, feel free to vote to close.
You can use string formatting, as in the example below. If you really need the square brackets, you will have to fiddle a bit.
lst = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
frmt = "{:>3}"*len(lst)
print(frmt.format(*lst))
  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
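The bracket "fiddling" can be as small as wrapping the formatted string. A minimal sketch, assuming n (the maximum number of digits) is known as stated in the question; format_list is a hypothetical helper name, and each item gets n + 1 right-aligned columns like setw:

```python
def format_list(items, n):
    """Format items inside [ ], each right-aligned in n + 1 columns."""
    width = n + 1
    return "[" + "".join(f"{x:>{width}}" for x in items) + "]"


mylist = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
print(format_list(mylist, 2))
```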
items=range(10)
''.join(f'{x:3}' for x in items)
'  0  1  2  3  4  5  6  7  8  9'
If none of the other answers work, try this code:
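mylist = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
spacing = 2           # example value: number of separator characters
spacecharacter = ' '  # example value: character used as the separator
output = ''
space = ''
output += str(mylist[0])
for spacecount in range(spacing):
    space += spacecharacter
for listnum in range(1, len(mylist)):
    output += space
    output += str(mylist[listnum])
print(output)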
I think this is the best yet, as it lets you manipulate the list as you wish, even numerically.
mylist = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
print(*map(lambda x: str(x) + " ", mylist))

Finding contiguous, non-unique slices in Pandas series without iterating

I'm trying to parse a logfile of our manufacturing process. Most of the time the process is run automatically but occasionally, the engineer needs to switch into manual mode to make some changes and then switches back to automatic control by the reactor software. When set to manual mode the logfile records the step as being "MAN.OP." instead of a number. Below is a representative example.
import pandas as pd

steps = [1,2,2,'MAN.OP.','MAN.OP.',2,2,3,3,'MAN.OP.','MAN.OP.',4,4]
ser_orig = pd.Series(steps)
which results in
0 1
1 2
2 2
3 MAN.OP.
4 MAN.OP.
5 2
6 2
7 3
8 3
9 MAN.OP.
10 MAN.OP.
11 4
12 4
dtype: object
I need to detect the 'MAN.OP.' and make them distinct from each other. In this example, the two regions with values == 2 should be one region after detecting the manual mode section like this:
0 1
1 2
2 2
3 Manual_Mode_0
4 Manual_Mode_0
5 2
6 2
7 3
8 3
9 Manual_Mode_1
10 Manual_Mode_1
11 4
12 4
dtype: object
I have code that iterates over this series and produces the correct result when the series is passed to my object. The setter is:
@step_series.setter
def step_series(self, ss):
    """
    On assignment, give the manual mode steps a unique name. Leave
    the steps done on recipe the same.
    """
    manual_mode = "MAN.OP."
    new_manual_mode_text = "Manual_Mode_{}"
    counter = 0
    continuous = False
    for i in ss.index:
        if continuous and ss.at[i] != manual_mode:
            continuous = False
            counter += 1
        elif not continuous and ss.at[i] == manual_mode:
            continuous = True
            ss.at[i] = new_manual_mode_text.format(str(counter))
        elif continuous and ss.at[i] == manual_mode:
            ss.at[i] = new_manual_mode_text.format(str(counter))
    self._step_series = ss
but this iterates over the entire dataframe and is the slowest part of my code other than reading the logfile over the network.
How can I detect these non-unique sections and rename them uniquely without iterating over the entire series? The series is a column selection from a larger dataframe so adding extra columns is fine if needed.
For the completed answer I ended up with:
@step_series.setter
def step_series(self, ss):
    pd.options.mode.chained_assignment = None
    manual_mode = "MAN.OP."
    new_manual_mode_text = "Manual_Mode_{}"
    newManOp = (ss=='MAN.OP.') & (ss != ss.shift())
    ss[ss == 'MAN.OP.'] = 'Manual_Mode_' + (newManOp.cumsum()-1).astype(str)
    self._step_series = ss
Here's one way:
import pandas as pd

steps = [1,2,2,'MAN.OP.','MAN.OP.',2,2,3,3,'MAN.OP.','MAN.OP.',4,4]
steps = pd.Series(steps)
# True only on the first row of each run of 'MAN.OP.'
newManOp = (steps=='MAN.OP.') & (steps != steps.shift())
steps[steps=='MAN.OP.'] += newManOp.cumsum().astype(str)
>>> steps
0 1
1 2
2 2
3 MAN.OP.1
4 MAN.OP.1
5 2
6 2
7 3
8 3
9 MAN.OP.2
10 MAN.OP.2
11 4
12 4
dtype: object
To get the exact format you listed (starting from zero instead of one, and changing from "MAN.OP." to "Manual_Mode_"), just replace the last line with:
steps[steps=='MAN.OP.'] = 'Manual_Mode_' + (newManOp.cumsum()-1).astype(str)
>>> steps
0 1
1 2
2 2
3 Manual_Mode_0
4 Manual_Mode_0
5 2
6 2
7 3
8 3
9 Manual_Mode_1
10 Manual_Mode_1
11 4
12 4
dtype: object
There is a pandas enhancement request for contiguous groupby, which would make this type of task simpler.
There is a function in matplotlib that takes a boolean array and returns a list of (start, end) pairs. Each pair represents a contiguous region where the input is True.
import matplotlib.mlab as mlab

manual_mode = "MAN.OP."
new_manual_mode_text = "Manual_Mode_{}"
regions = mlab.contiguous_regions(ser_orig == manual_mode)
for i, (start, end) in enumerate(regions):
    ser_orig.iloc[start:end] = new_manual_mode_text.format(i)
ser_orig
0 1
1 2
2 2
3 Manual_Mode_0
4 Manual_Mode_0
5 2
6 2
7 3
8 3
9 Manual_Mode_1
10 Manual_Mode_1
11 4
12 4
dtype: object
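If matplotlib.mlab.contiguous_regions is not available in your matplotlib version, the same (start, end) pairs can be computed with NumPy alone. A minimal sketch with a hypothetical helper of the same name: pad the boolean mask with False, take np.diff, and pair up the rising and falling edges. The end of each pair is exclusive, so the pairs can be used directly as positional slices.

```python
import numpy as np
import pandas as pd


def contiguous_regions(mask):
    """Return (start, end) pairs, end exclusive, for each run of True in a 1-D mask."""
    mask = np.asarray(mask, dtype=bool)
    # Pad with False so every run has both a rising and a falling edge.
    padded = np.concatenate(([False], mask, [False]))
    edges = np.flatnonzero(np.diff(padded.astype(np.int8)))
    # Edges alternate: rising (start) at even positions, falling (end) at odd ones.
    return [(int(s), int(e)) for s, e in zip(edges[::2], edges[1::2])]


steps = pd.Series([1, 2, 2, 'MAN.OP.', 'MAN.OP.', 2, 2, 3, 3,
                   'MAN.OP.', 'MAN.OP.', 4, 4])
for i, (start, end) in enumerate(contiguous_regions(steps == "MAN.OP.")):
    steps.iloc[start:end] = f"Manual_Mode_{i}"
print(steps.tolist())
```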
