python: filter based on IF condition - python

I am working with a simple Python condition that filters out values greater than or equal to zero and stores the filtered values in a list:
# make a object contained all clusters
clustering = d.clusterer.clustering_dict[cut_off]
# list of ignored objects
banned_conf=[]
for clust in clustering:
    clustStr = str(clustering.index(clust))
    clustStr = int(clustStr) + 1
    # get the value of energy for the clust
    ener=clust[0].energy
    # set up filter to ignore conformations with positive energies
    if ener > 0:
        print('Conformation in ' + str(clustStr) + ' cluster poses positive energy')
        banned_conf.append(ener)
        print('Nonsence: It is ignored!')
        continue
    elif ener == 0:
        print('Conformation in ' + str(clustStr) + ' cluster poses ZERO energy')
        banned_conf.append(ener)
        print('Very rare case: it is ignored!')
        continue
    #else:
        #print("Ain't no wrong conformations in " + str(clustStr) + " cluster")
How would it be possible to ignore all values > or = 0 within the same IF statement (without elif)? Which filtering would be better (with elif or in single IF statement)?

I would use the filter function:
lst = [0,1,-1,2,-2,3,-3,4,-4]
filtered = list(filter(lambda x: x >= 0, lst))
for ele in filtered:
    print(f'{ele} is >= 0')
Or, if you don't want to use a lambda function and filter, I would do:
lst = [0,1,-1,2,-2,3,-3,4,-4]
filtered = []
for ele in lst:
    if ele >= 0:
        filtered.append(ele)
for ele in filtered:
    print(f'{ele} is >= 0')
Or you can use list comprehension:
lst = [0,1,-1,2,-2,3,-3,4,-4]
filtered = [ele for ele in lst if ele >= 0]
for ele in filtered:
    print(f'{ele} is >= 0')

You can use >= to test both conditions at once.
for index, clust in enumerate(clustering, 1):
    ener = clust[0].energy
    if ener >= 0:
        print(f'Conformation in {index} cluster poses zero or positive energy, it is ignored')
        banned_conf.append(clust)
Your original method is better if you want to show a different message for zero and positive energy.
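As a rough, self-contained sketch of the single-condition filter, the snippet below separates clusters into kept and banned lists in one pass. The Conformation class and the split_by_energy helper are hypothetical stand-ins invented for illustration; the real objects come from d.clusterer in the question and may look different.

```python
from dataclasses import dataclass


@dataclass
class Conformation:
    """Hypothetical stand-in for one clustered conformation."""
    energy: float


def split_by_energy(clustering):
    """Return (kept, banned) lists of (cluster number, energy) pairs.

    A cluster is banned when its first member has energy >= 0.
    """
    kept, banned = [], []
    for index, clust in enumerate(clustering, 1):
        ener = clust[0].energy
        if ener >= 0:  # one test covers both the zero and the positive case
            banned.append((index, ener))
        else:
            kept.append((index, ener))
    return kept, banned


clustering = [[Conformation(-7.2)], [Conformation(0.0)],
              [Conformation(3.1)], [Conformation(-1.5)]]
kept, banned = split_by_energy(clustering)
print(kept)    # [(1, -7.2), (4, -1.5)]
print(banned)  # [(2, 0.0), (3, 3.1)]
```

Keeping the cluster number next to the energy makes it easy to report which clusters were skipped without calling clustering.index() again.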

Related

Average length of sequence with consecutive values >100 (Python)

I am trying to identify the length of consecutive sequences within an array that are >100. I have found the longest sequence using the following code but need to alter to also find the average length.
def getLongestSeq(a, n):
    maxIdx = 0
    maxLen = 0
    currLen = 0
    currIdx = 0
    for k in range(n):
        if a[k] >100:
            currLen +=1
            # New sequence, store
            # beginning index.
            if currLen == 1:
                currIdx = k
        else:
            if currLen > maxLen:
                maxLen = currLen
                maxIdx = currIdx
            currLen = 0
    if maxLen > 0:
        print('Index : ',maxIdx,',Length : ',maxLen,)
    else:
        print("No positive sequence detected.")
# Driver code
arrQ160=resultsQ1['60s']
n=len(arrQ160)
getLongestSeq(arrQ160, n)
arrQ260=resultsQ2['60s']
n=len(arrQ260)
getLongestSeq(arrQ260, n)
arrQ360=resultsQ3['60s']
n=len(arrQ360)
getLongestSeq(arrQ360, n)
arrQ460=resultsQ4['60s']
n=len(arrQ460)
getLongestSeq(arrQ460, n)
output
Index : 12837 ,Length : 1879
Index : 6179 ,Length : 3474
Index : 1164 ,Length : 1236
Index : 2862 ,Length : 617
This should work:
def get_100_lengths( arr ) :
    # '1' marks a value strictly greater than 100, '0' everything else
    s = ''.join( ['0' if i <= 100 else '1' for i in arr] )
    parts = s.split('0')
    return [len(p) for p in parts if len(p) > 0]
After that you may calculate an average or do whatever you like.
The result:
>>> get_100_lengths( [120,120,120,90,90,120,90,120,120] )
[3, 1, 2]
That might be a little tricky. You need one variable to keep track of the total length of the sequences and another to count how many sequences occurred.
A sequence has ended when the current number is not greater than 100 and the previous number was greater than 100. A sequence that runs to the end of the array has to be counted after the loop.
def getLongestSeq(array):
    total_length = total_ct = 0
    last_is_greater = False
    for number in array:
        if number > 100:
            total_length += 1
            last_is_greater = True
        elif last_is_greater:
            total_ct += 1
            last_is_greater = False
    if last_is_greater:
        # the array ended inside a sequence
        total_ct += 1
    return round(total_length / total_ct) if total_ct else 0
Did not test this code, please comment if there is any issue
You want to find all the sequences, take their lengths, and get the average. Each of those steps is relatively straightforward.
items = [1, 101, 1, 101, 101, 1, 101, 101, 101, 1]
Finding sequences: use groupby.
from itertools import groupby
groups = groupby(items, lambda x: x > 100) # (False, [1]), (True, [101]), ...
Find lengths (careful, iterable of iterables not a list):
lens = [len(list(g)) for k, g in groups if k] # [1, 2, 3]
Find average (assumes at least one):
avg = float(sum(lens)) / len(lens) # 2.0
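Putting the three steps together, a minimal sketch could look like the following. The function names run_lengths and average_run_length and the threshold parameter are illustrative choices, not part of the original answer, and returning 0.0 when there is no run is just one possible convention.

```python
from itertools import groupby


def run_lengths(values, threshold=100):
    """Lengths of consecutive runs of values strictly above threshold."""
    return [
        sum(1 for _ in group)  # a groupby group is an iterator, so count it
        for above, group in groupby(values, key=lambda v: v > threshold)
        if above
    ]


def average_run_length(values, threshold=100):
    """Mean run length, or 0.0 when no value exceeds the threshold."""
    lengths = run_lengths(values, threshold)
    return sum(lengths) / len(lengths) if lengths else 0.0


data = [120, 120, 120, 90, 90, 120, 90, 120, 120]
print(run_lengths(data))         # [3, 1, 2]
print(average_run_length(data))  # 2.0
```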

Split list according to the rule

I have a list
values_list = [1013.0, 683.0, 336.0, 406.0, 636.0, 1065.0, 1160.0]
Also I have a value
value = 660.6153846153846
This list is based on the assumption that there are 3 stages. The first stage should be higher than the value, the second lower, and the third higher again.
I want to split this list into three lists, saving the order of the values like this:
values_list = [[1013.0, 683.0], [336.0, 406.0, 636.0], [1065.0, 1160.0]]
Try this one, using groupby:
from itertools import groupby
values_list = [1013.0, 683.0, 336.0, 406.0, 636.0, 1065.0, 1160.0]
value = 660.6153846153846
result = list(list(b) for a,b in groupby(values_list, lambda x: x < value ))
print (result)
Result:
[[1013.0, 683.0], [336.0, 406.0, 636.0], [1065.0, 1160.0]]
Try this one:
splits = []
splt = []
s = 0
for v in values_list:
    if len(splt) > 0:
        if v > value and s != 1:
            splits.append(splt)
            splt = []
        elif v <= value and s != -1:
            splits.append(splt)
            splt = []
    splt.append(v)
    s = 2*(v > value) - 1
if len(splt) > 0:
    splits.append(splt)

Python: find smallest missing positive integer in ordered list

I need to find the first missing number in a list. If there is no number missing, the next number should be the last +1.
It should first check to see if the first number is > 1, and if so then the new number should be 1.
Here is what I tried. The problem is here: if next_value - items > 1:
results in an error because at the end and in the beginning I have a None.
list = [1,2,5]
vlans3=list
for items in vlans3:
    if items in vlans3:
        index = vlans3.index(items)
        previous_value = vlans3[index-1] if index -1 > -1 else None
        next_value = vlans3[index+1] if index + 1 < len(vlans3) else None
        first = vlans3[0]
        last = vlans3[-1]
        #print ("index: ", index)
        print ("prev item:", previous_value)
        print ("-cur item:", items)
        print ("nxt item:", next_value)
        #print ("_free: ", _free)
        #print ("...")
        if next_value - items > 1:
            _free = previous_value + 1
            print ("free: ",_free)
            break
print ("**************")
print ("first item:", first)
print ("last item:", last)
print ("**************")
Another method:
L = vlans3
free = ([x + 1 for x, y in zip(L[:-1], L[1:]) if y - x > 1][0])
This gives the correct number if there is a gap between the numbers, but if there is no gap an error occurs: IndexError: list index out of range. I need to specify somehow that if there is no free space it should give a new number (last + 1). With the code below I get an error and I do not know why.
if free = []:
    print ("no free")
else:
    print ("free: ", free)
To get the smallest integer that is not a member of vlans3:
ints_list = range(min(vlans3), max(vlans3) + 1)
missing_list = [x for x in ints_list if x not in vlans3]
first_missing = min(missing_list)
However you want to return 1 if the smallest value in your list is greater than 1, and the last value + 1 if there are no missing values, so this becomes:
ints_list = [1] + list(range(min(vlans3), max(vlans3) + 2))
missing_list = [x for x in ints_list if x not in vlans3]
first_missing = min(missing_list)
First, avoid using the built-in name list as a variable name.
Second, use try/except to quickly and neatly avoid this kind of issue.
def free(l):
    if l == []: return 0
    if l[0] > 1: return 1
    if l[-1] - l[0] + 1 == len(l): return l[-1] + 1
    for i in range(len(l)):
        try:
            if l[i+1] - l[i] > 1: break
        except IndexError:
            break
    return l[i] + 1
How about a numpy solution? Below code works if your input is a sorted integer list with non-duplicating positive values (or is empty).
nekomatic's solution is a bit faster for small inputs, but it's just a fraction of a second, doesn't really matter. However, it does not work for large inputs - e.g. list(range(1,100000)) completely freezes on list comprehension with inclusion check. Below code does not have this issue.
import numpy as np
def first_free_id(array):
    # np.int was removed from recent NumPy releases; the built-in int works as a dtype
    array = np.concatenate((np.array([-1, 0], dtype=int), np.array(array, dtype=int)))
    where_sequence_breaks = np.where(np.diff(array) > 1)[0]
    return where_sequence_breaks[0] if len(where_sequence_breaks)>0 else array[-1]+1
Prepend the array with -1 and 0 so np.diff works for empty and 1-element lists without breaking the existing sequence's continuity.
Compute the differences between consecutive values. The discontinuity being sought (a "hole") is where the difference is bigger than 1.
If there are any "holes", return the id of the first one, otherwise return the integer succeeding the last element.
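For comparison, here is a small pure-Python sketch that avoids both the slow list membership test and the NumPy dependency by using a set. The name first_free_positive is made up for this example, and the sketch treats an empty input as having 1 free, which is an assumption about the desired behaviour.

```python
def first_free_positive(ids):
    """Smallest positive integer that does not occur in ids."""
    taken = set(ids)      # set membership tests are O(1) on average
    candidate = 1
    while candidate in taken:
        candidate += 1
    return candidate


print(first_free_positive([1, 2, 5]))  # 3
print(first_free_positive([2, 3]))     # 1
print(first_free_positive([1, 2, 3]))  # 4
```

Because the loop stops at the first gap, the input does not have to be sorted or free of duplicates.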

Python Dynamic Knapsack

Right now I am attempting to code the knapsack problem in Python 3.2. I am trying to do this dynamically with a matrix. The algorithm that I am trying to use is as follows
Implements the memory function method for the knapsack problem
Input: A nonnegative integer i indicating the number of the first
       items being considered and a nonnegative integer j indicating the knapsack's capacity
Output: The value of an optimal feasible subset of the first i items
Note: Uses as global variables input arrays Weights[1..n], Values[1..n]
      and table V[0..n, 0..W] whose entries are initialized with -1's except for
      row 0 and column 0 initialized with 0's
if V[i, j] < 0
    if j < Weights[i]
        value <-- MFKnapsack(i - 1, j)
    else
        value <-- max(MFKnapsack(i - 1, j),
                      Values[i] + MFKnapsack(i - 1, j - Weights[i]))
    V[i, j] <-- value
return V[i, j]
If you run my code below, you can see that it tries to insert the weight into the list. Since this uses recursion, I am having a hard time spotting the problem. I also get the error: can not add an integer with a list using the '+'. I have the matrix initialized with all 0's in the first row and first column; everything else is initialized to -1. Any help will be much appreciated.
#Knapsack Problem
def knapsack(weight,value,capacity):
    weight.insert(0,0)
    value.insert(0,0)
    print("Weights: ",weight)
    print("Values: ",value)
    capacityJ = capacity+1
    ## ------ initialize matrix F ---- ##
    dimension = len(weight)+1
    F = [[-1]*capacityJ]*dimension
    #first column zeroed
    for i in range(dimension):
        F[i][0] = 0
    #first row zeroed
    F[0] = [0]*capacityJ
    #-------------------------------- ##
    d_index = dimension-2
    print(matrixFormat(F))
    return recKnap(F,weight,value,d_index,capacity)
def recKnap(matrix, weight,value,index, capacity):
    print("index:",index,"capacity:",capacity)
    if matrix[index][capacity] < 0:
        if capacity < weight[index]:
            value = recKnap(matrix,weight,value,index-1,capacity)
        else:
            value = max(recKnap(matrix,weight,value,index-1,capacity),
                        value[index] +
                        recKnap(matrix,weight,value,index-1,capacity-(weight[index])))
        matrix[index][capacity] = value
        print("matrix:",matrix)
    return matrix[index][capacity]
def matrixFormat(*doubleLst):
    matrix = str(list(doubleLst)[0])
    length = len(matrix)-1
    temp = '|'
    currChar = ''
    nextChar = ''
    i = 0
    while i < length:
        if matrix[i] == ']':
            temp = temp + '|\n|'
        #double digit
        elif matrix[i].isdigit() and matrix[i+1].isdigit():
            temp = temp + (matrix[i]+matrix[i+1]).center(4)
            i = i+2
            continue
        #negative double digit
        elif matrix[i] == '-' and matrix[i+1].isdigit() and matrix[i+2].isdigit():
            temp = temp + (matrix[i]+matrix[i+1]+matrix[i+2]).center(4)
            i = i + 2
            continue
        #negative single digit
        elif matrix[i] == '-' and matrix[i+1].isdigit():
            temp = temp + (matrix[i]+matrix[i+1]).center(4)
            i = i + 2
            continue
        elif matrix[i].isdigit():
            temp = temp + matrix[i].center(4)
        #updates next round
        currChar = matrix[i]
        nextChar = matrix[i+1]
        i = i + 1
    return temp[:-1]
def main():
    print("Knapsack Program")
    #num = input("Enter the weights you have for objects you would like to have:")
    #weightlst = []
    #valuelst = []
    ## for i in range(int(num)):
    ##     value , weight = eval(input("What is the " + str(i) + " object value, weight you wish to put in the knapsack? ex. 2,3: "))
    ##     weightlst.append(weight)
    ##     valuelst.append(value)
    weightLst = [2,1,3,2]
    valueLst = [12,10,20,15]
    capacity = 5
    value = knapsack(weightLst,valueLst,5)
    print("\n Max Matrix")
    print(matrixFormat(value))
main()
F = [[-1]*capacityJ]*dimension
does not properly initialize the matrix. [-1]*capacityJ is fine, but [...]*dimension creates dimension references to the exact same list. So modifying one list modifies them all.
Try instead
F = [[-1]*capacityJ for _ in range(dimension)]
This is a common Python pitfall. See this post for more explanation.
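A short demonstration of the aliasing described above; the variable names shared and independent are chosen only for this illustration.

```python
rows, cols = 3, 4

# Outer multiplication copies the reference, so every row is the same list object.
shared = [[-1] * cols] * rows
shared[0][0] = 0
print(shared)                   # [[0, -1, -1, -1], [0, -1, -1, -1], [0, -1, -1, -1]]
print(shared[0] is shared[1])   # True

# The comprehension evaluates [-1] * cols once per row, giving separate lists.
independent = [[-1] * cols for _ in range(rows)]
independent[0][0] = 0
print(independent)                        # [[0, -1, -1, -1], [-1, -1, -1, -1], [-1, -1, -1, -1]]
print(independent[0] is independent[1])   # False
```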
To illustrate the cache, I generally use a defaultdict as follows:
from collections import defaultdict
CS = defaultdict(lambda: defaultdict(int)) #if i want to make default vals as 0
###or
CACHE_1 = defaultdict(lambda: defaultdict(lambda: int(-1))) #if i want to make default vals as -1 (or something else)
This saves me from building the 2D arrays in Python on the fly.
For an answer to the 0/1 knapsack problem using this approach, see:
http://ideone.com/fUKZmq
def zeroes(n,m):
    v=[['-' for i in range(0,n)]for j in range(0,m)]
    return v
value=[0,12,10,20,15]
w=[0,2,1,3,2]
v=zeroes(6,5)
def knap(i,j):
    global v
    if i==0 or j==0:
        v[i][j]= 0
    elif j<w[i] :
        v[i][j]=knap(i-1,j)
    else:
        v[i][j]=max(knap(i-1,j),value[i]+knap(i-1,j-w[i]))
    return v[i][j]
x=knap(4,5)
print (x)
for i in range (0,len(v)):
    for j in range(0,len(v[0])):
        print(v[i][j],end="\t\t")
    print()
    print()
#now these calls are for filling all the boxes in the matrix as in the above call only few v[i][j]were called and returned
knap(4,1)
knap(4,2)
knap(4,3)
knap(4,4)
for i in range (0,len(v)):
    for j in range(0,len(v[0])):
        print(v[i][j],end="\t\t")
    print()
    print()
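The memory function idea from the question's pseudocode can also be sketched with functools.lru_cache, which stores each (i, j) result so that no table has to be built by hand. This is an alternative technique rather than a fix of any code above; the names knapsack and best and the 0-based item lists are assumptions made for this sketch.

```python
from functools import lru_cache


def knapsack(weights, values, capacity):
    """Best total value of a subset of items whose total weight fits in capacity."""

    @lru_cache(maxsize=None)
    def best(i, j):
        # best(i, j): optimum using only the first i items with capacity j
        if i == 0 or j == 0:
            return 0
        skip = best(i - 1, j)
        if weights[i - 1] > j:  # item i does not fit
            return skip
        take = values[i - 1] + best(i - 1, j - weights[i - 1])
        return max(skip, take)

    return best(len(weights), capacity)


print(knapsack([2, 1, 3, 2], [12, 10, 20, 15], 5))  # 37
```

The recursion depth grows with the number of items, so for very long item lists an iterative table may be the safer choice.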

Python: how to find value in list smaller than target

For example I have a non-ordered list of values [10, 20, 50, 200, 100, 300, 250, 150]
I have this code which returns the next greater value:
def GetNextHighTemp(self, temp, templist):
    target = int(temp)
    list = []
    for t in templist:
        if t != "":
            list.append(int(t))
    return str(min((abs(target - i), i) for i in list)[1])
e.g. If temp = 55, it will return '100'.
But how can I get the lesser of the value? That is how to get it to return '50'?
Thank you.
EDIT - now working
def OnTWMatCurrentIndexChanged(self):
    self.ClearTWSelectInputs()
    material = self.cb_TW_mat.currentText()
    temp = self.txt_design_temp.text()
    if material != "":
        Eref = self.GetMaterialData(material, "25", "elast")
        if Eref and Eref != "":
            Eref = str(float(Eref) / 1000000000)
            self.txt_TW_Eref.setText(Eref)
        else:
            self.txt_TW_Eref.setText("194.8")
            self.ShowMsg("No temperature match found for E<sub>ref</sub> in material data file. Value of 194.8 GPa will be used.", "blue")
    if material != "" and temp != "":
        if self.CheckTWTemp(material, temp):
            dens = self.GetMaterialData(material, temp, "dens")
            self.txt_TW_dens.setText(dens)
            elast = self.GetMaterialData(material, temp, "elast")
            elast = str(float(elast) / 1000000000)
            self.txt_TW_Et.setText(elast)
            stress = self.GetMaterialData(material, temp, "stress")
            stress = str(float(stress) / 1000000)
            self.txt_TW_stress_limit.setText(stress)
        else:
            self.ShowMsg("No temperature match found for " + temp + "° C in material data file. Extrapolated data will be used where possible or add new material data.", "blue")
            dens = self.GetExtrapolatedMaterialData(material, temp, "dens")
            self.txt_TW_dens.setText(dens)
            elast = self.GetExtrapolatedMaterialData(material, temp, "elast")
            elast = str(float(elast) / 1000000000)
            self.txt_TW_Et.setText(elast)
            stress = self.GetExtrapolatedMaterialData(material, temp, "stress")
            stress = str(float(stress) / 1000000)
            self.txt_TW_stress_limit.setText(stress)
    else:
        self.ClearTWSelectInputs()
def CheckTWTemp(self, matvar, tempvar):
    for material in self.materials:
        if material.attrib["name"] == matvar:
            temps = material.getiterator("temp")
            for temp in temps:
                if int(temp.text) == int(tempvar):
                    return True
    return False
def GetMaterialData(self, matvar, tempvar, tag):
    for material in self.materials:
        if material.attrib["name"] == matvar:
            temps = material.getiterator("temp")
            for temp in temps:
                if temp.text == tempvar:
                    value = temp.find(tag)
                    return value.text
def GetExtrapolatedMaterialData(self, matvar, tempvar, tag):
    try:
        templist = QStringList()
        for material in self.materials:
            if material.attrib["name"] == matvar:
                temps = material.getiterator("temp")
                for temp in temps:
                    templist.append(temp.text)
        templist.sort()
        target = int(tempvar)
        x1 = max(int(t) for t in templist if t != '' and int(t) < target)
        x2 = min(int(t) for t in templist if t != '' and int(t) > target)
        y1 = float(self.GetMaterialData(matvar, str(x1), tag))
        y2 = float(self.GetMaterialData(matvar, str(x2), tag))
        x = target
        y = y1 - ((y1 - y2) * (x - x1) / (x2 - x1))
        return str(y)
    except Exception, inst:
        return "0"
A better and much faster way (in both code and CPU time) is to use the bisect module, which does a binary search. For that you need to sort the list first. Here is a sample usage:
import bisect
mylist = [10, 20, 50, 200, 100, 300, 250, 150]
mylist.sort()
index = bisect.bisect(mylist, 55)
print("Greater than target", mylist[index])
print("Smaller than or equal to target", mylist[index-1])
output:
Greater than target 100
Smaller than or equal to target 50
You will also need to check the returned index: if it is 0, the target is lower than the lowest value in the list, and if it equals len(mylist), there is no greater value.
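A sketch of those index checks wrapped in a helper; the function name neighbours and the choice to return None when no neighbour exists are assumptions made for illustration. It uses bisect_left so that the smaller value is strictly below the target.

```python
import bisect


def neighbours(values, target):
    """Return (largest value < target, smallest value > target), None where missing."""
    ordered = sorted(values)                      # bisect needs sorted input
    left = bisect.bisect_left(ordered, target)    # index of first element >= target
    right = bisect.bisect_right(ordered, target)  # index of first element > target
    smaller = ordered[left - 1] if left > 0 else None
    larger = ordered[right] if right < len(ordered) else None
    return smaller, larger


temps = [10, 20, 50, 200, 100, 300, 250, 150]
print(neighbours(temps, 55))   # (50, 100)
print(neighbours(temps, 5))    # (None, 10)
print(neighbours(temps, 300))  # (250, None)
```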
Edit: Ah, I used templist instead of list -- hence the confusion. I didn't mean it to be a one-line function; you still have to do the conversions. (Of course, as Mike DeSimone rightly points out, using list as a variable name is a terrible idea!! So I had a good reason for being confusing. :)
To be more explicit about it, here's a slightly streamlined version of the function (fixed to test properly for an empty list):
def GetNextHighTemp(self, temp, templist):
    templist = (int(t) for t in templist if t != '')
    templist = [t for t in templist if t < int(temp)]
    if templist: return max(templist)
    else: return None # or raise an error
Thanks to Mike for the suggestion to return None in case of an empty list -- I like that.
You could shorten this even more like so:
def GetNextHighTemp(self, temp, templist):
    try: return str(max(int(t) for t in templist if t != '' and int(t) < int(temp)))
    except ValueError: return None # or raise a different error
nextHighest = lambda seq,x: min([(i-x,i) for i in seq if x<=i] or [(0,None)])[1]
nextLowest = lambda seq,x: min([(x-i,i) for i in seq if x>=i] or [(0,None)])[1]
Here's how this works: Looking at nextHighest, the argument to min is a list comprehension, that calculates the differences between each value in the list and the input x, but only for those values >= x. Since you want the actual value, then we need the list elements to include both the difference to the value, and the actual value. Tuples are compared value by value, left-to-right, so the tuple for each value i in the sequence becomes (i-x,i) - the min tuple will have the actual value in the [1]'th element.
If the input x value is outside the range of values in seq (or if seq is just empty), then the list comprehension will give us an empty list, which would raise a ValueError in min. To guard against this, we add the or [(0,None)] term inside the argument to min. If the list comprehension is empty, it evaluates to False, in which case min will instead look at the sequence containing the single tuple (0,None). In that case, the [1]'th element is None, indicating that there were no elements in seq higher than x.
Here are some test cases:
>>> t = [10, 20, 50, 200, 100, 300, 250, 150]
>>> print(nextHighest(t,55))
100
>>> print(nextLowest(t,55))
50
>>> print(nextHighest([],55))
None
>>> print(nextLowest([],55))
None
>>> print(nextHighest(t,550))
None
Let the unordered list be myList:
answer = max(x for x in myList if x < temp)
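Note that max raises a ValueError when no element is below the target. In Python 3.4 and later, the default argument of max and min covers that case. The helper names closest_below and closest_above are made up for this sketch.

```python
def closest_below(values, target):
    """Largest value strictly less than target, or None if there is none."""
    return max((v for v in values if v < target), default=None)


def closest_above(values, target):
    """Smallest value strictly greater than target, or None if there is none."""
    return min((v for v in values if v > target), default=None)


my_list = [10, 20, 50, 200, 100, 300, 250, 150]
print(closest_below(my_list, 55))  # 50
print(closest_above(my_list, 55))  # 100
print(closest_below(my_list, 10))  # None
```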
If I understand you correctly, you want the greatest value that is less than your target; e.g. in your example, if your target is 55, you want 50, but if your target is 35, you want 20. The following function should do that:
def get_closest_less(lst, target):
    lst.sort()
    ret_val = None
    previous = lst[0]
    if (previous <= target):
        ret_val = previous
        # walk the whole sorted list, including the last element
        for ndx in range(1, len(lst)):
            if lst[ndx] > target:
                break
            ret_val = lst[ndx]
    return str(ret_val)
If you need to step through these values, you could use a generator to get the values in succession:
def next_lesser(l, target):
    for n in l:
        if n < target:
            yield str(n)
Both these worked properly from within a simple program.
a=[4,3,8,2,5]
temp=4
def getSmaller(temp,alist):
    alist.sort()
    for i in range(len(alist)):
        if(i>0 and alist[i]==temp):
            print(alist[i-1])
        elif(i==0 and alist[i]==temp):
            print(alist[i])
getSmaller(temp,a)
