Deleting matching items from a list - python

I have a list of values. I would like to remove values which cancel each other out (+ and -). The values appear in the list in random order, so I first added a new column in Excel with the absolute values. I then sorted on the absolute values so that the amounts which need to be cancelled are next to each other.
I was thinking of creating a for loop that sums the first row with the second row and, when this sums to zero, deletes both rows and starts from the top again. Please refer to the picture of an example. I have marked in yellow the items which should be deleted. As I only want to delete matching items, the total sum of the amount column should not change after the operation.
Currently I have something like this:
for i in df["Amount in Entity Currency"]:
    if df["Amount in Entity Currency"][i] + df["Amount in Entity Currency"][i+1] == 0:
        df.drop(df[df["Amount in Entity Currency"][i]])
        df.drop(df[df["Amount in Entity Currency"][i + 1]])

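Try something like this after you have sorted the list by absolute value (as you already said). It builds a new list instead of removing items while iterating, which would skip elements:
result = []
i = 0
while i < len(yourList):
    # two neighbours cancel when they sum to (almost) zero
    if i + 1 < len(yourList) and abs(yourList[i] + yourList[i + 1]) < 0.00000001:
        i += 2  # skip both values
    else:
        result.append(yourList[i])
        i += 1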

I would base an answer off of several building blocks. First is creating two "iterable" sub-lists. One for positive numbers and the other negative numbers.
Then I would iterate over both of them using next() and as long as one of the two lists had values I would act on the current values as appropriate.
import random
full_data = [random.randint(0, 10) for _ in range(20)] + [-random.randint(0, 10) for _ in range(20)]
zeros = [i for i in full_data if i == 0]
positives = iter(sorted([i for i in full_data if i > 0]))
negatives = iter(sorted([i for i in full_data if i < 0], reverse=True))
## ------------------------------
## prime the list with zero or 1 0s where an odd number of 0s
## results in [0] as the evens cancel each other out.
## ------------------------------
result = [0] * (len(zeros) % 2)
## ------------------------------
pos = next(positives, None)
neg = next(negatives, None)
while pos is not None or neg is not None:
    if pos is None:
        # we ran out of positive numbers so add any remaining negatives
        result.extend([neg] + list(negatives))
        break
    if neg is None:
        # we ran out of negative numbers so add any remaining positives
        result.extend([pos] + list(positives))
        break
    if pos == -neg:
        # these results cancel each other
        pos = next(positives, None)
        neg = next(negatives, None)
    elif pos > -neg:
        # this positive is "larger" than this negative so add the negative
        result.append(neg)
        neg = next(negatives, None)
    else:
        # this positive is "smaller" than this negative so add the positive
        result.append(pos)
        pos = next(positives, None)
print(f"The original list has {len(full_data)} items and sums to: {sum(full_data)}")
print(f"The filtered list has {len(result)} items and sums to: {sum(result)}")
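If the original order of the surviving values matters, a counting approach may be an alternative to sorting. The following is only a minimal sketch, not the method from the answers above: `cancel_pairs` is a name made up here, it assumes the amounts compare exactly (integers, or amounts stored as cents or `Decimal`), and it leaves zeros untouched.

```python
from collections import Counter

def cancel_pairs(values):
    """Remove +x / -x pairs; keep the remaining values in their original order."""
    remaining = Counter(values)
    for v in list(remaining):
        if v > 0 and -v in remaining:
            # each +v can cancel exactly one -v
            pairs = min(remaining[v], remaining[-v])
            remaining[v] -= pairs
            remaining[-v] -= pairs
    result = []
    for v in values:
        # keep a value only while uncancelled copies of it remain
        if remaining[v] > 0:
            result.append(v)
            remaining[v] -= 1
    return result

data = [5, -3, 3, -5, 5, 2]
filtered = cancel_pairs(data)
print(filtered, sum(data) == sum(filtered))
```

Because every removed pair sums to zero, the total of the list stays the same, which is the check the question asks for.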

Related

(Traversal) How to solve this question in python?

Write a function traverse() that takes in a list tb of n strings, each containing n lowercase characters (a-z).
tb represents a square table with n rows and n columns. The function returns a string st generated by the procedure below that traverses the grid starting from the top left cell and ending at the bottom right cell.
At every step, the procedure moves either horizontally to the right or vertically down, depending on which of the two cells has a "smaller" letter (i.e., a letter that appears earlier in alphabetical order).
The letter in the visited cell is then added to st. In case of ties, either direction can be taken.
When the right or bottom edge of the table is reached, there is only one possible next cell to move to. As an example, traverse(["veus", "oxde", "oxlx", "hwuj"]) returns "veudexj",
so the table would look like this:
v o o h
e x x w
u d l u
s e x j
I am new to Python and I wrote this code, but it only prints "veuexj". I would say the problem is in the line if new_list[a - 1][b - 1] == new_list[a - 1][-2]: which forces the parameter to skip the 'd' character, and I don't know how to solve it.
def traverse(tb_list):
    new_list = tb_list.copy()
    st = new_list[0][0]
    parameter = ''
    handler = 1
    for a in range(1, len(new_list)):
        for b in range(handler, len(new_list[a])):
            if new_list[a - 1][b - 1] == new_list[a - 1][-2]:
                parameter = new_list[a][b]
            elif new_list[a - 1][b - 1] > min(new_list[a - 1][b], new_list[a][b - 1]):
                parameter = min(new_list[a - 1][b], new_list[a][b - 1])
            elif new_list[a - 1][b - 1] < min(new_list[a - 1][b], new_list[a][b - 1]):
                parameter = min(new_list[a - 1][b], new_list[a][b - 1])
            st += parameter
            handler = b
    return st
print(traverse(["veus", "oxde", "oxlx", "hwuj"]))
You can try something like this (explanation added as comments):
def traverse(tb_list):
    lst = tb_list.copy() #Takes a copy of tb_list
    lst = list(zip(*[list(elem) for elem in lst])) #Transposes the list
    final = lst[0][0] #Sets final as the first element of the list
    index = [0,0] #Sets index to 0,0
    while True:
        x = index[0] #The x coordinate is the first element of the list
        y = index[1] #The y coordinate is the second element of the list
        if x == len(lst) - 1: #Checks if the program has reached the rightmost point of the table
            if y == len(list(zip(*lst))) - 1: #Checks if the program has reached the bottommost point of the table
                return final #If both the conditions are True, it returns final (the final string)
            else:
                final += lst[x][y+1] #If the program has reached the rightmost edge, but not the bottommost, then the program moves one step down
                index = [x, y+1] #Sets the index to the new coordinates
        elif y == len(list(zip(*lst))) - 1: #Does the same thing as the previous if condition, but in the opposite order (checks if the program has reached the bottommost edge first, rightmost edge next)
            if x == len(lst) - 1:
                return final
            else:
                final += lst[x + 1][y] #If the program has reached the bottommost edge, but not the rightmost, then the program moves one step right
                index = [x + 1, y]
        else: #If both conditions are false (rightmost and bottommost)
            if lst[x+1][y] < lst[x][y+1]: #Checks if the right value is less than the bottom value
                final += lst[x+1][y] #If True, then it moves one step right and adds the letter at that index to final
                index = [x+1,y] #Sets the index to the new coords
            else: #If the previous if condition is False (bottom val <= right val)
                final += lst[x][y+1] #Moves one step down and adds the letter at that index to final
                index = [x,y+1] #Sets the index to the new coords
lst = ["veus", "oxde", "oxlx", "hwuj"]
print(traverse(lst))
Output:
veudexj
I have added the explanation as comments, so take your time to go through it. If you are still not clear with any part of the code, feel free to ask me. Any suggestions to optimize/shorten my code are most welcome.
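Since suggestions to shorten the code were invited, here is one possible compact variant. It is only a sketch: it skips the transpose and indexes the strings directly, with `s` (which string) and `c` (position inside that string) being names chosen here. It follows the same tie rule as the code above, moving on to the next string when the two candidate letters are equal.

```python
def traverse(tb):
    """Greedy walk from the first letter of tb[0] to the last letter of tb[-1]."""
    n = len(tb)
    s, c = 0, 0          # s: index of the string, c: index inside that string
    st = tb[0][0]
    while (s, c) != (n - 1, n - 1):
        if s == n - 1:                      # last string: only c can still grow
            c += 1
        elif c == n - 1:                    # last character: only s can still grow
            s += 1
        elif tb[s][c + 1] < tb[s + 1][c]:   # pick the strictly smaller letter
            c += 1
        else:                               # ties go to the next string
            s += 1
        st += tb[s][c]
    return st

print(traverse(["veus", "oxde", "oxlx", "hwuj"]))  # veudexj
```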

Find 3 consecutive missing integers from an array

Below is the code to find the missing numbers, but I need to select the first 3 consecutive numbers from missings.
array=[0,1,4,5,9,10]
start=0
end=15
missings=[]
for i in range(start,end):
    if i not in array:
        missings.append(i)
Desired output: [6,7,8]
Here you go:
array=[0,1,4,5,9,10]
start=0
end=15
missings=[]
for i in range(start,end-2):  # stop at end-2 so that i+2 stays inside range(start, end)
    if i not in array:
        if i+1 not in array:
            if i+2 not in array:
                missings.append(i)
                missings.append(i+1)
                missings.append(i+2)
                break
Sort the list in ascending order, then compare the values in the array with their neighbor to determine if there is a gap >3.
def find_missing(arr):
    sorted_list = sorted(arr)
    # Set our lowest value for comparing
    curr_value = sorted_list[0]
    for i in range(1,len(sorted_list)):
        # Compare the previous value to the next value to determine if there is a difference of at least 4 (6-3 = 3 but we are only missing numbers 4 and 5)
        if (sorted_list[i] - curr_value) > 3:
            # Return on the first 3 consecutive missing numbers
            return [curr_value+1, curr_value+2, curr_value+3]
        curr_value = sorted_list[i]
    # Return an empty list if there are not 3 consecutive missing numbers
    return []
This function works based on the length of the array and the largest number. If there is a need for a specified end value in case all elements in the array do not have a gap of three except for the largest element and the end value, it can be passed as a parameter with some minor modifications.
def find_missing(arr, start_val=0, end_val=0):
    # sorted() returns a new list, so the original list is not altered
    sorted_list = sorted(arr)
    curr_value = sorted_list[0]
    last_value = sorted_list[-1]
    # If start_val is below the lowest number, start_val itself is missing, so a gap of 3 is enough
    if start_val < curr_value and (curr_value - start_val) >= 3:
        return [start_val, start_val+1, start_val+2]
    for i in range(1,len(sorted_list)):
        # Compare the previous value to the next value to determine if there is a difference of at least 4 (6-3 = 3 but we are only missing numbers 4 and 5)
        if (sorted_list[i] - curr_value) > 3:
            # Return on the first 3 consecutive missing numbers
            return [curr_value+1, curr_value+2, curr_value+3]
        curr_value = sorted_list[i]
    # If an end_val is set that is larger than the last value and leaves a gap after it
    # (end_val is treated as exclusive here, like range(start, end) in the question)
    if end_val > last_value and (end_val - last_value) > 3:
        return [last_value+1, last_value+2, last_value+3]
    else:
        # Return an empty list if there are not 3 consecutive missing numbers
        return []
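For comparison, the same idea can be written with a set lookup, which avoids the repeated linear scans of `i not in array`. This is just a sketch under the question's conventions (`start` inclusive, `end` exclusive); the function name `first_three_missing` is made up here.

```python
def first_three_missing(array, start, end):
    """Return the first 3 consecutive integers in range(start, end) absent from array."""
    present = set(array)                 # O(1) membership tests
    for i in range(start, end - 2):      # i + 2 must stay below end
        if not ({i, i + 1, i + 2} & present):
            return [i, i + 1, i + 2]
    return []                            # no run of 3 missing numbers

print(first_three_missing([0, 1, 4, 5, 9, 10], 0, 15))  # [6, 7, 8]
```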

indexError: list indexing error and wrongful tracking of counters

The goal of the program is to define a procedure that takes in a string of numbers from 1-9 and outputs a list with the following parameters:
Every number in the string should be inserted into the list.
If a number x in the string is less than or equal to the preceding number y, the number x should be inserted into a sublist. Continue adding the following numbers to the sublist until reaching a number z that is greater than the number y.
Then add this number z to the normal list and continue.
#testcases
string = '543987'
numbers_in_lists(string)
result = [5,[4,3],9,[8,7]]
def numbers_in_lists(string):
    # Convert the sequence of strings into an array of numbers
    i = 0
    conv_str_e = []
    while i < len(string):
        conv_str_e.append(int(string[i]))
        i += 1
    #official code starts here
    normal_list = []
    list_of_small_nums = [[]]
    # This will help me keep track of elements of the normal_list.
    previous_e_pos = -1
    e_pos = 0
    # this one will be used to decide if the element should go into the
    #normal_list or list_of_small_nums
    check_point = 0
    for e in conv_str_e:
        #The first element and every elements bigger the element with
        #check_point as it's index
        #will be put into the normal_list as long as the list inside
        #list_of_small_nums is empty
        if e_pos == 0 or e > conv_str_e[check_point]:
            #If the list inside list_of_small_nums is not empty
            if list_of_small_nums[0] != []:
                #concatenate the normal_list and list_of_small_nums
                normal_list = normal_list + list_of_small_nums[0]
                #Clear the list inside list_of_small_nums
                list_of_small_nums[0] = []
            #Add the element in the normal_list
            normal_list.append(e)
            # Update my trackers
            e_pos += 1
            previous_e_pos += 1
            # (not sure this might be the error)
            check_point = e_pos
        #The curent element is not bigger then the element with the
        #check_point as index position
        #Therefor it goes into the sublist.
        list_of_small_nums[0].append(e)
        e_pos += 1
        previous_e_pos += 1
    return list
What you were doing wrong was exactly what you pointed out in your comments. You just kept increasing e_pos and so check_point eventually was greater than the length of the list.
I took the liberty of changing some things to simplify the process. Simple programs make it easier to figure out what is going wrong with them. Make sure you always try to think about the simplest way first to solve your problem! Here, I replaced the need for e_pos and previous_e_pos by using enumerate:
string = '543987'
# Convert the sequence of strings into an array of numbers
conv_str_e = [int(i) for i in string]
#official code starts here
normal_list = []
list_of_small_nums = []
# this one will be used to decide if the element should go into the
#normal_list or list_of_small_nums
check_point = 0
for ind, e in enumerate(conv_str_e):
    #The first element and every element bigger than the element with
    #check_point as its index
    #will be put into the normal_list
    if ind == 0 or e > conv_str_e[check_point]:
        #If list_of_small_nums is not empty
        if list_of_small_nums != []:
            #append list_of_small_nums to the normal_list as a sublist
            normal_list.append(list_of_small_nums)
            #Start a new, empty list_of_small_nums
            list_of_small_nums = []
        #Add the element in the normal_list
        normal_list.append(e)
        # Update my trackers
        check_point = ind
    else:
        #The current element is not bigger than the element with the
        #check_point as index position
        #Therefore it goes into the sublist.
        list_of_small_nums.append(e)
# If there is anything left, add it to the list
if list_of_small_nums != []:
    normal_list.append(list_of_small_nums)
print(normal_list)
Result:
[5, [4, 3], 9, [8, 7]]
I am sure you can change it appropriately from here to put it back in your function.
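For reference, putting the logic back into a function could look roughly like this. It is a sketch rather than the only way to do it: instead of an index it tracks the value last added to the top-level list (called `largest` here, a name not in the original code), which plays the role of `conv_str_e[check_point]`.

```python
def numbers_in_lists(string):
    result = []        # the "normal" list
    sublist = []       # collects numbers that are <= the last top-level number
    largest = None     # last number added to the top-level list
    for ch in string:
        n = int(ch)
        if largest is None or n > largest:
            if sublist:                 # close the pending sublist first
                result.append(sublist)
                sublist = []
            result.append(n)
            largest = n
        else:
            sublist.append(n)
    if sublist:                         # anything left over at the end
        result.append(sublist)
    return result

print(numbers_in_lists('543987'))  # [5, [4, 3], 9, [8, 7]]
```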

Python - Storing index number of an array with comparison of the value with an integer

I'm trying to figure out how to store the index of a value, or both the index and the value. Steps:
Scan the array (one by one)
Check whether each value is greater than a certain number
If it is greater, save the index number and value
Next, check the next number: is it greater than the certain number? If so, move to the next
If the number is less than the certain number, break
y1 = []  # Is a 128 length array
flagcheck = 0  # check for when found a number greater than needed
startofindex = []  # Place to save where the 1st number was found
endofindex = []  # Place where end
for g in y1:
    if (y1.count(g) > 0.001):
        startofindex.append(enumerate(y1.count(g)))  # I know this won't work but I can't seem to find the best sol'n
        flagcheck = 1
while (flagcheck == 1):
    for g in y1:
        if (y1.count(g) <= 0.001):
            endofindex.append(enumerate(y1.count(g)))
            break
I want to use index to format a graph's axis for particular points.
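One way to read the steps above is "collect the first run of values above the threshold, together with their positions". A minimal sketch of that reading follows. The function name `first_run_above`, the parameter `threshold` and the sample data are assumptions made for illustration; `enumerate` supplies the index next to each value, which is what `y1.count(g)` cannot do.

```python
def first_run_above(values, threshold):
    """Return (index, value) pairs for the first consecutive run above threshold."""
    run = []
    for index, value in enumerate(values):
        if value > threshold:
            run.append((index, value))   # save both the index and the value
        elif run:
            break                        # the run has ended, stop scanning
    return run

# hypothetical data standing in for the 128-element array y1
y1 = [0.0, 0.0005, 0.2, 0.5, 0.0, 0.3]
run = first_run_above(y1, 0.001)
print(run)
```

The first and last entries of `run` would then give the start and end index for the graph axis, provided `run` is not empty.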

How to find the position of first instance of duplicates in two equal length, sorted lists

I have two random lists of same length, in range of 0 to 99.
lista = [12,34,45,56,66,80,89,90]
listb = [13,30,56,59,72,77,80,85]
I need to find the first instance of a duplicate number, and in what list it is from.
In this example, I need to find the number '56' in listb, and get the index i = 2
Thanks.
Update:
After running it a couple of times, I got this error:
if list_a[i] == list_b[j]:
IndexError: list index out of range
As @Asterisk suggested, my two lists are of equal length and sorted, and both i and j are set to 0 at the beginning.
That bit is part of a genetic crossover code:
def crossover(r1,r2):
    i=random.randint(1,len(domain)-1) # Indices not at edges of domain
    if set(r1) & set(r2) == set([]): # If lists are different, splice at random
        return r1[0:i]+r2[i:]
    else: # Lists have duplicates
        # Duplicates At Edges
        if r1[0] == r2[0]: # If [0] is double, retain r1
            return r1[:1]+r2[1:]
        if r1[-1] == r2[-1]: # If [-1] is double, retain r2
            return r1[:-1]+r2[-1:]
        # Duplicates In Middle
        else: # Splice at first duplicate point
            i1, i2 = 0, 0
            index = ()
            while i1 < len(r1):
                if r1[i1] == r2[i2]:
                    if i1 < i2:
                        index = (i1, r1, r2)
                    else:
                        index = (i2, r2, r1)
                    break
                elif r1[i1] < r2[i2]:
                    i1 += 1
                else:
                    i2 += 1
            # Return A Splice In Relation To What List It Appeared First
            # Eliminates Further Duplicates In Original Lists
            return index[2][:index[0]+1]+index[1][index[0]+1:]
The function takes 2 lists and returns one.
domain is a list of 10 tuples: (0,99).
As I said, the error doesn't happen every time, only once in a while.
I appreciate your help.
I'm not a python guy, but this is an algorithm question...
You maintain an index into each list and you look at the elements at those two list positions.
Whichever list has the smallest element at the current position, you move to the next element in that list.
When you find an element that is the same as the other list's current element, that is your smallest duplicate.
If you reach the end of either list, there are no duplicates.
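In Python, the steps described above might look like the sketch below. The function name `first_common` is made up for illustration, and the lists are assumed to be sorted in ascending order. Checking both indices in the loop condition is what implements the "end of either list" rule.

```python
def first_common(list_a, list_b):
    """Return (value, index_in_a, index_in_b) of the smallest common element, or None."""
    i = j = 0
    while i < len(list_a) and j < len(list_b):   # stop at the end of either list
        if list_a[i] == list_b[j]:
            return list_a[i], i, j
        if list_a[i] < list_b[j]:
            i += 1        # list_a holds the smaller element, advance it
        else:
            j += 1        # list_b holds the smaller element, advance it
    return None           # no duplicates

lista = [12, 34, 45, 56, 66, 80, 89, 90]
listb = [13, 30, 56, 59, 72, 77, 80, 85]
print(first_common(lista, listb))  # (56, 3, 2)
```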
If you're looking for all the duplicates, you can use something like this:
list_a = [12,34,45,56,66,80,89,90]
list_b = [13,30,56,59,72,77,80,85]
set_a = set(list_a)
set_b = set(list_b)
duplicates = set_a.intersection(set_b)
# or just this:
# duplicates = [n for n in list_a if n in list_b]
for duplicate in duplicates:
    print(list_a.index(duplicate))
To get the smallest index of a duplicate in either list (this assumes duplicates is not empty, otherwise min() raises a ValueError):
a_min = min(map(list_a.index, duplicates))
b_min = min(map(list_b.index, duplicates))
if a_min < b_min:
    print('a', a_min, list_a[a_min])
else:
    print('b', b_min, list_b[b_min])
If you only need the first duplicate, this should work a bit better:
duplicate = None
for n in list_a:  # iterate over the list, not the set, so the first match in list order is found
    if n in set_b:
        duplicate = n
        break
if duplicate is not None:
    print(list_a.index(duplicate))
lista = [12,34,45,56,66,80,89,90]
listb = [13,30,56,59,72,77,80,85]
i, j = 0, 0
# check both indices: j can also run past the end when there is no duplicate
while i < len(lista) and j < len(listb):
    if lista[i] == listb[j]:
        if i < j:
            print(i, lista)
        else:
            print(j, listb)
        break
    elif lista[i] < listb[j]:
        i += 1
    else:
        j += 1
>>>
2 [13, 30, 56, 59, 72, 77, 80, 85]
Assumptions: both lists have the same length, and they are sorted
Just scan all the lists at position 0, then 1, then 2, ... Keep track of what you've seen (you can query a set in O(1) time).
def firstDuplicate(*lists):
    seen = {}
    for i,tup in enumerate(zip(*lists)):
        for listNum,value in enumerate(tup):
            position = (listNum,i)
            if value in seen:
                return value, [seen[value], position]
            else:
                seen[value] = position
Demo:
>>> value,positions = firstDuplicate(lista,listb)
>>> value
56
>>> positions
[(1, 2), (0, 3)]
(Does not generalize to N lists... yet. Would need a minor tweak to use a defaultdict(set), insert all indices as a tuple together, then check for duplicates.)
