Python Combine Repeating Elements

I have a list of lists of strings with some repeating elements that I want to combine into a shorter list.
The original list contents look something like this:
lst = [['0.1', '0', 'RC', '100'],
['0.2', '10', 'RC', '100'],
['0.3', '5', 'HC', '20'],
['0.4', '5', 'HC', '20'],
['0.5', '5', 'HC', '20'],
['0.6', '5', 'HC', '20'],
['0.7', '5', 'HC', '20'],
['0.8', '5', 'HC', '20'],
['0.9', '10', 'RC', '100'],
['1.0', '0', 'RC', '100']]
After running it through the function it would become:
lst = [['0.1', '0', 'RC', '100'],
['0.2', '10', 'RC', '100'],
['0.3', '5', 'HC', '20'],
['0.9', '10', 'RC', '100'],
['1.0', '0', 'RC', '100']]
The list will always have this general structure, so essentially I want to combine consecutive sublists based on whether or not their last 3 columns are exactly the same.
I want it to be a callable function, so it would look something like:
def combine_list(lst):
    if sublist[1:] == next_sublist[1:]:
        lst.remove(next_sublist)
My initial research turned up many methods for removing a sublist based on its index, but the index is not necessarily known beforehand. I also found the re module, but I have never used it and am unsure how to apply it here. Thank you in advance.

If you want to remove sublists that are consecutive and share the same last three elements, you can use itertools.groupby keyed on the last three elements and keep the first item of each group:
from itertools import groupby
[next(g) for _, g in groupby(lst, key=lambda x: x[1:])]
#[['0.1', '0', 'RC', '100'],
# ['0.2', '10', 'RC', '100'],
# ['0.3', '5', 'HC', '20'],
# ['0.9', '10', 'RC', '100'],
# ['1.0', '0', 'RC', '100']]
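Since the question asks for a callable function, the one-liner above could be wrapped roughly as follows. This is a minimal sketch: the name combine_list comes from the question, the key is assumed to be everything after the first column, and the sample data is a shortened version of the question's list.

```python
from itertools import groupby


def combine_list(lst):
    """Keep the first sublist of each run whose trailing columns match."""
    # groupby only groups *adjacent* items with equal keys, so repeats that
    # are separated by other rows end up in separate groups and are kept.
    return [next(group) for _, group in groupby(lst, key=lambda row: row[1:])]


lst = [['0.1', '0', 'RC', '100'],
       ['0.2', '10', 'RC', '100'],
       ['0.3', '5', 'HC', '20'],
       ['0.4', '5', 'HC', '20'],
       ['0.9', '10', 'RC', '100'],
       ['1.0', '0', 'RC', '100']]

combined = combine_list(lst)
```

The original list is left unchanged; only a new list of references to the kept sublists is returned.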

Maybe just use a set to keep track of duplicates?
def combine_list(lst):
    out = []
    seen = set()
    for item in lst:
        if tuple(item[1:]) not in seen:
            out.append(item)
            seen.add(tuple(item[1:]))
    return out
Lists are a mutable data structure, so there is no guarantee that the contents of a list do not change over time. That means a list cannot be used in a hashing function (which the set relies on). The tuple, on the other hand, is immutable, and hence hashable as long as its elements are.
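One caveat is worth checking against the question's data. Because seen remembers every key encountered so far, this approach also drops later rows whose last three columns reappear after a gap, not only consecutive repeats. The sketch below uses the same logic as the function above with a shortened version of the question's list; the variable names key and result are added here for illustration.

```python
def combine_list(lst):
    out = []
    seen = set()
    for item in lst:
        key = tuple(item[1:])  # lists are unhashable, tuples are hashable
        if key not in seen:
            out.append(item)
            seen.add(key)
    return out


lst = [['0.1', '0', 'RC', '100'],
       ['0.2', '10', 'RC', '100'],
       ['0.3', '5', 'HC', '20'],
       ['0.4', '5', 'HC', '20'],
       ['0.9', '10', 'RC', '100'],
       ['1.0', '0', 'RC', '100']]

result = combine_list(lst)
# The rows starting with '0.9' and '1.0' are dropped too, because their
# keys were already seen at '0.2' and '0.1'.
```

So this variant fits when every repeat should go, while the expected output in the question keeps the rows at '0.9' and '1.0'.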

for index in range(len(lst) - 1, 0, -1):
    if lst[index][1:] == lst[index - 1][1:]:
        lst.pop(index)
By going through the list backwards, we avoid the problem of indices shifting when we remove elements. This results in an in-place reduction.
If you'd like to make a new list, this can be done via a list comprehension following the same idea, but since we're not doing it in place, we don't have to work in reverse:
[lst[0]] + [lst[ind] for ind in range(1, len(lst)) if lst[ind][1:] != lst[ind-1][1:]]
Again, lst[0] is trivially non-duplicate and therefore automatically included. Note that it has to be wrapped in a list ([lst[0]]) so that the first sublist is kept as a single element rather than being spliced into the result.
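As a rough sketch, both variants can be packaged as functions and compared on a small sample. The names dedupe_in_place and dedupe_copy and the sample rows are made up for this illustration; the copying version uses zip over adjacent pairs instead of indices, which also handles an empty list.

```python
def dedupe_in_place(rows):
    """Remove consecutive duplicates (by trailing columns) from rows itself."""
    # Walk backwards so that popping an item never shifts an index we have
    # yet to visit.
    for index in range(len(rows) - 1, 0, -1):
        if rows[index][1:] == rows[index - 1][1:]:
            rows.pop(index)


def dedupe_copy(rows):
    """Return a new list; the input is left untouched."""
    # rows[:1] is [] for an empty list, so no IndexError is raised.
    return rows[:1] + [cur for prev, cur in zip(rows, rows[1:])
                       if cur[1:] != prev[1:]]


data = [['0.1', '0', 'RC'], ['0.2', '5', 'HC'], ['0.3', '5', 'HC'],
        ['0.4', '5', 'HC'], ['0.5', '0', 'RC']]
copied = dedupe_copy(data)   # data still has five rows here
dedupe_in_place(data)        # now data itself is shortened
```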

def combine_list(ls):
    cpy = ls[:]
    for i, sub in enumerate(ls[:len(ls) - 1]):
        if sub[1:] == ls[i + 1][1:]:
            cpy.remove(ls[i + 1])
    return cpy
This function should work. It creates a copy of the list to avoid modifying the original. Then it iterates over the original list, skipping the last sublist because it has no successor to compare with.
It checks whether the trailing values of each sublist are equal to the trailing values of the next sublist. If they are, the next sublist is removed from the copy.
The function then returns the new list.
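One thing to be aware of: list.remove deletes the first element that compares equal, so the function above relies on every sublist being distinct as a whole (which holds in the question's data, where the first column is unique). The sketch below, with made-up rows and a hypothetical index-based variant named combine_list_by_index, shows how the two can differ when a complete row repeats.

```python
def combine_list(ls):
    # The version from this answer, reproduced for comparison.
    cpy = ls[:]
    for i, sub in enumerate(ls[:len(ls) - 1]):
        if sub[1:] == ls[i + 1][1:]:
            cpy.remove(ls[i + 1])
    return cpy


def combine_list_by_index(rows):
    """Hypothetical variant that builds the result by position."""
    result = []
    for i, row in enumerate(rows):
        # Keep a row unless its trailing columns equal the previous row's.
        if i == 0 or row[1:] != rows[i - 1][1:]:
            result.append(row)
    return result


rows = [['a', 'x'], ['b', 'y'], ['a', 'x'], ['a', 'x']]
by_value = combine_list(rows)           # removes the *first* ['a', 'x']
by_index = combine_list_by_index(rows)  # removes the last one
```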

Related

Adding a single 'counter' to each element of an array of lists

I've got a .dat file that I've pulled the data from, and I was using the tabulate plug-in to tidy it up and put it into tables. However, part of the question is to add a position column or counter. It should be simple enough to add one element at the start of each list in my array, but I am having an absolute nightmare...
The code for pulling in the data from the .dat file is:
def readToDictionary():
    global dicts
    fi = open('CarRegistry.dat', 'r')
    dicts = []
    buffer = []
    while True:
        team = fi.readline()
        if not team: break
        fields = team.split(',')
        buffer.append(fields)
    fi.close()
    dicts = buffer
    print(dicts)
    return dicts
I'm duplicating the array deliberately, as I need to run some other functions on it and want to keep the original data intact.
The raw output is:
[['1', 'BD61 SLU', 'HONDA', 'CR-V', 'SFDR', '5', '1780', '4510', '130', '39', 'True\n'], ['2', 'CA51 MBE', 'CHEVROLET', 'CORVETTE', 'JTAV', '2', '1877', '1234', '194', '24', 'True\n'], ['3', 'PC14 RSN', 'FORD', 'F-150', 'PQBD', '5', '2121', '5890', '155', '20', 'True\n'], ['4', 'MB19 ORE', 'HONDA', 'ACCORD', 'FDAR', '5', '1849', '4933', '125', '47.3', 'False\n'], ['5', 'BD68 NAP', 'HONDA', 'ACCORD', 'FDAV', '5', '1849', '4933', '171', '37.7', 'False\n']...
What I want to get to is:
[['1', '1', 'BD61 SLU', 'HONDA', 'CR-V', 'SFDR', '5', '1780', '4510', '130', '39', 'True\n'], ['2', '2', 'CA51 MBE', 'CHEVROLET', 'CORVETTE', 'JTAV', '2', '1877', '1234', '194', '24', 'True\n'], ['3', '3', 'PC14 RSN', 'FORD', 'F-150', 'PQBD', '5', '2121', '5890', '155', '20', 'True\n'], ['4', '4', 'MB19 ORE', 'HONDA', 'ACCORD', 'FDAR', '5', '1849', '4933', '125', '47.3', 'False\n'], ['5', '5', 'BD68 NAP', 'HONDA', 'ACCORD', 'FDAV', '5', '1849', '4933', '171', '37.7', 'False\n']...
It's basically a counter at the start of each list.
I've tried all sorts and just keep getting errors, probably because I can't understand the basics of why I can't just do this:
for i in buffer:
    buffer.insert(i, i+1)
to go through each entry in the list and add a value equal to the index + 1... I know it's probably simple, but I've been banging my head off the monitor for a good few hours now...
The key is that you don't want to manipulate buffer. You want to manipulate the individual lists within buffer:
for i, row in enumerate(buffer):
    row.insert(0, str(i + 1))
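The question also mentions wanting to keep the original data intact. Note that an assignment such as dicts = buffer only binds a second name to the same list, so inserting into its rows changes both. A minimal sketch of a non-mutating alternative follows, using shortened sample rows and a variable name, numbered, chosen here for illustration.

```python
rows = [['1', 'BD61 SLU', 'HONDA'],
        ['2', 'CA51 MBE', 'CHEVROLET'],
        ['3', 'PC14 RSN', 'FORD']]

# Build new inner lists instead of inserting into the existing ones;
# enumerate(..., start=1) supplies the 1-based position.
numbered = [[str(position)] + row
            for position, row in enumerate(rows, start=1)]
```

Here rows keeps its original contents, while numbered holds the rows with the counter prepended.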

Why is the set() reversing the list?

When I run the set() function in Python to convert a list to a set and remove duplicates, the set function reverses the order of my list.
x = ['2','6']
y = list(set(x))
print('x is',str(x),'y is',str(y))
When you build the set, each element from the list is added in the order that the list_iterator (produced by iter(x)) yields them.
A set, though, is unordered, and the insertion order is neither remembered nor used to construct the iteration order of the set itself.
As a result, list(set(x)) produces a list whose order is unrelated to the order of x in any meaningful fashion.
As others mentioned, set doesn't preserve order. A good alternative is OrderedDict, which guarantees unique keys and preserves order.
from collections import OrderedDict
x = ['1','1','2','3','4','5','5','5','6','7','8','9','9']
y = list(OrderedDict.fromkeys(x))
print("x is", x)
print("y is", y)
# x is ['1', '1', '2', '3', '4', '5', '5', '5', '6', '7', '8', '9', '9']
# y is ['1', '2', '3', '4', '5', '6', '7', '8', '9']
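On Python 3.7 and later, a plain dict also preserves insertion order as a language guarantee, so the same idea can be sketched without importing anything. The sample list below is made up for illustration.

```python
x = ['2', '6', '2', '1', '6']

# Keys are unique and come back in first-seen order.
y = list(dict.fromkeys(x))
```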

Python - List of unique sequences

I have a dictionary whose values are lists, each holding a certain sequence:
a = {'seq1':['5', '4', '3', '2', '1', '6', '7', '8', '9'],
'seq2':['9', '8', '7', '6', '5', '4', '3', '2', '1'],
'seq3':['5', '4', '3', '2', '1', '11', '12', '13', '14'],
'seq4':['15', '16', '17'],
'seq5':['18', '19', '20', '21', '22', '23'],
'seq6':['18', '19', '20', '24', '25', '26']}
So there are 6 sequences.
What I need to do is:
Find only the unique lists (if two lists contain the same elements, regardless of their order, they are not unique). Say I need to get rid of the second list (the first unique list found will stay).
In the unique lists, find the unique subsequences of elements and print them.
The bounds of the unique subsequences are determined by where the order of elements stops matching: in the 1st and the 3rd lists the shared part ends exactly after element '1', so we get the subsequence ['5','4','3','2','1'].
As a result I would like to see the elements in exactly the same order as they were in the beginning (if that's possible at all). So I expect this:
[['5', '4', '3', '2', '1'], ['6', '7', '8', '9'], ['11', '12', '13', '14'], ['15', '16', '17'], ['18', '19', '20'], ['21', '22', '23'], ['24', '25', '26']]
I tried to do it this way:
import itertools
unique_sets = []
a = {'seq1':["5","4","3","2","1","6","7","8","9"], 'seq2':["9","8","7","6","5","4","3","2","1"], 'seq3':["5","4","3","2","1","11","12","13","14"], 'seq4':["15","16","17"], 'seq5':["18","19","20","21","22","23"], 'seq6':["18","19","20","24","25","26"]}
b = []
for seq in a.values():
    b.append(seq)
for seq1, seq2 in itertools.combinations(b,2): #searching for intersections
    if set(seq1).intersection(set(seq2)) not in unique_sets:
        #if set(seq1).intersection(set(seq2)) == set(seq1):
            #continue
        unique_sets.append(set(seq1).intersection(set(seq2)))
    if set(seq1).difference(set(seq2)) not in unique_sets:
        unique_sets.append(set(seq1).difference(set(seq2)))
for it in unique_sets:
    print(it)
I got this which is a little bit different from my expectations:
{'9', '5', '2', '3', '7', '1', '4', '8', '6'}
set()
{'5', '2', '3', '1', '4'}
{'9', '8', '6', '7'}
{'5', '2', '14', '3', '1', '11', '12', '4', '13'}
{'17', '16', '15'}
{'19', '20', '18'}
{'23', '21', '22'}
With the commented-out lines in the code above enabled, the result is even worse.
Plus, I have a problem with the elements being unordered in the sets I get as the result. I tried to do this with two separate lists:
seq1 = set([1,2,3,4,5,6,7,8,9])
seq2 = set([1,2,3,4,5,10,11,12])
and it worked fine: the elements never changed their position in the sets. Where is my mistake?
Thanks.
Updated: OK, now I have a slightly more complicated task, where the offered algorithm won't work.
I have this dictionary:
precond = {
'seq1': ["1","2"],
'seq2': ["3","4","2"],
'seq3': ["5","4","2"],
'seq4': ["6","7","4","2"],
'seq5': ["6","4","7","2"],
'seq6': ["6","1","8","9","10"],
'seq7': ["6","1","8","11","9","12","13","14"],
'seq8': ["6","1","8","11","4","15","13"],
'seq9': ["6","1","8","16","9","11","4","17","18","2"],
'seq10': ["6","1","8","19","9","4","16","2"],
}
I expect these sequences, containing at least 2 elements:
[1, 2],
[4, 2],
[6, 7],
[6, 4, 7, 2],
[6, 1, 8],
[9, 10],
[6, 1, 8, 11],
[9, 12, 13, 14],
[4, 15, 13],
[16, 9, 11, 4, 17, 18, 2],
[19, 9, 4, 16, 2]
Right now I wrote this code:
precond = {
'seq1': ["1","2"],
'seq2': ["3","4","2"],
'seq3': ["5","4","2"],
'seq4': ["6","7","4","2"],
'seq5': ["6","4","7","2"],
'seq6': ["6","1","8","9","10"],
'seq7': ["6","1","8","11","9","12","13","14"],
'seq8': ["6","1","8","11","4","15","13"],
'seq9': ["6","1","8","16","9","11","4","17","18","2"],
'seq10': ["6","1","8","19","9","4","16","2"],
}
seq_list = []
result_seq = []
#d = []
for seq in precond.values():
    seq_list.append(seq)
#print(b)
contseq_ind = 0
control_seq = seq_list[contseq_ind]
mainseq_ind = 1
el_ind = 0
#index2 = 0
def compar():
    if control_seq[contseq_ind] != seq_list[mainseq_ind][el_ind]:
        mainseq_ind += 1
        compar()
    else:
        result_seq.append(control_seq[contseq_ind])
        contseq_ind += 1
        el_ind += 1
        if contseq_ind > len(control_seq):
            control_seq = seq_list[contseq_ind + 1]
            compar()
        else:
            compar()
compar()
This code is not complete anyway: so far it only looks for matching elements from the beginning, so I still need to write code that searches for a shared sequence at the end of the two compared lists.
Right now I have a problem with recursion. Immediately after the first recursive call I get this error:
if control_seq[contseq_ind] != b[mainseq_ind][el_ind]:
UnboundLocalError: local variable 'control_seq' referenced before assignment
How can I fix this? Or maybe you have a better idea than using recursion? Thank you in advance.
Not sure if this is what you wanted, but it gets the same result:
from collections import OrderedDict
a = {'seq1':["5","4","3","2","1","6","7","8","9"],
'seq2':["9","8","7","6","5","4","3","2","1"],
'seq3':["5","4","3","2","1","11","12","13","14"],
'seq4':["15","16","17"],
'seq5':["18","19","20","21","22","23"],
'seq6':["18","19","20","24","25","26"]}
level = 0
counts = OrderedDict()
# go through each value in the list of values to count the number
# of times it is used and indicate which list it belongs to
for elements in a.values():
    for element in elements:
        if element in counts:
            a,b = counts[element]
            counts[element] = a,b+1
        else:
            counts[element] = (level,1)
    level+=1
last = 0
result = []
# now break up the dictionary of unique values into lists according
# to the count of each value and the level that they existed in
for k,v in counts.items():
    if v == last:
        result[-1].append(k)
    else:
        result.append([k])
    last = v
print(result)
Result:
[['5', '4', '3', '2', '1'],
['6', '7', '8', '9'],
['11', '12', '13', '14'],
['15', '16', '17'],
['18', '19', '20'],
['21', '22', '23'],
['24', '25', '26']]
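On the UnboundLocalError from the question's update: assigning to a name anywhere inside a function (including with +=) makes that name local to the function, so reading it before the assignment fails. Declaring the name with global is one way out; passing the state in as arguments is another. The following is a minimal sketch of the effect and of the argument-passing approach, using made-up names (counter, bump_broken, count_up) rather than the question's code.

```python
counter = 0


def bump_broken():
    # The assignment below makes `counter` local to this function, so the
    # read that `+=` performs first finds no local value yet.
    counter += 1


def count_up(value, limit):
    """Recursive helper that receives its state as arguments instead."""
    if value >= limit:
        return value
    return count_up(value + 1, limit)


try:
    bump_broken()
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

final_value = count_up(0, 5)
```

Passing indices such as contseq_ind and mainseq_ind as parameters in the same way would avoid the module-level state altogether.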

Filtering out a generator

What's the best way to filter out some subsets from a generator? For example, I have a string "1023" and want to produce all possible ways of grouping its digits, keeping them in order. All combinations would be:
['1', '0', '2', '3']
['1', '0', '23']
['1', '02', '3']
['1', '023']
['10', '2', '3']
['10', '23']
['102', '3']
['1023']
I am not interested in a subset that contains a leading 0 on any of the items, so the valid ones are:
['1', '0', '2', '3']
['1', '0', '23']
['10', '2', '3']
['10', '23']
['102', '3']
['1023']
I have two questions.
1) If using a generator, what's the best way to filter out the ones with leading zeroes? Currently, I generate all combinations, then loop through them afterwards and only continue if the subset is valid. For simplicity I am only printing the subset in the sample code. If the generator is very long or contains a lot of invalid subsets, it's almost a waste to loop through the entire generator. Is there a way to stop the generator when it sees an invalid item (one with a leading zero) and filter it out of 'allCombinations'?
2) If the above doesn't exist, what's a better way to generate these combinations (disregarding combinations with leading zeroes)?
Code using a generator:
import itertools
def isValid(subset): ## DIGITS WITH LEADING 0 IS NOT VALID
    valid = True
    for num in subset:
        if num[0] == '0' and len(num) > 1:
            valid = False
            break
    return valid
def get_combinations(source, comb):
    res = ""
    for x, action in zip(source, comb + (0,)):
        res += x
        if action == 0:
            yield res
            res = ""
digits = "1023"
allCombinations = [list(get_combinations(digits, c)) for c in itertools.product((0, 1), repeat=len(digits) - 1)]
for subset in allCombinations: ## LOOPS THROUGH THE ENTIRE GENERATOR
    if isValid(subset):
        print(subset)
For an easy and obvious condition like "no leading zeros", the filtering can be done more efficiently at the combination-building level.
def generate_pieces(input_string, predicate):
    if input_string:
        if predicate(input_string):
            yield [input_string]
        for item_size in range(1, len(input_string)+1):
            item = input_string[:item_size]
            if not predicate(item):
                continue
            rest = input_string[item_size:]
            for rest_piece in generate_pieces(rest, predicate):
                yield [item] + rest_piece
Generating every combination of cuts (the list is so long it's not even funny):
>>> list(generate_pieces('10002', lambda x: True))
[['10002'], ['1', '0002'], ['1', '0', '002'], ['1', '0', '0', '02'], ['1', '0', '0', '0', '2'], ['1', '0', '00', '2'], ['1', '00', '02'], ['1', '00', '0', '2'], ['1', '000', '2'], ['10', '002'], ['10', '0', '02'], ['10', '0', '0', '2'], ['10', '00', '2'], ['100', '02'], ['100', '0', '2'], ['1000', '2']]
Only those where no fragment has leading zeros:
>>> list(generate_pieces('10002', lambda x: not x.startswith('0')))
[['10002'], ['1000', '2']]
Substrings that start with a zero were never considered for the recursive step.
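The predicate in that last call also rejects a lone '0', whereas the question treats pieces such as ['1', '0', '23'] as valid. A sketch with a slightly relaxed predicate follows; the name no_leading_zero is made up here, and generate_pieces is repeated from above so that the example runs on its own.

```python
def generate_pieces(input_string, predicate):
    # Same generator as above, repeated so the example is self-contained.
    if input_string:
        if predicate(input_string):
            yield [input_string]
        for item_size in range(1, len(input_string) + 1):
            item = input_string[:item_size]
            if not predicate(item):
                continue
            rest = input_string[item_size:]
            for rest_piece in generate_pieces(rest, predicate):
                yield [item] + rest_piece


def no_leading_zero(piece):
    """Hypothetical predicate: '0' alone is fine, '02' or '023' is not."""
    return piece == '0' or not piece.startswith('0')


pieces = list(generate_pieces('1023', no_leading_zero))
```

For '1023' this yields the six splits listed as valid in the question, although in a different order.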
One common solution is to filter just before the yield, as in this example:
import itertools
def my_gen(my_string):
    # Create combinations
    for length in range(len(my_string)):
        for my_tuple in itertools.combinations(my_string, length+1):
            # This is the string you would like to output
            output_string = "".join(my_tuple)
            # filter here:
            if output_string[0] != '0':
                yield output_string
my_string = '1023'
print(list(my_gen(my_string)))
EDIT: Added a generator expression alternative:
import itertools
my_string = '1023'
my_gen = ("".join(my_tuple) for length in range(len(my_string))
          for my_tuple in itertools.combinations(my_string, length+1)
          if "".join(my_tuple)[0] != '0')
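For the first part of the question (filtering without building the full list first), one option is to keep everything lazy, so that each split is produced and checked only when the consumer asks for the next item. The sketch below reuses get_combinations from the question; is_valid, all_splits and valid_splits are names introduced here for illustration.

```python
import itertools


def get_combinations(source, comb):
    # From the question: comb says, per gap, whether to join (1) or cut (0).
    res = ""
    for x, action in zip(source, comb + (0,)):
        res += x
        if action == 0:
            yield res
            res = ""


def is_valid(subset):
    """A piece is valid if it is a single digit or does not start with '0'."""
    return all(len(num) == 1 or num[0] != "0" for num in subset)


digits = "1023"
# Generator expression: nothing is computed until it is iterated.
all_splits = (list(get_combinations(digits, c))
              for c in itertools.product((0, 1), repeat=len(digits) - 1))
# filter() is lazy as well; invalid splits are skipped one at a time.
valid_splits = filter(is_valid, all_splits)

result = list(valid_splits)
```

This still generates each invalid split before discarding it; pruning during construction, as in the recursive answer, avoids that work.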

Iterating between 2 lists

I have 2 lists, one of which is a list of lists. They are as follows:
lists_of_lists = ['1', '4', '7', '13', '16', '21', '32', '36'],['3', '6', '8', '14', '22', '26', '31', '40']
just_a_list =['THEQUICKBROWNFOXJUMPEDOVERTHELAZYDOG', 'IWOULDLOVETOGETOVERWITHTHISASSOONASPOSSIBLE']
The lists_of_lists are used for slicing the elements of just_a_list such that:
['1', '4', '7', '13', '16', '21', '32', '36'] would slice the string 'THEQUICKBROWNFOXJUMPEDOVERTHELAZYDOG' as follows
'1' - '4' - 'HEQU'
'7' - '13' - 'KBROWNF'
'16' - '21' - 'JUMPED'
'32' - '36' - 'ZYDOG'
Points to note:
Each list in lists_of_lists will have an even number of numbers.
The list at the i'th position in lists_of_lists belongs to the string at the i'th position in just_a_list.
Please help me work out how to carry out the process described above.
Thanks
Use zip() to combine the string and slice lists, then use a zip() plus iter() trick to pair the start and stop values:
for slicelist, text in zip(lists_of_lists, just_a_list):
    for start, stop in zip(*([iter(slicelist)]*2)):
        print(text[int(start):int(stop) + 1])
Note that we have to add 1 to the stop index, as you appear to need it to be inclusive, while in Python the stop index is exclusive.
This gives:
>>> for slicelist, text in zip(lists_of_lists, just_a_list):
...     for start, stop in zip(*([iter(slicelist)]*2)):
...         print(text[int(start):int(stop) + 1])
...
HEQU
KBROWNF
JUMPED
YDOG
ULDL
VETOGET
HTHIS
ONASPOSSIB
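If the pieces are needed as data rather than printed, the same pairing can be sketched as a small helper that returns a list. The function name slice_inclusive is made up here, and it pairs starts and stops with extended slices instead of the iter() trick.

```python
def slice_inclusive(text, bounds):
    """Return the substrings of text for consecutive (start, stop) pairs,
    treating stop as inclusive."""
    numbers = [int(b) for b in bounds]
    # numbers[::2] are the starts, numbers[1::2] the matching stops.
    return [text[start:stop + 1]
            for start, stop in zip(numbers[::2], numbers[1::2])]


lists_of_lists = (['1', '4', '7', '13', '16', '21', '32', '36'],
                  ['3', '6', '8', '14', '22', '26', '31', '40'])
just_a_list = ['THEQUICKBROWNFOXJUMPEDOVERTHELAZYDOG',
               'IWOULDLOVETOGETOVERWITHTHISASSOONASPOSSIBLE']

pieces = [slice_inclusive(text, bounds)
          for bounds, text in zip(lists_of_lists, just_a_list)]
```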
If I understand you right:
>>> ls = just_a_list =['THEQUICKBROWNFOXJUMPEDOVERTHELAZYDOG', 'IWOULDLOVETOGETOVERWITHTHISASSOONASPOSSIBLE']
>>> ls[0]
'THEQUICKBROWNFOXJUMPEDOVERTHELAZYDOG'
# so we do
# your index was off by one
>>> ls[0][1:5]
'HEQU'
>>> ls[0][7:14]
'KBROWNF'
