I've got a .dat file that i I've pulled the data from and i was using the tabulate plug in to tidy it up and put it into tables. However, part of the question is to add a position column or counter. It should be simple enough to add one element at the start of each list in my array but i am having an absolute nightmare...
the code for pulling in the data from the .dat file is:
def readToDictionary():
global dicts
fi = open('CarRegistry.dat', 'r')
dicts = []
buffer = []
while True:
team = fi.readline()
if not team: break
fields = team.split(',')
buffer.append(fields)
fi.close()
dicts = buffer
print(dicts)
return dicts
Im duplicating the array deliberately as i need to do some other functions on it and want to keep the original data intact.
The raw out put is:
[['1', 'BD61 SLU', 'HONDA', 'CR-V', 'SFDR', '5', '1780', '4510', '130', '39', 'True\n'], ['2', 'CA51 MBE', 'CHEVROLET', 'CORVETTE', 'JTAV', '2', '1877', '1234', '194', '24', 'True\n'], ['3', 'PC14 RSN', 'FORD', 'F-150', 'PQBD', '5', '2121', '5890', '155', '20', 'True\n'], ['4', 'MB19 ORE', 'HONDA', 'ACCORD', 'FDAR', '5', '1849', '4933', '125', '47.3', 'False\n'], ['5', 'BD68 NAP', 'HONDA', 'ACCORD', 'FDAV', '5', '1849', '4933', '171', '37.7', 'False\n']...
what i want to get to is:
[['1', '1', 'BD61 SLU', 'HONDA', 'CR-V', 'SFDR', '5', '1780', '4510', '130', '39', 'True\n'], ['2', '2', 'CA51 MBE', 'CHEVROLET', 'CORVETTE', 'JTAV', '2', '1877', '1234', '194', '24', 'True\n'], ['3', '3', 'PC14 RSN', 'FORD', 'F-150', 'PQBD', '5', '2121', '5890', '155', '20', 'True\n'], ['4', '4', 'MB19 ORE', 'HONDA', 'ACCORD', 'FDAR', '5', '1849', '4933', '125', '47.3', 'False\n'], ['5', '5', 'BD68 NAP', 'HONDA', 'ACCORD', 'FDAV', '5', '1849', '4933', '171', '37.7', 'False\n']...
It's basically a counter at the start of each list.
Ive tried all sorts and just keep getting errors, probably because i cant understand the basics of why i cant just do this:
for i in buffer:
buffer.insert(i, i+1)
to go through each entry in the list and add a value equal to the index +1... I know its probably simple but i've been banging my head off the monitor for a good few hours now...
The key is, you don't want to manipulate buffer. You want to manipulate the individual lists within buffer:
for i,row in enumerate(buffer):
row.insert( 0, str(i+1) )
When I run the set() function in python to convert from a list to set and to remove duplicates, the set function is reversing the order of my list.
x = ['2','6']
y = list(set(x))
print('x is',str(x),'y is',str(y))
When you build the set, each element from the list is added in the order that the list_iterator (produced by iter(x)) yields them.
A set, though, is unordered, and the insertion order is neither remembered nor used to construct the iteration order of the set itself.
As a result, list(set(x)) produces a list whose order is unrelated to the order of x in any meaningful fashion.
As others mentioned, set doesn't preserve order. A good alternative is OrderedDict, which guarantees unique keys and preserves order.
from collections import OrderedDict
x = ['1','1','2','3','4','5','5','5','6','7','8','9','9']
y = list(OrderedDict.fromkeys(x))
print("x is", x)
print("y is", y)
# x is ['1', '1', '2', '3', '4', '5', '5', '5', '6', '7', '8', '9', '9']
# y is ['1', '2', '3', '4', '5', '6', '7', '8', '9']
I have a dictionary with elements as lists of certain sequence:
a = {'seq1':['5', '4', '3', '2', '1', '6', '7', '8', '9'],
'seq2':['9', '8', '7', '6', '5', '4', '3', '2', '1'],
'seq3':['5', '4', '3', '2', '1', '11', '12', '13', '14'],
'seq4':['15', '16', '17'],
'seq5':['18', '19', '20', '21', '22', '23'],
'seq6':['18', '19', '20', '24', '25', '26']}
So there are 6 sequences
What I need to do is:
To find only unique lists (if two lists contains the same elements (regardless of their order), they are not unique) - say I need to get rid of the second list (the first founded unique list will stay)
In unique lists I need to find unique subsequences of elements and print
it
Bounds of unique sequences are found by resemblance of elements order - in the 1st and the 3rd lists the bound ends exactly after element '1', so we get the subsequence ['5','4','3','2','1']
As the result I would like to see elements exactly in the same order as it was in the beginning (if it`s possible at all somehow). So I expect this:
[['5', '4', '3', '2', '1']['6', '7', '8', '9']['11', '12', '13', '14']['15', '16', '17']['18', '19', '20']['21', '22', '23']['24', '25', '26']]
Tried to do it this way:
import itertools
unique_sets = []
a = {'seq1':["5","4","3","2","1","6","7","8","9"], 'seq2':["9","8","7","6","5","4","3","2","1"], 'seq3':["5","4","3","2","1","11","12","13","14"], 'seq4':["15","16","17"], 'seq5':["18","19","20","21","22","23"], 'seq6':["18","19","20","24","25","26"]}
b = []
for seq in a.values():
b.append(seq)
for seq1, seq2 in itertools.combinations(b,2): #searching for intersections
if set(seq1).intersection(set(seq2)) not in unique_sets:
#if set(seq1).intersection(set(seq2)) == set(seq1):
#continue
unique_sets.append(set(seq1).intersection(set(seq2)))
if set(seq1).difference(set(seq2)) not in unique_sets:
unique_sets.append(set(seq1).difference(set(seq2)))
for it in unique_sets:
print(it)
I got this which is a little bit different from my expectations:
{'9', '5', '2', '3', '7', '1', '4', '8', '6'}
set()
{'5', '2', '3', '1', '4'}
{'9', '8', '6', '7'}
{'5', '2', '14', '3', '1', '11', '12', '4', '13'}
{'17', '16', '15'}
{'19', '20', '18'}
{'23', '21', '22'}
Without comment in the code above the result is even worse.
Plus I have the problem with unordered elements in the sets, which I get as the result. Tried to do this with two separate lists:
seq1 = set([1,2,3,4,5,6,7,8,9])
seq2 = set([1,2,3,4,5,10,11,12])
and it worked fine - elements didn`t ever change their position in sets. Where is my mistake?
Thanks.
Updated: Ok, now I have a little bit more complicated task, where offered alghorithm won`t work
I have this dictionary:
precond = {
'seq1': ["1","2"],
'seq2': ["3","4","2"],
'seq3': ["5","4","2"],
'seq4': ["6","7","4","2"],
'seq5': ["6","4","7","2"],
'seq6': ["6","1","8","9","10"],
'seq7': ["6","1","8","11","9","12","13","14"],
'seq8': ["6","1","8","11","4","15","13"],
'seq9': ["6","1","8","16","9","11","4","17","18","2"],
'seq10': ["6","1","8","19","9","4","16","2"],
}
I expect these sequences, containing at least 2 elements:
[1, 2],
[4, 2],
[6, 7],
[6, 4, 7, 2],
[6, 1, 8]
[9,10],
[6,1,8,11]
[9,12,13,14]
[4,15,13]
[16,9,11,4,17,18,2]
[19,9,4,16,2]
Right now I wrote this code:
precond = {
'seq1': ["1","2"],
'seq2': ["3","4","2"],
'seq3': ["5","4","2"],
'seq4': ["6","7","4","2"],
'seq5': ["6","4","7","2"],
'seq6': ["6","1","8","9","10"],
'seq7': ["6","1","8","11","9","12","13","14"],
'seq8': ["6","1","8","11","4","15","13"],
'seq9': ["6","1","8","16","9","11","4","17","18","2"],
'seq10': ["6","1","8","19","9","4","16","2"],
}
seq_list = []
result_seq = []
#d = []
for seq in precond.values():
seq_list.append(seq)
#print(b)
contseq_ind = 0
control_seq = seq_list[contseq_ind]
mainseq_ind = 1
el_ind = 0
#index2 = 0
def compar():
if control_seq[contseq_ind] != seq_list[mainseq_ind][el_ind]:
mainseq_ind += 1
compar()
else:
result_seq.append(control_seq[contseq_ind])
contseq_ind += 1
el_ind += 1
if contseq_ind > len(control_seq):
control_seq = seq_list[contseq_ind + 1]
compar()
else:
compar()
compar()
This code is not complete anyway - I created looking for the same elements from the beginning, so I still need to write a code for searching of sequence in the end of two compared elements.
Right now I have a problem with recursion. Immidiately after first recursed call I have this error:
if control_seq[contseq_ind] != b[mainseq_ind][el_ind]:
UnboundLocalError: local variable 'control_seq' referenced before assignment
How can I fix this? Or maybe you have a better idea, than using recursion? Thank you in advance.
Not sure if this is what you wanted, but it gets the same result:
from collections import OrderedDict
a = {'seq1':["5","4","3","2","1","6","7","8","9"],
'seq2':["9","8","7","6","5","4","3","2","1"],
'seq3':["5","4","3","2","1","11","12","13","14"],
'seq4':["15","16","17"],
'seq5':["18","19","20","21","22","23"],
'seq6':["18","19","20","24","25","26"]}
level = 0
counts = OrderedDict()
# go through each value in the list of values to count the number
# of times it is used and indicate which list it belongs to
for elements in a.values():
for element in elements:
if element in counts:
a,b = counts[element]
counts[element] = a,b+1
else:
counts[element] = (level,1)
level+=1
last = 0
result = []
# now break up the dictionary of unique values into lists according
# to the count of each value and the level that they existed in
for k,v in counts.items():
if v == last:
result[-1].append(k)
else:
result.append([k])
last = v
print(result)
Result:
[['5', '4', '3', '2', '1'],
['6', '7', '8', '9'],
['11', '12', '13', '14'],
['15', '16', '17'],
['18', '19', '20'],
['21', '22', '23'],
['24', '25', '26']]
Whats the best way to filter out some subsets from a generator. For example I have a string "1023" and want to produce all possible combinations of each of the digits. All combinations would be:
['1', '0', '2', '3']
['1', '0', '23']
['1', '02', '3']
['1', '023']
['10', '2', '3']
['10', '23']
['102', '3']
['1023']
I am not interested in a subset that contains a leading 0 on any of the items, so the valid ones are:
['1', '0', '2', '3']
['1', '0', '23']
['10', '2', '3']
['10', '23']
['102', '3']
['1023']
I have two questions.
1) If using a generator, whats the best way to filter out the ones with leading zeroes. Currently, I generate all combinations then loop through it afterwards and only continuing if the subset is valid. For simplicity I am only printing the subset in the sample code. Assuming the generator that was created is very long or if it constains a lot of invalid subsets, its almost a waste to loop through the entire generator. Is there a way to stop the generator when it sees an invalid item (one with leading zero) then filter it off 'allCombinations'
2) If the above doesn't exist, whats a better way to generate these combinations (disregarding combinations with leading zeroes).
Code using a generator:
import itertools
def isValid(subset): ## DIGITS WITH LEADING 0 IS NOT VALID
valid = True
for num in subset:
if num[0] == '0' and len(num) > 1:
valid = False
break
return valid
def get_combinations(source, comb):
res = ""
for x, action in zip(source, comb + (0,)):
res += x
if action == 0:
yield res
res = ""
digits = "1023"
allCombinations = [list(get_combinations(digits, c)) for c in itertools.product((0, 1), repeat=len(digits) - 1)]
for subset in allCombinations: ## LOOPS THROUGH THE ENTIRE GENERATOR
if isValid(subset):
print(subset)
Filtering for an easy and obvious condition like "no leading zeros", it can be more efficiently done at the combination building level.
def generate_pieces(input_string, predicate):
if input_string:
if predicate(input_string):
yield [input_string]
for item_size in range(1, len(input_string)+1):
item = input_string[:item_size]
if not predicate(item):
continue
rest = input_string[item_size:]
for rest_piece in generate_pieces(rest, predicate):
yield [item] + rest_piece
Generating every combination of cuts, so long it's not even funny:
>>> list(generate_pieces('10002', lambda x: True))
[['10002'], ['1', '0002'], ['1', '0', '002'], ['1', '0', '0', '02'], ['1', '0', '0', '0', '2'], ['1', '0', '00', '2'], ['1', '00', '02'], ['1', '00', '0', '2'], ['1', '000', '2'], ['10', '002'], ['10', '0', '02'], ['10', '0', '0', '2'], ['10', '00', '2'], ['100', '02'], ['100', '0', '2'], ['1000', '2']]
Only those where no fragment has leading zeros:
>>> list(generate_pieces('10002', lambda x: not x.startswith('0')))
[['10002'], ['1000', '2']]
Substrings that start with a zero were never considered for the recursive step.
One common solution is to try filtering just before using yield. I have given you an example of filtering just before yield:
import itertools
def my_gen(my_string):
# Create combinations
for length in range(len(my_string)):
for my_tuple in itertools.combinations(my_string, length+1):
# This is the string you would like to output
output_string = "".join(my_tuple)
# filter here:
if output_string[0] != '0':
yield output_string
my_string = '1023'
print(list(my_gen(my_string)))
EDIT: Added in a generator comprehension alternative
import itertools
my_string = '1023'
my_gen = ("".join(my_tuple)[0] for length in range(len(my_string))
for my_tuple in itertools.combinations(my_string, length+1)
if "".join(my_tuple)[0] != '0')