Get min and max values of items with the same id if a difference exists? - python
I have two files that look like this with some differences between them:
First file:
{16:[3, [-7, 87, 20, 32]]}
{17:[2, [-3, 88, 16, 28], 3, [-6, 84, 20, 32]]}
{18:[2, [-1, 88, 16, 28], 3, [-3, 84, 20, 32]]}
{19:[2, [1, 89, 16, 28], 3, [-2, 85, 20, 32]]}
{20:[2, [9, 94, 16, 28], 3, [1, 85, 20, 32]]}
{21:[2, [12, 96, 16, 28], 3, [2, 76, 19, 31]]}
{22:[2, [15, 97, 16, 28], 3, [4, 73, 19, 29]]}
{23:[2, [18, 96, 16, 28], 3, [6, 71, 19, 29], 10, [-10, 60, 51, 82]]}
{24:[2, [22, 97, 16, 28], 3, [9, 71, 19, 27], 10, [-5, 63, 49, 78]]}
{25:[2, [25, 99, 16, 28], 3, [13, 71, 17, 26], 10, [-1, 64, 46, 77]]}
{26:[2, [29, 101, 16, 28], 3, [17, 70, 16, 25], 10, [-1, 65, 45, 77]]}
Second file:
{16:[3, [-7, 86, 20, 32]]}
{17:[2, [-3, 82, 16, 28], 3, [-6, 84, 20, 32]]}
{18:[2, [-1, 88, 16, 27], 3, [-3, 84, 20, 32]]}
{19:[2, [1, 89, 16, 28], 3, [-2, 84, 20, 32]]}
{20:[2, [9, 94, 15, 28], 3, [1, 85, 20, 32]]}
{21:[2, [12, 96, 16, 28], 3, [1, 76, 19, 31]]}
{22:[2, [15, 97, 17, 28], 3, [4, 73, 19, 29]]}
{23:[2, [18, 96, 18, 28], 3, [6, 71, 19, 29], 10, [-10, 60, 51, 82]]}
{24:[2, [22, 97, 16, 28], 3, [9, 71, 20, 27], 10, [-5, 63, 49, 78]]}
{25:[2, [25, 99, 16, 28], 3, [13, 71, 17, 26], 10, [-1, 64, 46, 77]]}
{26:[2, [29, 101, 17, 28], 3, [17, 70, 16, 25], 10, [-1, 65, 45, 77]]}
I compare them both using difflib and print out the lines that have a difference in them.
What I am trying to do is print out the minimum and maximum frame values that share the same id.
The frame is the key on every line, so in this case the frames range from 16 to 26. The id is the value that precedes each list of 4 values, so the id on the first line is 3. The second line has two ids: 2 and then 3.
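The frame/id layout can be shown with a short example. This is a minimal Python 3 sketch; `parse_line` is a hypothetical helper, not from the post. It reads one line as a dict literal with `ast.literal_eval` and takes the ids from the even positions of the value list:

```python
import ast


def parse_line(line):
    """Return (frame, ids) for one line such as '{17:[2, [...], 3, [...]]}'."""
    record = ast.literal_eval(line.strip())  # parses the dict literal without eval()
    [(frame, values)] = record.items()       # each line holds exactly one frame key
    ids = values[0::2]                       # ids sit at the even positions
    return frame, ids


print(parse_line("{17:[2, [-3, 88, 16, 28], 3, [-6, 84, 20, 32]]}"))  # (17, [2, 3])
```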
So an example of what I'd like to write out is:
17 - 36
given that one of the frames sharing id 3 differs from the file I am comparing it with.
For every difference like that, I need to write out a new file that contains only the start frame and the end frame; then I'll work on concatenating additional strings to each file.
This is my current difflib usage, which prints out each line that has a difference:
import difflib

def compare(f1, f2):
    with open(f1+'.txt', 'r') as fin1, open(f2+'.txt', 'r') as fin2:
        diff = difflib.ndiff(fin1.readlines(), fin2.readlines())
        # keep only the lines that appear in f1 but not in f2
        outcome = ''.join(x[2:] for x in diff if x.startswith('- '))
        print(outcome)
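In `difflib.ndiff` output, each line carries a prefix:

- `'- '` marks lines found only in the first input.
- `'+ '` marks lines found only in the second input.
- Two spaces mark common lines.
- `'? '` marks intraline hint lines.

The question needs output for both files, so the sketch below collects both sides in one pass. It is written for Python 3, and `split_diff` is a hypothetical name:

```python
import difflib


def split_diff(lines1, lines2):
    """Return (only_in_first, only_in_second) using the ndiff line prefixes."""
    only1, only2 = [], []
    for entry in difflib.ndiff(lines1, lines2):
        if entry.startswith('- '):
            only1.append(entry[2:])
        elif entry.startswith('+ '):
            only2.append(entry[2:])
        # '  ' (common) and '? ' (intraline hint) entries are skipped
    return only1, only2


a = ["{16:[3, [-7, 87, 20, 32]]}\n", "{25:[2, [25, 99, 16, 28]]}\n"]
b = ["{16:[3, [-7, 86, 20, 32]]}\n", "{25:[2, [25, 99, 16, 28]]}\n"]
print(split_diff(a, b))
```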
How could I achieve what I described above by tweaking this block?
Note that both files have the same number of frames but not the same ids, so I would need to write two different files for each difference, possibly into a folder. So if the two files have 20 differences, I need two main folders, one for each original file, each containing a text file for every start and end frame of the same id.
Suppose your list of differences is the file content you give at the beginning of your post. I proceeded in two steps. First, get the list of frames per id:
>>> from collections import defaultdict
>>> import ast
>>> diffs = defaultdict(list)
>>> for line in s.split('\n'):  # s holds the text of the differing lines
        d = ast.literal_eval(line)  # We have a dict (literal_eval is safer than eval)
        for k in d:  # Only one key, k is the frame
            # Ids sit at the even positions of the value list
            for i in range(0, len(d[k]), 2):
                diffs[d[k][i]].append(k)
>>> diffs  # We now have a dict with ids as keys:
defaultdict(<type 'list'>, {10: [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36], 2: [17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33], 3: [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36], 29: [31, 32, 33, 34, 35, 36]})
Now we get the ranges per id, using the approach from another SO post for getting ranges from a list of indexes:
>>> from operator import itemgetter
>>> from itertools import groupby
>>> for id_ in diffs:
        diffs[id_].sort()
        # Consecutive frames share the same (index - frame) value
        for k, g in groupby(enumerate(diffs[id_]), lambda t: t[0] - t[1]):
            group = list(map(itemgetter(1), g))
            print('id {0} : {1} -> {2}'.format(id_, group[0], group[-1]))
id 10 : 23 -> 36
id 2 : 17 -> 33
id 3 : 16 -> 36
id 29 : 31 -> 36
You then have the range of differences for each id. With a little adaptation, you should be able to get what you want.
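For reuse, the same index-minus-value grouping can be wrapped in a small Python 3 function. `frame_ranges` is a hypothetical name. De-duplicating with `set` is an added assumption, since a frame should appear only once per id anyway:

```python
from itertools import groupby


def frame_ranges(frames):
    """Return (start, end) pairs for runs of consecutive frame numbers."""
    ordered = sorted(set(frames))
    runs = []
    # index - value stays constant within a run of consecutive integers
    for _, group in groupby(enumerate(ordered), key=lambda pair: pair[0] - pair[1]):
        values = [value for _, value in group]
        runs.append((values[0], values[-1]))
    return runs


print(frame_ranges([17, 18, 19, 20, 21, 22, 23, 24, 26]))  # [(17, 24), (26, 26)]
```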
EDIT: here is the final answer, using the same kind of block as in the question:
>>> import ast
>>> from collections import defaultdict
>>> from itertools import groupby
>>> from operator import itemgetter
>>> def compare(f1, f2):
        # 2 embedded 'with' because I'm on Python 2.5 :-)
        with open(f1+'.txt', 'r') as fin1:
            with open(f2+'.txt', 'r') as fin2:
                lines1 = fin1.readlines()
                lines2 = fin2.readlines()
        # strip() removes the trailing '\n', so a missing final newline does not cause a false difference
        other = set(l.strip() for l in lines2)
        diff_lines = [l.strip() for l in lines1 if l.strip() and l.strip() not in other]
        # Ok, we have our differences (very basic)
        diffs = defaultdict(list)
        for line in diff_lines:
            d = ast.literal_eval(line)  # We have a dict
            for k in d:  # Only one key, k is the frame
                for i in range(0, len(d[k]), 2):
                    diffs[d[k][i]].append(k)
        for id_ in sorted(diffs):
            diffs[id_].sort()
            for k, g in groupby(enumerate(diffs[id_]), lambda t: t[0] - t[1]):
                group = list(map(itemgetter(1), g))
                print('id {0} : {1} -> {2}'.format(id_, group[0], group[-1]))
>>> compare(r'E:\CFM\Dev\Python\test\f1', r'E:\CFM\Dev\Python\test\f2')
id 2 : 17 -> 24
id 2 : 26 -> 26
id 3 : 16 -> 24
id 3 : 26 -> 26
id 10 : 23 -> 24
id 10 : 26 -> 26
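The answer stops at printing. The question also asks for one folder per original file, holding a text file for each start/end range. Below is a Python 3 sketch of that last step, built on these assumptions:

- Each folder is named after its input file.
- Each output file is named `id<ID>_<start>-<end>.txt` and contains `start - end`.
- `diff_ranges`, `write_ranges` and `compare_to_folders` are hypothetical names.
- Lines are compared with the same basic "not in the other file" test as the answer, run in both directions.

```python
import ast
import os
from collections import defaultdict
from itertools import groupby


def diff_ranges(lines_a, lines_b):
    """Return {id: [(start, end), ...]} for lines of lines_a not present in lines_b."""
    other = {line.strip() for line in lines_b}
    frames_per_id = defaultdict(set)
    for line in lines_a:
        line = line.strip()
        if not line or line in other:
            continue  # blank or unchanged line
        [(frame, values)] = ast.literal_eval(line).items()  # one frame per line
        for id_ in values[0::2]:  # ids sit at the even positions
            frames_per_id[id_].add(frame)
    result = {}
    for id_, frames in frames_per_id.items():
        runs = []
        # index - frame is constant within a run of consecutive frames
        for _, grp in groupby(enumerate(sorted(frames)), key=lambda p: p[0] - p[1]):
            run = [frame for _, frame in grp]
            runs.append((run[0], run[-1]))
        result[id_] = runs
    return result


def write_ranges(ranges, out_dir):
    """Write one file per (id, run); the file naming scheme is hypothetical."""
    os.makedirs(out_dir, exist_ok=True)
    for id_, runs in sorted(ranges.items()):
        for start, end in runs:
            name = 'id{0}_{1}-{2}.txt'.format(id_, start, end)
            with open(os.path.join(out_dir, name), 'w') as out:
                out.write('{0} - {1}\n'.format(start, end))


def compare_to_folders(path1, path2, out_root):
    """One output folder per input file, named after that file (an assumption)."""
    with open(path1) as fin1, open(path2) as fin2:
        lines1, lines2 = fin1.readlines(), fin2.readlines()
    for src, mine, theirs in ((path1, lines1, lines2), (path2, lines2, lines1)):
        folder = os.path.join(out_root, os.path.splitext(os.path.basename(src))[0])
        write_ranges(diff_ranges(mine, theirs), folder)
```

Called as `compare_to_folders('f1.txt', 'f2.txt', 'diff_output')`, this would create `diff_output/f1` and `diff_output/f2`, each with one small file per id range.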
Related
How to match two lists with the same length
I have two lists with the same length, over 11 rows. I would like df[0] to find a match in any position in df2[0], df[1] to find a match in any position in df2[1], and so on. Instead of typing them one by one, is there an easier method?
df = [[[1, 5,7,9,12,13,17], [2,17,18,23,32,34,45], [3,5,11,33,34,36,45]], [[6,21,22,50,56,58,72], [7,5,12,13,55,56,74], [8,23,24,32,56,58,64]]]
df2 = [[[100,5,12,15,27,32,54], [120,10,17,18,19,43,55], [99,21,32,33,34,36,54]], [[41,16,32,45,66,67,76], [56,10,11,43,54,55,56], [77,12,16,18,19,21,23]]]
I would like my output to be like this:
output = [[[[5,12,],[17]], [[17,18],[32,34,36]]], [[[55,56],[32]],[[56]]]
Given your reworked question, it is still not quite clear to me what exactly you want to accomplish. I assume you want element-based matching. By using this approach we can find the matching sequences of two lists. For the presented case, we just need to iterate over all the elements of your array. The matches function finds all matching sequences, and using it in the nested for loop allows element-wise comparison. The matching sequences are then written to matched_sequences, which holds all identified matches.
import difflib

df = [
    [[1, 5, 7, 9, 12, 13, 17], [2, 17, 18, 23, 32, 34, 45], [3, 5, 11, 33, 34, 36, 45]],
    [[6, 21, 22, 50, 56, 58, 72], [7, 5, 12, 13, 55, 56, 74], [8, 23, 24, 32, 56, 58, 64]],
]
df2 = [
    [[100, 5, 12, 15, 27, 32, 54], [120, 10, 17, 18, 19, 43, 55], [99, 21, 32, 33, 34, 36, 54]],
    [[41, 16, 32, 45, 66, 67, 76], [56, 10, 11, 43, 54, 55, 56], [77, 12, 16, 18, 19, 21, 23]],
]

def matches(list1, list2):
    # Note: matched items are deleted from list1 and list2 in place
    while True:
        mbs = difflib.SequenceMatcher(None, list1, list2).get_matching_blocks()
        if len(mbs) == 1:  # only the final dummy block (len(a), len(b), 0) is left
            break
        # walk the blocks backwards so deleting does not shift earlier indices
        for i, j, n in mbs[::-1]:
            if n > 0:
                yield list1[i : i + n]
                del list1[i : i + n]
                del list2[j : j + n]

matched_sequences = []
for row_df, row_df2 in zip(df, df2):
    for el1, el2 in zip(row_df, row_df2):
        matched_sequences.extend(list(matches(el1, el2)))
print(matched_sequences)
This will produce as identified matches:
[[12], [5], [17, 18], [33, 34, 36], [55, 56], [23]]
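Plain set intersection is a simpler alternative if contiguous runs don't matter and you only need the values each aligned pair has in common. Unlike `matches`, which deletes matched items, it also leaves the input lists untouched. This is a different technique from the `SequenceMatcher` answer, and `common_per_pair` is a hypothetical name:

```python
def common_per_pair(rows_a, rows_b):
    """For each aligned pair of inner lists, return the sorted values found in both."""
    return [
        [sorted(set(el1) & set(el2)) for el1, el2 in zip(row_a, row_b)]
        for row_a, row_b in zip(rows_a, rows_b)
    ]


df = [
    [[1, 5, 7, 9, 12, 13, 17], [2, 17, 18, 23, 32, 34, 45], [3, 5, 11, 33, 34, 36, 45]],
    [[6, 21, 22, 50, 56, 58, 72], [7, 5, 12, 13, 55, 56, 74], [8, 23, 24, 32, 56, 58, 64]],
]
df2 = [
    [[100, 5, 12, 15, 27, 32, 54], [120, 10, 17, 18, 19, 43, 55], [99, 21, 32, 33, 34, 36, 54]],
    [[41, 16, 32, 45, 66, 67, 76], [56, 10, 11, 43, 54, 55, 56], [77, 12, 16, 18, 19, 21, 23]],
]
print(common_per_pair(df, df2))
```

The result keeps the nesting of the inputs: one list of common values per inner pair, with an empty list where a pair shares nothing.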
group together consecutive numbers in a list
I have an ordered Python list of the form:
[1, 2, 3, 4, 5, 12, 13, 14, 15, 20, 21, 22, 23, 30, 35, 36, 37, 38, 39, 40]
How can I group together consecutive numbers in the list, like this:
[[1, 2, 3, 4, 5], [12, 13, 14, 15], [20, 21, 22, 23,], [30], [35, 36, 37, 38, 39, 40]]
I tried using groupby from here but was not able to tailor it to my needs.
You could use negative indexing:
def group_by_missing(seq):
    if not seq:
        return seq
    grouped = [[seq[0]]]
    for x in seq[1:]:
        if x == grouped[-1][-1] + 1:
            grouped[-1].append(x)
        else:
            grouped.append([x])
    return grouped
Example usage:
>>> lst = [1, 2, 3, 4, 5, 12, 13, 14, 15, 20, 21, 22, 23, 30, 35, 36, 37, 38, 39, 40]
>>> group_by_missing(lst)
[[1, 2, 3, 4, 5], [12, 13, 14, 15], [20, 21, 22, 23], [30], [35, 36, 37, 38, 39, 40]]
A fancier, more compact way is possible with functools.reduce and a lambda that uses an inline if as the criterion:
import functools

lis = [1, 2, 3, 4, 5, 12, 13, 14, 15, 20, 21, 22, 23, 30, 35, 36, 37, 38, 39, 40]
# Start from [[first]]; append y to the last group if it is consecutive, else open a new group
result = functools.reduce(
    lambda x, y: x[:-1] + [x[-1] + [y]] if (x[-1][-1] + 1 == y) else [*x, [y]],
    lis[1:],
    [[lis[0]]],
)
print(result)
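The question mentions trying `groupby`, so here is a sketch of that approach in Python 3. `group_consecutive` is a hypothetical name. Like the example, it assumes the input is sorted and contains no duplicates:

```python
from itertools import groupby


def group_consecutive(numbers):
    """Split a sorted list of ints into runs of consecutive values."""
    groups = []
    # Within a run, value - index stays constant, so it serves as the groupby key
    for _, run in groupby(enumerate(numbers), key=lambda pair: pair[1] - pair[0]):
        groups.append([value for _, value in run])
    return groups


lst = [1, 2, 3, 4, 5, 12, 13, 14, 15, 20, 21, 22, 23, 30, 35, 36, 37, 38, 39, 40]
print(group_consecutive(lst))
```

Unlike the `reduce` version, this also handles an empty list, returning `[]`.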
Split a list into multiple lists with each one having a set amount of items
Say I have a list of 43 items and I want to split it up. The result would be 5 lists, the first 4 having 10 items each and the last having 3 items. Is there a way I can do this?
Python has a feature such that if you ask for the first 10 elements of a list with only 3 elements, it returns all 3 elements instead of raising an error. Formally, if a is a list with 3 elements, a[0:100] returns all 3 elements. We can use that here like so:
from math import ceil

a = [i for i in range(1, 44)]  # A list of 43 items from 1 to 43
for i in range(ceil(len(a)/10)):
    print(a[i*10:i*10+10])
And the output is:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
[21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
[31, 32, 33, 34, 35, 36, 37, 38, 39, 40]
[41, 42, 43]
In the last iteration of the loop, we print a[4*10:4*10 + 10], that is a[40:50], which returns everything from a[40] to the end of a.
Bonus: if you want to store the separate lists in another list, you can do this:
new_array = []
for i in range(ceil(len(a)/10)):
    new_array.append(a[i*10:i*10+10])
And new_array will store the 5 lists.
You can use a step in the range to get exactly the start position of each slice, based on the size of the repeating chunks:
L = list(range(43))
parts = 5  # if you're looking for a fixed number of parts
size = len(L)//parts  # 43 // 5 == 8 (skip parts and set size directly if you know the chunk size)
R = [L[p:p+size] for p in range(0, len(L), size)]
print(R)
[[0, 1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11, 12, 13, 14, 15], [16, 17, 18, 19, 20, 21, 22, 23], [24, 25, 26, 27, 28, 29, 30, 31], [32, 33, 34, 35, 36, 37, 38, 39], [40, 41, 42]]
Note that floor division gives chunks of 8 here, which produces 6 chunks rather than 5; for the question's chunks of 10, set size = 10.
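The sketch below makes the two cases explicit, using hypothetical helper names:

- **Fixed chunk size:** the question's case, 10 items per list.
- **Fixed number of parts:** ceiling division keeps the count at or below `parts`.

```python
def chunk_by_size(items, size):
    """Consecutive slices of `size` items; the last slice may be shorter."""
    return [items[p:p + size] for p in range(0, len(items), size)]


def chunk_by_count(items, parts):
    """At most `parts` slices of near-equal size, using ceiling division."""
    if not items:
        return []  # avoid a zero step in range()
    size = -(-len(items) // parts)  # ceil(len(items) / parts) without math.ceil
    return chunk_by_size(items, size)


L = list(range(43))
print([len(c) for c in chunk_by_size(L, 10)])  # [10, 10, 10, 10, 3]
print([len(c) for c in chunk_by_count(L, 5)])  # [9, 9, 9, 9, 7]
```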
Not able to iterate key while creating a dictionary out of a csv file
My goal is to read info from a CSV file and save it to a dictionary (thanks to the comments). The dictionary will contain the name as the key and the numbers as the value. The dictionary should be dynamic, i.e. it should work for n rows. What I need: to be able to change the key values, and to create only one dictionary, not multiple dicts.
name,AGATC,AATG,TATC
Alice,2,8,3
Bob,4,1,5
Charlie,3,2,5
Alice = [2,8,3]
Bob = [4,1,5]
etc.
The code should work for this too, with the same logic:
name,AGATC,TTTTTTCT,AATG,TCTAG,GATA,TATC,GAAA,TCTG
Albus,15,49,38,5,14,44,14,12
Cedric,31,21,41,28,30,9,36,44
Draco,9,13,8,26,15,25,41,39
Fred,37,40,10,6,5,10,28,8
Ginny,37,47,10,23,5,48,28,23
Hagrid,25,38,45,49,39,18,42,30
Harry,46,49,48,29,15,5,28,40
Hermione,43,31,18,25,26,47,31,36
James,46,41,38,29,15,5,48,22
Kingsley,7,11,18,33,39,31,23,14
Lavender,22,33,43,12,26,18,47,41
Lily,42,47,48,18,35,46,48,50
Lucius,9,13,33,26,45,11,36,39
Luna,18,23,35,13,11,19,14,24
Minerva,17,49,18,7,6,18,17,30
Neville,14,44,28,27,19,7,25,20
Petunia,29,29,40,31,45,20,40,35
Remus,6,18,5,42,39,28,44,22
Ron,37,47,13,25,17,6,13,35
Severus,29,27,32,41,6,27,8,34
Sirius,31,11,28,26,35,19,33,6
Vernon,26,45,34,50,44,30,32,28
Zacharias,29,50,18,23,38,24,22,9
Here is my try:
def readcsv(n):
    with open(f'{n}','r') as f:
        readed = csv.reader(f)
        for row in readed:
            key = row[0]
            value = row[1:]
            #print(f"{key} and {value}")
            dic = dict(key = value)
            print(dic)
OUTPUT:
{'key': ['AGATC', 'AATG', 'TATC']}
{'key': ['2', '8', '3']}
{'key': ['4', '1', '5']}
{'key': ['3', '2', '5']}
You can read .csv files easily using the pandas library. What you are looking for is better served by a dictionary where the names are the keys and the lists are the values. If you have a dictionary and want to change the value of a certain key (say, Albus in the dict d below), it is very straightforward:
# To change the value associated with key="Albus"
d["Albus"] = [1,2,3,4,5,6,7,8]
# To access the value of key="Albus"
d["Albus"]
Code
import pandas as pd
from io import StringIO  # for reading dummy data

## Reading from a file "input.csv"
# df = pd.read_csv("input.csv", sep=",").set_index('name')

## Reading from the dummy data as a string (s is defined under "Dummy Data" below)
df = pd.read_csv(StringIO(s.strip()), sep=",").set_index('name')

## Subsequently process the data to get a
# dict of structure (key=name, value=list).
df = pd.DataFrame(df.to_numpy().T, columns=df.index)
d = df.to_dict(orient='list')  # returns a dictionary ==> data-structure
print(d)
Output:
{'Albus': [15, 49, 38, 5, 14, 44, 14, 12], 'Cedric': [31, 21, 41, 28, 30, 9, 36, 44], 'Draco': [9, 13, 8, 26, 15, 25, 41, 39], 'Fred': [37, 40, 10, 6, 5, 10, 28, 8], 'Ginny': [37, 47, 10, 23, 5, 48, 28, 23], 'Hagrid': [25, 38, 45, 49, 39, 18, 42, 30], 'Harry': [46, 49, 48, 29, 15, 5, 28, 40], 'Hermione': [43, 31, 18, 25, 26, 47, 31, 36], 'James': [46, 41, 38, 29, 15, 5, 48, 22], 'Kingsley': [7, 11, 18, 33, 39, 31, 23, 14], 'Lavender': [22, 33, 43, 12, 26, 18, 47, 41], 'Lily': [42, 47, 48, 18, 35, 46, 48, 50], 'Lucius': [9, 13, 33, 26, 45, 11, 36, 39], 'Luna': [18, 23, 35, 13, 11, 19, 14, 24], 'Minerva': [17, 49, 18, 7, 6, 18, 17, 30], 'Neville': [14, 44, 28, 27, 19, 7, 25, 20], 'Petunia': [29, 29, 40, 31, 45, 20, 40, 35], 'Remus': [6, 18, 5, 42, 39, 28, 44, 22], 'Ron': [37, 47, 13, 25, 17, 6, 13, 35], 'Severus': [29, 27, 32, 41, 6, 27, 8, 34], 'Sirius': [31, 11, 28, 26, 35, 19, 33, 6], 'Vernon': [26, 45, 34, 50, 44, 30, 32, 28], 'Zacharias': [29, 50, 18, 23, 38, 24, 22, 9]}
Dummy Data
s = """
name,AGATC,TTTTTTCT,AATG,TCTAG,GATA,TATC,GAAA,TCTG
Albus,15,49,38,5,14,44,14,12
Cedric,31,21,41,28,30,9,36,44
Draco,9,13,8,26,15,25,41,39
Fred,37,40,10,6,5,10,28,8
Ginny,37,47,10,23,5,48,28,23
Hagrid,25,38,45,49,39,18,42,30
Harry,46,49,48,29,15,5,28,40
Hermione,43,31,18,25,26,47,31,36
James,46,41,38,29,15,5,48,22
Kingsley,7,11,18,33,39,31,23,14
Lavender,22,33,43,12,26,18,47,41
Lily,42,47,48,18,35,46,48,50
Lucius,9,13,33,26,45,11,36,39
Luna,18,23,35,13,11,19,14,24
Minerva,17,49,18,7,6,18,17,30
Neville,14,44,28,27,19,7,25,20
Petunia,29,29,40,31,45,20,40,35
Remus,6,18,5,42,39,28,44,22
Ron,37,47,13,25,17,6,13,35
Severus,29,27,32,41,6,27,8,34
Sirius,31,11,28,26,35,19,33,6
Vernon,26,45,34,50,44,30,32,28
Zacharias,29,50,18,23,38,24,22,9
"""
By "name", do you mean the variable name? I suppose it would be possible to do something like this:
globals()[row[0]] = list(row[1:])
However, it would be highly unconventional to set a variable name at runtime. A better solution for a namespace is a dictionary:
rows = {row[0]: row[1:] for row in readed}
You could then access the relevant element with rows["name of row"]. Note that the header row also ends up in the dict (under the key "name") unless you call next(readed) first.
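Here is a standard-library-only sketch of the dictionary approach. It makes two assumptions beyond the answer: the header row is skipped, and the counts are converted to `int`. `read_profiles` is a hypothetical name, and `io.StringIO` stands in for an open file:

```python
import csv
import io


def read_profiles(csv_file):
    """Map each row's first column to the rest of the row, converted to int."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row (name,AGATC,...)
    return {row[0]: [int(v) for v in row[1:]] for row in reader if row}


sample = "name,AGATC,AATG,TATC\nAlice,2,8,3\nBob,4,1,5\nCharlie,3,2,5\n"
profiles = read_profiles(io.StringIO(sample))
print(profiles)  # {'Alice': [2, 8, 3], 'Bob': [4, 1, 5], 'Charlie': [3, 2, 5]}
```

With a real file, pass the file object instead: `with open('input.csv', newline='') as f: profiles = read_profiles(f)`. Changing a value is then just `profiles['Alice'] = [...]`.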
how to split length list in python [duplicate]
This question already has answers here: How do I split a list into equally-sized chunks?
Question:
my_list = [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,...,50]
Desired answer:
listOne = [0,1,2,....,9]
listTwo = [10,11,12,...,19]
listThree = [20,21,22,...,29]
listFour = [30,31,32,...,39]
listFive = [40,41,42,...,49]
listSix = [50,51,52,...,59]
If we do not know how many numbers are in the list, how do we split it?
def SplitList(given_list, chunk_size):
    return [given_list[offs:offs+chunk_size] for offs in range(0, len(given_list), chunk_size)]
Call the function with your list:
chunk_list = SplitList(my_list, 10)
for lst in chunk_list:
    print(lst)
You can use mlist[i : i+10] to put every 10 elements into a group:
# populate list
mlist = []
for i in range(51):
    mlist.append(i)
print("##########INPUT##########")
print(mlist)

new = []
for i in range(0, len(mlist), 10):
    new.append(mlist[i : i+10])

print("##########OUTPUT##########")
print("Total Group: " + str(len(new)))
for i in range(len(new)):
    print(new[i])
The output will be like this:
##########INPUT##########
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50]
##########OUTPUT##########
Total Group: 6
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
[50]
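Both answers rely on `len()`. The input may instead be an arbitrary iterable, such as a generator, whose length isn't known up front. For that case, here is a Python 3 sketch using `itertools.islice`; `chunks` is a hypothetical name. Python 3.12 and later also provide `itertools.batched`, which yields tuples instead of lists:

```python
from itertools import islice


def chunks(iterable, size):
    """Yield lists of up to `size` items without needing the total length."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


for part in chunks(range(51), 10):
    print(part)
```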