use python to combine data lists - python

If I have two separated lists:
list1 = [['2021-05-24', '31220'],....., ['2021-05-24', '6640'],['2021-05-10', '8830'].....]
list2 = [['2021-05-24', '77860'],.....,['2021-05-24', '438000'],['2021-05-10', '9990'].....]
How could I combine them into
[['2021-05-24', 'sum of all numbers for 2021-05-24'], ['2021-05-10', 'sum of all numbers for 2021-05-10']]
where '.....' means there are many more [date, number] pairs?
I am considering deleting the duplicated dates in each list and then adding the two lists together:
import networkx as nx
G = nx.Graph()
G.add_nodes_from(sum(list2, []))
q = [[(s[i],s[i+1]) for i in range(len(s)-1)] for s in list2]
for i in q:
    G.add_edges_from(i)
print([list(i) for i in nx.connected_components(G)])
but my code removed not only the duplicated dates but also the duplicated numbers.
Thanks in advance.

I'd recommend using a defaultdict:
from collections import defaultdict
result = defaultdict(int)
for k, v in (list1 + list2):
    result[k] += int(v)  # the numbers are stored as strings, so convert them first
Then you can convert the dict back to a list. Of course, if you have several lists, you may want to use itertools.chain instead of list1 + list2.
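To make the last two remarks concrete, here is a minimal sketch that sums with itertools.chain and then converts the dict back to a list. The sample rows are taken from the question; the names totals and combined are only illustrative, and the totals are kept as integers on the assumption that numeric output is acceptable.

```python
from collections import defaultdict
from itertools import chain

# Sample data in the question's shape: [date, amount-as-string] pairs.
list1 = [['2021-05-24', '31220'], ['2021-05-24', '6640'], ['2021-05-10', '8830']]
list2 = [['2021-05-24', '77860'], ['2021-05-24', '438000'], ['2021-05-10', '9990']]

totals = defaultdict(int)
# chain() walks both lists without building a concatenated copy.
for date, amount in chain(list1, list2):
    totals[date] += int(amount)  # amounts are strings, so convert first

# Back to a list of [date, total] pairs, in first-seen order.
combined = [[date, total] for date, total in totals.items()]
print(combined)
```

If the totals need to be strings like the input, wrapping total in str() in the last comprehension should be enough.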

You can build a new dictionary, do the calculation there, and then create a list out of it. Here is the code for that:
result = {}
for i in list1 + list2:  # build the dict and sum the values per date
    if i[0] in result:
        result[i[0]] += int(i[1])
    else:
        result[i[0]] = int(i[1])
result_list = [[key, result[key]] for key in result]  # convert the dict to the expected list
print(result_list)

Related

get the file names list by another list

I am new to Python and a bit confused...
I have a list like:
List_all = ["aawoobbcc", "aawoobbca", "aabbcskindd","asakindsbbss","wooedakse","sdadakindwsd","xxxxsdsd"]
and also a keyword list:
Key = ["woo","kind"]
and I want to get something like this:
[
["aawoobbcc", "aawoobbca","wooedakse"],
["aabbcskindd","asakindsbbss","sdadakindwsd"]
]
I have tried list_sub = [file for file in List_all if Key in file]
or list_sub = [file for file in List_all if k for k in Key in file]
but neither is right.
How can I check each element of Key as a substring of the elements in List_all?
Thanks a lot!
One approach (O(n^2)), is the following:
res = [[e for e in List_all if k in e] for k in Key]
print(res)
Output
[['aawoobbcc', 'aawoobbca', 'wooedakse'], ['aabbcskindd', 'asakindsbbss', 'sdadakindwsd']]
A simpler to understand solution (for newbies) is to use nested for loops:
res = []
for k in Key:
    filtered = []
    for e in List_all:
        if k in e:
            filtered.append(e)
    res.append(filtered)
A more advanced solution, which performs better on really long lists, is to use a regular expression in conjunction with a defaultdict:
import re
from collections import defaultdict
List_all = ["aawoobbcc", "aawoobbca", "aabbcskindd", "asakindsbbss", "wooedakse", "sdadakindwsd", "xxxxsdsd"]
Key = ["woo", "kind"]
extract_key = re.compile("|".join(map(re.escape, Key)))  # escape the keys so they match literally
table = defaultdict(list)
for word in List_all:
    if match := extract_key.search(word):
        table[match.group()].append(word)
res = [table[k] for k in Key if k in table]
print(res)
Output
[['aawoobbcc', 'aawoobbca', 'wooedakse'], ['aabbcskindd', 'asakindsbbss', 'sdadakindwsd']]
Note that this solution assumes each string contains only one key.
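If a string may contain more than one key, one possible variation is to collect every match instead of only the first. This is a sketch under that assumption, not part of the original answer; the sample word "wookindxx" is made up for illustration, and keys that overlap each other would still need separate handling because findall only returns non-overlapping matches.

```python
import re
from collections import defaultdict

keys = ["woo", "kind"]
words = ["aawoobbcc", "wookindxx", "xxxxsdsd"]  # "wookindxx" contains both keys

pattern = re.compile("|".join(re.escape(k) for k in keys))
table = defaultdict(list)
for word in words:
    # set() so that a key appearing twice in one word is recorded only once
    for key in set(pattern.findall(word)):
        table[key].append(word)

# Keep the order of the keyword list and skip keys that matched nothing.
res = [table[k] for k in keys if k in table]
print(res)
```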

associate 2 lists based on ID

I'm trying to merge data from 2 lists by an ID:
list_a = [
(u'65d92438497c', u'compute-0'),
(u'051df48db621', u'compute-4'),
(u'd6160db0cbcd', u'compute-3'),
(u'23fc20b59bd6', u'compute-1'),
(u'0db2e733520d', u'controller-1'),
(u'89334dac8a59', u'compute-2'),
(u'51cf9d50b02e', u'compute-5'),
(u'f4fe106eaeab', u'controller-2'),
(u'06cc124662dc', u'controller-0')
]
list_b = [
(u'65d92438497c', u'p06619'),
(u'051df48db621', u'p06618'),
(u'd6160db0cbcd', u'p06620'),
(u'23fc20b59bd6', u'p06622'),
(u'0db2e733520d', u'p06612'),
(u'89334dac8a59', u'p06621'),
(u'51cf9d50b02e', u'p06623'),
(u'f4fe106eaeab', u'p06611'),
(u'06cc124662dc', u'p06613')
]
list_ab = [
(u'65d92438497c', u'p06619', u'compute-0'),
(u'051df48db621', u'p06618', u'compute-4'),
(u'd6160db0cbcd', u'p06620', u'compute-3'),
(u'23fc20b59bd6', u'p06622', u'compute-1'),
(u'0db2e733520d', u'p06612', u'controller-1'),
(u'89334dac8a59', u'p06621', u'compute-2'),
(u'51cf9d50b02e', u'p06623', u'compute-5'),
(u'f4fe106eaeab', u'p06611', u'controller-2'),
(u'06cc124662dc', u'p06613', u'controller-0')
]
You can see that the first field is an ID, identical between list_a and list_b, and I need to merge on this value.
I'm not sure what type of data I need for the result (list_ab).
The purpose of this is to find 'compute-0' from 'p06619', so maybe there is a better way than a merge.
You are using a one-dimensional list of tuples, which may not be needed. Anyway, to obtain the output you require:
list_a = [(u'65d92438497c', u'compute-0')]
list_b = [(u'65d92438497c', u'p06619')]
result_ab = None
if list_a[0][0] == list_b[0][0]:
    # ID and p-number from list_b, then the host name from list_a
    result_ab = [list_b[0] + list_a[0][1:]]
Here is my solution, which assumes both lists are in the same order:
merge = []
for i in range(len(list_a)):
    if list_a[i][0] == list_b[i][0]:
        merge.append(list_b[i] + list_a[i][1:])  # (ID, p-number, host name)
The idea is to create a dictionary keyed by the first element of each tuple in both lists, whose values are lists holding all the remaining elements that belong to that key.
Next, just iterate over the dictionary and create the required new list object:
from collections import defaultdict
res = defaultdict(list)
for elt in list_b:  # list_b first, so the p-number precedes the host name
    res[elt[0]].extend(elt[1:])
for elt in list_a:
    res[elt[0]].extend(elt[1:])
list_ab = []
for key, value in res.items():
    list_ab.append((key, *value))
print(list_ab)
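Since the stated goal is to find 'compute-0' from 'p06619', a lookup dictionary may be simpler than a merged list. The following is a minimal sketch using two rows from the question; the names id_to_host and label_to_host are made up for illustration, and it assumes each ID and each p-number appears only once.

```python
list_a = [('65d92438497c', 'compute-0'), ('051df48db621', 'compute-4')]
list_b = [('65d92438497c', 'p06619'), ('051df48db621', 'p06618')]

# ID -> host name, built directly from the (ID, host name) pairs.
id_to_host = dict(list_a)

# p-number -> host name, joined through the shared ID.
# IDs that are missing from list_a are skipped rather than raising KeyError.
label_to_host = {
    label: id_to_host[node_id]
    for node_id, label in list_b
    if node_id in id_to_host
}

print(label_to_host['p06619'])
```

Unlike the index-based loop, this does not depend on the two lists being in the same order.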

Python Remove duplicates and original from nested list based on specific key

I'm trying to delete all duplicates and their originals from a nested list based on a specific column.
Example
list = [['abc',3232,'demo text'],['def',9834,'another text'],['abc',988,'another another text'],['poi',1234,'text']]
The key column is the first one (abc, def, abc), and I want to remove every item whose key appears more than once, including the first occurrence.
So the new list should contain:
newlist = [['def',9834,'another text'],['poi',1234,'text']]
I found many similar topics but not for nested lists...
Any help please?
You can construct a list of keys
keys = [x[0] for x in list]
and select only those records for which the key occurs exactly once
newlist = [x for x in list if keys.count(x[0]) == 1]
Use collections.Counter:
from collections import Counter
lst = [['abc',3232,'demo text'],['def',9834,'another text'],['abc',988,'another another text'],['poi',1234,'text']]
d = dict(Counter(x[0] for x in lst))
print([x for x in lst if d[x[0]] == 1])
# [['def', 9834, 'another text'],
# ['poi', 1234, 'text']]
Also note that you shouldn't name your list as list as it shadows the built-in list.
Using a list comprehension.
Demo:
l = [['abc',3232,'demo text'],['def',9834,'another text'],['abc', 988,'another another text'],['poi',1234,'text']]
checkVal = [i[0] for i in l]
print( [i for i in l if not checkVal.count(i[0]) > 1 ] )
Output:
[['def', 9834, 'another text'], ['poi', 1234, 'text']]
Using collections.defaultdict for an O(n) solution:
L = [['abc',3232,'demo text'],
['def',9834,'another text'],
['abc',988,'another another text'],
['poi',1234,'text']]
from collections import defaultdict
d = defaultdict(list)
for key, num, txt in L:
    d[key].append([num, txt])
res = [[k, *v[0]] for k, v in d.items() if len(v) == 1]
print(res)
# [['def', 9834, 'another text'],
#  ['poi', 1234, 'text']]

Python - Split array into multiple arrays

I have an array containing file names like below:
['001_1.png', '001_2.png', '001_3.png', '002_1.png','002_2.png', '003_1.png', '003_2.png', '003_3.png', '003_4.png', ....]
I want to quickly group these files into multiple arrays like this:
[['001_1.png', '001_2.png', '001_3.png'], ['002_1.png', '002_2.png'], ['003_1.png', '003_2.png', '003_3.png', '003_4.png'], ...]
Could anyone tell me how to do it in a few lines of Python?
If your data is already sorted by the file name, you can use itertools.groupby:
files = ['001_1.png', '001_2.png', '001_3.png', '002_1.png','002_2.png',
'003_1.png', '003_2.png', '003_3.png']
import itertools
keyfunc = lambda filename: filename[:3]
# this creates an iterator that yields `(group, filenames)` tuples,
# but `filenames` is another iterator
grouper = itertools.groupby(files, keyfunc)
# to get the result as a nested list, we iterate over the grouper to
# discard the groups and turn the `filenames` iterators into lists
result = [list(filenames) for _, filenames in grouper]
print(result)
# [['001_1.png', '001_2.png', '001_3.png'],
# ['002_1.png', '002_2.png'],
# ['003_1.png', '003_2.png', '003_3.png']]
Otherwise, you can base your code on this recipe, which is more efficient than sorting the list and then using groupby.
Input: Your input is a flat list, so use a regular ol' loop to iterate over it:
for filename in files:
Group identifier: The files are grouped by the first 3 letters:
group = filename[:3]
Output: The output should be a nested list rather than a dict, which can be done with
result = list(groupdict.values())
Putting it together:
files = ['001_1.png', '001_2.png', '001_3.png', '002_1.png','002_2.png',
'003_1.png', '003_2.png', '003_3.png']
import collections
groupdict = collections.defaultdict(list)
for filename in files:
    group = filename[:3]
    groupdict[group].append(filename)
result = list(groupdict.values())
print(result)
# [['001_1.png', '001_2.png', '001_3.png'],
# ['002_1.png', '002_2.png'],
# ['003_1.png', '003_2.png', '003_3.png']]
Read the recipe answer for more details.
Something like this should work:
import itertools
mylist = [...]
[list(v) for k,v in itertools.groupby(mylist, key=lambda x: x[:3])]
If the input list isn't sorted, then use something like this:
import itertools
mylist = [...]
keyfunc = lambda x:x[:3]
mylist = sorted(mylist, key=keyfunc)
[list(v) for k,v in itertools.groupby(mylist, key=keyfunc)]
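The snippets above take the first three characters as the group key. If the prefix length can vary, one option is to split on the underscore instead. This is a sketch under that assumption; the helper name prefix and the sample file names (including '1000_1.png') are made up for illustration.

```python
from itertools import groupby

def prefix(name):
    # Everything before the first underscore, e.g. '001' for '001_2.png'.
    return name.partition('_')[0]

# Unsorted input with one longer prefix.
files = ['003_1.png', '001_1.png', '002_1.png', '001_2.png', '1000_1.png']

# groupby only merges adjacent items, so sort by the same key first.
groups = [list(g) for _, g in groupby(sorted(files, key=prefix), key=prefix)]
print(groups)
```

Note that the prefixes are compared as strings, so '1000' sorts after '003' only because '1' comes after '0'; converting the prefix to int in a separate sort key would give numeric ordering.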
You can do it using a dictionary.
files = ['001_1.png', '001_2.png', '003_3.png', '002_1.png', '002_2.png', '003_1.png', '003_2.png', '003_3.png', '003_4.png']
groups = {}
for item in files:
    if item[:3] not in groups:
        groups[item[:3]] = []
    groups[item[:3]].append(item)
Then you have to sort the dictionary by key.
groups = {k: v for k, v in sorted(groups.items())}
The last step is to use a list comprehension to get the nested list you want.
result = [v for k, v in groups.items()]
print(result)
Output
[['001_1.png', '001_2.png'], ['002_1.png', '002_2.png'], ['003_3.png', '003_1.png', '003_2.png', '003_3.png', '003_4.png']]
Using a simple iteration and dictionary.
Ex:
l = ['001_1.png', ' 001_2.png', ' 003_3.png', ' 002_1.png', ' 002_2.png', ' 003_1.png', ' 003_2.png', ' 003_3.png', ' 003_4.png']
r = {}
for i in l:
    v = i.split("_")[0][-1]
    if v not in r:
        r[v] = []
    r[v].append(i)
print(list(r.values()))
Output:
[['001_1.png', ' 001_2.png'], [' 003_3.png', ' 003_1.png', ' 003_2.png', ' 003_3.png', ' 003_4.png'], [' 002_1.png', ' 002_2.png']]
If your list is ordered like this, here is a short script for this task (a is the input list):
myList = []
for i in a:
    if i[:-4].endswith('1'):
        myList.append([i])
    else:
        myList[-1].append(i)
# [['001_1.png', '001_2.png', '003_3.png'], ['002_1.png', '002_2.png'], ...]
#IYN
mini_list = []
p = ['001_1.png', '001_2.png', '001_3.png', '002_1.png','002_2.png', '003_1.png', '003_2.png', '003_3.png', '003_4.png']
new_p = []
for index, element in enumerate(p):
    if index == len(p)-1:
        mini_list.append(element)
        new_p.append(mini_list)
        break
    if element[0:3]==p[index+1][0:3]:
        mini_list.append(element)
    else:
        mini_list.append(element)
        new_p.append(mini_list)
        mini_list = []
print(new_p)
The code above will cut the initial list into sub lists and append them as individual lists into a resulting, larger list.
Note: not a few lines, but you can convert this to a function.
def list_cutter(ls):
    mini_list = []
    new_list = []
    for index, element in enumerate(ls):
        if index == len(ls)-1:
            mini_list.append(element)
            new_list.append(mini_list)
            break
        if element[0:3]==ls[index+1][0:3]:
            mini_list.append(element)
        else:
            mini_list.append(element)
            new_list.append(mini_list)
            mini_list = []
    return new_list

divide list and generate series of new lists. one from each list and rest into other

I have three lists and want to generate two new lists from them. Can anyone please tell me how it can be done?
list1=[12,25,45], list2=[14,69], list3=[54,98,68,78,48]
I want to print the output like
chosen1=[12,14,54], rest1=[25,45,69,98,68,78,48]
chosen2=[12,14,98], rest2=[25,45,69,54,68,78,48]
and so on
(every possible combination for chosen list)
I have tried to write this, but I don't know how to continue:
list1=[12,25,45]
list2=[14,69]
list3=[54,98,68,78,48]
for i in xrange (list1[0],list1[2]):
for y in xrange(list2[0], list2[1]):
for z in xrange(list[0],list[4])
for a in xrange(chosen[0],[2])
chosed1.append()
for a in xrange(chosen[0],[7])
rest1.append()
Print rest1
Print chosen1
itertools.product generates every combination of picking one item from each of several iterables (their Cartesian product):
import itertools
list1 = [12,25,45]
list2 = [14,69]
list3 = [54,98,68,78,48]
for i,(a,b,c) in enumerate(itertools.product(list1,list2,list3),1):
    # Note: Computing rest this way will *not* work if there are duplicates
    # in any of the lists.
    rest1 = [n for n in list1 if n != a]
    rest2 = [n for n in list2 if n != b]
    rest3 = [n for n in list3 if n != c]
    rest = ','.join(str(n) for n in rest1+rest2+rest3)
    print('chosen{0}=[{1},{2},{3}], rest{0}=[{4}]'.format(i,a,b,c,rest))
Output:
chosen1=[12,14,54], rest1=[25,45,69,98,68,78,48]
chosen2=[12,14,98], rest2=[25,45,69,54,68,78,48]
chosen3=[12,14,68], rest3=[25,45,69,54,98,78,48]
chosen4=[12,14,78], rest4=[25,45,69,54,98,68,48]
chosen5=[12,14,48], rest5=[25,45,69,54,98,68,78]
chosen6=[12,69,54], rest6=[25,45,14,98,68,78,48]
chosen7=[12,69,98], rest7=[25,45,14,54,68,78,48]
chosen8=[12,69,68], rest8=[25,45,14,54,98,78,48]
chosen9=[12,69,78], rest9=[25,45,14,54,98,68,48]
chosen10=[12,69,48], rest10=[25,45,14,54,98,68,78]
chosen11=[25,14,54], rest11=[12,45,69,98,68,78,48]
chosen12=[25,14,98], rest12=[12,45,69,54,68,78,48]
chosen13=[25,14,68], rest13=[12,45,69,54,98,78,48]
chosen14=[25,14,78], rest14=[12,45,69,54,98,68,48]
chosen15=[25,14,48], rest15=[12,45,69,54,98,68,78]
chosen16=[25,69,54], rest16=[12,45,14,98,68,78,48]
chosen17=[25,69,98], rest17=[12,45,14,54,68,78,48]
chosen18=[25,69,68], rest18=[12,45,14,54,98,78,48]
chosen19=[25,69,78], rest19=[12,45,14,54,98,68,48]
chosen20=[25,69,48], rest20=[12,45,14,54,98,68,78]
chosen21=[45,14,54], rest21=[12,25,69,98,68,78,48]
chosen22=[45,14,98], rest22=[12,25,69,54,68,78,48]
chosen23=[45,14,68], rest23=[12,25,69,54,98,78,48]
chosen24=[45,14,78], rest24=[12,25,69,54,98,68,48]
chosen25=[45,14,48], rest25=[12,25,69,54,98,68,78]
chosen26=[45,69,54], rest26=[12,25,14,98,68,78,48]
chosen27=[45,69,98], rest27=[12,25,14,54,68,78,48]
chosen28=[45,69,68], rest28=[12,25,14,54,98,78,48]
chosen29=[45,69,78], rest29=[12,25,14,54,98,68,48]
chosen30=[45,69,48], rest30=[12,25,14,54,98,68,78]
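The comment in the code above warns that filtering by value breaks when a list contains duplicates. One way around that, sketched here as an assumption rather than as part of the original answer, is to pick one index per list instead of one value. The names lists, picks and results are only illustrative.

```python
import itertools

lists = [[12, 25, 45], [14, 69], [54, 98, 68, 78, 48]]

results = []
# Pick one *index* per list, so equal values at different positions stay distinct.
for picks in itertools.product(*(range(len(lst)) for lst in lists)):
    chosen = [lst[i] for lst, i in zip(lists, picks)]
    # Everything except the picked position of each list, in list order.
    rest = [v for lst, i in zip(lists, picks)
            for j, v in enumerate(lst) if j != i]
    results.append((chosen, rest))

print(len(results))
print(results[0])
```

Because the list of lists is passed with *, the same loop should work for any number of input lists.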
If you instead need every ordered selection of three values drawn from all the lists merged together, along with the remaining values, then this would be a solution:
import itertools
list1 = [12,25,45]
list2 = [14,69]
list3 = [21,34,56,32]
chosen = []
leftover = []
mergedlist = list(set(list1 + list2 + list3))
mergedNewList = [x for x in itertools.permutations(mergedlist,3)]
for i, value in enumerate(mergedNewList):
    chosen.append(list(value))
    leftover.append([j for j in mergedlist if j not in chosen[i]])
    print(chosen[i])
    print(leftover[i])
I have collected the chosen values in a single list, chosen, and the remaining values in leftover, rather than creating a separate numbered variable for each result.
