I am new to Python and am somewhat confused.
I have a list like:
List_all = ["aawoobbcc", "aawoobbca", "aabbcskindd","asakindsbbss","wooedakse","sdadakindwsd","xxxxsdsd"]
and also a keyword list:
Key = ["woo","kind"]
and I want to get something like this:
[
["aawoobbcc", "aawoobbca","wooedakse"],
["aabbcskindd","asakindsbbss","sdadakindwsd"]
]
I have tried list_sub = [file for file in List_all if Key in file]
or list_sub = [file for file in List_all if k for k in Key in file]
which were not right.
How can I loop through the elements in Key and check whether they appear as substrings of the elements in List_all?
Thanks a lot!
One approach (O(n^2)) is the following:
res = [[e for e in List_all if k in e] for k in Key]
print(res)
Output
[['aawoobbcc', 'aawoobbca', 'wooedakse'], ['aabbcskindd', 'asakindsbbss', 'sdadakindwsd']]
A solution that is simpler to understand (for newbies) is to use nested for loops:
res = []
for k in Key:
    filtered = []
    for e in List_all:
        if k in e:
            filtered.append(e)
    res.append(filtered)
A more advanced but more performant solution (for really long lists) is to use a regular expression in conjunction with a defaultdict:
import re
from collections import defaultdict
List_all = ["aawoobbcc", "aawoobbca", "aabbcskindd", "asakindsbbss", "wooedakse", "sdadakindwsd", "xxxxsdsd"]
Key = ["woo", "kind"]
extract_key = re.compile(f"{'|'.join(Key)}")
table = defaultdict(list)
for word in List_all:
    if match := extract_key.search(word):
        table[match.group()].append(word)
res = [table[k] for k in Key if k in table]
print(res)
Output
[['aawoobbcc', 'aawoobbca', 'wooedakse'], ['aabbcskindd', 'asakindsbbss', 'sdadakindwsd']]
Note that this solution assumes that each string contains only one key.
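If a string might contain several keys and should then appear under each of them, a minimal sketch (my variation on the regex approach above, with a hypothetical "wookind" entry added only to show the behaviour) is to use findall instead of search:
import re
from collections import defaultdict

List_all = ["aawoobbcc", "aawoobbca", "aabbcskindd", "asakindsbbss",
            "wooedakse", "sdadakindwsd", "xxxxsdsd", "wookind"]  # "wookind" is a made-up multi-key example
Key = ["woo", "kind"]

extract_key = re.compile("|".join(Key))
table = defaultdict(list)
for word in List_all:
    for k in set(extract_key.findall(word)):  # every distinct key found in the word
        table[k].append(word)

res = [table[k] for k in Key if k in table]
print(res)
# [['aawoobbcc', 'aawoobbca', 'wooedakse', 'wookind'], ['aabbcskindd', 'asakindsbbss', 'sdadakindwsd', 'wookind']]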
I have a list l:
l = ['Abc.xlsx', 'Wqe.csv', 'Abc.csv', 'Xyz.xlsx']
In this list, I need to remove duplicates without considering the extension. The expected output is below.
l = ['Wqe.csv', 'Abc.csv', 'Xyz.xlsx']
I tried:
l = list(set(x.split('.')[0] for x in l))
But I am only getting unique filenames, without the extension.
How could I achieve it?
You can use a dictionary comprehension that uses the name part as key and the full file name as the value, exploiting the fact that dict keys must be unique:
>>> list({x.split(".")[0]: x for x in l}.values())
['Abc.csv', 'Wqe.csv', 'Xyz.xlsx']
If the file names can be in more sophisticated formats (such as with directory names, or in the foo.bar.xls format) you should use os.path.splitext:
>>> import os
>>> list({os.path.splitext(x)[0]: x for x in l}.values())
['Abc.csv', 'Wqe.csv', 'Xyz.xlsx']
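For instance (a quick illustration with a made-up name, not from the question's data), split('.')[0] truncates a multi-dot name, while splitext only strips the final extension:
>>> "foo.bar.xlsx".split(".")[0]
'foo'
>>> os.path.splitext("foo.bar.xlsx")[0]
'foo.bar'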
If the order of the end result doesn't matter, we could split each item on the period. We'll regard the part before the period as the key and keep the item only if its key hasn't been seen before.
oldList = l
setKeys = set()
l = []
for item in oldList:
    itemKey = item.split(".")[0]
    if itemKey in setKeys:
        pass
    else:
        setKeys.add(itemKey)
        l.append(item)
Try this
l = ['Abc.xlsx', 'Wqe.csv', 'Abc.csv', 'Xyz.xlsx']
for x in l:
    name = x.split('.')[0]
    find = 0
    for index, d in enumerate(l, start=0):
        txt = d.split('.')[0]
        if name == txt:
            find += 1
            if find > 1:
                l.pop(index)
print(l)
@Selcuk Definitely the best solution; unfortunately I don't have enough reputation to upvote your answer.
But I would rather use el[:el.rfind('.')] as my dictionary key than os.path.splitext(x)[0], in order to handle the case where we have sophisticated formats in the name. That will give something like this:
list({x[:x.rfind('.')]: x for x in l}.values())
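One caveat worth noting (my addition, not part of the original comment): if an entry has no dot at all, rfind returns -1 and the slice silently drops the last character, whereas os.path.splitext leaves the name untouched:
>>> name = "README"           # hypothetical entry with no extension
>>> name[:name.rfind('.')]    # rfind('.') == -1, so this is name[:-1]
'READM'
>>> os.path.splitext("README")[0]
'README'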
If I have two separate lists:
list1 = [['2021-05-24', '31220'],....., ['2021-05-24', '6640'],['2021-05-10', '8830'].....]
list2 = [['2021-05-24', '77860'],.....,['2021-05-24', '438000'],['2021-05-10', '9990'].....]
How could I combine them into
[['2021-05-24', <sum of all the numbers for '2021-05-24'>], ['2021-05-10', <sum of all the numbers for '2021-05-10'>]]
where '.....' means there are many more [date, number] pairs?
I am considering deleting the duplicated dates in each list and then adding the two lists up:
import networkx as nx
G = nx.Graph()
G.add_nodes_from(sum(list2, []))
q = [[(s[i],s[i+1]) for i in range(len(s)-1)] for s in list2]
for i in q:
    G.add_edges_from(i)
print([list(i) for i in nx.connected_components(G)])
but my code deleted not only the duplicate dates but also the duplicate numbers.
Thanks in advance.
I'd recommend using a defaultdict:
from collections import defaultdict
result = defaultdict(int)
for k, v in (list1 + list2):
    result[k] += int(v)  # the numbers are stored as strings, so convert them
Then you can convert the dict back to a list. Of course, if you have several lists you may want to use itertools.chain instead of list1 + list2.
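A minimal sketch of those two follow-ups, using only the pairs that are spelled out above (the '.....' parts are unknown):
from collections import defaultdict
from itertools import chain

list1 = [['2021-05-24', '31220'], ['2021-05-24', '6640'], ['2021-05-10', '8830']]
list2 = [['2021-05-24', '77860'], ['2021-05-24', '438000'], ['2021-05-10', '9990']]

result = defaultdict(int)
for k, v in chain(list1, list2):  # chain scales to any number of lists
    result[k] += int(v)

combined = [[k, v] for k, v in result.items()]  # back to the [[date, total], ...] shape
print(combined)  # [['2021-05-24', 553720], ['2021-05-10', 18820]]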
You can create a new dictionary, do the calculation, and then build the list from it. Here is the code for that:
result = {}
for i in list1 + list2:  # creating the dict and doing the calculation
    if i[0] in result:
        result[i[0]] += int(i[1])
    else:
        result[i[0]] = int(i[1])
result_list = [[key, result[key]] for key in result]  # converting the dict to the expected list
print(result_list)
I have an already sorted and filtered list of files which looks similar to this:
sortList = ['aa.001', 'aa.002', 'aa.003', 'vvv.001', 'vvv.002', 'vvv.003']
and I would like a new list where values sharing the same part before the . are merged into separate lists inside a list:
merList = [['aa.001', 'aa.002', 'aa.003'], ['vvv.001', 'vvv.002', 'vvv.003']]
I tried to write a loop, but without success, so it would be great if anyone could help fix it:
merList = []
for name in sortList:
    temp_merList = []
    for b in range(len(sortList)-1):
        if name[b][:-3] == name[b+1][:-3] and name[b] not in merList:
            temp_merList.append(name)
        else:
            merList.append(temp_merList)
print(merList)
You can use itertools.groupby:
from itertools import groupby
sortList = ['aa.001', 'aa.002', 'aa.003', 'vvv.001', 'vvv.002', 'vvv.003']
out = []
for _, g in groupby(sortList, lambda k: k.split('.')[0]):
    out.append(list(g))
print(out)
Prints:
[['aa.001', 'aa.002', 'aa.003'], ['vvv.001', 'vvv.002', 'vvv.003']]
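One caveat (not an issue here, since the question's list is already sorted): groupby only groups consecutive items, so an unsorted list would need to be sorted on the same key first, for example:
sortList = sorted(sortList, key=lambda k: k.split('.')[0])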
EDIT: Another method (using a temporary dictionary):
sortList = ['aa.001', 'aa.002', 'aa.003', 'vvv.001', 'vvv.002', 'vvv.003']
tmp = {}
for name in sortList:
    tmp.setdefault(name.split('.')[0], []).append(name)
merList = [v for _, v in tmp.items()]
print(merList)
I have an array of dictionaries d which I obtain by parsing a JSON file: d = r.json()
Assuming d then contains
d = [
{'change':'112','end_time':'2020-05-12','hostname':'a,b,c,d,e','ref':'345','start_time':'202-04-2020'},
{'change':'182','end_time':'2020-05-12','hostname':'a1,b1,c1,d1,e1','ref':'325','start_time':'202-04-2020'},
{'change':'122','end_time':'2020-05-12','hostname':'g,h,i,j,k','ref':'315','start_time':'202-04-2020'},
{'change':'112','end_time':'2020-05-12','hostname':'o,t1,h1,e4,n7','ref':'345','start_time':'202-04-2020'},
]
where all the hostnames are different from each other, how can I then perform a search like
if hostname == 'a1':
    print(change)  # i.e. 182
You need to iterate over the list, split the hostnames into a list and check if the hostname you are searching for exists in that list.
hostname = 'a1'
for row in d:
    hostnames = row['hostname'].split(',')
    if hostname in hostnames:
        print(row['change'])
The Pythonic way of solving this (using a comprehension) is also the easiest.
# for your a1 example
change_a1 = [i['change'] for i in d
             if 'a1' in i['hostname'].split(',')]  # split so that e.g. 'a' does not also match 'a1'
For an arbitrary search, just wrap it as a function
def find_change(host):
    change = [i['change'] for i in d
              if host in i['hostname'].split(',')]
    return change
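For example, with the sample d above:
print(find_change('a1'))  # ['182']
print(find_change('h1'))  # ['112']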
First of all, you have a lot of JSON structure errors:
d=[{'change':'112','end_time':'2020-05-12','hostname':'a,b,c,d,e','ref':'345','start_time':'202-04-2020'},
{'change':'182','end_time':'2020-05-12','hostname':'a1,b1,c1,d1,e1','ref':'325','start_time':'202-04-2020'},
{'change':'122','end_time':'2020-05-12','hostname':'g,h,i,j,k','ref':'315','start_time':'202-04-2020'},
{'change':'112','end_time':'2020-05-12','hostname':'o,t1,h1,e4,n7','ref':'345','start_time':'202-04-2020'}]
hostname='a1'
for row in d:
    arr = row['hostname'].split(",")
    if hostname in arr:
        print(row['change'])
# parse all the keys, for learning.
for row in d:
    for k in row.keys():
        if k == "hostname":
            arr = row[k].split(",")
            for s in arr:
                # print(s)
                if s == 'a1':
                    row['change'] = '777'
print(d)
After that, use reverse to re-arrange the tuples in the JSON.
Have fun!
Problem:
Trying to evaluate the first 4 characters of each item in a list.
If the first 4 chars of an item match the first 4 chars of another item in the list, then append the last three digits to the first four. See the example below.
Notes:
The list values are not hard coded.
The list always has this structure "####.###".
Only need to match first 4 chars in each item of list.
Order is not essential.
Code:
Grid = ["094G.016", "094G.019", "194P.005", "194P.015", "093T.021", "093T.102", "094G.032"]
Desired Output:
Grid = ["094G.016\019\032", "194P.005\015", "093T.021\102"]
Research:
I know that sets can find duplicates. Could I use a set to evaluate only the first 4 chars, or would I run into a problem since sets cannot be indexed?
Would it be better to split the list items into 2 parts, the four characters before the period ("094G") and a separate list of the three digits after the period ("016"), compare them, and then join them in a new list?
Is there a better way of doing this all together that I'm not realizing?
Here is one straightforward way to do it.
from collections import defaultdict
grid = ['094G.016', '094G.019', '194P.005', '194P.015', '093T.021', '093T.102', '094G.032']
d = defaultdict(list)
for item in grid:
    k, v = item.split('.')
    d[k].append(v)
result = ['%s.%s' % (k, '/'.join(v)) for k, v in d.items()]
Gives unordered result:
['093T.021/102', '194P.005/015', '094G.016/019/032']
What you'll most likely want is a dictionary mapping the first part of each code to a list of second parts. You can build the dictionary like so:
mappings = {}  # empty dictionary
for code in Grid:  # loop over each code
    first, second = code.split('.')  # separate the code into first.second
    if first in mappings:  # if the first part was already found
        mappings[first].append(second)  # add the second to those already collected
    else:
        mappings[first] = [second]  # otherwise, put it in a new list
Once you have the dictionary, it will be quite simple to loop over it and combine the second parts together (ideally, using '\\'.join)
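A minimal sketch of that final step, reusing the mappings dict built above (the backslash separator matches the desired output in the question):
result = [first + '.' + '\\'.join(seconds) for first, seconds in mappings.items()]
for item in result:
    print(item)
# 094G.016\019\032
# 194P.005\015
# 093T.021\102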
Sounds like a job for defaultdict.
from collections import defaultdict

grid = ["094G.016", "094G.019", "194P.005", "194P.015", "093T.021", "093T.102"]
d = defaultdict(set)
for item in grid:
    prefix, suffix = item.split(".")
    d[prefix].add(suffix)
output = ["%s.%s" % (prefix, "/".join(d[prefix])) for prefix in d]
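Note that a set does not keep the suffixes in insertion order, so the order inside each joined group is not guaranteed; if that matters, one option (my tweak, not part of the original answer) is to sort before joining:
output = ["%s.%s" % (prefix, "/".join(sorted(d[prefix]))) for prefix in d]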
>>> from itertools import groupby
>>> Grid = ["094G.016", "094G.019", "194P.005", "194P.015", "093T.021", "093T.102", "094G.032"]
>>> Grid = sorted(Grid, key=lambda x:x.split(".")[0])
>>> gen = ((k, g) for k, g in groupby(Grid, key=lambda x:x.split(".")[0]))
>>> gen = ((k,[x.split(".") for x in g]) for k, g in gen)
>>> gen = list((k + '.' + '/'.join(x[1] for x in g) for k, g in gen))
>>> for x in gen:
...     print(x)
...
093T.021/102
094G.016/019/032
194P.005/015