Separating list of strings with similarities into different lists - python

I am trying to separate a list with similar strings into multiple lists in python.
e.g. let's say the list is:
lst = ["asd_A01_000.csv", "asd_A02_000.csv", "asd_A02_001.csv", "asd_A01_001.csv", "asd_A04_000.csv"]
and I want to have new lists with any new codes like "A01" (so would have A01, A02, A04 etc.) meaning the result I want would be
["asd_A01_000.csv","asd_A01_001.csv"]
["asd_A02_000.csv","asd_A02_001.csv"]
["asd_A04_000.csv"]
The numbers do not have to be in order, as long as they are in different lists.
It is pretty easy to just do this one by one using a for loop where "A01" in list, but I have codes ranging from A01-A100.
Is there an easy way to do this without doing tons of for loops?
P.S. The strings are actually full file paths, which also contain underscores (e.g. C:\Users\Name\Documents\0XX_20220719_XX\asd_A001_000.csv).

One approach:
from collections import defaultdict
lst = ["asd_A01_000.csv", "asd_A02_000.csv", "asd_A02_001.csv", "asd_A01_001.csv", "asd_A04_000.csv"]
d = defaultdict(list)
for e in lst:
    d[e.split("_")[1]].append(e)
res = list(d.values())
print(res)
Output
[['asd_A01_000.csv', 'asd_A01_001.csv'], ['asd_A02_000.csv', 'asd_A02_001.csv'], ['asd_A04_000.csv']]

You can try itertools.groupby()
import itertools
lst = sorted(lst, key=lambda asd: asd.split("_")[1])
out = [list(g) for _, g in itertools.groupby(lst, lambda asd: asd.split("_")[1])]
print(out)
[['asd_A01_000.csv', 'asd_A01_001.csv'], ['asd_A02_000.csv', 'asd_A02_001.csv'], ['asd_A04_000.csv']]
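Both answers split the whole string on "_", which would pick the wrong piece for the full paths mentioned in the question's P.S., because the directory names contain underscores too. A minimal sketch of one way around that, assuming the code always looks like "A" followed by digits and sits between two underscores in the file name (the helper name code_of and the second and third sample paths are made up for illustration):

```python
import re
from collections import defaultdict
from pathlib import PureWindowsPath

# Sample paths modelled on the one in the question; only the first is from the post.
paths = [
    r"C:\Users\Name\Documents\0XX_20220719_XX\asd_A001_000.csv",
    r"C:\Users\Name\Documents\0XX_20220719_XX\asd_A002_000.csv",
    r"C:\Users\Name\Documents\0XX_20220719_XX\asd_A001_001.csv",
]

# Assumption: the code is "A" plus digits, surrounded by underscores.
code_pattern = re.compile(r"_(A\d+)_")

def code_of(path):
    # Look only at the file name so underscores in directories are ignored.
    # PureWindowsPath parses backslash-separated paths on any operating system.
    name = PureWindowsPath(path).name
    match = code_pattern.search(name)
    return match.group(1) if match else None

groups = defaultdict(list)
for p in paths:
    groups[code_of(p)].append(p)

result = list(groups.values())
print(result)
```

Paths whose file name does not match the pattern end up together under the key None, so they are easy to spot rather than silently dropped.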

Related

Python dictionary comprehension to group together equal keys

I have a code snippet that groups together equal keys from a list of dicts and adds each dict with an equal ObjectID to a list under that key.
The code below works, but I am trying to convert it to a dictionary comprehension:
# group together subblocks if they have equal ObjectID
output = {}
subblkDBF : list[dict]
for row in subblkDBF:
    if row["OBJECTID"] not in output:
        output[row["OBJECTID"]] = []
    output[row["OBJECTID"]].append(row)
Using a comprehension is possible, but likely inefficient in this case, since you need to (a) check if a key is in the dictionary at every iteration, and (b) append to, rather than set the value. You can, however, eliminate some of the boilerplate using collections.defaultdict:
from collections import defaultdict
output = defaultdict(list)
for row in subblkDBF:
    output[row['OBJECTID']].append(row)
The problem with using a comprehension is that if you really want a one-liner, you have to nest a list comprehension that traverses the entire list multiple times (once for each key):
{k: [d for d in subblkDBF if d['OBJECTID'] == k] for k in set(d['OBJECTID'] for d in subblkDBF)}
Iterating over subblkDBF in both the inner and outer loop leads to O(n^2) complexity, which is pointless, especially given how illegible the result is.
As the other answer shows, these problems go away if you're willing to sort the list first, or better yet, if it is already sorted.
If rows are sorted by Object ID (or all rows with equal Object ID are at least next to each other, no matter the overall order of those IDs) you could write a neat dict comprehension using itertools.groupby:
from itertools import groupby
from operator import itemgetter
output = {k: list(g) for k, g in groupby(subblkDBF, key=itemgetter("OBJECTID"))}
However, if this is not the case, you'd have to sort by the same key first, making this a lot less neat, and less efficient than above or the loop (O(nlogn) instead of O(n)).
key = itemgetter("OBJECTID")
output = {k: list(g) for k, g in groupby(sorted(subblkDBF, key=key), key=key)}
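To see why the sort matters, here is a small sketch with made-up rows (the "v" field and the values are only for illustration). groupby starts a new group whenever the key changes, so a repeated key in unsorted input produces two groups, and the dict comprehension silently keeps only the last one:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows; OBJECTID 1 appears twice but not next to each other.
rows = [
    {"OBJECTID": 1, "v": "a"},
    {"OBJECTID": 2, "v": "b"},
    {"OBJECTID": 1, "v": "c"},
]
key = itemgetter("OBJECTID")

# Without sorting: the second group for key 1 overwrites the first.
unsorted_result = {k: list(g) for k, g in groupby(rows, key=key)}

# With sorting: equal keys are adjacent, so each key gets one complete group.
sorted_result = {k: list(g) for k, g in groupby(sorted(rows, key=key), key=key)}

print(unsorted_result[1])  # only the row with v == "c"
print(sorted_result[1])    # both rows with OBJECTID 1
```

No error is raised in the unsorted case, which makes this an easy bug to miss.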
You can add an else block to save the extra lookup and slightly improve performance:
output = {}
subblkDBF : list[dict]
for row in subblkDBF:
    if row["OBJECTID"] not in output:
        output[row["OBJECTID"]] = [row]
    else:
        output[row["OBJECTID"]].append(row)

Split list into sub-lists based on integer in string

I have a list of strings as such:
['text_1.jpg', 'othertext_1.jpg', 'text_2.jpg', 'othertext_2.jpg', ...]
In reality, there are more entries than 2 per number but this is the general format. I would like to split this list into list of lists as such:
[['text_1.jpg', 'othertext_1.jpg'], ['text_2.jpg', 'othertext_2.jpg'], ...]
These sub-lists being based on the integer after the underscore. My current method to do so is to first sort the list based on the numbers as shown in the first list sample above and then iterate through each index and copy the values into new lists if it matches the value of the previous integer.
I am wondering if there is a simpler more pythonic way of performing this task.
Try:
import re
lst = ["text_1.jpg", "othertext_1.jpg", "text_2.jpg", "othertext_2.jpg"]
r = re.compile(r"_(\d+)\.jpg")
out = {}
for val in lst:
    num = r.search(val).group(1)
    out.setdefault(num, []).append(val)
print(list(out.values()))
Prints:
[['text_1.jpg', 'othertext_1.jpg'], ['text_2.jpg', 'othertext_2.jpg']]
Similar solution to @Andrej's:
import itertools
import re
def find_number(s):
    # the re module caches recently compiled patterns automatically;
    # feel free to compile explicitly first
    return re.search(r'_(\d+)\.jpg', s).group(1)
l = ['text_1.jpg', 'othertext_1.jpg', 'text_2.jpg', 'othertext_2.jpg']
res = [list(v) for k, v in itertools.groupby(l, find_number)]
print(res)
#[['text_1.jpg', 'othertext_1.jpg'], ['text_2.jpg', 'othertext_2.jpg']]
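Note that groupby only merges neighbouring items, so the version above relies on the list already being ordered by number. If that is not guaranteed, a sketch along these lines sorts first, converting the number to int so that 10 comes after 2 rather than before it (the sample file names are made up for illustration):

```python
import itertools
import re

def find_number(s):
    # Return the number as an int so sorting is numeric, not alphabetical.
    return int(re.search(r"_(\d+)\.jpg", s).group(1))

# Hypothetical, deliberately unordered input.
files = ["text_2.jpg", "othertext_1.jpg", "text_10.jpg", "text_1.jpg", "othertext_2.jpg"]

# sorted() is stable, so items with the same number keep their original order.
files_sorted = sorted(files, key=find_number)
res = [list(g) for _, g in itertools.groupby(files_sorted, key=find_number)]
print(res)
```

The sort makes this O(n log n); the dictionary-based approach stays O(n) and does not need sorted input at all.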

How can I make a one line generator expression to generate these two different lists

I am writing a program which parses lists like this.
['ecl:gry', 'pid:860033327', 'eyr:2020', 'hcl:#fffffd', 'byr:1937', 'iyr:2017', 'cid:147', 'hgt:183cm']
I want to turn this list into a dictionary of the key value pairs which I have done here:
keys = []
values = []
for string in data:
    pair = string.split(':')
    keys.append(pair[0])
    values.append(pair[1])
zipped = zip(keys, values)
self.dic = dict(zipped)
print(self.dic)
I know that I can use list comprehension to make one of the lists at a time like this
keys = [s.split(':')[0] for s in data]
values = [s.split(':')[1] for s in data]
This requires two loops so the first code example would be better, but is there a way to generate both lists using one generator with unpacking and then zip the two together?
l = ['ecl:gry', 'pid:860033327', 'eyr:2020', 'hcl:#fffffd',
     'byr:1937', 'iyr:2017', 'cid:147', 'hgt:183cm']
dict(e.split(':') for e in l)
I did it like this:
self.dic = {}
for e in data:
    k, v = e.split(':')
    self.dic[k] = v
You can easily do it with a dict comprehension:
your_dict = {x.split(':')[0]: x.split(':')[1] for x in data}
You can also avoid calling split twice by using a generator expression:
your_dict = dict(x.split(':') for x in data)
Which seems even cleaner...
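One caveat with dict(x.split(':') for x in data): it only works if every item splits into exactly two parts. If a value could itself contain a colon, dict() raises a ValueError. A small sketch that guards against this by splitting at the first colon only (the "note:12:30" entry is invented to show the case, it is not from the question's data):

```python
# Two entries from the question plus one hypothetical value containing a colon.
data = ["ecl:gry", "hcl:#fffffd", "note:12:30"]

# maxsplit=1 splits at the first colon only, so each item yields exactly two parts.
pairs = dict(item.split(":", 1) for item in data)
print(pairs)
```

If the data is known to contain exactly one colon per item, the plain split(':') version is equivalent.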

sorting a list by names in python

I have a list of filenames. I need to group them based on the ending names after underscore ( _ ). My list looks something like this:
[
'1_result1.txt',
'2_result2.txt',
'3_result2.txt',
'4_result3.txt',
'5_result4.txt',
'6_result1.txt',
'7_result2.txt',
'8_result3.txt',
]
My end result should be:
List1 = ['1_result1.txt', '6_result1.txt']
List2 = ['2_result2.txt', '3_result2.txt', '7_result2.txt']
List3 = ['4_result3.txt', '8_result3.txt']
List4 = ['5_result4.txt']
This will come down to making a dictionary of lists, then iterating the input and adding each item to its proper list:
output = {}
for item in inlist:
    output.setdefault(item.split("_")[1], []).append(item)
print(list(output.values()))
We use setdefault to make sure there's a list for the entry, then add our current filename to the list. output.values() will return just the lists, not the entire dictionary, which appears to be what you want.
using defaultdict from collections module:
from collections import defaultdict
output = defaultdict(list)
for file in data:
    output[file.split("_")[1]].append(file)
print(list(output.values()))
using groupby from itertools module:
from itertools import groupby
data.sort(key=lambda x: x.split('_')[1])
for key, group in groupby(data, lambda x: x.split('_')[1]):
    print(list(group))
Starting with Python 2.4, both list.sort() and sorted() added a key parameter to specify a function to be called on each list element prior to making comparisons.
The value of the key parameter should be a function that takes a single argument and returns a key to use for sorting purposes. This technique is fast because the key function is called exactly once for each input record.
So if l is the name of your list then you could use something like :
l.sort(key=lambda s: s.split('_')[1])
More information about key functions is available here.

How to separate one list in two via list comprehension or otherwise

I have a list of dictionary items like so:
L = [{"a":1, "b":0}, {"a":3, "b":1}...]
I would like to split these entries based upon the value of "b", either 0 or 1.
A(b=0) = [{"a":1, "b":1}, ....]
B(b=1) = [{"a":3, "b":2}, .....]
I am comfortable with using simple list comprehensions, and I am currently looping through the list L two times.
A = [d for d in L if d["b"] == 0]
B = [d for d in L if d["b"] != 0]
Clearly this is not the most efficient way.
An else clause does not seem to be available within the list comprehension functionality.
Can I do what I want via list comprehension?
Is there a better way to do this?
I am looking for a good balance between readability and efficiency, leaning towards readability.
Thanks!
Update:
Thanks everyone for the comments and ideas! The easiest one for me to read is the one by Thomas, but I will look at Alex's suggestion as well. I had not found any reference to the collections module before.
Don't use a list comprehension. List comprehensions are for when you want a single list result. You obviously don't :) Use a regular for loop:
A = []
B = []
for item in L:
    if item['b'] == 0:
        target = A
    else:
        target = B
    target.append(item)
You can shorten the snippet by doing, say, (A, B)[item['b'] != 0].append(item), but why bother?
If the b value can be only 0 or 1, @Thomas's simple solution is probably best. For a more general case (in which you want to discriminate among several possible values of b; your sample "expected results" appear to be completely divorced from and contradictory to your question's text, so it's far from obvious whether you actually need some generality ;-):
from collections import defaultdict
separated = defaultdict(list)
for x in L:
    separated[x['b']].append(x)
When this code executes, separated ends up with a dict (actually an instance of collections.defaultdict, a dict subclass) whose keys are all values for b that actually occur in dicts in list L, the corresponding values being the separated sublists. So, for example, if b takes only the values 0 and 1, separated[0] would be what (in your question's text as opposed to the example) you want as list A, and separated[1] what you want as list B.
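As a usage sketch of the above, with a small made-up list L (the third entry is added for illustration). One thing to be aware of: indexing a defaultdict with a key that never occurred creates an empty list for it as a side effect, so .get() is the safer way to probe for values that may be absent:

```python
from collections import defaultdict

# Hypothetical sample data in the shape described in the question.
L = [{"a": 1, "b": 0}, {"a": 3, "b": 1}, {"a": 5, "b": 0}]

separated = defaultdict(list)
for x in L:
    separated[x["b"]].append(x)

A = separated[0]  # all dicts with b == 0
B = separated[1]  # all dicts with b == 1

# .get() does not trigger the default factory, so no key 2 is created here.
missing = separated.get(2, [])
print(A, B, missing)
```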
