Sorting a list by names in Python

I have a list of filenames. I need to group them based on the ending names after underscore ( _ ). My list looks something like this:
[
'1_result1.txt',
'2_result2.txt',
'3_result2.txt',
'4_result3.txt',
'5_result4.txt',
'6_result1.txt',
'7_result2.txt',
'8_result3.txt',
]
My end result should be:
List1 = ['1_result1.txt', '6_result1.txt']
List2 = ['2_result2.txt', '3_result2.txt', '7_result2.txt']
List3 = ['4_result3.txt', '8_result3.txt']
List4 = ['5_result4.txt']

This will come down to making a dictionary of lists, then iterating the input and adding each item to its proper list:
output = {}
for item in inlist:
    output.setdefault(item.split("_")[1], []).append(item)
print(output.values())
We use setdefault to make sure there's a list for the entry, then add our current filename to the list. output.values() will return just the lists, not the entire dictionary, which appears to be what you want.
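As a self-contained sketch of the same approach, assuming the input list is named inlist as in the snippet above (the sample data is the list from the question, and the variable names suffix and groups are only illustrative):

```python
inlist = [
    '1_result1.txt', '2_result2.txt', '3_result2.txt', '4_result3.txt',
    '5_result4.txt', '6_result1.txt', '7_result2.txt', '8_result3.txt',
]

output = {}
for item in inlist:
    # Group on the text after the first underscore, e.g. 'result1.txt'.
    suffix = item.split("_", 1)[1]
    # setdefault creates the empty list the first time a suffix is seen.
    output.setdefault(suffix, []).append(item)

groups = list(output.values())
print(groups)
```

Passing 1 as the second argument to split is an extra precaution for names that contain more than one underscore; it is not part of the original answer.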

Using defaultdict from the collections module:
from collections import defaultdict
output = defaultdict(list)
for file in data:
    output[file.split("_")[1]].append(file)
print(output.values())
Using groupby from the itertools module:
from itertools import groupby
data.sort(key=lambda x: x.split('_')[1])
for key, group in groupby(data, lambda x: x.split('_')[1]):
    print(list(group))

Starting with Python 2.4, both list.sort() and sorted() added a key parameter to specify a function to be called on each list element prior to making comparisons.
The value of the key parameter should be a function that takes a single argument and returns a key to use for sorting purposes. This technique is fast because the key function is called exactly once for each input record.
So if l is the name of your list, you could use something like:
l.sort(key=lambda s: s.split('_')[1])
More information about key functions is available in the Python documentation on sorting.
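A small sketch of what the key function does with the filenames from the question. The helper name suffix is only for illustration. Because Python's sort is stable, files with the same suffix keep their original relative order, so the result is already arranged group by group:

```python
files = ['1_result1.txt', '2_result2.txt', '3_result2.txt', '4_result3.txt',
         '5_result4.txt', '6_result1.txt', '7_result2.txt', '8_result3.txt']

def suffix(name):
    # Everything after the first underscore, e.g. 'result2.txt'.
    return name.split('_', 1)[1]

# sorted() returns a new list; files itself is left unchanged.
ordered = sorted(files, key=suffix)
print(ordered)
```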

Related

Splitting a semicolon-separated with equal in a string

Below is the code:
s= "Name1=Value1;Name2=Value2;Name3=Value3"
dict(item.split("=") for item in s.split(";"))
I would like to understand how this works. Will it perform for loop first or will it split first?
List of dictionary
s1= "Name1=Value1,Name2=Value2,Name3=Value3;Name1=ValueA,Name2=ValueB,Name3=ValueC"
If you have Python installed, I recommend using its interactive REPL.
With the REPL you can run the parts of your program step by step:
s.split(";") will give you a list of three strings:
['Name1=Value1', 'Name2=Value2', 'Name3=Value3']
item.split("=") for item in s.split(";") will give you a Python generator that iterates over the list from step 1 and splits each item into a smaller list, like this:
[['Name1', 'Value1'], ['Name2', 'Value2'], ['Name3', 'Value3']]
Finally dict(...) on the pairs will turn them into key-value pairs in a python dictionary like this:
{'Name1': 'Value1', 'Name2': 'Value2', 'Name3': 'Value3'}
dict is being passed a generator expression, which produces a sequence of lists by first calling s.split(";"), then yielding the result of item.split("=") for each value in the result of the first split. A more verbose version:
s = "..."
d = dict()
name_value_pairs = s.split(";")
for item in name_value_pairs:
    name_value = item.split("=")
    d.update([name_value])
I use d.update rather than something simpler like d[x] = y because both dict and d.update can accept the same kind of sequence of key/value pairs as arguments.
From here, we can reconstruct the original by eliminating one temporary variable at a time, from
s = "..."
d = dict()
for item in s.split(";"):
    name_value = item.split("=")
    d.update([name_value])
to
s = "..."
d = dict()
for item in s.split(";"):
    d.update([item.split("=")])
to
s = "..."
d = dict(item.split("=") for item in s.split(";"))
If you write it like this, it may be easier to see what's happening.
s= "Name1=Value1;Name2=Value2;Name3=Value3"
semicolon_sep = s.split(";")
equal_sep = [item.split("=") for item in semicolon_sep]
a = dict(equal_sep)
print(a["Name1"])
First, it splits the text wherever there is a semicolon. This creates a list with three elements, "semicolon_sep":
>>> print(semicolon_sep)
['Name1=Value1', 'Name2=Value2', 'Name3=Value3']
Then, it loops over this list and splits each item wherever there is an "=". This gives a two-element list for each item (Name and Value). Passing this list (equal_sep) to dict() turns the list into a dictionary.
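The same idea could be extended to the second string in the question (s1), under the assumption that records are separated by ";" and name/value pairs within a record by ",". This is only a sketch of one way to get a list of dictionaries; the name records is illustrative:

```python
s1 = "Name1=Value1,Name2=Value2,Name3=Value3;Name1=ValueA,Name2=ValueB,Name3=ValueC"

records = [
    # Inner generator: one [name, value] pair per comma-separated field.
    dict(pair.split("=", 1) for pair in record.split(","))
    for record in s1.split(";")  # outer loop: one record per semicolon
]
print(records)
```

The outer split runs once, then each record is split and converted in turn, which mirrors the order of evaluation described above.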

Sorting a list of string using a custom order stored in another list

I have a list with 2 or 3 character strings with the last character being the same.
example_list = ['h1','ee1','hi1','ol1','b1','ol1','b1']
Is there any way to sort this list using the order of another list?
order_list = ['ee','hi','h','b','ol']
So the answer should be something like example_list.sort(use_order_of=order_list)
which should produce an output like ['ee1','hi1','h1','b1','b1','ol1','ol1']
I have found other questions on StackOverflow, but I am still unable to find an answer with a good explanation.
You could build an order_map that maps the prefixes to their sorting key, and then use that map for the key when calling sorted:
example_list = ['h1','ee1','hi1','ol1','b1','ol1','b1']
order_list = ['ee','hi','h','b','ol']
order_map = {x: i for i, x in enumerate(order_list)}
sorted(example_list, key=lambda x: order_map[x[:-1]])
This has an advantage over calling order_list.index for each element, as fetching elements from the dictionary is fast.
You can also make it work with elements that are missing from the order_list by using dict.get with a default value. If the default value is small (e.g. -1) then the values that don't appear in order_list will be put at the front of the sorted list. If the default value is large (e.g. float('inf')) then the values that don't appear in order_list will be put at the back of the sorted list.
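A short sketch of the dict.get variant described above. The entry 'zz1' is a made-up element whose prefix is not in order_list, added only to show where unknown values end up:

```python
example_list = ['h1', 'ee1', 'zz1', 'ol1', 'b1']  # 'zz1' is hypothetical
order_list = ['ee', 'hi', 'h', 'b', 'ol']
order_map = {x: i for i, x in enumerate(order_list)}

# Unknown prefixes get -1, so they sort to the front.
front = sorted(example_list, key=lambda x: order_map.get(x[:-1], -1))
# Unknown prefixes get infinity, so they sort to the back.
back = sorted(example_list, key=lambda x: order_map.get(x[:-1], float('inf')))

print(front)
print(back)
```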
You can use sorted with a key that looks up each element of example_list, minus its last character, in order_list:
sorted(example_list, key=lambda x: order_list.index(x[:-1]))
Output:
['ee1', 'hi1', 'h1', 'b1', 'b1', 'ol1', 'ol1']
Note that this assumes every element of example_list, with its last character removed, is in order_list.
Something like this? It has the advantage of handling duplicates.
sorted_list = [
i
for i, _
in sorted(zip(example_list, order_list), key=lambda x: x[1])
]

groupby iterator not adding to list in dictionary comprehension

I have a db query that returns a list. I then do a dictionary comprehension like so:
results = {product: [g for g in group] for product, group in groupby(db_results, lambda x: x.product_id)}
The problem is that each value of the dictionary only contains one item. I assume this is due to the fact that the group is an iterator.
The following returns each item of the group, so I know that they are there:
groups = groupby(db_results, lambda x: x.product_id)
for k, g in groups:
    if k == 1001:
        print(list(g))
I am trying to get all the values of g in the above in a list whose key is the key of dictionary.
I've tried many variations like:
blah = dict((k,list(v)) for k,v in groupby(db_results, key=lambda x: x.product_id))
but I can't get it right.
If you insist on using groupby, then you need to make sure that the input is sorted by the same key that you group on. However, I would suggest using defaultdict instead:
from collections import defaultdict
blah = defaultdict(list)
for item in db_results:
    blah[item.product_id].append(item)
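For comparison, a sketch of the groupby route with the input sorted first. groupby only merges consecutive items with equal keys, so on unsorted input the same product_id can appear in several groups, and in a dictionary comprehension each later group overwrites the earlier one, which may be why only part of the data shows up. The Row type and sample rows here are hypothetical stand-ins for the db results:

```python
from collections import namedtuple
from itertools import groupby
from operator import attrgetter

# Hypothetical stand-in for the rows returned by the db query.
Row = namedtuple('Row', ['product_id', 'name'])
db_results = [Row(1001, 'a'), Row(1002, 'b'), Row(1001, 'c')]

by_product = attrgetter('product_id')
results = {
    product: list(group)  # materialize the group iterator right away
    for product, group in groupby(sorted(db_results, key=by_product), key=by_product)
}
print(results)
```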

how can I do reverse sort in python language?

I have written Python code that gets the key from the log and sorts in descending order by adver_num. When I call the sorted function,
sorted(dict, cmp=lambda x,y: cmp(adver_num), reverse=False)
it reports an error about adver_num. How can I fix it? dict[].adver_num? I have tried several ways, and it still fails.
import re
dict={}
class log:
    def __init__(self,query_num, adver_num):
        self.query_num = query_num
        self.adver_num = adver_num
f = open('result.txt','w')
for line in open("test.log"):
    count_result = 0
    query_num = 0
    match=re.search('.*qry=(.*?)qi.*rc=(.*?)dis',line).groups()
    counts=match[1].split('|')
    for count in counts:
        count_result += int(count)
    if match[0].strip():
        if not dict.has_key(match[0]):
            dict[match[0]] = log(1,count_result)
        else:
            query_num = dict[match[0]].query_num+1;
            count_result = dict[match[0]].adver_num+count_result;
            dict[match[0]] = log(query_num,count_result)
    #f.write("%s\t%s\n"%(match[0],count_result))
sorted(dict,cmp=lambda x,y:cmp(adver_num),reverse=False)
for i in dict.keys():
    f.write("%s\t%s\t%s\n"%(i,dict[i].query_num,dict[i].adver_num))
First of all, dict can't be sorted, you need to use a list. Second, sorted function does not modify its argument, but returns a new list. Try calling sorted on any dictionary, you'll get a sorted list of keys as a return value.
sorted returns a sorted copy of whatever you give it, which in this case is a list of the keys in dict. I think what you want is this:
s = sorted(dict.iteritems(), key=lambda x: x[1].adver_num, reverse=True)
for (i, _) in s:
    …
I'm not sure why you passed reverse=False. That's the default (which means it's redundant, at the very least), and means that you don't want it sorted in reverse order.
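In Python 3, where dict.iteritems() and the cmp argument to sorted no longer exist, the same descending sort could be sketched as follows. The Log class mirrors the one in the question, but the sample entries and the names logs and ranked are made up for illustration and do not come from the original log file:

```python
class Log:
    def __init__(self, query_num, adver_num):
        self.query_num = query_num
        self.adver_num = adver_num

# Hypothetical sample data keyed by query string.
logs = {'foo': Log(2, 5), 'bar': Log(1, 9), 'baz': Log(4, 1)}

# Sort the (key, Log) pairs by adver_num, largest first.
ranked = sorted(logs.items(), key=lambda kv: kv[1].adver_num, reverse=True)

for query, entry in ranked:
    print("%s\t%s\t%s" % (query, entry.query_num, entry.adver_num))
```

The sorted result has to be assigned to a name and iterated, because the dictionary itself is not reordered.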

python how to build a new list from this one

I have a list in the following format:
['CASE_1:a','CASE_1:b','CASE_1:c','CASE_1:d',
'CASE_2:e','CASE_2:f','CASE_2:g','CASE_2:h']
I want to create a new list which looks like this:
['CASE_1:a,b,c,d','CASE_2:e,f,g,h']
Any idea how to get this done elegantly??
You can use a defaultdict by treating case as the key, and appending to the list each letter, where case and the letter are obtained by splitting the elements of your list on ':' - such as:
from collections import defaultdict
case_letters = defaultdict(list)
start = ['CASE_1:a','CASE_1:b','CASE_1:c','CASE_1:d', 'CASE_2:e','CASE_2:f','CASE_2:g','CASE_2:h']
for el in start:
    case, letter = el.split(':')
    case_letters[case].append(letter)
result = sorted('{case}:{letters}'.format(case=key, letters=','.join(values)) for key, values in case_letters.iteritems())
print(result)
As this is homework (edit: or was!!?) - I recommend looking at collections.defaultdict, str.split (and other builtin string methods), the builtin type list and its methods (such as append, extend, sort, etc.), str.format, the builtin sorted function, and dicts in general. Use the working example here along with the fine manual for reference - all these things will come in handy later on - so it's in your best interest to understand them as best you can.
One other thing to consider is that having something like:
{1: ['a', 'b', 'c', 'd'], 2: ['e', 'f', 'g', 'h']}
is a lot more of a useful format and could be used to recreate your desired list afterwards anyway...
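A sketch of how that dictionary form could be turned back into the requested list. The CASE_ prefix and the integer keys are taken from the examples above; the variable names cases and result are illustrative:

```python
cases = {1: ['a', 'b', 'c', 'd'], 2: ['e', 'f', 'g', 'h']}

result = [
    # Rebuild 'CASE_<n>:<comma-separated letters>' for each entry.
    'CASE_{}:{}'.format(number, ','.join(letters))
    for number, letters in sorted(cases.items())  # sorted by case number
]
print(result)
```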
I've deleted my full solution since I realized this is homework, but here's the basic idea:
A dictionary is a better data structure. I would look at a collections.defaultdict. e.g.
yourdict = defaultdict(list)
You can iterate through your list (splitting each element on ':'). Something like:
#only split string once -- resulting in a list of length 2.
case, value = element.split(':',1)
Then you can add these to the dict using the list .append method:
yourdict[case].append(value)
Now, you'll have a dict which maps keys (Case_1, Case_2) to lists (['a','b','c','d'], [...]).
If you really need a list, you can sort the items of the dictionary and join appropriately.
sigh. It looks like the homework tag has been removed (here's my original solution):
from collections import defaultdict
d = defaultdict(list)
for elem in yourlist:
    case, value = elem.split(':', 1)
    d[case].append(value)
Now you have a dictionary as I described above. If you really want to get your list back:
new_lst = [ case+':'+','.join(values) for case,values in sorted(d.items()) ]
data = ['CASE_1:a','CASE_1:b','CASE_1:c','CASE_1:d', 'CASE_2:e','CASE_2:f','CASE_2:g','CASE_2:h']
output = {}
for item in data:
    key, value = item.split(':')
    if key not in output:
        output[key] = []
    output[key].append(value)
result = []
for key, values in output.items():
    result.append('%s:%s' % (key, ",".join(values)))
print(result)
outputs
['CASE_2:e,f,g,h', 'CASE_1:a,b,c,d']
mydict = {}
for item in list:
    key, value = item.split(":")
    if key in mydict:
        mydict[key].append(value)
    else:
        mydict[key] = [value]
[key + ":" + ",".join(value) for key, value in mydict.iteritems()]
Not much elegance, to be honest. You know, I'd store your list as a dict, cause it behaves as a dict in fact.
output is ['CASE_2:e,f,g,h', 'CASE_1:a,b,c,d']
