python how to build a new list from this one - python

I have a list in the following format:
['CASE_1:a','CASE_1:b','CASE_1:c','CASE_1:d',
'CASE_2:e','CASE_2:f','CASE_2:g','CASE_2:h']
I want to create a new list which looks like this:
['CASE_1:a,b,c,d','CASE_2:e,f,g,h']
Any idea how to get this done elegantly??

You can use a defaultdict by treating case as the key, and appending to the list each letter, where case and the letter are obtained by splitting the elements of your list on ':' - such as:
from collections import defaultdict
case_letters = defaultdict(list)
start = ['CASE_1:a','CASE_1:b','CASE_1:c','CASE_1:d', 'CASE_2:e','CASE_2:f','CASE_2:g','CASE_2:h']
for el in start:
    case, letter = el.split(':')
    case_letters[case].append(letter)
result = sorted('{case}:{letters}'.format(case=key, letters=','.join(values)) for key, values in case_letters.items())
print(result)
As this is homework (edit: or was!!?) - I recommend looking at collections.defaultdict, str.split (and other builtin string methods), the builtin type list and its methods (such as append, extend, sort etc...), str.format, the builtin sorted function, and dict in general. Use the working example here along with the documentation for reference - all these things will come in handy later on - so it's in your best interest to understand them as best you can.
One other thing to consider is that having something like:
{1: ['a', 'b', 'c', 'd'], 2: ['e', 'f', 'g', 'h']}
is a much more useful format, and could be used to recreate your desired list afterwards anyway...
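As a rough illustration of that last point, a dictionary in that shape can be turned back into the requested list in one expression. The names cases and rebuilt are made up for this sketch, and it assumes the integer keys map directly onto the CASE_n prefixes:

```python
# Hypothetical data in the dict-of-lists shape suggested above.
cases = {1: ['a', 'b', 'c', 'd'], 2: ['e', 'f', 'g', 'h']}

# Sort by key so the result order does not depend on dict ordering.
rebuilt = ['CASE_{}:{}'.format(number, ','.join(letters))
           for number, letters in sorted(cases.items())]
print(rebuilt)
```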

I've deleted my full solution since I realized this is homework, but here's the basic idea:
A dictionary is a better data structure. I would look at a collections.defaultdict. e.g.
yourdict = defaultdict(list)
You can iterate through your list (splitting each element on ':'). Something like:
#only split string once -- resulting in a list of length 2.
case, value = element.split(':',1)
Then you can add these to the dict using the list .append method:
yourdict[case].append(value)
Now, you'll have a dict which maps keys (Case_1, Case_2) to lists (['a','b','c','d'], [...]).
If you really need a list, you can sort the items of the dictionary and join appropriately.
sigh. It looks like the homework tag has been removed (here's my original solution):
from collections import defaultdict
d = defaultdict(list)
for elem in yourlist:
    case, value = elem.split(':', 1)
    d[case].append(value)
Now you have a dictionary as I described above. If you really want to get your list back:
new_lst = [ case+':'+','.join(values) for case,values in sorted(d.items()) ]
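To see why the second argument to split matters, here is a small sketch. The input string is hypothetical (the question's data never has a second colon); it only shows what would happen if a value itself contained ':':

```python
element = 'CASE_1:a:b'  # hypothetical input whose value itself contains ':'

# maxsplit=1 stops after the first ':' so exactly two parts come back.
case, value = element.split(':', 1)
print(case, value)

# Without maxsplit there are three parts, so two-name unpacking fails.
try:
    case, value = element.split(':')
    unpack_failed = False
except ValueError:
    unpack_failed = True
print(unpack_failed)
```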

data = ['CASE_1:a','CASE_1:b','CASE_1:c','CASE_1:d', 'CASE_2:e','CASE_2:f','CASE_2:g','CASE_2:h']
output = {}
for item in data:
    key, value = item.split(':')
    if key not in output:
        output[key] = []
    output[key].append(value)
result = []
for key, values in output.items():
    result.append('%s:%s' % (key, ",".join(values)))
print(result)
outputs (the order may differ, since plain dicts are unordered before Python 3.7)
['CASE_2:e,f,g,h', 'CASE_1:a,b,c,d']

mydict = {}
for item in list:
    key, value = item.split(":")
    if key in mydict:
        mydict[key].append(value)
    else:
        mydict[key] = [value]
[key + ":" + ",".join(value) for key, value in mydict.items()]
Not much elegance, to be honest. You know, I'd store your list as a dict, cause it behaves as a dict in fact.
output is ['CASE_2:e,f,g,h', 'CASE_1:a,b,c,d']

Related

Splitting a semicolon-separated with equal in a string

Below is the code:
s= "Name1=Value1;Name2=Value2;Name3=Value3"
dict(item.split("=") for item in s.split(";"))
I would like to understand how this works. Will it perform for loop first or will it split first?
List of dictionary
s1= "Name1=Value1,Name2=Value2,Name3=Value3;Name1=ValueA,Name2=ValueB,Name3=ValueC"
If you have Python installed, I recommend using its interactive REPL.
With the REPL you can run the parts of your program step by step:
s.split(";") will give you:
['Name1=Value1', 'Name2=Value2', 'Name3=Value3']
item.split("=") for item in s.split(";") will give you a Python generator that iterates over the list from step 1 and splits each item into a smaller list, like this:
[['Name1', 'Value1'], ['Name2', 'Value2'], ['Name3', 'Value3']]
Finally dict(...) on the pairs will turn them into key-value pairs in a python dictionary like this:
{'Name1': 'Value1', 'Name2': 'Value2', 'Name3': 'Value3'}
dict is being passed a generator expression, which produces a sequence of lists by first calling s.split(";"), then yielding the result of item.split("=") for each value in the result of the first split. A more verbose version:
s = "..."
d = dict()
name_value_pairs = s.split(";")
for item in name_value_pairs:
    name_value = item.split("=")
    d.update([name_value])
I use d.update rather than something simpler like d[x] = y because both dict and d.update can accept the same kind of sequence of key/value pairs as arguments.
From here, we can reconstruct the original by eliminating one temporary variable at a time, from
s = "..."
d = dict()
for item in s.split(";"):
    name_value = item.split("=")
    d.update([name_value])
to
s = "..."
d = dict()
for item in s.split(";"):
    d.update([item.split("=")])
to
s = "..."
d = dict(item.split("=") for item in s.split(";"))
If you write it like that, you might understand better what's happening.
s= "Name1=Value1;Name2=Value2;Name3=Value3"
semicolon_sep = s.split(";")
equal_sep = [item.split("=") for item in semicolon_sep]
a = dict(equal_sep)
print(a["Name1"])
First, it splits the text wherever there is a semicolon. This creates a list with three elements, "semicolon_sep":
>>> print(semicolon_sep)
['Name1=Value1', 'Name2=Value2', 'Name3=Value3']
Then, it makes a loop over this list to separate each item wherever there is "=". In this way, we have 2 columns for each item (Name and Value). By putting this list (equal_sep) in dict() we change the list to a dictionary.
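The same idea can be nested for the second string in the question, s1, where ';' separates records and ',' separates the pairs inside each record. This is only a sketch, and it assumes the wanted result is a list of dictionaries (the question does not spell that out); records is a made-up name:

```python
s1 = "Name1=Value1,Name2=Value2,Name3=Value3;Name1=ValueA,Name2=ValueB,Name3=ValueC"

# Outer split on ';' yields records, inner split on ',' yields 'name=value' pairs.
records = [dict(pair.split("=") for pair in record.split(","))
           for record in s1.split(";")]
print(records)
```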

sorting a list by names in python

I have a list of filenames. I need to group them based on the ending names after underscore ( _ ). My list looks something like this:
[
'1_result1.txt',
'2_result2.txt',
'3_result2.txt',
'4_result3.txt',
'5_result4.txt',
'6_result1.txt',
'7_result2.txt',
'8_result3.txt',
]
My end result should be:
List1 = ['1_result1.txt', '6_result1.txt']
List2 = ['2_result2.txt', '3_result2.txt', '7_result2.txt']
List3 = ['4_result3.txt', '8_result3.txt']
List4 = ['5_result4.txt']
This will come down to making a dictionary of lists, then iterating the input and adding each item to its proper list:
output = {}
for item in inlist:
    output.setdefault(item.split("_")[1], []).append(item)
print(list(output.values()))
We use setdefault to make sure there's a list for the entry, then add our current filename to the list. output.values() will return just the lists, not the entire dictionary, which appears to be what you want.
using defaultdict from collections module:
from collections import defaultdict
output = defaultdict(list)
for file in data:
    output[file.split("_")[1]].append(file)
print(list(output.values()))
using groupby from itertools module:
from itertools import groupby
data.sort(key=lambda x: x.split('_')[1])
for key, group in groupby(data, lambda x: x.split('_')[1]):
    print(list(group))
Starting with Python 2.4, both list.sort() and sorted() added a key parameter to specify a function to be called on each list element prior to making comparisons.
The value of the key parameter should be a function that takes a single argument and returns a key to use for sorting purposes. This technique is fast because the key function is called exactly once for each input record.
So if l is the name of your list then you could use something like :
l.sort(key=lambda s: s.split('_')[1])
More information about key functions is available in the Python documentation.
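Putting the key function and groupby together, a self-contained sketch for the filenames in the question might look like this. The helper name suffix and the variable groups are made up for illustration:

```python
from itertools import groupby

filenames = [
    '1_result1.txt', '2_result2.txt', '3_result2.txt', '4_result3.txt',
    '5_result4.txt', '6_result1.txt', '7_result2.txt', '8_result3.txt',
]

def suffix(name):
    # Part after the first underscore, e.g. 'result1.txt'.
    return name.split('_', 1)[1]

# groupby only merges adjacent items, so sort by the same key first.
groups = [list(group)
          for _, group in groupby(sorted(filenames, key=suffix), key=suffix)]
for group in groups:
    print(group)
```

Because sorted is stable, files with the same suffix keep their original relative order inside each group.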

How to parse Python set-inside-list structure into text?

I have the following strange format I am trying to parse.
The data structure I am trying to parse is a "set" of key-value pairs in a list:
[{'key1:value1', 'key2:value2', 'key3:value3',...}]
That's the only data I have, and it needs to be processed. I don't think this can be described as a Python data structure, but I need to parse this to become a string like
'key1:value1, key2:value2, key3:value3'.
Is this doable?
EDIT: Yes, it is 'key:value', not 'key':'value'
Also, this is Python3.x
Iterating over .items() and formatting differently than previous answers.
If your data is the following: list of dict objects then
>>> data = [{'key1':'value1', 'key2':'value2', 'key3':'value3'}]
>>> ', '.join('{0}:{1}'.format(*item) for my_dict in data for item in my_dict.items())
'key2:value2, key3:value3, key1:value1'
If your data is a list of set objects, then the approach is simpler
>>> from itertools import chain
>>> data = [{'key1:value1', 'key2:value2', 'key3:value3'}]
>>> ', '.join(chain.from_iterable(data))
'key1:value1, key2:value2, key3:value3'
UPD
NOTE: order can be changed, because set and dict objects are not ordered.
', '.join('{0}:{1}'.format(key, value) for my_dict in my_list for key, value in my_dict.items())
where my_list is name of your list variable
Since your structure (let's call it myStruct) is a set rather than a dict, the following code should do what you want:
result = ", ".join([x for x in myStruct[0]])
Beware, a set is not ordered, so you might end up with something like 'key2:value2, key1:value1, key3:value3'.
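If a predictable order matters, one option is to sort the set's elements before joining. This is a sketch that assumes plain alphabetical order of the 'key:value' strings is acceptable; data and text are illustrative names:

```python
data = [{'key1:value1', 'key2:value2', 'key3:value3'}]

# sorted() returns a list, so the joined string is the same on every run.
text = ', '.join(sorted(data[0]))
print(text)
```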
I would iterate over the dictionary's items:
d = {'key1':'value1', 'key2':'value2', 'key3':'value3'}
for k, v in d.items():
    print(k + ':' + v + ',', end=' ')
So, you have a list whose only element is a dictionary, and you want to get all the keys-value pairs from that dictionary and put them into a string?
If so, try something like this:
d = yourList[0]
s = ""
for key in d.keys():
    s += key + ":" + d[key] + ", "
s = s[:-2] # To trim off the trailing ", " that gets added
Try this,
print(', '.join(i for x in list_of_set for i in x))
If it's a set, there is no issue with the parsing. The code is roughly equivalent to
output = []
for x in list_of_set:
    for i in x:
        output.append(i)
print(', '.join(output))

Correspondence between list indices originated from dictionary

I wrote the below code working with dictionary and list:
d = computeRanks() # dictionary of id : interestRank pairs
lst = list(d.items()) # list of (id, interestRank) tuples
interestingIds = []
for i in range(20): # randomly choose 20 highly ranked ids
    choice = randomWeightedChoice(d.values()) # returns random index from list
    interestingIds.append(lst[choice][0])
There seems to be a possible error, because I'm not sure whether the indices in lst correspond to those in d.values().
Do you know how to write this better?
One of the policies of dict is that the results of dict.keys() and dict.values() will correspond so long as the contents of the dictionary are not modified.
As @Ignacio says, the index choice does correspond to the intended element of lst, so your code's logic is correct. But your code should be much simpler: d already contains IDs for the elements, so rewrite randomWeightedChoice to take a dictionary and return an ID.
Perhaps it will help you to know that you can iterate over a dictionary's key-value pairs with d.items():
for k, v in d.items():
etc.
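One way to sketch the suggested rewrite is with random.choices from the standard library (Python 3.6+), which accepts a weights argument. This is not the asker's randomWeightedChoice; the function name weighted_ids and the sample ranks are made up, and it assumes the ranks are non-negative weights and that picking the same id twice is acceptable, as in the original loop:

```python
import random

def weighted_ids(ranks, count):
    """Pick `count` ids at random, weighted by rank (with replacement)."""
    ids = list(ranks)                      # keys, in dict iteration order
    weights = [ranks[i] for i in ids]      # looked up per key, so always aligned
    return random.choices(ids, weights=weights, k=count)

d = {'id1': 5.0, 'id2': 1.0, 'id3': 0.5}   # made-up id : interestRank pairs
interesting_ids = weighted_ids(d, 20)
print(interesting_ids)
```

Looking the weights up by key avoids relying on any index correspondence between two separate lists.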

Efficient way to either create a list, or append to it if one already exists?

I'm going through a whole bunch of tuples with a many-to-many correlation, and I want to make a dictionary where each b of (a,b) has a list of all the a's that correspond to a b. It seems awkward to test for a list at key b in the dictionary, then look for an a, then append a if it's not already there, every single time through the tuple digesting loop; but I haven't found a better way yet. Does one exist? Is there some other way to do this that's a lot prettier?
See the docs for the setdefault() method:
setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
You can use this as a single call that will get the list at b if it exists, or set b to an empty list if it doesn't already exist - and either way, return that list:
>>> key = 'b'
>>> val = 'a'
>>> d = {}
>>> print(d)
{}
>>> d.setdefault(key, []).append(val)
>>> print(d)
{'b': ['a']}
>>> d.setdefault(key, []).append('zee')
>>> print(d)
{'b': ['a', 'zee']}
Combine this with a simple "not in" check and you've done what you're after in three lines:
>>> val = 'c'
>>> b = d.setdefault('b', [])
>>> if val not in b:
...     b.append(val)
...
>>> print(d)
{'b': ['a', 'zee', 'c']}
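Applied to the loop from the question, the same pattern might look like the following sketch. The pairs list is made-up sample data and members is an illustrative name:

```python
pairs = [('a', 'b'), ('zee', 'b'), ('a', 'b'), ('c', 'x')]  # made-up (a, b) tuples

d = {}
for a, b in pairs:
    members = d.setdefault(b, [])   # existing list, or a new empty one stored under b
    if a not in members:            # skip duplicates
        members.append(a)
print(d)
```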
Assuming you're not really tied to lists, defaultdict and set are quite handy.
import collections
d = collections.defaultdict(set)
for a, b in mappings:
    d[b].add(a)
If you really want lists instead of sets, you could follow this with a
for k, v in d.items():
    d[k] = list(v)
And if you really want a dict instead of a defaultdict, you can say
d = dict(d)
I don't really see any reason you'd want to, though.
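A runnable sketch of this approach, with made-up sample data in mappings. Converting through sorted() instead of list() is a choice made here only so the resulting lists have a predictable order:

```python
import collections

mappings = [('a', 1), ('b', 1), ('a', 1), ('a', 2)]  # made-up (a, b) tuples

d = collections.defaultdict(set)
for a, b in mappings:
    d[b].add(a)          # the set silently ignores the repeated ('a', 1)

# Plain dict of lists; sorted() gives each list a deterministic order.
as_lists = {b: sorted(members) for b, members in d.items()}
print(as_lists)
```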
Use collections.defaultdict
from collections import defaultdict
your_dict = defaultdict(list)
for (a,b) in your_list:
    your_dict[b].append(a)
You can sort your tuples in O(n log n) and then create your dictionary in O(n),
or, more simply, do it in O(n), though this could impose a heavy load on memory in case of many tuples:
your_dict = {}
for (a,b) in your_list:
    if b in your_dict:
        your_dict[b].append(a)
    else:
        your_dict[b]=[a]
Hmm it's pretty much the same as you've described. What's awkward about that?
You could also consider using an sql database to do the dirty work.
Instead of using an if, AFAIK it is more pythonic to use a try block instead.
your_list=[('a',1),('a',3),('b',1),('f',1),('a',2),('z',1)]
your_dict={}
for (a,b) in your_list:
    try:
        your_dict[b].append(a)
    except KeyError:
        your_dict[b]=[a]
print(your_dict)
I am not sure how you will get out of the key test, but once the key/value pair has been initialized it is easy :)
d = {}
if 'b' not in d:
    d['b'] = set()
d['b'].add('a')
The set will ensure that only 1 of 'a' is in the collection. You need to do the initial 'b' check though to make sure the key/value exist.
Dict get method?
It returns the value of my_dict[some_key] if some_key is in the dictionary, and if not - returns some default value ([] in the example below):
my_dict[some_key] = my_dict.get(some_key, []) + [something_else]
There's another way that's rather efficient (though maybe not as efficient as sets) and simple. It's similar in practice to defaultdict but does not require an additional import.
Granted that you have a dict with empty (None) keys, it means you also create the dict keys somewhere. The dict.fromkeys method can set a default value for all keys, but with a mutable default such as [] every key would share the same list object, so a dict comprehension is the safer way to give each key its own list.
keylist = ['key1', 'key2']
result = {key: [] for key in keylist}
where result will be:
{'key1': [], 'key2': []}
Then you can do your loop and use result['key1'].append(..) directly
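One detail worth verifying before relying on dict.fromkeys with a mutable default: the single default object is reused for every key. The sketch below contrasts that with a dict comprehension; shared and separate are illustrative names:

```python
keylist = ['key1', 'key2']

shared = dict.fromkeys(keylist, [])       # one list object reused for every key
shared['key1'].append('x')
print(shared)                             # both keys show the appended value

separate = {key: [] for key in keylist}   # a fresh list per key
separate['key1'].append('x')
print(separate)
```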
