Union Operators to Dict - python

How can I convert
x = 'a=b&c=d'
to
{'a':'b', 'c':'d'}
without explicitly looking and replacing & and = to , and : respectively?

You have a DSL (domain specific language) for defining a dict. You have to parse the string, then evaluate the result. Luckily, that's as simple as
d = dict(kv.split("=", 1) for kv in x.split("&"))

I might be misinterpreting the meaning of "explicitly looking and replacing". I would argue you can't do it without doing so.
But I also see no reason to avoid that.
x = 'a=b&c=d'
mydict = dict(map(lambda y: y.split('='), x.split('&')))
What this does is best understood reading from the inside out.
It takes the string 'a=b&c=d' and turns it into a list of substrings that used to have an ampersand in between them.
It takes one such 'a=b' string and splits again, this time on the equal sign.
This operation is performed on the whole list, so we end up with a list of lists.
[['a', 'b'], ['c', 'd']]
dict() turns that into a dictionary
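A minimal sketch of the split-based approach wrapped in a reusable function. The name parse_pairs and its parameters are hypothetical and do not come from the answers above. It splits each pair at most once, so a value that itself contains '=' stays intact. For real URL query strings, the standard library's urllib.parse.parse_qsl covers the same ground and also decodes percent-escapes.

```python
from urllib.parse import parse_qsl


def parse_pairs(text, pair_sep="&", kv_sep="="):
    """Turn 'a=b&c=d' into {'a': 'b', 'c': 'd'} (hypothetical helper)."""
    # Split into pairs first, then split each pair once on the key/value separator.
    return dict(pair.split(kv_sep, 1) for pair in text.split(pair_sep))


x = 'a=b&c=d'
print(parse_pairs(x))         # {'a': 'b', 'c': 'd'}
print(parse_pairs('k=v=w'))   # {'k': 'v=w'}
print(dict(parse_qsl(x)))     # {'a': 'b', 'c': 'd'}
```

Note that a pair with no '=' at all makes dict() raise ValueError in this sketch, so malformed input needs its own handling.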

>>> x
'a=b&c=d'
>>> mydict[x[0]]=x[2]
>>> mydict[x[4]]=x[6]
>>> mydict
{'a': 'b', 'c': 'd'}
>>> #note that '&' and '=' are not "looked for" nor replaced

Related

How to arrange the output of set based on predefined list

list1=['f','l','a','m','e','s'] #This is the predefined list
list2=['e','e','f','a','s','a'] #This is the list with repetition
x=list(set(list2)) # I want to remove duplicates
print(x)
Here I want the variable x to retain the order which list1 has. For example, if at one instance set(list2) produces the output as ['e','f','a','s'], I want it to produce ['f','a','e','s'] (Just by following the order of list1).
Can anyone help me with this?
Construct a dictionary that maps characters to their position in list1. Use its get method as the sort-key.
>>> dict1 = dict(zip(list1, range(len(list1))))
>>> sorted(set(list2), key=dict1.get)
['f', 'a', 'e', 's']
This is one way using dictionary:
list1=['f','l','a','m','e','s'] #This is the predefined list
list2=['e','e','f','a','s','a'] #This is the list with repetition
x=list(set(list2)) # I want to remove duplicates
d = {key:value for value, key in enumerate(list1)}
x.sort(key=d.get)
print(x)
# ['f', 'a', 'e', 's']
Method index from the list class can do the job:
sorted(set(list2), key=list1.index)
What is best usually depends on actual use. With this problem it is important to know the expected sizes of the lists to choose the most efficient approach. If most of the elements of list1 are being kept, the following works well and has the additional benefit of being easy to read.
set2 = set(list2)
x = [i for i in list1 if i in set2]
It would also work without turning list2 into a set first. However, this would run much slower with a large list2.
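The two main approaches from the answers can be sketched side by side as follows. The function names order_like and order_like_sorted are hypothetical. They agree as long as every value appears in the reference list and the reference list has no duplicates, which is an assumption of this sketch.

```python
list1 = ['f', 'l', 'a', 'm', 'e', 's']   # reference order
list2 = ['e', 'e', 'f', 'a', 's', 'a']   # values with repetition


def order_like(reference, values):
    """Return the distinct values, ordered as they appear in reference."""
    wanted = set(values)                  # O(1) membership tests
    return [item for item in reference if item in wanted]


def order_like_sorted(reference, values):
    """Same result via sorting on each value's position in reference."""
    position = {item: index for index, item in enumerate(reference)}
    return sorted(set(values), key=position.__getitem__)


print(order_like(list1, list2))          # ['f', 'a', 'e', 's']
print(order_like_sorted(list1, list2))   # ['f', 'a', 'e', 's']
```

The two differ on values missing from the reference list: order_like silently drops them, while order_like_sorted raises KeyError. Using dict.get as the key instead would return None for such values, and on Python 3 the sort would then fail with TypeError when comparing None to an int.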

Conversion of Strings to list

Can someone please help me with some simple code that converts a string representation of a list back into a list? The string is NOT like this:
a = u"['a','b','c']"
but the variable is like this:
a = '[a,b,c]'
So,
list(a)
would yield the following output
['[', 'a', ',', 'b', ',', 'c', ']']
Instead, I want the output to be like this:
['a', 'b', 'c']
I have even tried using the ast.literal_eval() function - on using which I got a ValueError exception stating the argument is a malformed string.
There is no standard library that'll load such a list. But you can trivially do this with string processing:
a.strip('[]').split(',')
would give you your list.
str.strip() will remove any of the given characters from the start and end; so it'll remove any and all [ and ] characters from the start until no such characters are found anymore, then remove the same characters from the end. That suffices nicely for your input sample.
str.split() then splits the remainder (minus the [ and ] characters at either end) into separate strings at any point there is a comma:
>>> a = '[a,b,c]'
>>> a.strip('[]')
'a,b,c'
>>> a.strip('[]').split(',')
['a', 'b', 'c']
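A slightly more defensive sketch of the same strip-and-split idea, in case the input has spaces around the items or is an empty list. The helper name parse_bracket_list is hypothetical, and it assumes the items themselves never contain commas or brackets.

```python
def parse_bracket_list(text):
    """Parse '[a, b, c]' into ['a', 'b', 'c'] (hypothetical helper)."""
    inner = text.strip().strip('[]')
    if not inner.strip():
        return []                      # '[]' has no items
    # Trim whitespace around each item so '[a, b]' and '[a,b]' agree.
    return [item.strip() for item in inner.split(',')]


print(parse_bracket_list('[a,b,c]'))       # ['a', 'b', 'c']
print(parse_bracket_list('[ a, b ,c ]'))   # ['a', 'b', 'c']
print(parse_bracket_list('[]'))            # []
```

The explicit empty check is there because ''.split(',') returns [''] rather than an empty list.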
Let us use hack.
import string
x = "[a,b,c]"
for char in x:
    if char in string.ascii_lowercase:
        x = x.replace(char, "'%s'" % char)
# Now x is "['a','b','c']"
lst = eval(x)
This checks whether each character is a lowercase letter; if it is, it replaces it with the same character wrapped in single quotes.
Why not use this solution ?:
Fails for duplicate elements
Fails for elements with more than single characters.
You need to be careful about confusing single quote and double quotes
Why use this solution ?:
There are no reasons to use this solution rather than Martijn's. But it was fun coding it anyway.
I wish you luck in your problem.

Sort strings based on the number of distinct characters

I am confused why the code below, which is looking to sort strings based on their number of distinct alphabets, requires the set() and list() portions.
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']
strings.sort(key = lambda x: len(set(list(x))))
print(strings)
Thanks
The key part of that code is the set() function, because it returns a set containing each element only once. For example:
set('foo') -> {'f', 'o'}
set('aaaa') -> {'a'}
set('abab') -> {'a', 'b'}
Then, to sort by the number of distinct letters, len() is applied to that set.
Nice question! Let's peel the layers off the sort() call.
According to the Python docs on sort and sorted,
key specifies a function of one argument that is used to extract a comparison key from each list element: key=str.lower. The default value is None (compare the elements directly).
That is, sort takes a keyword argument key and expects it to be a function. Specifically, it wants a key(x) function that will be used to generate a key value for each string in strings list, instead of the usual lexical ordering. In the Python shell:
>>> key = lambda x: len(set(list(x)))
>>> ordering = [key(x) for x in strings]
>>> ordering
[2, 4, 3, 1, 2]
This could be any ordering scheme you like. Here, we want to order by the number of unique letters. That's where set and list come in. list("foo") will result in ['f', 'o', 'o']. Then we get len(list('foo')) == 3 -- the length of the word. Not the number of unique characters.
>>> key2 = lambda x: len(list(x))
>>> ordering2 = [key2(x) for x in strings]
>>> ordering2
[3, 4, 3, 4, 4]
So we use set and list to get a set of characters. A set is like a list, except that it only keeps the unique elements. For instance, we can make a list of characters for any word like this:
>>> list(strings[0])
['f', 'o', 'o']
And a set:
>>> set(list(strings[0]))
set(['o', 'f'])
The len() of that set is 2, so when sort compares the "foo" in strings[0] to the other strings[x] in strings, it uses this value. For example:
>>> (len(set(strings[0][:])) < len(set(strings[1][:])))
True
Which gives us the ordering we want.
EDIT: @PeterGibson pointed out above that list(strings[i]) isn't needed. This is true because strings are iterable in Python, just like lists:
>>> set("foo")
set(['o', 'f'])
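Putting the pieces of this answer together, here is a short sketch with the list() call dropped. The helper name distinct_count is hypothetical. Because Python's sort is stable, strings with the same number of distinct characters keep their original relative order.

```python
strings = ['foo', 'card', 'bar', 'aaaa', 'abab']


def distinct_count(word):
    """Number of distinct characters in word."""
    return len(set(word))      # set() accepts any iterable, including str


print([distinct_count(s) for s in strings])   # [2, 4, 3, 1, 2]
print(sorted(strings, key=distinct_count))
# ['aaaa', 'foo', 'abab', 'bar', 'card']
```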

Python slice first and last element in list

Is there a way to slice only the first and last item in a list?
For example; If this is my list:
>>> some_list
['1', 'B', '3', 'D', '5', 'F']
I want to do this (obviously [0,-1] is not valid syntax):
>>> first_item, last_item = some_list[0,-1]
>>> print first_item
'1'
>>> print last_item
'F'
Some things I have tried:
In [3]: some_list[::-1]
Out[3]: ['F', '5', 'D', '3', 'B', '1']
In [4]: some_list[-1:1:-1]
Out[4]: ['F', '5', 'D', '3']
In [5]: some_list[0:-1:-1]
Out[5]: []
...
One way:
some_list[::len(some_list)-1]
A better way (Doesn't use slicing, but is easier to read):
[some_list[0], some_list[-1]]
A Python 3 only answer (it doesn't use slicing or throw away the rest of the list, but might be good enough anyway) is to use extended iterable unpacking to get first and last separate from the middle:
first, *_, last = some_list
The choice of _ as the catchall for the "rest" of the arguments is arbitrary; they'll be stored in the name _ which is often used as a stand-in for "stuff I don't care about".
Unlike many other solutions, this one will ensure there are at least two elements in the sequence; if there is only one (so first and last would be identical), it will raise an exception (ValueError).
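A small sketch of how the star-unpacking form behaves for different lengths. The wrapper name unpack_ends is hypothetical and is only there to make the behaviour easy to try out.

```python
def unpack_ends(seq):
    """Return (first, last) via star unpacking; needs at least two items."""
    first, *_, last = seq
    return first, last


print(unpack_ends(['1', 'B', '3', 'D', '5', 'F']))   # ('1', 'F')
print(unpack_ends(['x', 'y']))                        # ('x', 'y')

try:
    unpack_ends(['only'])          # a single item cannot fill both names
except ValueError as exc:
    print('ValueError:', exc)
```

The same unpacking works on any iterable, not only lists, though the middle items are still collected into a temporary list.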
Just thought I'd show how to do this with numpy's fancy indexing:
>>> import numpy
>>> some_list = ['1', 'B', '3', 'D', '5', 'F']
>>> numpy.array(some_list)[[0,-1]]
array(['1', 'F'],
      dtype='|S1')
Note that it also supports arbitrary index locations, which the [::len(some_list)-1] method would not work for:
>>> numpy.array(some_list)[[0,2,-1]]
array(['1', '3', 'F'],
      dtype='|S1')
As DSM points out, you can do something similar with itemgetter:
>>> import operator
>>> operator.itemgetter(0, 2, -1)(some_list)
('1', '3', 'F')
first, last = some_list[0], some_list[-1]
Some people are answering the wrong question, it seems. You said you want to do:
>>> first_item, last_item = some_list[0,-1]
>>> print first_item
'1'
>>> print last_item
'F'
I.e., you want to extract the first and last elements each into separate variables.
In this case, the answers by Matthew Adams, pemistahl, and katrielalex are valid. This is just a compound assignment:
first_item, last_item = some_list[0], some_list[-1]
But later you state a complication: "I am splitting it in the same line, and that would have to spend time splitting it twice:"
x, y = a.split("-")[0], a.split("-")[-1]
So in order to avoid two split() calls, you must only operate on the list which results from splitting once.
In this case, attempting to do too much in one line is a detriment to clarity and simplicity. Use a variable to hold the split result:
lst = a.split("-")
first_item, last_item = lst[0], lst[-1]
Other responses answered the question of "how to get a new list, consisting of the first and last elements of a list?" They were probably inspired by your title, which mentions slicing, which you actually don't want, according to a careful reading of your question.
AFAIK there are 3 ways to get a new list with the 0th and last elements of a list:
>>> s = 'Python ver. 3.4'
>>> a = s.split()
>>> a
['Python', 'ver.', '3.4']
>>> [ a[0], a[-1] ] # mentioned above
['Python', '3.4']
>>> a[::len(a)-1] # also mentioned above
['Python', '3.4']
>>> [ a[e] for e in (0,-1) ] # list comprehension, nobody mentioned?
['Python', '3.4']
# Or, if you insist on doing it in one line:
>>> [ s.split()[e] for e in (0,-1) ]
['Python', '3.4']
The advantage of the list comprehension approach, is that the set of indices in the tuple can be arbitrary and programmatically generated.
What about this?
>>> first_element, last_element = some_list[0], some_list[-1]
You can do it like this:
some_list[0::len(some_list)-1]
You can use something like
y[::max(1, len(y)-1)]
if you really want to use slicing. The advantage of this is that it cannot give index errors and works with length 1 or 0 lists as well.
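To see why the max() guard matters, here is a quick sketch over short inputs. The helper name ends is hypothetical. Without the guard, a one-element list gives a step of 0, which slicing rejects.

```python
def ends(seq):
    """First and last items as a slice; hypothetical helper name."""
    # max(1, ...) keeps the step positive for sequences of length 0 or 1.
    return seq[::max(1, len(seq) - 1)]


for sample in ([], ['a'], ['a', 'b'], ['a', 'b', 'c', 'd']):
    print(sample, '->', ends(sample))
# [] -> []
# ['a'] -> ['a']
# ['a', 'b'] -> ['a', 'b']
# ['a', 'b', 'c', 'd'] -> ['a', 'd']

try:
    ['a'][::len(['a']) - 1]      # step is 0 without the max() guard
except ValueError as exc:
    print('ValueError:', exc)
```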
Actually, I just figured it out:
In [20]: some_list[::len(some_list) - 1]
Out[20]: ['1', 'F']
This isn't a "slice", but it is a general solution that doesn't use explicit indexing, and works for the scenario where the sequence in question is anonymous (so you can create and "slice" on the same line, without creating twice and indexing twice): operator.itemgetter
import operator
# Done once and reused
first_and_last = operator.itemgetter(0, -1)
...
first, last = first_and_last(some_list)
You could just inline it as (after from operator import itemgetter for brevity at time of use):
first, last = itemgetter(0, -1)(some_list)
but if you'll be reusing the getter a lot, you can save the work of recreating it (and give it a useful, self-documenting name) by creating it once ahead of time.
Thus, for your specific use case, you can replace:
x, y = a.split("-")[0], a.split("-")[-1]
with:
x, y = itemgetter(0, -1)(a.split("-"))
and split only once without storing the complete list in a persistent name for len checking or double-indexing or the like.
Note that itemgetter for multiple items returns a tuple, not a list, so if you're not just unpacking it to specific names, and need a true list, you'd have to wrap the call in the list constructor.
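A short runnable sketch of the itemgetter approach, including the tuple-versus-list point. The date-like string is only illustrative sample data.

```python
from operator import itemgetter

first_and_last = itemgetter(0, -1)     # build once, reuse

a = "2024-01-15"
year, day = first_and_last(a.split("-"))   # split() runs only once
print(year, day)                            # 2024 15

print(first_and_last("abc"))           # ('a', 'c'), a tuple
print(list(first_and_last("abc")))     # ['a', 'c']

# With a single index, itemgetter returns the bare item, not a 1-tuple.
print(itemgetter(0)("abc"))            # a
```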
How about this?
some_list[:1] + some_list[-1:]
Result: ['1', 'F']
More General Case: Return N points from each end of list
The answers work for the specific first and last, but some, like myself, may be looking for a solution that can be applied to a more general case in which you can return the top N points from either side of the list (say you have a sorted list and only want the 5 highest or lowest). I came up with the following solution:
In [1]
def GetWings(inlist,winglen):
    if len(inlist)<=winglen*2:
        outlist=inlist
    else:
        outlist=list(inlist[:winglen])
        outlist.extend(list(inlist[-winglen:]))
    return outlist
and an example to return bottom and top 3 numbers from list 1-10:
In [2]
GetWings([1,2,3,4,5,6,7,8,9,10],3)
#Out[2]
#[1, 2, 3, 8, 9, 10]
Fun new approach to "one-lining" the case of an anonymously split thing such that you don't split it twice, but do all the work in one line is using the walrus operator, :=, to perform assignment as an expression, allowing both:
first, last = (split_str := a.split("-"))[0], split_str[-1]
and:
first, last = (split_str := a.split("-"))[::len(split_str)-1]
Mind you, in both cases it's essentially exactly equivalent to doing on one line:
split_str = a.split("-")
then following up with one of:
first, last = split_str[0], split_str[-1]
first, last = split_str[::len(split_str)-1]
including the fact that split_str persists beyond the line it was used and accessed on. It's just technically meeting the requirements of one-lining, while being fairly ugly. I'd never recommend it over unpacking or itemgetter solutions, even if one-lining was mandatory (ruling out the non-walrus versions that explicitly index or slice a named variable and must refer to said named variable twice).

Filtering lists

I want to filter repeated elements in my list
for instance
foo = ['a','b','c','a','b','d','a','d']
I am only interested with:
['a','b','c','d']
What would be an efficient way to achieve this?
Cheers
list(set(foo)) if you are using Python 2.5 or greater, but that doesn't maintain order.
Cast foo to a set, if you don't care about element order.
Since there isn't an order-preserving answer with a list comprehension, I propose the following:
>>> temp = set()
>>> [c for c in foo if c not in temp and (temp.add(c) or True)]
['a', 'b', 'c', 'd']
which could also be written as
>>> temp = set()
>>> filter(lambda c: c not in temp and (temp.add(c) or True), foo)
['a', 'b', 'c', 'd']
Depending on how many elements are in foo, you might have faster results through repeated hash lookups instead of repeated iterative searches through a temporary list.
c not in temp verifies that temp does not have an item c; and the or True part forces c to be emitted to the output list when the item is added to the set.
>>> bar = []
>>> for i in foo:
...     if i not in bar:
...         bar.append(i)
...
>>> bar
['a', 'b', 'c', 'd']
This would be the most straightforward way of removing duplicates from the list while preserving the order as much as possible (even though "order" here is an inherently wrong concept).
If you care about order a readable way is the following
def filter_unique(a_list):
    characters = set()
    result = []
    for c in a_list:
        if c not in characters:
            characters.add(c)
            result.append(c)
    return result
Depending on your requirements for speed, maintainability, and space consumption, you could find the above unfitting. In that case, specify your requirements and we can try to do better :-)
If you write a function to do this, I would use a generator; it just wants to be used in this case.
def unique(iterable):
    yielded = set()
    for item in iterable:
        if item not in yielded:
            yield item
            yielded.add(item)
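A usage sketch for the generator approach, restated here so the block runs on its own. It also shows dict.fromkeys, a different technique that is not mentioned in the answers: on Python 3.7+ dicts preserve insertion order, so it gives an order-preserving de-duplication for hashable items.

```python
from itertools import islice


def unique(iterable):
    """Yield each item the first time it is seen, preserving order."""
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item


foo = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']
print(list(unique(foo)))               # ['a', 'b', 'c', 'd']

# Being lazy, it can stop early without scanning the whole input.
print(list(islice(unique(foo), 2)))    # ['a', 'b']

# For hashable items on Python 3.7+, dict keys keep insertion order too.
print(list(dict.fromkeys(foo)))        # ['a', 'b', 'c', 'd']
```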
Inspired by Francesco's answer, rather than making our own filter()-type function, let's make the builtin do some work for us:
def unique(a, s=set()):
    if a not in s:
        s.add(a)
        return True
    return False
Usage:
uniq = filter(unique, orig)
This may or may not perform faster or slower than an answer that implements all of the work in pure Python. Benchmark and see. Of course, this only works once, but it demonstrates the concept. The ideal solution is, of course, to use a class:
class Unique(set):
    def __call__(self, a):
        if a not in self:
            self.add(a)
            return True
        return False
Now we can use it as much as we want:
uniq = filter(Unique(), orig)
Once again, we may (or may not) have thrown performance out the window - the gains of using a built-in function may be offset by the overhead of a class. I just thought it was an interesting idea.
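A runnable sketch of the class-based predicate, assuming Python 3, where filter() returns a lazy iterator rather than a list. It also illustrates the "only works once" caveat: the state lives in the instance, so reusing one instance filters everything out the second time.

```python
class Unique(set):
    """A set that doubles as a 'first time seen?' predicate."""

    def __call__(self, item):
        if item in self:
            return False
        self.add(item)
        return True


orig = ['a', 'b', 'c', 'a', 'b', 'd', 'a', 'd']

# On Python 3, filter() returns a lazy iterator, so wrap it in list().
print(list(filter(Unique(), orig)))    # ['a', 'b', 'c', 'd']

# A fresh Unique() starts empty, so each call can get its own predicate.
print(list(filter(Unique(), orig)))    # ['a', 'b', 'c', 'd']

# Reusing one instance carries its state over: nothing is new the second time.
seen = Unique()
print(list(filter(seen, orig)))        # ['a', 'b', 'c', 'd']
print(list(filter(seen, orig)))        # []
```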
This is what you want if you need a sorted list at the end:
>>> foo = ['a','b','c','a','b','d','a','d']
>>> bar = sorted(set(foo))
>>> bar
['a', 'b', 'c', 'd']
import numpy as np
np.unique(foo)
You could do a sort of ugly list comprehension hack.
[l[i] for i in range(len(l)) if l.index(l[i]) == i]
