I found this out by accident:
>>> l = []
>>> l + 'a'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can only concatenate list (not "str") to list
>>> l += 'a'
>>> l
['a']
>>> l + 'abcd'
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can only concatenate list (not "str") to list
>>> l += 'abcd'
>>> l
['a', 'a', 'b', 'c', 'd']
Is this expected behaviour? I can't find an explanation for this anywhere, and it seems really weird to me.
Now testing further...
>>> l += 0
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
This led me to believe that str only works because it is an iterable, so I tried:
>>> class Test:
...     l = [1, 2, 3]
...
...     def __iter__(self):
...         for i in self.l:
...             yield i
...
>>> l += Test()
>>> l
['a', 'a', 'b', 'c', 'd', 1, 2, 3]
>>> l + Test()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: can only concatenate list (not "Test") to list
This seems pretty weird and not very Pythonic. Could it be a bug? Any thoughts?
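For what it's worth, the behaviour above is consistent with how the two operators are defined for lists: + goes through list.__add__, which only accepts another list, while += goes through list.__iadd__, which behaves like list.extend and accepts any iterable. A minimal sketch for Python 3 (the names items and Numbers are made up for illustration):

```python
class Numbers:
    """Hypothetical iterable, standing in for the Test class above."""

    def __iter__(self):
        return iter([1, 2, 3])


items = []
items += "ab"          # in-place add: extends with each character
items.extend("cd")     # extend accepts any iterable, just like +=
items += Numbers()     # any object with __iter__ works

try:
    items + "ef"       # plain + requires another list
    plus_failed = False
except TypeError:
    plus_failed = True

# Converting explicitly makes + work as well.
combined = items + list("ef")
print(items)
print(combined)
```

So += mutating the list with any iterable is by design rather than a bug; only the non-mutating + is strict about its operand type.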
I would like to create some basic statistics for several lists of data and store them in a dictionary:
>>> from statistics import mean,median
>>> a,b,c=[1,2,3],[4,5,6],[7,8,9]
The following list comprehension works and outputs stats for "a":
>>> [eval("{}({})".format(op,a)) for op in ['mean','median','min','max']]
[2, 2, 1, 3]
Assigning the list's variable name (a) to another object (dta) and evaluating "dta" in a list comprehension also works:
>>> dta="a"
>>> [eval("{}({})".format(op,eval("dta"))) for op in ['mean','median','min','max']]
[2, 2, 1, 3]
But when I try to tie this all together in a dictionary comprehension, it does not work:
>>> {k:[eval("{}({})".format(op,eval("k"))) for op in ['mean','median','min','max']] for k in ['a','b','c']}
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <dictcomp>
File "<stdin>", line 1, in <listcomp>
File "<string>", line 1, in <module>
NameError: name 'k' is not defined
My guess is that the eval is processed before the comprehension, which is why 'k' is not yet defined. Any suggestions for how to get this to work, or for a different routine that would produce the same output?
Do not quote the k in the inner eval:
{k:[eval("{}({})".format(op,eval(k))) for op in ['mean','median','min','max']] for k in ['a','b','c']}
^
Or drop eval altogether:
[[mean(k), median(k), min(k), max(k)] for k in [a, b, c]]
With a simple workaround for the keys, you can turn this into a dictionary comprehension.
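A sketch of that workaround, assuming the three lists from the question: pair each name with its list explicitly, so that neither eval nor a lookup by variable name is needed. The names datasets, ops and stats are illustrative.

```python
from statistics import mean, median

a, b, c = [1, 2, 3], [4, 5, 6], [7, 8, 9]

# Map each label to its data explicitly instead of eval-ing the name.
datasets = {"a": a, "b": b, "c": c}
ops = [mean, median, min, max]

stats = {name: [op(values) for op in ops] for name, values in datasets.items()}
print(stats)
```

Passing the functions themselves instead of their names as strings also removes the need for the outer eval.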
Try removing the quotation marks around k in your call to eval in the format function.
I ran the following commands:
> from statistics import mean,median
> a,b,c=[1,2,3],[4,5,6],[7,8,9]
> {k:[eval("{}({})".format(op,eval(k))) for op in ['mean','median','min','max']] for k in ['a','b','c']}
and got the following output:
{'a': [2.0, 2, 1, 3], 'c': [8.0, 8, 7, 9], 'b': [5.0, 5, 4, 6]}
This is my first attempt with Python. I'm trying to use Python with Apache Spark.
This is what I want to do:
l = sc.textFile("/user/cloudera/dataset.txt")
l = l.map(lambda x: map(int, x))
Then I use the cartesian function to obtain all possible combinations of elements:
lc = l.cartesian(l)
Now, for every pair, I apply a function:
output = lc.map(lambda x: str(x[0]) + ";" + str(x[1]) + ";" + str(cosineSim(x[0], x[1])))
My objective is to obtain strings like:
element1; element1; similarity
element1; element2; similarity
...
and so on.
When I call output.first(), this is my output:
[45, 12, 7, 2, 2, 2, 2, 4, 7];[45, 12, 7, 2, 2, 2, 2, 4, 7];1.0
This is a string; indeed, if I do:
s = output.first()
type(s)
<type 'str'>
But if I execute output.collect() or output.saveAsTextFile(path), I get this error:
15/02/13 06:06:18 WARN TaskSetManager: Lost task 1.0 in stage 61.0 (TID 183, 10.39.127.148): org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "/usr/lib/spark/python/pyspark/worker.py", line 107, in main
process()
File "/usr/lib/spark/python/pyspark/worker.py", line 98, in process
serializer.dump_stream(func(split_index, iterator), outfile)
File "/usr/lib/spark/python/pyspark/serializers.py", line 227, in dump_stream
vs = list(itertools.islice(iterator, batch))
File "<stdin>", line 2, in <lambda>
ValueError: invalid literal for int() with base 10: ''
What's wrong?
I think the error must be in this line:
l = l.map(lambda x: map(int, x))
Can you check that the l RDD always has values (no '')? If it doesn't, you get a typical Python error:
In [32]: int('')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
in ()
----> 1 int('')
ValueError: invalid literal for int() with base 10: ''
Going forward, keep in mind that map is lazily evaluated: it is not computed until the next action is invoked (collect and saveAsTextFile are actions). That is why output.first() can succeed while collect() fails later on a different record.
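To make both points concrete without a Spark cluster, here is a plain-Python sketch. It assumes the lines hold whitespace-separated integers, which the question does not actually state, and parse_line is a hypothetical helper. Python 3's built-in map is lazy too, so it shows the same deferred failure as an RDD map.

```python
def parse_line(line):
    """Parse a line of whitespace-separated integers (assumed format).

    str.split() with no argument drops empty tokens, so blank lines
    yield an empty list instead of raising ValueError.
    """
    return [int(token) for token in line.split()]


lines = ["45 12 7", "", "2 2 4"]

# Built-in map is lazy: nothing is converted, and nothing fails, yet.
lazy = map(int, ["45", "", "7"])
try:
    list(lazy)           # the error only surfaces when consumed
    failed_late = False
except ValueError:
    failed_late = True

parsed = [parse_line(line) for line in lines]
non_empty = [row for row in parsed if row]   # drop blank lines
print(non_empty)
```

With an RDD, a helper like this could be passed to map, followed by a filter that drops the empty rows.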
I started learning Python today and found this very nice code visualization tool, pythontutor.com. The problem is that I still don't quite get some of the syntax in the example code.
def listSum(numbers):
    if not numbers:
        return 0
    else:
        (f, rest) = numbers
        return f + listSum(rest)

myList = (1, (2, (3, None)))
total = listSum(myList)
What does (f, rest) = numbers mean?
It's tuple unpacking.
There need to be exactly 2 items in the tuple when it is used this way. More or fewer will result in an exception, as shown below.
>>> numbers = (1, 2, 3, 4, 5)
>>> (f, rest) = numbers
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: too many values to unpack
>>> numbers = (1, 2)
>>> (f, rest) = numbers
>>> print f
1
>>> print rest
2
>>> numbers = (1)
>>> (f, rest) = numbers
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'int' object is not iterable
>>> numbers = (1,)
>>> (f, rest) = numbers
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: need more than 1 value to unpack
Note that (1) and (1,) are syntactically different; only the latter is a tuple.
See the Python Doc on Tuples and Sequences for more details.
(f, rest) = numbers
unpacks the tuple. That is, it takes the two values stored in numbers and stores them in f and rest, respectively. Note that the number of variables you unpack into must be the same as the number of values in the tuple, or else an exception will be thrown.
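Applied to the nested tuple from the question, each unpacking step peels off one number and leaves the rest of the chain. A small sketch, written for Python 3 (first, remaining and seen are illustrative names):

```python
my_list = (1, (2, (3, None)))

# Each pair is (value, rest-of-list); None marks the end of the chain.
first, rest = my_list
print(first, rest)

seen = []
remaining = my_list
while remaining:                 # None is falsy, so the loop stops there
    value, remaining = remaining
    seen.append(value)

print(seen, sum(seen))
```

The loop does iteratively what listSum does recursively: unpack one pair, keep the value, continue with the rest.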
A tuple is a data structure in which you can store multiple items under one name.
Let's say that we have a tuple (t) with two items.
Then t[0] is the first item and t[1] is the second item.
Another way of accessing the tuple's items is:
(f, rest) = numbers
With this syntax, numbers (a tuple) must have exactly 2 items; otherwise an exception is raised. It is equivalent to:
f = numbers[0]
rest = numbers[1]
I need a dictionary in which several entries may share the same key, and which returns a list of values when such a key is looked up.
For example
print mydict['key']
[1,2,3,4,5,6]
For consistency, you should have the dictionary map keys to lists (or sets) of values, of which some can be empty. There is a nice idiom for this:
from collections import defaultdict
d = defaultdict(set)
d["key"].add(...)
(A defaultdict is like a normal dictionary, but if a key is missing it will call the argument you passed in when you instantiated it and use the result as the default value. So this will automatically create an empty set of values if you ask for a key which isn't already present.)
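A slightly fuller sketch of the idiom, using made-up sample pairs. It uses list rather than set as the factory so that duplicates and insertion order are kept; swap in set (and add) if the values should be unique.

```python
from collections import defaultdict

pairs = [("key", 1), ("key", 2), ("other", 3), ("key", 2)]

d = defaultdict(list)
for k, v in pairs:
    d[k].append(v)       # missing keys start out as an empty list

print(d["key"])
print(dict(d))

# Note: merely reading a missing key also creates it.
_ = d["missing"]
print("missing" in d)
```

The last lines show one thing to watch for: a lookup of an absent key inserts it, so use `in` or `.get()` when you only want to test for presence.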
If you need the object to look more like a dictionary (i.e. to set a value by d["key"] = ...) you can do the following. But this is probably a bad idea, because it goes against normal Python semantics and is likely to come back and bite you later, especially if someone else has to maintain your code.
class Multidict(defaultdict):
    def __init__(self):
        super(Multidict, self).__init__(set)

    def __setitem__(self, key, value):
        if isinstance(value, self.default_factory):  # self.default_factory is `set`
            super(Multidict, self).__setitem__(key, value)
        else:
            self[key].add(value)  # sets have add(), not append()
I haven't tested this.
You can also try paste.util.multidict.MultiDict
$ easy_install Paste
Then:
from paste.util.multidict import MultiDict
d = MultiDict()
d.add('a', 1)
d.add('a', 2)
d.add('b', 3)
d.mixed()
>>> {'a': [1, 2], 'b': 3}
d.getall('a')
>>> [1, 2]
d.getall('b')
>>> [3]
Web frameworks like Pylons are using this library to handle HTTP query string/post data, which can have same-name keys.
You can use:
myDict = {'key': []}
Then during runtime:
if newKey in myDict:
    myDict[newKey].append(value)
else:
    myDict[newKey] = [value]
Edited as per @Ben's comment:
myDict = {}
myDict.setdefault(newKey, []).append(value)
This is an ideal place to use a defaultdict object from the collections library:
from collections import defaultdict
mydict = defaultdict(set)
mydict['key'] |= set([1,2,3,4])
mydict['key'] |= set([4,5,6])
print(mydict['key'])
prints {1, 2, 3, 4, 5, 6} (on Python 3)
If a key is referenced that has not been explicitly assigned, an empty set is returned.
print(mydict['bad_key'])
prints set()
Using setdefault on a dict from the standard library would require a significant change in your syntax when assigning values and can get rather messy. I've never used Multidict, but it also looks like a significant change in the way assignments are made. Using this method, you simply assume that there may already be a value associated with this key in the dictionary and slightly modify your assignment by using the '|=' operator (in-place set union; sets do not support '+=') when assigning key values.
FYI - I am a big fan of using a None default, so that any access to a missing key returns None. This behaves properly in most cases, including iteration and JSON dumps. For your specific need, though, the default should be a set, or a list if you want to allow duplicate values under a key. In fact, any time you have a homogeneous dictionary, the default should be of that type.
mydict = defaultdict(lambda: None)
I'm unsatisfied with all the proposed solutions, so this is my solution. This is for Python 3. Code is below.
EXAMPLES
(code is below)
>>> a = MultiDict({0: [0]})
>>> a
MultiDict({0: [0]})
>>> a[0] = (1, 7)
>>> a
MultiDict({0: [1, 7]})
>>> a.add(0, 2)
>>> a
MultiDict({0: [1, 7, 2]})
>>> a.add(1, 2)
>>> a
MultiDict({0: [1, 7, 2], 1: [2]})
>>> a.getfirst(0)
1
>>> a.getfirst(3)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 61, in getfirst
File "<stdin>", line 17, in __getitem__
KeyError: 3
>>> len(a)
2
>>> tuple(a.items())
((0, [1, 7, 2]), (1, [2]))
>>> tuple(a.values())
([1, 7, 2], [2])
>>> a.get(0)
[1, 7, 2]
>>> tuple(a.multiitems())
((0, 1), (0, 7), (0, 2), (1, 2))
>>> tuple(a.multikeys())
(0, 0, 0, 1)
>>> tuple(a.multivalues())
(1, 7, 2, 2)
>>> a.remove(0, 1)
>>> a
MultiDict({0: [7, 2], 1: [2]})
>>> a.remove(3, 5)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 53, in remove
File "<stdin>", line 17, in __getitem__
KeyError: 3
>>> a.remove(0, 5)
Traceback (most recent call last):
File "<stdin>", line 53, in remove
ValueError: list.remove(x): x not in list
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 56, in remove
ValueError: No element with value 5 for key 0
>>> b = MultiDict({0: [7, 2], 1: [2]})
>>> b == a
True
>>> c = MultiDict(a)
>>> c
MultiDict({0: [7, 2], 1: [2]})
>>> d = MultiDict({0: 0})
Traceback (most recent call last):
File "<stdin>", line 30, in __init__
TypeError: 'int' object is not iterable
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 33, in __init__
TypeError: Values must be iterables, found 'int' for key 0
>>> a.pop(0)
[7, 2]
>>> a
MultiDict({1: [2]})
>>> c.popitem()
(0, [7, 2])
>>> c.setdefault(0, [1])
[1]
>>> c
MultiDict({0: [1], 1: [2]})
>>> c.setdefault(0, [2])
[1]
>>> c
MultiDict({0: [1], 1: [2]})
>>> c.setdefault(3)
[]
>>> c
MultiDict({0: [1], 1: [2], 3: []})
>>> c.getfirst(3)
Traceback (most recent call last):
File "<stdin>", line 61, in getfirst
IndexError: list index out of range
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 63, in getfirst
IndexError: No values in key 3
>>> c.clear()
>>> c
MultiDict({})
>>> c.update(b)
>>> c
MultiDict({0: [7, 2], 1: [2]})
>>> d = c.copy()
>>> d == c
True
>>> id(d) == id(c)
False
>>> MultiDict.fromkeys((0, 1), [5])
MultiDict({0: [5], 1: [5]})
>>> MultiDict.fromkeys((0, 1))
MultiDict({0: [], 1: []})
CODE
try:
    from collections.abc import MutableMapping
except ImportError:  # python < 3.3
    from collections import MutableMapping


class MultiDict(MutableMapping):
    @classmethod
    def fromkeys(cls, seq, value=None, *args, **kwargs):
        if value is None:
            v = []
        else:
            v = value

        return cls(dict.fromkeys(seq, v, *args, **kwargs))

    def __setitem__(self, k, v):
        self._dict[k] = list(v)

    def __getitem__(self, k):
        return self._dict[k]

    def __iter__(self):
        for k in self._dict:
            yield k

    def __init__(self, *args, **kwargs):
        self._dict = dict(*args, **kwargs)

        # Normalize every value to its own list; reject non-iterables.
        for k, v in self._dict.items():
            try:
                self._dict[k] = list(v)
            except TypeError:
                err_str = "Values must be iterables, found '{t}' for key {k}"
                raise TypeError(err_str.format(k=k, t=type(v).__name__))

    def __delitem__(self, k):
        del self._dict[k]

    def __len__(self):
        return len(self._dict)

    def add(self, k, v):
        if k not in self:
            self[k] = []

        self[k].append(v)

    def remove(self, k, v):
        try:
            self[k].remove(v)
        except ValueError:
            err_str = "No element with value {v} for key {k}"
            raise ValueError(err_str.format(v=v, k=k))

    def getfirst(self, k):
        try:
            res = self[k][0]
        except IndexError:
            raise IndexError("No values in key {k}".format(k=k))

        return res

    def multiitems(self):
        for k, v in self.items():
            for vv in v:
                yield (k, vv)

    def multikeys(self):
        for k, v in self.items():
            for vv in v:
                yield k

    def multivalues(self):
        for v in self.values():
            for vv in v:
                yield vv

    def setdefault(self, k, default=None):
        if default is None:
            def_val = []
        else:
            def_val = default

        if k not in self:
            self[k] = def_val

        return self[k]

    def copy(self):
        return self.__class__(self)

    def __repr__(self):
        return (
            self.__class__.__name__ +
            "({{{body}}})".format(body=self._dict)
        )
SOME VERBOSE EXPLANATION
For simplicity, the constructor is the same as dict's. All values passed to the constructor, or assigned directly to a key, must be iterables.
All the values of my MultiDict are lists, even if there is only one value. This is to avoid confusion.
I also added a remove method to delete a single entry from the MultiDict. Furthermore, I added multiitems, which iterates over the (key, value) pairs across all the values of the dictionary. multikeys and multivalues are similar.
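The idea behind multiitems can be shown on a plain dict of lists, independent of the class above. iter_multiitems is a hypothetical stand-alone function, not part of the MultiDict code:

```python
def iter_multiitems(mapping):
    """Yield one (key, value) pair per stored value."""
    for key, values in mapping.items():
        for value in values:
            yield (key, value)


data = {0: [1, 7, 2], 1: [2]}
flat = list(iter_multiitems(data))
print(flat)
```

Keeping only the first or the second element of each pair gives the equivalents of multikeys and multivalues.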
ALTERNATIVES
You can also use the aiohttp, WebOb or Werkzeug implementations of MultiDict.
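from functools import reduce  # a builtin on Python 2, must be imported on Python 3

def toMultiDict(items):
    def insertMulti(d, kv):
        k, v = kv
        d.setdefault(k, []).append(v)
        return d
    # Pass {} as the initial value so that items can be any iterable (e.g. zip)
    return reduce(insertMulti, items, {})
This should create a dict mapping each key to a list of its values: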
In [28]: toMultiDict(zip([1,2,1], [4,5,6]))
Out[28]: {1: [4, 6], 2: [5]}
I couldn't put insertMulti into a lambda, because the lambda needs to return the dict again.
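If the reduce is not essential, a plain loop avoids the need to return the dict from a helper at all. A sketch with a hypothetical name, to_multidict_loop:

```python
def to_multidict_loop(items):
    """Group (key, value) pairs into a dict mapping key -> list of values."""
    result = {}
    for key, value in items:
        result.setdefault(key, []).append(value)
    return result


grouped = to_multidict_loop(zip([1, 2, 1], [4, 5, 6]))
print(grouped)
```

This accepts any iterable of pairs, including a zip object on Python 3.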