Python: How to write this code from the Python shell into a function?

What's happening here is that the first and second elements of every tuple are multiplied, and all of the products are summed at the end. I know how to enter it in the Python shell, but how do I write it as a function? Thanks for the help.
>>> x = [(70.9, 1, 24.8),
...      (15.4, 2, 70.5),
...      (30.0, 3, 34.6),
...      (25.0, 4, 68.4),
...      (45.00, 5, 99.0)]
>>> result = (a[0]*a[1] for a in x)
>>> sum(result)
516.7

Create the function:
def my_func(x):
    result = (a[0]*a[1] for a in x)
    return sum(result)
Call the function:
x = [(70.9, 1, 24.8),
     (15.4, 2, 70.5),
     (30.0, 3, 34.6),
     (25.0, 4, 68.4),
     (45.00, 5, 99.0)]
my_func(x)
The result will be 516.7.
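The same function can also be written with tuple unpacking, which makes the intent a bit more explicit; a minimal sketch (the name `my_func2` is just illustrative):

```python
def my_func2(x):
    # unpack each (first, second, third) tuple; the third value is unused
    return sum(a * b for a, b, _ in x)

x = [(70.9, 1, 24.8), (15.4, 2, 70.5), (30.0, 3, 34.6),
     (25.0, 4, 68.4), (45.00, 5, 99.0)]
print(my_func2(x))  # ≈ 516.7
```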

Using the NumPy package's dot product, we can also achieve this easily:
import numpy as np

x = [(70.9, 1, 24.8), (15.4, 2, 70.5), (30.0, 3, 34.6), (25.0, 4, 68.4), (45.00, 5, 99.0)]

def func(lst):  # renamed from `list` to avoid shadowing the built-in
    numpy_array = np.array(lst)
    mul = np.dot(numpy_array[:, 0], numpy_array[:, 1])
    print(mul)
    return mul

func(x)
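As a sanity check (a minimal sketch, not part of the original answer): for 1-D arrays, the dot product is exactly the element-wise product of the two columns followed by a sum, which is what the generator-expression version computes.

```python
import numpy as np

arr = np.array([(70.9, 1, 24.8), (15.4, 2, 70.5), (30.0, 3, 34.6)])
# dot(a, b) == sum(a * b) for 1-D arrays
assert np.isclose(np.dot(arr[:, 0], arr[:, 1]),
                  (arr[:, 0] * arr[:, 1]).sum())
```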

Related

Python: how to extend "ans = ops[op](*nums)" to work for more than two elements in nums

below is the background context of the question:
....
ops = {'+':add,'-':sub}
op = choice('+-')
nums = [randint(1,10) for i in range(2)]
....
ans = ops[op](*nums)
Can anyone help me out here?
You can use reduce, which accepts a function and an iterable, and applies the function cumulatively to the items of the iterable.
>>> from operator import add, sub
>>> from random import choice, randint
>>> ops = {'+':add, '-':sub}
>>> op = choice('+-')
>>> op
'+'
>>> nums = [randint(1,10) for i in range(10)]
>>> nums
[7, 8, 1, 4, 10, 7, 10, 7, 10, 1]
>>> reduce(ops[op], nums)
65
reduce(add, [1, 2, 3]) => add(add(1, 2), 3)
If you want to use reduce in Python 3.x, import it from functools:
from functools import reduce
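Putting this together in Python 3, with fixed numbers rather than random ones so the results are reproducible (a minimal sketch):

```python
from functools import reduce
from operator import add, sub

ops = {'+': add, '-': sub}
nums = [7, 8, 1, 4]

# reduce folds left-to-right: ((7 + 8) + 1) + 4
assert reduce(ops['+'], nums) == 20
# ((7 - 8) - 1) - 4
assert reduce(ops['-'], nums) == -6
```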

PySpark's reduceByKey not working as expected

I'm writing a large PySpark program and I've recently run into trouble when using reduceByKey on an RDD. I've been able to recreate the problem with a simple test program. The code is:
from pyspark import SparkConf, SparkContext

APP_NAME = 'Test App'

def main(sc):
    test = [(0, [i]) for i in xrange(100)]
    test = sc.parallelize(test)
    test = test.reduceByKey(method)
    print test.collect()

def method(x, y):
    x.append(y[0])
    return x

if __name__ == '__main__':
    # Configure Spark
    conf = SparkConf().setAppName(APP_NAME)
    conf = conf.setMaster('local[*]')
    sc = SparkContext(conf=conf)
    main(sc)
I would expect the output to be (0, [0,1,2,3,4,...,98,99]) based on the Spark documentation. Instead, I get the following output:
[(0, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 24, 36, 48, 60, 72, 84])]
Could someone please help me understand why this output is being generated?
As a side note, when I use
def method(x, y):
    x = x + y
    return x
I get the expected output.
First of all, it looks like you actually want groupByKey, not reduceByKey:
rdd = sc.parallelize([(0, i) for i in xrange(100)])
grouped = rdd.groupByKey()
k, vs = grouped.first()
assert len(list(vs)) == 100
Could someone please help me understand why this output is being generated?
reduceByKey assumes that f is associative, and your method clearly is not. Depending on the order of operations the output differs. Let's say you start with the following data for a certain key:
[1], [2], [3], [4]
Now let's add some parentheses:
((([1], [2]), [3]), [4])
(([1, 2], [3]), [4])
([1, 2, 3], [4])
[1, 2, 3, 4]
and with another set of parentheses:
(([1], ([2], [3])), [4])
(([1], [2, 3]), [4])
([1, 2], [4])
[1, 2, 4]
When you rewrite it as follows:
method = lambda x, y: x + y
or simply
from operator import add
method = add
you get an associative function and it works as expected.
Generally speaking for reduce* operations you want functions which are both associative and commutative.
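The effect is easy to reproduce locally without Spark; a minimal sketch that uses functools.reduce to mimic two different fold orders (the name `bad_method` is illustrative):

```python
from functools import reduce

def bad_method(x, y):  # same shape as the question's method: not associative
    x.append(y[0])
    return x

data = [[1], [2], [3], [4]]

# plain left fold: (([1] . [2]) . [3]) . [4]
left = reduce(bad_method, [list(v) for v in data])

# a different grouping, as Spark might combine partial results per partition
ab = bad_method([1], [2])   # [1, 2]
cd = bad_method([3], [4])   # [3, 4]
other = bad_method(ab, cd)  # [1, 2, 3] -- cd's second element is dropped

assert left == [1, 2, 3, 4]
assert other == [1, 2, 3]   # different grouping, different answer
```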

Python yield a list with generator

I was getting confused by the purpose of "return" and "yield":
def countMoreThanOne():
    return (yy for yy in xrange(1,10,2))

def countMoreThanOne():
    yield (yy for yy in xrange(1,10,2))
What is the difference on the above function?
Is it impossible to access the content inside the function using yield?
In the first you return a generator:
from itertools import chain

def countMoreThanOne():
    return (yy for yy in xrange(1,10,2))

print list(countMoreThanOne())
[1, 3, 5, 7, 9]
while in the second you are yielding a generator, so you get a generator within a generator:
def countMoreThanOne():
    yield (yy for yy in xrange(1,10,2))

print list(countMoreThanOne())
print list(chain.from_iterable(countMoreThanOne()))
[<generator object <genexpr> at 0x7f0fd85c8f00>]
[1, 3, 5, 7, 9]
If you use a list comprehension the difference can be clearly seen. In the first:
def countMoreThanOne():
    return [yy for yy in xrange(1,10,2)]

print countMoreThanOne()
[1, 3, 5, 7, 9]
And in the second:
def countMoreThanOne1():
    yield [yy for yy in xrange(1,10,2)]

print countMoreThanOne1()
<generator object countMoreThanOne1 at 0x7fca33f70eb0>
After reading your other comments I think you should write the function like this:
def countMoreThanOne():
    return xrange(1, 10, 2)
>>> print countMoreThanOne()
xrange(1, 11, 2)
>>> print list(countMoreThanOne())
[1, 3, 5, 7, 9]
or even better, to have some point in making it a function:
def oddNumbersLessThan(stop):
    return xrange(1, stop, 2)
>>> print list(oddNumbersLessThan(15))
[1, 3, 5, 7, 9, 11, 13]
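For completeness, a generator function normally yields each item in turn rather than a whole generator object; a minimal Python 3 sketch (range replaces xrange, and the function name is illustrative):

```python
def count_odds():
    # yield one value per iteration, not a generator object
    for yy in range(1, 10, 2):
        yield yy

assert list(count_odds()) == [1, 3, 5, 7, 9]
```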

Sorting in Sparse Matrix

I have a sparse matrix. I need to sort this matrix row-by-row and create another [sparse] matrix.
Code may explain it better:
# for `rand` function, you need newer version of scipy.
from scipy.sparse import *
m = rand(6,6, density=0.6)
d = m.getrow(0)
print d
Output1
(0, 5) 0.874881629788
(0, 4) 0.352559852239
(0, 2) 0.504791645463
(0, 1) 0.885898140175
I have this m matrix. I want to create a new matrix with sorted version of m. The new matrix
contains 0'th row like this.
new_d = new_m.getrow(0)
print new_d
Output2
(0, 1) 0.885898140175
(0, 5) 0.874881629788
(0, 2) 0.504791645463
(0, 4) 0.352559852239
So I can see which column is bigger, etc.:
print new_d.indices
Output3
array([1, 5, 2, 4])
Of course every row should be sorted like above independently.
I have one solution for this problem but it is not elegant.
If you're willing to ignore the zero-value elements of the matrix, the code below should work. It is also much faster than implementations that use the getrow method, which is rather slow.
from itertools import izip

def sort_coo(m):
    tuples = izip(m.row, m.col, m.data)
    return sorted(tuples, key=lambda x: (x[0], x[2]))
For example:
>>> from numpy.random import rand
>>> from scipy.sparse import coo_matrix
>>>
>>> d = rand(10, 20)
>>> d[d > .05] = 0
>>> s = coo_matrix(d)
>>> sort_coo(s)
[(0, 2, 0.004775589084940246),
(3, 12, 0.029941507166614145),
(5, 19, 0.015030386789436245),
(7, 0, 0.0075044957259399192),
(8, 3, 0.047994403933129481),
(8, 5, 0.049401058471327031),
(9, 15, 0.040011608000125043),
(9, 8, 0.048541825332137023)]
Depending on your needs, you may want to tweak the sort keys in the lambda or further process the output. If you want everything in a row-indexed dictionary you could do:
from collections import defaultdict

sorted_rows = defaultdict(list)
for i in sort_coo(m):
    sorted_rows[i[0]].append((i[1], i[2]))
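A Python 3 sketch of the same grouping, with plain lists standing in for the COO matrix's row, col, and data arrays (zip replaces the Python 2 izip; the sample values are made up):

```python
from collections import defaultdict

# stand-ins for m.row, m.col, m.data of a COO matrix
row = [0, 0, 1, 1]
col = [5, 1, 3, 0]
data = [0.9, 0.2, 0.4, 0.7]

def sort_coo(row, col, data):
    # sort by row index first, then by value within each row
    return sorted(zip(row, col, data), key=lambda t: (t[0], t[2]))

sorted_rows = defaultdict(list)
for r, c, v in sort_coo(row, col, data):
    sorted_rows[r].append((c, v))

assert sorted_rows[0] == [(1, 0.2), (5, 0.9)]
assert sorted_rows[1] == [(3, 0.4), (0, 0.7)]
```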
My bad solution is like this:
from scipy.sparse import coo_matrix
import numpy as np

a = []
for i in xrange(m.shape[0]):  # assume m is a square matrix
    d = m.getrow(i)
    n = len(d.indices)
    s = zip([i]*n, d.indices, d.data)
    sorted_s = sorted(s, key=lambda v: v[2], reverse=True)
    a.extend(sorted_s)
a = np.array(a)
new_m = coo_matrix((a[:,2], (a[:,0], a[:,1])), shape=m.shape)
There can be some simple mistakes above because I have not checked it yet. But the idea is intuitive, I guess. Is there any good solution?
Edit
This new matrix creation may be useless, because if you call the getrow method the order is broken again.
Only coo_matrix.col keeps the order.
Another Solution
This one is not an exact solution, but it may be helpful:
def sortSparseMatrix(m, rev=True, only_indices=True):
    """Sort a sparse matrix and return a column index dictionary."""
    col_dict = dict()
    for i in xrange(m.shape[0]):  # assume m is a square matrix
        d = m.getrow(i)
        s = zip(d.indices, d.data)
        sorted_s = sorted(s, key=lambda v: v[1], reverse=rev)  # was hardcoded reverse=True, leaving rev unused
        if only_indices:
            col_dict[i] = [element[0] for element in sorted_s]
        else:
            col_dict[i] = sorted_s
    return col_dict
>>> print sortSparseMatrix(m)
{0: [5, 1, 0],
1: [1, 3, 5],
2: [1, 2, 3, 4],
3: [1, 5, 2, 4],
4: [0, 3, 5, 1],
5: [3, 4, 2]}

Cycle through list starting at a certain element

Say I have a list:
l = [1, 2, 3, 4]
And I want to cycle through it. Normally, it would go something like this:
1, 2, 3, 4, 1, 2, 3, 4, 1, 2...
I want to be able to start at a certain point in the cycle, not necessarily an index, but perhaps matching an element. Say I wanted to start at whatever element in the list ==4, then the output would be,
4, 1, 2, 3, 4, 1, 2, 3, 4, 1...
How can I accomplish this?
Look at the itertools module. It provides all the necessary functionality.
from itertools import cycle, islice, dropwhile
L = [1, 2, 3, 4]
cycled = cycle(L)  # cycle through the list 'L'
skipped = dropwhile(lambda x: x != 4, cycled) # drop the values until x==4
sliced = islice(skipped, None, 10) # take the first 10 values
result = list(sliced) # create a list from iterator
print(result)
Output:
[4, 1, 2, 3, 4, 1, 2, 3, 4, 1]
Use the arithmetic mod operator. Suppose you're starting from position k, then k should be updated like this:
k = (k + 1) % len(l)
If you want to start from a certain element, not index, you can always look it up like k = l.index(x) where x is the desired item.
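A minimal sketch of this index-based approach (variable names are illustrative):

```python
l = [1, 2, 3, 4]
k = l.index(4)         # start at the element equal to 4
out = []
for _ in range(10):    # take ten values from the cycle
    out.append(l[k])
    k = (k + 1) % len(l)
print(out)  # [4, 1, 2, 3, 4, 1, 2, 3, 4, 1]
```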
I'm not such a big fan of importing modules when you can do things on your own in a couple of lines. Here's my solution without imports:
def cycle(my_list, start_at=None):
    start_at = 0 if start_at is None else my_list.index(start_at)
    while True:
        yield my_list[start_at]
        start_at = (start_at + 1) % len(my_list)
This will return an (infinite) iterator looping over your list. To get the next element in the cycle you must use the next function:
>>> it1 = cycle([101,102,103,104])
>>> next(it1), next(it1), next(it1), next(it1), next(it1)
(101, 102, 103, 104, 101) # and so on ...
>>> it1 = cycle([101,102,103,104], start_at=103)
>>> next(it1), next(it1), next(it1), next(it1), next(it1)
(103, 104, 101, 102, 103) # and so on ...
import itertools as it
l = [1, 2, 3, 4]
list(it.islice(it.dropwhile(lambda x: x != 4, it.cycle(l)), 10))
# returns: [4, 1, 2, 3, 4, 1, 2, 3, 4, 1]
so the iterator you want is:
it.dropwhile(lambda x: x != 4, it.cycle(l))
Hm, http://docs.python.org/library/itertools.html#itertools.cycle doesn't have such a start element.
Maybe you just start the cycle anyway and drop the first elements that you don't like.
Another weird option is that cycling through lists can be accomplished backwards. For instance:
# Run this once
myList = ['foo', 'bar', 'baz', 'boom']
myItem = 'baz'

# Run this repeatedly to cycle through the list
if myItem in myList:
    myItem = myList[myList.index(myItem)-1]
print myItem
Can use something like this:
def my_cycle(data, start=None):
    k = 0 if start is None else start  # `is None` is safer than `not start` as a sentinel test
    while True:
        yield data[k]
        k = (k + 1) % len(data)
Then run:
for val in my_cycle([0,1,2,3], 2):
    print(val)
Essentially the same as one of the previous answers. My bad.
