How to (log) transform *args arguments without losing structure - python

I am attempting to apply statistical tests to some datasets with variable numbers of groups. This causes a problem when I try to perform a log transformation for said groups while maintaining the ability to perform the test function (in this case scipy's kruskal()), which takes a variable number of arguments, one for each group of data.
The code below is an idea of what I want. Naturally stats.kruskal([np.log(i) for i in args]) does not work, as kruskal() does not expect a list of arrays, but one argument for each array. How do I perform log transformation (or any kind of alteration, really), while still being able to use the function?
import scipy.stats as stats
import numpy as np
def t(*args):
    test = stats.kruskal([np.log(i) for i in args])
    return test
a = [11, 12, 4, 42, 12, 1, 21, 12, 6]
b = [1, 12, 4, 3, 14, 8, 8, 6]
c = [2, 2, 3, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8]
print(t(a, b, c))

IIUC, * in front of the list you are forming while calling kruskal should do the trick:
test = stats.kruskal(*[np.log(i) for i in args])
The asterisk unpacks the list and passes each entry as a separate positional argument to the function being called, i.e. kruskal here.
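The difference between passing the list and unpacking it can be seen in a small sketch that does not need SciPy. Here group_sizes is a hypothetical stand-in for a variadic function such as kruskal, and log_then_call is an illustrative helper name, not part of any library:

```python
import math

def group_sizes(*groups):
    """Hypothetical stand-in for a variadic function such as scipy.stats.kruskal."""
    return [len(g) for g in groups]

def log_then_call(func, *groups):
    """Log-transform every group, then pass the groups on as separate arguments."""
    transformed = [[math.log(x) for x in g] for g in groups]
    # The * unpacks the list, so func receives one argument per group.
    return func(*transformed)

a = [11, 12, 4, 42]
b = [1, 12, 4]
sizes_unpacked = log_then_call(group_sizes, a, b)  # two groups arrive separately
sizes_packed = group_sizes([a, b])                 # without *, a single argument arrives
```

With the unpacking, the function sees two groups of 4 and 3 values; without it, it sees one argument that happens to contain two lists.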

Related

Dataframe with fixed length (over writing)

I wrote code that generates a large amount of data in each round, so I need to store data only for the last 10 rounds. How can I create a dataframe that drops the oldest object when I add a new one (overwriting)? The order of observations, from old to new, should be maintained. Is there a simple function or data format for this?
Thanks in advance!
You could use this function:
def ins(arr, item):
    if len(arr) < 10:
        arr.insert(0, item)
    else:
        arr.pop()
        arr.insert(0, item)
ex = [1, 2, 3, 4, 5, 6, 7, 8, 9]
ins(ex, 'a')
print(ex)
# ['a', 1, 2, 3, 4, 5, 6, 7, 8, 9]
ins(ex, 'b')
print(ex)
# ['b', 'a', 1, 2, 3, 4, 5, 6, 7, 8]
In order for this to work you MUST pass a list as argument to the function ins(), so that the new item is inserted and the 10th is removed (if there is one).
(I considered that the question is not pandas specific, but rather a way to store a maximum amount of items in an array)
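Another option, if a plain container is acceptable, is collections.deque from the standard library with a maxlen. This is only a sketch: the variable names and the made-up round data are assumptions for illustration. Appending on the right keeps the items ordered from old to new, and the oldest one is dropped automatically once the limit is reached:

```python
from collections import deque

# Keep only the 10 most recent rounds; 10 matches the question, adjust as needed.
last_rounds = deque(maxlen=10)

for round_number in range(1, 13):     # 12 rounds of made-up data
    last_rounds.append(round_number)  # the oldest entry is discarded once full

snapshot = list(last_rounds)  # ordered from oldest to newest
```

After 12 rounds the deque holds rounds 3 to 12. If a dataframe is needed, it could be built from the deque's contents at the point of use.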

How do I print some numbers using .sample() from the random built-in module in Python

I'm working on a problem where I'm supposed to generate ten random but unique numbers that range from 1 to 15 inclusive. The thing is, I'm supposed to write everything in one line and also get this output:
[2, 4, 6, 7, 8, 9, 11, 12, 13, 15]
Below is some code I wrote, but it's not producing the output I want. What am I doing wrong, and can I perhaps see a solution with a breakdown so I know how to do this in the future?
import random
print(sorted(random.sample(range(1,16),15)))
Output:
[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
The output I want is:
[2,4,6,7,8,9,11,12,13,15]
How do I get this in one line of code?
>>> help(random.sample)
sample(population, k): method of random.Random instance
Chooses k unique random elements from a population sequence or set.
I'm supposed to write everything in one line and to also get this output:
[2, 4, 6, 7, 8, 9, 11, 12, 13, 15]
>>> sorted(__import__('random').Random(4225).sample(range(1, 16), 10))
[2, 4, 6, 7, 8, 9, 11, 12, 13, 15]
If you want to generate ten numbers in range 1-15, change
print(sorted(random.sample(range(1,16),15)))
to
print(sorted(random.sample(range(1,16),10)))
# From the documentation:
# random.sample(population, k)
import random
population = range(1, 16)  # 1 to 15 inclusive
how_many_sample = 10
random.sample(population, how_many_sample)
# Now in one line
random.sample(range(1, 16), 10)
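To check the properties asked for (ten values, all unique, all between 1 and 15, sorted), a small sketch may help. The function name ten_unique_sorted and its rng parameter are made up for this example; the seed 0 is arbitrary and is only there to show that a seeded generator gives repeatable results:

```python
import random

def ten_unique_sorted(rng=random):
    """Return 10 distinct numbers from 1 to 15 inclusive, in ascending order."""
    return sorted(rng.sample(range(1, 16), 10))

numbers = ten_unique_sorted()

# A seeded generator makes the result repeatable from run to run.
repeat_a = ten_unique_sorted(random.Random(0))
repeat_b = ten_unique_sorted(random.Random(0))
```

Without a seed the ten numbers differ between runs, so matching one specific expected list requires fixing the seed, as the Random(4225) answer above does.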

Python multiprocessing with large objects: prevent copying/serialization of object

I have implemented multiprocessing for some problem with larger objects like the following:
import time
import pathos.multiprocessing as mp
from functools import partial
from random import randrange
class RandomNumber():
    def __init__(self, object_size=100):
        self.size = bytearray(object_size*10**6) # 100 MB size
        self.foo = None
    def do_something(self, *args, **kwargs):
        self.foo = randrange(1, 10)
        time.sleep(0.5) # wait for 0.5 seconds
        return self
def wrapper(random_number, *args, **kwargs):
    return random_number.do_something(*args, **kwargs)
if __name__ == '__main__':
    # create data
    numbers = [RandomNumber() for m in range(0, 9)]
    kwds = {'add': randrange(1, 10)}
    # calculate
    pool = mp.Pool(processes=mp.cpu_count())
    result = pool.map_async(partial(wrapper, **kwds), numbers)
    try:
        result = result.get()
    except:
        pass
    # print result
    my_results = [i.foo for i in result]
    print(my_results)
    pool.close()
    pool.join()
which yields something like:
[8, 7, 8, 3, 1, 2, 6, 4, 8]
Now the problem is that I get a massive improvement in performance compared to using a list comprehension when the objects are very small, but this turns into the opposite with larger object sizes, e.g. 100 MB and larger.
From the documentation and other questions I have discovered that this is caused by the use of pickle/dill to serialize the individual objects in order to pass them to the workers within the pool. In other words: the objects are copied, and this IO operation becomes a bottleneck as it is more time consuming than the actual calculation.
I have already tried to work on the same object using a multiprocessing.Manager, but this resulted in even higher runtimes.
The problem is that I am bound to a specific class structure (here represented through RandomNumber()) which I cannot change.
Now my question is: Are there any ways or concepts to circumvent this behaviour and only get my calls on do_something() without the overhead of serialization or copying?
Any hints are welcome. Thanks in advance!
You need to use batch processing. Do not create and destroy workers for each number.
Create a limited number of workers based on cpu_count. Then pass a list to each worker and process it. Use map and pass a list containing batches of numbers.
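A rough sketch of what such batching could look like follows. The names make_batches, process_batch and run_in_batches are hypothetical, squaring stands in for the real work, and a thread pool is used instead of a process pool only to keep the example short and self-contained; the same batching idea would apply to a process pool's map:

```python
from concurrent.futures import ThreadPoolExecutor

def make_batches(items, batch_size):
    """Split items into consecutive lists of at most batch_size elements."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

def process_batch(batch):
    """Placeholder work: one call handles a whole batch instead of a single item."""
    return [x * x for x in batch]

def run_in_batches(items, batch_size, max_workers=3):
    """Map process_batch over the batches and flatten the results, keeping order."""
    batches = make_batches(items, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = executor.map(process_batch, batches)
        return [value for batch_result in results for value in batch_result]

squares = run_in_batches(list(range(5)), batch_size=2)
```

Each worker call then receives one list rather than one item, which reduces the number of hand-offs between the caller and the pool.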
I have found a solution using the concurrent.futures library. In my case, multithreading with ThreadPoolExecutor, which does not need to pickle the objects, brings a clear advantage over multiprocessing via ProcessPoolExecutor.
import time
from random import randrange
import concurrent.futures as cf
class RandomNumber():
    def __init__(self, object_size=100):
        self.size = bytearray(object_size*10**6) # 100 MB size
        self.foo = None
    def do_something(self, *args, **kwargs):
        self.foo = randrange(1, 10)
        time.sleep(0.5) # wait for 0.5 seconds
        return self
def wrapper(random_number, *args, **kwargs):
    return random_number.do_something(*args, **kwargs)
if __name__ == '__main__':
    # create data
    numbers = [RandomNumber() for m in range(0, 100)]
    kwds = {'add': randrange(1, 10)}
    # run
    with cf.ThreadPoolExecutor(max_workers=3) as executor:
        result = executor.map(wrapper, numbers, timeout=5*60)
    # print result
    my_results = [i.foo for i in result]
    print(my_results)
yields:
[3, 3, 1, 1, 3, 7, 7, 6, 7, 5, 9, 5, 6, 5, 6, 9, 1, 5, 1, 7, 5, 3, 6, 2, 9, 2, 1, 2, 5, 1, 7, 9, 2, 9, 4, 9, 8, 5, 2, 1, 7, 8, 5, 1, 4, 5, 8, 2, 2, 5, 3, 6, 3, 2, 5, 3, 1, 9, 6, 7, 2, 4, 1, 5, 4, 4, 4, 9, 3, 1, 5, 6, 6, 8, 4, 4, 8, 7, 5, 9, 7, 8, 6, 2, 3, 1, 7, 2, 4, 8, 3, 6, 4, 1, 7, 7, 3, 4, 1, 2]
real 0m21.100s
user 0m1.100s
sys 0m2.896s
Nonetheless, this still leads to memory leakage in cases where I have too many objects (here numbers), and nothing prevents this by switching to some "batch mode" when memory has to be swapped, i.e. the system freezes until the task has finished.
Any hints on how to prevent this?

Printing top n distinct values of a list

I want to print the top 10 distinct elements from a list:
top=10
test=[1,1,1,2,3,4,5,6,7,8,9,10,11,12,13]
for i in range(0,top):
    if test[i]==1:
        top=top+1
    else:
        print(test[i])
It is printing:
2,3,4,5,6,7,8
I am expecting:
2,3,4,5,6,7,8,9,10,11
What am I missing?
Using numpy
import numpy as np
top=10
test=[1,1,1,2,3,4,5,6,7,8,9,10,11,12,13]
test=np.unique(np.array(test))
test[test!=1][:top]
Output
array([ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
Your code executes the loop only 10 times, because range(0, top) is evaluated once and later changes to top do not affect it. The first 3 iterations are spent skipping the 1s, so only the following 7 values are printed, which is exactly what happened here.
If you want to print the top 10 distinct values, I recommend doing this:
# The code of unique is taken from [remove duplicates in list](https://stackoverflow.com/questions/7961363/removing-duplicates-in-lists)
def unique(l):
    return sorted(set(l))  # sorted() because a set does not guarantee any order
def print_top_unique(List, top):
    ulist = unique(List)
    for i in range(0, top):
        print(ulist[i])
print_top_unique([1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], 10)
My Solution
test = [1,1,1,2,3,4,5,6,7,8,9,10,11,12,13]
uniqueList = sorted(set(test)) # sorted list of unique values [1,2,3,4,5,6,7,8,9,10,11,12,13]
for num in range(0,11):
    if uniqueList[num] != 1: # skips one, since you wanted to start with two
        print(uniqueList[num])
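For lists that are not already sorted, an order-preserving variant may be preferable. The sketch below is one possible approach, not taken from the answers above; the function name top_distinct and its skip parameter are assumptions. It relies on dicts keeping insertion order, which is guaranteed from Python 3.7 on:

```python
from itertools import islice

def top_distinct(values, n, skip=()):
    """Return the first n distinct values in order of first appearance, ignoring anything in skip."""
    # dict.fromkeys drops duplicates while keeping the first occurrence of each value.
    seen = dict.fromkeys(v for v in values if v not in skip)
    return list(islice(seen, n))

test = [1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]
result = top_distinct(test, 10, skip={1})
```

Because islice stops after n items, asking for more distinct values than exist simply returns all of them instead of raising an IndexError.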

Custom list traversal and modification

I'm traversing a two-dimensional list (my representation of a matrix) in an unusual order: counterclockwise around the outside starting with the top-left element.
I need to do this more than once, but each time I do it, I'd like to do something different with the values I encounter. The first time, I want to note down the values so that I can modify them. (I can't modify them in place.) The second time, I want to traverse the outside of the matrix and modify the values of the matrix as I go, perhaps getting my new values from some generator.
Is there a way I can abstract this traversal to a function and still achieve my goals? I was thinking that this traverse-edge function could take a function and a matrix and apply the function to each element on the edge of the matrix. However, the problems with this are two-fold. If I do this, I don't think I can modify the matrix that's given as an argument, and I can't yield the values one by one because yield isn't a function.
Edit: I want to rotate a matrix counterclockwise (not 90 degrees) where one rotation moves, for example, the top-left element down one spot. To accomplish this, I'm rotating one "level" (or shell) of the matrix at a time. So if I'm rotating the outermost level, I want to traverse it once to build a list which I can shift to the left, then I want to traverse the outermost level again to assign it those new values which I calculated.
Just create 4 loops, one for each side of the array, that counts through the values of the index that changes for that side. For example, the first side, whose x index is always 0, could vary the y from 0 to n-2 (from the top-left corner to just shy of the bottom-left); repeat for the other sides.
I think there are two approaches you can take to solving your problem.
The first option is to create a function that returns an iterable of indexes into the matrix. Then you'd write your various passes over the matrix with for loops:
for i, j in matrix_border_index_gen(len(matrix), len(matrix[0])): # pass in dimensions
    # do something with matrix[i][j]
The other option is to write a function that works more like map that applies a given function to each appropriate value of the matrix in turn. If you sometimes need to replace the current values with new ones, I'd suggest doing that all the time (the times when you don't want to replace the value, you can just have your function return the previous value):
def func(value):
    # do stuff with value from matrix
    return new_value # new_value can be the same value, if you don't want to change it
matrix_border_map(func, matrix) # replace each value on border of matrix with func(value)
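The first option could be fleshed out roughly as follows. This is a sketch under assumptions: the matrix is a non-empty rectangular list of lists, matrix_border_index_gen is one possible implementation of the name used above, and rotate_border is a hypothetical helper that does the read pass and the write pass described in the question for the outermost level only:

```python
def matrix_border_index_gen(rows, cols):
    """Yield (i, j) along the outer border, counterclockwise from the top-left."""
    if rows == 1:                       # single row: just walk it once
        for j in range(cols):
            yield 0, j
        return
    if cols == 1:                       # single column: just walk it once
        for i in range(rows):
            yield i, 0
        return
    for i in range(rows - 1):           # left column, top to bottom
        yield i, 0
    for j in range(cols - 1):           # bottom row, left to right
        yield rows - 1, j
    for i in range(rows - 1, 0, -1):    # right column, bottom to top
        yield i, cols - 1
    for j in range(cols - 1, 0, -1):    # top row, right to left
        yield 0, j

def rotate_border(matrix, steps=1):
    """Move each border value `steps` positions along the counterclockwise path, in place."""
    positions = list(matrix_border_index_gen(len(matrix), len(matrix[0])))
    values = [matrix[i][j] for i, j in positions]    # first pass: read
    steps %= len(values)
    shifted = values[-steps:] + values[:-steps] if steps else values
    for (i, j), value in zip(positions, shifted):    # second pass: write
        matrix[i][j] = value
    return matrix

grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
rotate_border(grid)
```

After one step the top-left value 1 has moved down one spot and the centre 5 is untouched. Inner levels would need the same walk with an offset, which this sketch does not cover.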
I have added a few lines of python 3 code here. It has the mirror function and a spiral iterator (not sure, if that's what you meant). No doc strings (sorry). It is readable though. Change print statement for python 2.
EDIT : FIXED A BUG
class Matrix():
    def __init__(self, rows=5, cols=5):
        self.cells = [[None for c in range(cols)] for r in range(rows)]
    def transpose(self):
        self.cells = list(map(list, zip(*self.cells)))
    def mirror(self):
        for row in self.cells:
            row.reverse()
    def invert(self):
        self.cells.reverse()
    def rotate(self, clockwise=True):
        self.transpose()
        self.mirror() if clockwise else self.invert()
    def iter_spiral(self, grid=None):
        grid = grid or self.cells
        next_grid = []
        for cell in reversed(grid[0]):
            yield cell
        for row in grid[1:-1]:
            yield row[0]
            next_grid.append(row[1:-1])
        if len(grid) > 1:
            for cell in grid[-1]:
                yield cell
            for row in reversed(grid[1:-1]):
                yield row[-1]
        if next_grid:
            for cell in self.iter_spiral(grid=next_grid):
                yield cell
    def show(self):
        for row in self.cells:
            print(row)
def test_matrix():
    m = Matrix()
    m.cells = [[1,2,3,4],
               [5,6,7,8],
               [9,10,11,12],
               [13,14,15,16]]
    print("We expect the spiral to be:", "4, 3, 2, 1, 5, 9, 13, 14, 15, 16, 12, 8, 7, 6, 10, 11", sep='\n')
    print("What the iterator yields:")
    for cell in m.iter_spiral():
        print(cell, end=', ')
    print("\nThe matrix looks like this:")
    m.show()
    print("Now this is how it looks rotated 90 deg clockwise")
    m.rotate()
    m.show()
    print("Now we'll rotate it back")
    m.rotate(clockwise=False)
    m.show()
    print("Now we'll transpose it")
    m.transpose()
    m.show()
    print("Inverting the above")
    m.invert()
    m.show()
    print("Mirroring the above")
    m.mirror()
    m.show()
if __name__ == '__main__':
    test_matrix()
This is the output:
We expect the spiral to be:
4, 3, 2, 1, 5, 9, 13, 14, 15, 16, 12, 8, 7, 6, 10, 11
What the iterator yields:
4, 3, 2, 1, 5, 9, 13, 14, 15, 16, 12, 8, 7, 6, 10, 11,
The matrix looks like this:
[1, 2, 3, 4]
[5, 6, 7, 8]
[9, 10, 11, 12]
[13, 14, 15, 16]
Now this is how it looks rotated 90 deg clockwise
[13, 9, 5, 1]
[14, 10, 6, 2]
[15, 11, 7, 3]
[16, 12, 8, 4]
Now we'll rotate it back
[1, 2, 3, 4]
[5, 6, 7, 8]
[9, 10, 11, 12]
[13, 14, 15, 16]
Now we'll transpose it
[1, 5, 9, 13]
[2, 6, 10, 14]
[3, 7, 11, 15]
[4, 8, 12, 16]
Inverting the above
[4, 8, 12, 16]
[3, 7, 11, 15]
[2, 6, 10, 14]
[1, 5, 9, 13]
Mirroring the above
[16, 12, 8, 4]
[15, 11, 7, 3]
[14, 10, 6, 2]
[13, 9, 5, 1]
I would go with generator functions. They can be used to create iterators over which we can iterate. An Example of a generator function -
def genfunc():
    i = 0
    while i < 10:
        yield i
        i = i + 1
>>> for x in genfunc():
...     print(x)
...
0
1
2
3
4
5
6
7
8
9
When calling the generator function, it returns a generator object -
>>> genfunc()
<generator object genfunc at 0x00553AD0>
It does not start executing the function at that point. When you start iterating over the generator object and ask for its first element, it runs the function until it reaches the first yield statement, and at that point it returns the value (in the above case, the value of i). It also saves the state of the function at that point (that is, where execution was when the value was yielded, what the values of the variables in the local namespace were, etc.).
Then, when the next value is requested, execution resumes from where it stopped last time and continues until it yields another value. This continues until the function ends.
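This lazy start and resume behaviour can be observed directly with next(). In the sketch below, counter and its log parameter are made up for illustration; the log list just records when the function body actually runs:

```python
def counter(limit, log):
    """Yield 0 .. limit-1, recording in log when the body actually runs."""
    log.append("started")
    i = 0
    while i < limit:
        yield i
        i += 1
    log.append("finished")

log = []
gen = counter(2, log)
log_after_call = list(log)     # still empty: calling only creates the generator

first = next(gen)              # runs the body up to the first yield
log_after_first = list(log)

second = next(gen)             # resumes after the yield, with i preserved
remaining = list(gen)          # runs to the end; nothing more is yielded
```

Nothing is logged when counter is called, "started" appears only after the first next(), and "finished" appears once the generator has been exhausted.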
