Aggregating groups into row vectors (rather than scalars) - python

I want to apply a function to every group in a groupby object, so that the function operates on multiple columns of each group, and returns a 1 x n "row vector" as result. I want the n entries of these row vectors to form the contents of n new columns in the resulting DataFrame.
Here's an example.
import pandas as pd
import numpy as np

df = pd.DataFrame.from_records([(0, 0, 0.616, 0.559),
                                (0, 0, 0.976, 0.942),
                                (0, 0, 0.363, 0.223),
                                (0, 0, 0.033, 0.225),
                                (0, 0, 0.950, 0.351),
                                (0, 1, 0.272, 0.004),
                                (0, 1, 0.167, 0.177),
                                (0, 1, 0.520, 0.157),
                                (0, 1, 0.435, 0.547),
                                (0, 1, 0.266, 0.850),
                                (1, 0, 0.368, 0.544),
                                (1, 0, 0.067, 0.064),
                                (1, 0, 0.566, 0.533),
                                (1, 0, 0.102, 0.431),
                                (1, 0, 0.240, 0.997),
                                (1, 1, 0.867, 0.793),
                                (1, 1, 0.519, 0.477),
                                (1, 1, 0.110, 0.853),
                                (1, 1, 0.160, 0.155),
                                (1, 1, 0.735, 0.515)],
                               columns=list('vwxy'))
grouped = df.groupby(list('vw'))
def example(group):
    X2 = np.var(group['x'])
    Y2 = np.var(group['y'])
    X = np.sqrt(X2)
    Y = np.sqrt(Y2)
    R2 = X2 + Y2
    M = 1.0 / (R2 + 1)
    return (M * 2 * X, M * 2 * Y, M * (R2 - 1))
This gets close:
grouped.apply(example).reset_index()
# v w 0
# 0 0 0 (0.596122357697, 0.450073544336, -0.664884906839)
# 1 0 1 (0.229241003533, 0.555057863705, -0.799599481139)
# 2 1 0 (0.326212671335, 0.53100544639, -0.782060425392)
# 3 1 1 (0.523276087715, 0.433768876798, -0.733503031723)
...but what I'm after is this:
# v w a b c
# 0 0 0 0.596122 0.450074 -0.664885
# 1 0 1 0.229241 0.555058 -0.799599
# 2 1 0 0.326213 0.531005 -0.782060
# 3 1 1 0.523276 0.433769 -0.733503
How can I achieve this?
It's OK to modify the example function, as long as it continues to return all 3 values in some form. In other words, I don't want a solution that replaces example with 3 separate functions, one for each of the output columns.

Try returning a pandas Series instead of a tuple from example:
def example(group):
    ....
    return pd.Series([M * 2 * X, M * 2 * Y, M * (R2 - 1)], index=list('abc'))
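Putting the pieces together, here's a minimal runnable sketch (using a smaller made-up DataFrame rather than the full one above) showing that apply() spreads the returned Series out into named columns:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'v': [0, 0, 0, 1, 1, 1],
                   'w': [0, 0, 1, 0, 1, 1],
                   'x': [0.1, 0.9, 0.4, 0.3, 0.8, 0.2],
                   'y': [0.5, 0.2, 0.7, 0.6, 0.4, 0.9]})

def example(group):
    X2 = np.var(group['x'])
    Y2 = np.var(group['y'])
    X, Y = np.sqrt(X2), np.sqrt(Y2)
    R2 = X2 + Y2
    M = 1.0 / (R2 + 1)
    # Returning a Series (not a tuple) is what turns the three
    # values into three separate columns named 'a', 'b', 'c'
    return pd.Series([M * 2 * X, M * 2 * Y, M * (R2 - 1)], index=list('abc'))

result = df.groupby(['v', 'w']).apply(example).reset_index()
print(result.columns.tolist())  # ['v', 'w', 'a', 'b', 'c']
```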

Related

Getting all possible combination for [1,0] with length 3 [0,0,0] to [1,1,1]

from itertools import combinations

def n_length_combo(arr, n):
    # using set to deal
    # with duplicates
    return list(combinations(arr, n))

# Driver Function
if __name__ == "__main__":
    arr = '01'
    n = 3
    print(n_length_combo([x for x in arr], n))
Expected output: every length-3 combination of 0 and 1, from [0, 0, 0] to [1, 1, 1]. I tried the code above, but it doesn't produce them.
You're looking for a Cartesian product, not a combination or permutation of [0, 1]. For that, you can use itertools.product.
from itertools import product

items = [0, 1]
for item in product(items, repeat=3):
    print(item)
This produces the output you're looking for (albeit in a slightly different order):
(0, 0, 0)
(0, 0, 1)
(0, 1, 0)
(0, 1, 1)
(1, 0, 0)
(1, 0, 1)
(1, 1, 0)
(1, 1, 1)
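If you want each result as a list rather than a tuple (matching the [0, 0, 0] ... [1, 1, 1] notation in the question), just wrap each product tuple in list():

```python
from itertools import product

combos = [list(p) for p in product([0, 1], repeat=3)]
print(combos[0], combos[-1])  # [0, 0, 0] [1, 1, 1]
```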

Calculate the average of sections of a column with condition met to create new dataframe

I have the below data table
A = [2, 3, 1, 2, 4, 1, 5, 3, 1, 7, 5]
B = [0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0]
df = pd.DataFrame({'A':A, 'B':B})
I'd like to calculate the average of column A over each run of consecutive rows in which column B equals 1. All rows where column B equals 0 are ignored, and the result should form a new dataframe like below:
Thanks for your help!
Keywords: groupby, shift, mean
Code:
df_result = df.groupby((df['B'].shift(1, fill_value=0) != df['B']).cumsum()).mean()
df_result = df_result[df_result['B'] != 0]
df_result
A B
1 2.0 1.0
3 3.0 1.0
As you might have noticed, you first need to identify the blocks of consecutive rows that share the same value of B.
One way to do so is by shifting B one row and then comparing it with itself.
df['B_shifted'] = df['B'].shift(1, fill_value=0)  # fill_value=0 keeps the int dtype and replaces the NaN with 0

df['A']                      = [2, 3, 1, 2, 4, 1, 5, 3, 1, 7, 5]
df['B']                      = [0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0]
df['B_shifted']              = [0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0]
(df['B_shifted'] != df['B']) = [F, T, F, F, T, F, T, F, F, T, F]
                                   ↑        ↑     ↑        ↑
                               (each True marks the start of a new block)
Now we can use the groupby pandas method as follows:
df_grouped = df.groupby((df['B_shifted'] != df['B']).cumsum())
Now, if we loop over the DataFrameGroupBy object df_grouped,
we'll see the following tuples:
(0,     A  B  B_shifted
     0  2  0          0)
(1,     A  B  B_shifted
     1  3  1          0
     2  1  1          1
     3  2  1          1)
(2,     A  B  B_shifted
     4  4  0          1
     5  1  0          0)
(3,     A  B  B_shifted
     6  5  1          0
     7  3  1          1
     8  1  1          1)
(4,      A  B  B_shifted
     9   7  0          1
     10  5  0          0)
We can now simply calculate the mean and filter out the zero rows as follows:
df_result = df_grouped.mean()
df_result = df_result[df_result['B'] != 0][['A', 'B']]
Try:
m = (df.B != df.B.shift(1)).cumsum() * df.B
df_out = df.groupby(m[m > 0])["A"].mean().reset_index(drop=True).to_frame()
df_out["B"] = 1
print(df_out)
Prints:
     A  B
0  2.0  1
1  3.0  1
df1 = df.groupby((df['B'].shift() != df['B']).cumsum()).mean().reset_index(drop=True)
df1 = df1[df1['B'] == 1].astype(int).reset_index(drop=True)
df1
Output
A B
0 2 1
1 3 1
Explanation
We check whether each row's value of B differs from the previous row's value using shift; each such change starts a new group. Grouping on the cumulative sum of those changes and taking the mean gives the new dataframe df1.
Since that yields the mean of every run of consecutive 0s as well as 1s, we then keep only the rows where B == 1.
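The block-id trick used by all of these answers can also be seen in isolation. A sketch: compute the block ids first, then filter to the B == 1 rows and group them by the block series (which groupby aligns by index):

```python
import pandas as pd

df = pd.DataFrame({'A': [2, 3, 1, 2, 4, 1, 5, 3, 1, 7, 5],
                   'B': [0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0]})

# Each change in B (including the first row) starts a new block id
block = (df['B'] != df['B'].shift()).cumsum()
print(block.tolist())  # [1, 2, 2, 2, 3, 3, 4, 4, 4, 5, 5]

# Keep only the B == 1 rows; groupby aligns `block` to them by index
means = df[df['B'] == 1].groupby(block)['A'].mean().reset_index(drop=True)
print(means.tolist())  # [2.0, 3.0]
```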

list comprehension variable assignment [duplicate]

This question already has answers here:
How to get the cartesian product of multiple lists
(17 answers)
Closed 3 years ago.
I am attempting to create a 4d array and assign variables to each of the cells.
Typically I would use four "for loops" but this is very messy and takes up a lot of space.
What I'm currently doing:
for x in range(2):
    for y in range(2):
        for j in range(2):
            for k in range(2):
                array[x, y, j, k] = 1  # will be a function in reality
I've tried using list comprehension but this only creates the list and does not assign variables to each cell.
Are there space-efficient ways to run through multiple for loops and assign variables with only a few lines of code?
Assuming you've already created an empty (numpy?) array, you can use itertools.product to fill it with values:
import itertools

for x, y, j, k in itertools.product(range(2), repeat=4):
    arr[x, y, j, k] = 1
If not all of the array's dimensions are equal, you can list them individually:
for x, y, j, k in itertools.product(range(2), range(2), range(2), range(2)):
    arr[x, y, j, k] = 1
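If the target is a NumPy array anyway, np.ndindex offers the same iteration while reading the shape directly from the array, so the dimensions never have to be typed out:

```python
import numpy as np

arr = np.empty((2, 2, 2, 2))
# np.ndindex yields every index tuple for the given shape
for idx in np.ndindex(*arr.shape):
    arr[idx] = 1  # will be a function in reality
print(arr.sum())  # 16.0
```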
You may however be wondering how itertools.product does the trick. Or maybe you want to encode a different transformation in your recursive expansion. Below, I'll share one possible solution using Python's generators -
def product(*iters):
    def loop(prod, first=[], *rest):
        if not rest:
            for x in first:
                yield prod + (x,)
        else:
            for x in first:
                yield from loop(prod + (x,), *rest)
    yield from loop((), *iters)

for prod in product("ab", "xyz"):
    print(prod)
# ('a', 'x')
# ('a', 'y')
# ('a', 'z')
# ('b', 'x')
# ('b', 'y')
# ('b', 'z')
Because product accepts a list of iterables, any iterable input can be used in the product. They can even be mixed, as demonstrated here -
print(list(product(['#', '%'], range(2), "xy")))
# [ ('#', 0, 'x')
# , ('#', 0, 'y')
# , ('#', 1, 'x')
# , ('#', 1, 'y')
# , ('%', 0, 'x')
# , ('%', 0, 'y')
# , ('%', 1, 'x')
# , ('%', 1, 'y')
# ]
We could make a program foo that provides the output posted in your question -
def foo(n, m):
    ranges = [range(m)] * n
    yield from product(*ranges)

for prod in foo(4, 2):
    print(prod)
# (0, 0, 0, 0)
# (0, 0, 0, 1)
# (0, 0, 1, 0)
# (0, 0, 1, 1)
# (0, 1, 0, 0)
# (0, 1, 0, 1)
# (0, 1, 1, 0)
# (0, 1, 1, 1)
# (1, 0, 0, 0)
# (1, 0, 0, 1)
# (1, 0, 1, 0)
# (1, 0, 1, 1)
# (1, 1, 0, 0)
# (1, 1, 0, 1)
# (1, 1, 1, 0)
# (1, 1, 1, 1)
Or use destructuring assignment to create bindings for individual elements of the product. In your program, simply replace print with your real function -
for (w, x, y, z) in foo(4, 2):
    print("w", w, "x", x, "y", y, "z", z)
# w 0 x 0 y 0 z 0
# w 0 x 0 y 0 z 1
# w 0 x 0 y 1 z 0
# w 0 x 0 y 1 z 1
# w 0 x 1 y 0 z 0
# w 0 x 1 y 0 z 1
# w 0 x 1 y 1 z 0
# w 0 x 1 y 1 z 1
# w 1 x 0 y 0 z 0
# w 1 x 0 y 0 z 1
# w 1 x 0 y 1 z 0
# w 1 x 0 y 1 z 1
# w 1 x 1 y 0 z 0
# w 1 x 1 y 0 z 1
# w 1 x 1 y 1 z 0
# w 1 x 1 y 1 z 1
Because product is defined as a generator, we are afforded much flexibility even when writing more complex programs. Consider this program that finds right triangles made up of whole numbers, i.e. Pythagorean triples. Also note that product allows you to repeat an iterable as input, as seen in product(r, r, r) below -
def is_triple(a, b, c):
    return a * a + b * b == c * c

def solver(n):
    r = range(1, n)
    for p in product(r, r, r):
        if is_triple(*p):
            yield p

for triple in solver(20):
    print(triple)
# (3, 4, 5)
# (4, 3, 5)
# (5, 12, 13)
# (6, 8, 10)
# (8, 6, 10)
# (8, 15, 17)
# (9, 12, 15)
# (12, 5, 13)
# (12, 9, 15)
# (15, 8, 17)
For additional explanation and a way to see how to do this without using generators, view this answer.

How to add column values based on the two dates of an array?

I am new to programming and came across this requirement. I have an array:
data= ['2016-1-01', '2016-1-08', '2016-1-15', '2016-1-22', '2016-1-29', '2016-02-05', '2016-02-12', '2016-02-19', '2016-02-26']
I have query result as following:
date a b c
2016-01-19 3 1 5
2016-01-20 10 4 5
2016-01-30 1 4 6
I am trying to generate the weekly report data.
In this example, the dates '2016-01-19' and '2016-01-20' from the query above lie between '2016-01-15' and '2016-01-22' of the data array, so their a, b & c values are added together.
The final output should like this:
2016-1-01 0 0 0
2016-1-08 0 0 0
2016-1-15 13 5 10
2016-1-22 0 0 0
2016-1-29 1 4 6
2016-2-05 0 0 0
2016-2-12 0 0 0
2016-2-19 0 0 0
2016-2-26 0 0 0
Assuming data is always sorted and has no repeated elements (you can do data = sorted(set(data)) if that is not the case), you can do something like this:
import datetime

data = ['2016-1-01', '2016-1-08', '2016-1-15', '2016-1-22', '2016-1-29', '2016-02-05', '2016-02-12', '2016-02-19', '2016-02-26']
query = [(datetime.date(2016, 1, 19), 3, 1, 5), (datetime.date(2016, 1, 20), 10, 4, 5), (datetime.date(2016, 1, 30), 1, 4, 6)]

# Convert data to datetime objects
data = [datetime.datetime.strptime(d, '%Y-%m-%d').date() for d in data]

output = []
query_it = iter(query)
next_date = data[0]
next_nums = (0, 0, 0)

# Iterate through date ranges
for d_start, d_end in zip(data, data[1:] + [datetime.date.max]):
    # If the next interesting date is in range
    if next_date < d_end:
        nums = next_nums
        next_nums = (0, 0, 0)
        for q in query_it:
            q_date, q_nums = q[0], q[1:]
            if q_date < d_start:
                # Ignore dates before the first date in data
                continue
            elif q_date < d_end:
                # Add query numbers to count if in range
                nums = tuple(n1 + n2 for n1, n2 in zip(nums, q_nums))
            else:
                # When out of range save numbers for next
                next_date = q_date
                next_nums = q_nums
                break
    else:
        # Default to zero when no query dates in range
        nums = (0, 0, 0)
    # Add result to output
    output.append((d_start,) + nums)

for out in output:
    print(out)
Output:
(datetime.date(2016, 1, 1), 0, 0, 0)
(datetime.date(2016, 1, 8), 0, 0, 0)
(datetime.date(2016, 1, 15), 13, 5, 10)
(datetime.date(2016, 1, 22), 0, 0, 0)
(datetime.date(2016, 1, 29), 1, 4, 6)
(datetime.date(2016, 2, 5), 0, 0, 0)
(datetime.date(2016, 2, 12), 0, 0, 0)
(datetime.date(2016, 2, 19), 0, 0, 0)
(datetime.date(2016, 2, 26), 0, 0, 0)
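If pandas is an option, the same bucketing can be sketched with pd.cut: bin the query dates into right-open weekly intervals labelled by their start date, then sum per bin. (This covers only the eight complete intervals, so the trailing 2016-02-26 row, which has no upper bound, is omitted.)

```python
import pandas as pd

data = ['2016-1-01', '2016-1-08', '2016-1-15', '2016-1-22', '2016-1-29',
        '2016-02-05', '2016-02-12', '2016-02-19', '2016-02-26']
edges = pd.to_datetime(data)

q = pd.DataFrame({'date': pd.to_datetime(['2016-01-19', '2016-01-20', '2016-01-30']),
                  'a': [3, 10, 1], 'b': [1, 4, 4], 'c': [5, 5, 6]})

# Label each query row with the start of the [start, end) week containing it
q['week'] = pd.cut(q['date'], bins=edges, labels=edges[:-1], right=False)
# observed=False keeps the empty weeks, whose sums come out as 0
out = q.groupby('week', observed=False)[['a', 'b', 'c']].sum()
print(out.loc[pd.Timestamp('2016-01-15')].tolist())  # [13, 5, 10]
```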
This assumes that data is in order, otherwise use sorted(data).
import datetime

data = [
    '2016-1-01', '2016-1-08', '2016-1-15',
    '2016-1-22', '2016-1-29', '2016-02-05',
    '2016-02-12', '2016-02-19', '2016-02-26'
]
query_result = [
    (datetime.date(2016, 1, 19), 3, 1, 5),
    (datetime.date(2016, 1, 20), 10, 4, 5),
    (datetime.date(2016, 1, 30), 1, 4, 6)
]

# Convert string dates to datetime.date
date_data = [datetime.datetime.strptime(date, '%Y-%m-%d').date()
             for date in data]

res = []
# zip the dates together in consecutive pairs
for start, end in zip(date_data, date_data[1:]):
    tally_a = tally_b = tally_c = 0
    for date, a, b, c in query_result:
        # half-open interval [start, end) so a boundary date isn't counted twice
        if start <= date < end:
            tally_a += a
            tally_b += b
            tally_c += c
    res.append((start, tally_a, tally_b, tally_c))

# Output
for d, a, b, c in res:
    print(d, a, b, c, sep='\t')
2016-01-01 0 0 0
2016-01-08 0 0 0
2016-01-15 13 5 10
2016-01-22 0 0 0
2016-01-29 1 4 6
2016-02-05 0 0 0
2016-02-12 0 0 0
2016-02-19 0 0 0

Python, finding neighbors in a 2-d list

So here's the issue: I have a 2-d list of characters 'T' and 'F', and given coordinates I need to get all of that cell's neighbors. I have this:
from itertools import product, starmap

x, y = (5, 5)
cells = starmap(lambda a, b: (x + a, y + b), product((0, -1, +1), (0, -1, +1)))
from determining neighbors of cell two dimensional list. But it will only give me a list of coordinates, so I still have to fetch the values afterwards. I'd like the search and retrieval done in one step, so findNeighbors(5, 5) would return F, T, F, F, ... instead of (5, 4), (5, 6), (4, 5), (4, 4), ... Is there a fast way of doing this? The solution can include a structure other than a list to hold the initial information.
The following should work, with just a minor adaptation to the current code:
from itertools import product, starmap, islice

def findNeighbors(grid, x, y):
    xi = (0, -1, 1) if 0 < x < len(grid) - 1 else ((0, -1) if x > 0 else (0, 1))
    yi = (0, -1, 1) if 0 < y < len(grid[0]) - 1 else ((0, -1) if y > 0 else (0, 1))
    return islice(starmap((lambda a, b: grid[x + a][y + b]), product(xi, yi)), 1, None)
For example:
>>> grid = [[ 0, 1, 2, 3],
... [ 4, 5, 6, 7],
... [ 8, 9, 10, 11],
... [12, 13, 14, 15]]
>>> list(findNeighbors(grid, 2, 1)) # find neighbors of 9
[8, 10, 5, 4, 6, 13, 12, 14]
>>> list(findNeighbors(grid, 3, 3)) # find neighbors of 15
[14, 11, 10]
For the sake of clarity, here is some equivalent code without all of the itertools magic:
def findNeighbors(grid, x, y):
    if 0 < x < len(grid) - 1:
        xi = (0, -1, 1)  # this isn't the first or last row, so we can look above and below
    elif x > 0:
        xi = (0, -1)  # this is the last row, so we can only look above
    else:
        xi = (0, 1)  # this is the first row, so we can only look below
    # the following line accomplishes the same thing as the above code but for columns
    yi = (0, -1, 1) if 0 < y < len(grid[0]) - 1 else ((0, -1) if y > 0 else (0, 1))
    for a in xi:
        for b in yi:
            if a == b == 0:  # this value is skipped using islice in the original code
                continue
            yield grid[x + a][y + b]
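If NumPy is acceptable, an alternative sketch does the boundary handling with clipped slices instead of offset tuples (a neighbors function returning values directly, as the question asks; assumes the grid converts cleanly to an array):

```python
import numpy as np

def neighbors(grid, x, y):
    g = np.asarray(grid)
    # max() clips at the lower edge; slicing clips automatically at the upper edge
    x0, y0 = max(x - 1, 0), max(y - 1, 0)
    block = g[x0:x + 2, y0:y + 2]
    # Return every value in the (up to 3x3) window except the cell itself
    return [block[i, j].item()
            for i in range(block.shape[0])
            for j in range(block.shape[1])
            if (x0 + i, y0 + j) != (x, y)]

grid = [[0, 1, 2, 3],
        [4, 5, 6, 7],
        [8, 9, 10, 11],
        [12, 13, 14, 15]]
print(neighbors(grid, 3, 3))  # [10, 11, 14]
```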
