sliding window data with pandas dataframe - python

I have a dataset that looks something like this:
df = DataFrame(dict(month = [1,2,3,4,5,6], a = [2,4,2,4,2,4], b = [3,5,6,3,4,6]))
What I want is a function that takes a window size as input and gives me something like this:
Function: def make_sliding_df(data, size)
If I do make_sliding_df(df, 1), the output should be a DataFrame like this:
If I do make_sliding_df(df, 2), the output should be a DataFrame like this:
I have tried a bunch of things, but none have worked so far; any help would be appreciated. (I have checked a few other similar questions, but none of them helped.)

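Here's one way, using shift, applymap and reduce:
from functools import reduce  # reduce is not a builtin in Python 3

def make_sliding(df, N):
    # one copy of df per offset 0..N, shifted up i rows, every cell wrapped in a 1-element list
    # (on pandas >= 2.1, applymap is deprecated in favour of DataFrame.map)
    dfs = [df.shift(-i).applymap(lambda x: [x]) for i in range(0, N+1)]
    # add() on these object columns concatenates the lists cell by cell
    return reduce(lambda x, y: x.add(y), dfs)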
In [2008]: make_sliding(df, 1)
Out[2008]:
a b month
0 [2, 4.0] [3, 5.0] [1, 2.0]
1 [4, 2.0] [5, 6.0] [2, 3.0]
2 [2, 4.0] [6, 3.0] [3, 4.0]
3 [4, 2.0] [3, 4.0] [4, 5.0]
4 [2, 4.0] [4, 6.0] [5, 6.0]
5 [4, nan] [6, nan] [6, nan]
In [2009]: make_sliding(df, 2)
Out[2009]:
a b month
0 [2, 4.0, 2.0] [3, 5.0, 6.0] [1, 2.0, 3.0]
1 [4, 2.0, 4.0] [5, 6.0, 3.0] [2, 3.0, 4.0]
2 [2, 4.0, 2.0] [6, 3.0, 4.0] [3, 4.0, 5.0]
3 [4, 2.0, 4.0] [3, 4.0, 6.0] [4, 5.0, 6.0]
4 [2, 4.0, nan] [4, 6.0, nan] [5, 6.0, nan]
5 [4, nan, nan] [6, nan, nan] [6, nan, nan]

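This one uses numpy. It may look ugly, but it is my first try with numpy...
import numpy as np
import pandas as pd

def make_sliding_df(df, step=1, width=2):
    l = []
    for x in df.columns:
        a = np.array(df[x])
        # pad the end with NaN so the last rows still get `width` values
        b = np.append(a, [np.nan] * (width - 1))
        # window starts 0, step, 2*step, ... keep every index inside b when step > 1
        starts = np.arange(0, len(a), step)
        # row k of the index matrix is starts[k] .. starts[k] + width - 1
        l.append(b[np.arange(width)[None, :] + starts[:, None]].tolist())
    newdf = pd.DataFrame(data=l).T
    newdf.columns = df.columns
    return newdf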
make_sliding_df(df,step=1,width=2)
Out[157]:
a b month
0 [2.0, 4.0] [3.0, 5.0] [1.0, 2.0]
1 [4.0, 2.0] [5.0, 6.0] [2.0, 3.0]
2 [2.0, 4.0] [6.0, 3.0] [3.0, 4.0]
3 [4.0, 2.0] [3.0, 4.0] [4.0, 5.0]
4 [2.0, 4.0] [4.0, 6.0] [5.0, 6.0]
5 [4.0, nan] [6.0, nan] [6.0, nan]
make_sliding_df(df,step=1,width=3)
Out[158]:
a b month
0 [2.0, 4.0, 2.0] [3.0, 5.0, 6.0] [1.0, 2.0, 3.0]
1 [4.0, 2.0, 4.0] [5.0, 6.0, 3.0] [2.0, 3.0, 4.0]
2 [2.0, 4.0, 2.0] [6.0, 3.0, 4.0] [3.0, 4.0, 5.0]
3 [4.0, 2.0, 4.0] [3.0, 4.0, 6.0] [4.0, 5.0, 6.0]
4 [2.0, 4.0, nan] [4.0, 6.0, nan] [5.0, 6.0, nan]
5 [4.0, nan, nan] [6.0, nan, nan] [6.0, nan, nan]
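For comparison, the question's own signature make_sliding_df(data, size) can also be sketched with NumPy's sliding_window_view (available since NumPy 1.20). This is a hedged sketch, not either answerer's method: it assumes size counts the extra rows after the current one (so size=1 gives pairs, as in the question's examples) and that every column is numeric, because values are cast to float so NaN can pad the tail.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view


def make_sliding_df(data, size):
    """Cells become [x_t, x_t+1, ..., x_t+size], NaN-padded at the end (sketch)."""
    window = size + 1
    out = {}
    for col in data.columns:
        # cast to float so NaN can be used as padding
        values = data[col].to_numpy(dtype=float)
        # append `size` NaNs so the last rows still produce full-length windows
        padded = np.concatenate([values, np.full(size, np.nan)])
        # read-only view of shape (len(values), window); row t starts at position t
        windows = sliding_window_view(padded, window)
        out[col] = [row.tolist() for row in windows]
    return pd.DataFrame(out, index=data.index)


df = pd.DataFrame(dict(month=[1, 2, 3, 4, 5, 6], a=[2, 4, 2, 4, 2, 4], b=[3, 5, 6, 3, 4, 6]))
r1 = make_sliding_df(df, 1)
r2 = make_sliding_df(df, 2)
print(r1)
print(r2)
```

Each cell holds a plain Python list, as in the answers above; if you need arithmetic on the windows, keeping the 2-D array from sliding_window_view is usually more convenient than lists stored in cells.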

Related

How to insert an element in each part of a 2-level list? (Python)

I have a two-level list with 3 float elements in each sublist; it looks like this:
[[0.0, 0.0, 0.0], [0.0, 5.0, 0.0], [2.53188872, 2.16784954, 9.49026489], [5.0, 0.0, 0.0]....]
I need to insert a number at the beginning of each sublist so that it looks like this:
[[1, 0.0, 0.0, 0.0], [2, 0.0, 5.0, 0.0], [3, 2.53188872, 2.16784954, 9.49026489], [4, 5.0, 0.0, 0.0]....]
I tried using a for loop:
for i in range(len(additional_nodes)):
    additional_nodes[i].insert(0, i+1)
    print(additional_nodes)
but I got something like this:
[[31, 28, 25, 0, 0.0, 0.0, 0.0], [16, 12, 10, 4, 1, 0.0, 5.0, 0.0], [53, 50, 47, 44, 41, 38, 35, 32, 29, 26, 23, 20, 17, 14, 11, 8, 5, 2, 2.53188872, 2.16784954, 9.49026489]...]
What's my problem?
Try this; you have an error in your loop:
for i in range(len(additional_nodes)):
    additional_nodes[i].insert(0, i+1)
Or, better, use enumerate:
for i, lst in enumerate(additional_nodes, start=1):
    lst.insert(0, i)
Best to use enumerate like this:
mlist = [[0.0, 0.0, 0.0], [0.0, 5.0, 0.0], [2.53188872, 2.16784954, 9.49026489], [5.0, 0.0, 0.0]]
for i, e in enumerate(mlist, 1):
    e.insert(0, i)
print(mlist)
Output:
[[1, 0.0, 0.0, 0.0], [2, 0.0, 5.0, 0.0], [3, 2.53188872, 2.16784954, 9.49026489], [4, 5.0, 0.0, 0.0]]
You can also build a new list by looping over the sublists in a list comprehension:
Code:
ls = [[0.0, 0.0, 0.0], [0.0, 5.0, 0.0], [2.53188872, 2.16784954, 9.49026489], [5.0, 0.0, 0.0]]
# start at 1 to number the rows as asked; ls itself is left unchanged
[[idx, *val] for idx, val in enumerate(ls, 1)]
Output:
[[1, 0.0, 0.0, 0.0],
[2, 0.0, 5.0, 0.0],
[3, 2.53188872, 2.16784954, 9.49026489],
[4, 5.0, 0.0, 0.0]]
I am not sure where it went wrong, because it works fine for me.
If you are sure it is not working and need a solution right away, try reversing each sublist, appending the number, and then reversing it again:
l = [[0.0, 0.0, 0.0], [0.0, 5.0, 0.0], [2.53188872, 2.16784954, 9.49026489], [5.0, 0.0, 0.0]]
for i in range(len(l)):
    l[i] = l[i][::-1]   # reverse the sublist
    l[i].append(i+1)    # append the number at the end
    l[i] = l[i][::-1]   # reverse back so the number comes first
print(l)
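One possible explanation for the stacked numbers in the question, which the thread does not confirm: the leading 31, 28, 25 in the first row is exactly what insert(0, i+1) produces if the same inner list object sits at positions 24, 27 and 30 of the outer list. insert mutates the list in place, so every slot that points at that one object sees every insert. The sketch below reproduces this with a hypothetical shared list and shows one way around it: building new lists instead of mutating shared ones.

```python
# Hypothetical reproduction: the outer list holds the same inner list twice.
shared = [0.0, 5.0, 0.0]
aliased = [shared, [1.0, 2.0, 3.0], shared]

for i in range(len(aliased)):
    aliased[i].insert(0, i + 1)  # mutates the list object in place

print(aliased[0] is aliased[2])  # True: one object, two slots
print(aliased[0])  # [3, 1, 0.0, 5.0, 0.0]: it received both inserts

# One way out: build fresh lists instead of mutating possibly shared ones.
base = [0.0, 5.0, 0.0]
rows = [base, [1.0, 2.0, 3.0], base]
numbered = [[i, *row] for i, row in enumerate(rows, start=1)]
print(numbered)  # [[1, 0.0, 5.0, 0.0], [2, 1.0, 2.0, 3.0], [3, 0.0, 5.0, 0.0]]
```

If you prefer to keep mutating in place, copying the rows first (additional_nodes = [list(row) for row in additional_nodes]) breaks the sharing.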

How can I calculate linear interpolation between two numbers using a given number of steps?

How can I calculate the interpolation between start and stop?
Example: interpolation(start, stop, step)
interpolation(1, 5, 1) -> [1.0]
interpolation(1, 5, 2) -> [1.0, 5.0]
interpolation(1, 5, 3) -> [1.0, 3.0, 5.0]
interpolation(1, 5, 4) -> [1.0, 2.333333333333333, 3.6666666666666665, 5.0]
interpolation(1, 5, 5) -> [1.0, 2.0, 3.0, 4.0, 5.0]
interpolation(5, 1, 5) -> [5.0, 4.0, 3.0, 2.0, 1.0]
You can use numpy.linspace:
import numpy as np
np.linspace(5, 1, 5)
# array([5., 4., 3., 2., 1.])
np.linspace(1, 5, 4)
# array([1. , 2.33333333, 3.66666667, 5. ])
Or with pure Python:
def interpolation(start, stop, step):
    # `step` is the number of points to return, not the spacing between them
    if step == 1:
        return [float(start)]
    return [start + (stop - start) / (step - 1) * i for i in range(step)]
interpolation(5, 1, 5)
# [5.0, 4.0, 3.0, 2.0, 1.0]
interpolation(1, 5, 4)
# [1.0, 2.333333333333333, 3.6666666666666665, 5.0]
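A variation on the pure-Python answer, sketched with a hypothetical name lerp_points: computing each point as (1 - t) * start + t * stop with t = i / (n - 1) makes the first and last values exactly start and stop, whereas start + (stop - start) / (n - 1) * i can miss the final point by a rounding error for some inputs (for example, 1 / 49 * 49 evaluates to 0.9999999999999999). This is a general floating-point trade-off rather than something from the thread; the sketch also raises ValueError for n < 1 instead of returning an empty list.

```python
def lerp_points(start, stop, n):
    """Hypothetical helper: n evenly spaced floats from start to stop, both ends included."""
    if n < 1:
        raise ValueError("n must be at least 1")
    if n == 1:
        return [float(start)]
    points = []
    for i in range(n):
        t = i / (n - 1)  # 0.0 at the first point, 1.0 at the last
        # weighted average of the endpoints; exact when t is 0.0 or 1.0
        points.append((1 - t) * start + t * stop)
    return points


print(lerp_points(1, 5, 4))
print(lerp_points(5, 1, 5))  # [5.0, 4.0, 3.0, 2.0, 1.0]
print(lerp_points(0, 1, 50)[-1])  # 1.0
```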

Python How to Decompress a dictionary

I have a dictionary with:
inds = [0, 3, 7, 3, 3, 5, 1]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
d = {'inds': inds, 'vals': vals}
print(d) gives me: {'inds': [0, 3, 7, 3, 3, 5, 1], 'vals': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]}
As you can see, the inds (keys) are not ordered, there are duplicates, and some are missing: the range is 0 to 7, but there are only five distinct integers (0, 1, 3, 5, 7). I want to write a function that takes the dictionary (d) and decompresses it into a full vector, as shown below. For any repeated indices (3 in this case), I'd like to sum the corresponding values, and for the missing indices I want 0.0.
# ind: 0 1 2 3* 4 5 6 7
x == [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
I'm trying to write a function that returns a final list, something like this:
def decompressor(d, n=None):
    final_list = []
    for i in final_list:
        final_list.append()
    return final_list
# final_list.index: 0 1 2 3* 4 5 6 7
# final_list = [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
Try this:
xyz = [0.0 for x in range(max(inds)+1)]
for i in range(len(inds)):  # one pass per (index, value) pair
    if xyz[inds[i]] != 0.0:
        xyz[inds[i]] += vals[i]
    else:
        xyz[inds[i]] = vals[i]
Some things are still not clear to me, but assuming you want a list whose highest index is the largest value in your inds list, you can do something like this:
inds = [0, 3, 7, 3, 3, 5, 1]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
#initialize a list of zeroes with length max index + 1
res=[float(0)]*(max(inds)+1)
#[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
#Loop indexes and values in pairs
for i, v in zip(inds, vals):
    #Add the value to the corresponding index
    res[i] += v
print(res)
#[1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
inds = [0, 3, 7, 3, 3, 5, 1]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
First, initialise the dictionary with keys ranging from the min to the max value in the inds list:
max_id = max(inds)
min_id = min(inds)
my_dict = {}
i = min_id
while i <= max_id:
    my_dict[i] = 0.0
    i = i + 1
Then add each value to its index:
for i in range(len(inds)):
    my_dict[inds[i]] += vals[i]
# my_dict is now {0: 1.0, 1: 7.0, 2: 0.0, 3: 11.0, 4: 0.0, 5: 6.0, 6: 0.0, 7: 3.0}
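Putting the answers together into the decompressor(d, n=None) signature from the question, a minimal sketch could look like this. The question does not say what n means; this assumes it is the desired vector length, defaulting to max(inds) + 1, and that it is at least that large (a smaller n raises IndexError).

```python
def decompressor(d, n=None):
    """Expand parallel 'inds'/'vals' lists into a dense list, summing repeated indices."""
    inds, vals = d['inds'], d['vals']
    if n is None:
        # assumption: by default the list is just long enough for the largest index
        n = max(inds) + 1 if inds else 0
    final_list = [0.0] * n  # missing indices stay 0.0
    for i, v in zip(inds, vals):
        final_list[i] += v  # repeated indices accumulate
    return final_list


d = {'inds': [0, 3, 7, 3, 3, 5, 1], 'vals': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]}
print(decompressor(d))  # [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
print(decompressor(d, n=10))
```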

How do I create a vector which shows the sum of the individual values whenever there are repeated indices?

I have a dictionary ‘d’ which stores a list of indices (d['inds']) and a list of values (d['vals']). For example:
d['inds'] == [0, 3, 7, 3, 3, 5, 1]
d['vals'] == [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
In the above example, the index 3 appears three times. How do I create a vector which shows the sum of the individual values whenever there are repeated indices? In other words, the vector corresponding to this example of d would be:
# ind: 0 1 2 3* 4 5 6 7
x == [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
You can create a dictionary where the keys are the indices and the values are the total sum:
d = {
'inds': [0, 3, 7, 3, 3, 5, 1],
'vals': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
}
result = {}
for i, v in zip(d['inds'], d['vals']):
    if i not in result:
        result[i] = 0
    result[i] += v
print(result)
Output
{0: 1.0, 1: 7.0, 3: 11.0, 5: 6.0, 7: 3.0}
If you need a list (vector), it can be done like this:
result = [0.0]*(max(d['inds']) + 1)
for i, v in zip(d['inds'], d['vals']):
    result[i] += v
print(result)
Output
[1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
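Since the indices are non-negative integers, NumPy's np.bincount can do the same summation in one call: with weights it adds up the weight of every occurrence of each index, and minlength pads the result with zeros. This is a swapped-in NumPy alternative, not part of the answer above, and it assumes NumPy is available and that no index is negative.

```python
import numpy as np

d = {'inds': [0, 3, 7, 3, 3, 5, 1], 'vals': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]}

# bin k receives the sum of vals[j] for every j with inds[j] == k; empty bins are 0.0
x = np.bincount(d['inds'], weights=d['vals'])
print(x.tolist())  # [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]

# minlength pads the result with zeros up to the requested length
x10 = np.bincount(d['inds'], weights=d['vals'], minlength=10)
print(len(x10))  # 10
```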

Equating the lengths of the arrays in an array of arrays

Given an array of arrays with different lengths, is there a cleaner (shorter) way to equalize their lengths by padding the shorter ones with zeros than the following code?
a = [[1.0, 2.0, 3.0, 4.0],[2.0, 3.0, 1.0],[5.0, 5.0, 5.0, 5.0],[1.0, 1.0]]
max_len = 0
for x in a:
    if len(x) > max_len:
        max_len = len(x)
print(max_len)
new = []
for x in a:
    if len(x) < max_len:
        x.extend([0.0] * (max_len - len(x)))
    new.append(x)
print(new)
You can find the length of the largest list within a using either:
len(max(a, key=len))
or
max(map(len, a))
and also use a list comprehension to construct a new list:
>>> a = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 1.0], [5.0, 5.0, 5.0, 5.0], [1.0, 1.0]]
>>> m = len(max(a, key=len))
>>> new = [x + [0]*(m - len(x)) for x in a]
>>> new
[[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 1.0, 0], [5.0, 5.0, 5.0, 5.0], [1.0, 1.0, 0, 0]]
In: b = [i+[0.]*(max(map(len,a))-len(i)) for i in a]
In: b
Out:
[[1.0, 2.0, 3.0, 4.0],
[2.0, 3.0, 1.0, 0.0],
[5.0, 5.0, 5.0, 5.0],
[1.0, 1.0, 0.0, 0.0]]
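Another standard-library option, not taken from either answer, is itertools.zip_longest: it walks the data column by column and pads the shorter lists with a fill value, and transposing back with zip(*...) restores the rows. A sketch:

```python
from itertools import zip_longest

a = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 1.0], [5.0, 5.0, 5.0, 5.0], [1.0, 1.0]]

# zip_longest(*a) yields columns, filling missing cells with 0.0;
# zip(*...) transposes the padded columns back into rows
padded = [list(row) for row in zip(*zip_longest(*a, fillvalue=0.0))]
print(padded)
```

Unlike the loop in the question, this builds new lists and leaves a untouched. One caveat: if every sublist in a is empty, zip_longest yields no columns and the result collapses to [] rather than a list of empty lists.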
