Equating the lengths of the arrays in an array of arrays - python

Given an array of arrays with different lengths, is there a cleaner (shorter) way to equalize the lengths of the arrays by padding the shorter ones with zeros, other than the following code?
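a = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 1.0], [5.0, 5.0, 5.0, 5.0], [1.0, 1.0]]
# Find the length of the longest inner list
# (named max_len so the built-in max() is not shadowed)
max_len = 0
for x in a:
    if len(x) > max_len:
        max_len = len(x)
print(max_len)
# Pad each shorter list in place with zeros
new = []
for x in a:
    if len(x) < max_len:
        x.extend([0.0] * (max_len - len(x)))
    new.append(x)
print(new)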

You can find the length of the largest list within a using either:
len(max(a, key=len))
or
max(map(len, a))
and also use a list comprehension to construct a new list:
>>> a = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 1.0], [5.0, 5.0, 5.0, 5.0], [1.0, 1.0]]
>>> m = len(max(a, key=len))
>>> new = [x + [0]*(m - len(x)) for x in a]
>>> new
[[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 1.0, 0], [5.0, 5.0, 5.0, 5.0], [1.0, 1.0, 0, 0]]
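The two steps above (find the longest length, then pad each list) can be wrapped in a small helper. This is only a minimal sketch: the name pad_lists and the fill parameter are illustrative and do not come from the answer.

```python
def pad_lists(lists, fill=0.0):
    """Return new lists, each right-padded with `fill` to the longest length."""
    # default=0 keeps max() from raising on an empty outer list
    longest = max(map(len, lists), default=0)
    # x + [...] builds a new list, so the input lists are not modified
    return [x + [fill] * (longest - len(x)) for x in lists]


a = [[1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 1.0], [5.0, 5.0, 5.0, 5.0], [1.0, 1.0]]
padded = pad_lists(a)
print(padded)
```

Unlike the extend-based loop in the question, this leaves the original lists in a unchanged.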

In: b = [i+[0.]*(max(map(len,a))-len(i)) for i in a]
In: b
Out:
[[1.0, 2.0, 3.0, 4.0],
[2.0, 3.0, 1.0, 0.0],
[5.0, 5.0, 5.0, 5.0],
[1.0, 1.0, 0.0, 0.0]]

Related

How to find part of series in some series

The question is simple.
Suppose we have a Series with these values:
srs = pd.Series([7.0, 2.0, 1.0, 2.0, 3.0, 5.0, 4.0])
How can I find the position (index) of the subseries 1.0, 2.0, 3.0?
Using a rolling window, we can find the first occurrence of a list a. The lambda puts a marker (0 here; any non-NaN value will do) at the end (right border) of each window that matches. We then use first_valid_index to find the index of the first marker and correct it by the window size to get the start of the match:
a = [1.0, 2.0, 3.0]
srs.rolling(len(a)).apply(lambda x: 0 if (x == a).all() else np.nan).first_valid_index()-len(a)+1
Output:
2
The simplest solution might be to use a list comprehension:
a = srs.tolist() # [7.0, 2.0, 1.0, 2.0, 3.0, 5.0, 4.0]
b = [1.0, 2.0, 3.0]
[x for x in range(len(a)) if a[x:x+len(b)] == b]
# [2]
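The list-comprehension approach above can be wrapped into a reusable function. This is a minimal sketch on plain Python lists: find_sublist is a hypothetical name, and the comparison is exact equality, so floating-point values that differ by rounding error will not match.

```python
def find_sublist(values, pattern):
    """Return every start index at which `pattern` occurs in `values`."""
    n = len(pattern)
    # Stop at len(values) - n so that no slice is shorter than the pattern
    return [i for i in range(len(values) - n + 1) if values[i:i + n] == pattern]


a = [7.0, 2.0, 1.0, 2.0, 3.0, 5.0, 4.0]  # e.g. the result of srs.tolist()
print(find_sublist(a, [1.0, 2.0, 3.0]))  # [2]
```

Because it returns all start positions, a pattern that occurs more than once yields several indices, and a pattern that never occurs yields an empty list.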
One naive way is to iterate over the series, take each slice of n elements, and check whether it equals the given list:
srs = pd.Series([7.0, 2.0, 1.0, 2.0, 3.0, 5.0, 4.0])
sub_list = [1.0, 2.0, 3.0]
n = len(sub_list)
index_matching = []
for i in range(srs.shape[0] - n + 1):
    sub_srs = srs.iloc[i: i+n]
    if (sub_srs == sub_list).all():
        index_matching.append(sub_srs.index)
print(index_matching)
# [RangeIndex(start=2, stop=5, step=1)]
Or in one line with list comprehension:
out = [srs.iloc[i:i+n].index for i in range(srs.shape[0] - n + 1) if (srs.iloc[i: i+n] == sub_list).all()]
print(out)
# [RangeIndex(start=2, stop=5, step=1)]
If you want an explicit list:
real_values = [[i for i in idx] for idx in out]
print(real_values)
# [[2, 3, 4]]

Python How to Decompress a dictionary

I have a dictionary with:
inds = [0, 3, 7, 3, 3, 5, 1]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
d = {'inds': inds, 'vals': vals}
print(d) gives me:
{'inds': [0, 3, 7, 3, 3, 5, 1], 'vals': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]}
As you can see, the indices (inds) are not ordered, there are duplicates, and some are missing: the range is 0 to 7, but only 0, 1, 3, 5 and 7 occur. I want to write a function that takes the dictionary (d) and decompresses it into a full vector like the one shown below. For any repeated index (3 in this case), I'd like to sum the corresponding values, and for the missing indices I want 0.0.
# ind: 0 1 2 3* 4 5 6 7
x == [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
I'm trying to write a function that returns a final list, something like this:
def decompressor(d, n=None):
    final_list = []
    for i in final_list:
        final_list.append()
    return final_list
# final_list.index: 0 1 2 3* 4 5 6 7
# final_list = [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
Try this:
xyz = [0.0 for x in range(max(inds)+1)]
# Loop over every (index, value) pair, so use len(inds), not max(inds)
for i in range(len(inds)):
    if xyz[inds[i]] != 0.0:
        xyz[inds[i]] += vals[i]
    else:
        xyz[inds[i]] = vals[i]
Some things are still not clear to me, but assuming the largest index of the result should be the largest value in your inds list, and that you want a list as the result, you can do something like this:
inds = [0, 3, 7, 3, 3, 5, 1]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
# Initialize a list of zeros of length max(inds) + 1
res = [0.0] * (max(inds) + 1)
# [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
# Loop over indices and values in pairs
for i, v in zip(inds, vals):
    # Add the value to the corresponding index
    res[i] += v
print(res)
# [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
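The loop above can be folded into the function signature from the question. This is a sketch under one assumption: n is taken to be the desired length of the output vector, which the question does not specify. When n is None, the length defaults to max(inds) + 1.

```python
def decompressor(d, n=None):
    """Expand {'inds': [...], 'vals': [...]} into a dense list of sums."""
    inds, vals = d['inds'], d['vals']
    if n is None:
        # Assumption: the vector is just long enough for the largest index
        n = max(inds) + 1 if inds else 0
    final_list = [0.0] * n
    for i, v in zip(inds, vals):
        final_list[i] += v  # repeated indices accumulate
    return final_list


d = {'inds': [0, 3, 7, 3, 3, 5, 1],
     'vals': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]}
print(decompressor(d))
# [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
```

Passing a larger n, for example decompressor(d, n=10), pads the tail of the vector with 0.0.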
inds = [0, 3, 7, 3, 3, 5, 1]
vals = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
First, initialise the dictionary with a 0.0 entry for every key from the min to the max value in the inds list, then add up the values:
max_id = max(inds)
min_id = min(inds)
my_dict = {}
i = min_id
while i <= max_id:
    my_dict[i] = 0.0
    i = i + 1
for i in range(len(inds)):
    my_dict[inds[i]] += vals[i]
# my_dict == {0: 1.0, 1: 7.0, 2: 0.0, 3: 11.0, 4: 0.0, 5: 6.0, 6: 0.0, 7: 3.0}

How do I create a vector which shows the sum of the individual values whenever there are repeated indices?

I have a dictionary 'd' which stores a list of indices (d['inds']) and a list of values (d['vals']). For example:
d['inds'] == [0, 3, 7, 3, 3, 5, 1]
d['vals'] == [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
In the above example, the index 3 appears three times. How do I create a vector which shows the sum of the individual values whenever there are repeated indices? In other words, the vector corresponding to this example of d would be:
# ind: 0 1 2 3* 4 5 6 7
x == [1.0, 7.0, 0.0, 11.0, 0.0, 6.0, 0.0, 3.0]
You can create a dictionary where the keys are the indices and the values are the total sum:
d = {
    'inds': [0, 3, 7, 3, 3, 5, 1],
    'vals': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
}
result = {}
for i, v in zip(d['inds'], d['vals']):
    if i not in result:
        result[i] = 0
    result[i] += v
print(result)
Output
{0: 1.0, 1: 7.0, 3: 11.0, 5: 6.0, 7: 3.0}
If a list (vector) is required, it can be built as follows:
result = [0]*(max(d['inds']) + 1)
for i, v in zip(d['inds'], d['vals']):
    result[i] += v
print(result)
Output
[1.0, 7.0, 0, 11.0, 0, 6.0, 0, 3.0]
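The "if i not in result" check in the dictionary version above can be avoided with collections.defaultdict from the standard library. A minimal sketch follows; sum_by_index is an illustrative name that does not appear in the answer.

```python
from collections import defaultdict


def sum_by_index(inds, vals):
    """Sum the values that share an index; returns a plain dict."""
    totals = defaultdict(float)  # missing keys start at 0.0
    for i, v in zip(inds, vals):
        totals[i] += v
    return dict(totals)


d = {'inds': [0, 3, 7, 3, 3, 5, 1],
     'vals': [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]}
totals = sum_by_index(d['inds'], d['vals'])
# dict.get with a default fills in the indices that never occurred
x = [totals.get(i, 0.0) for i in range(max(d['inds']) + 1)]
print(x)
```

Keeping the sparse dictionary as an intermediate step can be useful when the indices are large and mostly unused, since the dense list is only built if it is actually needed.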

sliding window data with pandas dataframe

I have a dataset that looks something like this:
df = DataFrame(dict(month = [1,2,3,4,5,6], a = [2,4,2,4,2,4], b = [3,5,6,3,4,6]))
What I want is a function that takes a window size as input and gives me something like this:
function: def make_sliding_df(data, size)
If I do make_sliding_df(df, 1) the output should be a Dataframe like this:
If I do make_sliding_df(df, 2) the output should be a Dataframe like this:
I have tried a bunch of things, but none have helped so far. Any help would be appreciated. (I have checked a few other similar questions, but none helped.)
Here's one way using shift, applymap and reduce (in Python 3, reduce must first be imported from functools):
In [2007]: def make_sliding(df, N):
      ...:     dfs = [df.shift(-i).applymap(lambda x: [x]) for i in range(0, N+1)]
      ...:     return reduce(lambda x, y: x.add(y), dfs)
      ...:
In [2008]: make_sliding(df, 1)
Out[2008]:
a b month
0 [2, 4.0] [3, 5.0] [1, 2.0]
1 [4, 2.0] [5, 6.0] [2, 3.0]
2 [2, 4.0] [6, 3.0] [3, 4.0]
3 [4, 2.0] [3, 4.0] [4, 5.0]
4 [2, 4.0] [4, 6.0] [5, 6.0]
5 [4, nan] [6, nan] [6, nan]
In [2009]: make_sliding(df, 2)
Out[2009]:
a b month
0 [2, 4.0, 2.0] [3, 5.0, 6.0] [1, 2.0, 3.0]
1 [4, 2.0, 4.0] [5, 6.0, 3.0] [2, 3.0, 4.0]
2 [2, 4.0, 2.0] [6, 3.0, 4.0] [3, 4.0, 5.0]
3 [4, 2.0, 4.0] [3, 4.0, 6.0] [4, 5.0, 6.0]
4 [2, 4.0, nan] [4, 6.0, nan] [5, 6.0, nan]
5 [4, nan, nan] [6, nan, nan] [6, nan, nan]
Here is a version using numpy. It may look ugly, but it is my first try with numpy:
def make_sliding_df(df,step=1,width=2):
    l=[]
    for x in df.columns:
        a=df[x]
        a=np.array(a)
        # Pad the column with NaN so the last windows are still full width
        b=np.append(a,[np.nan]*(width-1))
        l.append((b[(np.arange(width)[None, :] + step*np.arange(len(a))[:, None])]).tolist())
    newdf=pd.DataFrame(data=l).T
    newdf.columns=df.columns
    return(newdf)
make_sliding_df(df,step=1,width=2)
Out[157]:
a b month
0 [2.0, 4.0] [3.0, 5.0] [1.0, 2.0]
1 [4.0, 2.0] [5.0, 6.0] [2.0, 3.0]
2 [2.0, 4.0] [6.0, 3.0] [3.0, 4.0]
3 [4.0, 2.0] [3.0, 4.0] [4.0, 5.0]
4 [2.0, 4.0] [4.0, 6.0] [5.0, 6.0]
5 [4.0, nan] [6.0, nan] [6.0, nan]
make_sliding_df(df,step=1,width=3)
Out[158]:
a b month
0 [2.0, 4.0, 2.0] [3.0, 5.0, 6.0] [1.0, 2.0, 3.0]
1 [4.0, 2.0, 4.0] [5.0, 6.0, 3.0] [2.0, 3.0, 4.0]
2 [2.0, 4.0, 2.0] [6.0, 3.0, 4.0] [3.0, 4.0, 5.0]
3 [4.0, 2.0, 4.0] [3.0, 4.0, 6.0] [4.0, 5.0, 6.0]
4 [2.0, 4.0, nan] [4.0, 6.0, nan] [5.0, 6.0, nan]
5 [4.0, nan, nan] [6.0, nan, nan] [6.0, nan, nan]
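Both answers build, for each row, a window that looks ahead and is padded with NaN once it runs past the end of the column. That windowing logic on its own can be sketched on a plain list, without pandas or numpy. Here sliding_windows is a hypothetical helper, and width means the number of elements per window (as in the numpy answer), not the N of the first answer, which produces windows of N + 1 elements.

```python
import math


def sliding_windows(values, width):
    """Return one forward-looking window per element, NaN-padded at the end."""
    # Pad so that the last windows can still be `width` long
    padded = list(values) + [math.nan] * (width - 1)
    return [padded[i:i + width] for i in range(len(values))]


a = [2, 4, 2, 4, 2, 4]
for window in sliding_windows(a, 3):
    print(window)
```

Applying such a helper to each column in turn would give the per-column lists that the DataFrame outputs above display.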

Split list after repeating elements

I have this loop for creating a list of coefficients:
for i in N:
    p = 0
    for k in range(i+1):
        p += (x**k)/factorial(k)
        c.append(p)
For example N = [2, 3, 4] would give list c:
[1.0, 2.0, 2.5, 1.0, 2.0, 2.5, 2.6666666666666665, 1.0, 2.0, 2.5, 2.6666666666666665, 2.708333333333333]
I want a way of making separate lists after each 1.0 element. For example a nested list:
[[1.0, 2.0, 2.5], [1.0, 2.0, 2.5, 2.6666666666666665], [1.0, 2.0, 2.5, 2.6666666666666665, 2.708333333333333]]
I was thinking of using an if test, like
for c_ in c:
    if c_ == 1.0:
        anotherList.append(c_)
This only appends 1.0's though and I don't know how I can make it append everything after a one instead of just 1.0.
You can use itertools.groupby in a list comprehension (here l is the list c from the question):
>>> [[1.0]+list(g) for k,g in itertools.groupby(l,lambda x:x==1.0) if not k]
[[1.0, 2.0, 2.5], [1.0, 2.0, 2.5, 2.6666666666666665], [1.0, 2.0, 2.5, 2.6666666666666665, 2.708333333333333]]
Try something like
another_list = []
for c_ in c:
    if c_ == 1.0:
        another_list.append([])
    another_list[-1].append(c_)
Thanks for the suggestion #James Jenkinson
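The loop in the answer above can be written as a function that starts a new sublist at each marker value. This is a minimal sketch: split_at_marker is an illustrative name, and elements that appear before the first marker are kept in their own leading sublist, which is a choice the original loop does not make (it would raise an IndexError in that case).

```python
def split_at_marker(values, marker=1.0):
    """Split `values` into sublists, each starting at an occurrence of `marker`."""
    result = []
    for v in values:
        # Start a new sublist at every marker, and also for a leading
        # run of elements that comes before the first marker
        if v == marker or not result:
            result.append([])
        result[-1].append(v)
    return result


c = [1.0, 2.0, 2.5, 1.0, 2.0, 2.5, 2.6666666666666665]
print(split_at_marker(c))
# [[1.0, 2.0, 2.5], [1.0, 2.0, 2.5, 2.6666666666666665]]
```

Note that this relies on 1.0 appearing only as the first partial sum of each run; a value equal to the marker in the middle of a run would also start a new sublist.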
