Matlab to python logic difficulty for arrays - python

I have created an m-by-n matrix in MATLAB and can easily select a range of values within a certain column and row. For instance, if I have matrix A:
A =
0 0 0 0
1 2 3 4
5 6 7 8
9 10 11 12
I can isolate the values: 1,5 and 9 from the first column by typing: A(2:4,1). The results will yield [1;5;9]. As it relates to python, I am not sure how to index an array such that I have the desired values as above.

This can be done using NumPy:
import numpy
a = numpy.matrix('0 0 0 0; 1 2 3 4; 5 6 7 8; 9 10 11 12')
The required result is a[1:,0] or a[1:4,0].
There are two differences from MATLAB: indexing starts at 0 instead of 1, and the end of a slice is exclusive, so 1:4 selects rows 1, 2 and 3.
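As a minimal sketch of the same selection, here is a version that uses a plain numpy.ndarray rather than numpy.matrix (the NumPy documentation no longer recommends the matrix class for new code). The names A, col and col_2d are only illustrative.

```python
import numpy as np

# Same matrix as in the question, built as an ordinary 2-D array.
A = np.array([[0, 0, 0, 0],
              [1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

# MATLAB A(2:4,1) -> rows 1..3 (end exclusive), column 0.
col = A[1:4, 0]        # 1-D result, shape (3,)

# Slicing the column as 0:1 keeps a 2-D column vector, like MATLAB's [1;5;9].
col_2d = A[1:4, 0:1]   # shape (3, 1)

print(col)
print(col_2d)
```

Indexing with a single integer drops that dimension, which is why col is one-dimensional while col_2d keeps the column shape.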

Related

How to make a grid of the size a rows x b columns from a list containing exactly a*b items? Python grid, list, matrix?

How do I make a 3x5 grid out of a list containing 15 items/strings?
I have a list containing 15 symbols, but it could just as well be a list such as mylist = list(range(15)), that I want to portray in a grid with 3 rows and 5 columns. How does that work without importing another module?
I've been playing around with the for loop a bit to try and find a way, but it's not very intuitive yet, so I've been printing long lines of 0 0 0 0 0 1 1 1 1 1 2 2 2 2 2 etc. I do apologize for this 'dumb' question, but I'm an absolute beginner, as you can tell, and I don't know how to move forward with this simple problem.
This is what I was expecting as output. I want to slowly work my way up to making a playing field or a tic-tac-toe game, but I want to understand portraying grids, lists etc. as best as possible first.
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
An m×n grid? There are multiple ways to do it. One is to start a new line after every n elements.
mylist = list(range(15))
n = 5
chunks = (mylist[i:i+n] for i in range(0, len(mylist), n))
for chunk in chunks:
    print(*chunk)
This gives a 3x5 grid:
0 1 2 3 4
5 6 7 8 9
10 11 12 13 14
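If you want to keep the grid around as a list of rows (for a playing field, say) instead of only printing it, the chunking idea above can be wrapped in two small helpers. This is only a sketch without any imports; make_grid and format_grid are hypothetical names, not part of any library.

```python
def make_grid(items, rows, cols):
    """Split a flat list into `rows` lists of `cols` items each."""
    if len(items) != rows * cols:
        raise ValueError("need exactly rows * cols items")
    return [items[r * cols:(r + 1) * cols] for r in range(rows)]


def format_grid(grid):
    """Return the grid as text, right-aligning every cell to the widest one."""
    width = max(len(str(cell)) for row in grid for cell in row)
    return "\n".join(
        " ".join(str(cell).rjust(width) for cell in row) for row in grid
    )


grid = make_grid(list(range(1, 16)), rows=3, cols=5)
print(format_grid(grid))
```

Because grid is a list of lists, a single cell can be read or changed with grid[row][col], which is handy later for something like tic-tac-toe.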
Method 2
If you want something more polished, you can try the tabulate package:
pip install tabulate
Code
mylist = list(range(15))
wrap = [mylist[x:x+5] for x in range(0, len(mylist),5)]
from tabulate import tabulate
print(tabulate(wrap))
Gives
-- -- -- -- --
0 1 2 3 4
5 6 7 8 9
10 11 12 13 14
-- -- -- -- --

list index out of range in calculation of nodal distance

I am working on a small task in which I have to find the distance between two nodes. Each node has X and Y coordinates which can be seen below.
node_number X_coordinate Y_coordinate
0 0 1 0
1 1 1 1
2 2 1 2
3 3 1 3
4 4 0 3
5 5 0 4
6 6 1 4
7 7 2 4
8 8 3 4
9 9 4 4
10 10 4 3
11 11 3 3
12 12 2 3
13 13 2 2
14 14 2 1
15 15 2 0
For the purpose I mentioned above, I wrote below code,
X1_coordinate = df['X_coordinate'].tolist()
Y1_coordinate = df['Y_coordinate'].tolist()
node_number1 = df['node_number'].tolist()
nodal_dist = []
i = 0
for i in range(len(node_number1)):
    dist = math.sqrt((X1_coordinate[i+1] - X1_coordinate[i])**2 + (Y1_coordinate[i+1] - Y1_coordinate[i])**2)
    nodal_dist.append(dist)
I got the error
list index out of range
Kindly let me know what I am doing wrong and what should I change to get the answer.
Indexing starts at zero, so the last element in the list has an index that is one less than the number of elements in that list. But the len() function gives you the number of elements in the list (in other words, it starts counting at 1). Since the loop body reads index i+1, you want the range of your loop to be len(node_number1) - 1 to avoid an off-by-one error.
The problem is in this line:
dist = math.sqrt((X1_coordinate[i+1] - X1_coordinate[i])**2 + (Y1_coordinate[i+1] - Y1_coordinate[i])**2)
X1_coordinate[i+1] and Y1_coordinate[i+1] go out of range on the last iteration.
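One way to sidestep the index arithmetic entirely is to pair each node with its successor using zip. The sketch below copies the coordinates from the table in the question into plain lists (x, y and points are illustrative names) and uses math.hypot for the Euclidean distance.

```python
import math

# Coordinates of nodes 0..15, taken from the table in the question.
x = [1, 1, 1, 1, 0, 0, 1, 2, 3, 4, 4, 3, 2, 2, 2, 2]
y = [0, 1, 2, 3, 3, 4, 4, 4, 4, 4, 3, 3, 3, 2, 1, 0]

points = list(zip(x, y))

# zip(points, points[1:]) stops at the shorter sequence, so the last node
# is never paired with a non-existent successor.
nodal_dist = [
    math.hypot(x2 - x1, y2 - y1)
    for (x1, y1), (x2, y2) in zip(points, points[1:])
]

print(len(nodal_dist))
print(nodal_dist)
```

With 16 nodes there are 15 consecutive pairs, which is the same count that range(len(node_number1) - 1) produces.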

Rolling sum on a dynamic window

I am new to python and the last time I coded was in the mid-80's so I appreciate your patient help.
It seems .rolling(window) requires the window to be a fixed integer. I need a rolling window where the window or lookback period is dynamic and given by another column.
In the table below, I seek the LookbackSum column, which is the rolling sum of Data over the current row plus the number of preceding rows given by the Lookback column.
d={'Data':[1,1,1,2,3,2,3,2,1,2],
'Lookback':[0,1,2,2,1,3,3,2,3,1],
'LookbackSum':[1,2,3,4,5,8,10,7,8,3]}
df=pd.DataFrame(data=d)
eg:
Data Lookback LookbackSum
0 1 0 1
1 1 1 2
2 1 2 3
3 2 2 4
4 3 1 5
5 2 3 8
6 3 3 10
7 2 2 7
8 1 3 8
9 2 1 3
You can create a custom function for use with df.apply, eg:
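def lookback_window(row, values, lookback, method='sum', *args, **kwargs):
    loc = values.index.get_loc(row.name)
    lb = lookback.loc[row.name]
    # Clamp the start at 0: a negative start would wrap around to the end of the series.
    start = max(loc - lb, 0)
    return getattr(values.iloc[start: loc + 1], method)(*args, **kwargs)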
Then use it as:
df['new_col'] = df.apply(lookback_window, values=df['Data'], lookback=df['Lookback'], axis=1)
There may be some corner cases but as long as your indices align and are unique - it should fulfil what you're trying to do.
Here is one with a list comprehension. It enumerates the index and value of the column df['Lookback'], takes the Data values up to the current row, reverses them, and keeps the first Lookback + 1 of them:
df['LookbackSum'] = [sum(df.loc[:e,'Data'][::-1].to_numpy()[:i+1])
                     for e,i in enumerate(df['Lookback'])]
print(df)
Data Lookback LookbackSum
0 1 0 1
1 1 1 2
2 1 2 3
3 2 2 4
4 3 1 5
5 2 3 8
6 3 3 10
7 2 2 7
8 1 3 8
9 2 1 3
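Another option, not taken from the answers here, is a prefix-sum (cumulative sum) trick: the sum over any window is the difference of two cumulative sums, so the whole column can be computed without a Python loop. This is a sketch that assumes a default 0..n-1 row order and non-negative integer lookbacks; the array names are illustrative.

```python
import numpy as np

data = np.array([1, 1, 1, 2, 3, 2, 3, 2, 1, 2])
lookback = np.array([0, 1, 2, 2, 1, 3, 3, 2, 3, 1])

# csum[k] holds the sum of data[:k], so sum(data[s:e]) == csum[e] - csum[s].
csum = np.concatenate(([0], np.cumsum(data)))

idx = np.arange(len(data))
# First row of each window, clamped so it never goes before row 0.
start = np.maximum(idx - lookback, 0)

lookback_sum = csum[idx + 1] - csum[start]
print(lookback_sum)
```

With a DataFrame, the same arrays can be obtained via df['Data'].to_numpy() and df['Lookback'].to_numpy(), and the result assigned back as a new column.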
An exercise in pain, if you want to try an almost fully vectorized approach. Sidenote: I don't think it's worth it here. At all.
Inspired by Divakar's answer here
Given:
import numpy as np
import pandas as pd
d={'Data':[1,1,1,2,3,2,3,2,1,2],
'Lookback':[0,1,2,2,1,3,3,2,3,1],
'LookbackSum':[1,2,3,4,5,8,10,7,8,3]}
df=pd.DataFrame(data=d)
Using the function from Divakar's answer, but slightly modified
from skimage.util.shape import view_as_windows as viewW
def strided_indexing_roll(a, r, fill_value=np.nan):
    # Concatenate with sliced to cover all rolls
    p = np.full((a.shape[0],a.shape[1]-1),fill_value)
    a_ext = np.concatenate((p,a,p),axis=1)
    # Get sliding windows; use advanced-indexing to select appropriate ones
    n = a.shape[1]
    return viewW(a_ext,(1,n))[np.arange(len(r)), -r + (n-1),0]
Now, we just need to prepare a 2d array for the data and independently shift the rows according to our desired lookback values.
arr = df['Data'].to_numpy().reshape(1, -1).repeat(len(df), axis=0)
shifter = np.arange(len(df) - 1, -1, -1) #+ d['Lookback'] - 1
temp = strided_indexing_roll(arr, shifter, fill_value=0)
out = strided_indexing_roll(temp, (len(df) - 1 - df['Lookback'])*-1, 0).sum(-1)
Output:
array([ 1, 2, 3, 4, 5, 8, 10, 7, 8, 3], dtype=int64)
We can then just assign it back to the dataframe as needed and check.
df['out'] = out
#output:
Data Lookback LookbackSum out
0 1 0 1 1
1 1 1 2 2
2 1 2 3 3
3 2 2 4 4
4 3 1 5 5
5 2 3 8 8
6 3 3 10 10
7 2 2 7 7
8 1 3 8 8
9 2 1 3 3

Finding contiguous, non-unique slices in Pandas series without iterating

I'm trying to parse a logfile of our manufacturing process. Most of the time the process is run automatically but occasionally, the engineer needs to switch into manual mode to make some changes and then switches back to automatic control by the reactor software. When set to manual mode the logfile records the step as being "MAN.OP." instead of a number. Below is a representative example.
steps = [1,2,2,'MAN.OP.','MAN.OP.',2,2,3,3,'MAN.OP.','MAN.OP.',4,4]
ser_orig = pd.Series(steps)
which results in
0 1
1 2
2 2
3 MAN.OP.
4 MAN.OP.
5 2
6 2
7 3
8 3
9 MAN.OP.
10 MAN.OP.
11 4
12 4
dtype: object
I need to detect the 'MAN.OP.' runs and make each run distinct from the others. In this example, the two regions with values == 2 should be one region after detecting the manual mode section like this:
0 1
1 2
2 2
3 Manual_Mode_0
4 Manual_Mode_0
5 2
6 2
7 3
8 3
9 Manual_Mode_1
10 Manual_Mode_1
11 4
12 4
dtype: object
I have code that iterates over this series and produces the correct result when the series is passed to my object. The setter is:
@step_series.setter
def step_series(self, ss):
    """
    On assignment, give the manual mode steps a unique name. Leave
    the steps done on recipe the same.
    """
    manual_mode = "MAN.OP."
    new_manual_mode_text = "Manual_Mode_{}"
    counter = 0
    continuous = False
    for i in ss.index:
        if continuous and ss.at[i] != manual_mode:
            continuous = False
            counter += 1
        elif not continuous and ss.at[i] == manual_mode:
            continuous = True
            ss.at[i] = new_manual_mode_text.format(str(counter))
        elif continuous and ss.at[i] == manual_mode:
            ss.at[i] = new_manual_mode_text.format(str(counter))
    self._step_series = ss
but this iterates over the entire dataframe and is the slowest part of my code other than reading the logfile over the network.
How can I detect these non-unique sections and rename them uniquely without iterating over the entire series? The series is a column selection from a larger dataframe so adding extra columns is fine if needed.
For the completed answer I ended up with:
@step_series.setter
def step_series(self, ss):
    pd.options.mode.chained_assignment = None
    manual_mode = "MAN.OP."
    new_manual_mode_text = "Manual_Mode_{}"
    newManOp = (ss=='MAN.OP.') & (ss != ss.shift())
    ss[ss == 'MAN.OP.'] = 'Manual_Mode_' + (newManOp.cumsum()-1).astype(str)
    self._step_series = ss
Here's one way:
steps = [1,2,2,'MAN.OP.','MAN.OP.',2,2,3,3,'MAN.OP.','MAN.OP.',4,4]
steps = pd.Series(steps)
newManOp = (steps=='MAN.OP.') & (steps != steps.shift())
steps[steps=='MAN.OP.'] += newManOp.cumsum().astype(str)
>>> steps
0 1
1 2
2 2
3 MAN.OP.1
4 MAN.OP.1
5 2
6 2
7 3
8 3
9 MAN.OP.2
10 MAN.OP.2
11 4
12 4
dtype: object
To get the exact format you listed (starting from zero instead of one, and changing from "MAN.OP." to "Manual_Mode_"), just tweak the last line:
steps[steps=='MAN.OP.'] = 'Manual_Mode_' + (newManOp.cumsum()-1).astype(str)
>>> steps
0 1
1 2
2 2
3 Manual_Mode_0
4 Manual_Mode_0
5 2
6 2
7 3
8 3
9 Manual_Mode_1
10 Manual_Mode_1
11 4
12 4
dtype: object
There is a pandas enhancement request for contiguous groupby, which would make this type of task simpler.
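For reference, the shift-and-cumsum idea can be packaged as a small standalone function that leaves the input untouched. This is only a sketch: label_manual_runs and its parameters are hypothetical names, and it compares a boolean mask with its shifted copy rather than shifting the original mixed-type series.

```python
import pandas as pd


def label_manual_runs(ser, marker="MAN.OP.", template="Manual_Mode_{}"):
    """Return a copy of `ser` where each contiguous run of `marker`
    gets its own numbered label (0, 1, 2, ...)."""
    is_manual = ser == marker
    # A run starts where the current row is manual and the previous one is not.
    run_start = is_manual & ~is_manual.shift(fill_value=False)
    # Counting the starts seen so far numbers the runs from 0.
    run_id = run_start.cumsum() - 1
    out = ser.copy()
    out[is_manual] = [template.format(i) for i in run_id[is_manual]]
    return out


steps = [1, 2, 2, 'MAN.OP.', 'MAN.OP.', 2, 2, 3, 3, 'MAN.OP.', 'MAN.OP.', 4, 4]
labelled = label_manual_runs(pd.Series(steps))
print(labelled)
```

Working on a copy avoids the chained-assignment warning that appears when the series is a column selected from a larger dataframe.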
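There is a function in matplotlib that takes a boolean array and returns a list of (start, end) pairs. Each pair represents a contiguous region where the input is True. (Newer matplotlib releases appear to provide it as matplotlib.cbook.contiguous_regions rather than in mlab, so the import may need adjusting.)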
import matplotlib.mlab as mlab
regions = mlab.contiguous_regions(ser_orig == manual_mode)
for i, (start, end) in enumerate(regions):
    ser_orig[start:end] = new_manual_mode_text.format(i)
ser_orig
0 1
1 2
2 2
3 Manual_Mode_0
4 Manual_Mode_0
5 2
6 2
7 3
8 3
9 Manual_Mode_1
10 Manual_Mode_1
11 4
12 4
dtype: object

Pythonic way of copying values in an array with a complex rule

If the value in column a is 1 then the value of b is copied in column c until a is -1.
In the example below, a is 1 in row 2 and -1 in row 5. Then the second value in column b (13) is copied in column c from row 2 to 5.
row a b c
1 0 12 0
2 1 13 13
3 0 15 13
4 0 2 13
5 -1 19 13
6 0 34 0
7 0 11 0
8 1 23 23
9 0 14 23
10 -1 9 23
11 0 18 0
12 0 19 0
I've done this with a for loop, but there must be a more elegant way to do this manipulating series (I'm using pandas, numpy). All your help is greatly appreciated.
Here's a solution that does use a for loop but is pretty succinct while still being understandable.
I'm assuming you have the data stored in table, with a as table[:,0] and that a always appears as (1, -1)*, with 0 interspersed.
starts = table[:,0] == 1
ends = table[:,0] == -1
for start, end in zip(starts.nonzero()[0], ends.nonzero()[0]):
    table[start:end+1,2] = table[start,1]
I bet there's some fancy way to get rid of that loop, but I'd also bet that it's harder to tell what's going on.
I agree with everyone else that if you post what you currently have it'd help to go from there.
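For what a loop-free version might look like, here is a sketch in NumPy. It relies on the same assumption as the loop above, namely that a strictly alternates 1, -1, 1, -1 with zeros in between; the array names are illustrative, and column c is computed from scratch rather than updated in place.

```python
import numpy as np

# Columns a and b from the example table in the question.
a = np.array([0, 1, 0, 0, -1, 0, 0, 1, 0, -1, 0, 0])
b = np.array([12, 13, 15, 2, 19, 34, 11, 23, 14, 9, 18, 19])

# cumsum(a) is 1 from each start row up to the row before its end row;
# the end row itself (a == -1) is added back explicitly.
inside = (np.cumsum(a) == 1) | (a == -1)

# For every row, the index of the most recent row where a == 1
# (0 before the first start, which is harmless because `inside` is False there).
last_start = np.maximum.accumulate(np.where(a == 1, np.arange(len(a)), 0))

c = np.where(inside, b[last_start], 0)
print(c)
```

Whether this is clearer than the loop is debatable; it mainly pays off when the table is large.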
