Pythonic way of copying values in an array with a complex rule


If the value in column a is 1, then the value of b in that row is copied into column c for every row until a is -1 (inclusive). Everywhere else, c is 0.
In the example below, a is 1 in row 2 and -1 in row 5, so the second value in column b (13) is copied into column c from row 2 to row 5.
row a b c
1 0 12 0
2 1 13 13
3 0 15 13
4 0 2 13
5 -1 19 13
6 0 34 0
7 0 11 0
8 1 23 23
9 0 14 23
10 -1 9 23
11 0 18 0
12 0 19 0
I've done this with a for loop, but there must be a more elegant way to do this manipulating series (I'm using pandas, numpy). All your help is greatly appreciated.

Here's a solution that does use a for loop but is pretty succinct while still being understandable.
I'm assuming you have the data stored in table, with a as table[:,0] and that a always appears as (1, -1)*, with 0 interspersed.
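# Boolean masks marking where each run starts (a == 1) and ends (a == -1)
starts = table[:,0] == 1
ends = table[:,0] == -1
# Pair each start row with its end row and fill column c over that span (inclusive)
for start, end in zip(starts.nonzero()[0], ends.nonzero()[0]):
    table[start:end+1,2] = table[start,1]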
I bet there's some fancy way to get rid of that loop, but I'd also bet that it's harder to tell what's going on.
I agree with everyone else that if you post what you currently have it'd help to go from there.
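For what it's worth, the loop can probably be dropped with pandas. The following is a minimal sketch rather than anything from the original post: it assumes the data sits in a DataFrame called df (a hypothetical name) with columns a and b, and, like the loop above, that every 1 in a is followed by a -1 before the next 1.

```python
import pandas as pd

# Sample data from the question; df is an assumed name.
df = pd.DataFrame({
    "a": [0, 1, 0, 0, -1, 0, 0, 1, 0, -1, 0, 0],
    "b": [12, 13, 15, 2, 19, 34, 11, 23, 14, 9, 18, 19],
})

# The running sum of a is 1 from each start row up to the row before its -1;
# OR-ing in the -1 rows makes the interval inclusive at both ends.
in_run = df["a"].cumsum().eq(1) | df["a"].eq(-1)

# Keep b only on the start rows, carry it forward, then zero out
# everything that falls outside a run.
df["c"] = df["b"].where(df["a"].eq(1)).ffill().where(in_run, 0).astype(int)

print(df["c"].tolist())
```

On the sample data this reproduces column c from the question (0, 13, 13, 13, 13, 0, 0, 23, 23, 23, 0, 0). Whether it is easier to read than the loop is a matter of taste.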

Related

How to make a grid of the size a rows x b columns from a list containing exactly a*b items? Python grid, list, matrix?

How do I make a 3x5 grid out of a list containing 15 items/strings?
I have a list containing 15 symbols, but it could just as well be a list such as mylist = list(range(15)), that I want to display in a grid with 3 rows and 5 columns. How does that work without importing another module?
I've been playing around with for loops to try to find a way, but it's not very intuitive yet, so I've only managed to print long lines like 0 0 0 0 0 1 1 1 1 1 2 2 2 2 2. I apologize for this 'dumb' question, but I'm an absolute beginner, as you can tell, and I don't know how to move forward with this simple problem.
This is the output I was expecting. I want to slowly work my way up to making a playing field or a tic-tac-toe game, but first I want to understand displaying grids, lists, etc. as well as possible.
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15

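An m x n grid? There are multiple ways to do it. The simplest is to start a new line after every n elements.
mylist = list(range(15))
n = 5  # number of columns
# Lazily slice the list into consecutive chunks of n items
chunks = (mylist[i:i+n] for i in range(0, len(mylist), n))
for chunk in chunks:
    print(*chunk)  # unpack so the items print space-separated
This gives a 3x5 grid: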
0 1 2 3 4
5 6 7 8 9
10 11 12 13 14
Method 2
If you want prettier output, you can try the third-party tabulate package:
pip install tabulate
Code
from tabulate import tabulate

mylist = list(range(15))
# Wrap the flat list into rows of 5 items each
wrap = [mylist[x:x+5] for x in range(0, len(mylist), 5)]
print(tabulate(wrap))
This gives:
-- -- -- -- --
0 1 2 3 4
5 6 7 8 9
10 11 12 13 14
-- -- -- -- --
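Since the question mentions working up to a playing field, it may help to keep the grid as a list of rows instead of only printing it. The sketch below uses only the standard library; make_grid is a hypothetical helper name, and the length check reflects the "exactly a*b items" condition from the title.

```python
def make_grid(items, rows, cols):
    """Split a flat list into `rows` lists of `cols` items each."""
    if len(items) != rows * cols:
        raise ValueError(f"expected {rows * cols} items, got {len(items)}")
    return [items[r * cols:(r + 1) * cols] for r in range(rows)]


grid = make_grid(list(range(1, 16)), rows=3, cols=5)

# Right-align every cell to the width of the widest item so columns line up.
width = max(len(str(item)) for row in grid for item in row)
for row in grid:
    print(" ".join(str(item).rjust(width) for item in row))

# Cells are addressed as grid[row][col], both counted from 0.
print(grid[1][2])
```

With the grid stored this way, a single cell can be read or replaced by index, which is usually what a tic-tac-toe board needs.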

list index out of range in calculation of nodal distance

I am working on a small task in which I have to find the distance between two nodes. Each node has X and Y coordinates which can be seen below.
node_number X_coordinate Y_coordinate
0 0 1 0
1 1 1 1
2 2 1 2
3 3 1 3
4 4 0 3
5 5 0 4
6 6 1 4
7 7 2 4
8 8 3 4
9 9 4 4
10 10 4 3
11 11 3 3
12 12 2 3
13 13 2 2
14 14 2 1
15 15 2 0
For the purpose I mentioned above, I wrote below code,
X1_coordinate = df['X_coordinate'].tolist()
Y1_coordinate = df['Y_coordinate'].tolist()
node_number1 = df['node_number'].tolist()
nodal_dist = []
i = 0
for i in range(len(node_number1)):
    dist = math.sqrt((X1_coordinate[i+1] - X1_coordinate[i])**2 + (Y1_coordinate[i+1] - Y1_coordinate[i])**2)
    nodal_dist.append(dist)
I got the error
list index out of range
Kindly let me know what I am doing wrong and what should I change to get the answer.

Indexing starts at zero, so the last element of a list has an index that is one less than the number of elements in that list. The len() function returns the number of elements, so range(len(node_number1)) lets i run up to the last valid index, and on that final iteration i+1 points past the end of the list. You want the range of your loop to be len(node_number1) - 1 to avoid this off-by-one error.

The problem is in this line:
dist = math.sqrt((X1_coordinate[i+1] - X1_coordinate[i])**2 + (Y1_coordinate[i+1] - Y1_coordinate[i])**2)
X1_coordinate[i+1] and Y1_coordinate[i+1] go out of range on the last iteration.
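Putting both answers together, a corrected loop might look like the sketch below. The coordinates are copied from the table in the question into plain lists (x and y are shortened, assumed names), and the second variant with zip is an alternative that avoids index arithmetic altogether; math.dist needs Python 3.8 or later.

```python
import math

# Coordinates copied from the table in the question.
x = [1, 1, 1, 1, 0, 0, 1, 2, 3, 4, 4, 3, 2, 2, 2, 2]
y = [0, 1, 2, 3, 3, 4, 4, 4, 4, 4, 3, 3, 3, 2, 1, 0]

# Option 1: stop one short so that i + 1 is always a valid index.
nodal_dist = []
for i in range(len(x) - 1):
    nodal_dist.append(math.hypot(x[i + 1] - x[i], y[i + 1] - y[i]))

# Option 2: pair each point with its successor; zip stops at the shorter input.
points = list(zip(x, y))
nodal_dist_zip = [math.dist(p, q) for p, q in zip(points, points[1:])]

print(len(nodal_dist))
```

Sixteen nodes give fifteen distances, one per consecutive pair.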

Checking for subset in a column?

I'm trying to flag some price data as "stale" if the quoted price of the security hasn't changed over, let's say, 3 trading days. I'm currently trying it with:
firm["dev"] = np.std(firm["Price"],firm["Price"].shift(1),firm["Price"].shift(2))
firm["flag"] == np.where(firm["dev"] = 0, 1, 0)
But I'm getting nowhere with it. This is what my dataframe would look like.
Index Price Flag
1 10 0
2 11 0
3 12 0
4 12 0
5 12 1
6 11 0
7 13 0
Any help is appreciated!

If you are okay with expressing this through other conditions, you can first check where series.diff() equals 0 and take the cumsum, then check whether that cumsum equals 2 (that is, n-1). Also check that the previous row is equal to the current one. When both conditions hold, assign a flag of 1, else 0.
n=3
firm['Flag'] = (firm['Price'].diff().eq(0).cumsum().eq(n-1) &
                firm['Price'].eq(firm['Price'].shift())).astype(int)
EDIT: to make it a generalized function for n consecutive values, use this:
def fun(df,col,n):
    # True where a price equals the one before it
    c = df[col].diff().eq(0)
    return (c|c.shift(-1)).cumsum().ge(n) & df[col].eq(df[col].shift())
firm['flag_2'] = fun(firm,'Price',2).astype(int)
firm['flag_3'] = fun(firm,'Price',3).astype(int)
print(firm)
Price Flag flag_2 flag_3
Index
1 10 0 0 0
2 11 0 0 0
3 12 0 0 0
4 12 0 1 0
5 12 1 1 1
6 11 0 0 0
7 13 0 0 0
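The cumulative sums above run over the whole column rather than restarting for each flat stretch, so it may be worth cross-checking on data that contains more than one run of repeated prices. As an alternative sketch (a different technique, not the answer's method), a rolling window can test each group of n consecutive rows on its own; stale_flag is a hypothetical helper name.

```python
import pandas as pd

firm = pd.DataFrame({"Price": [10, 11, 12, 12, 12, 11, 13]},
                    index=pd.Index(range(1, 8), name="Index"))


def stale_flag(prices, n):
    """Return 1 where the last n prices (current row included) are identical."""
    window = prices.rolling(n)
    # A window is flat exactly when its max equals its min; incomplete
    # windows yield NaN on both sides, and NaN == NaN is False.
    return (window.max() == window.min()).astype(int)


firm["flag_2"] = stale_flag(firm["Price"], 2)
firm["flag_3"] = stale_flag(firm["Price"], 3)
print(firm)
```

On the sample prices this gives the same flag_2 and flag_3 columns as shown above.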

Python Groupby and Count

I'm working on creating a Sankey plot and have the raw data mapped so that I know the source and target nodes. I'm having an issue with grouping the source and target and then counting the number of times each pair occurs, e.g. using the table below, finding out how many times 0 -> 4 occurs and recording that in the dataframe.
index event_action_num next_action_num
227926 0 6
227928 1 5
227934 1 6
227945 1 7
227947 1 6
227951 0 7
227956 0 6
227958 2 6
227963 0 6
227965 1 6
227968 1 5
227972 3 6
What I want to end up with is:
event_action_num next_action_num count_of
0 4 1728
0 5 2382
0 6 3739
etc
Have tried:
df_new_2 = df_new.groupby(['event_action_num', 'next_action_num']).count()
but doesn't give me the result I'm looking for.
Thanks in advance

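Try to use agg('size') instead of count(), and call it on the original df_new rather than on df_new_2. count() counts the non-null values in each remaining column, whereas size counts the rows in each group:
df_new_2 = df_new.groupby(['event_action_num', 'next_action_num']).agg('size')
The result is a Series indexed by each (event_action_num, next_action_num) pair; chaining .reset_index(name='count_of') turns it back into a three-column table like the one you describe.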
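As a small sketch on the twelve sample rows from the question (the real data clearly has many more, so the counts here are much smaller than the 1728, 2382, ... in the desired output):

```python
import pandas as pd

# Sample rows from the question; df_new is the asker's name for the frame.
df_new = pd.DataFrame({
    "event_action_num": [0, 1, 1, 1, 1, 0, 0, 2, 0, 1, 1, 3],
    "next_action_num":  [6, 5, 6, 7, 6, 7, 6, 6, 6, 6, 5, 6],
})

counts = (
    df_new.groupby(["event_action_num", "next_action_num"])
    .size()                          # rows per (source, target) pair
    .reset_index(name="count_of")    # back to a flat three-column frame
)
print(counts)
```

Each row of counts is one source/target pair with its frequency, which is the shape a Sankey link list typically needs.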

Matlab to python logic difficulty for arrays

I have created an m-by-n matrix in MATLAB and can easily select a range of values within a certain column and row. For instance, if I have matrix A:
A =
0 0 0 0
1 2 3 4
5 6 7 8
9 10 11 12
I can isolate the values: 1,5 and 9 from the first column by typing: A(2:4,1). The results will yield [1;5;9]. As it relates to python, I am not sure how to index an array such that I have the desired values as above.

This can be done using numpy:
import numpy
a = numpy.matrix('0 0 0 0; 1 2 3 4; 5 6 7 8; 9 10 11 12')
The required result is a[1:,0] or a[1:4,0].
The differences are that array indexing starts from 0 instead of 1, and that the end of a slice is excluded, so MATLAB's 2:4 becomes 1:4.
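The same indexing also works on a plain ndarray, which is what most NumPy code uses today. A short sketch, with A mirroring the matrix from the question:

```python
import numpy as np

A = np.array([[0, 0, 0, 0],
              [1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])

# MATLAB A(2:4,1): rows 2..4 (inclusive), column 1, all 1-based.
# NumPy: rows 1..3 are written 1:4 because the stop index is excluded.
col = A[1:4, 0]          # 1-D array, shape (3,)

# Slicing the column as 0:1 keeps it 2-D, like MATLAB's 3x1 result.
col_2d = A[1:4, 0:1]     # shape (3, 1)

print(col)
print(col_2d.shape)
```

Indexing with a single integer drops that dimension, while indexing with a slice keeps it; that is the main thing to watch for when porting MATLAB code that relies on column vectors.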
