My dataframe:
A B C A_Q B_Q C_Q
27 40 41 2 1 etc
28 39 40 1 5
30 28 29 3 6
28 27 28 4 1
15 10 11 5 4
17 13 14 1 5
16 60 17 8 10
14 21 18 9 1
20 34 23 10 2
21 45 34 7 4
I want to iterate through each row in every column with a _Q suffix, starting with A_Q and do the following:
if row value = '1', grab the corresponding value in col 'A'
assign that value to a variable, call it x
keep looping down the col A_Q
if row value is either 1,2,3,4,5,6,7,8 or 9, ignore
if the value is 10, then get the corresponding value in col 'A' and assign that to variable y
calculate % change, call it chg, between y and x: (y/x)-1)*100
append chg to dataframe
keep going down the column with steps 1-7 above until the end
Then do the same for the other columns B_Q, C_Q etc
So for example, in the above, the first "1" that appears corresponds to 28 in col A. So x = 28. Then keep iterating, ignoring values 1 through 9, until you get a 10, which corresponds to 20 in col A. Calculate % change = ((20/27)-1)*100 = -25.9% and append that to df in a newly created col A_S. Then resume from that point on with same steps until reach end of the file. And finally, do the same for the rest of the columns.
So then the df would look like:
A B C A_Q B_Q C_Q A_S B_S C_S etc
27 40 41 2 1 etc
28 39 40 1 5
30 28 29 3 6
28 27 28 4 1
15 10 11 5 4
17 13 14 1 5
16 60 17 8 10 50
14 21 18 9 1
20 34 23 10 2 -25.9
21 45 34 7 4
I thought to create a function and then do something like df ['_S'] = df.apply ( function, axis =1) but am stuck on the implementation of the above steps 1-8. Thanks!
Do you need to append the results as a new column? You're going to end up with nearly empty columns with just one data value. Could you just append all of the results at the bottom of the '_Q' columns? Anyway here's my stab at the function to do all you asked:
def func(col1, col2):
l = []
x = None
for index in range(0, len(col1)):
if x is None and col1[index] == 1:
x = col2[index]
l.append(0)
elif not(x is None) and col1[index] == 10:
y = col2[index]
l.append(((float(y)/x)-1)*100)
x = None
else:
l.append(0)
return l
You'd then pass this function A_Q as col1 and A as col2 and it should return what you want. For passing functions, assuming that every A, B, C column has an associated _Q column, you could do something like:
q = [col for col in df.columns if '_Q' in col]
for col in q:
df[col[:len(col) - 2] + '_S] = func(df[col], df[col[:len(col) - 2]
Related
A pandas Series object is created from a list of numbers and characters, charSeries, copied below for reference.
A random five digit string of characters and numbers, created from the below list, is made:
code_core = Q5YUO
Each character in code_core is iterated through to get the corresponding index:
for character in code_core:
index_tosum = charSeries[charSeries == character].index(0)
This works for all alphabet characters. However for any number an error is returned. In the above core_core, an error is returned for character '5':
return getitem(key)
IndexError: index 0 is out of bounds for axis 0 with size 0
It is found that for 5, or any number,
charSeries[charSeries == character]
is empty. I do not see why this is as 5 is one of the values in the Series below and has an index of 5.
What is the error in my code or my approach?
0 0
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 B
11 C
12 D
13 F
14 G
15 H
16 J
17 K
18 L
19 M
20 N
21 P
22 Q
23 R
24 S
25 T
26 V
27 W
28 X
29 Y
30 Z
IIUC use next with iter with default value if no match, also if mixed numeric and string values convert them to strings by astype:
code_core = 'Q5YUO'
for character in code_core:
out = next(iter(charSeries.index[charSeries.astype(str) == character]), 'no match')
print (out)
22
5
29
no match
no match
I have two dataframes as such:
df_pos = pd.DataFrame(
data = [[5,4,3,6,0,7,1,2], [2,5,3,6,4,7,1,0]]
)
df_value = pd.DataFrame(
data=[np.arange(10 + i, 50 + i, 5) for i in range(0,2)]
)
and I want to have a new dataframe df_final where df_pos notates the position and df_value the corresponding value.
I can do it like this:
df_value_copy = df_value.copy()
for i in range(len(df_pos)):
df_value_copy.iloc[i, df_pos.iloc[i, :]] = df_value.iloc[i].values
df_final = df_value_copy
However, I have very large dataframes that would be way too slow. Therefore I want to see whether there is any smarter way to do it.
We can also try np.put_along_axis to place df_value into df_final based on the df_pos:
df_final = df_value.copy()
np.put_along_axis(
arr=df_final.values, # Destination Arr
indices=df_pos.values, # Indices
values=df_value.values, # Source Values
axis=1 # Along Axis
)
The arguments do not need to be kwargs can be positional like:
df_final = df_value.copy()
np.put_along_axis(df_final.values, df_pos.values, df_value.values, 1)
df_final:
0 1 2 3 4 5 6 7
0 30 40 45 20 15 10 25 35
1 46 41 11 21 31 16 26 36
You can try setting values with numpy advanced indexing:
df_final = df_value.copy()
df_final.values[np.arange(len(df_pos))[:,None], df_pos.values] = df_value.values
df_final
0 1 2 3 4 5 6 7
0 30 40 45 20 15 10 25 35
1 46 41 11 21 31 16 26 36
I have a data frame containing three columns, whereas col_1 and col_2 are containing some arbitrary data:
data = {"Height": range(1, 20, 1), "Col_1": range(2, 40, 2), "Col_2": range(3, 60, 3)}
df = pd.DataFrame(data)
Height Col_1 Col_2
0 1 2 3
1 2 4 6
2 3 6 9
3 4 8 12
4 5 10 15
5 6 12 18
6 7 14 21
7 8 16 24
8 9 18 27
9 10 20 30
10 11 22 33
11 12 24 36
12 13 26 39
13 14 28 42
14 15 30 45
15 16 32 48
16 17 34 51
17 18 36 54
18 19 38 57
and another data frame containing height values, that should be used to segment the Height column from the df.
data_segments = {"Section Height" : [1, 10, 20]}
df_segments = pd.DataFrame(data_segments)
Section Height
0 1
1 10
2 20
I want to create two new data frames, df_segment_0 containing all columns of the initial df but only for Height rows within the first two indices in the df_segments. The same approach should be taken for the df_segment_1. They should look like:
df_segment_0
Height Col_1 Col_2
0 1 2 3
1 2 4 6
2 3 6 9
3 4 8 12
4 5 10 15
5 6 12 18
6 7 14 21
7 8 16 24
8 9 18 27
df_segment_1
Height Col_1 Col_2
9 10 20 30
10 11 22 33
11 12 24 36
12 13 26 39
13 14 28 42
14 15 30 45
15 16 32 48
16 17 34 51
17 18 36 54
18 19 38 57
I tried the following code using the .loc method and added the suggestion of C Hecht to create a list of data frames:
df_segment_list = []
try:
for index in df_segments.index:
df_segment = df[["Height", "Col_1", "Col_2"]].loc[(df["Height"] >= df_segments["Section Height"][index]) & (df["Height"] < df_segments["Section Height"][index + 1])]
df_segment_list.append(df_segment)
except KeyError:
pass
Try-except is used only to ignore the error for the last name entry since there is no height for index=2. The data frames in this list can be accessed as C Hecht:
df_segment_0 = df_segment_list[0]
Height Col_1 Col_2
0 1 2 3
1 2 4 6
2 3 6 9
3 4 8 12
4 5 10 15
5 6 12 18
6 7 14 21
7 8 16 24
8 9 18 27
However, I would like to automate the naming of the final data frames. I tried:
for i in range(0, len(df_segment_list)):
name = "df_segment_" + str(i)
name = df_segment_list[i]
I expect that this code to simply automate the df_segment_0 = df_segment_list[0], instead I receive an error name 'df_segment_0' is not defined.
The reason I need separate data frames is that I will perform many subsequent operations using Col_1 and Col_2, so I need row-wise access to each one of them, for example:
df_segment_0 = df_segment_0 .assign(col_3 = df_segment_0 ["Col_1"] / df_segment_0 ["Col_2"])
How do I achieve this?
EDIT 1: Clarified question with the suggestion from C Hecht.
If you want to get all entries that are smaller than the current segment height in your segmentation data frame, here you go :)
import pandas as pd
df1 = pd.DataFrame({"Height": range(1, 20, 1), "Col_1": range(2, 40, 2), "Col_2": range(3, 60, 3)})
df_segments = pd.DataFrame({"Section Height": [1, 10, 20]})
def segment_data_frame(data_frame: pd.DataFrame, segmentation_plan: pd.DataFrame):
df = data_frame.copy() # making a safety copy because we mutate the df !!!
for sh in segmentation_plan["Section Height"]: # sh is the new maximum "Height"
df_new = df[df["Height"] < sh] # select all entries that match the maximum "Height"
df.drop(df_new.index, inplace=True) # remove them from the original DataFrame
yield df_new
# ATTENTION: segment_data_frame() will calculate each segment at runtime!
# So if you don't want to iterate over it but rather have one list to contain
# them all, you must use list(segment_data_frame(...)) or [x for x in segment_data_frame(...)]
for segment in segment_data_frame(df1, df_segments):
print(segment)
print()
print(list(segment_data_frame(df1, df_segments)))
If you want to execute certain steps on those steps you can just use the defined list like so:
for segment in segment_data_frame(df1, df_segments):
do_stuff_with(segment)
If you want to keep track and name the individual frames, you can use a dictionary
Unfortunately I don't 100% understand what you have in mind, but I hope that the following should help you in finding the answer:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Section Height': [20, 90, 111, 232, 252, 3383, 3768, 3826, 3947, 4100], 'df_names': [f'df_section_{i}' for i in range(10)]})
df['shifted'] = df['Section Height'].shift(-1)
new_dfs = []
for index, row in df.iterrows():
if np.isnan(row['shifted']):
# Don't know what you want to do here
pass
else:
new_df = pd.DataFrame({'heights': [i for i in range(int(row['Section Height']), int(row['shifted']))]})
new_df.name = row['df_names']
new_dfs.append(new_df)
The content of new_dfs are dataframes that look like this:
heights
0 20
1 21
2 22
3 23
4 24
.. ...
65 85
66 86
67 87
68 88
69 89
[70 rows x 1 columns]
If you clarify your questions given this input, we could help you all the way, but this should hopefully point you in the right direction.
Edit: A small comment on using df.name: This is not really stable and if you do stuff like dropping a column, pickling/unpickling, etc. the name will likely be lost. But you can surely find a good solution to maintain the name depending on your needs.
So im trying to print the items from the list in to a 18x4 matrix or a table. I´ve tried doing it by using format but it hasn't seem to work.
This is the desired output I want to get from the list, but im not sure how to print the first 18 items in one column and then the next 18 items in second column and so forth. I hope the description is well enough explained since english is not my first language. Any help would be greatly appreciated.
Use nested loops for the rows and columns. In the inner loop, use a stride of 18 to get every 18th element starting from the index in the first column.
for i in range(18):
for j in range(i, len(wireless_node_list), 18):
print(wireless_node_list[j], end='\t')
print()
You'll need to layout each row of output. Figuring out how many rows to display requires some basic math if the goal is to have a target number of columns:
# Some sample data
values = [x / 7 for x in range(50)]
cols = 4
# Figure out how many rows are needed
rows, extra = divmod(len(values), cols)
if extra > 0:
# That means we need one final partial row
rows += 1
# And show each row in turn
for row in range(rows):
line = ""
for col in range(cols):
i = col * rows + row
if i < len(values):
line += f"{i:3d} {values[i]:3.9f} "
print(line)
Which outputs:
0 0.000000000 13 1.857142857 26 3.714285714 39 5.571428571
1 0.142857143 14 2.000000000 27 3.857142857 40 5.714285714
2 0.285714286 15 2.142857143 28 4.000000000 41 5.857142857
3 0.428571429 16 2.285714286 29 4.142857143 42 6.000000000
4 0.571428571 17 2.428571429 30 4.285714286 43 6.142857143
5 0.714285714 18 2.571428571 31 4.428571429 44 6.285714286
6 0.857142857 19 2.714285714 32 4.571428571 45 6.428571429
7 1.000000000 20 2.857142857 33 4.714285714 46 6.571428571
8 1.142857143 21 3.000000000 34 4.857142857 47 6.714285714
9 1.285714286 22 3.142857143 35 5.000000000 48 6.857142857
10 1.428571429 23 3.285714286 36 5.142857143 49 7.000000000
11 1.571428571 24 3.428571429 37 5.285714286
12 1.714285714 25 3.571428571 38 5.428571429
This is a bit brute force, but should work. Just build your list and then do a little math with your indexes.
list = ["112312", "12321312", "9809809", "8374973498", "3827498734", "5426547", "08091280398", "ndfahdda", "ppoiudapp", "dafdsf", "huhidhsaf", "nadsjhfdk", "hdajfhk", "jkhjhkh", "hdjhkajhkj"]
build = {}
len = len(list)
rowCount = len//4 + 1
for i in range(rowCount):
build[i] = []
if i < len: build[i].append(list[i])
if i + 4 < len: build[i].append(list[i + 4])
if i + 8 < len: build[i].append(list[i + 8])
if i + 12 < len: build[i].append(list[i + 12])
print(build)
This little sample tested out fine for me.
def tablesOneToTen(): # a function that will print out multiplication tables from 1-10
x = 1
y = 1
while x <= 10 and y <= 12:
f = x * y
print(f)
y = y + 1
x = x + 1
tablesOneToTen()
I am trying to make a function that will give me values from the multiplication table from 1-10.
Should I add if and elif statements in addition to nested while loops to make this code work?
For these sort of iteration tasks you're better off using the for loop since you already know the boundaries you're working with, also Python makes creating for loops especially easy.
With while loops you have to check that you are in range using conditionals while also explicitly incrementing your counters making mistakes all the more likely.
Since you know you need multiplication tables for values of x and y ranging from 1-10 you can, to get you familiar with loops, create two for loops:
def tablesOneToTen(): # a function that will print out multiplication tables from 1-10
# This will iterate with values for x in the range [1-10]
for x in range(1, 11):
# Print the value of x for reference
print("Table for {} * (1 - 10)".format(x))
# iterate for values of y in a range [1-10]
for y in range(1, 11):
# Print the result of the multiplication
print(x * y, end=" ")
# Print a new Line.
print()
Running this will give you the tables you need:
Table for 1 * (1 - 10)
1 2 3 4 5 6 7 8 9 10
Table for 2 * (1 - 10)
2 4 6 8 10 12 14 16 18 20
Table for 3 * (1 - 10)
3 6 9 12 15 18 21 24 27 30
With a while loop, the logic is similar but of course just more verbose than it need to since you must initialize, evaluate the condition and increment.
As a testament to its uglyness, the while loop would look something like this:
def tablesOneToTen():
# initialize x counter
x = 1
# first condition
while x <= 10:
# print reference message
print("Table for {} * [1-10]".format(x))
# initialize y counter
y = 1
# second condition
while y <=10:
# print values
print(x*y, end=" ")
# increment y
y += 1
# print a new line
print(" ")
# increment x
x += 1
Using Python 3
for i in range(1, 10+1):
for j in range(i, (i*10)+1):
if (j % i == 0):
print(j, end="\t")
print()
or:
for i in range(1, 10+1):
for j in range(i, (i*10)+1, i):
print(j, end="\t")
print()
Output:
1 2 3 4 5 6 7 8 9 10
2 4 6 8 10 12 14 16 18 20
3 6 9 12 15 18 21 24 27 30
4 8 12 16 20 24 28 32 36 40
5 10 15 20 25 30 35 40 45 50
6 12 18 24 30 36 42 48 54 60
7 14 21 28 35 42 49 56 63 70
8 16 24 32 40 48 56 64 72 80
9 18 27 36 45 54 63 72 81 90
10 20 30 40 50 60 70 80 90 100
Hope it would help you to get 1 to 10 table.
a = [1,2,3,4,5,6,7,8,9,10]
for i in a:
print(*("{:3}" .format (i*col) for col in a))
print()