I am trying to round values up to their next multiple of 5.
For example:
df
values rounded_values (expected column)
10 10
11 15
13 15
22 25
21 25
34 35
35 35
The underlying container of a Pandas column is a numpy array, so you could just use numpy here:
import numpy as np
df['rounded_values'] = (np.ceil(df['values'] / 5) * 5).astype(int)
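Here is a self-contained sketch of the same ceil-based idea. It wraps the expression in a hypothetical helper, `round_up_to_multiple` (the name and the `base` parameter are illustrative only), and checks it against the example values from the question. No extra rounding call is needed because `np.ceil` already returns whole numbers.

```python
import numpy as np
import pandas as pd

def round_up_to_multiple(s: pd.Series, base: int = 5) -> pd.Series:
    """Round every value in s up to the next multiple of base (hypothetical helper)."""
    # Dividing by base turns the task into "round up to the next integer",
    # which np.ceil does element-wise on the Series.
    return (np.ceil(s / base) * base).astype(int)

df = pd.DataFrame({"values": [10, 11, 13, 22, 21, 34, 35]})
df["rounded_values"] = round_up_to_multiple(df["values"])
print(df)
```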
Try (this works for integer values):
df["rounded_values"] = ((df["values"] + 4) // 5) * 5
print(df)
values rounded_values
0 10 10
1 11 15
2 13 15
3 22 25
4 21 25
5 34 35
6 35 35
Use mod:
def round_to5(x):
    if x % 5 == 0:
        return x
    else:
        return x + (5 - x % 5)
df['round_values'] = df['values'].apply(round_to5)
Output is:
values round_values
0 10 10
1 12 15
2 24 25
3 27 30
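The same modulo idea can also be applied to the whole column at once, without `apply`. This sketch assumes integer values. It relies on `%` returning a non-negative remainder for a positive divisor, which holds for Python ints and for pandas/NumPy integer Series. That means `(-x) % 5` is exactly the amount needed to reach the next multiple of 5.

```python
import pandas as pd

df = pd.DataFrame({"values": [10, 12, 24, 27]})
# (-x) % 5 is 0 when x is already a multiple of 5, otherwise it equals 5 - x % 5
df["round_values"] = df["values"] + (-df["values"]) % 5
print(df)
```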
This can be done simply, by first dividing by 5, then calling math.ceil, then multiplying by 5:
>>> import math
>>> round5 = lambda n: math.ceil(n / 5) * 5
>>> round5(22)
25
>>> round5(21)
25
>>> round5(35)
35
>>> round5(10)
10
>>> round5(11)
15
>>> round5(15)
15
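For plain Python integers, the division by 5 can also be done without creating a float. The sketch below uses a hypothetical helper `round5_int` built on the integer ceiling-division idiom `-(-n // 5)`. This matters only for very large integers, where `n / 5` can lose precision.

```python
import math

def round5_int(n: int) -> int:
    # -(-n // 5) equals ceil(n / 5) for ints, using floor division only
    return -(-n // 5) * 5

big = 10**17 + 1
print(round5_int(big))         # 100000000000000005
print(math.ceil(big / 5) * 5)  # 100000000000000000: precision is lost in big / 5
```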
I have a data frame containing three columns, where Col_1 and Col_2 contain arbitrary data:
import pandas as pd
data = {"Height": range(1, 20, 1), "Col_1": range(2, 40, 2), "Col_2": range(3, 60, 3)}
df = pd.DataFrame(data)
Height Col_1 Col_2
0 1 2 3
1 2 4 6
2 3 6 9
3 4 8 12
4 5 10 15
5 6 12 18
6 7 14 21
7 8 16 24
8 9 18 27
9 10 20 30
10 11 22 33
11 12 24 36
12 13 26 39
13 14 28 42
14 15 30 45
15 16 32 48
16 17 34 51
17 18 36 54
18 19 38 57
And I have another data frame containing height values, which should be used to segment the Height column of df:
data_segments = {"Section Height" : [1, 10, 20]}
df_segments = pd.DataFrame(data_segments)
Section Height
0 1
1 10
2 20
I want to create two new data frames. df_segment_0 should contain all columns of the initial df, but only the rows whose Height lies between the first two values in df_segments (1 <= Height < 10). df_segment_1 should be built the same way from the next two values (10 <= Height < 20). They should look like this:
df_segment_0
Height Col_1 Col_2
0 1 2 3
1 2 4 6
2 3 6 9
3 4 8 12
4 5 10 15
5 6 12 18
6 7 14 21
7 8 16 24
8 9 18 27
df_segment_1
Height Col_1 Col_2
9 10 20 30
10 11 22 33
11 12 24 36
12 13 26 39
13 14 28 42
14 15 30 45
15 16 32 48
16 17 34 51
17 18 36 54
18 19 38 57
I tried the following code using the .loc method and added the suggestion of C Hecht to create a list of data frames:
df_segment_list = []
try:
    for index in df_segments.index:
        df_segment = df[["Height", "Col_1", "Col_2"]].loc[(df["Height"] >= df_segments["Section Height"][index]) & (df["Height"] < df_segments["Section Height"][index + 1])]
        df_segment_list.append(df_segment)
except KeyError:
    pass
The try-except is only there to ignore the KeyError for the last entry. When index = 2, there is no Section Height at index + 1. The data frames in this list can be accessed as C Hecht suggested:
df_segment_0 = df_segment_list[0]
Height Col_1 Col_2
0 1 2 3
1 2 4 6
2 3 6 9
3 4 8 12
4 5 10 15
5 6 12 18
6 7 14 21
7 8 16 24
8 9 18 27
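The try/except can be avoided by pairing each section height with the next one. With `zip(heights, heights[1:])`, the loop never looks up an index that does not exist. This is a sketch using the frames defined in the question.

```python
import pandas as pd

df = pd.DataFrame({"Height": range(1, 20, 1), "Col_1": range(2, 40, 2), "Col_2": range(3, 60, 3)})
df_segments = pd.DataFrame({"Section Height": [1, 10, 20]})

heights = df_segments["Section Height"].tolist()
df_segment_list = []
# zip(heights, heights[1:]) yields (1, 10) and (10, 20): each lower bound
# paired with the next upper bound, so no out-of-range lookup happens
for lower, upper in zip(heights, heights[1:]):
    mask = (df["Height"] >= lower) & (df["Height"] < upper)
    df_segment_list.append(df.loc[mask, ["Height", "Col_1", "Col_2"]])
print(df_segment_list[1])
```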
However, I would like to automate the naming of the final data frames. I tried:
for i in range(0, len(df_segment_list)):
    name = "df_segment_" + str(i)
    name = df_segment_list[i]
I expected this code to simply automate df_segment_0 = df_segment_list[0]. Instead, I get the error name 'df_segment_0' is not defined.
The reason I need separate data frames is that I will perform many subsequent operations using Col_1 and Col_2, so I need row-wise access to each one of them, for example:
df_segment_0 = df_segment_0.assign(col_3=df_segment_0["Col_1"] / df_segment_0["Col_2"])
How do I achieve this?
EDIT 1: Clarified question with the suggestion from C Hecht.
If you want to get all entries that are smaller than the current segment height in your segmentation data frame, here you go :)
import pandas as pd
df1 = pd.DataFrame({"Height": range(1, 20, 1), "Col_1": range(2, 40, 2), "Col_2": range(3, 60, 3)})
df_segments = pd.DataFrame({"Section Height": [1, 10, 20]})
def segment_data_frame(data_frame: pd.DataFrame, segmentation_plan: pd.DataFrame):
    df = data_frame.copy()  # make a safety copy because we mutate the df!!!
    for sh in segmentation_plan["Section Height"]:  # sh is the new maximum "Height"
        df_new = df[df["Height"] < sh]  # select all remaining entries below the maximum "Height"
        df.drop(df_new.index, inplace=True)  # remove them from the working copy
        yield df_new
# ATTENTION: segment_data_frame() is a generator, so each segment is computed at runtime!
# If you don't want to iterate over it but would rather have one list containing
# them all, use list(segment_data_frame(...)) or [x for x in segment_data_frame(...)].
# Note: with Section Height [1, 10, 20], the first segment (Height < 1) is empty.
for segment in segment_data_frame(df1, df_segments):
    print(segment)
    print()
print(list(segment_data_frame(df1, df_segments)))
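A different technique from the generator above is to bin and then group. `pd.cut` with `right=False` assigns each Height to a half-open interval built from the section heights ([1, 10) and [10, 20)), and `groupby` then splits the frame into one piece per interval. This is a sketch, not the answer's method.

```python
import pandas as pd

df1 = pd.DataFrame({"Height": range(1, 20, 1), "Col_1": range(2, 40, 2), "Col_2": range(3, 60, 3)})
df_segments = pd.DataFrame({"Section Height": [1, 10, 20]})

# right=False gives bins [1, 10) and [10, 20), i.e. lower <= Height < upper
bins = pd.cut(df1["Height"], bins=df_segments["Section Height"].tolist(), right=False)
# observed=True keeps only intervals that actually contain rows
segments = [group for _, group in df1.groupby(bins, observed=True)]
for segment in segments:
    print(segment, end="\n\n")
```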
If you want to perform certain steps on each segment, you can just iterate over the generator like so:
for segment in segment_data_frame(df1, df_segments):
    do_stuff_with(segment)  # placeholder for your own processing
If you want to keep track of the individual frames and name them, you can use a dictionary.
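Here is a sketch of that dictionary idea. The keys play the role of the variable names the question tried to create (`df_segment_0`, `df_segment_1`), and the follow-up `assign` from the question is applied per named segment. The segment bounds are assumed to be lower <= Height < upper.

```python
import pandas as pd

df = pd.DataFrame({"Height": range(1, 20, 1), "Col_1": range(2, 40, 2), "Col_2": range(3, 60, 3)})
heights = [1, 10, 20]  # the "Section Height" values

# Each key is a name; each value is the rows between two consecutive heights
segments = {
    f"df_segment_{i}": df[(df["Height"] >= lower) & (df["Height"] < upper)]
    for i, (lower, upper) in enumerate(zip(heights, heights[1:]))
}

# Follow-up work from the question, done for every named segment
for name, seg in segments.items():
    segments[name] = seg.assign(col_3=seg["Col_1"] / seg["Col_2"])

print(segments["df_segment_0"])
```

Access then looks like `segments["df_segment_0"]` instead of a bare variable, which avoids creating variable names dynamically.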
Unfortunately I don't 100% understand what you have in mind, but I hope the following helps you find the answer:
import pandas as pd
import numpy as np
df = pd.DataFrame({'Section Height': [20, 90, 111, 232, 252, 3383, 3768, 3826, 3947, 4100], 'df_names': [f'df_section_{i}' for i in range(10)]})
df['shifted'] = df['Section Height'].shift(-1)
new_dfs = []
for index, row in df.iterrows():
    if np.isnan(row['shifted']):
        # Don't know what you want to do here
        pass
    else:
        new_df = pd.DataFrame({'heights': list(range(int(row['Section Height']), int(row['shifted'])))})
        new_df.name = row['df_names']
        new_dfs.append(new_df)
new_dfs then contains dataframes that look like this:
heights
0 20
1 21
2 22
3 23
4 24
.. ...
65 85
66 86
67 87
68 88
69 89
[70 rows x 1 columns]
If you clarify your question given this input, we can help you all the way, but this should hopefully point you in the right direction.
Edit: a small comment on using df.name: it is not really stable. If you do things like dropping a column or pickling/unpickling, the name will likely be lost. But you can surely find a good way to maintain the name depending on your needs.
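One option for keeping a name on the frame itself is `DataFrame.attrs`, a dictionary for metadata that pandas provides. The pandas documentation marks it as experimental, so like `df.name` it may not survive every operation. A dictionary that maps names to frames remains the more robust choice. A minimal sketch:

```python
import pandas as pd

new_df = pd.DataFrame({"heights": list(range(20, 90))})
# attrs is a plain dict stored on the DataFrame (experimental in pandas)
new_df.attrs["name"] = "df_section_0"
print(new_df.attrs["name"])
```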
I must write a program that accepts a number, n, where -6 < n < 2. The program must print out the numbers n to n+41 as 6 rows of 7 numbers. The first row must contain the values n to n+6, the second, the values n+7 to n+7+6, and so on.
That is, numbers are printed using a field width of 2, and are right-justified. Fields are separated by a single space. There are no spaces after the final field.
Output:
Enter the start number: -2
-2 -1 0 1 2 3 4
5 6 7 8 9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30 31 32
33 34 35 36 37 38 39
The numbers need to be directly lined under each other.
I have absolutely no idea how to do this.
This is my code so far:
start = int(input('Enter the start number: '))
for n in range(start, start + 42):
    pass  # not sure how to print the numbers as rows from here
If you could help me, I would really appreciate it.
I assume you are not allowed to use a library to tabulate the numbers for you and are expected to do the logic yourself.
You need to print 6 rows of numbers. Start by determining the first number of each row. That is given by range(n, n+42, 7) (note: not n+41). For starting value -2, those are the numbers -2, 5, 12, 19, 26, 33. Every other number in the row is just one of the next 6 integers. If the first number in the row is leftmost, then the entire row is given by range(leftmost, leftmost + 7). So for the first row, those are the numbers -2, -1, 0, 1, 2, 3, 4.
To print 6 rows of 7 numbers, you need a loop with 6 iterations, one for each value of leftmost. Inside that loop you print the numbers in the row. The only complication is that every number in the row must be followed by a space except the last, so the last one needs special treatment.
You need to specify format {0:2d} to ensure that "numbers are printed using a field width of 2".
n = -2
for leftmost in range(n, n + 42, 7):
    for value in range(leftmost, leftmost + 6):
        print("{0:2d}".format(value), end=" ")
    print("{0:2d}".format(leftmost + 6))
-2 -1 0 1 2 3 4
5 6 7 8 9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30 31 32
33 34 35 36 37 38 39
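The special case for the last number can also be avoided with `str.join`, which only puts separators between fields. This sketch wraps the loop in a hypothetical `number_grid` function that returns the text so it can be checked. It keeps the same width-2, right-justified format spec.

```python
def number_grid(n: int) -> str:
    lines = []
    # leftmost takes the values n, n+7, ..., n+35: the first number of each row
    for leftmost in range(n, n + 42, 7):
        # width-2 right-justified fields joined by single spaces, so no trailing space
        lines.append(" ".join(f"{value:2d}" for value in range(leftmost, leftmost + 7)))
    return "\n".join(lines)

print(number_grid(-2))
```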
Check out the tabulate library; you can use it to format the output. The tablefmt="plain" parameter produces a very similar table.
If you store the numbers in a list, you can use list slicing to get rows of 7 numbers each and put those in another list, which is the format tabulate expects.
from tabulate import tabulate
n = 2
while not -6 < n < 2:
    n = int(input('Please submit a number greater than -6 and smaller than 2:\n'))
number_list, output_list = [], []
for i in range(42):
    number_list.append(n + i)
for i in range(6):
    output_list.append(number_list[i*7:i*7+7])
print()
print(
    tabulate(
        output_list,
        tablefmt='plain'
    )
)
Please submit a number greater than -6 and smaller than 2:
-3
-3 -2 -1 0 1 2 3
4 5 6 7 8 9 10
11 12 13 14 15 16 17
18 19 20 21 22 23 24
25 26 27 28 29 30 31
32 33 34 35 36 37 38
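If installing tabulate isn't an option, the same list-slicing step can be written as a comprehension and printed with `str.rjust`. In this sketch the start value is fixed at -3 instead of being read from input.

```python
n = -3
number_list = list(range(n, n + 42))
# slice the flat list into 6 rows of 7: [0:7], [7:14], ..., [35:42]
output_list = [number_list[i:i + 7] for i in range(0, 42, 7)]
for row in output_list:
    print(" ".join(str(value).rjust(2) for value in row))
```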
My dataframe:
A B C A_Q B_Q C_Q
27 40 41 2 1 etc
28 39 40 1 5
30 28 29 3 6
28 27 28 4 1
15 10 11 5 4
17 13 14 1 5
16 60 17 8 10
14 21 18 9 1
20 34 23 10 2
21 45 34 7 4
I want to iterate through each row in every column with a _Q suffix, starting with A_Q, and do the following:
1. If the row value is 1, grab the corresponding value in column A.
2. Assign that value to a variable, call it x.
3. Keep looping down column A_Q.
4. If the row value is any of 1 through 9, ignore it.
5. If the value is 10, get the corresponding value in column A and assign it to a variable y.
6. Calculate the % change between y and x, call it chg: ((y/x)-1)*100.
7. Append chg to the dataframe.
8. Keep going down the column, repeating steps 1-7, until the end.
Then do the same for the other columns B_Q, C_Q, etc.
So, for example, in the data above, the first "1" in A_Q corresponds to 28 in column A, so x = 28. Then keep iterating, ignoring values 1 through 9, until you reach a 10, which corresponds to 20 in column A. Calculate % change = ((20/28)-1)*100 = -28.6% and append it to the df in a newly created column A_S. Then resume from that point with the same steps until you reach the end of the file. Finally, do the same for the rest of the columns.
So then the df would look like:
A B C A_Q B_Q C_Q A_S B_S C_S etc
27 40 41 2 1 etc
28 39 40 1 5
30 28 29 3 6
28 27 28 4 1
15 10 11 5 4
17 13 14 1 5
16 60 17 8 10 50
14 21 18 9 1
20 34 23 10 2 -28.6
21 45 34 7 4
I thought about creating a function and then doing something like df['_S'] = df.apply(function, axis=1), but I am stuck on implementing steps 1-8 above. Thanks!
Do you need to append the results as new columns? You're going to end up with nearly empty columns with just one data value. Could you just append all of the results at the bottom of the _Q columns? Anyway, here's my stab at a function that does everything you asked:
def func(col1, col2):
    l = []
    x = None
    for index in range(0, len(col1)):
        if x is None and col1[index] == 1:
            x = col2[index]
            l.append(0)
        elif x is not None and col1[index] == 10:
            y = col2[index]
            l.append(((float(y)/x)-1)*100)
            x = None
        else:
            l.append(0)
    return l
You'd then pass this function A_Q as col1 and A as col2, and it should return what you want. To apply it to every column, assuming that every A, B, C column has an associated _Q column, you could do something like:
q = [col for col in df.columns if col.endswith('_Q')]
for col in q:
    df[col[:-2] + '_S'] = func(df[col], df[col[:-2]])
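Here is a variant of the function above, checked on the A and B columns of the sample data. C_Q is incomplete in the question, so it is left out. It stores NaN instead of 0 on rows without a result, so the _S columns don't suggest a 0% change, and it reads the columns positionally via `.tolist()`. The function name `pct_change_between_markers` is hypothetical.

```python
import numpy as np
import pandas as pd

def pct_change_between_markers(q_col, value_col):
    """x at a 1, y at the next 10; the % change is stored on the y row (sketch)."""
    result = [np.nan] * len(q_col)
    x = None
    # .tolist() gives positional access regardless of the DataFrame's index
    for pos, (q, value) in enumerate(zip(q_col.tolist(), value_col.tolist())):
        if x is None and q == 1:
            x = value  # remember the value at the first 1
        elif x is not None and q == 10:
            result[pos] = (value / x - 1) * 100
            x = None  # start looking for the next 1
        # any other value (including further 1s while x is set) is ignored
    return result

df = pd.DataFrame({
    "A": [27, 28, 30, 28, 15, 17, 16, 14, 20, 21],
    "B": [40, 39, 28, 27, 10, 13, 60, 21, 34, 45],
    "A_Q": [2, 1, 3, 4, 5, 1, 8, 9, 10, 7],
    "B_Q": [1, 5, 6, 1, 4, 5, 10, 1, 2, 4],
})
for col in [c for c in df.columns if c.endswith("_Q")]:
    base = col[:-2]
    df[base + "_S"] = pct_change_between_markers(df[col], df[base])
print(df)
```

On this data, B_S gets 50.0 on the row where B is 60 (x = 40), and A_S gets about -28.6 on the row where A is 20 (x = 28).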
Suppose:
df['Column_Name'].max()  # the maximum value in a particular column of a dataframe
You want to select the 10 rows before the row that has the maximum value in that column and the 10 rows after it (i.e. 10 + 1 + 10 = 21 rows in total). How can this be done in Python?
Here is an addition to 2rs2ts's solution that accounts for your max value being near the beginning or end of your Series or DataFrame:
df['a'][max(0,index_of_max_value-10):min(len(df['a']), index_of_max_value+11)]
You want to get the index of the row that has the maximum value. Assuming you're using pandas, this can be done with idxmax().
>>> from pandas import DataFrame
>>> data = [{'a':x} for x in range(40)]
>>> from random import shuffle
>>> shuffle(data)
>>> df = DataFrame(data)
>>> index_of_max_value = df['a'].idxmax()
>>> df['a'][max(0,index_of_max_value-10):min(len(df['a']), index_of_max_value+11)]
19 16
20 36
21 8
22 20
23 14
24 31
25 6
26 18
27 17
28 23
29 39
30 5
31 25
32 4
33 12
34 35
35 26
36 0
37 27
38 21
39 30
Name: a, dtype: int64
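The slice above treats the label returned by `idxmax()` as a position. That works here because the index is the default 0..n-1. For an arbitrary index, the sketch below first converts the label to a position with `Index.get_loc` (assuming unique labels), then slices with `.iloc`. The helper name `window_around_max` is hypothetical. The `max(0, ...)` guard is still needed, but the upper bound doesn't need `min`, because `.iloc` slicing already stops at the end.

```python
import pandas as pd

def window_around_max(s: pd.Series, before: int = 10, after: int = 10) -> pd.Series:
    """Rows around the first maximum of s, clipped at both ends (hypothetical helper)."""
    # idxmax returns a label; get_loc turns it into an integer position
    pos = s.index.get_loc(s.idxmax())
    return s.iloc[max(0, pos - before):pos + after + 1]

values = [0] * 50
values[20] = 1
s = pd.Series(values, index=[f"row{i}" for i in range(50)])
print(len(window_around_max(s)))
```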
def tablesOneToTen():  # a function that will print out multiplication tables from 1-10
    x = 1
    y = 1
    while x <= 10 and y <= 12:
        f = x * y
        print(f)
        y = y + 1
        x = x + 1
tablesOneToTen()
I am trying to make a function that will give me values from the multiplication table from 1-10.
Should I add if and elif statements in addition to nested while loops to make this code work?
For this sort of iteration task, you're better off using a for loop, since you already know the boundaries you're working with; Python also makes for loops especially easy to write.
With while loops you have to check that you are in range using conditionals while also explicitly incrementing your counters making mistakes all the more likely.
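The question's code shows this kind of mistake. Its indentation was lost, so this sketch assumes both increments sit inside the while loop. It is rewritten as a hypothetical `trace_original` function that collects the products instead of printing them. Because x and y advance together, the loop only ever visits x == y and produces the squares, not a table.

```python
def trace_original():
    # Same loop as in the question, but collecting instead of printing
    products = []
    x = 1
    y = 1
    while x <= 10 and y <= 12:
        products.append(x * y)
        y = y + 1
        x = x + 1  # x and y advance together, so only the diagonal x == y is visited
    return products

print(trace_original())  # [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```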
Since you need multiplication tables for values of x and y ranging from 1 to 10, you can create two nested for loops (which is also a good way to get familiar with loops):
def tablesOneToTen():  # a function that will print out multiplication tables from 1-10
    # This will iterate with values for x in the range [1-10]
    for x in range(1, 11):
        # Print the value of x for reference
        print("Table for {} * (1 - 10)".format(x))
        # iterate for values of y in a range [1-10]
        for y in range(1, 11):
            # Print the result of the multiplication
            print(x * y, end=" ")
        # Print a new line.
        print()
Running this will give you the tables you need:
Table for 1 * (1 - 10)
1 2 3 4 5 6 7 8 9 10
Table for 2 * (1 - 10)
2 4 6 8 10 12 14 16 18 20
Table for 3 * (1 - 10)
3 6 9 12 15 18 21 24 27 30
With a while loop, the logic is similar but more verbose than it needs to be, since you must initialize the counter, evaluate the condition, and increment the counter yourself.
As a testament to its ugliness, the while loop would look something like this:
def tablesOneToTen():
    # initialize x counter
    x = 1
    # first condition
    while x <= 10:
        # print reference message
        print("Table for {} * [1-10]".format(x))
        # initialize y counter
        y = 1
        # second condition
        while y <= 10:
            # print values
            print(x*y, end=" ")
            # increment y
            y += 1
        # print a new line
        print(" ")
        # increment x
        x += 1
Using Python 3
for i in range(1, 10+1):
    for j in range(i, (i*10)+1):
        if (j % i == 0):
            print(j, end="\t")
    print()
or:
for i in range(1, 10+1):
    for j in range(i, (i*10)+1, i):
        print(j, end="\t")
    print()
Output:
1 2 3 4 5 6 7 8 9 10
2 4 6 8 10 12 14 16 18 20
3 6 9 12 15 18 21 24 27 30
4 8 12 16 20 24 28 32 36 40
5 10 15 20 25 30 35 40 45 50
6 12 18 24 30 36 42 48 54 60
7 14 21 28 35 42 49 56 63 70
8 16 24 32 40 48 56 64 72 80
9 18 27 36 45 54 63 72 81 90
10 20 30 40 50 60 70 80 90 100
I hope this helps you print the 1 to 10 tables.
a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for i in a:
    print(*("{:3}".format(i * col) for col in a))
print()