Find and replace values in df column in Python - python

I have a lot of data calculated and stored in a dataframe. A small example from the datatable:
At first I calculate all the values in the third column. After that I want to change every value that is bigger than 2. Is there a function that finds all the values bigger than 2 and replaces them with another value?
I can only find a function to replace in a dataframe when a specific value is present, but I can't determine all the values and the location in the column up front.
The function I tried: df.loc[df['Zelfconsumptie'] > 2, 'Zelfconsumptie'] = 2

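To find all values in the column 'Zelfconsumptie' of df that are greater than 2 and set them to 2, name the column explicitly in .loc. Without a column label, df[df['Zelfconsumptie'] > 2] = 2 would overwrite every column in the matching rows:
df.loc[df['Zelfconsumptie'] > 2, 'Zelfconsumptie'] = 2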
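A minimal sketch of capping a column at 2, assuming a small made-up DataFrame (only the column name Zelfconsumptie comes from the question). It compares a .loc assignment restricted to that column with Series.clip, which is an alternative technique and not the one used in the answer:

```python
import pandas as pd

# Hypothetical sample data; only the column name is taken from the question.
df = pd.DataFrame({"Zelfconsumptie": [0.5, 1.8, 2.4, 3.1]})

# Option 1: boolean mask plus column label, so only this column is changed.
capped_loc = df.copy()
capped_loc.loc[capped_loc["Zelfconsumptie"] > 2, "Zelfconsumptie"] = 2

# Option 2: Series.clip caps every value at the given upper bound.
capped_clip = df.copy()
capped_clip["Zelfconsumptie"] = capped_clip["Zelfconsumptie"].clip(upper=2)

print(capped_loc)
print(capped_clip)
```

Both options leave values of 2 or less untouched; clip is probably the shorter choice when the replacement value equals the threshold.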

Related

How to get the whole row with column names while looking for a minimum value?

I have a data frame. There are four columns.
I can find a minimum number using this code:
df_temp=df_A2C.loc[ (df_A2C['TO_ID'] == 7)]
mini_value = df_temp['DURATION_H'].min()
print("minimum value in column 'TO_ID' is: " , mini_value)
Output:
minimum value in column 'TO_ID' is: 0.434833333333333
Now, I am trying to get the whole row with all column names while looking for a minimum value using TO_ID. Something like this.
How can we get the whole row with all column names while looking for a minimum value?
If you had posted the data as code or text, I could have shared the result.
Assumption: you are searching for the minimum value for a specific TO_ID.
# as per your code, filter out by to_id
# sort the result on duration and take the top value
df_A2C.loc[ (df_A2C['TO_ID'] == 7)].sort_values('DURATION_H').head(1)
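As an alternative sketch, idxmin can return the index label of the minimum, which .loc then turns into the full row. The data below is made up, and FROM_ID is a hypothetical column added only to show that all columns are kept:

```python
import pandas as pd

# Hypothetical data; TO_ID and DURATION_H are the column names from the question.
df_A2C = pd.DataFrame({
    "FROM_ID": [1, 2, 3, 4],  # hypothetical extra column
    "TO_ID": [7, 7, 5, 7],
    "DURATION_H": [1.25, 0.43, 0.10, 2.00],
})

# Filter by TO_ID first, as in the question.
df_temp = df_A2C.loc[df_A2C["TO_ID"] == 7]

# idxmin gives the index label of the smallest DURATION_H in the subset;
# wrapping it in a list makes .loc return a one-row DataFrame.
min_row = df_temp.loc[[df_temp["DURATION_H"].idxmin()]]
print(min_row)
```

Unlike sorting, this avoids reordering the whole subset, though it returns only the first row if several rows tie for the minimum.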

How to change DataFrame column values so that mean is modified accordingly?

I have a Pandas DataFrame extracted from Estespark Weather for the dates between Sep-2009 and Oct-2018, and the mean of the Average windspeed column is 4.65. I am taking a challenge with a sanity check that requires the mean of this column to be 4.64. How can I modify the values of this column so that its mean becomes 4.64? Is there a code solution for this, or do we have to do it manually?
I can see two solutions:
1. Subtract 0.01 (4.65 - 4.64) from every value of that column:
df['AvgWS'] -= 0.01
2. If you don't want to alter all rows, find which rows you could remove to give the desired mean (if there are any):
current_mean = 4.65
desired_mean = 4.64
n_rows = len(df['AvgWS'])
df['can_remove'] = df['AvgWS'].map(lambda x: round((current_mean*n_rows - x)/(n_rows-1), 2) == desired_mean)
This creates a new boolean column in your dataframe with True in the rows that, if removed, leave the rest of the column with a mean of 4.64. The result is rounded to two decimals because an exact floating-point comparison would almost never match. If there is more than one such row, you can analyse them to choose which one seems least important and then remove that one.
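A small sketch of the first option, assuming made-up wind speed values chosen so that their mean is 4.65. Instead of hard-coding 0.01, it computes the shift from the actual mean, and it writes the result to a hypothetical new column AvgWS_shifted so the original stays intact:

```python
import pandas as pd

# Hypothetical values; only the column name AvgWS comes from the answer.
df = pd.DataFrame({"AvgWS": [4.0, 5.0, 4.5, 5.1]})

desired_mean = 4.64

# Shifting every value by the same amount shifts the mean by that amount.
shift = df["AvgWS"].mean() - desired_mean
df["AvgWS_shifted"] = df["AvgWS"] - shift

print(df["AvgWS"].mean(), df["AvgWS_shifted"].mean())
```

Computing the shift from the data avoids depending on a mean that was rounded for display.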

Finding rows with highest means in dataframe

I am trying to find the rows, in a very large dataframe, with the highest mean.
Reason: I scan something with laser trackers and use a "higher" point as a reference for where the scan starts. I am trying to find the placed object throughout my data.
I have calculated the mean of each row with:
base = df.mean(axis=1)
base.columns = ['index','Mean']
Here is an example of the mean for each row:
0 4.407498
1 4.463597
2 4.611886
3 4.710751
4 4.742491
5 4.580945
This seems to work fine, except that it adds an index column, and gives out columns with an index of type float64.
I then tried this to locate the rows with highest mean:
moy = base.loc[base.reset_index().groupby(['index'])['Mean'].idxmax()]
This gives:
index Mean
0 0 4.407498
1 1 4.463597
2 2 4.611886
3 3 4.710751
4 4 4.742491
5 5 4.580945
But it only re-indexes (I now have three columns instead of two) and does nothing else. It still shows all rows.
Here is one way without using groupby (base is a Series, so sort_values takes no column name):
moy = base.sort_values().tail(1)
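As a sketch of a related approach, Series.nlargest and Series.idxmax pick out the rows with the highest means directly. The DataFrame below is made up, with hypothetical column names a and b standing in for the scan columns:

```python
import pandas as pd

# Hypothetical scan data: three rows, two measurement columns.
df = pd.DataFrame({"a": [4.0, 4.5, 4.9], "b": [4.8, 4.4, 4.6]})

# Row-wise mean as a named Series; its index is the original row index.
base = df.mean(axis=1).rename("Mean")

# The two rows with the highest mean, largest first.
top = base.nlargest(2)

# Index label of the single row with the highest mean.
best_label = base.idxmax()

print(top)
print(df.loc[[best_label]])
```

Because base keeps the original index, the labels in top can be passed straight to df.loc to recover the full rows.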
It looks as though your data is a string or single column with a space in between your two numbers. Suggest splitting the column into two and/or using something similar to below to set the index to your specific column of interest.
import pandas as pd
df = pd.read_csv('testdata.txt', names=["Index", "Mean"], delimiter=r"\s+")
df = df.set_index("Index")
print(df)

Update Pandas dataframe value based on present value

I have a Pandas dataframe with values which should lie between, say, 11-100. However, sometimes I'll have values between 1-10, and this is because the person who was entering that row used a convention that the value in question should be multiplied by 10. So what I'd like to do is run a Pandas command which will fix those particular rows by multiplying their value by 10.
I can reference the values in question by doing something like
my_dataframe[my_dataframe['column_name']<10]
and I could set them all to a particular value, like 50, like so
my_dataframe[my_dataframe['column_name']<10] = 50
but how do I set them to a value which is 10* the value of that particular row?
I think you can use .loc with the column label, so that only that column is multiplied:
my_dataframe.loc[my_dataframe['column_name']<10, 'column_name'] *= 10
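A minimal sketch on made-up data, showing the multiplication restricted to one column with .loc. The column other is hypothetical and is included only to show that it stays unchanged:

```python
import pandas as pd

# Hypothetical data; column_name is the placeholder name from the question.
my_dataframe = pd.DataFrame({
    "column_name": [5, 42, 9, 100],
    "other": [1, 2, 3, 4],  # hypothetical second column
})

# Rows whose value was entered on the "divide by 10" convention.
mask = my_dataframe["column_name"] < 10

# Multiply only column_name in those rows.
my_dataframe.loc[mask, "column_name"] *= 10

print(my_dataframe)
```

Selecting with the mask alone would apply the multiplication to every column of the matching rows, which is usually not what is wanted when the frame has more than one column.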

Rejecting zero values when creating a list of minimum values. (Python Field Calc)

I'm trying to create a list of minimum values from four columns of values. Below is the statement I have used.
min ([!Depth!, !Depth_1!, !Depth_12!, !Depth_1_13!])
The problem I'm having is that some of the fields under these columns contain zeros. I need it to return the next lowest value from the columns that is greater than zero.
I have an attribute table for a shapefile from an ArcGIS document. It has 10 columns. ID, Shape, Buffer ID (x4), Depth (x4).
I need to add an additional column to this data which represents the minimum number from the 4 depth columns. Many of the cells in this column are equal to zero. I need the new column to take the minimum value from the four depth columns but ignore the zero values and take the next lowest value.
A screen shot of what I am working from:
Create a function that does it for you. I added a pic so you can follow the steps. Just change the input names to your column names.
def my_min(d1, d2, d3, d4):
    lst = [d1, d2, d3, d4]
    return min([x for x in lst if x != 0])
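The function above raises ValueError when all four depths are zero, because min receives an empty list. A hedged variant is sketched below; the name min_nonzero and the default parameter are hypothetical, and it assumes every depth is a number:

```python
def min_nonzero(*depths, default=None):
    """Return the smallest value greater than zero, or default if none exists."""
    # Keep only values strictly greater than zero, as the question asks.
    positive = [d for d in depths if d > 0]
    # min() on an empty list raises ValueError, so fall back to default.
    return min(positive) if positive else default


print(min_nonzero(0, 3.5, 2.0, 0))
print(min_nonzero(0, 0, 0, 0))
```

Whether None is an acceptable fallback depends on the target field type; a sentinel number could be passed as default instead.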
