Find and replace values in df column in Python


I have a lot of data calculated and stored in a dataframe. A small example from the datatable:
First I calculate all the values in the third column. After that I want to change every value that is greater than 2. Is there a function that finds all the values greater than 2 and replaces them with another value?
I can only find a function to replace in a dataframe when a specific value is present, but I can't determine all the values and the location in the column up front.
The function I tried: df.loc[df['Zelfconsumptie'] > 2, 'Zelfconsumptie'] = 2

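To find all values in the column 'Zelfconsumptie' of df that are greater than 2 and set them to 2, select the column as well as the rows. Indexing with the mask alone (df[df['Zelfconsumptie'] > 2] = 2) overwrites every column of the matching rows, so use:
df.loc[df['Zelfconsumptie'] > 2, 'Zelfconsumptie'] = 2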
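As a minimal sketch of capping a single column, the example below uses a small made-up frame; only the column name 'Zelfconsumptie' comes from the question, and the 'Productie' column and all values are assumptions added to show that other columns stay untouched. It shows two equivalent options: a boolean mask with .loc, and Series.clip with an upper bound.

```python
import pandas as pd

# Hypothetical sample data; only the column name 'Zelfconsumptie' is from the question.
df = pd.DataFrame({
    "Productie": [5.0, 7.5, 3.2, 9.1],
    "Zelfconsumptie": [1.5, 2.7, 0.4, 3.9],
})

# Option 1: boolean mask plus column label, so only this column is written.
capped = df.copy()
capped.loc[capped["Zelfconsumptie"] > 2, "Zelfconsumptie"] = 2

# Option 2: clip() caps every value above the upper bound in one call.
clipped = df.copy()
clipped["Zelfconsumptie"] = clipped["Zelfconsumptie"].clip(upper=2)

print(capped)
```

clip is convenient when the replacement value equals the threshold; the .loc form also works when the replacement is some other value.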

Related

How to get the whole row with column names while looking for a minimum value?

I have a data frame. There are four columns.
I can find a minimum number using this code:
df_temp=df_A2C.loc[ (df_A2C['TO_ID'] == 7)]
mini_value = df_temp['DURATION_H'].min()
print("minimum value in column 'TO_ID' is: " , mini_value)
Output:
minimum value in column 'TO_ID' is: 0.434833333333333
Now, I am trying to get the whole row with all column names while looking for a minimum value using TO_ID. Something like this.
How can we get the whole row with all column names while looking for a minimum value?

If you had posted the data as code or text, I would have been able to share the result.
Assumption: you're searching for the minimum value for a specific TO_ID.
# as per your code, filter out by to_id
# sort the result on duration and take the top value
df_A2C.loc[ (df_A2C['TO_ID'] == 7)].sort_values('DURATION_H').head(1)
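An alternative that avoids sorting is to look up the index label of the minimum with idxmin and select that row. This is only a sketch: the frame below is made up, and FROM_ID is a hypothetical column; only TO_ID and DURATION_H are names from the question.

```python
import pandas as pd

# Hypothetical data; only TO_ID and DURATION_H are names from the question.
df_A2C = pd.DataFrame({
    "FROM_ID": [1, 2, 3, 4],
    "TO_ID": [7, 7, 5, 7],
    "DURATION_H": [1.25, 0.43, 0.10, 2.00],
})

# Keep only the rows for the TO_ID of interest
df_temp = df_A2C.loc[df_A2C["TO_ID"] == 7]

# idxmin() returns the index label of the smallest DURATION_H in the subset
min_label = df_temp["DURATION_H"].idxmin()

# Double brackets keep the result as a one-row DataFrame with all column names
min_row = df_temp.loc[[min_label]]
print(min_row)
```

Note that idxmin raises an error when the filtered subset is empty, so a check such as df_temp.empty may be needed if the TO_ID might not be present.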

How to change DataFrame column values so that mean is modified accordingly?

I have a Pandas DataFrame extracted from Estespark Weather for the dates between Sep-2009 and Oct-2018, and the mean of the Average windspeed column is 4.65. I am taking a challenge where there is a sanity check where the mean of this column needed to be 4.64. How can I modify the values of this column so that the mean of this column becomes 4.64? Is there any code solution for this, or do we have to do it manually?

I can see two solutions:
1. Subtract 0.01 (4.65 - 4.64) from every value of that column:
df['AvgWS'] -= 0.01
2. If you don't want to alter all rows, find which rows you could remove to get the desired mean (if there are any):
current_mean = df['AvgWS'].mean()  # about 4.65; use the exact value rather than the rounded one
desired_mean = 4.64
n_rows = len(df['AvgWS'])
# mean of the remaining rows if row x were removed, compared after rounding to 2 decimals
# (assumes the sanity check rounds the same way; an exact float == would almost never match)
df['can_remove'] = df['AvgWS'].map(lambda x: round((current_mean*n_rows - x)/(n_rows-1), 2) == desired_mean)
This creates a new boolean column in your dataframe with True in the rows that, if removed, make the mean of the rest of the column equal to 4.64. If there is more than one, you can analyse them to choose which one seems least important and then remove that one.
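For the first option, the shift can be computed from the column's actual mean instead of being hard-coded. The sketch below uses four made-up values whose mean is 4.65; 'AvgWS' is the column name used in the answer, and 'AvgWS_shifted' is a hypothetical name introduced here so the original column is preserved.

```python
import pandas as pd

# Hypothetical values; 'AvgWS' is the column name used in the answer.
df = pd.DataFrame({"AvgWS": [3.9, 4.4, 5.1, 5.2]})

desired_mean = 4.64

# Compute the shift from the actual mean instead of hard-coding 0.01
shift = df["AvgWS"].mean() - desired_mean

# Subtracting a constant from every value lowers the mean by that constant
df["AvgWS_shifted"] = df["AvgWS"] - shift

print(round(df["AvgWS_shifted"].mean(), 2))
```

Whether altering measured data like this is acceptable for the challenge is a separate question; the code only shows the arithmetic.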

Finding rows with highest means in dataframe

I am trying to find the rows, in a very large dataframe, with the highest mean.
Reason: I scan something with laser trackers and use a "higher" point as a reference for where the scan starts. I am trying to find the placed object throughout my data.
I have calculated the mean of each row with:
base = df.mean(axis=1)
base.columns = ['index','Mean']
Here is an example of the mean for each row:
0 4.407498
1 4.463597
2 4.611886
3 4.710751
4 4.742491
5 4.580945
This seems to work fine, except that it adds an index column, and gives out columns with an index of type float64.
I then tried this to locate the rows with highest mean:
moy = base.loc[base.reset_index().groupby(['index'])['Mean'].idxmax()]
This gives:
index Mean
0 0 4.407498
1 1 4.463597
2 2 4.611886
3 3 4.710751
4 4 4.742491
5 5 4.580945
But it only re-indexes (I now have 3 columns instead of two) and does nothing else. It still shows all rows.

Here is one way without using groupby. Since df.mean(axis=1) returns a Series, sort it directly and take the last entry:
moy = base.sort_values().tail(1)
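Another option is Series.nlargest, which returns the highest values together with their index labels and so also covers the case of wanting several rows. The sketch below assumes a made-up frame with two hypothetical reading columns, "a" and "b"; the name "Mean" follows the question.

```python
import pandas as pd

# Hypothetical scan data: each row is one measurement, columns are readings.
df = pd.DataFrame({
    "a": [4.1, 4.5, 4.9, 4.3],
    "b": [4.3, 4.7, 4.7, 4.5],
})

# df.mean(axis=1) returns a Series indexed like df; name it with rename()
# (a Series has no .columns attribute to assign to)
base = df.mean(axis=1).rename("Mean")

# Index labels and values of the two rows with the highest mean
top = base.nlargest(2)

# Pull the full original rows for those labels
top_rows = df.loc[top.index]
print(top)
```

If only the single best row is needed, base.idxmax() gives its index label directly.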

It looks as though your data is a string or a single column with a space between the two numbers. I suggest splitting the column into two and/or using something like the code below to set the index to your column of interest.
import pandas as pd
# raw string so that \s is passed to the regex engine unchanged
df = pd.read_csv('testdata.txt', names=["Index", "Mean"], delimiter=r"\s+")
df = df.set_index("Index")
print(df)

Update Pandas dataframe value based on present value

I have a Pandas dataframe with values which should lie between, say, 11-100. However, sometimes I'll have values between 1-10, and this is because the person who was entering that row used a convention that the value in question should be multiplied by 10. So what I'd like to do is run a Pandas command which will fix those particular rows by multiplying their value by 10.
I can reference the values in question by doing something like
my_dataframe[my_dataframe['column_name']<10]
and I could set them all to a particular value, like 50, like so
my_dataframe[my_dataframe['column_name']<10] = 50
but how do I set them to a value which is 10* the value of that particular row?

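I think you can use .loc with the column name, so that only that column of the matching rows is multiplied:
my_dataframe.loc[my_dataframe['column_name']<10, 'column_name'] *= 10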
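A minimal sketch of scaling only the out-of-range cells of one column follows. The frame is made up: 'column_name' is the placeholder from the question, while the 'label' column and the values are assumptions added to show that other columns are left alone. Selecting rows without naming the column would apply the multiplication to every column of those rows.

```python
import pandas as pd

# Hypothetical frame; 'column_name' is the placeholder used in the question.
my_dataframe = pd.DataFrame({
    "label": ["a", "b", "c", "d"],
    "column_name": [42, 5, 87, 3],
})

# Rows whose value was entered using the divide-by-10 convention
mask = my_dataframe["column_name"] < 10

# Multiply only the masked cells of this one column
my_dataframe.loc[mask, "column_name"] *= 10
print(my_dataframe)
```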

Rejecting zero values when creating a list of minimum values. (Python Field Calc)

I'm trying to create a list of minimum values from four columns of values. Below is the statement I have used.
min ([!Depth!, !Depth_1!, !Depth_12!, !Depth_1_13!])
The problem I'm having is that some of the fields under these columns contain zeros. I need it to return the next lowest value from the columns that is greater than zero.
I have an attribute table for a shapefile from an ArcGIS document. It has 10 columns. ID, Shape, Buffer ID (x4), Depth (x4).
I need to add an additional column to this data which represents the minimum number from the 4 depth columns. Many of the cells in this column are equal to zero. I need the new column to take the minimum value from the four depth columns but ignore the zero values and take the next lowest value.
A screen shot of what I am working from:

Create a function that does it for you. I added a pic so you can follow the steps. Just change the input names to your column names.
def my_min(d1,d2,d3,d4):
    lst = [d1,d2,d3,d4]
    return min([x for x in lst if x !=0])
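One edge case is a row where all four depths are zero: min() raises a ValueError on an empty list. The sketch below is a plain-Python variant that guards against that; the name min_nonzero and the choice to return None are assumptions made here, and how the function is wired into the Field Calculator depends on the ArcGIS setup, which the thread does not show.

```python
def min_nonzero(*depths):
    """Return the smallest non-zero value, or None if every value is zero or missing."""
    # Drop zeros and missing values before taking the minimum
    candidates = [d for d in depths if d not in (0, None)]
    # min() raises ValueError on an empty sequence, so guard that case
    return min(candidates) if candidates else None


print(min_nonzero(0, 3.5, 1.2, 0))
```

Returning None leaves the new field empty for such rows; a sentinel value could be returned instead if the field does not allow nulls.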
