*edited DataFrame random generator
I have 2 dfs, one used as a mask for the other.
rndm = pd.DataFrame(np.random.randint(0,15,size=(100, 4)), columns=list('ABCD'))
rndm_mask = pd.DataFrame(np.random.randint(0,2,size=(100, 4)), columns=list('ABCD'))
I want to use 2 conditions to change the values in rndm:
Is the value the mode of the column?
rndm_mask == 1
What works so far:
def colorBoolean(val):
    return f'background-color: {"red" if val else ""}'
rndm.style.apply(lambda _: rndm_mask.applymap(colorBoolean), axis=None)
# helper function to find Mode
def highlightMode(s):
    # Get mode of columns
    mode_ = s.mode().values
    # Apply style if the current value is in mode_ array (len==1)
    return ['background-color: yellow' if v in mode_ else '' for v in s]
Issue:
I'm unsure how to chain both functions so that values in rndm are highlighted only if they match both criteria (i.e. the value must be the most frequent value in its column and also be True in rndm_mask).
I appreciate any advice! Thanks
Try this. Since your df_bool dataframe is a mask (identically indexed), you can refer to the df_bool object inside the style function, where x.name is the name of the column passed in via df.style.apply:
import pandas as pd
df = pd.DataFrame({'A':[5.5, 3, 0, 3, 1],
                   'B':[2, 1, 0.2, 4, 5],
                   'C':[3, 1, 3.5, 6, 0]})
df_bool = pd.DataFrame({'A':[0, 1, 0, 0, 1],
                        'B':[0, 0, 1, 0, 0],
                        'C':[1, 1, 1, 0, 0]})
# I want to use 2 conditions to change the values in df:
# Is the value the mode of the column?
# df_bool == 1
# What works so far:
def colorBoolean(x):
    return ['background-color: red' if v else '' for v in df_bool[x.name]]
# helper function to find Mode
def highlightMode(s):
    # Get mode of columns
    mode_ = s.mode().values
    # Apply style if the current value is in mode_ array (len==1)
    return ['background-color: yellow' if v in mode_ else '' for v in s]
df.style.apply(colorBoolean).apply(highlightMode)
Output:
Or the other way:
df.style.apply(highlightMode).apply(colorBoolean)
Output:
Update
Highlight where both are true:
def highlightMode(s):
    # Get mode of columns
    mode_ = s.mode().values
    # Apply style only if the value is in the mode_ array and the matching df_bool cell is truthy
    return ['background-color: yellow' if (v in mode_) and b else '' for v, b in zip(s, df_bool[s.name])]
df.style.apply(highlightMode)
Output:
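If you'd rather not reach for a global df_bool inside the style function, the same "mode AND mask" test can be computed for the whole frame at once. This is only a sketch: style_mode_and_mask and its css parameter are names I made up, and it assumes the mask has the same index and columns as the data.

```python
import numpy as np
import pandas as pd

def style_mode_and_mask(data, mask, css="background-color: yellow"):
    """Return a DataFrame of CSS strings with the same shape as `data`."""
    # True where a value is one of its column's modes (a column can have several)
    is_mode = data.apply(lambda s: s.isin(s.mode()))
    # True only where the mode test and the mask both hold
    both = is_mode & mask.astype(bool)
    return pd.DataFrame(np.where(both, css, ""), index=data.index, columns=data.columns)

df = pd.DataFrame({'A': [5.5, 3, 0, 3, 1],
                   'B': [2, 1, 0.2, 4, 5],
                   'C': [3, 1, 3.5, 6, 0]})
df_bool = pd.DataFrame({'A': [0, 1, 0, 0, 1],
                        'B': [0, 0, 1, 0, 0],
                        'C': [1, 1, 1, 0, 0]})

styles = style_mode_and_mask(df, df_bool)
```

To render it, pass the function to the Styler with axis=None, e.g. df.style.apply(style_mode_and_mask, mask=df_bool, axis=None). Note that a column whose values are all unique has every value as a mode, so there the mask alone decides.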
Suppose we have a categorical variable
Age['0-17','18-25','35-40','55+']
Which should we prefer: OneHotEncoding, LabelEncoding, or a manual mapping (such as '0-17': 1, '18-25': 2), and why?
You can build such a mapping in pure Python like this:
age = ['0-17','18-25','35-40','40-55', '55-70', '70-85', '85+']
rng = range(len(age))
# If you want the labels to start from 1:
# rng = range(1, len(age) + 1)
res = dict(zip(age, rng))
print(res)
Output:
{'0-17': 0, '18-25': 1, '35-40': 2, '40-55': 3, '55-70': 4, '70-85': 5, '85+': 6}
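For comparison, here is a rough sketch of the same ordinal idea in pandas next to a one-hot encoding. The ages series is made-up sample data, and which encoding works better depends on the model, so treat this as an illustration rather than a recommendation.

```python
import pandas as pd

age_bands = ['0-17', '18-25', '35-40', '55+']          # bands in their natural order
ages = pd.Series(['18-25', '0-17', '55+', '18-25'])    # hypothetical sample column

# Ordinal encoding: an ordered categorical keeps the ranking of the bands
ordered = pd.Categorical(ages, categories=age_bands, ordered=True)
ordinal_codes = [int(c) for c in ordered.codes]

# One-hot encoding: one indicator column per band that occurs, no ordering implied
one_hot = pd.get_dummies(ages)
```

The ordinal codes follow the position of each band in age_bands, while get_dummies only creates columns for the bands that actually appear in the data.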
I'm trying to make a function that, for each entry in a dictionary, takes the mean for every entry's list of values and returns the key that corresponds to the lowest mean. If negative values are present anywhere in the list, it should first remove them then compute the means as normally.
For example:
least_mean({'Option1': [0, 1, 2], 'Option2': [8, 9, -9999],'Option3': [0, -9999, 5, 3]})
should return 'Option1' because it has the lowest mean with a value of 1.5
My Attempt
def least_mean(string):
    empty = []
    for i in string:
        empty.append((i,sum([j for j in string[i] if j > 0])/len([j for j in string[i] if j > 0])))
    return empty
I have created a function that returns a list of tuples containing each option and their mean. However, I'm not sure how to make this function more specific to return 'Option1' by itself. For example, plugging ({'Option1': [0, 1, 2], 'Option2': [8, 9, -9999],'Option3': [0, -9999, 5, 3]}) in returns
[('Option1', 1.5), ('Option2', 8.5), ('Option3', 4.0)]
but I would like to get 'Option1' alone. If possible, could this be done in a one line list comprehension without imports?
You can use min() with a custom key parameter to find the key corresponding to the minimum mean, ignoring negative values. No list comprehension needed:
from statistics import mean
data = {'Option1': [0, 1, 2], 'Option2': [8, 9, -9999],'Option3': [0, -9999, 5, 3]}
result = min(data, key=lambda x: mean(filter(lambda x: x >= 0, data[x])))
print(result)
If you don't want to use the statistics import, you can use:
result = min(data, key=lambda x: sum(filter(lambda x: x >= 0, data[x])) \
/ sum(1 for _ in filter(lambda x: x >= 0, data[x])))
This outputs:
Option1
If you want to add to your existing approach to extract the minimum option, use:
data = [('Option1', 1.5), ('Option2', 8.5), ('Option3', 4.0)]
result, _ = min(data, key=lambda x: x[1])
print(result)
This also outputs:
Option1
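Putting it together as a function without imports could look like the sketch below. Two assumptions on my part: zeros are kept (only negative values are dropped, as in the min() examples above, so Option1's mean is 1.0 rather than the 1.5 from the question's j > 0 filter), and options with no non-negative values are skipped to avoid dividing by zero.

```python
def least_mean(options):
    """Return the key whose non-negative values have the lowest mean."""
    means = {}
    for key, values in options.items():
        kept = [v for v in values if v >= 0]   # drop negative entries only
        if kept:                               # skip options with nothing left
            means[key] = sum(kept) / len(kept)
    return min(means, key=means.get)

result = least_mean({'Option1': [0, 1, 2],
                     'Option2': [8, 9, -9999],
                     'Option3': [0, -9999, 5, 3]})
```

Here result is 'Option1'.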
Given an input dataframe and string:
df = pd.DataFrame({"A" : [10, 20, 30], "B" : [0, 1, 8]})
colour = "green" #or "red", "blue" etc.
I want to add a new column df["C"] conditional on the values in df["A"], df["B"] and colour so it looks like:
df = pd.DataFrame({"A" : [4, 2, 10], "B" : [1, 4, 3], "C" : [True, True, False]})
So far, I have a function that works for just the input values alone:
def check_passing(colour, A, B):
    if colour == "red":
        if B < 5:
            return True
        else:
            return False
    if colour == "blue":
        if B < 10:
            return True
        else:
            return False
    if colour == "green":
        if B < 5:
            if A < 5:
                return True
            else:
                return False
        else:
            return False
How would you go about using this function in df.assign() so that it calculates this for each row? Specifically, how do you pass each column to check_passing()?
df.assign() allows you to refer to the columns directly or in a lambda, but doesn't work within a function as you're passing in the entire column:
df = df.assign(C = check_passing(colour, df["A"], df["B"]))
Is there a way to avoid a long and incomprehensible lambda? Open to any other approaches or suggestions!
Applying a function like that row by row can be inefficient, especially when dealing with dataframes with many rows. Here is a vectorized one-liner that mirrors the thresholds in check_passing:
colour = "green" #or "red", "blue" etc.
df['C'] = ((colour == 'red') & df['B'].lt(5)) | ((colour == 'blue') & df['B'].lt(10)) | ((colour == 'green') & df['B'].lt(5) & df['A'].lt(5))
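If the one-liner gets hard to read, one option is to keep the rules in a dict and still pass whole columns. A sketch only: passing_mask is a name I invented, and returning all False for an unknown colour is my assumption (the original check_passing returns None in that case).

```python
import pandas as pd

def passing_mask(frame, colour):
    """Vectorized version of check_passing: returns one boolean per row."""
    rules = {
        "red": frame["B"].lt(5),
        "blue": frame["B"].lt(10),
        "green": frame["B"].lt(5) & frame["A"].lt(5),
    }
    # Assumption: an unknown colour passes nothing
    return rules.get(colour, pd.Series(False, index=frame.index))

df = pd.DataFrame({"A": [10, 20, 30], "B": [0, 1, 8]})
colour = "red"
result = df.assign(C=lambda d: passing_mask(d, colour))
```

This works with df.assign() because the lambda receives the whole dataframe and the comparisons run on entire columns, so no per-row call is needed.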
I want to fill missing values like this:
data = pd.read_csv("E:\\SPEED.csv")
Case - 1
if fclass is "motorway", "motorway_link", "trunk" or "trunk_link"
I want to replace the "nan" values with 110
Case - 2
if fclass is "primary", "primary_link", "secondary" or "secondary_link"
I want to replace the "nan" values with 70
Case - 3
if "fclass" is any other value, I want to replace the "nan" values with 40.
I would be grateful for any help.
Two ways in pandas:
import numpy as np
import pandas as pd
df = pd.DataFrame(
    {
        "A": [1, 2, np.nan, 4],
        "B": [1, 4, 9, np.nan],
        "C": [1, 2, 3, 5],
        "D": list("abcd"),
    }
)
fillna lets you fill NA's (or NaNs) with a fixed value:
df['B'].fillna(12)
[1,4,9,12]
interpolate fills them by interpolation, linear by default (several of the other methods rely on scipy):
df['A'].interpolate()
[1,2,3,4]
Thank you all for your answers. However, as there are 6812 rows and 16 columns (containing nan values) in the data, it seems that different solutions are required.
You can try this
import pandas as pd
def valuesMapper(data, valuesDict, columns_to_update, default=40):
    # Look up each row's fill value from its road class; classes not in valuesDict get the default
    # (assumes the road-class column is named "fclass", as in the question)
    fill = data["fclass"].map(valuesDict).fillna(default)
    for i in columns_to_update:
        # Only NaN cells are replaced; existing values are kept
        data[i] = data[i].fillna(fill)
    return data
data = pd.read_csv("E:\\SPEED.csv")
valuesDict = {"motorway":110, "motorway_link":110, "trunk":110, "trunk_link":110, "primary":70, "primary_link":70, "secondary":70, "secondary_link":70}
columns_to_update = ['AGU_PZR_07_10'] # list of columns to be updated; you can build it in code, I didn't add that as I don't have your data
print(valuesMapper(data, valuesDict, columns_to_update))
With the below example:
import numpy as np
import pandas as pd
data = pd.DataFrame({
    'flclass': ['a', 'b', 'c', 'a'],
    'AGU': [float('nan'), float('nan'), float('nan'), 9]
})
You can update it using numpy conditionals, iterating over the value columns (here every column after the first, data.columns[1:]; adjust the slice to match your data):
for column in data.columns[1:]:
    data[column] = np.where((data['flclass'] == 'b') & (data[column].isna()), 110, data[column])
Or pandas apply:
data['AGU'] = data.apply(
    lambda row: 110 if np.isnan(row['AGU']) and row['flclass'] in ("b","a") else row['AGU'],
    axis=1,
)
where you can replace ("b","a") with eg ("motorway", "motorway_link", "trunk", "trunk_link")
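To cover all three cases in one pass over many columns, something along these lines might work. It is a sketch under assumptions: the class column is called fclass, the sample frame and the speed_a/speed_b column names are invented, and fill_speeds is a hypothetical helper, not part of pandas.

```python
import numpy as np
import pandas as pd

FAST = ["motorway", "motorway_link", "trunk", "trunk_link"]
MEDIUM = ["primary", "primary_link", "secondary", "secondary_link"]

def fill_speeds(data, columns, class_col="fclass", default=40):
    """Fill NaNs in `columns` with 110 / 70 / default depending on the road class."""
    fill = pd.Series(
        np.select([data[class_col].isin(FAST), data[class_col].isin(MEDIUM)],
                  [110, 70], default=default),
        index=data.index,
    )
    out = data.copy()                      # leave the input frame untouched
    for col in columns:
        out[col] = out[col].fillna(fill)   # only NaN cells are replaced
    return out

# Invented sample data standing in for SPEED.csv
sample = pd.DataFrame({
    "fclass": ["motorway", "primary", "residential", "trunk_link"],
    "speed_a": [np.nan, np.nan, np.nan, 90.0],
    "speed_b": [100.0, np.nan, 30.0, np.nan],
})
filled = fill_speeds(sample, ["speed_a", "speed_b"])
```

With 16 columns you would pass the list of column names that contain NaN values instead of the two sample columns.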
So I have this function called replace_elem, written below:
def replace_elem(lst, index, elem):
    """Create and return a new list whose elements are the same as those in
    LST except at index INDEX, which should contain element ELEM instead.
    >>> old = [1, 2, 3, 4, 5, 6, 7]
    >>> new = replace_elem(old, 2, 8)
    >>> new
    [1, 2, 8, 4, 5, 6, 7]
    >>> new is old # check that replace_elem outputs a new list
    False
    """
    assert index >= 0 and index < len(lst), 'Index is out of bounds'
    return [elem if i == lst[index] else i for i in lst]
I want to write this function below:
def put_piece(board, max_rows, column, player):
    """Puts PLAYER's piece in the bottommost empty spot in the given column of
    the board. Returns a tuple of two elements:
        1. The index of the row the piece ends up in, or -1 if the column
           is full.
        2. The new board
    >>> rows, columns = 2, 2
    >>> board = create_board(rows, columns)
    >>> row, new_board = put_piece(board, rows, 0, 'X')
    >>> row
    1
    >>> row, new_board = put_piece(new_board, rows, 0, 'O')
    >>> row
    0
    >>> row, new_board = put_piece(new_board, rows, 0, 'X')
    >>> row
    -1
    """
The hint was that I would use replace_elem twice. What I'm wondering is this: replace_elem only takes one index for the location to replace, so how could I access, let's say, the first row and third column in Python using only one subscript? Note that I also have to return the whole board and not just the row.
This isn't homework but self-study, as the material for this course is posted online for free and the course has finished.
I believe this is what you're looking for. My assumption here is that an empty spot in the board is 0.
I also had to modify your replace_elem, as it should compare against the index (not the value at that index) and replace that position with elem.
def replace_elem(lst, index, elem):
    assert index >= 0 and index < len(lst), 'Index out of bounds'
    return [elem if i == index else lst[i] for i in range(len(lst))]
def put_piece(board, max_rows, column, player):
    # extract the given column from the board
    board_col = list(map(lambda x: x[column], board))
    try:
        # find the last (bottommost) empty element - empty == 0
        row = len(board_col) - board_col[::-1].index(0) - 1
    except ValueError:
        # no empty spot left: the column is full
        return -1, board
    new_col = replace_elem(board_col, row, player)
    # rebuild the board, taking the given column from new_col
    return row, [[board[r][c] if c != column else new_col[r] for c in range(len(board[r]))] for r in range(len(board))]
examples:
board = [[0, 0],[0,0]]
row, new_board = put_piece(board, 2, 0, 'X')
print('row: %s, board: %s' %(row, new_board))
Output: row: 1, board: [[0, 0], ['X', 0]]
row, new_board = put_piece(new_board, 2, 0, 'O')
print('row: %s, board: %s' %(row, new_board))
Output: row: 0, board: [['O', 0], ['X', 0]]
row, new_board = put_piece(new_board, 2, 0, 'X')
print('row: %s, board: %s' %(row, new_board))
Output: row: -1, board: [['O', 0], ['X', 0]]
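For what it's worth, the "use replace_elem twice" hint can probably be read as: first build a new row with the piece in it, then build a new board with that row swapped in. A sketch of that reading, still assuming 0 marks an empty spot (the empty parameter is my addition, not part of the original signature):

```python
def replace_elem(lst, index, elem):
    """Return a copy of lst with the item at position index replaced by elem."""
    assert 0 <= index < len(lst), 'Index out of bounds'
    return [elem if i == index else x for i, x in enumerate(lst)]

def put_piece(board, max_rows, column, player, empty=0):
    """Drop player's piece into the bottommost empty spot of the column."""
    # Scan from the bottom row upwards for the first empty spot
    for row in range(max_rows - 1, -1, -1):
        if board[row][column] == empty:
            new_row = replace_elem(board[row], column, player)   # first use: new row
            return row, replace_elem(board, row, new_row)        # second use: new board
    return -1, board   # column is full

board = [[0, 0], [0, 0]]
row1, board1 = put_piece(board, 2, 0, 'X')
row2, board2 = put_piece(board1, 2, 0, 'O')
row3, board3 = put_piece(board2, 2, 0, 'X')
```

So you never need a single subscript for row and column together: board[row] picks the row, and the column index is only used inside that row.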