Conditional check in pandas dataframe loc [duplicate]

This question already has answers here:
Logical operators for Boolean indexing in Pandas
(4 answers)
Pandas column access w/column names containing spaces
(6 answers)
Closed 3 years ago.
I'm referring to this document https://datatofish.com/if-condition-in-pandas-dataframe/
The part - (3) IF condition - strings
I'm trying to implement it with 2 conditions as:
x.loc[x.Test Status == 'Finished' and x.Results Validation == 'In Limits', 'Outcome'] = 'PASS'
I get an invalid syntax error. How do I handle this? I've tried multiple workarounds like np.where but no luck.
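A minimal sketch of the usual fix, assuming the column names really are 'Test Status' and 'Results Validation' (the data below is made up): column names containing spaces need bracket access, and boolean indexing needs & with each condition parenthesized, not the and keyword.

import pandas as pd

# Hypothetical frame mirroring the columns from the question
x = pd.DataFrame({
    "Test Status": ["Finished", "Running"],
    "Results Validation": ["In Limits", "In Limits"],
})

# Bracket access for names with spaces; & (not `and`) with parentheses
mask = (x["Test Status"] == "Finished") & (x["Results Validation"] == "In Limits")
x.loc[mask, "Outcome"] = "PASS"
print(x)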

How to replace part of a column with the replace method [duplicate]

This question already has answers here:
Implementing thousands (1k = 1000, 1kk = 1000000) interpreter
(3 answers)
Convert the string 2.90K to 2900 or 5.2M to 5200000 in pandas dataframe
(6 answers)
Closed 11 months ago.
I have a data frame
df3 = pd.DataFrame({"a": ["21.7K", "22.7K", "1.7K"]})
I would like to change the type of column a to int. I tried replacing "K" with "000", but the replace() method doesn't work even when I set inplace=True. And replacing with "000" would also produce inaccurate numbers: "21.7K" would become "21.7000", which is 21.7, not 21700. How can I change the data type to int?
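One possible approach, as a minimal sketch assuming every value ends in "K": strip the suffix, go through float so the decimal part survives, scale by 1000, then cast to int.

import pandas as pd

df3 = pd.DataFrame({"a": ["21.7K", "22.7K", "1.7K"]})

# Strip the trailing "K", parse as float to keep the decimal part,
# then scale and cast; this avoids the "21.7000" problem of a plain replace
df3["a"] = (df3["a"].str.rstrip("K").astype(float) * 1000).astype(int)
print(df3["a"].tolist())  # [21700, 22700, 1700]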

Python zip with multiple arrays [duplicate]

This question already has answers here:
Passing one list of values instead of multiple arguments to a function?
(2 answers)
What do the * (star) and ** (double star) operators mean in a function call?
(4 answers)
Closed 2 years ago.
I want to create a pandas DataFrame with an array of columns, but instead of
data = list(zip(column1, column2, column3))
I want to use something like this
columns = [column1, column2, column3]
data = list(zip(columns))
Is it possible?
You need to use the * operator in Python:
data = list(zip(*columns))
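For illustration, a small runnable example of the unpacking (the column contents are made up):

column1 = [1, 2, 3]
column2 = ["a", "b", "c"]
column3 = [True, False, True]

columns = [column1, column2, column3]

# zip(columns) would yield one-element tuples, one per inner list;
# zip(*columns) unpacks the list so each inner list is a separate argument
data = list(zip(*columns))
print(data)  # [(1, 'a', True), (2, 'b', False), (3, 'c', True)]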

Performance of Pandas string contains for column [duplicate]

This question already has answers here:
Pandas filtering for multiple substrings in series
(3 answers)
Closed 4 years ago.
I have a DataFrame of 83k rows and a column "Text" that I have to search for ~200 masks. Is there a way to pass a column to .str.contains()?
I'm able to do it like this:
import time

start = time.time()
counts = [a["Text"].str.contains(m).sum() for m in b["mask"].values]
print(time.time() - start)
But it's taking 34.013s. Is there any faster way?
Edit:
b["mask"] looks like:
'PR347856|P5478'
'BS7623|B5763'
and I want the count of occurrences for each mask, so I can't just join them into one pattern.
Edit:
a["text"] contains strings of the size of ~ 3 sentences
Maybe you can vectorize the containment operation.
text_contains = a['Text'].str.contains
b['mask'].map(lambda m: text_contains(m).sum())
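For reference, a self-contained toy version of that answer (the data here is made up; the actual speedup depends on the real sizes):

import pandas as pd

a = pd.DataFrame({"Text": ["PR347856 shipped today", "nothing to see here", "order B5763 logged"]})
b = pd.DataFrame({"mask": ["PR347856|P5478", "BS7623|B5763"]})

# Bind the method once, then map each regex mask over the Text column;
# contains() returns a boolean Series and sum() counts the matching rows
text_contains = a["Text"].str.contains
counts = b["mask"].map(lambda m: text_contains(m).sum())
print(counts.tolist())  # [1, 1]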

Partial Indexing Error in Python Series [duplicate]

This question already has answers here:
key error and MultiIndex lexsort depth
(1 answer)
What exactly is the lexsort_depth of a multi-index Dataframe?
(1 answer)
Closed 5 years ago.
I have created a hierarchically indexed Series and I wanted to partially index some values of it. But when I change the alphabetical order of the index, partial indexing stops working. Can anybody explain, with a clear and logical explanation, why this is happening?
import numpy as np
from pandas import Series

sr = Series(np.arange(11), index=[['a','b','b','c','d','d','e','e','f','f','f'],[1,2,1,3,1,2,1,2,1,2,3]])
print(sr['a':'c'])
This gives the expected output, but when I change the alphabetical order of the index labels, partial indexing raises an error.
hs = Series(np.arange(10),index=[['a','a','b','b','c','c','d','e','e','a'],[1,0,2,1,0,1,1,3,2,3]])
print(hs['a':'c'])
pandas.errors.UnsortedIndexError: 'Key length (1) was greater than MultiIndex lexsort depth (0)'
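The usual remedy, continuing the hs example above as a minimal sketch: sort the index first so the MultiIndex is lexsorted again, and then the partial slice works.

# sort_index() restores the lexsort depth, so label-based slicing works again
hs_sorted = hs.sort_index()
print(hs_sorted['a':'c'])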

converting pandas data frame to dictionary of tuples [duplicate]

This question already has answers here:
pandas DataFrame to dict with values as tuples
(2 answers)
Closed 5 years ago.
I am using the following code to convert a DataFrame into a dictionary of tuples:
dummy = df.set_index(['location']).T.to_dict('list')
for key, value in dummy.items():
    dummy[key] = tuple(value)
The desired output is:
{'loc_1': (35.99, -81.44), 'loc_2': (22.55, -108.5)}
Question 1: Will the order be preserved as (lat, long)? (Is there a chance the first tuple could turn out to be (-81.44, 35.99)?)
Question 2: Is there a better (faster/more elegant) way of doing the above?
Using a dict comprehension and itertuples:
{t.location: (t.lat, t.long) for t in df.itertuples()}
{'loc_1': (35.99, -81.44), 'loc_2': (22.55, -108.5)}
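A self-contained version of the same idea, with the column names lat and long assumed from the question:

import pandas as pd

df = pd.DataFrame({
    "location": ["loc_1", "loc_2"],
    "lat": [35.99, 22.55],
    "long": [-81.44, -108.5],
})

# Attribute access by column name guarantees lat comes first in each tuple,
# so the order cannot flip to (-81.44, 35.99), which answers question 1
result = {t.location: (t.lat, t.long) for t in df.itertuples()}
print(result)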
