This question already has answers here:
Implementing thousands (1k = 1000, 1kk = 1000000) interpreter
(3 answers)
Convert the string 2.90K to 2900 or 5.2M to 5200000 in pandas dataframe
(6 answers)
Closed 11 months ago.
I have a data frame
df3 = pd.DataFrame({"a": ["21.7K", "22.7K", "1.7K"]})
I would like to change the type of column a to int. I tried replacing "K" with "000", but the replace() method doesn't work even when I set inplace=True. Replacing with "000" also produces inaccurate numbers, since "21.7K" becomes the string "21.7000". How can I change the data type to int?
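A minimal sketch of one way around this, assuming every value ends in a K suffix: rewrite the K as scientific notation so the decimal point is handled correctly, then cast.
import pandas as pd

df3 = pd.DataFrame({"a": ["21.7K", "22.7K", "1.7K"]})

# "21.7K" -> "21.7e3", which to_numeric parses as 21700.0;
# replacing "K" with "000" would instead yield the string "21.7000".
df3["a"] = pd.to_numeric(df3["a"].str.replace("K", "e3")).astype(int)
print(df3["a"].tolist())  # [21700, 22700, 1700]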
This question already has answers here:
passing one list of values instead of multiple arguments to a function?
(2 answers)
What do the * (star) and ** (double star) operators mean in a function call?
(4 answers)
Closed 2 years ago.
I want to create a pandas DataFrame from a list of columns, but instead of
data = list(zip(column1, column2, column3))
I want to use something like this
columns = [column1, column2, column3]
data = list(zip(columns))
Is it possible?
You need to use the * (unpacking) operator in Python.
data = list(zip(*columns))
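A self-contained sketch with made-up columns (column1 through column3 here are hypothetical):
import pandas as pd

column1 = [1, 2, 3]
column2 = ["a", "b", "c"]
column3 = [0.1, 0.2, 0.3]
columns = [column1, column2, column3]

# zip(*columns) unpacks the list, so it is equivalent to
# zip(column1, column2, column3); plain zip(columns) would yield
# one tuple per column instead of one tuple per row.
data = list(zip(*columns))
df = pd.DataFrame(data)
print(df)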
This question already has answers here:
Convert pandas.Series from dtype object to float, and errors to nans
(3 answers)
Closed 3 years ago.
Data from JSON is in a DataFrame and I am trying to output it to a CSV.
I am trying to multiply a DataFrame column by a fixed value, but I am having issues with how the result is displayed.
I have used the following, but the data is still not displayed how I want:
df_entry['Hours'] = df_entry['Hours'].multiply(2)
df_entry['Hours'] = df_entry['Hours'] * 2
Input
ID,name,hrs
100,AB,37.5
Expected
ID,name,hrs
100,AB,75.0
What I am getting
ID,name,hrs
100,AB,37.537.5
That happens because the column's dtype is object (strings). Multiplying a string by 2 repeats it, which is why "37.5" becomes "37.537.5". You need to convert it to float before the multiplication.
df_entry['Hours'] = df_entry['Hours'].astype(float) * 2
You can also use the apply function:
df_entry['Hours'] = df_entry['Hours'].apply(lambda x: float(x) * 2)
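A minimal sketch reproducing the problem and the fix (the df_entry construction here is hypothetical):
import pandas as pd

df_entry = pd.DataFrame({"ID": [100], "name": ["AB"], "Hours": ["37.5"]})

print((df_entry["Hours"] * 2).iloc[0])                # "37.537.5" (string repetition)
print((df_entry["Hours"].astype(float) * 2).iloc[0])  # 75.0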
This question already has answers here:
Logical operators for Boolean indexing in Pandas
(4 answers)
Pandas column access w/column names containing spaces
(6 answers)
Closed 3 years ago.
I'm referring to this document https://datatofish.com/if-condition-in-pandas-dataframe/
The relevant part is "(3) IF condition - strings".
I'm trying to implement it with two conditions:
x.loc[x.Test Status == 'Finished' and x.Results Validation == 'In Limits', 'Outcome'] = 'PASS'
I get an invalid syntax error. How do I handle this? I've tried multiple workarounds like np.where, but no luck.
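A sketch of the usual fix, with a hypothetical x standing in for the tutorial's DataFrame: bracket notation handles the column names with spaces, & replaces and for element-wise logic, and each comparison gets its own parentheses.
import pandas as pd

x = pd.DataFrame({
    "Test Status": ["Finished", "Running"],
    "Results Validation": ["In Limits", "In Limits"],
})

mask = (x["Test Status"] == "Finished") & (x["Results Validation"] == "In Limits")
x.loc[mask, "Outcome"] = "PASS"
print(x)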
This question already has answers here:
Pandas filtering for multiple substrings in series
(3 answers)
Closed 4 years ago.
I have a DataFrame of 83k rows with a column "Text" that I have to search against ~200 masks. Is there a way to pass a column to .str.contains()?
I'm able to do it like this:
import time

start = time.time()
counts = [a["Text"].str.contains(m).sum() for m in b["mask"].values]
print(time.time() - start)
But it takes 34.013s. Is there a faster way?
Edit:
b["mask"] looks like:
'PR347856|P5478'
'BS7623|B5763'
and I want the count of occurrences for each mask, so I can't join them into one pattern.
Edit:
a["text"] contains strings of the size of ~ 3 sentences
Maybe you can vectorize the containment operation.
text_contains = a['Text'].str.contains
b['mask'].map(lambda m: text_contains(m).sum())
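As a runnable sketch (the data here is invented to match the shapes described in the question):
import pandas as pd

a = pd.DataFrame({"Text": ["PR347856 was shipped.", "No codes here.", "Order B5763 arrived."]})
b = pd.DataFrame({"mask": ["PR347856|P5478", "BS7623|B5763"]})

# Each mask is a regex alternation, so str.contains(m).sum() counts
# the rows matching any alternative of that mask.
text_contains = a["Text"].str.contains
counts = b["mask"].map(lambda m: text_contains(m).sum())
print(counts.tolist())  # [1, 1]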
This question already has answers here:
pandas DataFrame to dict with values as tuples
(2 answers)
Closed 5 years ago.
I am using the following code to convert a DataFrame (with columns location, lat, and long)
dummy= df.set_index(['location']).T.to_dict('list')
for key,value in dummy.items():
dummy[key] = tuple(value)
to obtain a dictionary of tuples
{loc_1:(35.99,-81.44),loc_2:(22.55,-108.5)}
Question 1: Will the order be preserved as (lat, long)? (Is there a chance the first tuple can turn out to be (-81.44, 35.99)?)
Question 2: Is there a better (faster/more elegant) way of doing the above?
Using a dict comprehension and itertuples:
{t.location: (t.lat, t.long) for t in df.itertuples()}
{'loc_1': (35.99, -81.44), 'loc_2': (22.55, -108.5)}
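A self-contained sketch (this df is a hypothetical reconstruction from the expected output). On question 1: the tuple order is fixed by how it is written, (t.lat, t.long), so it cannot come out reversed.
import pandas as pd

df = pd.DataFrame({
    "location": ["loc_1", "loc_2"],
    "lat": [35.99, 22.55],
    "long": [-81.44, -108.5],
})

result = {t.location: (t.lat, t.long) for t in df.itertuples()}
print(result)  # {'loc_1': (35.99, -81.44), 'loc_2': (22.55, -108.5)}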