How to convert object to float in Pandas? - python

I read a csv file into a pandas dataframe and got all column types as objects. I need to convert the second and third columns to float.
I tried using
df["Quantidade"] = pd.to_numeric(df.Quantidade, errors='coerce')
but got NaN.
Here's my dataframe. Do I need to use a regex on the third column to get rid of the "R$ "?

Try this:
import pandas as pd

# sample dataframe
d = {'Quantidade':['0,20939', '0,0082525', '0,009852', '0,012920', '0,0252'],
     'price':['R$ 165.000,00', 'R$ 100.000,00', 'R$ 61.500,00', 'R$ 65.900,00', 'R$ 49.375,12']}
df = pd.DataFrame(data=d)
# Second column: swap the decimal comma for a dot, then cast
df["Quantidade"] = df["Quantidade"].str.replace(',', '.', regex=False).astype(float)
# Third column: strip the "R$ " prefix, drop the thousands dots, swap the decimal comma
df['price'] = (df['price'].str.replace(r'\w+\$\s+', '', regex=True)
               .str.replace('.', '', regex=False)
               .str.replace(',', '.', regex=False).astype(float))
Output:
Quantidade price
0 0.209390 165000.00
1 0.008252 100000.00
2 0.009852 61500.00
3 0.012920 65900.00
4 0.025200 49375.12
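As a more defensive variant of the answer above, the cleanup can be wrapped in a small helper that turns unparseable values into NaN instead of raising. This is only a minimal sketch: the name brl_to_float and the sample values are hypothetical, and it assumes every price uses "." as the thousands separator and "," as the decimal separator.

```python
import pandas as pd


def brl_to_float(series: pd.Series) -> pd.Series:
    """Convert strings like 'R$ 1.234,56' to floats (hypothetical helper)."""
    cleaned = (
        series.str.replace("R$", "", regex=False)  # drop the currency prefix
        .str.strip()                               # drop leftover whitespace
        .str.replace(".", "", regex=False)         # remove thousands separators
        .str.replace(",", ".", regex=False)        # decimal comma -> decimal point
    )
    # errors="coerce" maps anything that still is not a number to NaN
    return pd.to_numeric(cleaned, errors="coerce")


prices = pd.Series(["R$ 165.000,00", "R$ 49.375,12", "n/a"])
converted = brl_to_float(prices)
print(converted)
```

Passing regex=False explicitly keeps the "." replacement literal regardless of which default the installed pandas version uses.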

Try something like this:
df["Quantidade"] = df["Quantidade"].str.replace(',', '.').astype(float)

df['Quantidade'] = df['Quantidade'].astype(float)

Related

add string based on conditional formating of a dataframe

data = {"marks":[1,2,3,4,5,6,7,8,9,10,11,12], "month":['jan','feb','mar','apr','may','jun','jul','aug','sep','oct','nov','dec']}
df2 = pd.DataFrame(data)
So far I have tried the code below, but it does not give the expected result:
for i in df2['month']:
    if (i=='jan' or i=='feb' or i=='mar'):
        df2['q'] = '1Q'
    else:
        df2['q']='other'
Convert the column to datetimes, use Series.dt.quarter, and prepend q:
df2['new'] = 'q' + pd.to_datetime(df2['month'], format='%b').dt.quarter.astype(str)
Or use Series.map with a dictionary:
d = {'jan':'q1', 'feb':'q1','mar':'q1',
'apr':'q2','may':'q2', 'jun':'q2',
'jul':'q3','aug':'q3', 'sep':'q3',
'oct':'q4','nov':'q4', 'dec':'q4'}
df2['new'] = df2['month'].map(d)
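The dictionary above can also be generated rather than typed out. The following is a sketch under the assumption that the months are always the lowercase three-letter English abbreviations; the names months and month_to_quarter are illustrative only.

```python
import pandas as pd

# Month abbreviations in calendar order (assumed spelling of the input data)
months = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]

# Position 0-2 -> q1, 3-5 -> q2, 6-8 -> q3, 9-11 -> q4
month_to_quarter = {m: f"q{i // 3 + 1}" for i, m in enumerate(months)}

df2 = pd.DataFrame({"month": ["jan", "apr", "sep", "dec"]})
df2["new"] = df2["month"].map(month_to_quarter)
print(df2)
```

Any month not present in the dictionary maps to NaN, which makes misspelled values easy to spot.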

How to extract sub string by defining before and after delimiter

I have a data frame that contains URLs, and I want to extract a part from the middle of each one.
df
URL
https://storage.com/vision/Glass2020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg
https://storage.com/vision/Carpet5020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg
https://storage.com/vision/Metal8020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg
The desired output would look like this:
URL Type
https://storage.com/vision/Glass2020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg Glass2020
https://storage.com/vision/Carpet5020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg Carpet5020
https://storage.com/vision/Metal8020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg Metal8020
I would use df['URL'].str.extract, but I don't understand how to define the text before and after the part I want.
One idea is to use Series.str.split and select the second-to-last value by indexing:
df['Type'] = df['URL'].str.split('/').str[-2]
print (df)
URL Type
0 https://storage.com/vision/Glass2020/2020-02-0... Glass2020
1 https://storage.com/vision/Carpet5020/2020-02-... Carpet5020
2 https://storage.com/vision/Metal8020/2020-02-0... Metal8020
EDIT: To specify the text that surrounds the expected output, use Series.str.extract:
df['Type'] = df['URL'].str.extract('vision/(.+)/2020')
print (df)
URL Type
0 https://storage.com/vision/Glass2020/2020-02-0... Glass2020
1 https://storage.com/vision/Carpet5020/2020-02-... Carpet5020
2 https://storage.com/vision/Metal8020/2020-02-0... Metal8020
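If the file name does not always start with 2020, the capture group can be bounded by the slashes instead. This is a sketch that assumes the wanted part is always the path segment directly after /vision/; the pattern is an illustration, not the only option.

```python
import pandas as pd

df = pd.DataFrame({"URL": [
    "https://storage.com/vision/Glass2020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg",
    "https://storage.com/vision/Carpet5020/2020-02-04_B8I8FZHl-xJ_2236301468348443721.jpg",
]})

# "/vision/" is the text before, "/" is the text after;
# ([^/]+) captures everything between them that is not a slash.
df["Type"] = df["URL"].str.extract(r"/vision/([^/]+)/", expand=False)
print(df["Type"])
```

With expand=False and a single capture group, str.extract returns a Series, so it can be assigned to a column directly.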
Try str.split:
df['Type'] = df.URL.str.split('/').str[-2]

Subset string rows that contain a 'flexible' pattern

I have the following df.
data = [
['DWWWWD'],
['DWDW'],
['WDWWWWWWWWD'],
['DDW'],
['WWD'],
]
df = pd.DataFrame(data, columns=['letter_sequence'])
I want to subset the rows that contain the pattern 'D' + '[whichever number of W's]' + 'D'. Examples of rows I want in my output df: DWD, DWWWWWWWWWWWD, WWWWWDWDW...
I came up with the following, but it does not really work for 'whichever number of W's'.
df[df['letter_sequence'].str.contains(
'DWD|DWWD|DWWWD|DWWWWD|DWWWWWD|DWWWWWWD|DWWWWWWWD|DWWWWWWWWD', regex=True
)]
Desired output new_df:
letter_sequence
0 DWWWWD
1 DWDW
2 WDWWWWWWWWD
Any alternatives?
Use [W]{1,} to match one or more W. regex=True is the default, so it can be omitted:
df = df[df['letter_sequence'].str.contains('D[W]{1,}D')]
print (df)
letter_sequence
0 DWWWWD
1 DWDW
2 WDWWWWWWWWD
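For reference, here is a self-contained sketch of the same filter written with W+, which is equivalent to [W]{1,}. It keeps the mask in a separate variable (mask and new_df are illustrative names) so the original frame is not overwritten.

```python
import pandas as pd

df = pd.DataFrame(
    {"letter_sequence": ["DWWWWD", "DWDW", "WDWWWWWWWWD", "DDW", "WWD"]}
)

# D, then one or more W, then D, anywhere in the string
mask = df["letter_sequence"].str.contains(r"DW+D")
new_df = df[mask]
print(new_df)
```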
You can also use the regex D\w+D. Note that \w matches any word character, not only W, so DW+D is the stricter pattern.
The code is shown below:
df = df[df['letter_sequence'].str.contains(r'D\w+D')]
Please let me know if it helps.

DataFrame with one column 0 to 100

I need a DataFrame of one column ['Week'] that has all values from 0 to 100 inclusive.
I need it as a Dataframe so I can perform a pd.merge
So far I have tried creating an empty DataFrame, creating a series of 0-100 and then attempting to append this series to the DataFrame as a column.
alert_count_list = pd.DataFrame()
week_list= pd.Series(range(0,101))
alert_count_list['week'] = alert_count_list.append(week_list)
Try this:
import numpy as np
df = pd.DataFrame(columns=["week"])
df["week"] = np.arange(101)
alert_count_list = pd.DataFrame(np.arange(101), columns=['week'])
or
alert_count_list = pd.DataFrame({'week':range(101)})
You can try:
week_vals = []
for i in range(0, 101):
    week_vals.append(i)
df = pd.DataFrame(columns = ['week'])
df['week'] = week_vals
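Since the goal is a pd.merge, here is a sketch of how the week frame might be used once it exists. The alerts data and the column name alert_count are hypothetical; only the 0 to 100 week column comes from the question.

```python
import pandas as pd

# One row per week, 0 to 100 inclusive
weeks = pd.DataFrame({"week": range(101)})

# Hypothetical alert log: one row per alert, tagged with its week
alerts = pd.DataFrame({"week": [3, 7, 7], "alert": ["a", "b", "c"]})

# Count alerts per week
counts = alerts.groupby("week").size().rename("alert_count").reset_index()

# A left merge keeps every week; weeks without alerts get a count of 0
merged = weeks.merge(counts, on="week", how="left").fillna({"alert_count": 0})
print(merged.head(10))
```

The left merge is what makes the full 0 to 100 frame useful: weeks that are missing from the alert data still appear in the result.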

get column name that contains a specific value in pandas

I want to get the name of the column that contains a specific value, searching the whole DataFrame (assume it has more than 100 rows and more than 50 columns).
Here is my code:
import pandas as pd
df = pd.DataFrame({'A':[1,2,3], 'B':[4,5,6], 'C':[7,8,9]})
pos = 2
response = raw_input("input")
placeholder = (df == response).idxmax(axis=1)[0]
print df
print (placeholder)
I have tried a lot of things.
Example:
When the user inputs 2, the answer should be A;
if the input is 4, the answer should be B;
and if it is 7, the answer should be C.
I tried iloc, but it seems the row has to be specified there.
Any help would be appreciated. Thanks :)
Try this:
for i in df.columns:
    newDf = df.loc[lambda df: df[i] == response]
    if(not newDf.empty):
        print(i)
First of all, you should treat the input as an integer. In Python 2, use input instead of raw_input; in Python 3, input returns a string, so wrap it in int:
response = int(input("input"))
After that you can use any:
df[df==YOUR_VALUE].any()
This returns a boolean Series indexed by column name, indicating whether each column contains the value you are looking for.
In your example:
df = pd.DataFrame({'A':[1,2,3], 'B':[4,5,6], 'C':[7,8,9]})
response = int(input("input"))
placeholder = df[df==response].any()
For input 4, the output will be:
A False
B True
C False
dtype: bool
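To go from that boolean Series to the column names themselves, the Series can be used to filter its own index. A minimal sketch follows; the function name columns_containing is hypothetical, and it compares with (frame == value) so that a search for 0 also works.

```python
import pandas as pd


def columns_containing(frame: pd.DataFrame, value) -> list:
    """Return the names of all columns that contain `value` (hypothetical helper)."""
    mask = (frame == value).any()    # one boolean per column
    return mask[mask].index.tolist()  # keep only the True entries


df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
print(columns_containing(df, 2))
print(columns_containing(df, 4))
print(columns_containing(df, 10))
```

Returning a list rather than a single name covers the case where the value appears in several columns, or in none.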
