How do I multiply a dataframe column by a float constant? - python

I'm trying to multiply a column by a float. I have the code for it here:
if str(cMachineName)==str("K42"):
df_temp.loc[:, "P"] *= float((105.0* 59.0*math.pi*0.95/1000)/3540)
But it gives me this error:
TypeError: can't multiply sequence by non-int of type 'float'.
How do I solve it?

I think the problem is some non-numeric values, like 45 stored as a string.
The solution is converting to float or int with astype:
df_temp = pd.DataFrame({'P':[1,2.5,'45']})
print (df_temp['P'].dtype)
object
df_temp["P"] = df_temp["P"].astype(float)
df_temp["P"] *= float((105.0* 59.0*math.pi*0.95/1000)/3540)
print (df_temp)
P
0 0.005223
1 0.013057
2 0.235030
Another problem is non-numeric data like gh; for those, to_numeric with errors='coerce' is necessary to convert them to NaN:
df_temp = pd.DataFrame({'P':[1,2.5,'gh']})
print (df_temp['P'].dtype)
object
df_temp["P"] = pd.to_numeric(df_temp["P"], errors='coerce')
print (df_temp)
P
0 1.0
1 2.5
2 NaN
df_temp["P"] *= float((105.0* 59.0*math.pi*0.95/1000)/3540)
print (df_temp)
P
0 0.005223
1 0.013057
2 NaN

Maybe this is too simple an answer, but it worked for me and is relatively simple.
dataframe["new column"] = dataframe["old column"] * float_constant
January1st["weight_lb"] = January1st["weight_kg"] * 2.2
Use dataframe.head() to see whether it worked.
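If it helps, here is a minimal runnable sketch of that same pattern (the column name and data are made up for illustration):
import pandas as pd

# Illustrative data; any numeric column works the same way
January1st = pd.DataFrame({"weight_kg": [50.0, 72.5, 90.0]})
January1st["weight_lb"] = January1st["weight_kg"] * 2.2
print(January1st.head())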

Related

pandas: convert column with multiple datatypes to int, ignore errors

I have a column with data that needs some massaging. The column may contain strings or floats; some strings are in exponential form. I'd like to format all data in this column as a whole number where possible, expanding any exponential notation to an integer. Here is an example:
df = pd.DataFrame({'code': ['1170E1', '1.17E+04', 11700.0, '24477G', '124601', 247602.0]})
df['code'] = df['code'].astype(int, errors = 'ignore')
The above code does not seem to do a thing. I know I can convert the exponential notation and decimals simply by using the int function, and I would think the above astype would do the same, but it does not. For example, the following code works in Python:
int(1170E1), int(1.17E+04), int(11700.0)
> (11700, 11700, 11700)
Any help in solving this would be appreciated. What I'm expecting the output to look like is:
0 '11700'
1 '11700'
2 '11700'
3 '24477G'
4 '124601'
5 '247602'
You may check with pd.to_numeric
df.code = pd.to_numeric(df.code,errors='coerce').fillna(df.code)
Out[800]:
0 11700.0
1 11700.0
2 11700.0
3 24477G
4 124601.0
5 247602.0
Name: code, dtype: object
Update
df['code'] = df['code'].astype(object)
s = pd.to_numeric(df['code'],errors='coerce')
df.loc[s.notna(),'code'] = s.dropna().astype(int)
df
Out[829]:
code
0 11700
1 11700
2 11700
3 24477G
4 124601
5 247602
BENY's answer should work, although coercing and filling can silently swallow values you didn't intend to change. This will also do the integer conversion you are looking for.
def convert(x):
    try:
        return str(int(float(x)))
    except ValueError:
        return x
df = pd.DataFrame({'code': ['1170E1', '1.17E+04', 11700.0, '24477G', '124601', 247602.0]})
df['code'] = df['code'].apply(convert)
outputs
0 11700
1 11700
2 11700
3 24477G
4 124601
5 247602
where each element is a string.
I will be the first to say, I'm not proud of that triple cast.
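If you want to double-check the result, one quick (hedged) way to confirm that every element really ended up as a string:
# Each value in the column should now be a str
print(df['code'].map(type).value_counts())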

How to sum an object column in python

I have a data set represented in a Pandas object, see below:
datetime season holiday workingday weather temp atemp humidity windspeed casual registered count
1/1/2011 0:00 1 0 0 1 9.84 14.395 81 0 3 13 16
1/1/2011 1:00 1 0 0 2 9.02 13.635 80 0 8 32 40
1/1/2011 2:00 1 0 0 3 9.02 13.635 80 0 5 27 32
p_type_1 = pd.read_csv("Bike Share Demand.csv")
p_type_1 = (p_type_1 >>
rename(date = X.datetime))
p_type_1.date.str.split(expand=True,)
p_type_1[['Date','Hour']] = p_type_1.date.str.split(" ",expand=True,)
p_type_1['date'] = pd.to_datetime(p_type_1['date'])
p_hour = p_type_1["Hour"]
p_hour
Now I am trying to take the sum of my column Hour that I created (p_hour)
p_hours = p_type_1["Hour"].sum()
p_hours
and get this error:
TypeError: must be str, not int
so I then tried:
p_hours = p_type_1(str["Hour"].sum())
p_hours
and get this error:
TypeError: 'type' object is not subscriptable
I just want the sum; what gives?
Your dataframe's datatypes are the problem.
Take a closer look at this question:
Convert DataFrame column type from string to datetime, dd/mm/yyyy format
Here is sample code that should solve your problem; I simplified the CSV:
'''
CSV
datetime,season
1/1/2011 0:00,1
1/1/2011 1:00,1
1/1/2011 2:00,1
'''
import pandas as pd
p_type_1 = pd.read_csv("Bike Share Demand.csv")
p_type_1['datetime'] = p_type_1['datetime'].astype('datetime64[ns]')
p_type_1['hour'] = [val.hour for i, val in p_type_1['datetime'].iteritems()]
print(p_type_1['hour'].sum())
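One caveat in case you are on a recent pandas: Series.iteritems was removed in pandas 2.0, so the list comprehension above will fail there. The .dt accessor does the same job and is the more idiomatic route:
# Equivalent hour extraction that also works on current pandas versions
p_type_1['hour'] = p_type_1['datetime'].dt.hour
print(p_type_1['hour'].sum())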
There's quite a bit going on in here that's not correct. So I'll try to break down the issues and offer alternatives.
Here:
p_hours = p_type_1(str["Hour"].sum())
p_hours
Your issue is that you are effectively trying to do this:
p_hours = p_type_1([str("Hour")].sum())
p_hours
Instead of doing that, your code actually tries to subscript the built-in str type with "Hour", which is not what you intended. That crash is unrelated to your core problem; it's just a separate mistake.
The actual problem here is that your dataframe column mixes string and integer values. The sum operation will concatenate strings or add numeric values; on a mixed-type column it fails.
In order to verify that this is the issue however, we would need to see your actual dataframe, as I have a feeling the one you gave may not be the correct one.
As a proof of concept, I created the following example:
import pandas as pd
dta = [str(x) for x in range(20)]
dta.append(12)
frame = pd.DataFrame.from_dict({
    "data": dta})
print(frame["data"].sum())
>>> TypeError: can only concatenate str (not "int") to str
Note that newer editions of pandas have clearer error messages.
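To tie this back to the original question: once the mixed column is coerced to numeric, the sum works. A minimal hedged sketch (the data is made up to mirror the mix above):
import pandas as pd

frame = pd.DataFrame({"Hour": ["0", "1", "2", 12]})   # mixed str/int column
hours = pd.to_numeric(frame["Hour"], errors="coerce")
print(hours.sum())   # 15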

How to Convert Pandas Strings Containing " $ - , " Characters to Float

I have a df where some object columns contain $, commas, periods, and negative numbers in parentheses:
Date Person Salary Change
0 11/1/15 Mike $100.52 ($20)
1 11/1/15 Bill $300.11 ($300.22)
2 11/1/15 Jake - ($1,100)
3 11/1/15 Jack $411.43 $500
4 11/1/15 Faye NaN $1,000.12
5 11/1/15 Clay $122.00 $100
6 11/1/15 Dick $1,663.33 -
I want to convert them to float, but when I try:
df['Salary'] = df['Salary'].str.replace(',', '').str.replace('$', '').str.replace('-', '').astype(float)
I get ValueError: could not convert string to float: (with an empty string at the end). It seems like the - is causing the issue, so is there an elegant way of handling it?
I would use a plain Python function because it is easier to write and test:
import numpy as np

def conv(txt):
    txt = str(txt)
    txt = txt.strip()
    neg = txt.endswith(')')
    try:
        val = float(txt.strip('$()-,').replace(',', ''))
    except ValueError:
        val = np.nan
    return -val if neg else val

df['Salary'] = df['Salary'].apply(conv)
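As a quick sanity check with a few of the values from the question (hedged, but these are exactly the cases the function is meant to cover):
# '$100.52' -> 100.52, '($1,100)' -> -1100.0, '-' -> nan
print(conv("$100.52"), conv("($1,100)"), conv("-"))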
Try:
df['Salary'] = df['Salary'].str.replace(',', '').str.replace('$', '').str.replace('-', '0').astype(float)
Your issue is most likely trying to convert blank strings to float; Python cannot parse '' as a float. You are better off replacing the - with 0.
Or a better solution:
df['Salary'] = df['Salary'].str.replace(',', '').str.replace('$', '').str.replace('-', '0')
df['Salary'] = pd.to_numeric(df['Salary'], errors = 'coerce', downcast = 'float')
With errors='coerce', pd.to_numeric returns NaN for the values it cannot parse, which also lets you see which rows are causing the issue.
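For example, one hedged way to list the original values that failed to parse (regex=False so the $ is treated literally):
cleaned = df['Salary'].str.replace('$', '', regex=False).str.replace(',', '', regex=False)
# Rows that were non-null but still could not be converted
print(df.loc[pd.to_numeric(cleaned, errors='coerce').isna() & df['Salary'].notna(), 'Salary'])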

Calculate weighted sum using two columns in pandas dataframe

I am trying to calculate a weighted sum using two columns in a pandas dataframe.
Dataframe structure:
unique_id weight value
1 0.061042375 20.16094523
1 0.3064548 19.50932003
1 0.008310739 18.76469039
1 0.624192086 21.25
2 0.061042375 20.23776924
2 0.3064548 19.63366165
2 0.008310739 18.76299395
2 0.624192086 21.25
.......
The output I desire is:
Weighted sum for each unique_id = sum((weight) * (value))
Example: Weighted sum for unique_id 1 = ( (0.061042375 * 20.16094523) + (0.3064548 * 19.50932003) + (0.008310739 * 18.76469039) + (0.624192086 * 21.25) )
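For reference, reproducing that example in plain Python (a quick hedged check, values copied from the table above):
w = [0.061042375, 0.3064548, 0.008310739, 0.624192086]
v = [20.16094523, 19.50932003, 18.76469039, 21.25]
print(sum(wi * vi for wi, vi in zip(w, v)))   # ~20.629427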
I checked out this answer (Calculate weighted average using a pandas/dataframe) but could not figure out the correct way of applying it to my specific scenario.
This is what I am doing based on the above answer:
#Assume temp_weighted_sum_dataframe is the dataframe stated above
grouped_data = temp_weighted_sum_dataframe.groupby('unique_id') #I think this groups data based on unique_id values
weighted_sum_output = (grouped_data.weight * grouped_data.value).transform("sum") # This should allow me to multiply weight and value for every record within each group and sum it up to one value for that group.
# On above line I am getting the error > TypeError: unsupported operand type(s) for *: 'SeriesGroupBy' and 'SeriesGroupBy'
Any help is appreciated, thanks
The accepted answer in the linked question would indeed solve your problem. However, I would solve it differently with just one groupby:
u = (df.assign(s=df['weight']*df['value'])
       .groupby('unique_id')
       [['s', 'weight']]
       .sum()
     )
u['s']/u['weight']
Output:
unique_id
1 20.629427
2 20.672208
dtype: float64
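A small caveat: u['s']/u['weight'] is technically a weighted average (the product sum divided by the weight sum). It coincides with the weighted sum here only because the weights within each unique_id sum to 1; if yours don't, use u['s'] directly. A quick hedged check:
print(u['weight'])   # ~1.0 per unique_id in this sample data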
You could do it this way:
df['partial_sum'] = df['weight']*df['value']
out = df.groupby('unique_id')['partial_sum'].agg('sum')
output:
unique_id
1 20.629427
2 20.672208
or..
df['weight'].mul(df['value']).groupby(df['unique_id']).sum()
same output
You may take advantage of agg with @ (it is the dot product):
df.groupby('unique_id')[['weight']].agg(lambda x: x.weight @ x.value)
Out[24]:
weight
unique_id
1 20.629427
2 20.672208
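For context, @ between two aligned Series is the dot product, i.e. exactly sum(weight * value). A tiny hedged illustration:
import pandas as pd

w = pd.Series([0.25, 0.75])
v = pd.Series([10.0, 20.0])
print(w @ v)           # 17.5
print((w * v).sum())   # 17.5, same result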

How to convert string into datetime?

I'm quite new to Python and I'm encountering a problem.
I have a dataframe where one of the columns is the departure time of flights. These hours are given in the following format: 1100.0, 525.0, 1640.0, etc.
This is a pandas Series which I want to transform into a datetime series such as: S = [11.00, 5.25, 16.40, ...]
What I have tried already:
Transforming my objects into strings:
S = [str(x) for x in S]
Using datetime.strptime:
S = [datetime.strptime(x,'%H%M.%S') for x in S]
But since they are not all in the same format, it doesn't work.
Using parser from dateutil:
S = [parser.parse(x) for x in S]
I got the error:
'Unknown string format'
Using pandas to_datetime:
S = pd.to_datetime(S)
This doesn't give me the expected result.
Thanks for your answers!
Since it's a column within a dataframe (a Series), keeping it that way while transforming should work just fine.
S = [1100.0, 525.0, 1640.0]
se = pd.Series(S) # Your column
# se:
0 1100.0
1 525.0
2 1640.0
dtype: float64
setime = se.astype(int).astype(str).apply(lambda x: x[:-2] + ":" + x[-2:])
This transforms the floats into correctly formatted strings:
0 11:00
1 5:25
2 16:40
dtype: object
And then you can simply do:
df["your_new_col"] = pd.to_datetime(setime)
How about this?
(I added an if statement since some entries have 4 digits before the decimal and some have 3, and added the test case 125.0 to account for this.)
from datetime import datetime
S = [1100.0, 525.0, 1640.0, 125.0]
for x in S:
    if str(x).find(".") == 3:
        x = "0" + str(x)
    print(datetime.strftime(datetime.strptime(str(x), "%H%M.%S"), "%H:%M:%S"))
You might give it a go as follows:
# Just initialising a state in line with your requirements
st = ["1100.0", "525.0", "1640.0"]
dfObj = pd.DataFrame(st)
# Casting the string column to float
dfObj_num = dfObj[0].astype(float)
# Getting the hour representation out of the number
df1 = dfObj_num.floordiv(100)
# Getting the minutes
df2 = dfObj_num.mod(100)
# Moving the minutes on the right-hand side of the decimal point
df3 = df2.mul(0.01)
# Combining the two dataframes
df4 = df1.add(df3)
# At this point can cast to other types
Result:
0 11.00
1 5.25
2 16.40
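If you then want real time objects rather than hh.mm floats, one hedged option building on df4 above:
# Format the hh.mm floats as strings and parse them as times
times = pd.to_datetime(df4.map("{:.2f}".format), format="%H.%M").dt.time
print(times)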
You can run this example to verify the steps for yourself, and you can also wrap it in a function. Make slight variations if needed to tweak it to your precise requirements.
It might be useful to go through this article about pandas Series:
https://www.geeksforgeeks.org/python-pandas-series/
There must be a better way to do this, but this works for me.
df=pd.DataFrame([1100.0, 525.0, 1640.0], columns=['hour'])
df['hour_dt']=((df['hour']/100).apply(str).str.split('.').str[0]+'.'+
df['hour'].apply((lambda x: '{:.2f}'.format(x/100).split('.')[1])).apply(str))
print(df)
hour hour_dt
0 1100.0 11.00
1 525.0 5.25
2 1640.0 16.40
