price  quantity  high time
10.4   3         2021-11-08 14:26:00-05:00
The DataFrame is ddg.
The dtype of high time is datetime64[ns, America/New_York].
I want high time to show only 14:26:00 (dropping 2021-11-08 and -05:00), but I got an error when using the code below:
ddg['high_time'] = ddg['high_time'].dt.strftime('%H:%M')
I think it's because that isn't the right column name:
# Your code
>>> ddg['high_time'].dt.strftime('%H:%M')
...
KeyError: 'high_time'
# With right column name
>>> ddg['high time'].dt.strftime('%H:%M')
0 14:26
Name: high time, dtype: object
# My dataframe:
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 price 1 non-null float64
1 quantity 1 non-null int64
2 high time 1 non-null datetime64[ns, America/New_York]
dtypes: datetime64[ns, America/New_York](1), float64(1), int64(1)
memory usage: 152.0 bytes
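A minimal sketch of the fix, assuming the column really is named 'high time' (with a space). It rebuilds a one-row stand-in for ddg with a timezone-aware column, optionally renames the column so the original ddg['high_time'] spelling works, and shows two ways to keep only the time of day: .dt.strftime returns strings, while .dt.time returns datetime.time objects (local wall-clock time, no offset).

```python
import pandas as pd

# One-row stand-in for ddg; the timestamp is stored as America/New_York local time
ddg = pd.DataFrame({
    'price': [10.4],
    'quantity': [3],
    'high time': [pd.Timestamp('2021-11-08 14:26:00', tz='America/New_York')],
})

# Optional: rename the column so that ddg['high_time'] also works
ddg = ddg.rename(columns={'high time': 'high_time'})

# Option 1: plain strings such as '14:26:00' (the date and UTC offset are dropped)
time_strings = ddg['high_time'].dt.strftime('%H:%M:%S')

# Option 2: datetime.time objects, which still compare and sort as times
time_objects = ddg['high_time'].dt.time

print(time_strings.iloc[0])   # 14:26:00
print(time_objects.iloc[0])   # 14:26:00
```

Strings are fine for display or export; the datetime.time column is the better choice if you still need to compare or sort by time.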
The data set has "deaths" as object dtype, and I need to convert it to integer. I tried the approach from another thread, but it doesn't seem to work.
****Input:****
data.info()
****Output:****
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1270 entries, 0 to 1271
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 year 1270 non-null object
1 leading_cause 1270 non-null object
2 sex 1270 non-null object
3 race_ethnicity 1270 non-null object
4 deaths 1270 non-null object
dtypes: object(5)
memory usage: 59.5+ KB
****Input:****
df = pd.DataFrame({'deaths':['50','30','28']})
print (df)
****Output:****
deaths
0 50
1 30
2 28
****Input:****
print (pd.to_numeric(df.deaths, errors='coerce'))
****Output:****
0 50
1 30
2 28
Name: deaths, dtype: int64
****Input:****
df.deaths = pd.to_numeric(df.deaths, errors='coerce').astype('Int64')
print (df)
****Output:****
deaths
0 50
1 30
2 28
****Input:****
data.info()
****Output:****
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1270 entries, 0 to 1271
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 year 1270 non-null object
1 leading_cause 1270 non-null object
2 sex 1270 non-null object
3 race_ethnicity 1270 non-null object
4 deaths 1270 non-null object
dtypes: object(5)
memory usage: 59.5+ KB
If you have nulls (np.nan) in the column, it will not convert to the int type.
You need to deal with the nulls first.
1. Either replace them with an int value:
df.deaths = df.deaths.fillna(0)
df.deaths = df.deaths.astype(int)
2. Or drop the null values:
df = df[df.deaths.notna()]
df.deaths = df.deaths.astype(int)
3. Or (preferred) learn to live with them:
# make your other functions accept null values
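The three options can be compared side by side in a short sketch. It assumes a hypothetical deaths column containing a missing value and a non-numeric entry ('n/a'); pd.to_numeric(..., errors='coerce') turns both into NaN first. Option 3 uses pandas' nullable 'Int64' dtype (the same one used in the question), which is one way of living with nulls while keeping integers.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: numeric strings, a missing value and an unparseable entry
df = pd.DataFrame({'deaths': ['50', '30', np.nan, 'n/a']})

# errors='coerce' turns anything that is not a number into NaN (float64 result)
numeric = pd.to_numeric(df['deaths'], errors='coerce')

# 1. Replace nulls with an int value, then cast
filled = numeric.fillna(0).astype(int)

# 2. Drop nulls, then cast
dropped = numeric.dropna().astype(int)

# 3. Live with them: the nullable integer dtype keeps <NA> alongside ints
nullable = numeric.astype('Int64')

print(filled.tolist())    # [50, 30, 0, 0]
print(dropped.tolist())   # [50, 30]
print(nullable.dtype)     # Int64
```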
I am trying to convert all the cell values (except the date) to floating-point numbers. I can successfully convert the first 3 columns but get an error on the last one:
Here is my code:
df['Market Cap_'+str(coin)] = df['Market Cap_'+str(coin)].str.replace(',','').str.replace('$', '').astype(float)
df['Volume_'+str(coin)] = df['Volume_'+str(coin)].str.replace(',','').str.replace('$', '').astype(float)
df['Open_'+str(coin)] = df['Open_'+str(coin)].str.replace(',','').str.replace('$', '').astype(float)
df['Close_'+str(coin)] = df['Close_'+str(coin)].str.replace(',','').str.replace('$', '').astype(float)
Here is df.info():
<class 'pandas.core.frame.DataFrame'>
Int64Index: 30 entries, 1 to 30
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Date_ETHEREUM 30 non-null datetime64[ns]
1 Market Cap_ETHEREUM 30 non-null float64
2 Volume_ETHEREUM 30 non-null float64
3 Open_ETHEREUM 30 non-null float64
4 Close_ETHEREUM 30 non-null object
dtypes: datetime64[ns](1), float64(3), object(1)
memory usage: 1.4+ KB
And here is the error:
AttributeError: Can only use .str accessor with string values!
As you can see, the column type is object (the same as the other columns were before conversion), but I'm getting an error on this one.
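One likely cause, offered as an assumption since the question does not show the raw values: the .str accessor raises this AttributeError when an object column holds no strings at all, for example if the Close values were already numbers stored with object dtype. A minimal sketch that works either way converts every value to a string before cleaning it; the helper name to_float and the sample data are hypothetical.

```python
import pandas as pd


def to_float(col: pd.Series) -> pd.Series:
    """Strip ',' and '$' and convert to float, whatever the cell types are."""
    cleaned = (
        col.astype(str)                          # makes .str usable even for numbers
        .str.replace(',', '', regex=False)       # literal replacement, not a regex
        .str.replace('$', '', regex=False)
    )
    # Anything that still cannot be parsed becomes NaN instead of raising
    return pd.to_numeric(cleaned, errors='coerce')


# Numbers stored in an object column: using .str directly fails on these
already_numeric = pd.Series([1234.5, 99.0], dtype=object)
try:
    already_numeric.str.replace(',', '', regex=False)
except AttributeError as exc:
    print('AttributeError:', exc)

# Mixed strings and numbers are handled by the helper
mixed = pd.Series(['$1,234.50', 99.0], dtype=object)
print(to_float(already_numeric).tolist())  # [1234.5, 99.0]
print(to_float(mixed).tolist())            # [1234.5, 99.0]
```

In the question's loop this would be df['Close_'+str(coin)] = to_float(df['Close_'+str(coin)]).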
I have two spreadsheets in CSV format that were provided for my master's course at university.
Part of the processing involved merging the files and then running some date-based reports on the merged content. I've completed this successfully, however...
I'm led to believe the current date format is epoch; for example, the first date in the spreadsheet is 43471.
First, I ran this code to check which format pandas was seeing:
pd.read_csv('bookloans_merged.csv')
df.info()
This returned the result
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1958 entries, 0 to 1957
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Number 1958 non-null int64
1 Title 1958 non-null object
2 Author 1854 non-null object
3 Genre 1958 non-null object
4 SubGenre 1958 non-null object
5 Publisher 1845 non-null object
6 member_number 1958 non-null int64
7 date_of_loan 1958 non-null int64
8 date_of_return 1958 non-null int64
dtypes: int64(4), object(5)
memory usage: 137.8+ KB
I then ran the following code:
# parsing date values
df = pd.read_csv('bookloans_merged.csv')
df[['date_of_loan','date_of_return']] = df[['date_of_loan','date_of_return']].apply(pd.to_datetime, format='%Y-%m-%d %H:%M:%S.%f')
df.to_csv('bookloans_merged_dates.csv', index=False)
Running this again:
pd.read_csv('bookloans_merged_dates.csv')
df.info()
I get:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1958 entries, 0 to 1957
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Number 1958 non-null int64
1 Title 1958 non-null object
2 Author 1854 non-null object
3 Genre 1958 non-null object
4 SubGenre 1958 non-null object
5 Publisher 1845 non-null object
6 member_number 1958 non-null int64
7 date_of_loan 1958 non-null datetime64[ns]
8 date_of_return 1958 non-null datetime64[ns]
dtypes: datetime64[ns](2), int64(2), object(5)
memory usage: 137.8+ KB
So I can see that date_of_loan and date_of_return are now datetime64.
The trouble is, all the dates now show as 1970-01-01 00:00:00.000043471.
How do I get them into a format like 01/03/2019, please?
Thanks
David.
I managed to get this figured out, with a little help. Here is the answer:
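from datetime import datetime
excel_date = 43139
# Excel counts 1900-01-01 as day 1 and also counts a non-existent 1900-02-29,
# hence the - 2 when converting to a Python ordinal
d_time = datetime.fromordinal(datetime(1900, 1, 1).toordinal() + excel_date - 2)
t_time = d_time.timetuple()
print(d_time)
print(t_time)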
This is how I used that premise in my program:
import pandas as pd
df1 = pd.DataFrame(data_frame, columns=['Title','Author','date_of_loan'])
# Excel serial dates count from 1899-12-30 (the same offset as the - 2 above)
df1['date_of_loan'] = pd.to_datetime(df1['date_of_loan'], unit='D', origin='1899-12-30')
# sort_values returns a new DataFrame, so assign it back
df1 = df1.sort_values('date_of_loan', ascending=True)
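For the requested 01/03/2019 style, a vectorised sketch. It assumes the integers are Excel serial dates (the fromordinal formula above treats them that way), uses origin='1899-12-30' as the pandas equivalent of that formula's - 2 adjustment, and then formats with .dt.strftime. The sample values and the day-first order are assumptions; use '%m/%d/%Y' for month-first output.

```python
import pandas as pd

# Hypothetical sample of Excel serial day numbers (as in date_of_loan)
loans = pd.DataFrame({'date_of_loan': [43471, 43139]})

# Counting whole days from 1899-12-30 matches Excel's numbering for modern dates
loans['date_of_loan'] = pd.to_datetime(loans['date_of_loan'], unit='D', origin='1899-12-30')

# Display strings such as '06/01/2019' (day/month/year); this column is text
loans['loan_display'] = loans['date_of_loan'].dt.strftime('%d/%m/%Y')

print(loans)
```

Keeping the datetime64 column for sorting and reporting, and producing the formatted string only for display or export, avoids sorting dates as text.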
I am new to pandas and I am trying to convert a Time column into datetime format. Unfortunately, I get the time with a date added, which is not what I intended.
My dataFrame is the following:
After running data['Time'] = pd.to_datetime(data['Time'], format = '%H:%M:%S') I get the following:
What am I doing wrong?
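Try this:
import pandas as pd
data = {'time':['05:05:30','06:04:23','03:40:45','12:05:30'], 'value':[2,3,5,7]}
data = pd.DataFrame(data)
# to_timedelta stores the time of day as a duration since midnight, with no date added
data['TIME'] = pd.to_timedelta(data['time'])
data.info()
This gives you TIME without a date attached: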
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 time 4 non-null object
1 value 4 non-null int64
2 TIME 4 non-null timedelta64[ns]
dtypes: int64(1), object(1), timedelta64[ns](1)
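If a real time-of-day value is wanted rather than a duration, another option is to parse with the same format string and keep only the time part via .dt.time. A small sketch on the sample data comparing both: .dt.time gives an object column of datetime.time values, while the timedelta column supports arithmetic directly.

```python
import pandas as pd

data = pd.DataFrame({'time': ['05:05:30', '06:04:23', '03:40:45', '12:05:30'],
                     'value': [2, 3, 5, 7]})

# datetime.time objects: no date part, object dtype
data['as_time'] = pd.to_datetime(data['time'], format='%H:%M:%S').dt.time

# Durations since midnight: timedelta64 dtype, supports arithmetic
data['as_timedelta'] = pd.to_timedelta(data['time'])

# e.g. the spread between the earliest and latest time
spread = data['as_timedelta'].max() - data['as_timedelta'].min()
print(data)
print(spread)   # 0 days 08:24:45
```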
Given a DataFrame d such as this:
index col1
1 a
2 a
3 b
4 b
Create a prefiltered group object with new values:
g = d[prefilter].groupby(['some cols']).apply( somefunc )
index col1
2 c
4 d
Now I want to update d to this:
index col1
1 a
2 c
3 b
4 d
I've been hacking away with update, ix, filtering, where, etc. I'm guessing there is an obvious solution I'm not seeing.
Things like this are not working:
d[d.index == db.index]['alert_v'] = db['alert_v']
q90 = g.transform( somefunc )
d.ix[ d['alert_v'] >=q90, 'alert_v'] = 1
d.ix[ d['alert_v'] < q90, 'alert_v'] = 0
d['alert_v'] = np.where( d.index==db.index, db['alert_v'], d['alert_v'] )
Any help is appreciated.
Thank you.
--edit--
The two DataFrames have the same form:
one is simply a filtered version of the other, with different values that I want to write back to the original.
ValueError: cannot reindex from a duplicate axis
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 2186 entries, 1984-12-12 13:33:00 to 1939-03-19 22:54:00
Data columns (total 9 columns):
source 2186 non-null object
subject_id 2186 non-null float64
alert_t 2186 non-null object
variable 2186 non-null object
timeindex 2186 non-null datetime64[ns]
alert_v 2105 non-null float64
value 2186 non-null float64
tavg 54 non-null timedelta64[ns]
iqt 61 non-null object
dtypes: datetime64[ns](1), float64(3), object(4), timedelta64[ns](1)
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1982 entries, 1984-12-12 13:33:00 to 1939-03-19 22:54:00
Data columns (total 9 columns):
source 1982 non-null object
subject_id 1982 non-null float64
alert_t 1982 non-null object
variable 1982 non-null object
timeindex 1982 non-null datetime64[ns]
alert_v 1982 non-null int64
value 1982 non-null float64
tavg 0 non-null timedelta64[ns]
iqt 0 non-null object
dtypes: datetime64[ns](1), float64(2), int64(1), object(4), timedelta64[ns](1)
You want the DataFrame.update() method.
Try something like this:
import pandas as pd
df1 = pd.DataFrame({'Index':[1,2,3,4],'Col1':['A', 'B', 'C', 'D']}).set_index('Index')
df2 = pd.DataFrame({'Index':[2,4],'Col1':['E', 'F']}).set_index('Index')
print(df1)
Col1
Index
1 A
2 B
3 C
4 D
# update() aligns df2 on the index and overwrites df1 in place (it returns None)
df1.update(df2)
print(df1)
Col1
Index
1 A
2 E
3 C
4 F
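On the ValueError from the edit: update aligns the other frame on the row labels, and a DatetimeIndex like the one shown can contain repeated timestamps, which is a likely (unconfirmed) cause. One workaround, sketched here on hypothetical data, is to move the timestamps into a column so every row has a unique integer label, do the filtering and modification there, call update, and restore the index afterwards. The sub frame below is only a stand-in for the question's groupby/apply result.

```python
import pandas as pd

# Hypothetical frame whose DatetimeIndex repeats a timestamp
idx = pd.to_datetime(['2021-01-01', '2021-01-01', '2021-01-02', '2021-01-03'])
d = pd.DataFrame({'col1': ['a', 'a', 'b', 'b']}, index=idx)
d.index.name = 'ts'

# Move the timestamps into a column; rows now have unique labels 0..3
work = d.reset_index()

# Stand-in for the filtered result: rows 1 and 3 get new values
sub = work.loc[[1, 3], ['col1']].copy()
sub['col1'] = ['c', 'd']

# update() aligns on the (now unique) row labels and modifies work in place
work.update(sub)

# Restore the original DatetimeIndex
d = work.set_index('ts')
print(d)
```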