I want to do a time-series analysis with Python, but I can't convert the data into datetime because it is still stored as strings such as Jan-10.
Period
Jan-10
Feb-10
Mar-10
Apr-10
etc
Is there a way to convert this kind of data into datetime objects?
There is no need to use the datetime module. Pandas can convert strings to dates while reading the data from the CSV file, or you can call the pd.to_datetime function after the data is loaded.
import pandas as pd
# infer_datetime_format is deprecated since pandas 2.0, where format inference is the default
df = pd.read_csv('file.csv', parse_dates=['date'])
If you are using a non-standard format, then you will get better results if you specify a format string. Here, it looks like the format string is '%b-%y', which is the abbreviated month name and the two-digit year without the century.
import pandas as pd
df = pd.read_csv('file.csv')
df['date'] = pd.to_datetime(df['date'], format='%b-%y')
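As a minimal sketch of the format-string approach applied to the values from the question: the sample rows and the "value" column below are made up for illustration, and the column is named Period as in the question rather than date. Note that %b depends on the locale, so this assumes English month abbreviations.

```python
import io
import pandas as pd

# Hypothetical sample data mirroring the "Period" column from the question.
csv_text = "Period,value\nJan-10,1\nFeb-10,2\nMar-10,3\nApr-10,4\n"

df = pd.read_csv(io.StringIO(csv_text))

# %b = abbreviated month name, %y = two-digit year (10 -> 2010).
# Each value becomes the first day of that month.
df["Period"] = pd.to_datetime(df["Period"], format="%b-%y")

# A datetime index is what most time-series operations expect.
ts = df.set_index("Period")["value"]
print(ts.index[0])  # 2010-01-01 00:00:00
```

With the index in place, operations such as ts.resample(...) or slicing by date string become available.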
I have a dataset in CSV whose first column contains dates (not datetimes, just dates).
The CSV is like this:
date,text
2005-01-01,"FOO-BAR-1"
2005-01-02,"FOO-BAR-2"
If I do this:
df = pd.read_csv('mycsv.csv')
I get:
print(df.dtypes)
date object
text object
dtype: object
How can I get the date column as datetime.date?
Use:
df = pd.read_csv('mycsv.csv', parse_dates=[0])
This way the first column will have the native pandas datetime type (datetime64),
which is used in pandas much more often than Python's datetime.date.
It is a more natural approach than converting the column in question
after you read the DataFrame.
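If plain datetime.date objects are really needed, one possible sketch is to parse the column first and then take the date part with the .dt.date accessor. The CSV text is the sample from the question; the date_only column name is only illustrative.

```python
import datetime
import io
import pandas as pd

csv_text = 'date,text\n2005-01-01,"FOO-BAR-1"\n2005-01-02,"FOO-BAR-2"\n'

# parse_dates=[0] parses the first column while reading.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=[0])
print(pd.api.types.is_datetime64_any_dtype(df["date"]))  # True

# .dt.date yields an object column holding datetime.date values.
# This loses the vectorised datetime operations, so keep the parsed column too.
df["date_only"] = df["date"].dt.date
print(type(df["date_only"].iloc[0]))
```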
You can use the pd.to_datetime function available in pandas.
For example, in a dataset of cricket match scores, I can convert the MatchDate column to datetime objects by applying pd.to_datetime with a format string that matches the date format in the data. (Refer to https://www.w3schools.com/python/python_datetime.asp to pick the format codes that match your data.)
cricket["MatchDate"]=pd.to_datetime(cricket["MatchDate"], format= "%m-%d-%Y")
import pandas as pd
import sys
df = pd.read_csv(sys.stdin, sep='\t', parse_dates=['Date'], index_col=0)
df.to_csv(sys.stdout, sep='\t')
Date Open
2020/06/15 182.809924
2021/06/14 257.899994
I got the following output with the input shown above.
Date Open
2020-06-15 182.809924
2021-06-14 257.899994
The date format is changed. Is there a way to maintain the date format automatically? (For example, if the input is in YYYY/MM/DD format, the output should be in YYYY/MM/DD. If the input is in YYYY-MM-DD, the output should be in YYYY-MM-DD, etc.)
I would prefer not to have to test the date format manually. Ideally there is an automatic way to maintain the date format, whatever the particular format is.
You can specify the date_format argument in to_csv:
df.to_csv(sys.stdout, sep='\t', date_format="%Y/%m/%d")
Alternatively, keep the dates as strings and parse them into an extra column when you need to operate on them as dates:
df = pd.read_csv(sys.stdin, sep='\t', index_col=0)
# Date is the index here (index_col=0), so parse the index rather than a column
df['DateParsed'] = pd.to_datetime(df.index)
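A small round-trip sketch of the keep-the-strings idea, assuming the tab-separated sample from the question. Here Date is read as an ordinary column instead of the index, and the helper column is dropped before writing, so the original date text is written back untouched whatever its format was.

```python
import io
import pandas as pd

tsv_text = "Date\tOpen\n2020/06/15\t182.809924\n2021/06/14\t257.899994\n"

# Without parse_dates the Date column stays as the original strings.
df = pd.read_csv(io.StringIO(tsv_text), sep="\t")

# Parsed copy used only for date arithmetic, filtering, sorting, and so on.
df["DateParsed"] = pd.to_datetime(df["Date"])

# Drop the helper column on output; the Date strings are written unchanged.
out = io.StringIO()
df.drop(columns="DateParsed").to_csv(out, sep="\t", index=False)
print(out.getvalue())
```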
Please have a look at both of these images, especially the dates from Sno 32 onwards. The month and day columns are not converted properly. How can I correct this? I have already looked at questions about time series but haven't found an answer to this kind of issue.
The problem is that pandas by default parses the month first whenever that is possible.
You can specify the format as DD/MM/YY
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%y')
Or try using dayfirst=True parameter:
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
Or, if you create the DataFrame from a file, use the parse_dates and dayfirst=True parameters:
df = pd.read_csv(file, parse_dates=['date'], dayfirst=True)
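To illustrate the difference, here is a short sketch with made-up day-first strings. An explicit format applies the same rule to every row, while dayfirst=True is a parsing preference. Both read 01/02/21 as 1 February.

```python
import pandas as pd

# Hypothetical day-first dates; the first two are ambiguous without a hint.
raw = pd.Series(["01/02/21", "03/02/21", "13/02/21"])

# Explicit format: day / month / two-digit year for every row.
explicit = pd.to_datetime(raw, format="%d/%m/%y")

# dayfirst=True: prefer the day-first reading when a value is ambiguous.
hinted = pd.to_datetime(raw, dayfirst=True)

print(explicit.dt.month.tolist())  # [2, 2, 2]
print(hinted.dt.day.tolist())      # [1, 3, 13]
```

When the format is known in advance, the explicit format string is usually the safer choice, because a row that does not match raises an error instead of being guessed.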
I am working on a data frame loaded from a CSV file. I tried changing the data types in the CSV file itself, but for some reason it won't let me save it, so when I load the file into pandas the date and time columns appear as object.
I have tried a few ways to transform them to datetime but without a lot of success:
1) df['COLUMN'] = pd.to_datetime(df['COLUMN'].str.strip(), format='%m/%d/%Y')
gives me the error:
AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas
2) Defining dtypes at the beginning and then using it in the read_csv command - gave me an error as well since it does not accept datetime but only string/int.
I want some of the columns to hold dates, such as 2019/1/1, and others to hold times, such as 20:00:00.
Do you know of an effective way of transforming those datatype object columns to either date or time?
Based on the discussion, I downloaded the data set from the link you provided and read it with pandas. I took part of one column that contains dates and converted it with pd.to_datetime, as you did. The script you mentioned works for me on that data.
# import the necessary libraries
import numpy as np
import pandas as pd
# load the CSV file into a DataFrame
data = pd.read_csv("NYPD_Complaint_Data_Historic.csv")
# take one column that contains dates as an example
dte = data['CMPLNT_FR_DT']
# =============================================================================
# I will try to take a part of the data from dte which contains the
# date time and convert it to date time
# =============================================================================
test_data = dte[0:10]
df1 = pd.DataFrame(test_data)
df1['new_col'] = pd.to_datetime(df1['CMPLNT_FR_DT'])
df1['year'] = df1['new_col'].dt.year
df1['month'] = df1['new_col'].dt.month
df1['day'] = df1['new_col'].dt.day
# The conversion you used also works
df1['COLUMN'] = pd.to_datetime(df1['CMPLNT_FR_DT'].str.strip(), format='%m/%d/%Y')
It might be the way you get the data. You can see the output from this attached. As the result can be stored in dataframe it won't be a problem to save in any format. Please let me know if I understood correctly and it helped you. The month is not shown in the image, but you can get it.
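For the two kinds of columns described in the question (dates like 2019/1/1 and times like 20:00:00), a hedged sketch follows. The frame and the column names date_col and time_col are hypothetical. The .str accessor raises the AttributeError quoted above when the column is not string-typed, so the sketch casts to str before stripping.

```python
import datetime
import pandas as pd

# Hypothetical frame with the two kinds of values described in the question.
df = pd.DataFrame({
    "date_col": [" 2019/1/1", "2019/12/31 "],
    "time_col": ["20:00:00", "07:15:30"],
})

# astype(str) makes .str usable even if the column was not read as strings;
# errors="coerce" turns anything unparseable (for example "nan") into NaT.
df["date_col"] = pd.to_datetime(
    df["date_col"].astype(str).str.strip(), format="%Y/%m/%d", errors="coerce"
)

# Parse the clock times, then keep only the time of day as datetime.time objects.
df["time_col"] = pd.to_datetime(df["time_col"], format="%H:%M:%S").dt.time

print(df["date_col"].iloc[0])  # 2019-01-01 00:00:00
print(df["time_col"].iloc[0])  # 20:00:00
```

If the times are going to be used in arithmetic, pd.to_timedelta on the time strings may be a more convenient representation than datetime.time.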
I have a CSV file that has time represented in a format I'm not familiar with:
I am trying to compute the average time in all of those rows (efforts shown below).
Any sort of feedback will be appreciated.
import pandas as pd
import numpy as np
from datetime import datetime
flyer = pd.read_csv("./myfile.csv",parse_dates = ['timestamp'])
# drop rows with any missing value (recent pandas rejects passing both how and thresh)
flyer.dropna(axis=0, how='any', inplace=True)
pd.set_option('display.max_rows', 20)
# infer_datetime_format is deprecated since pandas 2.0; inference is now the default
flyer['timestamp'] = pd.to_datetime(flyer['timestamp'])
p = flyer.loc[:,'timestamp'].mean()
print(flyer['timestamp'].mean())
The above is correct, but if you're new it might not be as clear what 0x is feeding you.
import pandas as pd
# turn your csv into a pandas dataframe
df = pd.read_csv('your/file/location.csv')
The timestamp column might be interpreted as plain strings, and you can't do the math you want on strings.
# this converts the column's values into pandas timestamps
# (infer_datetime_format is deprecated since pandas 2.0 and no longer needed)
df['timestamp'] = pd.to_datetime(df['timestamp'])
# now for your answer, get the average of the timestamp column
print(df['timestamp'].mean())
When you read the csv with pandas, add parse_dates = ['timestamp'] to the pd.read_csv() function call and it will read in that column correctly. The T in the timestamp field is a common way to separate the date and the time.
The -4:00 indicates time zone information, which in this case means -4:00 hours in comparison to UTC time.
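As an illustration of that format, the sketch below parses two made-up timestamps of the shape described (date, T, time, UTC offset). The values are hypothetical, since the original sample is not shown. Passing utc=True converts every value to UTC so the column has one timezone-aware dtype.

```python
import pandas as pd

# Hypothetical timestamps: date, "T" separator, time, then the UTC offset.
raw = pd.Series(["2019-04-01T10:00:00-04:00", "2019-04-01T12:00:00-04:00"])

# utc=True applies each offset and stores the result in UTC.
ts = pd.to_datetime(raw, utc=True)
print(ts.iloc[0])  # 2019-04-01 14:00:00+00:00

# Convert back to a fixed offset of -04:00 for display if needed.
# Note the POSIX-style sign convention: "Etc/GMT+4" means UTC-04:00.
local = ts.dt.tz_convert("Etc/GMT+4")
print(local.iloc[0].hour)  # 10
```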
As for calculating the mean time, that can get a bit tricky, but here's one solution for after you've imported the csv.
ts = pd.to_datetime(df['timestamp'], utc=True)
epoch_seconds = (ts - pd.Timestamp(0, tz='UTC')).dt.total_seconds()
mean_time = pd.to_datetime(epoch_seconds.mean(), unit='s', utc=True)
This converts the column to timezone-aware datetimes, expresses each value as seconds since the Unix epoch, averages those seconds, and converts the result back into a pandas Timestamp in UTC.