I'm reading the result of a SQL query into a DataFrame and using its columns.
query = "SELECT count(*) as numRecords, YEARWEEK(date) as weekNum FROM events GROUP BY YEARWEEK(date)"
df = pd.read_sql(query, connection)
date = df['weekNum']
records = df['numRecords']
The date column, which contains int64 values, looks like this:
...
201850
201851
201852
201901
201902
...
How can I convert this column to a real date value (instead of int64), so that when I plot it, the axis does not break because of the year change?
I'm using matplotlib.
All you need to do is use:
pd.to_datetime(date, format='%Y%W')
Edited:
It raised an error saying that a day must be specified to convert the value into a datetime. To handle that, we append '-1' to the end (which means Monday; you can use any value from 0 to 6, where each digit represents a day of the week).
Then parse that day of the week with an additional %w in the format and it will work:
pd.to_datetime(date.apply(lambda x: str(x) + '-1'), format="%Y%W-%w")
Remember that to perform any of the above operations, the values in the date Series must be strings. If they are not, you can easily convert them with date.astype(str) and then apply the operations above.
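For reference, here is a minimal end-to-end sketch of the whole conversion, assuming the column names from the query above (the record counts are made up):

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data mimicking the YEARWEEK values returned by the query
df = pd.DataFrame({'weekNum': [201850, 201851, 201852, 201901, 201902],
                   'numRecords': [120, 95, 80, 130, 110]})

# Append '-1' (Monday) and parse year + week number + weekday
df['weekStart'] = pd.to_datetime(df['weekNum'].astype(str) + '-1', format='%Y%W-%w')

plt.plot(df['weekStart'], df['numRecords'])
plt.xlabel('week starting')
plt.ylabel('records')
plt.show()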
I'm trying to filter and extract the rows for a specific month (month 2) from my SQLite database using Python, and to calculate their average monthly prices. This is what I've got so far...
The CurrentMonth variable currently holds the value 02. I keep receiving invalid syntax errors. My database is here:
Your syntax is invalid in SQLite. I think that you mean:
select * from stock_stats where strftime('%m', date) + 0 = ?
Rationale: strftime('%m', date) extracts the month part from the date column, and returns a string like '02'. You can just add 0 to force the conversion to a numeric value.
Note that:
you should also filter on the year part, to avoid mixing data from different years
a more efficient solution would be to pass 2 parameters, that define the start and end of the date range; this would avoid the need to use date functions on the date column.
date >= ? and date < ?
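For illustration, a hedged sketch of how either filter might be used from Python with sqlite3 (the database file stocks.db and the price column are assumptions; the stock_stats table comes from the question):

import sqlite3

conn = sqlite3.connect('stocks.db')  # hypothetical database file
cur = conn.cursor()

CurrentMonth = 2

# Month filter: strftime('%m', date) yields a string like '02'; adding 0 makes it numeric
cur.execute("SELECT avg(price) FROM stock_stats WHERE strftime('%m', date) + 0 = ?",
            (CurrentMonth,))
print(cur.fetchone())

# Index-friendly alternative: pass the start and end of the date range explicitly
cur.execute("SELECT avg(price) FROM stock_stats WHERE date >= ? AND date < ?",
            ('2019-02-01', '2019-03-01'))
print(cur.fetchone())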
So, the thing is, I need to create a cross table from string data. I mean like in Excel: if you put string data into a cross table, it is automatically transformed into counts per the other factor. For instance, I have column 'A' which contains application numbers and column 'B' which contains dates. I need to show how many applications were placed on each day. A classic cross table returns an error.
data.columns = [['applicationnumber', 'date', 'param1', 'param2', 'param3']] #mostly string values
Examples of input data:
applicationnumber = "AAA12345678"
date = 'YYYY-MM-DD'
Is this what you are looking for:
import numpy as np
import pandas as pd

df = pd.DataFrame([['app1', '01/01/2019'],
                   ['app2', '01/02/2019'],
                   ['app3', '01/02/2019'],
                   ['app4', '01/02/2019'],
                   ['app5', '01/04/2019'],
                   ['app6', '01/04/2019']],
                  columns=['app.no', 'date'])
print(pd.pivot_table(df, values='app.no', index='date', aggfunc=np.size))
Output:
app.no
date
01/01/2019 1
01/02/2019 3
01/04/2019 2
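If you want something closer to an Excel cross table, pd.crosstab can also be used; a small sketch with the same hypothetical data plus a made-up param1 column (named after one of the columns listed in the question):

import pandas as pd

df = pd.DataFrame({'app.no': ['app1', 'app2', 'app3', 'app4', 'app5', 'app6'],
                   'date': ['01/01/2019', '01/02/2019', '01/02/2019',
                            '01/02/2019', '01/04/2019', '01/04/2019'],
                   'param1': ['x', 'x', 'y', 'y', 'x', 'y']})

# Counts per day, equivalent to the pivot_table above
print(df.groupby('date')['app.no'].count())

# Excel-style cross table: applications per day broken down by another factor
print(pd.crosstab(df['date'], df['param1']))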
I have a large (10m+ rows) dataframe with three columns: sales dates (dtype: datetime64[ns]), customer names and sales per customer. Sales dates include day, month and year in the form yyyy-mm-dd (i.e. 2019-04-19). I discovered the pandas to_period function and would like to use the period[A-MAR] dtype. As the business year (ending in March) differs from the calendar year, this is exactly what I was looking for. With to_period I can assign each sales date to the correct business year without creating new columns for that information.
I convert the date column as follows:
df_input['Date'] = pd.DatetimeIndex(df_input['Date']).to_period("A-MAR")
Now a peculiar issue arises when I use pivot_table to aggregate the data and set margins=True. The aggfunc returns the correct values in the output table. However, the results in the last row (the totals created by the margins) are wrong: NaN is shown (or, in my case, 0, since I set fill_value=0). The function I use:
df_output = df_input.pivot_table(index="Customer",
                                 columns="Date",
                                 values="Sales",
                                 aggfunc={"Sales": np.sum},
                                 fill_value=0,
                                 margins=True)
When I do not convert the dates to a period but use a simple year (integer) instead, the margins are calculated correctly and no NaN appears in the last row of the pivot output table.
I searched all over the internet but could not find a solution that was working. I would like to keep working with the period datatype and just need the margins to be calculated correctly. I hope someone can help me out here. Thank you!
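For reference, a minimal self-contained reproduction of the setup described above (the data values are made up):

import numpy as np
import pandas as pd

df_input = pd.DataFrame({'Date': pd.to_datetime(['2019-02-01', '2019-04-19', '2019-07-01']),
                         'Customer': ['A', 'B', 'A'],
                         'Sales': [100, 200, 300]})
df_input['Date'] = pd.DatetimeIndex(df_input['Date']).to_period('A-MAR')

df_output = df_input.pivot_table(index="Customer",
                                 columns="Date",
                                 values="Sales",
                                 aggfunc={"Sales": np.sum},
                                 fill_value=0,
                                 margins=True)
print(df_output)  # the 'All' margin row shows 0/NaN instead of the column totals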
I have an API in Python using sqlalchemy.
I have a string which represents a date in ISO format. I convert it using datetime.strptime like so: datetime.strptime(ToActionDateTime, '%Y-%m-%dZ').
Now I have to compare the value of a table column, which is a timestamp, to that date.
After converting the initial ISO string, an example result looks like this: 2018-12-06 00:00:00. I have to compare it for equality on the date part only, not the time, but I can't manage to get it right. Any help would be appreciated.
Sample Python code:
ToActionDateTimeObj = datetime.strptime(ToActionDateTime, '%Y-%m-%dZ')
query = query.filter(db.c.Audit.ActionDateTime <= ToActionDateTimeObj)
Edit:
I have also tried casting both sides of the comparison, but that does not work either. I can't manage to get the right result when the selected date matches the date of the timestamp.
from sqlalchemy import Date, cast
ToActionDateTimeObj = datetime.strptime(ToActionDateTime, '%Y-%m-%dZ')
query = query.filter(cast(db.c.Audit.ActionDateTime, Date) <= cast(ToActionDateTimeObj, Date))
Since Oracle's DATE datatype actually stores both date and time, a cast to DATE will not strip the time portion, as it would in most other DBMSs. Instead, the function TRUNC(date [, fmt]) can be used to reduce a value to its date portion only. In its single-argument form it truncates to the nearest day, or in other words uses 'DD' as the default model:
ToActionDateObj = datetime.strptime(ToActionDateTime, '%Y-%m-%dZ').date()
...
query = query.filter(func.trunc(db.c.Audit.ActionDateTime) <= ToActionDateObj)
If using the 2-argument form, then the precision specifier for day precision is either 'DDD', 'DD', or 'J'.
But this solution hides the column ActionDateTime from possible indexes. To make the query index-friendly, increment the date ToActionDateObj by one day and use a less-than comparison:
ToActionDateObj = datetime.strptime(ToActionDateTime, '%Y-%m-%dZ').date()
ToActionDateObj += timedelta(days=1)
...
query = query.filter(db.c.Audit.ActionDateTime < ToActionDateObj)
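For completeness, a self-contained sketch of the index-friendly variant using SQLAlchemy Core (the Audit table definition and the sample date string are assumptions standing in for the question's db.c.Audit and ToActionDateTime):

from datetime import datetime, timedelta
from sqlalchemy import Column, DateTime, Integer, MetaData, Table

# Hypothetical table definition playing the role of db.c.Audit
metadata = MetaData()
audit = Table('Audit', metadata,
              Column('id', Integer, primary_key=True),
              Column('ActionDateTime', DateTime))

ToActionDateTime = '2018-12-06Z'  # example value
to_action_date = datetime.strptime(ToActionDateTime, '%Y-%m-%dZ').date()

# Everything strictly before the start of the next day, so indexes stay usable
query = audit.select().where(audit.c.ActionDateTime < to_action_date + timedelta(days=1))
print(query)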
I have a pandas table with a column called "date_of_work". This column contains dates in the format MM/DD/YYYY, for example 9/19/2016 or 12/5/2016.
I'm trying to create a new column that assigns each date a value between 1 and 365, so I can create a scatter plot with dates on the x axis. I created this function:
def converttoday(datex):
    datex = str(datex)
    newdate = datex.split('/')
    day1 = int(newdate[0]) * 30
    day2 = int(newdate[1])
    finaldate = day1 + day2
    return finaldate
It ignores the year because I don't care about that (I'm more focused on seasonality). Any idea how to convert this? I'm getting an error when attempting it.
Any help appreciated!
Try this:
df['dayofyear'] = pd.to_datetime(df['date_of_work']).dt.dayofyear
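A short usage sketch, assuming a DataFrame with the date_of_work column from the question and a hypothetical hours column for the y axis:

import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical data; 'hours' is a made-up value column for the y axis
df = pd.DataFrame({'date_of_work': ['9/19/2016', '12/5/2016', '3/2/2017'],
                   'hours': [8, 6, 7]})

df['dayofyear'] = pd.to_datetime(df['date_of_work']).dt.dayofyear

plt.scatter(df['dayofyear'], df['hours'])
plt.xlabel('day of year')
plt.ylabel('hours')
plt.show()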