I have a table tbl with column datetime in KDB's timestamp format which looks like 2014.11.22D17:43:40.123456789. I would like to cast this into a Python datetime format like this 2014-11-22 17:43:40.123456789 but I am having trouble using the update command.
I understand that I can do this to cast the timestamp:
q)`year`dd`mm`hh`uu`ss$2015.10.28D03:55:58 // this gives 2015 28 10 3 55 58i
And I understand I can create a new column datetime2 from datetime by reading it as a string then converting it into integer in this case:
q)update datetime2:"I"$string datetime from tbl
But I am having difficulty casting and updating at the same time:
q)update datetime2:`year-`dd-`mm `hh:`uu:`ss$datetime from tbl
The error I got is:
evaluation error:
length
[0] update datetime2:`year-`dd-`mm `hh:`uu:`ss$datetime from tbl
^
Can anyone point me in the right direction? Thank you.
kdb+ doesn't have an alternative method of displaying timestamps; the only way to get what you're looking for is to string the timestamps and manipulate the individual characters. Something like:
q)tbl:([]datetime:5#2014.11.22D17:43:40.123456789)
q)update{" "sv'(ssr[;".";"-"];::)#'/:"D"vs'string x}datetime from tbl
datetime
-------------------------------
"2014-11-22 17:43:40.123456789"
"2014-11-22 17:43:40.123456789"
"2014-11-22 17:43:40.123456789"
"2014-11-22 17:43:40.123456789"
"2014-11-22 17:43:40.123456789"
This is purely cosmetic and these timestamps would be unusable in a timeseries sense; however, they may suit your purpose.
What problem are you ultimately trying to solve? If you're trying to pass the data to python you might be better off working with the underlying numerical values and converting the numerical value back to timestamp on the python side.
You can use the qdate library from https://code.kx.com/analyst/libraries/date-parser/
q).qdate.print["%Y-%m-%d %H:%M:%S.%N"; 2014.11.22D17:43:40.123456789]
"2014-11-22 17:43:40.123456789"
If you're planning to read the data into Python, it would be best to cast the timestamps to a long in your update statement; this represents the number of nanoseconds since the kdb+ epoch (2000.01.01):
update datetime2:`long$datetime from tbl
and then convert the longs back into Python datetimes on the Python side. Otherwise you could convert it to a string in the table and then alter the individual characters to put it in Python datetime form (this would only make sense if the data is staying inside kdb+).
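On the Python side the long can then be shifted from the kdb+ epoch (2000.01.01) back to a datetime. A minimal sketch; the example long below is assumed to correspond to 2014.11.22D17:43:40.123456789:

```python
from datetime import datetime, timedelta

# kdb+ timestamps count nanoseconds from the kdb+ epoch, 2000.01.01
# (not the Unix epoch), so offset from that date.
KDB_EPOCH = datetime(2000, 1, 1)

def kdb_nanos_to_datetime(nanos):
    """Convert a kdb+ long (nanoseconds since 2000.01.01) to a datetime.

    Note: datetime only keeps microsecond precision, so the last three
    digits of the nanosecond part are truncated.
    """
    return KDB_EPOCH + timedelta(microseconds=nanos // 1000)

nanos = 469993420123456789  # assumed: `long$2014.11.22D17:43:40.123456789
print(kdb_nanos_to_datetime(nanos))  # 2014-11-22 17:43:40.123456
```

If you need the full nanosecond precision, pandas Timestamps (which are nanosecond-based) are a better target than datetime.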
Another way to go could be to use the PyQ or embedPy libraries; these load Python and q interpreters into the same process and support type conversion:
https://pythonhosted.org/pyq/
https://github.com/KxSystems/embedPy
Related
This is my data:
dates = np.arange("2018-01-01", "2021-12-31", dtype="datetime64[D]")
I now want to convert from:
"2018-01-01" -> "Jan-01-2018" ["Monthname-day-year"] format
How do I do this?
Is it possible to initialize the array in the format I want to convert to?
Can I use something like:
for i in dates:
    i = i.replace(i.month, i.strftime("%b"))
You can try this:
from datetime import datetime

import numpy as np

dates = np.arange("2018-01-01", "2021-12-31", dtype="datetime64[D]")
result_dates = []
for date in dates.astype(datetime):  # datetime64[D] items become datetime.date
    result_dates.append(date.strftime("%b-%d-%Y"))
But you will need to convert the dates first, as shown in the code.
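If the loop feels clunky, the same formatting can be done in one vectorised call by going through pandas (assuming pandas is available alongside numpy):

```python
import numpy as np
import pandas as pd

dates = np.arange("2018-01-01", "2021-12-31", dtype="datetime64[D]")

# DatetimeIndex.strftime formats the whole array in a single call
formatted = pd.to_datetime(dates).strftime("%b-%d-%Y")
print(formatted[0])  # Jan-01-2018
```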
I feel compelled to elaborate on Silvio Mayolo's very relevant but ostensibly ignored comment above. Python stores a timestamp as a structure (see How does Python store datetime internally? for more information), so a datetime does not, as such, have a 'format'. A format only becomes necessary when you want to print the date, because you must first convert the timestamp to a string. Thus, you do NOT need to initialise any format. You only need to declare a format when the time comes to print the timestamp.
While you CAN store the date as a string in your dataframe index in a specific format, you CANNOT perform time-related functions on it without first converting the string back to a time variable. E.g. current_time.hour will return an integer with the current hour if current_time is a datetime variable, but will crash if it is a string formatted as a timestamp (such as "2023-01-15 17:23").
This is important to understand, because eventually you will need to manipulate the variables and must know whether you are working with a time or a string.
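To make that concrete, here is a small sketch showing that a datetime carries no format of its own; a format only appears when you render it as a string:

```python
from datetime import datetime

ts = datetime(2023, 1, 15, 17, 23)  # stored as numeric fields, not as text
print(ts.hour)  # 17 -- time-related attributes work on a datetime

s = "2023-01-15 17:23"  # this string merely looks like a timestamp
# s.hour would raise AttributeError -- a str knows nothing about time

# the format only matters at print time; the same datetime can be
# rendered any number of ways
print(ts.strftime("%d/%m/%Y %H:%M"))  # 15/01/2023 17:23
print(ts.strftime("%b %d, %Y"))       # Jan 15, 2023
```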
I am using Python 2 (I am behind on moving my code over), so perhaps this issue has gone away.
Using pandas, I can create a datetime like this:
import pandas as pd
big_date= pd.datetime(9999,12,31)
print big_date
9999-12-31 00:00:00
big_date2 = pd.to_datetime(big_date)
. . .
Out of bounds nanosecond timestamp: 9999-12-31 00:00:00
I understand the reason for the error in that there are obviously too many nanoseconds in a date that big. I also know that big_date2 = pd.to_datetime(big_date, errors='ignore') would work. However, in my situation, I have a column of what are supposed to be dates (read from SQL server) and I do indeed want it to change invalid data/dates to NaT. In effect, I was using pd.to_datetime as a validity check. To Pandas, on the one hand, 9999-12-31 is a valid date, and on the other, it's not. That means I can't use it and have had to come up with something else.
I've played around with the arguments in pandas to_datetime and not been able to solve this.
I've looked at other questions/problems of this nature, and not found an answer.
I have a similar issue and was able to find a solution.
I have a pandas dataframe with one column that contains a datetime (retrieved from a database table where the column was a DateTime2 data type), but I need to be able to represent dates that are further in the future than the Timestamp.max value.
Fortunately, I didn't need to worry about the time part of the datetime column - it was actually always 00:00:00 (I didn't create the database design and, yes, it probably should have been a Date data type and not a DateTime2 data type). So I was able to get round the issue by converting the pandas dataframe column to just a date type. For example:
for i, row in df.iterrows():
    df.set_value(i, 'DateColumn', datetime.datetime(9999, 12, 31).date())
sets all of the values in the column to the date 9999-12-31 and you don't receive any errors when using this column anymore.
So, if you can afford to lose the time part of the date you are trying to use you can work round the limitation of the datetime values in the dataframe by converting to a date.
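Note that DataFrame.set_value was deprecated and later removed from pandas; on current versions the same idea can be sketched with a plain column assignment (the DataFrame and column name below are just illustrative):

```python
import datetime

import pandas as pd

df = pd.DataFrame({"DateColumn": [None, None, None]})

# assigning a datetime.date keeps the column as object dtype, so no
# conversion to a nanosecond Timestamp (and no out-of-bounds error) occurs
df["DateColumn"] = datetime.date(9999, 12, 31)

print(df["DateColumn"].iloc[0])  # 9999-12-31
```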
I am using pandas to import data dfST = read_csv( ... , parse_dates={'timestamp':[date]})
In my csv, date is in the format YYYY/MM/DD, which is all I need; there is no time. I have several data sets that I need to compare for membership. When I convert these 'timestamp' values to a string, sometimes I get something like this:
'1977-07-31T00:00:00.000000000Z'
which I understand is a datetime including milliseconds and a timezone. Is there any way to suppress the addition of the extraneous time on import? If not, I need to exclude it somehow.
dfST.timestamp[1]
Out[138]: Timestamp('1977-07-31 00:00:00')
I have tried formatting it, which seemed to work until I called the formatted values:
dfSTdate=pd.to_datetime(dfST.timestamp, format="%Y-%m-%d")
dfSTdate.head()
Out[123]:
0 1977-07-31
1 1977-07-31
Name: timestamp, dtype: datetime64[ns]
But no... when I test the value of this I also get the time:
dfSTdate[1]
Out[124]: Timestamp('1977-07-31 00:00:00')
When I convert this to an array, the time is included with the millisecond and the timezone, which really messes my comparisons up.
test97=np.array(dfSTdate)
test97[1]
Out[136]: numpy.datetime64('1977-07-30T20:00:00.000000000-0400')
How can I get rid of the time?!?
Ultimately I wish to compare membership among data sets using numpy.in1d with date as a string ('YYYY-MM-DD') as one part of the comparison
This is due to the way datetime values are stored in pandas: using the numpy datetime64[ns] dtype. So datetime values are always stored at nanosecond resolution. Even if you only have a date, this will be converted to a timestamp with a zero time of nanosecond resolution. This is just due to the implementation in pandas.
The issues you have with printing the values and getting unexpected results are just because of how these objects are printed in the Python console (their representation), not their actual value.
If you print a single value, you get the pandas Timestamp representation:
Timestamp('1977-07-31 00:00:00')
So you get the seconds here as well, just because this is the default representation.
If you convert it to an array, and then print it, you get the standard numpy representation:
numpy.datetime64('1977-07-30T20:00:00.000000000-0400')
This is indeed a very misleading representation. Because numpy will, just for printing it in the console, convert it to your local timezone. But this doesn't change your actual value, it's just weird printing.
That is the background, now to answer your question, how do I get rid of the time?
That depends on your goal. Do you really want to convert it to a string? Or do you just not like the repr?
if you just want to work with the datetime values, you don't need to get rid of it.
if you want to convert it to strings, you can apply strftime (df['timestamp'].apply(lambda x: x.strftime('%Y-%m-%d'))). Or if it is to write them as strings to csv, use the date_format keyword in to_csv
if you really want a 'date', you can use the datetime.date type (standard Python type) in a DataFrame column. You can convert your existing column to this with: pd.DatetimeIndex(dfST['timestamp']).date. But personally I don't think this has many advantages.
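Putting the pieces together for the membership comparison in the question, a sketch (the column name timestamp is taken from the question; .dt.strftime is the vectorised equivalent of the apply above, and np.isin is the modern name for np.in1d):

```python
import numpy as np
import pandas as pd

dfST = pd.DataFrame({"timestamp": pd.to_datetime(["1977-07-31", "1980-01-02"])})

# plain 'YYYY-MM-DD' strings, safe to compare for membership
date_strings = dfST["timestamp"].dt.strftime("%Y-%m-%d").to_numpy()

other = np.array(["1977-07-31", "1999-12-31"])
mask = np.isin(date_strings, other)
print(mask)  # [ True False]
```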
I want to take some action based on comparing two dates. Date 1 is stored in a python variable. Date 2 is retrieved from the database in the select statement. For example I want to retrieve some records from the database where the associated date in the record (in form of the timestamp) is later than the date defined by the python variable. Preferably, I would like the comparison to be in readable date format rather than in timestamps.
I am a beginner with python.
----edit -----
Sorry for being ambiguous. Here's what I am trying to do:
import MySQLdb as mdb
from datetime import datetime
from datetime import date
import time
conn = mdb.connect('localhost','root','root','my_db')
cur = conn.cursor()
right_now = date.today()  # Python date
This is the part I want to figure out:
The database has a table which has timestamp. I want to compare that timestamp with this date and then retrieve records based on that comparison. For example I want to retrieve all records for which timestamp is above this date
cur.execute("SELECT created from node WHERE timestamp > right_now")
results = cur.fetchall()
for row in results:
    print row
First of all, I guess Date 1 (the Python variable) is a datetime object: http://docs.python.org/2/library/datetime.html
As far as I have used it, MySQLdb gives you results as (Python) datetime objects if the SQL type was DATETIME.
So actually you have nothing to do: you can use Python datetime comparison methods on Date 1 and Date 2.
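A minimal sketch under those assumptions. In real code you would usually push the filter into SQL with a bound parameter, e.g. cur.execute("SELECT created FROM node WHERE timestamp > %s", (right_now,)), rather than pasting the variable into the query string; the pure-Python comparison below just shows that the values compare directly:

```python
from datetime import date, datetime

# rows as MySQLdb would hand them back: DATETIME columns arrive as
# Python datetime objects, so they compare directly with your variable
rows = [
    ("a", datetime(2013, 1, 5, 10, 30)),
    ("b", datetime(2013, 3, 1, 9, 0)),
]

# turn a date into a datetime at midnight for the comparison
cutoff = datetime.combine(date(2013, 2, 1), datetime.min.time())

later = [name for name, created in rows if created > cutoff]
print(later)  # ['b']
```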
I am a little bit confused about "comparison to be in readable date format rather than in timestamps". I mean, the timestamps are readable enough, right?
If Date 1 is timestamp data, then you simply do the comparison. If not, either convert it to a timestamp or convert the date in the database to a date type; both ways work.
If you are asking how to write the code to do the comparison, you could use either '_mysql' or sqlalchemy to help you. The detailed syntax can be found anywhere.
Anyway, the question itself is not clear enough, so the answer is blur, too.
I am having design problems with date storage/retrieval using Python and SQLite.
I understand that a SQLite date column stores dates as text in ISO format
(ie. '2010-05-25'). So when I display a British date (eg. on a web-page) I
convert the date using
datetime.datetime.strptime(mydate,'%Y-%m-%d').strftime('%d/%m/%Y')
However, when it comes to writing-back data to the table, SQLite is very
forgiving and is quite happy to store '25/06/2003' in a date field, but this
is not ideal because
1. I could be left with a mixture of date formats in the same column, and
2. SQLite's date functions only work with ISO format.
Therefore I need to convert the date string back to ISO format before
committing, but then I would need a generic function which checks data about to
be written in all date fields and converts to ISO if necessary. That sounds a
bit tedious to me, but maybe it is inevitable.
Are there simpler solutions? Would it be easier to change the date field to a
10-character field and store 'dd/mm/yyyy' throughout the table? This way no
conversion is required when reading or writing from the table, and I could use
datetime() functions if I needed to perform any date-arithmetic.
How have other developers overcome this problem? Any help would be appreciated.
For the record, I am using SQLite3 with Python 3.1.
If you set detect_types=sqlite3.PARSE_DECLTYPES in sqlite3.connect,
then the connection will try to convert sqlite data types to Python data types
when you draw data out of the database.
This is a very good thing since its much nicer to work with datetime objects than
random date-like strings which you then have to parse with
datetime.datetime.strptime or dateutil.parser.parse.
Unfortunately, using detect_types does not stop sqlite from accepting
strings as DATE data, but you will get an error when you try to
draw the data out of the database (if it was inserted in some format other than YYYY-MM-DD)
because the connection will fail to convert it to a datetime.date object:
import datetime
import sqlite3

conn = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES)
cur = conn.cursor()
cur.execute('CREATE TABLE foo(bar DATE)')

# Unfortunately, this is still accepted by sqlite
cur.execute("INSERT INTO foo(bar) VALUES (?)", ('25/06/2003',))

# But you won't be able to draw the data out later because parsing will fail
try:
    cur.execute("SELECT * FROM foo")
except ValueError as err:
    print(err)
    # invalid literal for int() with base 10: '25/06/2003'
conn.rollback()
But at least the error will alert you to the fact that you've inserted
a string for a DATE when you really should be inserting datetime.date objects:
cur.execute("INSERT INTO foo(bar) VALUES (?)", (datetime.date(2003, 6, 25),))
cur.execute("SELECT ALL * FROM foo")
data = cur.fetchall()
data = list(zip(*data))[0]  # list(...) is needed on Python 3; zip(*data)[0] works on Python 2
print(data)
# (datetime.date(2003, 6, 25),)
You may also insert strings as DATE data as long as you use the YYYY-MM-DD format. Notice that although you inserted a string, it comes back out as a datetime.date object:
cur.execute("INSERT INTO foo(bar) VALUES (?)", ('2003-06-25',))
cur.execute("SELECT ALL * FROM foo")
data = cur.fetchall()
data = list(zip(*data))[0]
print(data)
# (datetime.date(2003, 6, 25), datetime.date(2003, 6, 25))
So if you are disciplined about inserting only datetime.date objects into the DATE field, then you'll have no problems later when drawing the data out.
If your users are input-ing date data in various formats, check out dateutil.parser.parse. It may be able to help you convert those various strings into datetime.datetime objects.
Note that SQLite itself does not have a native date/time type. As @unutbu answered, you can make the pysqlite/sqlite3 module try to guess (and note that it really is a guess) which columns/values are dates/times. SQL expressions will easily confuse it.
SQLite does have a variety of date time functions and can work with various strings, numbers in both unixepoch and julian format, and can do transformations. See the documentation:
http://www.sqlite.org/lang_datefunc.html
You may find it more convenient to get SQLite to do the date/time work you need instead of importing the values into Python and using Python libraries to do it. Note that you can put constraints in the SQL table definition, for example requiring that a string value be present, be a certain length, etc.
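For instance, from Python you can hand the formatting and date arithmetic straight to SQLite; a sketch using an in-memory database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# date arithmetic done by SQLite itself
plus_month = cur.execute("SELECT date('2010-05-25', '+1 month')").fetchone()[0]
print(plus_month)  # 2010-06-25

# reformat an ISO date to dd/mm/yyyy inside SQL
uk = cur.execute("SELECT strftime('%d/%m/%Y', '2010-05-25')").fetchone()[0]
print(uk)  # 25/05/2010
```

This keeps the stored values in ISO format (so SQLite's date functions keep working) while still giving you the British display format on the way out.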