How to prevent Pandas to_dict() from converting timestamps to string?

I have a dataframe with a date field which appears to be represented as UNIX timestamps. When I call df.to_dict() on it, the dates are converted to strings like yyyy-mm-dd. How can I prevent this from happening?
I'm using the code to return a JSON in my FastAPI app ...
df_results = pd.read_sql_query(sql_query_str, _engine)
return_object["results"] = df_results.to_dict(orient='records')
# outputs "date": "2021-12-31" in the json
return_object["results"] = json.loads(df_results.to_json(orient='records'))
# outputs "date": 1640908800000 in the json

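You can cast the data to an integer type before calling .to_dict(). Casting a datetime column to an integer keeps the UNIX timestamp (for a datetime64[ns] column this is nanoseconds since the epoch, not milliseconds), e.g.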
df.astype(int).to_dict()
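
As a minimal sketch of the same idea applied to one column only, the snippet below converts a datetime column to epoch milliseconds before calling to_dict(). The frame df_results here is built by hand and the column names date and value are assumptions for illustration; in the question it would come from pd.read_sql_query.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by pd.read_sql_query
df_results = pd.DataFrame({"date": pd.to_datetime(["2021-12-31"]), "value": [1]})

out = df_results.copy()
# Subtract the epoch and floor-divide by one millisecond to get integer epoch milliseconds
out["date"] = (out["date"] - pd.Timestamp("1970-01-01")) // pd.Timedelta(milliseconds=1)

records = out.to_dict(orient="records")
print(records)
```

Converting only the date column avoids the errors that df.astype(int) would raise on non-numeric columns elsewhere in the frame.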

Related

How can I add a zero to dates in a string so all months are 2 characters? [duplicate]

Using a Python script, I need to read a CSV file where dates are formatted as DD/MM/YYYY, and convert them to YYYY-MM-DD before saving this into a SQLite database.
This almost works, but fails because I don't provide time:
from datetime import datetime
lastconnection = datetime.strptime("21/12/2008", "%Y-%m-%d")
#ValueError: time data did not match format: data=21/12/2008 fmt=%Y-%m-%d
print lastconnection
I assume there's a method in the datetime object to perform this conversion very easily, but I can't find an example of how to do it. Thank you.

Your example code is wrong. This works:
import datetime
datetime.datetime.strptime("21/12/2008", "%d/%m/%Y").strftime("%Y-%m-%d")
The call to strptime() parses the first argument according to the format specified in the second, so those two need to match. Then you can call strftime() to format the result into the desired final format.
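
As a minimal sketch of how this fits the CSV-to-SQLite flow from the question, the strptime()/strftime() pair can be wrapped in a small helper and applied to each row before inserting. The table name users, the column names, and the sample rows are made up for illustration.

```python
import csv
import io
import sqlite3
from datetime import datetime


def to_iso(ddmmyyyy):
    """Convert a DD/MM/YYYY string to YYYY-MM-DD; raises ValueError on bad input."""
    return datetime.strptime(ddmmyyyy, "%d/%m/%Y").strftime("%Y-%m-%d")


# Hypothetical CSV content standing in for the real file
csv_text = "name,lastconnection\nalice,21/12/2008\nbob,01/02/2009\n"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, lastconnection TEXT)")

# Convert each date while reading, then insert all rows in one call
rows = [
    (row["name"], to_iso(row["lastconnection"]))
    for row in csv.DictReader(io.StringIO(csv_text))
]
conn.executemany("INSERT INTO users VALUES (?, ?)", rows)

stored = conn.execute("SELECT lastconnection FROM users ORDER BY name").fetchall()
print(stored)
```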

You first need to convert the string into a datetime object, and then convert that datetime object back into a string. It would go like this:
lastconnection = datetime.strptime("21/12/2008", "%d/%m/%Y").strftime('%Y-%m-%d')

I am new to programming. I wanted to convert from yyyy-mm-dd to dd/mm/yyyy to print out a date in the format that people in my part of the world use and recognise.
The accepted answer above got me on the right track.
The answer I ended up with to my problem is:
import datetime
today_date = datetime.date.today()
print(today_date)
new_today_date = today_date.strftime("%d/%m/%Y")
print (new_today_date)
The first two lines after the import statement give today's date in ISO format (2017-01-26). The last two lines convert this to the format recognised in the UK and other countries (26/01/2017).
You can shorten this code, but I left it as is because it is helpful to me as a beginner. I hope this helps other beginner programmers starting out!

Does anyone else think it's a waste to convert these strings to date/time objects for what is, in the end, a simple text transformation? If you're certain the incoming dates will be valid, you can just use:
>>> ddmmyyyy = "21/12/2008"
>>> yyyymmdd = ddmmyyyy[6:] + "-" + ddmmyyyy[3:5] + "-" + ddmmyyyy[:2]
>>> yyyymmdd
'2008-12-21'
This will almost certainly be faster than the conversion to and from a date.

# case_date holds a date in mm/dd/yyyy format
case_date = "03/31/2020"
demo = case_date.split("/")
new_case_date = demo[1] + "-" + demo[0] + "-" + demo[2]
# the new format is dd-mm-yyyy; test by printing it
print(new_case_date)

If you need to convert an entire column (from pandas DataFrame), first convert it (pandas Series) to the datetime format using to_datetime and then use .dt.strftime:
def conv_dates_series(df, col, old_date_format, new_date_format):
    df[col] = pd.to_datetime(df[col], format=old_date_format).dt.strftime(new_date_format)
    return df
Sample usage:
import pandas as pd
test_df = pd.DataFrame({"Dates": ["1900-01-01", "1999-12-31"]})
old_date_format='%Y-%m-%d'
new_date_format='%d/%m/%Y'
conv_dates_series(test_df, "Dates", old_date_format, new_date_format)
Dates
0 01/01/1900
1 31/12/1999

The simplest way
While reading the CSV file, pass the parse_dates argument:
df = pd.read_csv("sample.csv", parse_dates=['column_name'])
This parses the named column into datetime values, which pandas displays in YYYY-MM-DD format.
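
By default pandas treats ambiguous dates as month-first, so for DD/MM/YYYY input it may be safer to parse with an explicit format after reading. A minimal sketch, assuming a hypothetical column named lastconnection and inline CSV text in place of sample.csv:

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a file path
csv_text = "name,lastconnection\nalice,21/12/2008\nbob,01/02/2009\n"

df = pd.read_csv(io.StringIO(csv_text))

# Parse with an explicit day-first format so 01/02/2009 is read as 1 February
df["lastconnection"] = pd.to_datetime(df["lastconnection"], format="%d/%m/%Y")

# Format back to text only at the point where a string is needed
iso_strings = df["lastconnection"].dt.strftime("%Y-%m-%d").tolist()
print(iso_strings)
```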

To convert the date format from DD/MM/YYYY to YYYY-MM-DD, as in your question, you can use this:
from datetime import datetime
lastconnection = datetime.strptime("21/12/2008", "%d/%m/%Y").strftime("%Y-%m-%d")
print(lastconnection)

df is your data frame
Dateclm is the column that you want to change
This column should be in DateTime datatype.
df['Dateclm'] = pd.to_datetime(df['Dateclm'])
df.dtypes
#Here is the solution to change the format of the column
df["Dateclm"] = pd.to_datetime(df["Dateclm"]).dt.strftime('%Y-%m-%d')
print(df)

Converting a string to Timestamp with Pyspark

I am currently attempting to convert a column "datetime" which has values that are dates/times in string form, and I want to convert the column such that all of the strings are converted to timestamps.
The date/time strings are of the form "10/11/2015 0:41", and I'd like to convert the string to a timestamp of form YYYY-MM-DD HH:MM:SS. At first I attempted to cast the column to timestamp in the following way:
df=df.withColumn("datetime", df["datetime"].cast("timestamp"))
When I did so, however, I received null for every value, which led me to believe that the input dates needed to be formatted somehow. I have looked into numerous other possible remedies such as to_timestamp(), though this also gives the same null results for all of the values. How can a string of this format be converted into a timestamp?
Any insights or guidance are greatly appreciated.

Try:
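import datetime
from pyspark.sql.functions import udf
from pyspark.sql.types import TimestampType
# strptime works on plain Python strings, so wrap it in a UDF to apply it per row
@udf(returnType=TimestampType())
def to_timestamp(date_string):
    return datetime.datetime.strptime(date_string, "%m/%d/%Y %H:%M")
df = df.withColumn("datetime", to_timestamp(df.datetime))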

You can use the to_timestamp function. See Datetime Patterns for valid date and time format patterns.
from pyspark.sql import functions as F
df = df.withColumn('datetime', F.to_timestamp('datetime', 'M/d/y H:m'))
df.show(truncate=False)

You were on the right track, except that you did not pass the format of the string, which in this case is MM/dd/yyyy HH:mm. Here M stands for months and m for minutes. See the code below for reference:
df = spark.createDataFrame([('10/11/2015 0:41',), ('10/11/2013 10:30',), ('12/01/2016 15:56',)], ("String_Timestamp", ))
from pyspark.sql.functions import *
df.withColumn("Timestamp_Format", to_timestamp(col("String_Timestamp"), "MM/dd/yyyy HH:mm")).show(truncate=False)
+----------------+-------------------+
|String_Timestamp|   Timestamp_Format|
+----------------+-------------------+
| 10/11/2015 0:41|2015-10-11 00:41:00|
|10/11/2013 10:30|2013-10-11 10:30:00|
|12/01/2016 15:56|2016-12-01 15:56:00|
+----------------+-------------------+

Pandas object-list datetime series to datetime index

I'm using the fields parameter of the python-elasticsearch API to retrieve data from Elasticsearch, and I'm trying to parse the #timestamp field in ISO format for use in a pandas DataFrame.
fields = [{
    "field": "#timestamp",
    "format": "strict_date_optional_time"
}]
By default, Elasticsearch returns the results in array format, as described in the docs:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-fields.html
The fields response always returns an array of values for each field, even when there is a single value in the _source.
Because of this, the resulting DataFrame contains an object Series of lists that can't be parsed to a datetime Series by conventional methods.
Name: fields.#timestamp, Length: 18707, dtype: object
0 [2021-11-04T01:30:00.263Z]
1 [2021-11-04T01:30:00.385Z]
2 [2021-11-04T01:30:00.406Z]
3 [2021-11-04T01:30:00.996Z]
4 [2021-11-04T01:30:01.001Z]
...
8368 [2021-11-04T02:00:00.846Z]
8369 [2021-11-04T02:00:00.894Z]
8370 [2021-11-04T02:00:00.895Z]
8371 [2021-11-04T02:00:00.984Z]
8372 [2021-11-04T02:00:00.988Z]
When trying to parse the Series to a datetime Series:
pd.to_datetime(df["fields.#timestamp"])
That results in:
TypeError: <class 'list'> is not convertible to datetime
My use case requires a lot of datetime formats, and the fields parameter is well suited to querying in multiple formats, but the datetime strings wrapped in lists make things difficult.

The issue here is that the items of fields.#timestamp are actually lists.
So you could do:
fields['timestamp'] = fields['timestamp'].str[0]
to extract the date from the list,
and then use pd.to_datetime :
fields['timestamp'] = pd.to_datetime(fields['timestamp'])
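
The two steps above can be sketched on a small hand-built Series. The values are copied from the question's output; the Series name timestamp is an assumption for illustration.

```python
import pandas as pd

# Each element is a one-item list, as returned by the Elasticsearch fields option
raw = pd.Series(
    [["2021-11-04T01:30:00.263Z"], ["2021-11-04T01:30:00.385Z"]],
    name="timestamp",
)

# .str[0] indexes into each list and returns its first element
first = raw.str[0]

# The ISO 8601 strings end in Z, so the result is timezone-aware (UTC)
parsed = pd.to_datetime(first)
print(parsed)
```

If a field can hold more than one value per document, .str[0] silently keeps only the first, so Series.explode() may be the better fit in that case.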

MongoDB/Python - Date in collection (to use for query)

I just started using mongoDB (Version 3.6.8) today, and I like it.
I was reading that it should be possible to have a date object directly in the database, but I can't get it to work.
Also I was wondering if it is the best solution or if I should just store my dates as "Epoch millis" instead?
I am trying to use the $dateFromString keyword, which should work, but I receive this error:
bson.errors.InvalidDocument: key '$dateFromString' must not start with '$'
My code looks like this:
from datetime import date
import pymongo
dbcli = pymongo.MongoClient('mongodb://192.168.1.8:27017')
db = dbcli['washbase']
col = db['machine']
def conv(dato):
    return {
        '$dateFromString': {
            'dateString': dato,
            'format': '%Y-%m-%d',
            'timezone': 'Europe/Copenhagen',
        }
    }
today = date.today().isoformat()
data = {
    'day': conv(today),
    'time': 12,
    'room': '2B',
}
col.insert_one(data)
The reason I need something like a date object in the database is that I want to do a conditional query on the data, so that the database only sends the data I require. So I expect to do something like this:
result = col.find(
    {
        'day': {
            '$gt': {
                '$date': '2020-01-01'
            }
        }
    }
)
for x in result:
    print(x)
But when I do this the app prints nothing.

$dateFromString is an operator for MongoDB aggregations, so it cannot be used as a key in a document passed to insert_one; that is why the insert fails. An aggregation is a powerful way to create complex queries in MongoDB, but it might not be what you need here.
I would recommend storing the date as a plain value rather than wrapping it in an operator. So your code should look something like this:
from datetime import date
import pymongo
dbcli = pymongo.MongoClient('mongodb://192.168.1.8:27017')
db = dbcli['washbase']
col = db['machine']
today = date.today().isoformat()
data = {
    'day': today,
    'time': 12,
    'room': '2B',
}
col.insert_one(data)
If you are concerned about timezones, MongoDB stores each date in UTC by default, converting whatever timezone is specified in your date to UTC. When reading the dates, you can then convert them to whatever timezone you need.
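EDIT:
When writing your query, try using an actual datetime object instead of a '$date' document. PyMongo converts it to a BSON date that the DB can compare. This only matches documents where 'day' was stored as a datetime; if it was stored as an ISO string, compare against an ISO string instead.
from datetime import datetime, timedelta
start = datetime.combine(date.today(), datetime.min.time())
col.find({'day': {'$gte': start}})
If you're trying to find entries that fall within a date range, you can do something like:
col.find({'day': {'$gte': start, '$lte': start + timedelta(hours=24)}})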
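
A minimal sketch of the date-range idea, assuming the 'day' field is stored as a datetime rather than a string. The helper name day_range_filter is hypothetical, and the snippet only builds the document and filter dictionaries, so it runs without a MongoDB server.

```python
from datetime import datetime, timedelta, timezone


def day_range_filter(day_start, field="day"):
    """Build a filter matching datetimes in [day_start, day_start + 24 hours)."""
    return {field: {"$gte": day_start, "$lt": day_start + timedelta(days=1)}}


# Store a real datetime so the database can compare it as a date
start = datetime(2020, 1, 1, tzinfo=timezone.utc)
document = {"day": start, "time": 12, "room": "2B"}

query = day_range_filter(start)
print(query)

# With a live collection these would be passed on unchanged:
#   col.insert_one(document)
#   col.find(query)
```

Using $lt on the upper bound rather than $lte keeps midnight of the next day out of the range, which avoids counting a document in two adjacent days.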
