First off, I am very new to all of Python. I am trying to figure out how to replace a time string in a certain column of a CSV when that time is greater than the current time.
The script I am building from relies on petl, so that is what I am using. First, the source CSV is opened with petl as table1. It contains multiple columns, one of which is " End time". In this column I would like to replace the time with #time? (for HTML formatting later), but only if it is greater than the current time.
The times have a format like "12:15". However, I do not see any change when running the line with >, yet with < all values in the column change.
The line I am struggling with:
current=time.localtime()
table2= petl.convert(table1, ' End time', lambda v, row: '#'+v+'?' if time.strptime(v, '%H:%M') > current else v, pass_row=True)
I would also like to know how I can print or see what values time.strptime is actually producing. Is this possible?
Any ideas are highly appreciated!
If you only pass hour and minute to time.strptime, strptime automatically fills in the missing year, month, and day with 1900, 1, 1, so of course that is always less than time.localtime().
If your table contains times in 24-hour format, you can directly compare the time strings from your table with the local time in the narrower sense (just the TIME part).
To achieve this, use time formatting like so:
current = time.strftime('%H:%M',time.localtime())
It is often helpful to start a Python interpreter in a shell and play with the intermediate steps of your computations. Just type the variable, and you will see what it evaluates to:
>>> t2 = time.strptime('12:15', '%H:%M')
>>> t2
time.struct_time(tm_year=1900, tm_mon=1, tm_mday=1, tm_hour=12, tm_min=15, tm_sec=0, tm_wday=0, tm_yday=1, tm_isdst=-1)
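Putting that together, a minimal sketch of the corrected conversion (assuming the " End time" values are zero-padded 24-hour strings like "12:15", so plain string comparison orders them chronologically; pass_row is no longer needed since the row isn't used):
import time
import petl
# Compare the "HH:MM" strings directly; both sides are zero-padded 24-hour times
current = time.strftime('%H:%M', time.localtime())
table2 = petl.convert(table1, ' End time', lambda v: '#' + v + '?' if v > current else v)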
I have a dataframe that comes from SQL.
The date columns are not in a format I can process, so I need to change the format to something like "%d.%m.%Y %H:%m". When I try to change the format with:
df3["ACT_START_DATE"]=pd.to_datetime(df3["ACT_START_DATE"])
df3["ACT_START_DATE"]=df3["ACT_START_DATE"].dt.strftime("%d.%m.%Y %H:%m")
Result:
1. Why does the HOUR change?
2. Besides the hour changing, is there another way to do this for all the date columns at once?
Thank you in advance.
The "minute" part is changing because your format is wrong
Notice the format: %d.%m.%Y %H:%m
%m is actually for MONTH
So instead of exporting for minute, you're exporting the month value there.
Change it to %d.%m.%Y %H:%M:%S and you'll be fine
(Notice the difference is capital M vs small m)
As for changing multiple columns, you can simply use
my_df[['column1','column2']] = my_df[['column1','column2']].apply(pd.to_datetime, format='%d.%m.%Y %H:%M:%S')
or something along those lines as you prefer
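For example, a sketch that parses and reformats several date columns in one go (the second column name here is hypothetical):
import pandas as pd
# ACT_END_DATE is a hypothetical example of a second date column
date_cols = ["ACT_START_DATE", "ACT_END_DATE"]
df3[date_cols] = df3[date_cols].apply(lambda col: pd.to_datetime(col).dt.strftime("%d.%m.%Y %H:%M:%S"))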
I recently started using Google's BigQuery service, and their Python API, to query some large databases. I'm new to SQL, and the BigQuery documentation isn't incredibly helpful for what I'm doing.
Currently I'm looking through the reddit_comments database, and there's a 'created_utc' field that I'm trying to filter by. This created_utc field holds Unix timestamps (i.e. November 1st, 12:00 AM is 1541030400).
I'd like to grab comments day by day (or between two Unix timestamps) but in a way that I'm iterating over each day. Something like:
from datetime import datetime, timedelta
start = datetime.fromtimestamp(1538352000)
end = datetime.fromtimestamp(1541030400)
time = start
while time < end:
    print(time)
    time = time + timedelta(days=1)
Printing times here yields values like: 2018-09-30 20:00:00
However in order to query, I have to convert back to the Unix timestamp by invoking datetime's timestamp() function like time.timestamp()
The problem is, I'm trying to use the timestamp() function inside the query like so:
SELECT *
FROM 'fh-bigquery.reddit_comments.2018_10'
...
AND (created_utc >= curr_day.timestamp() AND created_utc <= next_day.timestamp())
However, it's throwing BadRequest: 400 Function not found. Is there a way to use built-in Python functions in the way that I've described above? Or does there need to be some alternative?
Everything so far seems pretty intuitive, but it's weird that I can't find much helpful information on this specifically.
You should use BigQuery's built-in functions.
For example:
To get current timestamp - CURRENT_TIMESTAMP()
To get timestamp of start of current date - TIMESTAMP_TRUNC(CURRENT_TIMESTAMP(), DAY)
To get timestamp of start of next date - TIMESTAMP_TRUNC(TIMESTAMP_ADD(CURRENT_TIMESTAMP() , INTERVAL 1 DAY), DAY)
and so on
Also, to convert created_utc to TIMESTAMP type - you can use TIMESTAMP_SECONDS(created_utc)
You can read more about the TIMESTAMP functions in the BigQuery documentation.
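If you want to keep the day-by-day Python loop from the question, another option (a sketch, assuming the google-cloud-bigquery client library) is to compute the bounds in Python and pass them in as query parameters rather than calling Python functions inside the SQL:
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT *
    FROM `fh-bigquery.reddit_comments.2018_10`
    WHERE created_utc >= @day_start AND created_utc < @day_end
"""
# curr_day and next_day are the datetime objects stepped through in the loop above
job_config = bigquery.QueryJobConfig(query_parameters=[
    bigquery.ScalarQueryParameter("day_start", "INT64", int(curr_day.timestamp())),
    bigquery.ScalarQueryParameter("day_end", "INT64", int(next_day.timestamp())),
])
rows = client.query(query, job_config=job_config).result()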
I've been trying to figure out how to generate the same Unix epoch time that I see within InfluxDB next to measurement entries.
Let me start by saying I am trying to use the same date and time in all tests:
April 01, 2017 at 2:00AM CDT
If I view a measurement in InfluxDB, I see time stamps such as:
1491030000000000000
If I view that measurement in InfluxDB using the -precision rfc3339 it appears as:
2017-04-01T07:00:00Z
So I can see that InfluxDB used UTC
I cannot seem to generate that same timestamp through Python, however.
For instance, I've tried a few different ways:
>>> calendar.timegm(time.strptime('04/01/2017 02:00:00', '%m/%d/%Y %H:%M:%S'))
1491012000
>>> calendar.timegm(time.strptime('04/01/2017 07:00:00', '%m/%d/%Y %H:%M:%S'))
1491030000
>>> t = datetime.datetime(2017,04,01,02,00,00)
>>> print "Epoch Seconds:", time.mktime(t.timetuple())
Epoch Seconds: 1491030000.0
The last two samples above at least appear to give me the same number, but it's much shorter than what InfluxDB has. I am assuming that is related to precision; InfluxDB does things down to the nanosecond, I think?
Python Result: 1491030000
Influx Result: 1491030000000000000
If I try to enter a measurement into InfluxDB using the result Python gives me it ends up showing as:
1491030000 = 1970-01-01T00:00:01.49103Z
So I have to add on the extra nine 0's.
I suppose there are a few ways to do this programmatically within Python if it's as simple as adding on nine 0's to the result. But I would like to know why I can't seem to generate the same precision level in just one conversion.
I have a CSV file with tons of old timestamps that are simply "4/1/17 2:00". Every day at 2 am there is a measurement.
I need to be able to convert that to the proper format that InfluxDB needs "1491030000000000000" to insert all these old measurements.
A better understanding of what is going on and why is more important than how to programmatically solve this in Python, although I would be grateful for responses that do both: explain the issue, what I am seeing, and why, as well as ideas on how to take a CSV whose timestamp column contains values like "4/1/17 2:00" and convert them to timestamps like "1491030000000000000", either in a separate file or in a second column.
InfluxDB can be told to return epoch timestamps in second precision in order to work more easily with tools/libraries that do not support nanosecond precision out of the box, like Python.
Set epoch=s in query parameters to enable this.
See influx HTTP API timestamp format documentation.
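For example, with the influxdb Python client (a sketch; the host and database names are hypothetical):
from influxdb import InfluxDBClient

client = InfluxDBClient(host='localhost', port=8086, database='mydb')
# epoch='s' asks InfluxDB to return second-precision Unix timestamps
result = client.query('SELECT * FROM "measurement"', epoch='s')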
Something like this should work to solve your current problem. I didn't have a test CSV to try this on, but it will likely work for you. It will take whatever CSV file you put where "old.csv" is and create a second CSV with the timestamps in nanoseconds.
import time
import datetime
import csv

def convertToNano(date):
    # "4/1/17 2:00" is month/day/year in the question's data, so parse with %m/%d/%y
    secondsTimestamp = time.mktime(datetime.datetime.strptime(date, "%m/%d/%y %H:%M").timetuple())
    # Append nine zeros to turn whole seconds into nanoseconds
    nanoTimestamp = str(int(secondsTimestamp)) + "000000000"
    return nanoTimestamp

with open('old.csv', 'rb') as old_csv:
    csv_reader = csv.reader(old_csv)
    with open('new.csv', 'wb') as new_csv:
        csv_writer = csv.writer(new_csv)
        for i, row in enumerate(csv_reader):
            if i != 0:
                # Put whatever column the date appears in here
                row.append(convertToNano(row[<location of date in the row>]))
            csv_writer.writerow(row)
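Equivalently, since one second is 10**9 nanoseconds, the conversion can be done arithmetically instead of by string concatenation:
# Same result as appending nine zeros: whole seconds times 10**9
nanoTimestamp = str(int(secondsTimestamp) * 10**9)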
As to why this is happening: after reading this, it seems like you aren't the only one getting frustrated by this issue. It seems as though InfluxDB just happens to use a different precision than most Python modules. I didn't really see any way to get around it other than doing the string manipulation in the date conversion, unfortunately.
I am trying to figure out the best way to create a list of timestamps in Python, where the values for the items in the list increment by one minute. The timestamps would be by minute and would cover the previous 24 hours. I need to create timestamps of the format "MM/dd/yyyy HH:mm:ss", or at least containing all of those measures. The timestamps will be an axis for a graph of data that I am collecting.
Calculating the times alone isn't too bad, as I could just get the current time, convert it to seconds, and change the value by one minute very easily. However, I am kind of stuck on figuring out the date aspect of it without having to do a lot of checking, which doesn't feel very Pythonic.
Is there an easier way to do this? For example, in JavaScript, you can get a Date() object, and simply subtract one minute from the value and JS will take care of figuring out if any of the other fields need to change and how they need to change.
datetime is the way to go; you might want to check out This Blog.
import datetime
import time
now = datetime.datetime.now()
print now
print now.ctime()
print now.isoformat()
print now.strftime("%Y%m%dT%H%M%S")
This would output
2003-08-05 21:36:11.590000
Tue Aug 5 21:36:11 2003
2003-08-05T21:36:11.590000
20030805T213611
You can also do subtraction with datetime and timedelta objects
now = datetime.datetime.now()
minute = datetime.timedelta(days=0, seconds=60, microseconds=0)
print now - minute
would output
2015-07-06 10:12:02.349574
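Putting it together for your case, a minimal sketch that builds the minute-by-minute timestamps for the previous 24 hours, formatted as "MM/dd/yyyy HH:mm:ss" (i.e. "%m/%d/%Y %H:%M:%S"):
import datetime

now = datetime.datetime.now()
minute = datetime.timedelta(seconds=60)
# 24 hours * 60 minutes = 1440 one-minute steps, oldest first
timestamps = [(now - i * minute).strftime("%m/%d/%Y %H:%M:%S") for i in range(24 * 60, 0, -1)]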
You are looking for datetime and timedelta objects. See the docs.
I am new to Python. I am looking for ways to extract/tag date & time specific information from text, e.g.:
1. I will meet you tomorrow
2. I had sent it two weeks back
3. Waiting for you last half an hour
I found timex from nltk_contrib; however, I ran into a couple of problems with it:
https://code.google.com/p/nltk/source/browse/trunk/nltk_contrib/nltk_contrib/timex.py
b. Not sure of the Date data type passed to ground(tagged_text, base_date).
c. It deals only with dates, i.e. granularity at the day level. Can't find expressions like "next one hour", etc.
Thank you for your help
b) The data type you need to pass to ground(tagged_text, base_date) is an instance of the datetime.date class, which you'd initialize using something like:
from datetime import date
base_date = date.today()
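For example, a sketch assuming the timex.py from the linked nltk_contrib source is on your path (its tag() wraps recognized expressions in <TIMEX2> tags, and ground() resolves them against base_date):
from datetime import date
import timex  # the module from the nltk_contrib source linked above

base_date = date.today()
tagged = timex.tag("I will meet you tomorrow")
grounded = timex.ground(tagged, base_date)
print(grounded)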