How to proportionally distribute dates on a scale with Python

I have a very simple charting component which takes integers on the x/y axis. My problem is that I need to represent dates/floats on this chart, so I thought I could distribute the dates proportionally on a scale. In other words, let's say I have the following dates: 01/01/2008, 02/01/2008 and 31/12/2008. The algorithm would return 0, 16.667, and 100 (1 month = 16.667%).
I tried to play with the datetime and timedelta classes of Python 2.5 and I am unable to achieve this. I thought I could use the number of ticks, but I am not even able to get that info from datetime.
Any idea how I could write this algorithm in Python? Otherwise, any other ideas or algorithms?

If you're dealing with dates, you can use the toordinal() method.
import datetime

jan1 = datetime.datetime(2008, 1, 1)
dec31 = datetime.datetime(2008, 12, 31)
feb1 = datetime.datetime(2008, 2, 1)

dates = [jan1, dec31, feb1]
dates.sort()
datesord = [d.toordinal() for d in dates]
start, end = datesord[0], datesord[-1]

def datetofloat(date, start, end):
    """date, start, end are ordinal dates,
    i.e. Jan 1 of the year 1 has ordinal 1,
    Jan 1 of the year 2008 has ordinal 733042"""
    return (date - start) * 1.0 / (end - start)

print datetofloat(datesord[0], start, end)
# 0.0
print datetofloat(datesord[1], start, end)
# 0.0849315068493*
print datetofloat(datesord[2], start, end)
# 1.0
*16.67% is about two months of a year, so the proportion for Feb 1 is about half of that.
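If you want the 0-100 percentage scale from the question rather than a 0-1 fraction, a minimal follow-up sketch could look like this (the date_to_percent helper is my own, reusing datesord, start and end from above):
# Hypothetical helper: scale an ordinal date to 0-100 within [start, end].
def date_to_percent(date_ord, start, end):
    return 100.0 * (date_ord - start) / (end - start)

percentages = [date_to_percent(d, start, end) for d in datesord]
# -> [0.0, 8.49..., 100.0] for Jan 1, Feb 1, Dec 31 of 2008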

It's fairly easy to convert a timedelta into a numeric value.
Select an epoch time. Calculate deltas for every value relative to the epoch. Convert the deltas into a numeric value. Then map the numeric values as you normally would.
Conversion is straightforward. Something like:
def f(delta):
    # Total seconds in the delta: days converted via 1440 minutes of 60 seconds each.
    return (delta.seconds + delta.days * 1440 * 60
            + delta.microseconds / 1000000.0)
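A short usage sketch of this approach; the epoch choice and sample dates here are assumptions for illustration, not part of the original answer:
import datetime

epoch = datetime.datetime(2008, 1, 1)   # assumed baseline; pick whatever suits your data
samples = [datetime.datetime(2008, 1, 1), datetime.datetime(2008, 2, 1, 12, 30)]

# Each datetime becomes a plain number of seconds since the epoch,
# which the charting component can then scale like any other number.
numeric = [f(d - epoch) for d in samples]
print(numeric)  # [0.0, 2723400.0]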

I don't know if I fully understand what you are trying to do, but you can just deal with times as the number of seconds since the UNIX epoch and then use plain old subtraction to get a range that you can scale to the size of your plot.
In Processing, the map() function handles this case for you: http://processing.org/reference/map_.html. I'm sure you can adapt it for your purpose.
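For reference, a minimal Python sketch of what Processing's map() does (my own rescale helper, not taken from the linked page):
import calendar
import datetime

def rescale(value, in_min, in_max, out_min, out_max):
    """Linearly map value from [in_min, in_max] onto [out_min, out_max]."""
    return out_min + (value - in_min) * (out_max - out_min) / float(in_max - in_min)

# e.g. map Unix timestamps for 2008 onto a 0-100 chart axis
to_ts = lambda d: calendar.timegm(d.timetuple())
lo, hi = to_ts(datetime.datetime(2008, 1, 1)), to_ts(datetime.datetime(2008, 12, 31))
print(rescale(to_ts(datetime.datetime(2008, 2, 1)), lo, hi, 0, 100))  # roughly 8.5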

Related

Why does dividing by np.timedelta64(...) provide a float answer from the difference between two datetime objects?

Say I have two datetime (or timestamp) variables and I am trying to get the difference between them. I can do:
diff = date1 - date2
This results in a timedelta object (or an array of dtype=timedelta64, if you were using two series of datetimes).
Then, if I want the float value of diff in number of days, I can perform:
diff / np.timedelta64(24, 'h') # np.timedelta64(1, 'd') works the same, IIRC.
This results in a float value, as stated, which you can then use arithmetic comparisons on.
What I don't understand from the documentation or google searching is why this works as a mathematical operation, when it feels much more like I'm simply converting from timedelta to float, and then selecting only the day value.
I'm most likely just not understanding the specifics of timedelta, but I'm hoping someone else understands it much better than I do, and can explain the logic behind this.
From my understanding of the general logic behind time handling, a timedelta64 is just a 64-bit number representing a span of time. This span can be expressed in hours or days, but not in months or years, because the number of months or years depends on the "position" of the span on a time-line.
Hours or days can be calculated from the number representing the time difference by dividing it by the number of "units" an hour or a day corresponds to, so division is the right way to obtain the appropriate result from timedelta64 values.
The NumPy documentation states about timedelta:
NumPy allows the subtraction of two datetime values, an operation which produces a number with a time unit.
From this I take it that a timedelta is just a number whose value and scale are defined by the associated unit, so it can be converted to other units the way units are usually converted, through multiplication, division, addition and subtraction, which are all arithmetic operations on numbers.
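A small sketch of the division in question (the dates are my own illustration):
import numpy as np

date1 = np.datetime64('2008-02-01')
date2 = np.datetime64('2008-01-01')

diff = date1 - date2                          # timedelta64 of 31 days
days = diff / np.timedelta64(1, 'D')          # plain float: 31.0
hours_based = diff / np.timedelta64(24, 'h')  # same 31.0, expressed via hours
print(days, hours_based)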

Timedelta time difference expressed as float variable

I have data in a pandas dataframe that is marked by timestamps as datetime objects. I would like to make a graph that treats the time as something fluid. My idea was to subtract the first timestamp from the others (shown here for the second entry)
xhertz_df.loc[1]['Dates']-xhertz_df.loc[0]['Dates']
to get the time passed since the first measurement. Which gives 350 days 08:27:51 as a timedelta object. So far so good.
This might be a duplicate, but I have not found the solution here so far. Is there a way to quickly transform this object into a number of, e.g., minutes, seconds or hours? I know I could extract the individual days, hours and minutes and do a tedious calculation to get it. But is there a built-in way to just turn this object into what I want?
Something like
timedelta.tominutes
that gives it back as a float of minutes, would be great.
If all you want is a float representation, maybe as simple as:
float_index = pd.Index(xhertz_df['Dates'].values.astype(float))
In Pandas, Timestamp and Timedelta columns are internally handled as numpy datetime64[ns], that is, an integer number of nanoseconds.
So it is trivial to convert a Timedelta column to a number of minutes:
(xhertz_df.loc[1, 'Dates'] - xhertz_df.loc[0, 'Dates']).value / 60000000000  # .value is the raw nanosecond count
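If you would rather not think in nanoseconds, a Timedelta also has a total_seconds() method, so an equivalent sketch for minutes would be:
diff = xhertz_df.loc[1, 'Dates'] - xhertz_df.loc[0, 'Dates']
minutes = diff.total_seconds() / 60  # float number of minutes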
Here is a way to do so with timestamp():
Two examples of converting, and one of taking the difference:
import datetime as dt
import time

# current date and time
now = dt.datetime.now()
timestamp1 = dt.datetime.timestamp(now)
print("timestamp1 =", timestamp1)

time.sleep(4)

now = dt.datetime.now()
timestamp2 = dt.datetime.timestamp(now)
print("timestamp2 =", timestamp2)

# difference between the two timestamps, in seconds (a float)
print(timestamp2 - timestamp1)

How to convert an int64 into a datetime?

I'm trying to convert the column Year (type: int64) into a date type so that I can use the Groupby function to group by decade.
I'm using the following code to convert the datatype:
import datetime as dt
crime["Date"]=pd.TimedeltaIndex(crime["Year"], unit='d')+dt.datetime(1960,1,1)
crime[["Year","Date"]].head(10)
(Screenshot of output omitted.)
The date it is returning is not correct: it isn't starting at the correct year, and the day is increasing row by row.
I want the year to start at 1960, and for each row the year to increase by 1.
I tried substituting unit='d' in the code above with unit='y' and I get the following result:
ValueError: Units 'M' and 'Y' are no longer supported, as they do not represent unambiguous timedelta value durations.
I think kate's answer is what you want. I wrote my answer before that one came along, but it may still be worth keeping to explain why unit='y' isn't supported, and why unit='d' isn't working for you either...
I wouldn't think this would be right:
TimedeltaIndex(crime["Year"], unit='d')
as I expect it to interpret your year count as a count of days. If you can't use unit='y', there's probably a good reason for that: years don't always have the same number of days, so specifying a number of years is ambiguous in terms of the number of days it equates to. You have to add a count of years to an actual year for it to make exact sense.
The same holds true, even more so, for months: months have a variety of day counts, so you can't know what a timedelta in months really means.
I would add the column in the following way:
crime['Date'] = crime['Year'].map(lambda x: dt.datetime(1960 + x, 1, 1))
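With a usable Date column in place, the grouping by decade that the question set out to do could then look something like this (a sketch; the decade computation is my own suggestion, not part of the answer above):
crime['Date'] = pd.to_datetime(crime['Date'])          # make sure it is a datetime64 column
crime['Decade'] = (crime['Date'].dt.year // 10) * 10   # e.g. 1965 -> 1960
by_decade = crime.groupby('Decade').size()             # or .agg(...) on the columns of interest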

How to find missing days or hours that break continuity in a datetime index?

Many thanks in advance for helping a python newbie like me !
I have a DataFrame containing daily or hourly prices for a particular crypto.
I was just wondering if there is an easy way to check if there is any missing day or hour (depending on the chosen granularity) that would break a perfectly constant timedelta (between 2 dates) in the index?
Here is an example of another "due diligence" check I am doing; I am just making sure that the temporal order is respected:
# Check timestamp order:
i = 0
for i in range(0, len(df.TS) - 1):
    if df.TS[i] > df.TS[i+1]:
        print('Timestamp does not respect time direction, please check df.')
        break
    else:
        i += 1
There is surely a better way to do this, but I didn't find a built-in function for either of these checks I would like to do.
Many thanks again and best regards,
Pierre
If df.TS is where you store your datetime data, then you can do this (example for daily data, change freq accordingly):
pd.date_range(start = df.TS.min(), end = df.TS.max(), freq = 'D').difference(df.TS)
This returns the entries of a complete range that are missing from your datetime series, i.e. exactly the gaps you are looking for.
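A quick illustration with made-up hourly data (the sample frame is my own, not from the question):
import pandas as pd

ts = pd.to_datetime(['2021-01-01 00:00', '2021-01-01 01:00',
                     '2021-01-01 03:00'])               # 02:00 is missing
df = pd.DataFrame({'TS': ts})

missing = pd.date_range(start=df.TS.min(), end=df.TS.max(),
                        freq='H').difference(df.TS)
print(missing)  # DatetimeIndex(['2021-01-01 02:00:00'], ...)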

Formatting date data in NumPy array

I would be really grateful for some advice. I had an exercise as written below:
The first column (index 0) contains year values as four-digit numbers in the format YYYY (2016, since all trips in our data set are from 2016). Use assignment to change these values to the YY format (16) in the test_array ndarray.
I used this code to solve it:
test_array[:,0] = test_array[:,0]%100
But I'm sure there has to be a more universal and smarter way to get the same result with datetime or something else, though I can't find it. I tried different variations of this code, but I don't get what's wrong:
dt.datetime.strptime(str(test_array[:,0]), "%Y")
test_array[:,0] = dt.datetime.strftime("%y")
Could you help me with this, please?
Thank you
Converting a year from YYYY format to YY format requires an intermediate datetime value, on which operations such as strftime can be carried out in the following manner:
df.iloc[:, 0] = df.iloc[:, 0].apply(lambda x: dt.datetime(x, 1, 1).strftime('%y'))
Here, to obtain the datetime values we need three arguments: year, month and day; we have the year, and the rest are set to 1 as defaults.
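Since the exercise actually works on a NumPy ndarray rather than a DataFrame, a hedged sketch of the same idea applied directly to the array might be (the sample array is made up; the first column holds the years, as in the exercise):
import datetime as dt
import numpy as np

test_array = np.array([[2016, 5], [2016, 9]])  # made-up sample: year column plus one data column

# Route each YYYY value through datetime and back out as a two-digit year.
test_array[:, 0] = [int(dt.datetime(int(y), 1, 1).strftime('%y')) for y in test_array[:, 0]]
print(test_array)  # first column is now 16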
