Calculate Time difference between two points in the same column (ArcGIS) - python

I am trying to calculate the time difference between two points in ArcGIS, using VBScript or Python. I have a dataset of over 10 thousand points. Each has coordinates, dates, and times. I want to create a new field and calculate the time difference in seconds.
The data looks as follows:
FID Shape N E DateTime
0 Point 4768252.94469 4768252.94469 2021/05/06 12:12:05
1 Point 4768245.79949 4768245.79949 2021/05/06 12:12:11
2 Point 4768241.44071 4768241.44071 2021/05/06 12:12:15
3 Point 4768237.3568 4768237.3568 2021/05/06 12:12:18
So, for the data shown, the result would be "6, 4, 3, ...". I would appreciate your help a lot, as I have tried many things and none worked.

Here is one way to do it using the pandas module for Python:
# import module Pandas
import pandas as pd
# Data as a python Dictionary. Can be imported as CSV too.
data = {
'N' : ['4768252.94469', '4768245.79949', '4768241.44071', '4768237.3568'],
'E' : ['4768252.94469', '4768245.79949', '4768241.44071', '4768237.3568'],
'Time': ['12:12:05','12:12:11','12:12:15','12:12:18']
}
# Creating a Pandas Dataframe object
df = pd.DataFrame(data)
# If you want to import the data from CSV use df = pd.read_csv('csvname.csv')
# Converting Time column to datetime object
df['Time'] = pd.to_datetime(df['Time'])
# print the differences
print(df["Time"].diff())
output:
0 NaT
1 0 days 00:00:06
2 0 days 00:00:04
3 0 days 00:00:03
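Since the question asks for the difference in seconds and the table has a full DateTime column, the same idea can be taken one step further. The following is a minimal sketch, assuming the attribute table has been exported to a DataFrame with a `DateTime` column; the new column name `dt_seconds` is a hypothetical choice.

```python
import pandas as pd

# Sample values copied from the question's DateTime column.
df = pd.DataFrame({
    "DateTime": [
        "2021/05/06 12:12:05",
        "2021/05/06 12:12:11",
        "2021/05/06 12:12:15",
        "2021/05/06 12:12:18",
    ]
})

# Parse the strings with an explicit format.
df["DateTime"] = pd.to_datetime(df["DateTime"], format="%Y/%m/%d %H:%M:%S")

# diff() yields a Timedelta per row (NaT for the first row, which has no
# predecessor); total_seconds() converts it to a float number of seconds.
df["dt_seconds"] = df["DateTime"].diff().dt.total_seconds()

print(df)
```

The first row stays empty (NaN) because there is no earlier point to compare it with.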

Related

How can I calculate the average true range in pandas

How can I calculate the average true range in a data frame?
I have tried using np.where() and it is not working.
I have all these values below:
Current High - Current Low
abs(Current High - Previous Close)
abs(Current Low - Previous Close)
but I don't know how to set the highest of the three values in the pandas data frame.
It looks like you might be trying to do the following:
import pandas as pd
from numpy.random import rand
# use a list (not a set) so that the column order is well defined
df = pd.DataFrame(rand(10,5), columns=['High-Low','Low-close','B','A','High-close'])
cols = ['High-Low','High-close','Low-close']
df['true_range'] = df[cols].max(axis=1)
print(df)
The output will look like
High-Low Low-close B A High-close true_range
0 0.916121 0.026572 0.082619 0.672000 0.605287 0.916121
1 0.622589 0.944646 0.638486 0.905139 0.262275 0.944646
2 0.611374 0.756191 0.829803 0.828205 0.614956 0.756191
3 0.810638 0.501693 0.504800 0.069532 0.283825 0.810638
4 0.984463 0.900823 0.434061 0.905273 0.518056 0.984463
5 0.377742 0.480266 0.018676 0.383831 0.819448 0.819448
6 0.473753 0.652077 0.730400 0.305507 0.396969 0.652077
7 0.427047 0.733135 0.526076 0.542852 0.719194 0.733135
8 0.911629 0.633997 0.101848 0.020811 0.327233 0.911629
9 0.244624 0.893365 0.278941 0.354696 0.678280 0.893365
If this isn't what you had in mind, it would be helpful to clarify your question by providing a small example where you clearly identify the columns and the index in your DataFrame and what you mean by "true range".
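If the three quantities listed in the question still have to be derived from price data, a sketch along the following lines may help. It assumes a DataFrame with `High`, `Low` and `Close` columns (hypothetical names and made-up values), and it averages the true range with a plain rolling mean; the classic average true range uses Wilder's smoothing instead, so treat the last step as a simplification.

```python
import pandas as pd

# Hypothetical price data; column names and values are assumptions.
df = pd.DataFrame({
    "High":  [10.0, 11.0, 12.5, 12.0, 13.0],
    "Low":   [9.0, 10.0, 11.0, 11.0, 12.0],
    "Close": [9.5, 10.5, 12.0, 11.5, 12.5],
})

# Previous row's close; NaN for the first row.
prev_close = df["Close"].shift(1)

# The three candidates from the question, side by side.
candidates = pd.concat(
    [
        df["High"] - df["Low"],
        (df["High"] - prev_close).abs(),
        (df["Low"] - prev_close).abs(),
    ],
    axis=1,
)

# Row-wise maximum; NaN candidates in the first row are skipped.
df["true_range"] = candidates.max(axis=1)

# Simple moving average of the true range; a window of 3 only fits the sample.
df["atr"] = df["true_range"].rolling(window=3).mean()

print(df)
```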

How to create visualization from time series data in a .txt file in python

I have a .txt file with three columns: Time, ticker, price. The time is spaced in 15 second intervals. It looks like this uploaded to jupyter notebook and put into a Pandas DF.
time ticker price
0 09:30:35 EV 33.860
1 00:00:00 AMG 60.430
2 09:30:35 AMG 60.750
3 00:00:00 BLK 455.350
4 09:30:35 BLK 451.514
... ... ... ...
502596 13:00:55 TLT 166.450
502597 13:00:55 VXX 47.150
502598 13:00:55 TSLA 529.800
502599 13:00:55 BIDU 103.500
502600 13:00:55 ON 12.700
# NOTE: the first set of data has the data at market open for -
# every other time point, so that's what the 00:00:00 is.
#It is only limited to the 09:30:35 data.
I need to create a function that takes an input (a ticker) and then creates a bar chart that displays the data with 5-minute ticks (the data is every 20 seconds, so one tick for every 15 points in time).
So far I've thought about separating the "mm" part of the hh:mm:ss to get the minutes in another column and then writing a for loop that looks something like this:
for num in df['mm']:
    if num % 5 == 0:
        print('tick')
then somehow appending the "tick" to the "time" column for every 5 minutes of data (I'm not sure how I would do this), then using the time column as the index and only using data with the "tick" index in it (some kind of if statement). I'm not sure if this makes sense but I'm drawing a blank on this.
You should have a look at the built-in functions in pandas. In the following example I'm using a date + time format but it shouldn't be hard to convert one to the other.
Generate data
%matplotlib inline
import pandas as pd
import numpy as np
dates = pd.date_range(start="2020-04-01", periods=150, freq="20s")
df1 = pd.DataFrame({"date": dates,
                    "price": np.random.rand(len(dates))})
df2 = df1.copy()
df1["ticker"] = "a"
df2["ticker"] = "b"
df = pd.concat([df1,df2], ignore_index=True)
df = df.sample(frac=1).reset_index(drop=True)
Resample Timeseries every 5 minutes
Here you can try to see the output of
df1.set_index("date")\
    .resample("5min")\
    .first()\
    .reset_index()
Where we are considering just the first element at 05:00, 10:00 and so on. In general to do the same for every ticker we need a groupby
# select the price column per ticker, then resample each group into 5-minute bins
out = df.set_index("date")\
    .groupby("ticker")["price"]\
    .resample("5min")\
    .first()\
    .reset_index()
Plot function
def plot_tick(data, ticker):
    # keep only the rows for the requested ticker
    ts = data[data["ticker"]==ticker].reset_index(drop=True)
    ts.plot(x="date", y="price", kind="bar", title=ticker);
plot_tick(out, "a")
Then you can improve the plot or, eventually, try to use plotly.
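To apply the same resampling idea to time-only strings like those in the question, one option is to parse them with an explicit format first. This is only a sketch: the sample rows and the function name `five_minute_ticks` are hypothetical, and the 00:00:00 market-open rows from the question are left out.

```python
import pandas as pd

# Hypothetical sample in the question's layout (time, ticker, price).
df = pd.DataFrame({
    "time": ["09:30:35", "09:30:55", "09:35:15", "09:30:35", "09:36:15"],
    "ticker": ["EV", "EV", "EV", "AMG", "AMG"],
    "price": [33.86, 33.90, 34.10, 60.75, 60.80],
})

def five_minute_ticks(data, ticker):
    """Return the first price in each 5-minute bin for one ticker."""
    ts = data[data["ticker"] == ticker].copy()
    # Time-only strings get the placeholder date 1900-01-01 when parsed
    # with this format; only the time of day matters here.
    ts["time"] = pd.to_datetime(ts["time"], format="%H:%M:%S")
    return (
        ts.set_index("time")["price"]
        .resample("5min")
        .first()
        .dropna()
    )

ticks = five_minute_ticks(df, "EV")
print(ticks)
```

The returned Series can then be drawn with `ticks.plot(kind="bar", title="EV")`.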

transform raw date format into pandas date object

I have a CSV file which looks like this:
time, Numbers
[30/Apr/1998:21:30:17,24736
[30/Apr/1998:21:30:53,24736
[30/Apr/1998:21:31:12,24736
[30/Apr/1998:21:31:19,3781
[30/Apr/1998:21:31:22,-
[30/Apr/1998:21:31:27,24736
[30/Apr/1998:21:31:29,-
[30/Apr/1998:21:31:29,-
[30/Apr/1998:21:31:32,929
[30/Apr/1998:21:31:43,-
[30/Apr/1998:21:31:44,1139
[30/Apr/1998:21:31:52,24736
[30/Apr/1998:21:31:52,3029
[30/Apr/1998:21:32:06,24736
[30/Apr/1998:21:32:16,-
[30/Apr/1998:21:32:16,-
[30/Apr/1998:21:32:17,-
[30/Apr/1998:21:32:30,14521
[30/Apr/1998:21:32:33,11324
[30/Apr/1998:21:32:35,24736
[30/Apr/1998:21:32:3l8,671
[30/Apr/1998:21:32:38,1512
[30/Apr/1998:21:32:38,1136
[30/Apr/1998:21:32:38,1647
[30/Apr/1998:21:32:38,1271
[30/Apr/1998:21:32:52,5933
[30/Apr/1998:21:32:58,-
[30/Apr/1998:21:32:59,231
and so on, up to one billion rows.
Forget about the Numbers column. My concern is converting this date-time format in my CSV file to a pandas timestamp, so that I can plot my dataset and visualize it over time. As I am new to data science, here is my approach:
step 1: read the whole time column from my CSV file into an array,
step 2: split each value at the colon (:) between the date and the time, making two new arrays, one of dates and one of times,
step 3: remove "[" from the date array,
step 4: replace all forward slashes with dashes in the date array,
step 5: join the date and time arrays back together into a single pandas-compatible format,
which would look like this: 2017-03-22 15:16:45. I know my approach is naive and probably wrong; if someone could help me with a code snippet, I would be really happy. Thanks.
You can pass a format to pd.to_datetime(), in this case: [%d/%b/%Y:%H:%M:%S.
Be careful with erroneous data though, as seen in the third row of the sample data below ([30/Apr/1998:21:32:3l8,671). To avoid an error you can pass errors='coerce', which returns NaT (Not a Time) for values that cannot be parsed.
The other way would be to fix those rows manually or to write some sort of regex/replace function first.
import pandas as pd
data = '''\
time, Numbers
[30/Apr/1998:21:30:17,24736
[30/Apr/1998:21:30:53,24736
[30/Apr/1998:21:32:3l8,671
[30/Apr/1998:21:32:38,1512
[30/Apr/1998:21:32:38,1136
[30/Apr/1998:21:32:58,-
[30/Apr/1998:21:32:59,231'''
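from io import StringIO  # pd.compat.StringIO is no longer available in current pandas
fileobj = StringIO(data)
# skipinitialspace=True drops the space after the comma in the header, so the column is named 'Numbers'
df = pd.read_csv(fileobj, sep=',', na_values=['-'], skipinitialspace=True)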
df['time'] = pd.to_datetime(df['time'], format='[%d/%b/%Y:%H:%M:%S', errors='coerce')
print(df)
Returns:
time Numbers
0 1998-04-30 21:30:17 24736.0
1 1998-04-30 21:30:53 24736.0
2 NaT 671.0
3 1998-04-30 21:32:38 1512.0
4 1998-04-30 21:32:38 1136.0
5 1998-04-30 21:32:58 NaN
6 1998-04-30 21:32:59 231.0
Note that: na_values=['-'] was used here to help pandas understand the Numbers column is actually numbers and not strings.
And now we can perform actions like grouping (on minute for instance):
print(df.groupby(df.time.dt.minute)['Numbers'].mean())
#time
#30.0 24736.000000
#32.0 959.666667
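Before repairing or dropping malformed timestamps, it can help to see which raw values failed to parse. A minimal sketch, assuming the raw strings are kept alongside the parsed result (the names `raw`, `parsed` and `bad_rows` are hypothetical):

```python
import pandas as pd

raw = pd.Series([
    "[30/Apr/1998:21:30:17",
    "[30/Apr/1998:21:32:3l8",  # malformed seconds, as in the question
    "[30/Apr/1998:21:32:38",
])

# Unparseable values become NaT instead of raising an error.
parsed = pd.to_datetime(raw, format="[%d/%b/%Y:%H:%M:%S", errors="coerce")

# Values that were present as text but turned into NaT did not match the format.
bad_rows = raw[parsed.isna() & raw.notna()]
print(bad_rows)
```

With a list like this in hand, you can decide whether a regex fix is worth writing or whether dropping the rows is acceptable.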

For loop issues with Quandl -- Python

I'm trying to create a for-loop that automatically runs through my parsed list of NASDAQ stocks and inserts their Quandl codes so that they can be retrieved from Quandl's database, essentially creating a large data set of stocks to perform data analysis on. My code "appears" right, but when I print the query it only prints 'GOOG/NASDAQ_Ticker' and nothing else. Any help and/or suggestions will be most appreciated.
import quandl
import pandas as pd
import matplotlib.pyplot as plt
import numpy
def nasdaq():
    nasdaq_list = pd.read_csv('C:\Users\NAME\Documents\DATASETS\NASDAQ.csv')
    nasdaq_list = nasdaq_list[[0]]
    print nasdaq_list
    for abbv in nasdaq_list:
        query = 'GOOG/NASDAQ_' + str(abbv)
        print query
        df = quandl.get(query, authtoken="authoken")
        print df.tail()[['Close', 'Volume']]
Iterating over a pd.DataFrame as you have done iterates by column. For example,
>>> df = pd.DataFrame(np.arange(9).reshape((3,3)))
>>> df
0 1 2
0 0 1 2
1 3 4 5
2 6 7 8
>>> for i in df[[0]]: print(i)
0
I would just get the first column as a Series with .iloc (.ix has been removed from pandas),
>>> for i in df.iloc[:,0]: print(i)
0
3
6
Note that in general if you want to iterate by row over a DataFrame you're looking for iterrows().
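Applied to the question, the loop could build its queries from the values of the first column rather than from the column labels. The sketch below uses a small stand-in DataFrame instead of NASDAQ.csv (the column names and tickers are assumptions) and stops before the quandl.get call, which needs network access and a valid token.

```python
import pandas as pd

# Hypothetical stand-in for the contents of NASDAQ.csv.
nasdaq_list = pd.DataFrame({
    "Ticker": ["AAPL", "MSFT", "GOOG"],
    "Name": ["Apple", "Microsoft", "Alphabet"],
})

# iloc[:, 0] selects the first column as a Series, so the loop sees the
# ticker values rather than the column labels.
queries = ["GOOG/NASDAQ_" + str(abbv) for abbv in nasdaq_list.iloc[:, 0]]
print(queries)
```

Each string in `queries` can then be passed to quandl.get inside the loop, as in the original code.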

How to access elements from imported csv file with pandas in python?

Apologies for this basic question. I am new to Python and am having some problems with my code. I used pandas to load a .csv file and am having trouble accessing particular elements.
import pandas as pd
dateYTM = pd.read_csv('Date.csv')
print(dateYTM)
## Result
# Date
# 0 20030131
# 1 20030228
# 2 20030331
# 3 20030430
# 4 20030530
#
# Process finished with exit code 0
How can I access, say, the first date? I tried many different ways but wasn't able to achieve what I want. Many thanks.
You can use read_csv with the parse_dates parameter and then select values with loc, see Selection By Label:
import pandas as pd
import numpy as np
import io
temp=u"""Date,no
20030131,1
20030228,3
20030331,5
20030430,6
20030530,3
"""
# after testing, replace io.StringIO(temp) with the filename
dateYTM = pd.read_csv(io.StringIO(temp), parse_dates=['Date'])
print(dateYTM)
Date no
0 2003-01-31 1
1 2003-02-28 3
2 2003-03-31 5
3 2003-04-30 6
4 2003-05-30 3
# df.loc[index, column]
print(dateYTM.loc[0, 'Date'])
2003-01-31 00:00:00
print(dateYTM.loc[0, 'no'])
1
But if you need only one value, it is better to use at, see Fast scalar value getting and setting:
# df.at[index, column]
print(dateYTM.at[0, 'Date'])
2003-01-31 00:00:00
print(dateYTM.at[0, 'no'])
1
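If the index labels are not known in advance, the first value can also be taken by position. A minimal sketch, assuming a CSV with a single Date column like the one in the question (the inline text stands in for Date.csv):

```python
import io
import pandas as pd

# Inline stand-in for Date.csv.
csv_text = "Date\n20030131\n20030228\n20030331\n"
dates = pd.read_csv(io.StringIO(csv_text))

# iloc selects by position, whatever the index labels are.
first_by_position = dates["Date"].iloc[0]

# iat is the positional counterpart of at: row 0, column 0.
first_scalar = dates.iat[0, 0]

print(first_by_position, first_scalar)
```

Without parse_dates the values stay integers, as in the question's original printout.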
