I am dealing with monthly data for a number of different data sets (e.g. air temperature, ocean temperature, wind speeds etc), whereby each month and each data set will share similar attributes. I want to be able to initialise and populate these attributes as efficiently as possible. So far I can only think of trying to somehow append new attributes to pre-existing attributes. Is this possible in Python?
For example, let's say I initialise all my monthly variables like the following:
class Data_Read:
def __init__(self, monthly=np.zeros((200,200,40))): #3D data
self.jan = monthly
self.feb = monthly
self.march = monthly
self.april = monthly
self.may = monthly
self.june = monthly
self.july = monthly
self.august = monthly
self.sept = monthly
self.oct = monthly
self.nov = monthly
self.dec = monthly
Then I can create a new data set for each month which will become something like air_temp.jan or wind_speed.june by doing the following:
air_temp = Data_Read()
wind_speed = Data_Read()
These would be the raw data attributes. However I would then like to do some processing to each of these data sets (like de-trending). Is there a way I can create a class (or a new class?) that will generate new attributes like self.jan.detrend. Basically I want to avoid having to write 12 lines of code for every attribute I want to create, and then be able to easily call any "attribute of the attribute" in my data processing functions thereafter.
Many thanks.
Here's an example of how you could store everything internally in a dictionary and still reference the arrays as attributes as well as call functions on arrays by name:
import numpy as np
class Data_Read:
def __init__(self, monthly_shape=(200, 200, 40)): #3D data
months = [
"jan",
"feb",
"march",
"april",
"may",
"june",
"july",
"august",
"sept",
"oct",
"nov",
"dec"
]
self._months = {month: np.zeros(monthly_shape) for month in months}
def detrend(self, month):
# this is a dummy function that just increments
return self._months[month] + 1
def __getattr__(self, name):
if name in self._months:
return self._months[name]
return super().__getattr__(name)
air_temp = Data_Read()
print(air_temp.jan.shape) # (200, 200, 40)
print((air_temp.detrend("jan") == 1).all()) # True
You can also achieve the same result using setattr and getattr because attributes are just stored in a dictionary on the object anyway:
import numpy as np
class Data_Read:
def __init__(self, monthly_shape=(200, 200, 40)): #3D data
months = [
"jan",
"feb",
"march",
"april",
"may",
"june",
"july",
"august",
"sept",
"oct",
"nov",
"dec"
]
for month in months:
setattr(self, month, np.zeros(monthly_shape))
def detrend(self, month):
# this is a dummy function that just increments
return getattr(self, month) + 1
Related
I have working code using test data, seen here:
(sample data matches type and format of inputs that I will be using this script with later, including dates as strings appended with 'Z')
from sklearn import linear_model
import pandas as pd
import numpy as np
d = {'Start Sunday of FW': ['2022-03-02Z', '2022-03-03Z', '2022-03-04Z', '2022-03-05Z', '2022-03-06Z', '2022-03-01Z',
'2022-03-02Z', '2022-03-03Z', '2022-03-04Z', '2022-03-05Z', '2022-03-06Z', '2022-03-01Z'],
'Store': [1111, 1111, 1111, 1111, 1111, 1111, 2222, 2222, 2222, 2222, 2222, 2222],
'Sales': [3163, 4298, 2498, 4356, 4056, 3931, 3163, 4298, 2498, 4356, 4056, 1]}
df = pd.DataFrame(data=d)
def get_coef(input):
def model1(df):
y = df[['Sales']].values
x = df[['Start Sunday of FW']].values
return np.squeeze(linear_model.LinearRegression().fit(x,y).coef_)
cnames = {'Store': 'Store', 0: 'Coef'}
def prep_input(df):
df['Start Sunday of FW'] = df['Start Sunday of FW'].astype('string').str.rstrip('Z').astype('datetime64[ns]')
return df
return pd.DataFrame(prep_input(input).groupby('Store').apply(model1)).reset_index().rename(columns=cnames)
print(get_coef(df))
This code runs fine on its own.
I have installed Tableau Prep and TabPy, and set it up correctly per instructions.
When I try to run the version in the code block below from Prep, however, the Prep flow fails and I get the error:
```2022-04-06,13:08:58 [ERROR] (base_handler.py:base_handler:115): Responding with status=500, message="Error processing script", info="TypeError : Object of type ndarray is not JSON serializable"```
If I instead have it print() the current return line from get_coef(), return the input instead, and remove the get_output_schema function: the return is printed correctly, the output does flow to Tableau Prep in the live previews, but the error still raises and the flow still won't work, which is baffling.
from sklearn import linear_model
import pandas as pd
import numpy as np
def get_coef(input):
def model1(df):
y = df[['Sales']].values
x = df[['Start Sunday of FW']].values
return np.squeeze(linear_model.LinearRegression().fit(x,y).coef_)
cnames = {'Store': 'Store', 0: 'Coef'}
def prep_input(df):
df['Start Sunday of FW'] = df['Start Sunday of FW'].astype('string').str.rstrip('Z').astype('datetime64[ns]')
return df
return pd.DataFrame(prep_input(input).groupby('Store').apply(model1)).reset_index().rename(columns=cnames)
def get_output_schema():
return pd.DataFrame({
'Store' : prep_string(),
'Coef' : prep_decimal()
})
Can someone help me understand the issue? I don't know anything about JSON serialization to begin with, so posts like this are of little help to me; I can't even assess relevance.
result = pd.DataFrame(prep_input(input).groupby('Store').apply(model1)).reset_index().rename(columns=cnames)
result['Coef'] = result['Coef'].astype('double')
return result
Coef was dtype object, TabPy requires double.
I am building a vaccination appointment program that automatically assigns a slot to the user.
This builds the table and saves it into a CSV file:
import pandas
start_date = '1/1/2022'
end_date = '31/12/2022'
list_of_date = pandas.date_range(start=start_date, end=end_date)
df = pandas.DataFrame(list_of_date)
df.columns = ['Date/Time']
df['8:00'] = 100
df['9:00'] = 100
df['10:00'] = 100
df['11:00'] = 100
df['12:00'] = 100
df['13:00'] = 100
df['14:00'] = 100
df['15:00'] = 100
df['16:00'] = 100
df['17:00'] = 100
df.to_csv(r'C:\Users\Ric\PycharmProjects\pythonProject\new.csv')
And this code randomly pick a date and an hour from that date in the CSV table we just created:
import pandas
import random
from random import randrange
#randrange randomly picks an index for date and time for the user
random_date = randrange(365)
random_hour = randrange(10)
list = ["8:00", "9:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00", "16:00", "17:00"]
hour = random.choice(list)
df = pandas.read_csv('new.csv')
date=df.iloc[random_date][0]
# 1 is substracted from that cell as 1 slot will be assigned to the user
df.loc[random_date, hour] -= 1
df.to_csv(r'C:\Users\Ric\PycharmProjects\pythonProject\new.csv',index=False)
print(date)
print(hour)
I need help with making the program check if the random hour it chose on that date has vacant slots. I can manage the while loops that are needed if the number of vacant slots is 0. And no, I have not tried much because I have no clue of how to do this.
P.S. If you're going to try running the code, please remember to change the save and read location.
Here is how I would do it. I've also cleaned it up a bit.
import random
import pandas as pd
start_date, end_date = '1/1/2022', '31/12/2022'
hours = [f'{hour}:00' for hour in range(8, 18)]
df = pd.DataFrame(
data=pd.date_range(start_date, end_date),
columns=['Date/Time']
)
for hour in hours:
df[hour] = 100
# 1000 simulations
for _ in range(1000):
random_date, random_hour = random.randrange(365), random.choice(hours)
# Check if slot has vacant slot
if df.at[random_date, random_hour] > 0:
df.at[random_date, random_hour] -= 1
else:
# Pass here, but you can add whatever logic you want
# for instance you could give it the next free slot in the same day
pass
print(df.describe())
import pandas
import random
from random import randrange
# randrange randomly picks an index for date and time for the user
random_date = randrange(365)
# random_hour = randrange(10) #consider removing this line since it's not used
lista = [# consider avoid using Python preserved names
"8:00",
"9:00",
"10:00",
"11:00",
"12:00",
"13:00",
"14:00",
"15:00",
"16:00",
"17:00",
]
hour = random.choice(lista)
df = pandas.read_csv("new.csv")
date = df.iloc[random_date][0]
# 1 is substracted from that cell as 1 slot will be assigned to the user
if df.loc[random_date, hour] > 0:#here is what you asked for
df.loc[random_date, hour] -= 1
else:
print(f"No Vacant Slots in {random_date}, {hour}")
df.to_csv(r"new.csv", index=False)
print(date)
print(hour)
Here's another alternative. I'm not sure you really need the very large and slow-to-load pandas module for this. This does it with plan Python structures. I tried to run the simulation until it got a failure, but with 365,000 open slots, and flushing the database to disk each time, it takes too long. I changed the 100 to 8, just to see it find a dup in reasonable time.
import csv
import datetime
import random
def create():
start = datetime.date( 2022, 1, 1 )
oneday = datetime.timedelta(days=1)
headers = ["date"] + [f"{i}:00" for i in range(8,18)]
data = []
for _ in range(365):
data.append( [start.strftime("%Y-%m-%d")] + [8]*10 ) # not 100
start += oneday
write( headers, data )
def write(headers, rows):
fcsv = csv.writer(open('data.csv','w',newline=''))
fcsv.writerow( headers )
fcsv.writerows( rows )
def read():
days = []
headers = []
for row in csv.reader(open('data.csv')):
if not headers:
headers = row
else:
days.append( [row[0]] + list(map(int,row[1:])))
return headers, days
def choose( headers, days ):
random_date = random.randrange(365)
random_hour = random.randrange(len(headers)-1)+1
choice = days[random_date][0] + " " + headers[random_hour]
print( "Chose", choice )
if days[random_date][random_hour]:
days[random_date][random_hour] -= 1
write(headers,days)
return choice
else:
print("Randomly chosen slot is full.")
return None
create()
data = read()
while choose( *data ):
pass
I'm downloading historical candlestick data for multiple crypto pairs across different timeframes from the binance api, i would like to know how to sort this data according to pair and timeframe and check which pair on which timeframe executes my code, the following code is what i use to get historical data
import requests
class BinanceFuturesClient:
def __init__(self):
self.base_url = "https://fapi.binance.com"
def make_requests(self, method, endpoint, data):
if method=="GET":
response = requests.get(self.base_url + endpoint, params=data)
return response.json()
def get_symbols(self):
symbols = []
exchange_info = self.make_requests("GET", "/fapi/v1/exchangeInfo", None)
if exchange_info is not None:
for symbol in exchange_info['symbols']:
if symbol['contractType'] == 'PERPETUAL' and symbol['quoteAsset'] == 'USDT':
symbols.append(symbol['pair'])
return symbols
def initial_historical_data(self, symbol, interval):
data = dict()
data['symbol'] = symbol
data['interval'] = interval
data['limit'] = 35
raw_candle = self.make_requests("GET", "/fapi/v1/klines", data)
candles = []
if raw_candle is not None:
for c in raw_candle:
candles.append(float(c[4]))
return candles[:-1]
running this code
print(binance.initial_historical_data("BTCUSDT", "5m"))
will return this as the output
[55673.63, 55568.0, 55567.89, 55646.19, 55555.0, 55514.53, 55572.46, 55663.91, 55792.83, 55649.43,
55749.98, 55680.0, 55540.25, 55470.44, 55422.01, 55350.0, 55486.56, 55452.45, 55507.03, 55390.23,
55401.39, 55478.63, 55466.48, 55584.2, 55690.03, 55760.81, 55515.57, 55698.35, 55709.78, 55760.42,
55719.71, 55887.0, 55950.0, 55980.47]
which is a list of closes
i want to loop through the code in such a manner that i can return all the close prices for the pairs and timeframes i need and sort it accordingly, i did give it a try but am just stuck at this point
period = ["1m", "3m", "5m", "15m"]
binance = BinanceFuturesClient()
symbols = binance.get_symbols()
for symbol in symbols:
for tf in period:
historical_candles = binance.initial_historical_data(symbol, tf)
# store values and run through strategy
You can use my code posted below. It requires python-binance package to be installed on your environment and API key/secret from your Binance account. Method tries to load data by weekly chunks (parameter step) and supports resending requests on failures after timeout. It may helps when you need to fetch huge amount of data.
import pandas as pd
import pytz, time, datetime
from binance.client import Client
from tqdm.notebook import tqdm
def binance_client(api_key, secret_key):
return Client(api_key=api_key, api_secret=secret_key)
def load_binance_data(client, symbol, start='1 Jan 2017 00:00:00', timeframe='1M', step='4W', timeout_sec=5):
tD = pd.Timedelta(timeframe)
now = (pd.Timestamp(datetime.datetime.now(pytz.UTC).replace(second=0)) - tD).strftime('%d %b %Y %H:%M:%S')
tlr = pd.DatetimeIndex([start]).append(pd.date_range(start, now, freq=step).append(pd.DatetimeIndex([now])))
print(f' >> Loading {symbol} {timeframe} for [{start} -> {now}]')
df = pd.DataFrame()
s = tlr[0]
for e in tqdm(tlr[1:]):
if s + tD < e:
_start, _stop = (s + tD).strftime('%d %b %Y %H:%M:%S'), e.strftime('%d %b %Y %H:%M:%S')
nerr = 0
while nerr < 3:
try:
chunk = client.get_historical_klines(symbol, timeframe.lower(), _start, _stop)
nerr = 100
except e as Exception:
nerr +=1
print(red(str(e)))
time.sleep(10)
if chunk:
data = pd.DataFrame(chunk, columns = ['timestamp', 'open', 'high', 'low', 'close', 'volume', 'close_time', 'quote_av', 'trades', 'tb_base_av', 'tb_quote_av', 'ignore' ])
data.index = pd.to_datetime(data['timestamp'].rename('time'), unit='ms')
data = data.drop(columns=['timestamp', 'close_time']).astype(float).astype({
'ignore': bool,
'trades': int,
})
df = df.append(data)
s = e
time.sleep(timeout_sec)
return df
How to use
c = binance_client(<your API code>, <your API secret>)
# loading daily data from 1/Mar/21 till now (your can use other timerames like 1m, 5m etc)
data = load_binance_data(c, 'BTCUSDT', '2021-03-01', '1D')
It returns indexed DataFrame with loaded data:
time
open
high
low
close
volume
quote_av
trades
tb_base_av
tb_quote_av
ignore
2021-03-02 00:00:00
49595.8
50200
47047.6
48440.7
64221.1
3.12047e+09
1855583
31377
1.52515e+09
False
2021-03-03 00:00:00
48436.6
52640
48100.7
50349.4
81035.9
4.10952e+09
2242131
40955.4
2.07759e+09
False
2021-03-04 00:00:00
50349.4
51773.9
47500
48374.1
82649.7
4.07984e+09
2291936
40270
1.98796e+09
False
2021-03-05 00:00:00
48374.1
49448.9
46300
48751.7
78192.5
3.72713e+09
2054216
38318.3
1.82703e+09
False
2021-03-06 00:00:00
48746.8
49200
47070
48882.2
44399.2
2.14391e+09
1476474
21500.6
1.03837e+09
False
Next steps are up to you and dependent on how would you like to design your data structure. In simplest case you could store data into dictionaries:
from collections import defaultdict
data = defaultdict(dict)
for symbol in ['BTCUSDT', 'ETHUSDT']:
for tf in ['1d', '1w']:
historical_candles = load_binance_data(c, symbol, '2021-05-01', timeframe=tf)
# store values and run through strategy
data[symbol][tf] = historical_candles
to get access to your OHLC you just need following: data['BTCUSDT']['1d'] etc.
I want to know how I can convert an input date for example if I enter 01/01/1996, the output would be "1 January 1996", I need to be able to do this for any date that is entered.
I need to use slicing.
Thank you in advance.
inLine = input("please enter a date (dd/mm/yyyy) : ").split(',')
monthNames = ("January", "February", "March", \
"April", "May", "June", \
"July", "August", "September", \
"October", "November", "December" )
month = inLine [:10]
month2 = "month"[:1]
day = inLine
year = inLine[0:2]
year2 = "year"[0:4]
print("this is the date %s %s %s." % (month2, day2, year2))
I don't know why you need split your input when datetime parsing libraries already exist in Python.
>>> from datetime import datetime
>>> datetime.strptime('01/01/1996', '%M/%d/%Y').strftime('%d %B %Y')
'01 January 1996'
This is very simplistic response. You need to go away and learn about arrays, indexing and manipulating them. I also suggest, if this is homework of some kind, you go through and do more examples and look into what happens if, for example, I enter invalid input.
inLine = input("please enter a date (dd/mm/yyyy) : ").split('/')
monthNames = ("January", "February", "March", \
"April", "May", "June", \
"July", "August", "September", \
"October", "November", "December" )
day = int(inLine[0])
month = int(inLine[1])
year = int(inLine[2])
print("this is the date %s %s %s." % (day, monthNames[month-1], year))
I am parsing a file this way :
for d in csvReader:
print datetime.datetime.strptime(d["Date"]+"-"+d["Time"], "%d-%b-%Y-%H:%M:%S.%f").date()
date() returns : 2000-01-08, which is correct
time() returns : 06:20:00, which is also correct
How would I go about returning informations like "date+time" or "date+hours+minutes"
EDIT
Sorry I should have been more precise, here is what I am trying to achieve :
lmb = lambda d: datetime.datetime.strptime(d["Date"]+"-"+d["Time"], "%d-%b-%Y-%H:%M:%S.%f").date()
daily_quotes = {}
for k, g in itertools.groupby(csvReader, key = lmb):
lowBids = []
highBids = []
openBids = []
closeBids = []
for i in g:
lowBids.append(float(i["Low Bid"]))
highBids.append(float(i["High Bid"]))
openBids.append(float(i["Open Bid"]))
closeBids.append(float(i["Close Bid"]))
dayMin = min(lowBids)
dayMax = max(highBids)
open = openBids[0]
close = closeBids[-1]
daily_quotes[k.strftime("%Y-%m-%d")] = [dayMin,dayMax,open,close]
As you can see, right now I'm grouping values by day, I would like to group them by hour ( for which I would need date + hour ) or minutes ( date + hour + minute )
thanks in advance !
Don't use the date method of the datetime object you're getting from strptime. Instead, apply strftime directly to the return from strptime, which gets you access to all the member fields, including year, month, day, hour, minute, seconds, etc...
d = {"Date": "01-Jan-2000", "Time": "01:02:03.456"}
dt = datetime.datetime.strptime(d["Date"]+"-"+d["Time"], "%d-%b-%Y-%H:%M:%S.%f")
print dt.strftime("%Y-%m-%d-%H-%M-%S")