pandas - index information lost after iloc[2,:] - python

I have a class similar to the code below. It maintains a pandas DataFrame of states; my real case is much more complex than what is given here, so I need to create several algorithm-based filters to find a required record.
import pandas as pd
import datetime as dt

class stateMachine:
    def __init__(self):
        self.contextStack = pd.DataFrame(columns=['state',
                                                  'end_date', 'speed',
                                                  'weight', 'area'])
        self.state = 'idle'
        self.day = dt.date.today() - dt.timedelta(days=100)

    def update(self):
        state_ctx = {
            'state': self.state,
            'end_date': self.day,
            'speed': self.speed,
            'weight': self.weight
        }
        df = pd.DataFrame([state_ctx])
        self.contextStack = pd.concat([self.contextStack, df], ignore_index=True)
        self.day = self.day + dt.timedelta(days=1)

    def set_speed(self, speed):
        self.speed = speed

    def set_weight(self, weight):
        self.weight = weight

    def set_state(self, state):
        self.state = state
Here is a very simple example that adds items:
sm = stateMachine()
states = ['idle', 'on', 'off']
for i in range(0, 10):
    sm.set_speed(i)
    sm.set_weight(i+100)
    sm.set_state(states[i%3])
    sm.update()
After running this, I get my DataFrame:
  state    end_date speed weight
0  idle  2022-01-24     0    100
1    on  2022-01-25     1    101
2   off  2022-01-26     2    102
3  idle  2022-01-27     3    103
4    on  2022-01-28     4    104
5   off  2022-01-29     5    105
6  idle  2022-01-30     6    106
7    on  2022-01-31     7    107
8   off  2022-02-01     8    108
9  idle  2022-02-02     9    109
My current algorithm can find one selected item; it looks like:
def get_filtered_stack(self, state):
    filtered_df = self.contextStack[(self.contextStack['state']==state)]
    return filtered_df

def find_item_understate(self, state, weight, speed):
    self.state_stack = self.get_filtered_stack(state)
    # after some operations, I get the index of the wanted row;
    # let's assume it is 0
    ctx = self.state_stack.iloc[0,:]
    return ctx
Here comes my problem: after my higher-level application gets this 'ctx', it is no longer able to trace it back to its index in the original DataFrame.
context = sm.find_item_understate('on', 104, 4)
This is because the 'index' information is lost after 'iloc'. Here is what the test context from the code above looks like:
state               on
end_date    2022-01-25
speed                1
weight             101
Name: 1, dtype: object
In some cases I need to find the original index in my later processing; in this case, it is 1.
But since the context has already lost its 'index' information, this causes trouble: at the end of the day, the returned context has lost its way back home.
Note: the date/speed/weight columns can't be used as a filter to recover the index.
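For reference, a minimal sketch of one way back (not from the original post): the Series that .iloc returns keeps the original row label in its name attribute (that is the Name: 1 in the printout above), so the caller can still address the row in the full DataFrame, as long as the filtered frame's index was never reset:
context = sm.find_item_understate('on', 104, 4)
original_index = context.name               # -> 1
row = sm.contextStack.loc[original_index]   # back to the row in the full frame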

Related

Check if the number of slots is > 0 before picking a date and an hour?

I am building a vaccination appointment program that automatically assigns a slot to the user.
This builds the table and saves it into a CSV file:
import pandas
start_date = '1/1/2022'
end_date = '31/12/2022'
list_of_date = pandas.date_range(start=start_date, end=end_date)
df = pandas.DataFrame(list_of_date)
df.columns = ['Date/Time']
df['8:00'] = 100
df['9:00'] = 100
df['10:00'] = 100
df['11:00'] = 100
df['12:00'] = 100
df['13:00'] = 100
df['14:00'] = 100
df['15:00'] = 100
df['16:00'] = 100
df['17:00'] = 100
df.to_csv(r'C:\Users\Ric\PycharmProjects\pythonProject\new.csv')
And this code randomly picks a date and an hour from that date in the CSV table we just created:
import pandas
import random
from random import randrange
#randrange randomly picks an index for date and time for the user
random_date = randrange(365)
random_hour = randrange(10)
list = ["8:00", "9:00", "10:00", "11:00", "12:00", "13:00", "14:00", "15:00", "16:00", "17:00"]
hour = random.choice(list)
df = pandas.read_csv('new.csv')
date=df.iloc[random_date][0]
# 1 is subtracted from that cell as 1 slot will be assigned to the user
df.loc[random_date, hour] -= 1
df.to_csv(r'C:\Users\Ric\PycharmProjects\pythonProject\new.csv',index=False)
print(date)
print(hour)
I need help making the program check whether the randomly chosen hour on that date has vacant slots. I can manage the while loops needed when the number of vacant slots is 0. And no, I have not tried much, because I have no clue how to do this.
P.S. If you're going to try running the code, please remember to change the save and read location.
Here is how I would do it. I've also cleaned it up a bit.
import random
import pandas as pd

start_date, end_date = '1/1/2022', '31/12/2022'
hours = [f'{hour}:00' for hour in range(8, 18)]

df = pd.DataFrame(
    data=pd.date_range(start_date, end_date),
    columns=['Date/Time']
)
for hour in hours:
    df[hour] = 100

# 1000 simulations
for _ in range(1000):
    random_date, random_hour = random.randrange(365), random.choice(hours)
    # Check if the slot has vacancies
    if df.at[random_date, random_hour] > 0:
        df.at[random_date, random_hour] -= 1
    else:
        # Pass here, but you can add whatever logic you want,
        # for instance giving the next free slot on the same day
        pass

print(df.describe())
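A side note on the accessor used here: df.at[row, col] is pandas' fast scalar accessor, which fits since each check reads or writes exactly one cell; .loc also works but performs more general label-based selection. A tiny self-contained illustration:
import pandas as pd

# Toy frame with one slot column; .at does a scalar read-modify-write.
df = pd.DataFrame({'8:00': [100, 100]})
df.at[0, '8:00'] -= 1
assert df.at[0, '8:00'] == 99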
import pandas
import random
from random import randrange

# randrange randomly picks an index for date and time for the user
random_date = randrange(365)
# random_hour = randrange(10)  # consider removing this line since it's not used
lista = [  # consider avoiding Python built-in names like 'list'
    "8:00",
    "9:00",
    "10:00",
    "11:00",
    "12:00",
    "13:00",
    "14:00",
    "15:00",
    "16:00",
    "17:00",
]
hour = random.choice(lista)
df = pandas.read_csv("new.csv")
date = df.iloc[random_date][0]
# 1 is subtracted from that cell as 1 slot will be assigned to the user
if df.loc[random_date, hour] > 0:  # here is what you asked for
    df.loc[random_date, hour] -= 1
else:
    print(f"No Vacant Slots in {random_date}, {hour}")
df.to_csv(r"new.csv", index=False)
print(date)
print(hour)
Here's another alternative. I'm not sure you really need the very large and slow-to-load pandas module for this; this does it with plain Python structures. I tried to run the simulation until it got a failure, but with 365,000 open slots, and flushing the database to disk each time, it takes too long. I changed the 100 to 8, just to see it hit a full slot in reasonable time.
import csv
import datetime
import random

def create():
    start = datetime.date( 2022, 1, 1 )
    oneday = datetime.timedelta(days=1)
    headers = ["date"] + [f"{i}:00" for i in range(8,18)]
    data = []
    for _ in range(365):
        data.append( [start.strftime("%Y-%m-%d")] + [8]*10 )  # not 100
        start += oneday
    write( headers, data )

def write(headers, rows):
    fcsv = csv.writer(open('data.csv','w',newline=''))
    fcsv.writerow( headers )
    fcsv.writerows( rows )

def read():
    days = []
    headers = []
    for row in csv.reader(open('data.csv')):
        if not headers:
            headers = row
        else:
            days.append( [row[0]] + list(map(int,row[1:])))
    return headers, days

def choose( headers, days ):
    random_date = random.randrange(365)
    random_hour = random.randrange(len(headers)-1)+1
    choice = days[random_date][0] + " " + headers[random_hour]
    print( "Chose", choice )
    if days[random_date][random_hour]:
        days[random_date][random_hour] -= 1
        write(headers,days)
        return choice
    else:
        print("Randomly chosen slot is full.")
        return None

create()
data = read()
while choose( *data ):
    pass

Update row after comparing values on pandas dataframe

I connect to an API that provides covid-19 data in Brazil organized by state and city, as follows:
# Libraries
import pandas as pd
from pandas import Series, DataFrame  # Panel was removed in modern pandas
import matplotlib.pyplot as plt
from matplotlib.pyplot import plot_date, axis, show, gcf
import numpy as np
from urllib.request import Request, urlopen
import urllib
from http.cookiejar import CookieJar
from datetime import datetime, timedelta
cj = CookieJar()
url_Bso = "https://brasil.io/api/dataset/covid19/caso_full/data?state=MG&city=Barroso"
req_Bso = urllib.request.Request(url_Bso, None, {"User-Agent": "python-urllib"})
opener_Bso = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
response_Bso = opener_Bso.open(req_Bso)
raw_response_Bso = response_Bso.read()
json_Bso = pd.read_json(raw_response_Bso)
results_Bso = json_Bso['results']
results_Bso = results_Bso.to_dict().values()
df_Bso = pd.DataFrame(results_Bso)
df_Bso.head(5)
This API compiles the data released by the state health departments. However, there are discrepancies between the records of the state and the city health departments, and the state records are out of date relative to those of the cities. I would like to update the values for last Thursday and Saturday (the day the epidemiological week ends). I'm trying the following:
saturday = datetime.today() + timedelta(days=-5)
yesterday = datetime.today() + timedelta(days=-1)
last_available_confirmed_day_Bso_saturday = 51
last_available_confirmed_day_Bso_yesterday = 54
df_Bso = df_Bso.loc[df_Bso['date'] == saturday, ['last_available_confirmed']] = last_available_confirmed_day_Bso_saturday
df_Bso = df_Bso.loc[df_Bso['date'] == yesterday, ['last_available_confirmed']] = last_available_confirmed_day_Bso_yesterday
df_Bso
However, I get the error:
> AttributeError: 'int' object has no attribute 'loc'
I need another dataframe with the values for these days updated. Can anyone help?
You have to adjust the date: your DataFrame's date column holds strings, so compare against a formatted date string (or convert the column to datetime).
today = datetime.now()
last_sat_num = (today.weekday() + 2) % 7
last_thu_num = (today.weekday() + 4) % 7
last_sat = today - timedelta(last_sat_num)
last_thu = today - timedelta(last_thu_num)
last_sat_str = last_sat.strftime('%Y-%m-%d')
last_thu_str = last_thu.strftime('%Y-%m-%d')
last_available_confirmed_day_Bso_sat = 51
last_available_confirmed_day_Bso_thu = 54
df_Bso2 = df_Bso.copy()
df_Bso2.loc[df_Bso2['date'] == last_sat_str, ['last_available_confirmed']] = last_available_confirmed_day_Bso_sat
df_Bso2.loc[df_Bso2['date'] == last_thu_str, ['last_available_confirmed']] = last_available_confirmed_day_Bso_thu
df_Bso2[['date', 'last_available_confirmed']].head(10)
Output
         date  last_available_confirmed
0  2020-07-15                        44
1  2020-07-14                        43
2  2020-07-13                        40
3  2020-07-12                        40
4  2020-07-11                        51
5  2020-07-10                        39
6  2020-07-09                        36
7  2020-07-08                        36
8  2020-07-07                        27
9  2020-07-06                        27
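As a quick check of the weekday arithmetic above (a sketch using Python's Monday=0 weekday convention): Saturday has weekday 5, so the number of days back to the most recent Saturday is (weekday - 5) % 7, which equals (weekday + 2) % 7:
import datetime

# weekday(): Monday=0 ... Sunday=6; Saturday is 5, so the offset back to
# the most recent Saturday is (weekday - 5) % 7 == (weekday + 2) % 7.
today = datetime.datetime(2020, 7, 15)                 # a Wednesday, weekday() == 2
offset = (today.weekday() + 2) % 7                     # -> 4 days back
last_sat = today - datetime.timedelta(offset)
assert last_sat.strftime('%Y-%m-%d') == '2020-07-11'   # a Saturday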

How to make "Market Depth" chart in Plotly/Python?

I'm currently trying to make a market depth chart like the one on BitMEX here (not enough rep to post images yet): https://user-images.githubusercontent.com/53675680/85238644-5311e180-b3fd-11ea-9865-94e3451f335c.png
import pandas as pd
from bitmex_websocket import BitMEXWebsocket
ws = BitMEXWebsocket(endpoint="https://testnet.bitmex.com/api/v1", symbol='XBTUSD', api_key=None, api_secret=None)
df = pd.DataFrame(ws.market_depth())
df.head()
   symbol           id  side     size      price
0  XBTUSD  15500000000  Sell     1003  1000000.0
1  XBTUSD  15500000100  Sell   100001   999999.0
2  XBTUSD  15502119900  Sell     5000   978801.0
3  XBTUSD  15504648350  Sell  2191000   953516.5
4  XBTUSD  15515440800  Sell      300   845592.0
# omitted function that gets the current bitcoin price, for the sake of brevity
def get_current_price(symbol):
    pass

current_price = request_history(symbol='XBTUSD', interval_mins=1, load_periods=1)['close'].iloc[0]
print(current_price)
Output: 9301.5
lower_bound = current_price * 0.99
upper_bound = current_price * 1.01
df = df.loc[(df.price > lower_bound) & (df.price < upper_bound)]
df.plot(kind='line', x='price', y='size')
My graph looks like this: https://user-images.githubusercontent.com/53675680/85238728-f8c55080-b3fd-11ea-95b3-1b35c3b958f6.png
I think the first step would be bucketing the prices and then displaying them as a bar chart but I'm not quite sure. Any ideas on how I would go about creating a graph that looks more like the one on BitMEX, preferably using plotly? Thanks.
EDIT
I've made some progress but am still not close to matching what's on BitMEX.
import matplotlib.pyplot as plt

bins = [n for n in range(int(lower_bound), int(upper_bound), 10)]
lower_df = df.loc[df.price < current_price].price
upper_df = df.loc[df.price > current_price].price
plt.hist([lower_df, upper_df], bins=bins, edgecolor="k", color=['red', 'green'])
plt.xticks(bins[::5])  # display every 5th tick on the x-axis
plt.show()
Image of Output: https://user-images.githubusercontent.com/53675680/85241059-7d1cd100-b408-11ea-8b2b-f39997767419.png
groups = df.groupby([pd.cut(df.price, bins)])['size'].sum()
groups.plot(kind='bar')
Image of Output: https://user-images.githubusercontent.com/53675680/85241076-902fa100-b408-11ea-941d-226927acb946.png
I coded a library to get Binance data to show you: https://github.com/nand0san/binpan_studio
Getting some data
import binpan

data = binpan.Symbol(symbol='btcusdt',
                     tick_interval='5m',
                     time_zone='UTC')
data.get_orderbook()
         Price  Quantity  Side
0     21068.06   0.00267   ask
1     21068.03   0.00387   ask
2     21068.00   0.00125   ask
3     21067.86   0.00827   ask
4     21067.59   0.01355   ask
...        ...       ...   ...
9995  19395.19   0.01007   bid
9996  19395.10   0.00729   bid
9997  19395.06   0.00112   bid
9998  19394.98   0.00688   bid
9999  19394.80   0.00241   bid
And plotting
data.plot_orderbook()
The function used with plotly is:
import pandas as pd
import plotly.express as px

def orderbook_depth(df: pd.DataFrame,
                    accumulated=True,
                    title='Depth orderbook plot',
                    height=500,
                    plot_y="Quantity",
                    **kwargs):
    ob = df.copy(deep=True)
    if accumulated:
        c_asks = ob[ob['Side'] == 'ask']['Quantity'][::-1].cumsum()
        c_bids = ob[ob['Side'] == 'bid']['Quantity'].cumsum()
        cumulated = pd.concat([c_asks, c_bids])
        ob.loc[:, 'Accumulated'] = cumulated
        plot_y = 'Accumulated'
    fig = px.line(ob, x="Price", y=plot_y, color='Side', height=height, title=title, **kwargs)
    fig.show()
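A note on the accumulation step, since the reversal is easy to miss: a depth chart cumulates ask quantity starting at the best (lowest) ask and bid quantity starting at the best (highest) bid. The orderbook above lists asks from the highest price down to the best ask, so reversing before cumsum makes the accumulation start at the spread. A toy check of that step (illustrative numbers):
import pandas as pd

# Asks listed from the price farthest above the spread down to the best ask,
# mirroring the orderbook layout above.
asks = pd.Series([3.0, 2.0, 1.0])    # quantities at, say, 103, 102, 101
acc = asks[::-1].cumsum()            # cumulate starting from the best ask
print(acc.sort_index())              # 0 -> 6.0, 1 -> 3.0, 2 -> 1.0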

Python: Logic to pick a variable for a number

I'm creating a smart home system in domoticz and I'm having some issues. Basically I have 32 relays for connections (ON, OFF), and each one is displayed as a switch in domoticz. Every switch has its own idx number for API control (turning the switch on/off with requests). My task is to get the status of my relays; from the hardware I can only read 1 or 0, where 1 means on and 0 means off. So I built a little logic to get the relay status, but then I realized that the idx numbers are unique to every device I have, and they are mixed up. So I built some kind of define list, but I don't know how to work with it.
These are my definitions for every relay. This is only an example; these are not my real idx values from domoticz.
# Relays
RELAY_1_IDX = 1
RELAY_2_IDX = 2
RELAY_3_IDX = 3
RELAY_4_IDX = 4
RELAY_5_IDX = 5
RELAY_6_IDX = 6
RELAY_7_IDX = 7
RELAY_8_IDX = 8
RELAY_9_IDX = 9
RELAY_10_IDX = 10
RELAY_11_IDX = 11
RELAY_12_IDX = 12
RELAY_13_IDX = 13
RELAY_14_IDX = 14
RELAY_15_IDX = 15
RELAY_16_IDX = 16
RELAY_17_IDX = 17
RELAY_18_IDX = 18
RELAY_19_IDX = 19
RELAY_20_IDX = 20
RELAY_21_IDX = 21
RELAY_22_IDX = 22
RELAY_23_IDX = 23
RELAY_24_IDX = 24
RELAY_25_IDX = 25
RELAY_26_IDX = 26
RELAY_27_IDX = 27
RELAY_28_IDX = 28
RELAY_29_IDX = 29
RELAY_30_IDX = 30
RELAY_31_IDX = 31
RELAY_32_IDX = 32
# Inputs
INPUT_1_IDX = 50
INPUT_2_IDX = 51
INPUT_3_IDX = 52
INPUT_4_IDX = 53
INPUT_5_IDX = 54
INPUT_6_IDX = 55
This is my code, without any logic to map the number to a definition:
# Update all 32 relay statuses in real time in domoticz.
for relay in range(1, 33):  # the end of range() is exclusive, so 33 covers relays 1-32
    RELAY_INPUT = f"RELAY-READ-255,{relay}"
    srvsock.sendto(RELAY_INPUT.encode(), (edsIP, edsPORT))
    _relay_status_ = srvsock.recv(4096).decode('utf-8').replace(f'RELAY-READ-255,{relay},', '').replace(',OK', '')
    _set_ = Domoticz.set_value(relay, int(_relay_status_))
    # _relay_status_ is an integer: 1 (on) or 0 (off)
    # ... logic to find the idx number for the current relay goes here (see the sketch below) ...
    if _set_ is False:
        print(f'* Note: Request for relay {relay} was unsuccessful.')
    time.sleep(0.15)
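A minimal sketch of one way to handle that lookup (not from the original post): replace the 32 separate constants with a dict keyed by relay number, then resolve the idx inside the loop. The idx values below are the same placeholders as in the definitions above:
# Sketch: map each relay/input number to its domoticz idx. With real,
# mixed-up idx numbers, write the dict out literally instead.
RELAY_IDX = {relay: relay for relay in range(1, 33)}   # {1: 1, ..., 32: 32}
INPUT_IDX = {n: 49 + n for n in range(1, 7)}           # {1: 50, ..., 6: 55}

for relay in range(1, 33):
    idx = RELAY_IDX[relay]      # idx for this relay, however mixed up
    # in the real loop: _set_ = Domoticz.set_value(idx, int(_relay_status_))
    print(relay, '->', idx)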

Generate numbers starting from 1 and increasing until the end of the year, then repeat in the next year

I have a model named 'post', and I want this model to have a field named "id_field" which takes an automatic value starting from 1 and increasing with every post saved until the end of the year, then starts again from 1 in the next year, and so on.
for example in 2020:
post_1 --> id_field = 1
post_2 --> id_field = 2
post_3 --> id_field = 3
.
.
.
post_n --> id_field = n
then when 2021 comes:
post_n+1 --> id_field = 1
post_n+2 --> id_field = 2
post_n+3 --> id_field = 3
Sorry for my bad English; I hope I explained the problem correctly. Thanks.
import datetime

counter = 0       # take this value from the database
this_year = 2019  # you can get this value from the database if you want

def new_post():
    # 'this_year' needs 'global' too, otherwise the assignment below makes it
    # a local name and the comparison raises UnboundLocalError
    global counter, this_year
    if datetime.datetime.now().year > this_year:
        this_year = datetime.datetime.now().year
        counter = 0  # reset before numbering the first post of the new year
    counter += 1
    # your code: use 'counter' as the id_field of the new post
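For a database-backed approach (a sketch, assuming a Django model; the model and field names are illustrative), computing the per-year sequence inside save() avoids the global counter and survives restarts:
import datetime

from django.db import models


class Post(models.Model):
    created = models.DateField(default=datetime.date.today)
    id_field = models.PositiveIntegerField(editable=False)

    def save(self, *args, **kwargs):
        if self._state.adding:
            # Highest id_field already used this year, or 0 if none yet.
            # Note: concurrent writers would need a transaction/lock here.
            last = (Post.objects
                    .filter(created__year=self.created.year)
                    .aggregate(models.Max('id_field'))['id_field__max']) or 0
            self.id_field = last + 1
        super().save(*args, **kwargs)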
