Checking HTTP Status (Python)

Is there a way to check the HTTP status code in the code below? I have not used the requests or urllib libraries, which would allow for this.
import pandas as pd
from pandas.io.excel import read_excel
url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'
# check the sheet number, spot: 9/9, short end: 7/9
spot_curve = read_excel(url, sheet_name=8)  # creates the dataframes
short_end_spot_curve = read_excel(url, sheet_name=6)
# do some cleaning; keep NaN for now, as forward-filling NaN is not recommended for a yield curve
spot_curve.columns = spot_curve.loc['years:']
valid_index = spot_curve.index[4:]
spot_curve = spot_curve.loc[valid_index]
# remove all maturities within 5 years, as those are duplicated in the short-end file
col_mask = spot_curve.columns.values > 5
spot_curve = spot_curve.iloc[:, col_mask]
# provide correct names
short_end_spot_curve.columns = short_end_spot_curve.loc['years:']
valid_index = short_end_spot_curve.index[4:]
short_end_spot_curve = short_end_spot_curve.loc[valid_index]
# merge these two; the time indexes are identical
combined_data = pd.concat([short_end_spot_curve, spot_curve], axis=1, join='outer')
# sort the maturities from short end to long end
combined_data.sort_index(axis=1, inplace=True)
def filter_func(group):
    return group.isnull().sum(axis=1) <= 50
combined_data = combined_data.groupby(level=0).filter(filter_func)

In pandas:
read_excel uses urllib2.urlopen (urllib.request.urlopen in Python 3) to open the URL and immediately calls .read() on the response, without keeping the HTTP response object around:
data = urlopen(url).read()
Even though you only need part of the Excel file, pandas will download the whole file each time. So I upvoted #jonnybazookatone.
It's better to store the Excel file locally first; then you can check the status code and the md5 of the file to verify data integrity, among other things.
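A minimal sketch of that approach, assuming the requests library is available (the URL is the one from the question; the local filename is arbitrary):
import hashlib
import requests
import pandas as pd
url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'
resp = requests.get(url)
if resp.status_code == 200:
    # store the file locally so it is only downloaded once
    with open('uknom05_mdaily.xls', 'wb') as f:
        f.write(resp.content)
    # md5 of the downloaded bytes, e.g. to compare against a known checksum
    print('md5:', hashlib.md5(resp.content).hexdigest())
    spot_curve = pd.read_excel('uknom05_mdaily.xls', sheet_name=8)
else:
    print('Download failed with status', resp.status_code)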

DataFrame returns ValueError after adding auto index

This script needs to query the DC server for events. Since this is done live, each time the server is queried, it returns query results of varying lengths. The log file is long and messy, as most logs are. I need to filter only the event names and their codes and then create a DataFrame. Additionally, I need to add a third column that counts the number of times each event took place. I've done most of it but can't figure out how to fix the error I'm getting.
After doing all the filtering from Elasticsearch, I get two lists - action and code - which I have emulated here.
action_list = ['logged-out', 'logged-out', 'logged-out', 'Directory Service Access', 'Directory Service Access', 'Directory Service Access', 'logged-out', 'logged-out', 'Directory Service Access', 'created-process', 'created-process']
code_list = ['4634', '4634', '4634', '4662', '4662', '4662', '4634', '4634', '4662', '4688', '4688']
I then created a list that contains only the codes that need to be filtered out.
event_code_list = ['4662', '4688']
My script is as follows:
import pandas as pd
from collections import Counter
#Create a dict that combines action and code
lists2dict = {}
lists2dict = dict(zip(action_list,code_list))
# print(lists2dict)
#Filter only wanted events
filtered_events = {k: v for k, v in lists2dict.items() if v in event_code_list}
# print(filtered_events)
index = 1 * pd.RangeIndex(start=1, stop=2) #add automatic index to DataFrame
df = pd.DataFrame(filtered_events,index=index)#Create DataFrame from filtered events
#Count event occurrences
count = Counter(df)
action_count = dict(Counter(count))
action_count_values = action_count.values()
# print(action_count_values)
#Convert Columns to Rows and Add Index
new_df = df.melt(var_name="Event",value_name="Code")
new_df['Count'] = action_count_values
print(new_df)
Up until this point, everything works as it should. The problem is what comes next. If there are no events, the script outputs an empty DataFrame. This works fine. However, if there are events, then we should see the events, the codes, and the number of times each event occurred. The problem is that it always outputs 1. How can I fix this? I'm sure it's something ridiculous that I'm missing.
#If no alerts, create empty DataFrame
if new_df.empty:
    empty_df = pd.DataFrame(columns=['Event','Code','Count'])
    empty_df['Event'] = ['-']
    empty_df['Code'] = ['-']
    empty_df['Count'] = ['-']
    html = empty_df.to_html()
    with open('alerts.html', 'w') as f:
        f.write(html)
else: #else, output alerts + codes + count
    html = new_df.to_html()
    with open('alerts.html', 'w') as f:
        f.write(html)
Any help is appreciated.
It is because you are collecting the results in a dictionary: keys are unique, so the repeated records are silently dropped. You lose the record count here: lists2dict = dict(zip(action_list,code_list)).
You can do all of these operations very easily on a dataframe. Just construct a pandas dataframe from the given lists, then filter by code, group by, and aggregate as a count:
df = pd.DataFrame({"Event": action_list, "Code": code_list})
df = df[df.Code.isin(event_code_list)] \
    .groupby(["Event", "Code"]) \
    .agg(Count=("Code", len)) \
    .reset_index()
print(df)
Output:
Event Code Count
0 Directory Service Access 4662 4
1 created-process 4688 2
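An equivalent, slightly shorter spelling uses GroupBy.size(), which counts the rows in each (Event, Code) group directly:
df = pd.DataFrame({"Event": action_list, "Code": code_list})
df = (df[df.Code.isin(event_code_list)]
      .groupby(["Event", "Code"])
      .size()
      .reset_index(name="Count"))
print(df)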

Python Script is not showing more than 1 page of results of Shopify Orders

I'm having a hard time trying to make this code show more than 1 page of orders.
I already tried different methods, such as loops, and also the one below (which is just a workaround), where I tried to get page 2.
I just need it to bring me all the orders generated on a specific day, but I got completely stuck.
import requests
import pandas as pd
from datetime import datetime, timedelta
# Set the API token for the Shopify API
api_token = 'MYTOKEN'
# Get the current date and subtract one day
today = datetime.now()
yesterday = today - timedelta(days=1)
# Format the date strings for the API request
start_date = yesterday.strftime('%Y-%m-%dT00:00:00Z')
end_date = yesterday.strftime('%Y-%m-%dT23:59:59Z')
# Set the initial limit to 1
limit = 1
page_info = 2
# Set the initial URL for the API endpoint you want to access, including the limit and date range parameters
url = f'https://MYSTORE.myshopify.com/admin/api/2020-04/orders.json?page_info={page_info}&limit={limit}&created_at_min={start_date}&created_at_max={end_date}'
# Set the API token as a header for the request
headers = {'X-Shopify-Access-Token': api_token}
# Make the GET request
response = requests.get(url, headers=headers)
# Check the status code of the response
if response.status_code == 200:
    # Parse the JSON response directly
    orders = response.json()['orders']
    # Flatten the JSON response into a Pandas DataFrame, including the 'name' column (order number)
    df = pd.json_normalize(orders, sep='_', record_path='line_items', meta=['name', 'id'], meta_prefix='meta_')
    # Flatten the line_items data into a separate DataFrame
    line_items_df = pd.json_normalize(orders, 'line_items', ['id'], meta_prefix='line_item_')
    # Flatten the 'orders' data into a separate DataFrame | Added in Dec.26-2022
    orders_df = pd.json_normalize(orders, sep='_', record_path='line_items', meta=['created_at', 'id'], meta_prefix='ordersDTbs_')
    # Merge the 'df' and 'orders_df' DataFrames | Added in Dec.26-2022
    df = pd.merge(df, orders_df[['id', 'ordersDTbs_created_at']], on='id')
    # Convert created_at to a date only | Added in Dec.26-2022
    df['ordersDTbs_created_at'] = df['ordersDTbs_created_at'].apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%S%z').date())
    # Merge in the line-item sku and quantity
    df = pd.merge(df, line_items_df[['id', 'sku', 'quantity']], on='id')
    # Calculate the amount paid after discount and add it as a new column
    df['price_set_shop_money_amount'] = pd.to_numeric(df['price_set_shop_money_amount'])
    df['total_discount_set_shop_money_amount'] = pd.to_numeric(df['total_discount_set_shop_money_amount'])
    df = df.assign(paid_afterdiscount=df['price_set_shop_money_amount'] - df['total_discount_set_shop_money_amount'])
    # Print the DataFrame
    print(df[['meta_name','ordersDTbs_created_at','sku_y','title','fulfillable_quantity','quantity_x','quantity_y','paid_afterdiscount']])
# Otherwise the API call failed
else:
    print('Something went wrong.')
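For reference, a cursor-based pagination sketch (untested; it reuses the question's placeholder store, token, and date variables): the Shopify REST Admin API returns a Link header with a rel="next" page_info cursor, which the requests library exposes as response.links, so you can loop until there is no next page. Note that once a page_info cursor is sent, Shopify only accepts limit alongside it, so the date filters belong on the first request only.
import requests
api_token = 'MYTOKEN'
headers = {'X-Shopify-Access-Token': api_token}
# first request carries the filters; later requests only follow the cursor
url = (f'https://MYSTORE.myshopify.com/admin/api/2020-04/orders.json'
       f'?limit=250&created_at_min={start_date}&created_at_max={end_date}')
all_orders = []
while url:
    response = requests.get(url, headers=headers)
    if response.status_code != 200:
        print('Something went wrong:', response.status_code)
        break
    all_orders.extend(response.json()['orders'])
    # requests parses the Link header into response.links;
    # the 'next' entry holds the page_info cursor for the following page
    url = response.links.get('next', {}).get('url')
print(len(all_orders), 'orders fetched')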

Making the python version of a SAS macro/call

I'm trying to create in Python what a macro does in SAS. I have a list of over 1K tickers that I'm trying to download information for, but doing all of them in one step made Python crash, so I split the data into 11 portions. Below is the code we're working with:
import time as t
import yfinance as yf
import pandas as pd
t0 = t.time()
printcounter = 0
for ticker in tickers1:
    printcounter += 1
    print(printcounter)
    try:
        selected = yf.Ticker(ticker)
        shares = selected.get_shares()
        shares_wide = shares.transpose()
        info = selected.info
        market_cap = info['marketCap']
        sector = info['sector']
        name = info['shortName']
        comb = shares_wide.assign(market_cap_oct22=market_cap, sector=sector, symbol=ticker, name=name)
        company_info_1 = company_info_1.append(comb)
    except:
        comb = pd.DataFrame()
        comb = comb.append({'symbol': ticker, 'ERRORFLAG': 'ERROR'}, ignore_index=True)
        company_info_1 = company_info_1.append(comb)
print("total run time:", round(t.time() - t0, 3), "s")
What I'd like to do is instead of re-writing and running this code for all 11 portions of data and manually changing "tickers1" and "company_info_1" to "tickers2" "company_info_2" "tickers3" "company_info_3" (and so on)... I'd like to see if there is a way to make a python version of a SAS macro/call so that I can get this data more dynamically. Is there a way to do this in python?
You need to generalize your existing code and wrap it in a function.
def company_info(tickers):
    company_info = pd.DataFrame()  # collect results for this group of tickers
    for ticker in tickers:
        try:
            selected = yf.Ticker(ticker)  # you may also have to pass the yf object
            shares = selected.get_shares()
            shares_wide = shares.transpose()
            info = selected.info
            market_cap = info['marketCap']
            sector = info['sector']
            name = info['shortName']
            comb = shares_wide.assign(market_cap_oct22=market_cap, sector=sector, symbol=ticker, name=name)
            company_info = company_info.append(comb)
        except:
            comb = pd.DataFrame()
            comb = comb.append({'symbol': ticker, 'ERRORFLAG': 'ERROR'}, ignore_index=True)
            company_info = company_info.append(comb)
    return company_info  # return the dataframe
Create a master dataframe to collect your results from the function call. Loop over the 11 groups of tickers passing each group into your function. Append the results to your master.
# master df to collect results
master = pd.DataFrame()
# assuming you have your tickers in a list of lists,
# loop over each of the 11 groups of tickers
for tickers in groups_of_tickers:
    df = company_info(tickers)  # fetch data from Yahoo Finance
    master = master.append(df)
Please note I typed this on the fly and have no way of testing it. I'm quite sure there are syntactical issues to work through. Hopefully it provides a framework for how to think about the solution.
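One detail the answer assumes is that groups_of_tickers already exists. If you only have the flat list of 1K+ tickers, a small helper (hypothetical names) can split it into the 11 groups:
def chunk(seq, n_chunks):
    """Split seq into n_chunks roughly equal consecutive pieces."""
    size = -(-len(seq) // n_chunks)  # ceiling division
    return [seq[i:i + size] for i in range(0, len(seq), size)]

# all_tickers is the full list of 1K+ symbols (hypothetical name)
groups_of_tickers = chunk(all_tickers, 11)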

API Request to Array

I'm using a crypto API that gives me the time, open, high, low, and close values going back over the last weeks. I just need the first row.
The input:
[[1635260400000, 53744.5, 53744.5, 53430.71, 53430.71], [1635262200000, 53635.49, 53899.73, 53635.49, 53899.73], [1635264000000, 53850.63, 54258.62, 53779.11, 54242.25], [1635265800000, 54264.32, 54264.32, 53909.02, 54003.42]]
I've tried:
resp = pd.read_csv('https://api.coingecko.com/api/v3/coins/bitcoin/ohlc?vs_currency=eur&days=1')
resp = resp.astype(str)
Time = resp[resp.columns[0]]
Open = resp[resp.columns[1]]
High = resp[resp.columns[2]]
Low = resp[resp.columns[3]]
Close = resp[resp.columns[4]]
But this doesn't work, as I can't process it (I wanted to convert it from object to str and then to double or float). I want to have each value as a double in a different variable. I'm kind of stuck on this.
The problem with using pandas is that the JSON array is parsed into one row with several columns.
If you expect to just loop over the JSON array, I suggest using requests rather than pandas.
import requests
resp = requests.get('https://api.coingecko.com/api/v3/coins/bitcoin/ohlc?vs_currency=eur&days=1')
for row in resp.json():
    timestamp, open_price, high, low, close = row
    ...
You just need to use read_json:
resp = pd.read_json('https://api.coingecko.com/api/v3/coins/bitcoin/ohlc?vs_currency=eur&days=1')
resp = resp.astype(float)
Time = resp[resp.columns[0]]
Open = resp[resp.columns[1]]
High = resp[resp.columns[2]]
Low = resp[resp.columns[3]]
Close = resp[resp.columns[4]]
But the previous solution is more compact and understandable.
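Since the question only needs the first row as floats, a short variant of the read_json answer (the column names are assumed from the API's time/open/high/low/close order):
import pandas as pd

resp = pd.read_json('https://api.coingecko.com/api/v3/coins/bitcoin/ohlc?vs_currency=eur&days=1')
resp.columns = ['Time', 'Open', 'High', 'Low', 'Close']
# first row only, each value as a float (Time is an epoch timestamp in milliseconds)
first = resp.iloc[0]
open_price = float(first['Open'])
high = float(first['High'])
low = float(first['Low'])
close = float(first['Close'])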

Python Loop Addition

No matter what I do, I can't seem to add all the base volumes and quote volumes together easily! I want to end up with a total base volume and a total quote volume for all the data in the data frame. Can someone help me with how to do this easily?
I have tried summing and saving the data in a dictionary first and then adding it, but I just can't seem to make this work!
import urllib.request
import pandas as pd
import json
def call_data():  # Call data from Poloniex
    global df
    datalink = 'https://poloniex.com/public?command=returnTicker'
    df = urllib.request.urlopen(datalink)
    df = df.read().decode('utf-8')
    df = json.loads(df)
    global current_eth_price
    for k, v in df.items():
        if 'ETH' in k:
            if 'USDT_ETH' in k:
                current_eth_price = round(float(v['last']), 2)
                print("Current ETH Price $:", current_eth_price)
def calc_volumes():  # Calculate the base & quote volumes
    global volume_totals
    for k, v in df.items():
        if 'ETH' in k:
            basevolume = float(v['baseVolume']) * current_eth_price
            quotevolume = float(v['quoteVolume']) * float(v['last']) * current_eth_price
            if quotevolume > 0:
                percentages = (quotevolume - basevolume) / basevolume * 100
                volume_totals = {'key': [k],
                                 'basevolume': [basevolume],
                                 'quotevolume': [quotevolume],
                                 'percentages': [percentages]}
                print("volume totals:", volume_totals)
                print("#" * 8)
call_data()
calc_volumes()
A few notes:
For the next 2 years, don't use the global keyword for anything.
Put the function documentation under the def line, in quotes (a docstring).
Using the requests library would be much easier than urllib. However ...
pandas can fetch the JSON and parse it all in one step.
OK, it doesn't have to be as split up as this; I'm just showing you how to properly pass variables around instead of using globals.
I could not find "ETH" by itself. In the data they sent, there are these 3: ['BTC_ETH', 'USDT_ETH', 'USDC_ETH']. So I used "USDT_ETH"; I hope the substitution is OK.
calc_volumes seems to both do the calculation and act as some sort of filter (it's picky as to what it prints). This function needs to be broken up into its two separate jobs: printing and calculating. (Maybe there was a filter step, but I leave that for homework.)
import pandas as pd
eth_price_url = 'https://poloniex.com/public?command=returnTicker'
def get_data(url=''):
    """Call data from Poloniex and put it in a dataframe"""
    data = pd.read_json(url)
    return data
def get_current_eth_price(data=None):
    """Grab the price out of the dataframe"""
    current_eth_price = data['USDT_ETH']['last'].round(2)
    return current_eth_price
def calc_volumes(data=None, current_eth_price=None):
    """Calculate the base & quote volumes"""
    data = data[data.columns[data.columns.str.contains('ETH')]].loc[['baseVolume', 'quoteVolume', 'last']]
    data = data.transpose()
    data[['baseVolume', 'quoteVolume']] *= current_eth_price
    data['quoteVolume'] *= data['last']
    data['percentages'] = (data['quoteVolume'] - data['baseVolume']) / data['quoteVolume'] * 100
    return data
df = get_data(url=eth_price_url)
the_price = get_current_eth_price(data=df)
print(f'the current eth price is: {the_price}')
volumes = calc_volumes(data=df, current_eth_price=the_price)
print(volumes)
This code seems kind of odd and inconsistent... for example, you're importing pandas and calling your variable df, but you're not actually using dataframes. If you used df = pd.read_json('https://poloniex.com/public?command=returnTicker', 'index')* to get a dataframe, most of your data manipulation here would become much easier and wouldn't require any loops either.
For example, the first function's code would become as simple as current_eth_price = df.loc['USDT_ETH','last'].
The second function's code would basically be
eth_rows = df[df.index.str.contains('ETH')]
total_base_volume = (eth_rows.baseVolume * current_eth_price).sum()
total_quote_volume = (eth_rows.quoteVolume * eth_rows['last'] * current_eth_price).sum()
(*The 'index' argument tells pandas to read the JSON dictionary indexed by rows, then columns, rather than columns, then rows.)
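Assembling that second answer into a runnable sketch (the totals print at the end is an assumption about the desired output, and the volume fields are cast defensively in case the API returns them as strings):
import pandas as pd

# rows are indexed by currency pair; columns are fields such as 'last' and 'baseVolume'
df = pd.read_json('https://poloniex.com/public?command=returnTicker', 'index')
current_eth_price = float(df.loc['USDT_ETH', 'last'])
eth_rows = df[df.index.str.contains('ETH')]
total_base_volume = (eth_rows.baseVolume.astype(float) * current_eth_price).sum()
total_quote_volume = (eth_rows.quoteVolume.astype(float) * eth_rows['last'].astype(float) * current_eth_price).sum()
print('total base volume: ', total_base_volume)
print('total quote volume:', total_quote_volume)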
