Scrape current stock price and create data frame - python

I have a list called Symbols with 30 stock ticker symbols (e.g., Apple -> AAPL), and I would like to grab the current stock price for each ticker and populate a data frame with this info: two columns, the first with ticker symbols and the second with the current price. I keep getting the following error message when I run this part of my script:
"ValueError: If using all scalar values, you must pass an index"
Stock = []
Price = []
df_temp = []
for symbol in Symbols:
    try:
        params = {
            'symbols': symbol,
            'range': '1d',
            'interval': '1d',
            'indicators': 'close',
            'includeTimestamps': 'false',
            'includePrePost': 'false',
            'corsDomain': 'finance.yahoo.com',
            '.tsrc': 'finance'}
        url = 'https://query1.finance.yahoo.com/v7/finance/spark'
        r = requests.get(url, params=params)
        data = r.json()
        df_stock = pd.DataFrame({'Ticker': symbol,
                                 'Current Price': data['spark']['result'][0]['response'][0]['indicators']['quote'][0]['close'][0]
                                 })
        df_temp.append(df_stock)
        df_temp = pd.concat(df_temp, axis=1)
    except KeyError:
        continue

You need to change only one part:
df_stock = pd.DataFrame({'Ticker': [symbol],
                         'Current Price': [data['spark']['result'][0]['response'][0]['indicators']['quote'][0]['close'][0]]
                         })
Output
Ticker Current Price
0 AAPL 118.119
Full Code
Stock = []
Price = []
df_temp = pd.DataFrame()
for symbol in ['AAPL', 'IBM', 'NKE', 'FB']:
    try:
        params = {
            'symbols': symbol,
            'range': '1d',
            'interval': '1d',
            'indicators': 'close',
            'includeTimestamps': 'false',
            'includePrePost': 'false',
            'corsDomain': 'finance.yahoo.com',
            '.tsrc': 'finance'}
        url = 'https://query1.finance.yahoo.com/v7/finance/spark'
        r = requests.get(url, params=params)
        data = r.json()
        df_stock = pd.DataFrame({'Ticker': [symbol],
                                 'Current Price': [data['spark']['result'][0]['response'][0]['indicators']['quote'][0]['close'][0]]
                                 })
        df_temp = df_temp.append(df_stock)
    except KeyError:
        continue
Explanation
You were passing scalar values to df_stock; wrapping each value in a list solves it.
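To see why the error occurs, here is a minimal sketch (ticker and price are made-up values):

```python
import pandas as pd

# Passing only scalars with no index raises the error from the question:
try:
    pd.DataFrame({'Ticker': 'AAPL', 'Current Price': 118.119})
except ValueError as e:
    print(e)  # If using all scalar values, you must pass an index

# Wrapping each scalar in a list gives pandas a length-1 column:
df = pd.DataFrame({'Ticker': ['AAPL'], 'Current Price': [118.119]})
print(df)

# Equivalently, keep the scalars and pass an explicit index:
df2 = pd.DataFrame({'Ticker': 'AAPL', 'Current Price': 118.119}, index=[0])
```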

How to extract a couple of fields nested in response using python

I'm a Python beginner. I would like to ask for help with retrieving the response data. Here's my script:
import pandas as pd
import time
import json
import requests

response = requests.get(url, headers=headers, auth=auth)
data = response.json()
Here's a part of json response:
{'result': [{'display': '',
'closure_code': '',
'service_offer': 'Integration Platforms',
'updated_on': '2022-04-23 09:05:53',
'urgency': '2',
'business_service': 'Operations',
'updated_by': 'serviceaccount45',
'description': 'ALERT returned 400 but expected 200',
'sys_created_on': '2022-04-23 09:05:53',
'sys_created_by': 'serviceaccount45',
'subcategory': 'Integration',
'contact_type': 'Email',
'problem_type': 'Design: Availability',
'caller_id': '',
'action': 'create',
'company': 'aaaa',
'priority': '3',
'status': '1',
'opened': 'smith.j',
'assigned_to': 'doe.j',
'number': '123456',
'group': 'blabla',
'impact': '2',
'category': 'Business Application & Databases',
'caused_by_change': '',
'location': 'All Locations',
'configuration_item': 'Monitor',
},
I would like to extract the data only for one group = 'blabla'. Then I would like to extract fields such as:
number = data['number']
group = data['group']
service_offer = data['service_offer']
updated = data['updated_on']
urgency = data['urgency']
username = data['created_by']
short_desc = data['description']
How should this be done?
I know that to check the first value I should use:
service_offer = data['result'][0]['service_offer']
I've tried to create a dictionary, but I'm getting an error:
data_result = response.json()['result']
payload = {
    'number': data_result['number'],
    'group': data_result['group'],
    'service_offer': data_result['service_offer'],
    'updated': data_result['updated_on'],
    'urgency': data_result['urgency'],
    'username': data_result['created_by'],
    'short_desc': data_result['description'],
}
TypeError: list indices must be integers or slices, not str:
So I've started to create something like the below, but I'm stuck:
get_data = []
if len(data) > 0:
    for item in range(len(data)):
        get_data.append(data[item])
May I ask for help?
If data is your decoded json response from the question then you can do:
# find group `blabla` in result:
g = next(d for d in data["result"] if d["group"] == "blabla")
# get data from the `blabla` group:
number = g["number"]
group = g["group"]
service_offer = g["service_offer"]
updated = g["updated_on"]
urgency = g["urgency"]
username = g["sys_created_by"]
short_desc = g["description"]
print(number, group, service_offer, updated, urgency, username, short_desc)
Prints:
123456 blabla Integration Platforms 2022-04-23 09:05:53 2 serviceaccount45 ALERT returned 400 but expected 200
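One caveat worth noting: next() raises StopIteration when nothing matches. A minimal sketch with made-up records, using a default value to guard against that:

```python
# Trimmed-down stand-in for the decoded JSON response:
data = {"result": [
    {"group": "blabla", "number": "123456"},
    {"group": "other", "number": "999999"},
]}

# The second argument to next() is returned when no record matches,
# instead of raising StopIteration:
g = next((d for d in data["result"] if d["group"] == "blabla"), None)
if g is not None:
    print(g["number"])  # 123456

# To collect every record of a group, use a list comprehension instead:
matches = [d for d in data["result"] if d["group"] == "blabla"]
print(len(matches))  # 1
```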

.diff() function is only returning NaN values in pandas data frame

I want to use the .diff() function on the log_price column in my for loops. What I am after is the old log price value - the new log price value from the df_DC_product data frame. When I try to use .diff() inside the for loops it only returns NaN values. Any thoughts why this might be happening? Thank you for your help.
DC_list = data4['Geography'].drop_duplicates().tolist()
Product_List = data4['Product'].drop_duplicates().tolist()

# create multiple empty lists to store values in:
my_dict = {
    "Product": [],
    "Geography": [],
    "Base Dollar Sales": [],
    "Base Unit Sales": [],
    "Price Numerator": [],
    "Price Denominator": [],
    "Demand Numerator": [],
    "Demand Denominator": [],
    "% Change in Price": [],
    "% Change in Demand": [],
    "Price Elasticity of Demand": []
}

dc_product_ped_with_metrics_all = []

for DC in DC_list:
    df_DC = data4.copy()
    # Filtering to the loop's current DC
    df_DC = df_DC.loc[(df_DC['Geography'] == DC)]
    df_DC = df_DC.copy()
    # Making a list of all of the current DC's Products to loop through
    Product_list = df_DC['Product'].drop_duplicates().tolist()
    for Product in Product_list:
        df_DC_product = df_DC.copy()
        # Filtering to the Product
        df_DC_product = df_DC_product.loc[(df_DC_product['Product'] == Product)]
        df_DC_product = df_DC_product.copy()
        # create container:
        df_DC_product['pn'] = df_DC_product.iloc[:, 5].diff()
        df_DC_product['price_d'] = np.divide(df_DC_product.iloc[:, 5].cumsum(), 2)
        df_DC_product['dn'] = df_DC_product.iloc[:, 6].diff()
        df_DC_product['dd'] = np.divide(df_DC_product.iloc[:, 6].cumsum(), 2)
        df_DC_product['% Change in Demand'] = np.divide(df_DC_product['dn'], df_DC_product['dd']) * 100
        df_DC_product['% Change in Price'] = np.divide(df_DC_product['pn'], df_DC_product['price_d']) * 100
        df_DC_product['ped'] = np.divide(df_DC_product['% Change in Demand'], df_DC_product['% Change in Price'])
        sales = df_DC_product['Base_Dollar_Sales'].sum()
        qty = df_DC_product['Base_Unit_Sales'].sum()
        price = df_DC_product['Price'].mean()
        log_price = df_DC_product['log_price'].mean()
        log_units = df_DC_product['log_units'].sum()
        price_numerator = df_DC_product['pn'].mean()
        price_denominator = df_DC_product['price_d'].sum()
        demand_numerator = df_DC_product['dn'].mean()
        demand_denominator = df_DC_product['dd'].sum()
        delta_demand = df_DC_product['% Change in Demand'].sum()
        delta_price = df_DC_product['% Change in Price'].mean()
        ped = df_DC_product['ped'].mean()
        dc_product_ped_with_metrics = [
            Product,
            DC,
            sales,
            qty,
            price,
            price_numerator,
            price_denominator,
            demand_numerator,
            demand_denominator,
            delta_demand,
            delta_price,
            ped
        ]
        dc_product_ped_with_metrics_all.append(dc_product_ped_with_metrics)

columns = [
    'Product',
    'Geography',
    'Sales',
    'Qty',
    'Price',
    'Price Numerator',
    'Price Denominator',
    'Demand Numerator',
    'Demand Denominator',
    '% Change in Demand',
    '% Change in Price',
    'Price Elasticity of Demand'
]
dc_product_ped_with_metrics_all = pd.DataFrame(data=dc_product_ped_with_metrics_all, columns=columns)
dc_product_ped_with_metrics_all
.append() doesn't update your dataframe inplace. You need to reassign the dataframe.
for DC in DC_list:
    # your code
    for Product in Product_list:
        # your code
        dc_product_ped_with_metrics_all = dc_product_ped_with_metrics_all.append(dc_product_ped_with_metrics)
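As a side note, DataFrame.append was deprecated and later removed in pandas 2.0; the same reassignment pattern can be written with pd.concat. A minimal sketch with made-up rows:

```python
import pandas as pd

# DataFrame.append never modified the frame in place; it returned a new
# frame, which is why the result had to be reassigned. The modern
# equivalent (pandas >= 2.0) is pd.concat:
df = pd.DataFrame({"x": [1]})
row = pd.DataFrame({"x": [2]})

df = pd.concat([df, row], ignore_index=True)
print(df["x"].tolist())  # [1, 2]
```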

Check two excel files for common products with Python Pandas and pick the product with the lowest price

I have two excel files from two different wholesalers with products and stock quantity information.
Some of the products in the two files are common, so they exist in both files.
The number of products in the files is different e.g. the first has 65000 products and the second has 9000 products.
I need to iterate through the products of the first file based on the common column 'EAN CODE' and check if the current product exists also in the EAN column of the 2nd file.
Afterwards check which product has the lower price (which has stock > 0) and print the matching row of this product to another output excel file.
import os
import re
from datetime import datetime

import pandas

from utils import recognize_excel_type

dataframes = []
input_directory = 'in'
for file in os.listdir(input_directory):
    file_path = os.path.join(input_directory, file)
    if file.lower().endswith('xlsx') or file.lower().endswith('xls'):
        dataframes.append(
            pandas.read_excel(file_path)
        )
    elif file.lower().endswith('csv'):
        dataframes.append(
            pandas.read_csv(file_path, delimiter=';')
        )

combined_dataframe = pandas.DataFrame(columns=['Price', 'Stock', 'EAN Code'])
for dataframe in dataframes:
    this_type = recognize_excel_type(dataframe)
    if this_type == 'DIFOX':
        dataframe.rename(columns={
            'retail price': 'Price',
            'availability (steps)': 'Stock',
            'EAN number 1': 'EAN Code',
        }, inplace=True)
        tuned_dataframe = pandas.DataFrame(
            dataframe[combined_dataframe.columns],
        )
        combined_dataframe = combined_dataframe.append(tuned_dataframe, ignore_index=True)
    elif this_type == 'ECOM_VGA':
        headers = dataframe.iloc[2]
        dataframe = dataframe[3:]
        dataframe.columns = headers
        dataframe.rename(columns={
            'Price (€)': 'Price',
            'Stock': 'Stock',
            'EAN Code': 'EAN Code',
        }, inplace=True)
        tuned_dataframe = pandas.DataFrame(
            dataframe[combined_dataframe.columns],
        )
        combined_dataframe = combined_dataframe.append(tuned_dataframe, ignore_index=True)
    elif this_type == 'MAXCOM':
        dataframe.rename(columns={
            'VK-Preis': 'Price',
            'Verfügbar': 'Stock',
            'EAN-Code': 'EAN Code',
        }, inplace=True)
        tuned_dataframe = pandas.DataFrame(
            dataframe[combined_dataframe.columns],
        )
        combined_dataframe = combined_dataframe.append(tuned_dataframe, ignore_index=True)

combined_dataframe.dropna(inplace=True)
combined_dataframe['Stock'].replace('> ?', '', inplace=True, regex=True)
combined_dataframe['Price'].replace('> ?', '', inplace=True, regex=True)
combined_dataframe = combined_dataframe.astype(
    {'Stock': 'int32', 'Price': 'float32'}
)
combined_dataframe = combined_dataframe[combined_dataframe['Stock'] > 0]
combined_dataframe = combined_dataframe.loc[combined_dataframe.groupby('EAN Code')['Price'].idxmin()]

combined_dataframe.to_excel('output_backup/output-{}.xlsx'.format(datetime.now().strftime('%Y-%m-%d')), index=False)
if os.path.exists('output/output.xlsx'):
    os.remove("output/output.xlsx")
combined_dataframe.to_excel('output/output.xlsx', index=False)
print('Output saved to output directory')

for file in os.listdir(input_directory):
    file_path = os.path.join(input_directory, file)
    os.remove(file_path)
print('All input files removed')
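The core of the selection above, keeping only in-stock rows and then taking the cheapest row per EAN, can be sketched on made-up data:

```python
import pandas as pd

# Hypothetical combined data from two wholesalers:
combined = pd.DataFrame({
    'EAN Code': ['111', '111', '222', '222'],
    'Price':    [10.0,   8.5,  5.0,   6.0],
    'Stock':    [3,      0,    2,     7],
})

# Drop out-of-stock rows first, so a cheaper but unavailable offer
# (EAN 111 at 8.5) cannot win:
in_stock = combined[combined['Stock'] > 0]

# idxmin() returns the row label of the minimum price per group;
# .loc then picks exactly those rows:
cheapest = in_stock.loc[in_stock.groupby('EAN Code')['Price'].idxmin()]
print(cheapest)
```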

How to write to excel sheet only those rows which match the condition using Python pandas

I have a data frame which contains three columns (Issue id, Creator, Versions). I need to extract the rows which do not contain the value "<JIRA Version" in the "versions" column (the third and fifth rows in my case; similarly there could be multiple such rows in the data frame).
Below is the code I'm trying, but it actually prints all the rows from the data frame. Any help/suggestions are appreciated.
allissues = []
for i in issues:
    d = {
        'Issue id': i.id,
        'creator': i.fields.creator,
        'resolution': i.fields.resolution,
        'status.name': i.fields.status.name,
        'versions': i.fields.versions,
    }
    allissues.append(d)

df = pd.DataFrame(allissues, columns=['Issue id', 'creator', 'versions'])
matchers = ['<JIRA Version']
for ind in df.values:
    if matchers not in df.values:
        print(df['versions'][ind], df['Issue id'][ind])
Some minor changes to your code:
allissues = []
for i in issues:
    d = {
        'Issue id': i.id,
        'creator': i.fields.creator,
        'resolution': i.fields.resolution,
        'status.name': i.fields.status.name,
        'versions': i.fields.versions,
    }
    allissues.append(d)

df = pd.DataFrame(allissues, columns=['Issue id', 'creator', 'versions'])
matchers = '<JIRA Version'
for ind, row in df.iterrows():
    if matchers not in row.versions:
        print(row['versions'], row['Issue id'])
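As an alternative to iterrows, a boolean mask does the same filtering in one step. A minimal sketch with made-up rows (versions shown here as plain strings; astype(str) would guard against non-string cells such as JIRA version objects):

```python
import pandas as pd

# Hypothetical frame standing in for the JIRA export:
df = pd.DataFrame({
    'Issue id': [1, 2, 3],
    'versions': ['<JIRA Version 1.0>', 'release-2', '<JIRA Version 3.0>'],
})

# Build a mask of rows whose versions cell does NOT contain the marker;
# regex=False treats the pattern as a literal substring:
mask = ~df['versions'].astype(str).str.contains('<JIRA Version', regex=False)
print(df[mask])  # only the 'release-2' row
```

The filtered frame `df[mask]` can then be written out directly with `to_excel`, without any Python-level loop.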

Python Exec not passing full variables to exec shell - with working errors

Python's "exec" command is not passing local values into the exec shell. I thought this would be a simple question, but everyone seems stumped. Here is a repeatable working version of the problem; it took me a bit to recreate it (my files are much larger than the examples shown here, with up to 10 dfs per loop and often 1800 items per df).
exec was only passing "PRODUCT" (as opposed to "PRODUCT.AREA") before I added ["{ind_id}"], and then it also shows the error "<string> in <module>".
datum_0 = {'Products': ['Stocks', 'Bonds', 'Notes'], 'PRODUCT.AREA': ['10200', '50291', '50988']}
df_0 = pd.DataFrame(datum_0, columns=['Products', 'PRODUCT.AREA'])

datum_1 = {'Products': ['Stocks', 'Bonds', 'Notes'], 'PRODUCT.CODE': ['66', '55', '22']}
df_1 = pd.DataFrame(datum_1, columns=['Products', 'PRODUCT.CODE'])

df_0

summary = {'Prodinfo': ['PRODUCT.AREA', 'PRODUCT.CODE']}
df_list = pd.DataFrame(summary, columns=['Prodinfo'])
df_list

# Create a rankings column for the Prodinfo tables
for rows in df_list.itertuples():
    row = rows.Index
    ind_id = df_list.loc[row]['Prodinfo']
    print(row, ind_id)
    exec(f'df_{row}["rank"] = df_{row}["{ind_id}"].rank(ascending=True) ')
Of course it's this last line that is throwing the exec errors. Any ideas? Have you got a working global or local variable assignment that fixes it? Thanks!
I would use a list to keep all the DataFrames:
all_df = [] # list
all_df.append(df_1)
all_df.append(df_2)
and then I would not need exec:
for rows in df_list.itertuples():
    row = rows.Index
    ind_id = df_list.loc[row]['Prodinfo']
    print(row, ind_id)
    all_df[row]["rank"] = all_df[row][ind_id].rank(ascending=True)
Alternatively, I would use a dictionary:
all_df = {} # dict
all_df['PRODUCT.AREA'] = df_1
all_df['PRODUCT.CODE'] = df_2
and then I need neither exec nor df_list:
for key, df in all_df.items():
    df["rank"] = df[key].rank(ascending=True)
Minimal working code with list
import pandas as pd
all_df = [] # list
datum = {
'Products': ['Stocks', 'Bonds', 'Notes'],
'PRODUCT.AREA': ['10200', '50291', '50988']
}
all_df.append( pd.DataFrame(datum) )
datum = {
'Products': ['Stocks', 'Bonds', 'Notes'],
'PRODUCT.CODE': ['66', '55', '22']
}
all_df.append( pd.DataFrame(datum) )
#print( all_df[0] )
#print( all_df[1] )
print('--- before ---')
for df in all_df:
    print(df)
summary = {'Prodinfo': ['PRODUCT.AREA', 'PRODUCT.CODE']}
df_list = pd.DataFrame(summary, columns=['Prodinfo'])
#print(df_list)
for rows in df_list.itertuples():
    row = rows.Index
    ind_id = df_list.loc[row]['Prodinfo']
    #print(row, ind_id)
    all_df[row]["rank"] = all_df[row][ind_id].rank(ascending=True)

print('--- after ---')
for df in all_df:
    print(df)
Minimal working code with dict
import pandas as pd
all_df = {} # dict
datum = {
'Products': ['Stocks', 'Bonds', 'Notes'],
'PRODUCT.AREA': ['10200', '50291', '50988']
}
all_df['PRODUCT.AREA'] = pd.DataFrame(datum)
datum = {
'Products': ['Stocks', 'Bonds', 'Notes'],
'PRODUCT.CODE': ['66', '55', '22']
}
all_df['PRODUCT.CODE'] = pd.DataFrame(datum)

print('--- before ---')
for df in all_df.values():
    print(df)

for key, df in all_df.items():
    df["rank"] = df[key].rank(ascending=True)

print('--- after ---')
for df in all_df.values():
    print(df)
Frankly, for two dataframes I wouldn't waste time on df_list and a for-loop:
import pandas as pd
datum = {
'Products': ['Stocks', 'Bonds', 'Notes'],
'PRODUCT.AREA': ['10200', '50291', '50988']
}
df_0 = pd.DataFrame(datum)
datum = {
'Products': ['Stocks', 'Bonds', 'Notes'],
'PRODUCT.CODE': ['66', '55', '22']
}
df_1 = pd.DataFrame(datum)
print('--- before ---')
print( df_0 )
print( df_1 )
df_0["rank"] = df_0['PRODUCT.AREA'].rank(ascending=True)
df_1["rank"] = df_1['PRODUCT.CODE'].rank(ascending=True)
print('--- after ---')
print( df_0 )
print( df_1 )
And I would probably even put everything in one dataframe:
import pandas as pd
df = pd.DataFrame({
'Products': ['Stocks', 'Bonds', 'Notes'],
'PRODUCT.AREA': ['10200', '50291', '50988'],
'PRODUCT.CODE': ['66', '55', '22'],
})
print('--- before ---')
print( df )
#df["rank PRODUCT.AREA"] = df['PRODUCT.AREA'].rank(ascending=True)
#df["rank PRODUCT.CODE"] = df['PRODUCT.CODE'].rank(ascending=True)
for name in ['PRODUCT.AREA', 'PRODUCT.CODE']:
    df[f"rank {name}"] = df[name].rank(ascending=True)
print('--- after ---')
print( df )
Result:
--- before ---
  Products PRODUCT.AREA PRODUCT.CODE
0   Stocks        10200           66
1    Bonds        50291           55
2    Notes        50988           22
--- after ---
  Products PRODUCT.AREA PRODUCT.CODE  rank PRODUCT.AREA  rank PRODUCT.CODE
0   Stocks        10200           66                1.0                3.0
1    Bonds        50291           55                2.0                2.0
2    Notes        50988           22                3.0                1.0
As expected, this was an easy fix. Thanks to the answerers who gave me much to think about.
Kudos to @holdenweb and his answer at Create multiple dataframes in loop:
dfnew = {}  # CREATE A DICTIONARY!!! - THIS WAS THE TRICK I WAS MISSING
df_ = {}
for rows in df_list.itertuples():
    row = rows.Index
    ind_id = df_list.loc[row]['Prodinfo']
    dfnew[row] = df_[row]  # or pd.read_csv(csv_file) or database_query or ...
    dfnew[row].dropna(inplace=True)
    dfnew[row]["rank"] = dfnew[row][ind_id].rank(ascending=True)
Works well and very simple...
