I am new to working with pandas, and I am having difficulty extracting the data for color variations from a series of makeup products.
My goal is to set up a dataframe with all color variations for each product in their own lists.
Something along these lines:
Name      Type      URL  Price  Description  Images   Shades   Hex
product1  lipstick  ...  27.00  ...          [.,.,.]  [.,.,.]  [.,.]
I am trying to flatten this data, but I keep receiving key errors.
Here is the initial request:
import requests
import pandas as pd
headers = {
    'authority': 'ncsa.sdapi.io',
    'accept': 'application/json',
    'accept-language': 'en-US,en;q=0.9',
    'authorizationtoken': 'Mi1tYy11cy1lbi1lY29tbXYxOmh0dHBzOi8vbS5tYWNjb3NtZXRpY3MuY29t',
    'business-unit': '2-mc-us-en-ecommv1',
    'cache-control': 'no-cache',
    'clientid': 'stardust-fe-client',
    'content-type': 'application/json',
    'origin': 'https://m.maccosmetics.com',
    'referer': 'https://m.maccosmetics.com/',
    'sec-ch-ua': '"Chromium";v="110", "Not A(Brand";v="24", "Google Chrome";v="110"',
    'sec-ch-ua-mobile': '?1',
    'sec-ch-ua-platform': '"Android"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'cross-site',
    'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Mobile Safari/537.36',
}
json_data = {
'query': '{\n products(environment: {prod:true},\n filter: [{tags:{filter:{key:{in:["lipstick"]}},includeInnerHits:false}}],\n sort: [{tags:{product_display_order:ASCENDING}}]\n ) {\n \n ... product__collection \n \n items {\n ... product_default ... product_productSkinType ... product_form ... product_productCoverage ... product_benefit ... product_productReview ... product_skinConcern ... product_usage ... product_productFinish ... product_usageOptions ... product_brushTypes ... product_brushShapes \n skus {\n total\n items {\n ... product__skus_default ... product__skus_autoReplenish ... product__skus_colorFamily ... product__skus_skuLargeImages ... product__skus_skuMediumImages ... product__skus_skuSmallImages ... product__skus_vtoFoundation ... product__skus_vtoMakeup \n }\n }\n }\n \n \n }\n }\n\nfragment product__collection \n on product_collection {\n items {\n product_id\n skus {\n items {\n inventory_status\n sku_id\n }\n }\n }\n }\n\n\nfragment product_default \n on product {\n default_category {\n id\n value\n }\n description\n display_name\n is_hazmat\n meta {\n description\n }\n product_badge\n product_id\n product_url\n short_description\n tags {\n total\n items {\n id\n value\n key\n }\n }\n }\n\n\nfragment product_productSkinType \n on product {\n skin {\n type {\n key\n value\n }\n }\n }\n\n\nfragment product_form \n on product {\n form {\n key\n value\n }\n }\n\n\nfragment product_productCoverage \n on product {\n coverage {\n key\n value\n }\n }\n\n\nfragment product_benefit \n on product {\n benefit {\n benefits {\n key\n value\n }\n }\n }\n\n\nfragment product_productReview \n on product {\n reviews {\n average_rating\n number_of_reviews\n }\n }\n\n\nfragment product_skinConcern \n on product {\n skin {\n concern {\n key\n value\n }\n }\n }\n\n\nfragment product_usage \n on product {\n usage {\n content\n label\n type\n }\n }\n\n\nfragment product_productFinish \n on product {\n finish {\n key\n value\n }\n }\n\n\nfragment product_usageOptions \n on product {\n usage_options {\n key\n value\n }\n }\n\n\nfragment product_brushTypes \n on product {\n brush {\n types {\n key\n value\n }\n }\n }\n\n\nfragment product_brushShapes \n on product {\n brush {\n shapes {\n key\n value\n }\n }\n }\n\n\nfragment product__skus_default \n on product__skus {\n is_default_sku\n is_discountable\n is_giftwrap\n is_under_weight_hazmat\n iln_listing\n iln_version_number\n inventory_status\n material_code\n prices {\n currency\n is_discounted\n include_tax {\n price\n original_price\n price_per_unit\n price_formatted\n original_price_formatted\n price_per_unit_formatted\n }\n }\n sizes {\n value\n key\n }\n shades {\n name\n description\n hex_val\n }\n sku_id\n sku_badge\n unit_size_formatted\n upc\n }\n\n\nfragment product__skus_autoReplenish \n on product__skus {\n is_replenishable\n }\n\n\nfragment product__skus_colorFamily \n on product__skus {\n color_family {\n key\n value\n }\n }\n\n\nfragment product__skus_skuLargeImages \n on product__skus {\n media {\n large {\n src\n alt\n height\n width\n }\n }\n }\n\n\nfragment product__skus_skuMediumImages \n on product__skus {\n media {\n medium {\n src\n alt\n height\n width\n }\n }\n }\n\n\nfragment product__skus_skuSmallImages \n on product__skus {\n media {\n small {\n src\n alt\n height\n width\n }\n }\n }\n\n\nfragment product__skus_vtoFoundation \n on product__skus {\n vto {\n is_foundation_experience\n }\n }\n\n\nfragment product__skus_vtoMakeup \n on product__skus {\n vto {\n is_color_experience\n }\n }\n',
'variables': {},
}
response = requests.post(
    'https://ncsa.sdapi.io/stardust-prodcat-product-v3/graphql/core/v1/extension/v1',
    headers=headers,
    json=json_data,
)
All of these values returned as expected:
json_object = response.json()
result_items = json_object['data']['products']['items']
result_items[0]['skus']['items'][0]['prices'][0]['include_tax']['price_formatted']
result_items[0]['skus']['items'][0]['shades']
result_items[0]['skus']['items'][0]['media']['large'][0]['src']
result_items[0]['skus']['items'][0]['media']['large'][0]['alt']
result_items[0]['skus']['items'][0]['color_family'][0]['value']
I was able to access the shade names for a single product like so:
shade_list = []

def get_shade_names():
    items = result_items[0]['skus']['items']
    for item in items:
        shades = item['shades']
        for shade_data in shades:
            shade = shade_data['name']
            shade_list.append(shade)

get_shade_names()
print(shade_list)
but several attempts at implementing the nested loop for the list of lists have resulted in either a single flat list or a series of errors.
That is when I pivoted from DataFrame to json_normalize. However, I keep receiving key errors when trying to use record_path and meta.
Can someone show me how to proceed? I tried to go off of the examples in the pandas documentation, but nothing seems to be working. Any help would be greatly appreciated.
Create multiple dataframes
Your expected output is not entirely clear, but you can use something like:
# Extract base data from top level records
main_cols = ['product_id', 'display_name', 'description']
main_df = pd.json_normalize(result_items)[main_cols]
# Extract sub dataset
shade_df = pd.json_normalize(result_items, ['skus', 'items', 'shades'], 'product_id', record_prefix='shade.')
# Merge base and other sub dataset
df = main_df.merge(shade_df, on='product_id')
Output:
>>> df
product_id display_name description shade.name shade.description shade.hex_val
0 99908 Powder Kiss Velvet Blur Slim Stick Experience moisture-matte to the max with our ... Marrakesh-mere Intense orange brown #b0594d
1 99908 Powder Kiss Velvet Blur Slim Stick Experience moisture-matte to the max with our ... Sheer Outrage Grapefruit pink #ca5a5a
2 99908 Powder Kiss Velvet Blur Slim Stick Experience moisture-matte to the max with our ... Dubonnet Buzz Deep red wine #c95c54
3 99908 Powder Kiss Velvet Blur Slim Stick Experience moisture-matte to the max with our ... Mull It Over Dirty peach #a45f51
4 99908 Powder Kiss Velvet Blur Slim Stick Experience moisture-matte to the max with our ... Rose Mary Soft mauve #b96161
.. ... ... ... ... ... ...
329 19393 Lipmix / Satin Lipmix is to the makeup artist as tubes of pai... Satin #dedede
330 1625 Lip Erase M·A·C Pro Lip Erase is a professional product ... Pale N27 #e3bd92
331 19392 Lipmix / Gloss Lipmix is to the makeup artist as tubes of pai... Gloss #dfdbcb
332 82134 Lipstick / Frosted Firework A blast of five holiday-exclusive Lustre, Fros... Once Bitten, Ice Shy Sheer white w/ pearl #eae8df
333 52596 Lustre Lipstick M·A·C Lipstick – the iconic product that made ... Lady Bug Yellow tomato #b23532
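If you then want each product's shades gathered into lists, matching your desired layout, you can group the merged frame; a minimal sketch using the columns above:
# Collapse the per-shade rows into one row per product, gathering the
# shade names and hex values into lists.
lists_df = (
    df.groupby(['product_id', 'display_name', 'description'], as_index=False)
      .agg({'shade.name': list, 'shade.hex_val': list})
)
print(lists_df.head())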
An example using meta and record_path:
data = response.json()
df = pd.json_normalize(
    data=data["data"]["products"]["items"],
    meta="product_id",
    record_path=["skus", "items", "shades"]
)
Select a product:
shades = df.query("product_id.eq('99908')")["name"].to_list()
print(shades)
Output:
['Marrakesh-mere', 'Sheer Outrage', 'Dubonnet Buzz', 'Mull It Over', 'Rose Mary', 'Sweet Cinnamon', 'Devoted To Chili', 'Wild Rebel', 'Devoted To Danger', 'Love Clove', 'Ruby New', 'Gingerella', 'Stay Curious', 'Peppery Pink', 'All-Star Anise', 'Nice Spice', 'Spice World', 'Over The Taupe', 'Brickthrough', 'Nutmeg Ganache', 'Sorry Not Sorry', 'Pumpkin Spiced', 'Hot Paprika']
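You can also pass several meta fields, and if some items lack one of those keys (a common source of the KeyErrors you describe), errors="ignore" fills in NaN instead of raising; a sketch:
df = pd.json_normalize(
    data=data["data"]["products"]["items"],
    record_path=["skus", "items", "shades"],
    meta=["product_id", "display_name"],
    errors="ignore",  # missing meta keys become NaN instead of a KeyError
)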
Related
When describing SCP policies I get unformatted output, and I need to filter out, in Python, the active regions that I have enabled. Can anyone please guide me on how to achieve that?
scp = response.describe_policy(PolicyId='p-xxxxxx')
print(json.dumps(scp['Policy']['Content'], indent=4))
Output:
$ python test_lambda_copy.py
"{\n \"Version\": \"2012-10-17\",\n \"Statement\": [\n {\n \"Sid\": \"\",\n \"Effect\": \"Deny\",\n \"Action\": [\n \"zocalo:*\",\n \"workmail:*\",\n \"workdocs:*\",\n \"organizations:Update*\",\n \"organizations:Remove*\",\n \"organizations:Register*\",\n \"organizations:Move*\",\n \"organizations:Leave*\",\n \"organizations:Invite*\",\n \"organizations:Enable*\",\n \"organizations:Disable*\",\n \"organizations:Detach*\",\n \"organizations:Deregister*\",\n \"organizations:Delete*\",\n
\"organizations:Decline*\",\n \"organizations:Create*\",\n \"organizations:Cancel*\",\n \"organizations:Attach*\",\n \"lightsail:*\",\n \"kms:ScheduleKeyDeletion\",\n \"kms:DisableKeyRotation\",\n \"guardduty:StopMonitoringMembers\",\n \"guardduty:DisassociateMembers\",\n \"guardduty:DisassociateFromMasterAccount\",\n \"guardduty:DeleteThreatIntelSet\",\n \"guardduty:DeleteMembers\",\n \"guardduty:DeleteIPSet\",\n \"guardduty:DeleteDetector\",\n \"gamelift:*\",\n \"ec2:DeleteFlowLogs\",\n
\"discovery:*\",\n \"cloudtrail:UpdateTrail\",\n \"cloudtrail:StopLogging\",\n \"cloudtrail:DeleteTrail\",\n \"cloudshell:*\",\n \"chime:*\"\n ],\n \"Resource\": \"*\",\n \"Condition\": {\n \"StringNotEquals\": {\n \"aws:RequestedRegion\": [\n \"us-west-2\",\n \"us-west-1\",\n \"us-east-1\",\n
\"eu-west-1\",\n \"eu-central-1\",\n \"ca-central-1\",\n \"ap-southeast-2\"\n ]\n }\n }\n }\n ]\n}"
From the above output I am only interested in the regions, as shown below. But it looks like this isn't JSON-style data, more like plain text. Any suggestions?
"Condition": {\n'
' "StringNotEquals": {\n'
' "aws:RequestedRegion": [\n'
' "us-west-2",\n'
' "us-west-1",\n'
' "us-east-1",\n'
' "eu-west-1",\n'
' "eu-central-1",\n'
' "ca-central-1",\n'
' "ap-southeast-2"\n'
' ]\n'
' }\n'
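Since scp['Policy']['Content'] is a JSON document stored as a string (which is why it prints as escaped text), a likely fix is to decode it with json.loads before drilling in; a sketch, assuming the structure shown in your output:
import json

# Content is JSON encoded as a string, so parse it before indexing into it.
policy = json.loads(scp['Policy']['Content'])
regions = policy['Statement'][0]['Condition']['StringNotEquals']['aws:RequestedRegion']
print(json.dumps(regions, indent=4))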
I am trying to web scrape table data from Dune.com (https://dune.com/queries/1144723). When I 'inspect' the web page, I can clearly see the <table></table> element, but when I run the following code the table comes back as None.
import bs4
import requests
data = []
r=requests.get('https://dune.com/queries/1144723/1954237')
soup=bs4.BeautifulSoup(r.text, "html5lib")
table = soup.find('table')
How can I successfully find this table data?
The page uses JavaScript to load the data. This example uses their API endpoint to load the data into a dataframe:
import requests
import pandas as pd
from bs4 import BeautifulSoup
api_url = "https://app-api.dune.com/v1/graphql"
payload = {
"operationName": "GetExecution",
"query": "query GetExecution($execution_id: String!, $query_id: Int!, $parameters: [Parameter!]!) {\n get_execution(\n execution_id: $execution_id\n query_id: $query_id\n parameters: $parameters\n ) {\n execution_queued {\n execution_id\n execution_user_id\n position\n execution_type\n created_at\n __typename\n }\n execution_running {\n execution_id\n execution_user_id\n execution_type\n started_at\n created_at\n __typename\n }\n execution_succeeded {\n execution_id\n runtime_seconds\n generated_at\n columns\n data\n __typename\n }\n execution_failed {\n execution_id\n type\n message\n metadata {\n line\n column\n hint\n __typename\n }\n runtime_seconds\n generated_at\n __typename\n }\n __typename\n }\n}\n",
"variables": {
"execution_id": "01GN7GTHF62FY5DYYSQ5MSEG2H",
"parameters": [],
"query_id": 1144723,
},
}
data = requests.post(api_url, json=payload).json()
df = pd.DataFrame(data["data"]["get_execution"]["execution_succeeded"]["data"])
df["total_pnl"] = df["total_pnl"].astype(str)
df[["account", "link"]] = df.apply(
func=lambda x: (
(s := BeautifulSoup(x["account"], "html.parser")).text,
s.a["href"],
),
result_type="expand",
axis=1,
)
print(df.head(10)) # <-- print sample data
Prints:
account last_traded rankings total_pnl traded_since link
0 0xff33f5653e547a0b54b86b35a45e8b1c9abd1c46 2022-02-01T13:57:01Z 🥇 #1 1591196.831211874 2021-11-20T18:04:19Z https://www.gmx.house/arbitrum/account/0xff33f5653e547a0b54b86b35a45e8b1c9abd1c46
1 0xcb696fd8e239dd68337c70f542c2e38686849e90 2022-11-23T18:26:04Z 🥈 #2 1367359.0616298981 2022-10-26T06:45:14Z https://www.gmx.house/arbitrum/account/0xcb696fd8e239dd68337c70f542c2e38686849e90
2 190416.eth 2022-12-20T20:30:09Z 🥉 #3 864694.6695150969 2022-09-06T03:07:03Z https://www.gmx.house/arbitrum/account/0xa688bc5e676325cc5fc891ac48fe442f6298a432
3 0x1729f93e3c3c74b503b8130516984ced70bf47d9 2021-09-24T07:30:51Z #4 801075.4878765604 2021-09-22T00:16:43Z https://www.gmx.house/arbitrum/account/0x1729f93e3c3c74b503b8130516984ced70bf47d9
4 0x83b13abab6ec323fff3af6d18a8fd1646ea39477 2022-12-12T21:36:25Z #5 682459.02019836 2022-04-18T14:19:56Z https://www.gmx.house/arbitrum/account/0x83b13abab6ec323fff3af6d18a8fd1646ea39477
5 0x9fc3b6191927b044ef709addd163b15c933ee205 2022-12-03T00:05:33Z #6 652673.6605261166 2022-11-02T18:26:18Z https://www.gmx.house/arbitrum/account/0x9fc3b6191927b044ef709addd163b15c933ee205
6 0xe8c19db00287e3536075114b2576c70773e039bd 2022-12-23T08:59:38Z #7 644020.503240131 2022-10-06T07:20:44Z https://www.gmx.house/arbitrum/account/0xe8c19db00287e3536075114b2576c70773e039bd
7 0x75a34444581f563680003f2ba05ea0c890a10934 2022-11-10T18:08:50Z #8 639684.0495719836 2022-03-06T23:20:41Z https://www.gmx.house/arbitrum/account/0x75a34444581f563680003f2ba05ea0c890a10934
8 omarazhar.eth 2022-09-16T00:27:22Z #9 536522.3114796011 2022-04-11T20:44:42Z https://www.gmx.house/arbitrum/account/0x204495da23507be4e1281c32fb1b82d9d4289826
9 0x023cb9f0662c6612e830b37a82f41125a4c117e1 2022-09-06T01:10:28Z #10 496922.9880152336 2022-04-12T22:31:47Z https://www.gmx.house/arbitrum/account/0x023cb9f0662c6612e830b37a82f41125a4c117e1
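The raw account cells in that payload are HTML anchors, which is why the code above parses them with BeautifulSoup. A minimal illustration with a made-up cell value:
from bs4 import BeautifulSoup

cell = '<a href="https://www.gmx.house/arbitrum/account/0xff33...">0xff33...</a>'  # made-up example value
s = BeautifulSoup(cell, "html.parser")
print(s.text, s.a["href"])  # display text plus the underlying link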
I'm creating a list called id by scraping from the URL below, but the payload I use (which has worked before) no longer gets a usable response: data = requests.post(url, json=payload).json() raises JSONDecodeError: Expecting value: line 1 column 1 (char 0).
I'm not familiar with this issue: it could be an IP block on my side (from passing some limit, since I've scraped this website many times before) or an expired payload (although it hasn't changed in the developer tools).
I'm not sure what is going on; could someone provide some context, or suggest mitigations, so I can check for these possibilities?
import requests

# Accessing data from external URL
url = "https://www.printables.com/graphql/"
# Payload
payload = {
"operationName": "PrintList",
"query": "query PrintList($limit: Int!, $cursor: String, $categoryId: ID, $materialIds: [Int], $userId: ID, $printerIds: [Int], $licenses: [ID], $ordering: String, $hasModel: Boolean, $filesType: [FilterPrintFilesTypeEnum], $includeUserGcodes: Boolean, $nozzleDiameters: [Float], $weight: IntervalObject, $printDuration: IntervalObject, $publishedDateLimitDays: Int, $featured: Boolean, $featuredNow: Boolean, $usedMaterial: IntervalObject, $hasMake: Boolean, $competitionAwarded: Boolean, $onlyFollowing: Boolean, $collectedByMe: Boolean, $madeByMe: Boolean, $likedByMe: Boolean) {\n morePrints(\n limit: $limit\n cursor: $cursor\n categoryId: $categoryId\n materialIds: $materialIds\n printerIds: $printerIds\n licenses: $licenses\n userId: $userId\n ordering: $ordering\n hasModel: $hasModel\n filesType: $filesType\n nozzleDiameters: $nozzleDiameters\n includeUserGcodes: $includeUserGcodes\n weight: $weight\n printDuration: $printDuration\n publishedDateLimitDays: $publishedDateLimitDays\n featured: $featured\n featuredNow: $featuredNow\n usedMaterial: $usedMaterial\n hasMake: $hasMake\n onlyFollowing: $onlyFollowing\n competitionAwarded: $competitionAwarded\n collectedByMe: $collectedByMe\n madeByMe: $madeByMe\n liked: $likedByMe\n ) {\n cursor\n items {\n ...PrintListFragment\n printer {\n id\n __typename\n }\n user {\n rating\n __typename\n }\n __typename\n }\n __typename\n }\n}\n\nfragment PrintListFragment on PrintType {\n id\n name\n slug\n ratingAvg\n likesCount\n liked\n datePublished\n dateFeatured\n firstPublish\n userGcodeCount\n downloadCount\n category {\n id\n path {\n id\n name\n __typename\n }\n __typename\n }\n modified\n images {\n ...ImageSimpleFragment\n __typename\n }\n filesType\n hasModel\n nsfw\n user {\n ...AvatarUserFragment\n __typename\n }\n ...LatestCompetitionResult\n __typename\n}\n\nfragment AvatarUserFragment on UserType {\n id\n publicUsername\n avatarFilePath\n slug\n badgesProfileLevel {\n profileLevel\n __typename\n }\n __typename\n}\n\nfragment LatestCompetitionResult on PrintType {\n latestCompetitionResult {\n placement\n competitionId\n __typename\n }\n __typename\n}\n\nfragment ImageSimpleFragment on PrintImageType {\n id\n filePath\n rotation\n __typename\n}\n",
"variables": {
"categoryId": None,
"collectedByMe": False,
"competitionAwarded": False,
"cursor": None,
"featured": False,
"filesType": ["GCODE"],
"hasMake": False,
"includeUserGcodes": True,
"likedByMe": False,
"limit": 36,
"madeByMe": False,
"materialIds": None,
"nozzleDiameters": None,
"ordering": "-likes_count_7_days",
"printDuration": None,
"printerIds": None,
"publishedDateLimitDays": None,
"weight": None,
},
}
cnt = 0
id = []
while True:
    data = requests.post(url, json=payload).json()
    # Print all data
    # print(json.dumps(data, indent=4))
    for i in data["data"]["morePrints"]["items"]:
        cnt += 1
        id.append(i["id"])
    if not data["data"]["morePrints"]["cursor"]:
        break
    payload["variables"]["cursor"] = data["data"]["morePrints"]["cursor"]
ID = [int(x) for x in id]
The error just means that the response wasn't valid JSON. You can check for that by splitting up the data = requests.post(url, json=payload).json() line into something like:
res = requests.post(url, json=payload)
# res.raise_for_status()  # just in case
if 'application/json' in res.headers.get('Content-Type', ''):  # .get() tolerates a missing header
    data = res.json()
else:
    print('unexpected format')
    break  # or however else you want to handle it
(If I run your code with this change and then explore res.content, it's just HTML with text that boils down to "Printables This site requires JavaScript enabled".)
As for why the response isn't json, are you sure you're using the right url? I get json responses if I send the same requests to "https://api.printables.com/graphql/" (instead of "https://www.printables.com/graphql/")...
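Putting the two suggestions together, here is a sketch of the request with the API host and the content-type guard (the host swap is based on my test above, so verify it still returns the data you expect):
url = "https://api.printables.com/graphql/"  # API host instead of the www host

res = requests.post(url, json=payload)
if 'application/json' in res.headers.get('Content-Type', ''):
    data = res.json()
else:
    raise RuntimeError(f'unexpected response format: {res.status_code}')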
I have been using this code to extract (scrape) stock prices from Yahoo Finance for the last year, but now it produces an error. Does anyone know why this is happening and how to fix it?
# Importing necessary packages
from pandas_datareader import data as web
import datetime as dt
import matplotlib.pyplot as plt
import pandas as pd
import os
import numpy as np
# Stock selection from Yahoo Finance
stock = input("Enter stock symbol or ticker symbol (e.g. General Electric is 'GE'): ")
# Visualizing the stock over time and setting up the dataframe
start_date = (dt.datetime.now() - dt.timedelta(days=40000)).strftime("%m-%d-%Y")
df = web.DataReader(stock, data_source='yahoo', start=start_date)
#THE ERROR IS ON THIS LINE^
plt.plot(df['Close'])
plt.title('Stock Prices Over Time',fontsize=14)
plt.xlabel('Date',fontsize=14)
plt.ylabel('Mid Price',fontsize=14)
plt.show()
RemoteDataError: Unable to read URL: https://finance.yahoo.com/quote/MCD/history?period1=-1830801600&period2=1625284799&interval=1d&frequency=1d&filter=history
Response Text:
b'\n \n \n \n Yahoo\n \n \n \n html {\n height: 100%;\n }\n body {\n background: #fafafc url(https://s.yimg.com/nn/img/sad-panda-201402200631.png) 50% 50%;\n background-size: cover;\n height: 100%;\n text-align: center;\n font: 300 18px "helvetica neue", helvetica, verdana, tahoma, arial, sans-serif;\n }\n table {\n height: 100%;\n width: 100%;\n table-layout: fixed;\n border-collapse: collapse;\n border-spacing: 0;\n border: none;\n }\n h1 {\n font-size: 42px;\n font-weight: 400;\n color: #400090;\n }\n p {\n color: #1A1A1A;\n }\n #message-1 {\n font-weight: bold;\n margin: 0;\n }\n #message-2 {\n display: inline-block;\n *display: inline;\n zoom: 1;\n max-width: 17em;\n _width: 17em;\n }\n \n \n document.write('&test=\'+encodeURIComponent(\'%\')+\'" width="0px" height="0px"/>');var beacon = new Image();beacon.src="//bcn.fp.yahoo.com/p?s=1197757129&t="+ne...
I use this code to extract data from yahoo:
start = pd.to_datetime(['2007-01-01']).astype(int)[0]//10**9 # convert to unix timestamp.
end = pd.to_datetime(['2020-12-31']).astype(int)[0]//10**9 # convert to unix timestamp.
url = 'https://query1.finance.yahoo.com/v7/finance/download/' + stock_ticker + '?period1=' + str(start) + '&period2=' + str(end) + '&interval=1d&events=history'
df = pd.read_csv(url)
I had the same problem. At some recent point pdr stopped working with Yahoo (again). AlphaVantage doesn't carry all the stocks that Yahoo does; googlefinance package only gets current quotes as far as I can tell, not time series; the yahoo-finance package doesn't work (or I failed to get it to work); Econdb sends back some kind of weirdly-formed dataframe (maybe this is fixable); and Quandl has a paywall on non-US stocks.
So because I'm cheap, I looked into the Yahoo CSV download functionality and came up with this, which returns a df pretty much like pdr does:
import pandas as pd
from datetime import datetime as dt
import calendar
import io
import requests
# Yahoo history csv base url
yBase = 'https://query1.finance.yahoo.com/v7/finance/download/'
yHeaders = {
    'Accept': 'text/csv;charset=utf-8'
}

def getYahooDf(ticker, startDate, endDate=None):  # dates in ISO format
    start = dt.fromisoformat(startDate)  # To datetime.datetime object
    fromDate = calendar.timegm(start.utctimetuple())  # To the Unix timestamp format used by Yahoo
    if endDate is None:
        end = dt.now()
    else:
        end = dt.fromisoformat(endDate)
    toDate = calendar.timegm(end.utctimetuple())
    params = {
        'period1': str(fromDate),
        'period2': str(toDate),
        'interval': '1d',
        'events': 'history',
        'includeAdjustedClose': 'true'
    }
    response = requests.request("GET", yBase + ticker, headers=yHeaders, params=params)
    if response.status_code < 200 or response.status_code > 299:
        return None
    else:
        csv = io.StringIO(response.text)
        df = pd.read_csv(csv, index_col='Date')
        return df
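Example usage (the ticker and dates here are just illustrative):
df = getYahooDf('MCD', '2020-01-01', '2020-12-31')
if df is not None:
    print(df.head())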
It also works if you provide headers to a session object which you then pass to the data reader (e.g. for caching purposes):
import requests_cache
from datetime import timedelta

# expire_after was undefined in the original snippet; a timedelta is one
# accepted value (assumed here)
session = requests_cache.CachedSession(cache_name='cache', backend='sqlite', expire_after=timedelta(hours=1))
# just add headers to your session and provide it to the reader
session.headers = { 'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0', 'Accept': 'application/json;charset=utf-8' }
data = web.DataReader(stock_names, 'yahoo', start, end, session=session)
pip install yfinance
import yfinance as yf

TWTR = yf.Ticker('TWTR')
ticker = TWTR.history(period='1y')[['Open', 'High', 'Low', 'Close', 'Volume']]  # returns a DataFrame
If you are using Google Colab, first upgrade the libraries:
!pip install --upgrade pandas-datareader
!pip install --upgrade pandas
Hope it works! :)
Don't forget to restart the runtime and re-run.
I'm trying to do a GraphQL request that contains some variables using the apollo-boost client on a Flask + Flask-GraphQL + Graphene server.
let data = await client.query({
  query: gql`{
    addItemToReceipt(receiptId: $receiptId, categoryId: $categoryId, value: $value, count: $count) {
      item {
        id
        category { id }
        value
        count
      }
    }
  }`,
  variables: {
    receiptId: this.id,
    categoryId: categoryId,
    value: value,
    count: mult
  }
})
But I get "Variable X is not defined" errors.
[GraphQL error]: Message: Variable "$receiptId" is not defined., Location: [object Object],[object Object], Path: undefined
[GraphQL error]: Message: Variable "$categoryId" is not defined., Location: [object Object],[object Object], Path: undefined
[GraphQL error]: Message: Variable "$value" is not defined., Location: [object Object],[object Object], Path: undefined
[GraphQL error]: Message: Variable "$count" is not defined., Location: [object Object],[object Object], Path: undefined
[Network error]: Error: Response not successful: Received status code 400
I've added some debug prints to graphql_server/__init__.py:
# graphql_server/__init__.py:62
all_params = [get_graphql_params(entry, extra_data) for entry in data]
print(len(all_params))
print(all_params[0])
# ...
But from the output I get, everything seems to be OK. The graphql_server.run_http_query() does receive all the variables.
GraphQLParams(query='{\n addItemToReceipt(receiptId: $receiptId, categoryId: $categoryId, value: $value, count: $count) {\n item {\n id\n category {\n id\n __typename\n }\n value\n count\n __typename\n }\n __typename\n }\n}\n', variables={'receiptId': '13', 'categoryId': 'gt', 'value': 0, 'count': 0}, operation_name=None)
What am I doing wrong?
I think this might be a bug in the library itself; if you could report it as an issue with an easy way to reproduce it, that would be great!
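One thing worth checking before filing an issue (an assumption on my part, since the thread doesn't confirm it): GraphQL only allows variables that are declared in the operation signature, and the anonymous { ... } shorthand declares none, which matches the "Variable is not defined" wording. A sketch with declared variables (the types ID!/Float!/Int! are guesses at your schema):
let data = await client.query({
  query: gql`
    query AddItemToReceipt($receiptId: ID!, $categoryId: ID!, $value: Float!, $count: Int!) {
      addItemToReceipt(receiptId: $receiptId, categoryId: $categoryId, value: $value, count: $count) {
        item {
          id
          category { id }
          value
          count
        }
      }
    }
  `,
  variables: {
    receiptId: this.id,
    categoryId: categoryId,
    value: value,
    count: mult
  }
})
And if addItemToReceipt is a mutation in your Graphene schema, use the mutation keyword and client.mutate instead of client.query.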