I need a product's stock quantity (the number of units in stock). Is this possible with the SP API, and if so, how can I get it?
Note: I can get it by SKU with the code below, but the product I need is not listed by my seller account.
from sp_api.api import Inventories
from sp_api.base import Marketplaces

quantity = Inventories(credentials=credentials, marketplace=Marketplaces.FR).get_inventory_summary_marketplace(**{
    "details": False,
    "marketplaceIds": ["A13V1IB3VIYZZH"],
    "sellerSkus": ["MY_SKU_1", "MY_SKU_2"]
})
print(quantity)
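For reference, with the python-amazon-sp-api client the FBA Inventory response is exposed on the response's payload attribute. The snippet below is only a hedged sketch of reading per-SKU totals; the field names follow the FBA Inventory API documentation, so verify them against your actual response and library version:

# Hedged sketch: read per-SKU totals from the FBA Inventory response payload
for summary in quantity.payload.get("inventorySummaries", []):
    print(summary.get("sellerSku"), summary.get("totalQuantity"))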
I have an issue with data being hidden. When I print the extracted data as text, all of it is shown properly. The code below prints the extracted data, and its output follows.
import os
import ocrmypdf
import pdfplumber

path = "G:\\SKM.pdf"
# Both of the next two lines OCR the PDF into output.pdf (the library call alone is enough)
os.system(f'ocrmypdf {path} output.pdf')
ocrmypdf.ocr(path, "output.pdf")

invoice = pdfplumber.open("output.pdf")
count_pages = len(invoice.pages)
page = invoice.pages[count_pages - 1]  # last page
text = page.extract_text(x_tolerance=2)
print(text)
Output:
Order Number : 202100050 Order Date : 25.11.2021
Client Number : 145 Delivery Date : Pending
Currency : Euro Contact Perso: Martin
Payment Condition : Due Email : martin#def.com
When I convert the text to a DataFrame and print it, some data, such as the order date, delivery date and email address, is partially hidden. The output is given below.
import pandas as pd

ds = pd.DataFrame(text.split('\n'))
print(ds)
Output:
1 Order Number : 202100050 Order Date : ...
2 Client Number : 145 Delivery Date : Pen...
3 Currency : Euro Contact Perso: Martin
4 Payment Condition : Due Email : martin#d...
What is the reason, and how can I solve this issue?
Try using a pandas printing formatter, such as tabulate, which you must first install with pip install tabulate; then you can use it to print the DataFrame formatted:
ds = pd.DataFrame(text.split('\n'))
print(ds.to_markdown())
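As for the reason: nothing is actually hidden in the DataFrame itself; pandas simply abbreviates cell values longer than display.max_colwidth (50 characters by default) with "..." in its printed repr. If you prefer a plain print() over to_markdown(), a minimal sketch of relaxing that limit:

import pandas as pd

# Cells wider than display.max_colwidth (50 by default) are shown with a trailing "..."
pd.set_option('display.max_colwidth', None)

ds = pd.DataFrame(text.split('\n'))
print(ds)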
I'm learning to collect data from REST APIs to generate custom reports.
For example, one of the APIs I'm dealing with is for the POS application MobileBytes. For this API my goal is to model daily sales, grouped by room and summarized by category.
A Daily Sales Report uses the debit-credit model:
Account                   DR    CR
Credit Cards receivable   DR
Cash receivable           DR
Bar liquor                      CR
Bar wine                        CR
Bar beer                        CR
Bar food                        CR
Patio Food                      CR
Patio liquor                    CR
Patio wine                      CR
Patio beer                      CR
Sales Tax                       CR
Tips                            CR
As this example sales report shows, the API's "rooms" represent the different physical areas in the establishment (bar, patio); other rooms could include dining, banquet, take-out, hrubgub, eberuats. Report categories represent the common categories of sales used in food-service accounting (food, liquor, wine, beer); other categories could include dessert and retail. The results of each API endpoint are a variable-length array of features and their totals. The constraint of the debit-credit model is that total debits equal total credits (just like any purchase receipt: the total of your purchases plus tax equals what you paid).
And in case you thought it was a simple job of querying each endpoint and collecting each table for the final report: no. Each record item is labelled with an id that points to the Setup API, where the string label for each record item is found. Putting it all together as {"bar": {"food": $x_1, "liquor": $x_2, "wine": $x_3, "beer": $x_4}, ...} means calling at least 3-4 different endpoints: two for each subsection of the sales report (i.e. rooms and categories), plus one or more for their labels in the setup and menu endpoints.
How could I manage, organize and combine all of these different API calls?
I'm leaning towards using Pandas DataFrames as described in this example: Query API’s with Json Output in Python (Medium article)
There appears to be no actual ID for you to join against in the data.
You have a list of "report categories" and a list of "rooms", each with their own IDs and, for example, their own quantities.
It's also not clear what each object represents. Days? If so, create a simple loop over each day from start to end, then parse each object.
import requests
from datetime import date, timedelta

start = date(2022, 1, 21)
startDate = start.strftime("%Y-%m-%d")
end = date(2022, 1, 22)
endDate = end.strftime("%Y-%m-%d")

# TODO: You need to add API keys in here
sales_api = 'https://api.mobilebytes.com/v2/reports/sales'
categories = requests.get(f'{sales_api}/reportCategories/{startDate}/{endDate}')
rooms = requests.get(f'{sales_api}/rooms/{startDate}/{endDate}')

output = []  # to build your output for a dataframe
if categories.status_code // 100 == 2 and rooms.status_code // 100 == 2:
    # These lists should be the same size, so you can zip them
    c = categories.json()['application/json']
    r = rooms.json()['application/json']
    d = start
    for category, room in zip(c, r):
        print(d, category, room)  # For debugging
        # TODO: parse both objects and populate your list above
        output.append({
            'date': d.strftime("%Y-%m-%d"),
            'category_id': category['report_category_id'],
            # TODO: parse more category values
            'room_id': room['room_id']
            # TODO: parse more room values
        })
        d += timedelta(days=1)
else:
    raise RuntimeError('Unable to connect')  # `Error` is not a built-in exception
From a list of dictionaries, it is easy to create a dataframe
import pandas as pd
df = pd.DataFrame(output)
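From there, one way to assemble the {"room": {"category": total}} layout the question describes is a pivot over flat rows. This is only a sketch with invented column names and numbers, not the API's actual fields:

import pandas as pd

# Hypothetical flat rows, one per (room, category) pair -- names and values invented for illustration
rows = [
    {'room': 'bar', 'category': 'food', 'total': 120.0},
    {'room': 'bar', 'category': 'liquor', 'total': 80.0},
    {'room': 'patio', 'category': 'food', 'total': 95.0},
]
df = pd.DataFrame(rows)
report = df.pivot_table(index='room', columns='category', values='total', aggfunc='sum')
print(report)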
Regarding ORM, you can use swagger-codegen to create Python classes that represent the documented response bodies.
I am trying to pull search results data from an API on a website and put it into a pandas dataframe. I've been able to successfully pull the info from the API into a JSON format.
The next step I'm stuck on is how to loop through the search results on a particular page and then again for each page of results.
Here is what I've tried so far:
#Step 1: Connect to an API
import requests
import json
response_API = requests.get('https://www.federalregister.gov/api/v1/documents.json?conditions%5Bpublication_date%5D%5Bgte%5D=09%2F01%2F2021&conditions%5Bterm%5D=economy&order=relevant&page=1')
# response_API.status_code is 200
#Step 2: Get the data from API
data = response_API.text
#Step 3: Parse the data into JSON format
parse_json = json.loads(data)
#Step 4: Extract data
title = parse_json['results'][0]['title']
pub_date = parse_json['results'][0]['publication_date']
agency = parse_json['results'][0]['agencies'][0]['name']
Here is where I've tried to put this all into a loop:
import numpy as np
import pandas as pd

df = []
for page in np.arange(0, 7):
    url = 'https://www.federalregister.gov/api/v1/documents.json?conditions%5Bpublication_date%5D%5Bgte%5D=09%2F01%2F2021&conditions%5Bterm%5D=economy&order=relevant&page={page}'.format(page=page)
    response_API = requests.get(url)
    print(response_API.status_code)
    data = response_API.text
    parse_json = json.loads(data)
    for i in parse_json:
        title = parse_json['results'][i]['title']
        pub_date = parse_json['results'][i]['publication_date']
        agency = parse_json['results'][i]['agencies'][0]['name']
        df.append([title, pub_date, agency])

cols = ["Title", "Date", "Agency"]
df = pd.DataFrame(df, columns=cols)
I feel like I'm close to the correct answer, but I'm not sure how to move forward from here. I need to iterate through the results where I placed the i's when parsing the JSON data, but I get an error that reads "TypeError: list indices must be integers or slices, not str". I understand I can't put the i's in those spots, but how else am I supposed to iterate through the results?
Any help would be appreciated!
Thank you!
I think you are very close!
import numpy as np
import pandas as pd
import requests

BASE_URL = "https://www.federalregister.gov/api/v1/documents.json?conditions%5Bpublication_date%5D%5Bgte%5D=09%2F01%2F2021&conditions%5Bterm%5D=economy&order=relevant&page={page}"

results = []
for page in range(0, 7):
    response = requests.get(BASE_URL.format(page=page))
    if response.ok:
        resp_json = response.json()
        for res in resp_json["results"]:
            results.append(
                [
                    res["title"],
                    res["publication_date"],
                    [agency["name"] for agency in res["agencies"]]
                ]
            )

df = pd.DataFrame(results, columns=["Title", "Date", "Agencies"])
In this block of code, I used the requests library's built-in .json() method, which automatically converts a response's text to a JSON dict (if it's in the proper format).
The if response.ok is a slightly less verbose way, provided by requests, to check that the status code is < 400, and it prevents errors that might occur when attempting to parse the response if there was a problem with the HTTP call.
Finally, I'm not sure exactly what data you need for your DataFrame, but each object in the "results" list pulled from that website has "agencies" as a list of agencies. I wasn't sure if you wanted to drop all that data, so I kept the agency names as a list.
*Edit:
In case the response objects don't contain the proper keys, we can use the .get() method of Python dictionaries.
# ...snip
for res in resp_json["results"]:
    results.append(
        [
            res.get("title"),  # returns None as a default instead of raising a KeyError
            res.get("publication_date"),
            [
                # Get 'name', falling back to 'raw_name' if the 'name' key doesn't exist
                agency.get("name", agency.get("raw_name"))
                for agency in res.get("agencies", [])
            ]
        ]
    )
Slightly different approach: rather than iterating through the response, read it into a DataFrame and then save what you need. This saves the first agency name in the list.
df_list = []
for page in np.arange(0, 7):
    url = 'https://www.federalregister.gov/api/v1/documents.json?conditions%5Bpublication_date%5D%5Bgte%5D=09%2F01%2F2021&conditions%5Bterm%5D=economy&order=relevant&page={page}'.format(page=page)
    response_API = requests.get(url)
    # print(response_API.status_code)
    data = response_API.text
    parse_json = json.loads(data)
    df = pd.json_normalize(parse_json['results'])
    df['Agency'] = df['agencies'][0][0]['raw_name']
    df_list.append(df[['title', 'publication_date', 'Agency']])

df_final = pd.concat(df_list)
df_final
title publication_date Agency
0 Determination of the Promotion of Economy and ... 2021-09-28 OFFICE OF MANAGEMENT AND BUDGET
1 Corporate Average Fuel Economy Standards for M... 2021-09-03 OFFICE OF MANAGEMENT AND BUDGET
2 Public Hearing for Corporate Average Fuel Econ... 2021-09-14 OFFICE OF MANAGEMENT AND BUDGET
3 Investigation of Urea Ammonium Nitrate Solutio... 2021-09-08 OFFICE OF MANAGEMENT AND BUDGET
4 Call for Nominations To Serve on the National ... 2021-09-08 OFFICE OF MANAGEMENT AND BUDGET
.. ... ... ...
15 Energy Conservation Program: Test Procedure fo... 2021-09-14 DEPARTMENT OF COMMERCE
16 Self-Regulatory Organizations; The Nasdaq Stoc... 2021-09-09 DEPARTMENT OF COMMERCE
17 Regulations To Improve Administration and Enfo... 2021-09-20 DEPARTMENT OF COMMERCE
18 Towing Vessel Firefighting Training 2021-09-01 DEPARTMENT OF COMMERCE
19 Patient Protection and Affordable Care Act; Up... 2021-09-27 DEPARTMENT OF COMMERCE
[140 rows x 3 columns]
I have a stock collection of this form:
{
    _id: ObjectId("5e132f29009502d4e85e1293"),
    Product: ObjectId("5e132f29009502c4e97e8796"),
    Stock: [
        {
            Qty: 50,
            Expiration Date: 2022-05-01T00:00:00.000+00:00
        }
    ]
}
This collection contains the current stock for each product. There are about 5000 entries.
Now I have to assess the stock on a given date. For this I use a simple formula:
stock = actual_stock + total_output - total_input
where the outputs and inputs are those recorded from the given date until now. For example, if the current stock is 40 and, since that date, 10 units went out and 30 units came in, the stock on that date was 40 + 10 - 30 = 20.
I have a collection for product inputs (arrival collection) and another for output operations (requisition collection):
arrival collection:
{
    _id: ObjectId("5e26eed55c0e07995d9f2cd0"),
    Order Number: 200049,
    Reception: [
        {Product: ObjectId("5e132f3e009502d4e85e2af4"), Qty: 10, Expiration Date: 2022-05-01T00:00:00.000+00:00}
    ],
    Date: 2020-01-21T13:30:13.529+00:00
}
requisition collection:
{
    _id: ObjectId("5e26eed55c0e07995d9f2cd0"),
    Requisition Number: 200049,
    Products: [
        {Product: ObjectId("5e132f3e009502d4e85e2af4"), Qty: 10, Expiration Date: 2022-05-01T00:00:00.000+00:00}
    ],
    Date: 2020-01-21T13:30:13.529+00:00
}
There is obviously other information in these documents, this is just an extract to show their composition.
Now here is the python code:
# imports ...
stock_db = mongo.db.Stock
arrival_db = mongo.db.Arrival
requisition_db = mongo.db.Requisitions


def check_arrival_product(product, date):
    check_arrival = arrival_db.aggregate([
        {'$unwind': '$Reception'},
        {'$match': {
            'Reception.Product': ObjectId(product),
            '$and': [
                {'Reception.Date': {'$gte': date}}
            ]
        }}
    ])
    qty = 0
    for i in check_arrival:
        qty += i['Reception'].get('Qty')
    return qty


def check_requisition_product(product, date):
    check_requisition = requisition_db.aggregate([
        {'$unwind': '$Products'},
        {'$match': {
            'Products.Product': ObjectId(product),
            '$and': [
                {'Date': {'$gte': date}}
            ]
        }}
    ])
    qty = 0
    for i in check_requisition:
        qty += i['Products']['Qty']
    return qty


def main(date):
    # ....
    check_stock = stock_db.find()
    check_stock.batch_size(1000)
    for i in check_stock:
        stock = 0
        for j in i['Stock']:
            stock += j['Qty']
        total_arrival = check_arrival_product(i['Product'], date)
        total_requisition = check_requisition_product(i['Product'], date)
        stock = stock + total_requisition - total_arrival
        # ....
As you can see in the main function, I iterate over 5000 products, and for each one I have to evaluate the stock that came in and went out since the given date in order to calculate the stock on that date.
The major problem is that the operation takes up to 4 minutes, which is much too long.
P.S: The database is on the same computer.
So how can I optimize this kind of operation?
My first idea would be to de-normalize the data, i.e. create a new collection, e.g. 'transactions', add an index on productId and date, and add all transactions to it, both requisitions and arrivals. You can do that at runtime every time a new transaction arrives, or as a batch job using two aggregation pipelines with a $out/$merge stage.
For the batch jobs, it should be something like this:
transaction_db.createIndex({
    "productId": 1,
    "date": 1
})

requisition_db.aggregate([
    {'$unwind': '$Products'},
    // TODO: map productId, date, delta = -Qty
    {'$out': 'transaction_db'}
])

arrival_db.aggregate([
    {'$unwind': '$Reception'},   // arrivals nest their line items under "Reception"
    // TODO: map productId, date, delta = +Qty
    {'$merge': {into: 'transaction_db'}}
])
On this new collection, computing the inventory movement per productId would be a single aggregation pipeline using a $group stage.
transaction_db.aggregate([{
    $group: {
        _id: {productId: "$productId", date: "$date"},
        deltaPerDay: {$sum: "$delta"}
    }
}])
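In pymongo terms, once such a 'transactions' collection exists (the collection and field names here are the illustrative ones from above, not something the application already has), the net movement per product since a given date could be computed in a single query, roughly like this:

def movements_since(transactions, date):
    # Sum the signed deltas per product for every transaction on or after `date`
    pipeline = [
        {'$match': {'date': {'$gte': date}}},
        {'$group': {'_id': '$productId', 'delta': {'$sum': '$delta'}}},
    ]
    return {doc['_id']: doc['delta'] for doc in transactions.aggregate(pipeline)}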
Another idea would be to take a look at the $lookup stage to join from products to requisitions or arrivals. But for that you need to unwind them first to get individual product transaction, and I'm not sure how to do that.
If you only have 5000 products, you might be quicker keeping them all in memory and calculating the deltas on the Python side (see the sketch below).
First do one find() for the current stock and keep it all in a dictionary indexed by product id.
Then read all requisitions with a single find() and update the stock in memory.
Then read all arrivals with a single find() and update the stock in memory.
If you have enough memory, this is a much easier implementation.
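A minimal sketch of that in-memory approach, assuming the field names from the question and the same db handles (stock_db, requisition_db, arrival_db) defined earlier:

from collections import defaultdict

def stock_at_date(date):
    # Start from the current stock per product
    stock = defaultdict(int)
    for doc in stock_db.find({}, {'Product': 1, 'Stock.Qty': 1}):
        stock[doc['Product']] = sum(item['Qty'] for item in doc['Stock'])

    # Add back what went out since `date` (requisitions)
    for doc in requisition_db.find({'Date': {'$gte': date}}, {'Products': 1}):
        for item in doc['Products']:
            stock[item['Product']] += item['Qty']

    # Subtract what came in since `date` (arrivals)
    for doc in arrival_db.find({'Date': {'$gte': date}}, {'Reception': 1}):
        for item in doc['Reception']:
            stock[item['Product']] -= item['Qty']

    return stock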
I'm trying to scrape Morningstar.com to get financial data and the prices of each fund available on the website. Fortunately I have no problem scraping the financial data (holdings, asset allocation, portfolio, risk, etc.), but when it comes to finding the URL that hosts the daily prices in JSON format for each fund, there is a "dataid" value that is not available in the HTML code, and without it there is no way to know the exact URL that hosts all the prices.
I have tried printing the whole page as text for many funds, and none of them show the "dataid" value I need in the HTML code in order to get the prices. The URL that hosts the prices also includes the "secid", which is very easy to scrape but has no relationship at all with the "dataid" that I need.
import requests
from lxml import html
import re
import json

quote_page = "https://www.morningstar.com/etfs/arcx/aadr/quote.html"
prices1 = "https://mschart.morningstar.com/chartweb/defaultChart?type=getcc&secids="
prices2 = "&dataid="
prices3 = "&startdate="
prices4 = "&enddate="
starting_date = "2018-01-01"
ending_date = "2018-12-28"

quote_html = requests.get(quote_page, timeout=10)
quote_tree = html.fromstring(quote_html.text)
security_id = re.findall('''meta name=['"]secId['"]\s*content=['"](.*?)['"]''', quote_html.text)[0]
security_type = re.findall('''meta name=['"]securityType['"]\s*content=['"](.*?)['"]''', quote_html.text)[0]
data_id = "8225"
daily_prices_url = prices1 + security_id + ";" + security_type + prices2 + data_id + prices3 + starting_date + prices4 + ending_date
daily_prices_html = requests.get(daily_prices_url, timeout=10)
json_prices = daily_prices_html.json()

for json_price in json_prices["data"]["r"]:
    j_prices = json_price["t"]
    for j_price in j_prices:
        daily_prices = j_price["d"]
        for daily_price in daily_prices:
            print(daily_price["i"] + " || " + daily_price["v"])
The code above works only for the "AADR" ETF because I copied and pasted the "dataid" value manually into the "data_id" variable; without this piece of information there is no way to access the daily prices. I would not like to use Selenium as an alternative for finding the "dataid" because it is a very slow tool and my intention is to scrape data for more than 28k funds, so I have tried only request-based (non-browser) scraping methods.
Do you have any suggestion on how to access the Network inspection tool, which is the only source I have found so far that shows the "dataid"?
Thanks in advance
The data id may not be that important. I varied the code F00000412E that is associated with AADR whilst keeping the data id constant.
I got a list of all those codes from here:
https://www.firstrade.com/scripts/free_etfs/io.php
Then add the code of choice into your url e.g.
[
"AIA",
"iShares Asia 50 ETF",
"FOUSA06MPQ"
]
Use FOUSA06MPQ
https://mschart.morningstar.com/chartweb/defaultChart?type=getcc&secids=FOUSA06MPQ;FE&dataid=8225&startdate=2017-01-01&enddate=2018-12-30
You can verify the values by adding the other fund as a benchmark to your chart, e.g. XNAS:AIA.
28 December has a value of 55.32; compare this with the JSON retrieved.
I repeated this with
[
"ALD",
"WisdomTree Asia Local Debt ETF",
"F00000M8TW"
]
https://mschart.morningstar.com/chartweb/defaultChart?type=getcc&secids=F00000M8TW;FE&dataid=8225&startdate=2017-01-01&enddate=2018-12-30
dataId 8217 works well for me, irrespective of the security.
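Building on that, a hedged sketch of looping over the firstrade list and reusing the chart URL from the question; it assumes the io.php endpoint returns JSON in the three-element [ticker, name, secid] format shown above, and keeps the dataid fixed as described:

import requests

# Fetch the ticker/name/secid list mentioned above (assumed to be JSON)
etf_list = requests.get("https://www.firstrade.com/scripts/free_etfs/io.php", timeout=10).json()

chart_url = ("https://mschart.morningstar.com/chartweb/defaultChart"
             "?type=getcc&secids={secid};FE&dataid=8217"
             "&startdate=2018-01-01&enddate=2018-12-28")

for ticker, name, secid in etf_list:
    prices = requests.get(chart_url.format(secid=secid), timeout=10).json()
    # ... parse prices["data"]["r"] exactly as in the question's code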