How to get nested data with pandas and requests - Python

I'm going crazy trying to get data through an API call using requests and pandas. It looks like it's nested data, but I can't get the data I need.
https://xorosoft.docs.apiary.io/#reference/sales-orders/get-sales-orders
Above is the API documentation. I'm just trying to keep it simple and get the ItemNumber and QtyRemainingToShip, but I can't even figure out how to access the nested data. I'm trying to use a DataFrame to get it, but I'm just lost. Any help would be appreciated. I keep getting stuck at the 'Data' level.
type(json['Data'])
df = pd.DataFrame(['Data'])
df.explode('SoEstimateHeader')
df.explode('SoEstimateHeader')
Cell In [64], line 1
df.explode([0:])
^
SyntaxError: invalid syntax

I used the link to grab a sample response from the API documentation page you provided. From the code you posted, it looks like you are already able to get the data, and I'm assuming that you already have it as a dictionary.
From what I can tell, I don't think you need pandas here, unless it's a downstream requirement of the task you are doing. To get the ItemNumber and QtyRemainingToShip you can use the code below.
# get the interesting part of the data out of the api response
data_list = json['Data']
#the data_list is only one element long, so grab the first element which is of type dictionary
data = data_list[0]
# the dictionary has two keys at the top level
so_estimate_header = data['SoEstimateHeader']
# similar to the data list the value associated with "SoEstimateItemLineArr" is of type list and has 1 element in it, so we grab the first & only element.
so_estimate_item_line_arr = data['SoEstimateItemLineArr'][0]
# now we can grab the pieces of information we're interested in out of the dictionary
qtyremainingtoship = so_estimate_item_line_arr["QtyRemainingToShip"]
itemnumber = so_estimate_item_line_arr["ItemNumber"]
print("QtyRemainingToShip: ", qtyremainingtoship)
print("ItemNumber: ", itemnumber)
Output
QtyRemainingToShip: 1
ItemNumber: BC
Side Note
As a side note, I wouldn't name any variable json, because that's also the name of Python's standard library module for parsing JSON. It will confuse future readers and will shadow the module if you end up having to import it.
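If an order can contain several item lines, or the response several orders, indexing with [0] only returns the first one. A minimal sketch that loops over everything instead, assuming the response has the same shape as the documentation sample. The response_data literal below is a trimmed, hypothetical stand-in: the OrderNumber field and the second item line are made up for illustration.

```python
# Hypothetical, trimmed stand-in for the parsed API response
response_data = {
    "Data": [
        {
            "SoEstimateHeader": {"OrderNumber": "SO-1"},  # made-up header field
            "SoEstimateItemLineArr": [
                {"ItemNumber": "BC", "QtyRemainingToShip": 1},
                {"ItemNumber": "XY", "QtyRemainingToShip": 4},  # made-up line
            ],
        }
    ]
}

# Build one flat row per item line, across every order in "Data"
rows = [
    {
        "ItemNumber": line["ItemNumber"],
        "QtyRemainingToShip": line["QtyRemainingToShip"],
    }
    for order in response_data["Data"]
    for line in order["SoEstimateItemLineArr"]
]

for row in rows:
    print(row["ItemNumber"], row["QtyRemainingToShip"])
```

If a DataFrame is needed downstream, a list of flat dictionaries like rows can be passed straight to pd.DataFrame(rows).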

Related

Python: parsing SOAP protocol response header and value

Trying to write the question one more time because the first time the problem was not clear.
I'm trying to extract output from a SOAP protocol using Python.
The output is structured with a lot of subcategories and this is where I can't extract the information properly.
The output I have is very long, so I will bring here a short example from soapenv:Body
the original code (soap body)
<mc:ocb-rule id="boic"><mc:id>boic</mc:id><mc:ocb-conditions><mc:rule-deactivated>true</mc:rule-deactivated><mc:international>true</mc:international></mc:ocb-conditions><mc:cb-actions><mc:allow>false</mc:allow></mc:cb-actions></mc:ocb-rule>
As you can see I also used the command
xmltodict.parse(soap_response)
for turning the output into a dictionary
OrderedDict([('@id', 'boic'),
             ('mc:id', 'boic'),
             ('mc:ocb-conditions',
              OrderedDict([('mc:rule-deactivated', 'true'),
                           ('mc:international', 'true')])),
             ('mc:cb-actions',
              OrderedDict([('mc:allow', 'false')]))])
If there is an easier way, I would appreciate guidance.
As I mentioned, my goal is ultimately to give each category its own value; where there is a subcategory, it should be added as a separate category.
In the end I will have to put all the values into a DataFrame and display all the categories and their values.
For example:
[image: table example]
I really hope that this time I was able to explain myself properly.
Thanks in advance.
I am trying to parse a SOAP response and insert all the values and headers into a DataFrame.
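One possible way to give every subcategory its own column is to flatten the nested dictionary into dotted paths. This is only a sketch: flatten is a hypothetical helper (not part of xmltodict), and the rule dictionary is a hand-written plain-dict copy of the parsed output above, using '@id' because xmltodict prefixes XML attributes with '@' by default.

```python
def flatten(node, prefix="", sep="."):
    """Flatten nested dicts into a single {dotted.path: value} dict."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into subcategories, carrying the parent path along
            flat.update(flatten(value, path, sep))
        else:
            flat[path] = value
    return flat


# Hand-written copy of the parsed rule from the question
rule = {
    "@id": "boic",
    "mc:id": "boic",
    "mc:ocb-conditions": {
        "mc:rule-deactivated": "true",
        "mc:international": "true",
    },
    "mc:cb-actions": {"mc:allow": "false"},
}

flat_rule = flatten(rule)
for column, value in flat_rule.items():
    print(column, "=", value)
```

Each flattened dictionary can then serve as one row: a list of them can be handed to pd.DataFrame to get one column per category or subcategory.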

JSON Parsing with python from Rethink database [Python]

I'm trying to retrieve data from a RethinkDB database. It outputs JSON when called with r.db("Databasename").table("tablename").insert([{ "id or primary key": line}]).run(); when doing so it outputs [{'id': 'ValueInRowOfid\n'}], and I want to parse that down to just the value, e.g. "ValueInRowOfid". I've tried the json module in Python, but I always end up with TypeError: list indices must be integers or slices, not str, and I've been told that this is because the database outputs invalid JSON. My question is: how can a JSON format be invalid (I can't see what is invalid about the output), and what would be the best way to parse it so that the value "ValueInRowOfid" ends up in a variable, e.g. Value = ("ValueInRowOfid")?
This part imports the modules used and connects to RethinkDB:
import json
from rethinkdb import RethinkDB
r = RethinkDB()
r.connect( "localhost", 28015).repl()
This part is getting the output/value and my trial at parsing it:
getvalue = r.db("Databasename").table("tablename").sample(1).run() # gets a single row/value from the table
print(getvalue) # prints [{'id': 'ValueInRowOfid\n'}]
dumper = json.dumps(getvalue) # I can't pass the raw output to json.loads because the JSON object must be a str, which the database output isn't (it is a list)
parsevalue = json.loads(dumper) # After json.dumps(getvalue) I can load it, but I can't use the loaded JSON.
print(parsevalue["id"]) # This now raises "list indices must be integers or slices, not str". Quite frustrating, as it seems to contradict itself: first it wants a str and now it can't use a str
print(parsevalue{'id'}) # I also tried to shuffle it around as seen here, but still the same result
I know this is janky and that I may be missing something obvious. I don't know whether this is the simplest problem or something that just isn't possible (it should be possible, or else I can't use the data in my database).
Thank you for reading this through and not jumping straight to the comments to say that I have to read the JSON documentation, because I have, and I haven't found a single piece that could help me.
I tried reading the documentation and watching tutorials about JSON and JSON parsing. I also looked for others who have had the same problem and couldn't find any.
It looks like it's returning a dictionary ({}) inside a list ([]) of one element.
Try:
getvalue = r.db("Databasename").table("tablename").sample(1).run()
print(getvalue[0]['id'])
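Judging by the printed output, the driver already returns Python objects (a list containing a dictionary), so no JSON parsing is needed, and json.dumps followed by json.loads simply gives back the same list. A small sketch using a literal list as a stand-in for the query result shown in the question; strip() is an addition here to drop the trailing newline.

```python
import json

# Stand-in for the result of .sample(1).run() as printed in the question
getvalue = [{"id": "ValueInRowOfid\n"}]

# Round-tripping through JSON changes nothing: it is still a list
round_trip = json.loads(json.dumps(getvalue))
print(type(round_trip).__name__)

# Index the list first (an integer), then the dictionary (a string key)
first_row = getvalue[0]
value = first_row["id"].strip()  # strip() removes the trailing "\n"
print(value)
```

The TypeError in the question comes from using a string key on the outer list; the string key only works one level down, on the dictionary.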

Exporting data from Yelp API to csv file

I'm a beginner programmer trying to work with the Yelp API and I've been able to pull the information I need but I can't figure out how to only export a single part of the address into my csv file. This is the code I am working with:
# Convert the JSON string to a dictionary
business_data = response.json()
c = csv.writer(open('testing.csv', 'a'), lineterminator='\n')
for biz in business_data['businesses']:
    c.writerow([biz['name'], biz['location'], biz['phone'], biz['url']])
In the last line of code, in the for loop, I want to be able to target a specific element of the 'location' like this:
#c.writerow([biz['name'], biz['address1'], biz['city'], biz['state'], biz['zip_code'], biz['phone'], biz['url']])
The Yelp website shows that I can target these specifics, but I just can't figure out how to do it with c.writerow().
Yelp shows that I can target these like this:
businesses[x].location.address1
businesses[x].location.address2
businesses[x].location.city
From the response, biz['location'] is a Python dictionary, meaning it consists of key-value pairs.
You can verify this by printing type(biz['location']). To answer your question, all you need to do is index the dictionary by key and write the value to the file.
c.writerow([biz['name'], biz['location']['address1'], biz['location']['city'], biz['location']['state'], biz['location']['zip_code'], biz['phone'], biz['url']])
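For reference, a self-contained sketch of the same idea. The business_data literal is made-up sample data shaped like the fields used in the question, business_row is a hypothetical helper, and an in-memory buffer stands in for the file so the result can be inspected.

```python
import csv
import io

# Made-up sample shaped like the fields used in the question
business_data = {
    "businesses": [
        {
            "name": "Example Cafe",
            "phone": "+10000000000",
            "url": "https://example.com/biz",
            "location": {
                "address1": "1 Main St",
                "city": "Springfield",
                "state": "IL",
                "zip_code": "00000",
            },
        }
    ]
}


def business_row(biz):
    """Pick the top-level and nested 'location' fields for one CSV row."""
    loc = biz["location"]
    return [biz["name"], loc["address1"], loc["city"], loc["state"],
            loc["zip_code"], biz["phone"], biz["url"]]


# In-memory buffer; use open("testing.csv", "a", newline="") for a real file
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
for biz in business_data["businesses"]:
    writer.writerow(business_row(biz))

csv_text = buffer.getvalue()
print(csv_text)
```

Pulling biz["location"] into a local name keeps the row definition short and avoids repeating the lookup for every address field.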

Scraping data from a http & javaScript site

I currently want to scrape some data from an Amazon page and I'm kind of stuck.
For example, let's take this page.
https://www.amazon.com/NIKE-Hyperfre3sh-Athletic-Sneakers-Shoes/dp/B01KWIUHAM/ref=sr_1_1_sspa?ie=UTF8&qid=1546731934&sr=8-1-spons&keywords=nike+shoes&psc=1
I wanted to scrape every variant of shoe size and color. That data can be found by opening the page source and searching for 'variationValues'.
There we can see a sort of dictionary containing all the sizes and colors and, below that, in 'asinToDimensionIndexMap', every product code with numbers indicating the variant from the variationValues 'dictionary'.
For example, in asinToDimensionIndexMap we can see
"B01KWIUH5M":[0,0]
which means that the product code B01KWIUH5M is associated with the size '8M US' (position 0 in the variationValues size_name section) and the color 'Teal' (same idea as before).
I want to scrape both the variationValues and the asinToDimensionIndexMap, so I can associate the IndexMap numbers with the variationValues ones.
Another person in the site (thanks for the help btw) suggested doing it this way.
import json
import re

script = response.xpath('//script/text()').extract_first()
# capture everything between {} (pattern first, then the string to search)
data = re.findall(r'(\{.+?\})', script)
d = json.loads(data[0])
d['products'][0]
I can sort of understand the first part. We get everything that's a 'script' as a string and then get everything between {}. The issue is what happens after that. My knowledge of JSON is not that great, and reading about it didn't help much.
Is there a way to get, from that data, two dictionaries or lists with the variationValues and asinToDimensionIndexMap (maybe using some regular expressions in the middle to get some data out of a big string)? Or could you explain a little of what happens in the JSON part?
Thanks for the help!
EDIT: Added photo of variationValues and asinToDimensionIndexMap
I think you are close Manuel!
The following code will turn your scraped source into easy-to-select boxes:
import json
d = json.loads(data[0])
JSON is a text format for storing structured data that is independent of any platform or language. json.loads() parses a JSON string into Python objects (dictionaries and lists), so you can select values by key and index.
https://www.w3schools.com/js/js_json_intro.asp
I'm assuming that where you may be finding things challenging is when errors occur while accessing a particular "box" inside your JSON object.
Your code format looks correct, but your access within each "box" may look different.
E.g. if your 'asinToDimensionIndexMap' object is nested within a smaller box in the larger 'products' object, then you might access it like this (after running the code above):
d['products'][0]['asinToDimensionIndexMap']
I've hacked and slashed a little bit so you can better understand the structure of your particular JSON file. Take a look at the link below. On the right-hand side, you will see which boxes are within one another, which is precisely what you need to know to access what you need.
JSON Object Viewer
For example, the following would yield "companyCompliancePolicies_feature_div":
import json
d = json.loads(data[0])
d['updateDivLists']['full'][0]['divToUpdate']
The person helping you before outlined a general case for you, but you'll need to go in and look at the structure this way to truly find what you're looking for.
variationValues = re.findall(r'variationValues\" : ({.*?})', ' '.join(script))[0]
asinVariationValues = re.findall(r'asinVariationValues\" : ({.*?}})', ' '.join(script))[0]
dimensionValuesData = re.findall(r'dimensionValuesData\" : (\[.*\])', ' '.join(script))[0]
asinToDimensionIndexMap = re.findall(r'asinToDimensionIndexMap\" : ({.*})', ' '.join(script))[0]
dimensionValuesDisplayData = re.findall(r'dimensionValuesDisplayData\" : ({.*})', ' '.join(script))[0]
Now you can easily convert them with json.loads and combine them as you wish.
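Once the two fragments are available as JSON text, joining them is a matter of looking up each index in the matching list. A sketch under assumptions: the two strings below are small hand-written stand-ins for the regex matches (only B01KWIUH5M, '8M US' and 'Teal' come from the question; the other values are made up), and the index order is assumed to be size_name first, then color_name, as described in the question.

```python
import json

# Hand-written stand-ins for the extracted JSON fragments
variation_values_text = (
    '{"size_name": ["8M US", "9M US"], "color_name": ["Teal", "Black"]}'
)
index_map_text = '{"B01KWIUH5M": [0, 0], "EXAMPLEASIN": [1, 1]}'

variation_values = json.loads(variation_values_text)
index_map = json.loads(index_map_text)

# Assumed order of the positions inside each index list
dimensions = ["size_name", "color_name"]

# Map every product code to its actual size and color
variants = {
    asin: {
        dim: variation_values[dim][index]
        for dim, index in zip(dimensions, indexes)
    }
    for asin, indexes in index_map.items()
}

for asin, attributes in variants.items():
    print(asin, attributes)
```

If the page exposes its own list of dimension names, using that list instead of the hard-coded dimensions would avoid relying on the assumed order.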

How to read and assign variables from an API return that's formatted as Dictionary-List-Dictionary?

So I'm trying to learn Python here, and would appreciate any help you guys could give me. I've written a bit of code that asks one of my favorite websites for some information, and the api call returns an answer in a dictionary. In this dictionary is a list. In that list is a dictionary. This seems crazy to me, but hell, I'm a newbie.
I'm trying to assign the answers to variables, but always get various error messages depending on how I write my {},[], or (). Regardless, I can't get it to work. How do I read this return? Thanks in advance.
{
"answer":
[{"widgets":16,
"widgets_available":16,
"widgets_missing":7,
"widget_flatprice":"156",
"widget_averages":15,
"widget_cost":125,
"widget_profit":"31",
"widget":"90.59"}],
"result":true
}
Edited because I put in the wrong sample code.
You need to show your code, but the de-facto way of doing this is by using the requests module, like this:
import requests
url = 'http://www.example.com/api/v1/something'
r = requests.get(url)
data = r.json() # converts the returned json into a Python dictionary
for item in data['answer']:
    print(item['widgets'])
Assuming that you are not using the requests library (see Burhan's answer), you would use the json module like so:
import json

# a multi-line string literal needs triple quotes
data = '''{"answer":
[{"widgets":16,
"widgets_available":16,
"widgets_missing":7,
"widget_flatprice":"156",
"widget_averages":15,
"widget_cost":125,
"widget_profit":"31",
"widget":"90.59"}],
"result":true}'''
data = json.loads(data)
# Now you can use it as you wish
data['answer'] # and so on...
First, I will mention that to access a dictionary value you need to use ["key"], not {}; see the Python documentation on dictionary syntax.
Here is a step-by-step walkthrough of how to build and access a similar data structure.
First create the main dictionary:
t1 = {"a":0, "b":1}
You can access each element with:
t1["a"] # it'll return 0
Now let's add the internal list:
t1["a"] = ["x",7,3.14]
and access it using:
t1["a"][2] # it'll return 3.14
Now create the internal dictionary:
t1["a"][2] = {'w1':7,'w2':8,'w3':9}
And access it:
t1["a"][2]['w3'] # it'll return 9
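Applied to the API return from the question, the same chain of key, index, key can be used to assign variables. A minimal sketch with the sample response shortened to a few fields; the int() and float() conversions are an addition, since some of the numbers arrive as strings.

```python
import json

# Shortened copy of the sample response from the question
raw = (
    '{"answer": [{"widgets": 16, "widget_flatprice": "156", '
    '"widget_profit": "31", "widget": "90.59"}], "result": true}'
)

data = json.loads(raw)        # outer dictionary
answer = data["answer"][0]    # list element 0 is the inner dictionary

widgets = answer["widgets"]                   # already an int
flat_price = int(answer["widget_flatprice"])  # "156" is a string in the JSON
widget_value = float(answer["widget"])        # "90.59" is a string too
succeeded = data["result"]                    # JSON true becomes Python True

print(widgets, flat_price, widget_value, succeeded)
```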
Hope it helped you.
