Python JSON - Retrieve JSON from constantly changing label? - python

Let's say my JSON looks like this
In Post, the labels are constantly changing. If they were stable, I could retrieve the title like this:
['payload']['references']['Post']['CONSTANT']['title']
But the ['CONSTANT'] key (for example ['4c708604012f']) changes whenever there are new Posts, so I'm not sure how I can retrieve the title.
Thanks for any help

What you need to do is get all of the .keys() of the changing dictionary and then reference them in a loop.
titles = []
for constant in json['payload']['references']['Post'].keys():
    titles.append(json['payload']['references']['Post'][constant]['title'])

Loop through all the values of Post (.values() yields the inner dictionaries, whereas .items() would yield (key, value) tuples):
for post in var['payload']['references']['Post'].values():
    print(post['title'])
You can collect all of them in a list:
titles = [post['title'] for post in var['payload']['references']['Post'].values()]
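If you also want to know which changing key each title came from, a minimal sketch could look like the following. The payload here is hypothetical (the question's JSON is not shown); only the nesting payload -> references -> Post -> id -> title is taken from the question, and the second id and both titles are made up.

```python
# Hypothetical payload; only the nesting mirrors the question.
response = {
    "payload": {
        "references": {
            "Post": {
                "4c708604012f": {"title": "First post"},
                "9a1b2c3d4e5f": {"title": "Second post"},  # made-up id
            }
        }
    }
}

posts = response["payload"]["references"]["Post"]

# Map each changing id to its title without knowing the ids in advance.
titles_by_id = {post_id: post["title"] for post_id, post in posts.items()}

# If only the titles matter, the ids can be dropped again.
titles = list(titles_by_id.values())

print(titles_by_id)
print(titles)
```

Here .items() is the right call because both the key and the inner dictionary are used.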

Related

how to get nested data with pandas and request

I'm going crazy trying to get data through an API call using requests and pandas. It looks like it's nested data, but I can't get the data I need.
https://xorosoft.docs.apiary.io/#reference/sales-orders/get-sales-orders
Above is the API documentation. I'm just trying to keep it simple and get the ItemNumber and QtyRemainingToShip, but I can't even figure out how to access the nested data. I'm trying to use a DataFrame to get it, but am just lost. Any help would be appreciated. I keep getting stuck at the 'Data' level.
type(json['Data'])
df = pd.DataFrame(['Data'])
df.explode('SoEstimateHeader')
df.explode('SoEstimateHeader')
Cell In [64], line 1
df.explode([0:])
^
SyntaxError: invalid syntax
I used the link to grab a sample response from the API documentation page you provided. From the code you provided, it looks like you are already able to get the data, and I'm assuming that you already have it as a dictionary.
From what I can tell, I don't think you need pandas here, unless it's a downstream requirement of your task. To get the ItemNumber and QtyRemainingToShip, you can use the code below.
# get the interesting part of the data out of the api response
data_list = json['Data']
#the data_list is only one element long, so grab the first element which is of type dictionary
data = data_list[0]
# the dictionary has two keys at the top level
so_estimate_header = data['SoEstimateHeader']
# similar to the data list the value associated with "SoEstimateItemLineArr" is of type list and has 1 element in it, so we grab the first & only element.
so_estimate_item_line_arr = data['SoEstimateItemLineArr'][0]
# now we can grab the pieces of information we're interested in out of the dictionary
qtyremainingtoship = so_estimate_item_line_arr["QtyRemainingToShip"]
itemnumber = so_estimate_item_line_arr["ItemNumber"]
print("QtyRemainingToShip: ", qtyremainingtoship)
print("ItemNumber: ", itemnumber)
Output
QtyRemainingToShip: 1
ItemNumber: BC
Side Note
As a side note, I wouldn't name any variable json, because that's also the name of Python's standard-library module for parsing JSON. It will confuse future readers and will shadow the module if you end up having to import it.
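The code above takes only the first element of each list. If a real response holds several orders or several item lines, a sketch along these lines would collect all of them. The field names Data, SoEstimateHeader, SoEstimateItemLineArr, ItemNumber and QtyRemainingToShip come from the answer; the sample values and the OrderNumber field are made up for illustration.

```python
# Hypothetical response: field names follow the answer above, values are invented.
api_response = {
    "Data": [
        {
            "SoEstimateHeader": {"OrderNumber": "SO-1"},  # OrderNumber is an assumed field
            "SoEstimateItemLineArr": [
                {"ItemNumber": "BC", "QtyRemainingToShip": 1},
                {"ItemNumber": "XY", "QtyRemainingToShip": 4},
            ],
        },
        {
            "SoEstimateHeader": {"OrderNumber": "SO-2"},
            "SoEstimateItemLineArr": [
                {"ItemNumber": "ZZ", "QtyRemainingToShip": 0},
            ],
        },
    ]
}

# Walk every order, then every item line inside that order.
rows = [
    (line["ItemNumber"], line["QtyRemainingToShip"])
    for order in api_response["Data"]
    for line in order["SoEstimateItemLineArr"]
]

for item_number, qty_remaining in rows:
    print("ItemNumber:", item_number, "QtyRemainingToShip:", qty_remaining)
```

The variable is named api_response rather than json so that it does not shadow the json module.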

Convert JSON format String to Link {"link":"https://i.imgur.com/zfxsqlk.png"}

I try to convert this String to only the link: {"link":"https://i.imgur.com/zfxsqlk.png"}
I'm trying to create a discord bot, which sends random pictures from the API https://some-random-api.ml/img/red_panda.
With imageURL = json.loads(requests.get(redpandaurl).content) I get the JSON string, but what do I have to do to get only the link, like this: https://i.imgur.com/zfxsqlk.png
Sorry if my question is confusingly written, I'm new to programming and don't really know how to describe this problem.
You can simply do this:
image_url = requests.get(your_api_url).json()["link"]
Use the response object's .json() method directly; there is no need to load the content with json.loads yourself.
What you get from json.loads() is a Python dict. You can access values in the dict by specifying their keys.
In your case, there is only one key-value pair in the dict: "link" is the key and "https://i.imgur.com/zfxsqlk.png" is the value. You can get the link and store it in the variable by appending ["link"] to your line of code:
imageURL = json.loads(requests.get(redpandaurl).content)["link"]
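To see the parsing step on its own, without a network call, here is a small sketch that uses the exact string from the question as the response body. The variable names are illustrative.

```python
import json

# The body from the question, as plain text.
body = '{"link":"https://i.imgur.com/zfxsqlk.png"}'

parsed = json.loads(body)    # JSON object -> Python dict
image_url = parsed["link"]   # look up the value by its key

print(type(parsed).__name__)
print(image_url)
```

The result is an ordinary str, so it can be passed straight to whatever sends the message in the bot.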

OrderedDict of OrderedDict and storing data in YAML Issues

So basically I have an app I'm making that has user data which I want to back up and load into the database. I'm storing the data in YAML files. Now, a user has posts. Each post has a timestamp, text and tags. I want to use an OrderedDict in order to retain order when I write the data to the YAML files. Currently, I'm doing something like this:
def get_posts(user):
    posts_arr = []
    for post in user.posts.all():
        temparr = OrderedDict()
        temparr['timestamp'] = post.created_at.strftime("%Y-%m-%d %H:%M %p")
        temparr['text'] = post.text
        temparr['tags'] = (',').join(list(post.tags.all().values_list('field',flat=True)))
        posts_arr.append(temparr)
    return posts_arr
As you can see, I'm using an array of ordered dictionaries, and I think that is the reason my posts for each user are not ordered. How can I resolve this issue?
I am returning this posts_arr object to be stored within another OrderedDict.
Also, since the post text is a large block of text, I want to make sure that it is stored in YAML literal block style.
Basically, your issue is a misunderstanding on how ordered dictionaries work in python. The python documentation states that an OrderedDict is a:
dict subclass that remembers the order entries were added
https://docs.python.org/3/library/collections.html#module-collections
Personally, I'd recommend a list of dictionaries created from a pre-sorted list of posts. In this case, it would look something like this if we were to keep the majority of your code as-is:
def get_posts(user):
    posts_arr = []
    sorted_posts = sorted(user.posts.all(), key=(lambda post: post.created_at)) # Sorts the posts based on their created_at date
    for post in sorted_posts:
        temparr = dict()
        temparr['timestamp'] = post.created_at.strftime("%Y-%m-%d %H:%M %p")
        temparr['text'] = post.text
        temparr['tags'] = (',').join(list(post.tags.all().values_list('field',flat=True)))
        posts_arr.append(temparr)
    return posts_arr
You could use list comprehensions to build this list from the sorted one like chepner suggested, but I don't want to change too much.
Use an ordinary dict (or OrderedDict if you really need to) for each post, and use a list for the collection of all posts. Once you do that, it's a short jump to using a list comprehension to define the return value directly.
def get_posts(user):
    return [{
        'timestamp': post.created_at.strftime("%Y-%m-%d %H:%M %p"),
        'text': post.text,
        'tags': ','.join(list(post.tags.all().values_list('field', flat=True)))
    } for post in user.posts.all()]
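The two ideas in these answers (sort first, then build a list of plain dicts) can be tried without Django. In the sketch below, Post is a hypothetical namedtuple standing in for the real model, serialize_posts is an illustrative name, and tags is assumed to be a simple list of strings. Since Python 3.7, a plain dict keeps its insertion order, and a list always keeps the order of its elements.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for the question's post model (the real one is a Django model).
Post = namedtuple("Post", ["created_at", "text", "tags"])

posts = [
    Post(datetime(2020, 5, 2, 9, 30), "second", ["b"]),
    Post(datetime(2020, 5, 1, 18, 0), "first", ["a", "c"]),
]


def serialize_posts(posts):
    """Return one dict per post, oldest post first."""
    ordered = sorted(posts, key=lambda post: post.created_at)
    return [
        {
            "timestamp": post.created_at.strftime("%Y-%m-%d %H:%M"),
            "text": post.text,
            "tags": ",".join(post.tags),
        }
        for post in ordered
    ]


result = serialize_posts(posts)
for entry in result:
    print(entry)
```

The order of the posts comes from the sorted list, and the order of the fields inside each post comes from the order in which the keys are written in the dict literal.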

Scraping data from a http & javaScript site

I currently want to scrape some data from an amazon page and I'm kind of stuck.
For example, lets take this page.
https://www.amazon.com/NIKE-Hyperfre3sh-Athletic-Sneakers-Shoes/dp/B01KWIUHAM/ref=sr_1_1_sspa?ie=UTF8&qid=1546731934&sr=8-1-spons&keywords=nike+shoes&psc=1
I wanted to scrape every variant of shoe size and color. That data can be found opening the source code and searching for 'variationValues'.
There we can see sort of a dictionary containing all the sizes and colors and, below that, in 'asinToDimentionIndexMap', every product code with numbers indicating the variant from the variationValues 'dictionary'.
For example, in asinToDimentionIndexMap we can see
"B01KWIUH5M":[0,0]
Which means that the product code B01KWIUH5M is associated with the size '8M US' (position 0 in variationValues size_name section) and the color 'Teal' (same idea as before)
I want to scrape both the variationValues and the asinToDimentionIndexMap, so i can associate the IndexMap numbers to the variationValues one.
Another person in the site (thanks for the help btw) suggested doing it this way.
script = response.xpath('//script/text()').extract_first()
import re
# capture everything between {}
data = re.findall(r'(\{.+?\})', script)
import json
d = json.loads(data[0])
d['products'][0]
I can sort of understand the first part. We get everything that's a 'script' as a string and then get everything between {}. The issue is what happens after that. My knowledge of json is not that great and reading some stuff about it didn't help that much.
Is there a way to get, from that data, two dictionaries or lists with the variationValues and asinToDimentionIndexMap (maybe using some regular expressions in the middle to get some data out of a big string)? Or could someone explain a little bit what happens in the json part?
Thanks for the help!
EDIT: Added photo of variationValues and asinToDimensionIndexMap
I think you are close Manuel!
The following code will turn your scraped source into easy-to-select boxes:
import json
d = json.loads(data[0])
JSON is a text format for exchanging structured data. json.loads parses a JSON string into Python objects (dicts, lists, strings, numbers), regardless of the platform that produced the string.
https://www.w3schools.com/js/js_json_intro.asp
I'm assuming that where you may be running into trouble is errors when accessing a particular "box" inside your JSON object.
Your code format looks correct, but your access within "each box" may look different.
E.g., if your 'asinToDimentionIndexMap' object is nested within a smaller box in the larger 'products' object, then you might access it like this (after running the code above):
d['products'][0]['asinToDimentionIndexMap']
I've hacked and slashed a little bit so you can better understand the structure of your particular JSON file. Take a look at the link below. On the right-hand side, you will see which boxes are within one another, which is precisely what you need to know for accessing what you need.
JSON Object Viewer
For example, the following would yield "companyCompliancePolicies_feature_div":
import json
d = json.loads(data[0])
d['updateDivLists']['full'][0]['divToUpdate']
The person helping you before outlined a general case for you, but you'll need to go in and look at the structure this way to truly find what you're looking for.
variationValues = re.findall(r'variationValues\" : ({.*?})', ' '.join(script))[0]
asinVariationValues = re.findall(r'asinVariationValues\" : ({.*?}})', ' '.join(script))[0]
dimensionValuesData = re.findall(r'dimensionValuesData\" : (\[.*\])', ' '.join(script))[0]
asinToDimensionIndexMap = re.findall(r'asinToDimensionIndexMap\" : ({.*})', ' '.join(script))[0]
dimensionValuesDisplayData = re.findall(r'dimensionValuesDisplayData\" : ({.*})', ' '.join(script))[0]
Now you can easily parse each of them with json.loads and combine them as you wish.
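As a sketch of that last step, the snippet below parses two captured strings and resolves each product code to its size and color. The two raw strings are hypothetical stand-ins for what the regular expressions would capture: the ASIN B01KWIUH5M, the index pair [0, 0], '8M US', 'Teal' and size_name come from the question, while color_name, the second ASIN and the remaining values are made up. It also assumes that the positions in each index list follow the key order of variationValues.

```python
import json

# Hypothetical captures, shaped like the page data described in the question.
variation_values_raw = '{"size_name": ["8M US", "9M US"], "color_name": ["Teal", "Black"]}'
asin_index_map_raw = '{"B01KWIUH5M": [0, 0], "B01EXAMPLE1": [1, 1]}'  # second ASIN is made up

variation_values = json.loads(variation_values_raw)
asin_index_map = json.loads(asin_index_map_raw)

# Assumption: index positions follow the key order of variationValues.
dimensions = list(variation_values)

# For each ASIN, translate every index into the value it points at.
asin_to_variant = {
    asin: {dim: variation_values[dim][index] for dim, index in zip(dimensions, indexes)}
    for asin, indexes in asin_index_map.items()
}

print(asin_to_variant["B01KWIUH5M"])
```

If the real page orders its dimensions differently, the dimensions list would have to be set by hand instead of being derived from the dict.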

How to read and assign variables from an API return that's formatted as Dictionary-List-Dictionary?

So I'm trying to learn Python here, and would appreciate any help you guys could give me. I've written a bit of code that asks one of my favorite websites for some information, and the api call returns an answer in a dictionary. In this dictionary is a list. In that list is a dictionary. This seems crazy to me, but hell, I'm a newbie.
I'm trying to assign the answers to variables, but always get various error messages depending on how I write my {},[], or (). Regardless, I can't get it to work. How do I read this return? Thanks in advance.
{
"answer":
[{"widgets":16,
"widgets_available":16,
"widgets_missing":7,
"widget_flatprice":"156",
"widget_averages":15,
"widget_cost":125,
"widget_profit":"31",
"widget":"90.59"}],
"result":true
}
Edited because I put in the wrong sample code.
You need to show your code, but the de-facto way of doing this is by using the requests module, like this:
import requests
url = 'http://www.example.com/api/v1/something'
r = requests.get(url)
data = r.json() # converts the returned json into a Python dictionary
for item in data['answer']:
    print(item['widgets'])
Assuming that you are not using the requests library (see Burhan's answer), you would use the json module like so:
# a string literal that spans several lines needs triple quotes
data = '''{"answer":
[{"widgets":16,
"widgets_available":16,
"widgets_missing":7,
"widget_flatprice":"156",
"widget_averages":15,
"widget_cost":125,
"widget_profit":"31",
"widget":"90.59"}],
"result":true}'''
import json
data = json.loads(data)
# Now you can use it as you wish
data['answer'] # and so on...
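To go one step further and assign individual answers to variables, a minimal sketch using the sample from the question might look like this. The variable names are illustrative, and converting the quoted numbers with int() and float() is an assumption about how you want to use them.

```python
import json

raw = '''
{
  "answer": [{"widgets": 16, "widgets_available": 16, "widgets_missing": 7,
              "widget_flatprice": "156", "widget_averages": 15, "widget_cost": 125,
              "widget_profit": "31", "widget": "90.59"}],
  "result": true
}
'''

data = json.loads(raw)

# Dictionary -> list -> dictionary: key, then index, then key.
first = data["answer"][0]

widgets = first["widgets"]            # already an int
profit = int(first["widget_profit"])  # arrives as the string "31"
price = float(first["widget"])        # arrives as the string "90.59"

print(widgets, profit, price, data["result"])
```

Square brackets are used at every level: a string key for each dictionary and an integer index for the list.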
First, I will mention that to access a dictionary value you need to use ["key"] and not {}. See the Python documentation on dictionary syntax.
Here is a step by step walkthrough on how to build and access a similar data structure:
First create the main dictionary:
t1 = {"a":0, "b":1}
you can access each element by:
t1["a"] # it'll return a 0
Now let's add the internal list:
t1["a"] = ["x",7,3.14]
and access it using:
t1["a"][2] # it'll return 3.14
Now creating the internal dictionary:
t1["a"][2] = {'w1':7,'w2':8,'w3':9}
And access:
t1["a"][2]['w3'] # it'll return 9
Hope it helped you.
