Removing text from response body - python

I am using requests in Python to send a POST request to a server that should return JSON. The problem is that the response body starts with the string /*-secure-, followed by the normal JSON structure, and then ends with */, which is not part of the JSON either.
How can I strip this wrapper so that the JSON decoder stops raising an exception? Thank you!

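You can use the strip() method. Note that strip() treats its argument as a set of characters to remove from both ends, not as a literal prefix, so this works here only because the JSON itself starts with { and ends with }.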
In [1]: x = "/*-secure-{'test': 'yes'}-secure-*/"
In [2]: y = x.strip("/*-secure-")
In [3]: y
Out[3]: "{'test': 'yes'}"

This is ugly and I would personally go with @wpercy's answer, but I've not posted a Python answer for a while.
>>> x = "/*-secure-{'test': 'yes'}-secure-*/"
>>> x.split("-secure-")[1]
"{'test': 'yes'}"

Do I dare mention this? (Yes, I do.)
>>> x = "/*-secure-{'test': 'yes'}-secure-*/"
>>> x[10:-10]
"{'test': 'yes'}"
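A less position-dependent variant of the answers above, as a minimal sketch: cut the body from the first "{" to the last "}" and hand that to json.loads. It assumes the payload is a JSON object (not a top-level array) and that the wrapper itself contains no braces; the function name unwrap_secure_json and the sample body are made up for illustration.

```python
import json


def unwrap_secure_json(body):
    """Parse the JSON object found between the first '{' and the last '}'."""
    start = body.find("{")
    end = body.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in response body")
    return json.loads(body[start:end + 1])


# Hypothetical response body shaped like the one described in the question.
raw = '/*-secure-{"test": "yes", "items": [1, 2]}*/'
parsed = unwrap_secure_json(raw)
print(parsed["test"])
```

With requests, the same function would be applied to r.text instead of calling r.json() directly.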

Related

The output of OrderedDict has single quotes around it, and I want just the insides without single quotes

So I was trying to use OrderedDict inside json.dumps() and it started off working well. However, when I try to use the output directly as the payload of an HTTP PUT request, it has single quotes around it, which I believe are interfering with the way the JSON is interpreted at the receiving end.
So how do I get around this and have it give me the output without the single quotes?
Example:
out = json.dumps(OrderedDict([("name", 1), ("value", 2)]))
... gives an output such as:
'{"name": 1, "value": 2}'
... when I want it to give me the meat, the json, like:
{"name": 1, "value": 2}
... so that I can put that straight into my
r = requests.post(url, data = out)
... and be on my merry way.
As an aside: is there something VERY basic about strings and string literals (whatever those are) that I am completely missing? My Python knowledge being self taught I am sure there are some gaps.
EDIT:
print(out)
... gives
{"name": 1, "value": 2}
which is what I believe I want.
EDIT2: json = out as mentioned in the selected answer did the trick thank you! However, since I am just starting out with coding in Python, I would love to know whether you have come across any articles/ documentation that might be handy for me to know so as to avoid similar issues in the future. Thanks once again everyone!
requests will encode the data for you. You should be able to pass the OrderedDict directly to post:
out = OrderedDict([("name", 1), ("value", 2)])
r = requests.post(url, json=out)
I hope this helps.
EDIT: I realized there's another answer that may help you and it suggests using json instead of data when making the post call.
Documentation:
http://docs.python-requests.org/en/master/user/quickstart/#more-complicated-post-requests
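On the aside about strings and string literals, a small sketch may help: the single quotes in the question are how the interactive prompt displays a str (its repr), not characters inside the string. This uses only the standard library and the same OrderedDict as the question.

```python
import json
from collections import OrderedDict

out = json.dumps(OrderedDict([("name", 1), ("value", 2)]))

print(type(out).__name__)  # json.dumps returns a str
print(repr(out))           # what the interactive prompt shows: wrapped in quotes
print(out)                 # the actual characters in the string
print(out[0], out[-1])     # first and last characters are the braces
```

So the quotes were never sent in the request body; passing json=... instead of data=... mainly changes how requests serializes the payload and which Content-Type header it sets.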

Extract string with more than one URL

I have a string with several URLs inside it. I have managed to use regex to extract the first URL, but I really need them all. My script so far below:
data = ['https://images-na.ssl-images-amazon.com/images/I/41M9WbK3MDL._SX342_.jpg":[355,342],"https://images-na.ssl-images-amazon.com/images/I/41M9WbK3MDL._SX425_.jpg":[441,425],"https://images-na.ssl-images-amazon.com/images/I/41M9WbK3MDL.jpg":[500,482],"https://images-na.ssl-images-amazon.com/images/I/41M9WbK3MDL._SX466_.jpg":[483,466],"https://images-na.ssl-images-amazon.com/images/I/41M9WbK3MDL._SX385_.jpg":[399,385]}']
url = data[data.find("https://"):]
url[:url.find('"')]
Sorry - above script didn't use regex, but was another way I tried to do this. My regex script below which pretty much does the same thing. I don't really mind what we use, just want to try get all the URLs, since both my scripts only extract the first URL.
url = re.search(r'(https)://.*?\.(jpg)', data)
if url:
    print(url.group(0))
I am scraping Amazon products; that is the context. I've also updated the string to one of the actual examples. Thanks everyone for the comments/help.
Maybe this way (using data[0], since the updated example wraps the string in a list):
URL_list = [i for i in data[0].split('"') if 'http' in i]
It doesn't use regex, but in this code I don't see a need for regex.
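If a regex is preferred anyway, re.findall returns every match rather than only the first one that re.search reports. The sketch below uses a shortened stand-in string with made-up example.com URLs in the same shape as the question's data, and assumes each URL ends at the next double quote.

```python
import re

# Shortened, hypothetical stand-in for data[0] from the question.
text = (
    'https://example.com/images/a._SX342_.jpg":[355,342],'
    '"https://example.com/images/a._SX425_.jpg":[441,425],'
    '"https://example.com/images/a.jpg":[500,482]}'
)

# Match "https://" followed by everything up to (not including) the next quote.
urls = re.findall(r'https://[^"]+', text)

for url in urls:
    print(url)
```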
Your new example string (from data[0]) is missing an opening curly brace and a double quote but after adding that, you can read it as JSON using the standard library. You might have simply copy/pasted it incorrectly.
In[2]: data = ['https://images-na.ssl-images-amazon.com/images/I/41M9WbK3MDL._SX342_.jpg":[355,342],"https://images-na.ssl-images-amazon.com/images/I/41M9WbK3MDL._SX425_.jpg":[441,425],"https://images-na.ssl-images-amazon.com/images/I/41M9WbK3MDL.jpg":[500,482],"https://images-na.ssl-images-amazon.com/images/I/41M9WbK3MDL._SX466_.jpg":[483,466],"https://images-na.ssl-images-amazon.com/images/I/41M9WbK3MDL._SX385_.jpg":[399,385]}']
In[3]: import json
In[4]: d = json.loads('{"%s' % data[0])
In[5]: d
Out[5]:
{'https://images-na.ssl-images-amazon.com/images/I/41M9WbK3MDL._SX342_.jpg': [355,
342],
'https://images-na.ssl-images-amazon.com/images/I/41M9WbK3MDL._SX425_.jpg': [441,
425],
'https://images-na.ssl-images-amazon.com/images/I/41M9WbK3MDL.jpg': [500,
482],
'https://images-na.ssl-images-amazon.com/images/I/41M9WbK3MDL._SX466_.jpg': [483,
466],
'https://images-na.ssl-images-amazon.com/images/I/41M9WbK3MDL._SX385_.jpg': [399,
385]}
In[6]: list(d.keys())
Out[6]:
['https://images-na.ssl-images-amazon.com/images/I/41M9WbK3MDL._SX342_.jpg',
'https://images-na.ssl-images-amazon.com/images/I/41M9WbK3MDL._SX425_.jpg',
'https://images-na.ssl-images-amazon.com/images/I/41M9WbK3MDL.jpg',
'https://images-na.ssl-images-amazon.com/images/I/41M9WbK3MDL._SX466_.jpg',
'https://images-na.ssl-images-amazon.com/images/I/41M9WbK3MDL._SX385_.jpg']

How to read and assign variables from an API return that's formatted as Dictionary-List-Dictionary?

So I'm trying to learn Python here, and would appreciate any help you guys could give me. I've written a bit of code that asks one of my favorite websites for some information, and the api call returns an answer in a dictionary. In this dictionary is a list. In that list is a dictionary. This seems crazy to me, but hell, I'm a newbie.
I'm trying to assign the answers to variables, but always get various error messages depending on how I write my {},[], or (). Regardless, I can't get it to work. How do I read this return? Thanks in advance.
{
"answer":
[{"widgets":16,
"widgets_available":16,
"widgets_missing":7,
"widget_flatprice":"156",
"widget_averages":15,
"widget_cost":125,
"widget_profit":"31",
"widget":"90.59"}],
"result":true
}
Edited because I put in the wrong sample code.
You need to show your code, but the de-facto way of doing this is by using the requests module, like this:
import requests
url = 'http://www.example.com/api/v1/something'
r = requests.get(url)
data = r.json() # converts the returned json into a Python dictionary
for item in data['answer']:
    print(item['widgets'])
Assuming that you are not using the requests library (see Burhan's answer), you would use the json module like so:
import json

# A string that spans several lines needs triple quotes
data = '''{"answer":
[{"widgets":16,
"widgets_available":16,
"widgets_missing":7,
"widget_flatprice":"156",
"widget_averages":15,
"widget_cost":125,
"widget_profit":"31",
"widget":"90.59"}],
"result":true}'''
data = json.loads(data)
# Now you can use it as you wish
data['answer'] # and so on...
First I will mention that to access a dictionary value you need to use ["key"] and not {}. See the Python dictionary syntax.
Here is a step by step walkthrough on how to build and access a similar data structure:
First create the main dictionary:
t1 = {"a":0, "b":1}
you can access each element by:
t1["a"] # it'll return a 0
Now let's add the internal list:
t1["a"] = ["x",7,3.14]
and access it using:
t1["a"][2] # it'll return 3.14
Now creating the internal dictionary:
t1["a"][2] = {'w1':7,'w2':8,'w3':9}
And access:
t1["a"][2]['w3'] # it'll return 9
Hope it helped you.
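Applied to the response in the question, the same indexing steps look roughly like this. It is a sketch on a trimmed copy of the sample response; the variable names are arbitrary, and converting the quoted numbers with int() and float() is an assumption about how the values will be used.

```python
import json

# Trimmed copy of the sample response from the question.
response_text = '''
{"answer": [{"widgets": 16,
             "widgets_available": 16,
             "widget_flatprice": "156",
             "widget": "90.59"}],
 "result": true}
'''

data = json.loads(response_text)     # outer dictionary
first = data["answer"][0]            # list inside it, then the first inner dictionary

widgets = first["widgets"]                    # already an int
flat_price = int(first["widget_flatprice"])   # sent as a string, so convert
widget_value = float(first["widget"])         # sent as a string, so convert
succeeded = data["result"]                    # JSON true becomes Python True

print(widgets, flat_price, widget_value, succeeded)
```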

JSON Python get field as array

This is my response to a get request for some json goodness.
I'm getting this in Python, everything works up to here.
I've been searching for JSON documentation and reading quite a bit but can't seem to find my answer.
How would I get all the email addresses?
{u'-InFSLzYdyg-OcTosYYs': {u'email': u'hello@gmail.com', u'time': 1360707022892}, u'- InFYJya4K6tZa8YSzme': {u'email': u'me@gmail.com', u'time': 1360708587511}}
What I'd want is a list like so:
email = ['hello@gmail.com', 'me@gmail.com']
Thanks in advance.
Like wRAR said, once you have it as a python dict, it should be as simple as:
[x['email'] for x in l.itervalues()]
Assuming you've converted your JSON string to a Python dict (see loads()):
>>> from json import loads
>>> a = loads(somejsonstring)
>>> emails = [a[x]['email'] for x in a]
>>> emails
['hello@gmail.com', 'me@gmail.com']
Or even better, use itervalues() as Luke mentioned.
Just do json.loads and process the resulting dict as usual. There is nothing JSON-specific here.
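The answers above use itervalues(), which exists only in Python 2. On Python 3 the equivalent is values(), sketched below with sample data modeled on the response in the question (the example.com addresses are placeholders).

```python
import json

# Placeholder data in the same shape as the response in the question.
raw = (
    '{"-InFSLzYdyg-OcTosYYs": {"email": "hello@example.com", "time": 1360707022892},'
    ' "-InFYJya4K6tZa8YSzme": {"email": "me@example.com", "time": 1360708587511}}'
)

records = json.loads(raw)

# Each value is an inner dict; collect its "email" field.
emails = [record["email"] for record in records.values()]
print(emails)
```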

Slicing URL with Python

I am working with a huge list of URLs. A quick question: I am trying to slice part of a URL out, see below:
http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3
How could I slice out:
http://www.domainname.com/page?CONTENT_ITEM_ID=1234
Sometimes there are more than two parameters after CONTENT_ITEM_ID, and the ID is different each time. I am thinking it can be done by finding the first & and keeping only the characters before it, but I'm not quite sure how to do this.
Cheers
Use the urlparse module. Check this function:
import urlparse
def process_url(url, keep_params=('CONTENT_ITEM_ID=',)):
    parsed = urlparse.urlsplit(url)
    # keep only the query items that start with one of the wanted prefixes
    filtered_query = '&'.join(
        qry_item
        for qry_item in parsed.query.split('&')
        if qry_item.startswith(keep_params))
    return urlparse.urlunsplit(parsed[:3] + (filtered_query,) + parsed[4:])
In your example:
>>> process_url(a)
'http://www.domainname.com/page?CONTENT_ITEM_ID=1234'
This function has the added bonus that it's easier to use if you decide that you also want some more query parameters, or if the order of the parameters is not fixed, as in:
>>> url='http://www.domainname.com/page?other_value=xx&param3&CONTENT_ITEM_ID=1234&param1'
>>> process_url(url, ('CONTENT_ITEM_ID', 'other_value'))
'http://www.domainname.com/page?other_value=xx&CONTENT_ITEM_ID=1234'
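The urlparse module shown above is Python 2 only; in Python 3 the same functions live in urllib.parse. Below is a rough Python 3 sketch of the same idea, not a port of the answer's exact code: the name keep_query_params is made up, and it matches whole parameter names through parse_qsl rather than string prefixes. Note that urlencode re-encodes the kept values, so unusual characters may come back percent-encoded.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def keep_query_params(url, keep=("CONTENT_ITEM_ID",)):
    """Return url with only the query parameters whose names are in keep."""
    parts = urlsplit(url)
    # keep_blank_values=True so bare items like "param2" are parsed, then filtered out
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(name, value) for name, value in pairs if name in keep]
    return urlunsplit(parts._replace(query=urlencode(kept)))


url = "http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3"
print(keep_query_params(url))
```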
The quick and dirty solution is this:
>>> "http://something.com/page?CONTENT_ITEM_ID=1234&param3".split("&")[0]
'http://something.com/page?CONTENT_ITEM_ID=1234'
Another option would be to use the split function, with & as a parameter. That way, you'd extract both the base url and both parameters.
url.split("&")
returns a list with
['http://www.domainname.com/page?CONTENT_ITEM_ID=1234', 'param2', 'param3']
I figured it out. Below is what I needed to do:
url = "http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3"
url = url[: url.find("&")]
print url
'http://www.domainname.com/page?CONTENT_ITEM_ID=1234'
Parsing a URL is never as simple as it seems to be; that's why there are the urlparse and urllib modules.
E.g.:
import urllib
url ="http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3"
query = urllib.splitquery(url)
result = "?".join((query[0], query[1].split("&")[0]))
print result
'http://www.domainname.com/page?CONTENT_ITEM_ID=1234'
This is still not 100% reliable, but it is much more reliable than splitting the URL yourself, because there are a lot of valid URL formats that you and I don't know about and only discover one day in the error logs.
import re
url = 'http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3'
m = re.search('(.*?)&', url)
print m.group(1)
Look at the urllib2 file name question for some discussion of this topic.
Also see the "Python Find Question" question.
This method isn't dependent on the position of the parameter within the url string. This could be refined, I'm sure, but it gets the point across.
url = 'http://www.domainname.com/page?CONTENT_ITEM_ID=1234&param2&param3'
parts = url.split('?')
id = dict(i.split('=') for i in parts[1].split('&'))['CONTENT_ITEM_ID']
new_url = parts[0] + '?CONTENT_ITEM_ID=' + id
An ancient question, but still, I'd like to remark that query string parameters can also be separated by ';', not only '&'.
Besides urlparse there is also furl, which IMHO has a better API.
