Python URLencoding Specific Results - python

I'm querying from a database in Python 3.8.2
I need the urlencoded results to be:
data = {"where":{"date":"03/30/20"}}
needed_results = ?where=%7B%22date%22%3A%20%2203%2F30%2F20%22%7D
I've tried the following
import urllib.parse
data = {"where":{"date":"03/30/20"}}
print(urllib.parse.quote_plus(data))
When I do that I get the following
Traceback (most recent call last):
File "C:\Users\Johnathan\Desktop\Python Snippets\test_func.py", line 17, in <module>
print(urllib.parse.quote_plus(data))
File "C:\Users\Johnathan\AppData\Local\Programs\Python\Python38-32\lib\urllib\parse.py", line 855, in quote_plus
string = quote(string, safe + space, encoding, errors)
File "C:\Users\Johnathan\AppData\Local\Programs\Python\Python38-32\lib\urllib\parse.py", line 839, in quote
return quote_from_bytes(string, safe)
File "C:\Users\Johnathan\AppData\Local\Programs\Python\Python38-32\lib\urllib\parse.py", line 864, in quote_from_bytes
raise TypeError("quote_from_bytes() expected bytes")
TypeError: quote_from_bytes() expected bytes
I've tried a couple of other methods and received:?where=%7B%27date%27%3A+%2703%2F30%2F20%27%7D
Long Story Short, I need to url encode the following
data = {"where":{"date":"03/30/20"}}
needed_encoded_data = ?where=%7B%22date%22%3A%20%2203%2F30%2F20%22%7D
Thanks

where is a dictionary - that can't be url-encoded. You need to turn that into a string or bytes object first.
You can do that with json.dumps
import json
import urllib.parse
data = {"where":{"date":"03/30/20"}}
print(urllib.parse.quote_plus(json.dumps(data)))
Output:
%7B%22where%22%3A+%7B%22date%22%3A+%2203%2F30%2F20%22%7D%7D

Related

Unable to read json file with python

I am reading a json file with python using below code:
import json
Ums = json.load(open('commerceProduct.json'))
for um in Ums :
des = um['description']
if des == None:
um['description'] = "Null"
with open("sample.json", "w") as outfile:
json.dump(um, outfile)
break
It is giving me the following error:
Traceback (most recent call last):
File "test.py", line 2, in <module>
Ums = json.load(open('commerceProduct.json'))
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/json/__init__.py", line 293, in load
return loads(fp.read(),
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/json/decoder.py", line 340, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 1 column 5528 (char 5527)
while I am checking the json file, it looks fine.
The thing is it has one object on one line with deliminator being '\n'.
It is not corrupted since i have imported the same file in mongo.
Can someone please suggest what can be wrong in it ?
Thanks.
your JSON data is not in a valid format, one miss will mess up the python parser. Try to test your JSON data here, make sure it is in a correct format.
this return _default_decoder.decode(s) is returned when the python parser find somthing wrong in your json.
The code is valid and will work with a valid json doc.
You have one json object per line? That's not a valid json file. You have newline-delimited json, so consider using the ndjson package to read it. It has the same API as the json package you are familiar with.
import ndjson
Ums = ndjson.load(open('commerceProduct.json'))
...

What's the most Pythonic way to parse out this value from a JSON-like blob?

See below. Given a well-known Google URL, I'm trying to retrieve data from that URL. That data will provide me another Google URL from which I can retrieve a list of JWKs.
>>> import requests, json
>>> open_id_config_url = 'https://ggp.sandbox.google.com/.well-known/openid-configuration'
>>> response = requests.get(open_id_config_url)
>>> r.status_code
200
>>> response.text
u'{\n "issuer": "https://www.stadia.com",\n "jwks_uri": "https://www.googleapis.com/service_accounts/v1/jwk/stadia-jwt#system.gserviceaccount.com",\n "claims_supported": [\n "iss",\n "aud",\n "sub",\n "iat",\n "exp",\n "s_env",\n "s_app_id",\n "s_gamer_tag",\n "s_purchase_country",\n "s_current_country",\n "s_session_id",\n "s_instance_ip",\n "s_restrict_text_chat",\n "s_restrict_voice_chat",\n "s_restrict_multiplayer",\n "s_restrict_stream_connect",\n ],\n "id_token_signing_alg_values_supported": [\n "RS256"\n ],\n}'
Above I have successfully retrieved the data from the first URL. I can see the entry jwks_uri contains the second URL I need. But when I try to convert that blob of text to a python dictionary, it fails.
>>> response.json()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/saqib.ali/saqib-env-99/lib/python2.7/site-packages/requests/models.py", line 889, in json
self.content.decode(encoding), **kwargs
File "/usr/local/Cellar/python#2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/local/Cellar/python#2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/Cellar/python#2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 382, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
>>> json.loads(response.text)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/Cellar/python#2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/local/Cellar/python#2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/Cellar/python#2/2.7.16/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 382, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
The only way I can get out the JWKs URL is by doing this ugly regular expression parsing:
>>> re.compile('(?<="jwks_uri": ")[^"]+').findall(response.text)[0]
u'https://www.googleapis.com/service_accounts/v1/jwk/stadia-jwt#system.gserviceaccount.com'
Is there a cleaner, more Pythonic way to extract this string?
I really wish Google would send back a string that could be cleanly JSON-ified.
The returned json string is incorrect because last item of the dictionary ends with ,, which json cannot parse.
": [\n "RS256"\n ],\n}'
^^^
But ast.literal_eval can do that (as python parsing accepts lists/dicts that end with a comma). As long as you don't have booleans or null values, it is possible and pythonic
>>> ast.literal_eval(response.text)["jwks_uri"]
'https://www.googleapis.com/service_accounts/v1/jwk/stadia-jwt#system.gserviceaccount.com'
Your JSON is invalid because it has an extra comma after the last value in the claims_supported array.
I wouldn't necessarily recommend it, but you could use the similarity of JSON and Python syntax to parse this directly, since Python is much less picky:
ast.literal_eval(response.tezt)
As suggested in this answer use yaml to parse json. It will tolerate the trailing comma as well as other deviations from the json standard.
import yaml
d = yaml.load(response.text)

Getting TypeError: ord() expected string of length 1, but int found error

Code is
from PyPDF2 import PdfFileReader
with open('HTTP_Book.pdf','rb') as file:
pdf=PdfFileReader(file)
pagedd=pdf.getPage(0)
print(pagedd.extractText())
This code raises the error shown below:
TypeError: ord() expected string of length 1, but int found
I searched on internet and found this Troubleshooting "TypeError: ord() expected string of length 1, but int found"
but it doesn't help much. I am aware of what is the background of this error but not sure how is it related here?
Tried changing the pdf file and it works fine. Then what is wrong: pdf file or PyPDF2 is not able to handle it? I know this method is not much reliable as per documentation:
This works well for some PDF files, but poorly for others, depending on the generator used
How should this be handled?
Traceback:
Traceback (most recent call last):
File "pdf_reader.py", line 71, in <module>
print(pagedd.extractText())
File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\pdf.py", line 2595, in ex
tractText
content = ContentStream(content, self.pdf)
File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\pdf.py", line 2673, in __
init__
stream = BytesIO(b_(stream.getData()))
File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\generic.py", line 841, in
getData
decoded._data = filters.decodeStreamData(self)
File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\filters.py", line 350, in
decodeStreamData
data = LZWDecode.decode(data, stream.get("/DecodeParms"))
File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\filters.py", line 255, in
decode
return LZWDecode.decoder(data).decode()
File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\filters.py", line 228, in
decode
cW = self.nextCode();
File "C:\Users\Jeet\AppData\Local\Programs\Python\Python37\lib\site-packages\PyPDF2\filters.py", line 205, in
nextCode
nextbits=ord(self.data[self.bytepos])
TypeError: ord() expected string of length 1, but int found
I got the issue. This is just a limitation of PyPDF2. I used tika and BeautifulSoup to parse and extract the text, it worked fine. Although it needs little more work.
from tika import parser
from bs4 import BeautifulSoup
raw=parser.from_file('HTTP_Book.pdf',xmlContent=True)['content']
data=BeautifulSoup(raw,'lxml')
message=data.find(class_='page') # for first page
print(message.text)

ElasticSearch : TypeError: expected string or buffer?

I wrote a simple python script to put the JSON file to Elasticsearch.I want to store it based on the id field I am extracting from the JSON file.
But when I try to insert into elastic search.It raises an error TypeError: expected string or buffer
Here is the code I am working on...
#! /usr/bin/python
import requests
from elasticsearch import Elasticsearch
import json
es = Elasticsearch([{'host':'localhost','port':9200}])
r = requests.get('http://127.0.0.1:9200')
i = 1
if r.status_code == 200:
with open('d1.json') as json_data:
d = json.load(json_data)
for i in d['tc'][0]['i]['f']['h']:
if i['name'] == 'm':
m = i['value']
dope=str(m)
print dope
print type(dope)
#print type(md5)
es.index(index='lab', doc_type='report',id=dope,body=json.loads(json_data))
Error Log:
44d88612fea8a8f36de82e1278abb02f
<type 'str'>
Traceback (most recent call last):
File "elastic_insert.py", line 22, in <module>
es.index(index='labs', doc_type='report',id=dope,body=json.loads(json_data))
File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
TypeError: expected string or buffer
Any suggestions on how to solve this error.I even tried to convert the m to int but it gave another error.
int(m)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '44d88612fea8a8f36de82e1278abb02f'
P.S: ElasticSearch service is up and running.
The problem is not related to the id. the problem is with
"json_data". it is a file stream so you need json.load and not json.loads in your es.index

ValueError: unknown url type

The title pretty much says it all. Here's my code:
from urllib2 import urlopen as getpage
print = getpage("www.radioreference.com/apps/audio/?ctid=5586")
and here's the traceback error I get:
Traceback (most recent call last):
File "C:/Users/**/Dropbox/Dev/ComServ/citetest.py", line 2, in <module>
contents = getpage("www.radioreference.com/apps/audio/?ctid=5586")
File "C:\Python25\lib\urllib2.py", line 121, in urlopen
return _opener.open(url, data)
File "C:\Python25\lib\urllib2.py", line 366, in open
protocol = req.get_type()
File "C:\Python25\lib\urllib2.py", line 241, in get_type
raise ValueError, "unknown url type: %s" % self.__original
ValueError: unknown url type: www.radioreference.com/apps/audio/?ctid=5586
My best guess is that urllib can't retrieve data from untidy php URLs. if this is the case, is there a work around? If not, what am I doing wrong?
You should first try to add 'http://' in front of the url. Also, do not store the results in print, as it is binding the reference to another (non callable) object.
So this line should be:
page_contents = getpage("http://www.radioreference.com/apps/audio/?ctid=5586")
This returns a file like object. To read its contents you need to use different file manipulation methods, like this:
for line in page_contents.readlines():
print line
You need to pass a full URL: ie it must begin with http://.
Simply use http://www.radioreference.com/apps/audio/?ctid=5586 and it'll work fine.
In [24]: from urllib2 import urlopen as getpage
In [26]: print getpage("http://www.radioreference.com/apps/audio/?ctid=5586")
<addinfourl at 173987116 whose fp = <socket._fileobject object at 0xa5eb6ac>>

Categories

Resources