I am trying to urlencode this string before I submit.
queryString = 'eventName=' + evt.fields["eventName"] + '&' + 'eventDescription=' + evt.fields["eventDescription"];
Python 2
What you're looking for is urllib.quote_plus:
import urllib
safe_string = urllib.quote_plus('string_of_characters_like_these:$#@=?%^Q^$')
#Value: 'string_of_characters_like_these%3A%24%23%40%3D%3F%25%5EQ%5E%24'
Python 3
In Python 3, the urllib package has been broken into smaller components. You'll use urllib.parse.quote_plus (note the parse child module)
import urllib.parse
safe_string = urllib.parse.quote_plus(...)
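A minimal sketch of building the question's query string with quote_plus (the field values below are made-up placeholders standing in for evt.fields):
import urllib.parse

fields = {'eventName': 'my event', 'eventDescription': 'cool & fun'}  # placeholder values
queryString = ('eventName=' + urllib.parse.quote_plus(fields['eventName'])
               + '&eventDescription=' + urllib.parse.quote_plus(fields['eventDescription']))
print(queryString)  # eventName=my+event&eventDescription=cool+%26+fun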
You need to pass your parameters into urlencode() as either a mapping (dict), or a sequence of 2-tuples, like:
>>> import urllib
>>> f = { 'eventName' : 'myEvent', 'eventDescription' : 'cool event'}
>>> urllib.urlencode(f)
'eventName=myEvent&eventDescription=cool+event'
Python 3 or above
Use urllib.parse.urlencode:
>>> import urllib.parse
>>> urllib.parse.urlencode(f)
'eventName=myEvent&eventDescription=cool+event'
Note that this does not do url encoding in the commonly used sense (look at the output). For that use urllib.parse.quote_plus.
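To make the difference concrete, here is a small sketch with illustrative values: urlencode builds a whole query string from a mapping, while quote_plus escapes a single value.
import urllib.parse

params = {'q': 'a b&c', 'lang': 'en'}
urllib.parse.urlencode(params)    # 'q=a+b%26c&lang=en'
urllib.parse.quote_plus('a b&c')  # 'a+b%26c'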
Try requests instead of urllib and you don't need to bother with urlencode!
import requests
requests.get('http://youraddress.com', params=evt.fields)
EDIT:
If you need ordered name-value pairs or multiple values for a name then set params like so:
params=[('name1','value11'), ('name1','value12'), ('name2','value21'), ...]
instead of using a dictionary.
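A minimal sketch of that (youraddress.com is the placeholder host from above); building the request without sending it shows the URL that requests would encode:
import requests

req = requests.Request('GET', 'http://youraddress.com',
                       params=[('name1', 'value11'), ('name1', 'value12'), ('name2', 'value21')])
print(req.prepare().url)
# http://youraddress.com/?name1=value11&name1=value12&name2=value21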
Context
Python (version 2.7.2)
Problem
You want to generate a urlencoded query string.
You have a dictionary or object containing the name-value pairs.
You want to be able to control the output ordering of the name-value pairs.
Solution
urllib.urlencode
urllib.quote_plus
Pitfalls
dictionary output has arbitrary ordering of name-value pairs
(see also: Why is python ordering my dictionary like so?)
(see also: Why is the order in dictionaries and sets arbitrary?)
handling cases when you DO NOT care about the ordering of the name-value pairs
handling cases when you DO care about the ordering of the name-value pairs
handling cases where a single name needs to appear more than once in the set of all name-value pairs (see the sketch after the example below)
Example
The following is a complete solution, including how to deal with some pitfalls.
### ********************
## init python (version 2.7.2 )
import urllib
### ********************
## first setup a dictionary of name-value pairs
dict_name_value_pairs = {
"bravo" : "True != False",
"alpha" : "http://www.example.com",
"charlie" : "hello world",
"delta" : "1234567 !##$%^&*",
"echo" : "user#example.com",
}
### ********************
## setup an exact ordering for the name-value pairs
ary_ordered_names = []
ary_ordered_names.append('alpha')
ary_ordered_names.append('bravo')
ary_ordered_names.append('charlie')
ary_ordered_names.append('delta')
ary_ordered_names.append('echo')
### ********************
## show the output results
if('NO we DO NOT care about the ordering of name-value pairs'):
queryString = urllib.urlencode(dict_name_value_pairs)
print queryString
"""
echo=user%40example.com&bravo=True+%21%3D+False&delta=1234567+%21%40%23%24%25%5E%26%2A&charlie=hello+world&alpha=http%3A%2F%2Fwww.example.com
"""
if('YES we DO care about the ordering of name-value pairs'):
queryString = "&".join( [ item+'='+urllib.quote_plus(dict_name_value_pairs[item]) for item in ary_ordered_names ] )
print queryString
"""
alpha=http%3A%2F%2Fwww.example.com&bravo=True+%21%3D+False&charlie=hello+world&delta=1234567+%21%40%23%24%25%5E%26%2A&echo=user%40example.com
"""
Python 3:
urllib.parse.quote_plus(string, safe='', encoding=None, errors=None)
Try this:
urllib.pathname2url(stringToURLEncode)
urlencode won't work because it only works on dictionaries. quote_plus didn't produce the correct output.
Note that the urllib.urlencode does not always do the trick. The problem is that some services care about the order of arguments, which gets lost when you create the dictionary. For such cases, urllib.quote_plus is better, as Ricky suggested.
In Python 3, this worked for me:
import urllib.parse
urllib.parse.quote(query)
For future reference (e.g. for Python 3):
>>> import urllib.request as req
>>> query = 'eventName=theEvent&eventDescription=testDesc'
>>> req.pathname2url(query)
'eventName%3DtheEvent%26eventDescription%3DtestDesc'
If urllib.parse.urlencode() is giving you errors, then try the urllib3 module.
The syntax is as follows:
import urllib3
urllib3.request.urlencode({"user" : "john" })
For use in scripts/programs which need to support both python 2 and 3, the six module provides quote and urlencode functions:
>>> from six.moves.urllib.parse import urlencode, quote
>>> data = {'some': 'query', 'for': 'encoding'}
>>> urlencode(data)
'some=query&for=encoding'
>>> url = '/some/url/with spaces and %;!<>&'
>>> quote(url)
'/some/url/with%20spaces%20and%20%25%3B%21%3C%3E%26'
import urllib.parse
query = 'Hellö Wörld@Python'
urllib.parse.quote(query)  # returns 'Hell%C3%B6%20W%C3%B6rld%40Python'
Another thing that may not have been mentioned already is that urllib.urlencode() will encode a None value in the dictionary as the string 'None' instead of omitting that parameter. I don't know whether this is typically desired or not, but it does not fit my use case, hence I have to use quote_plus.
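A small sketch of that behaviour, with one way to drop the None values before encoding (the dict is illustrative; shown with the Python 3 urllib.parse.urlencode, which behaves the same way here):
from urllib.parse import urlencode

params = {'a': None, 'b': 'x'}  # illustrative values
urlencode(params)  # 'a=None&b=x'  -- None becomes the literal string 'None'
# one option: filter out None values before encoding
urlencode({k: v for k, v in params.items() if v is not None})  # 'b=x'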
For Python 3, urllib3 works properly; you can use it as follows, as per its official docs:
import urllib3
http = urllib3.PoolManager()
response = http.request(
    'GET',
    'https://api.prylabs.net/eth/v1alpha1/beacon/attestations',
    fields={  # fields are the query params
        'epoch': 1234,
        'pageSize': 250  # example page size
    }
)
data = response.data.decode('utf-8')
If you don't want to use urllib.
https://github.com/wayne931121/Python_URL_Decode
# percent-encoding for reserved characters
URL_RFC_3986 = {
    "!": "%21", "#": "%23", "$": "%24", "&": "%26", "'": "%27", "(": "%28", ")": "%29", "*": "%2A", "+": "%2B",
    ",": "%2C", "/": "%2F", ":": "%3A", ";": "%3B", "=": "%3D", "?": "%3F", "@": "%40", "[": "%5B", "]": "%5D",
}
def url_encoder(b):
    # https://zh.wikipedia.org/wiki/%E7%99%BE%E5%88%86%E5%8F%B7%E7%BC%96%E7%A0%81 (percent-encoding)
    if type(b) == bytes:
        b = b.decode(encoding="utf-8")  # decode so we iterate over characters, not raw bytes
    result = bytearray()  # bytearray is writable; bytes is read-only
    for i in b:
        if i in URL_RFC_3986:
            # reserved character: append its percent-encoded form
            for j in URL_RFC_3986[i]:
                result.append(ord(j))
            continue
        i = bytes(i, encoding="utf-8")
        if len(i) == 1:
            # single-byte (ASCII) character: keep as-is
            result.append(ord(i))
        else:
            # multi-byte UTF-8 character: percent-encode every byte
            for c in i:
                c = hex(c)[2:].upper()
                result.append(ord("%"))
                result.append(ord(c[0:1]))
                result.append(ord(c[1:2]))
    result = result.decode(encoding="ascii")
    return result
#print(url_encoder("我好棒==%%0.0:)")) ==> '%E6%88%91%E5%A5%BD%E6%A3%92%3D%3D%%0.0%3A%29'
Related
I have the following string in Python:
b'{"personId":"65a83de6-b512-4410-81d2-ada57f18112a","persistedFaceIds":["792b31df-403f-4378-911b-8c06c06be8fa"],"name":"waqas"}'
I want to print the value that follows the key "name", so that my output is
waqas
Note that waqas can change to any other value, so I want to print whatever name follows the key "name", using string operations or a regex.
First you need to decode the string, since it is a bytes object (the b prefix). Then use ast.literal_eval to build the dictionary, and you can access the value by key:
>>> s = b'{"personId":"65a83de6-b512-4410-81d2-ada57f18112a","persistedFaceIds":["792b31df-403f-4378-911b-8c06c06be8fa"],"name":"waqas"}'
>>> import ast
>>> ast.literal_eval(s.decode())['name']
'waqas'
It is likely you should be reading your data into your program in a different manner than you are doing now.
If I assume your data is inside a JSON file, try something like the following, using the built-in json module:
import json
with open(filename) as fp:
data = json.load(fp)
print(data['name'])
If you want a more algorithmic way to extract the value of name:
s = b'{"personId":"65a83de6-b512-4410-81d2-ada57f18112a",\
"persistedFaceIds":["792b31df-403f-4378-911b-8c06c06be8fa"],\
"name":"waqas"}'
s = s.decode("utf-8")
key = '"name":"'
start = s.find(key) + len(key)
stop = s.find('"', start + 1)
extracted_string = s[start : stop]
print(extracted_string)
output
waqas
You can convert the string into a dictionary with json.loads()
import json
mystring = b'{"personId":"65a83de6-b512-4410-81d2-ada57f18112a","persistedFaceIds":["792b31df-403f-4378-911b-8c06c06be8fa"],"name":"waqas"}'
mydict = json.loads(mystring)
print(mydict["name"])
# output 'waqas'
First you need to convert the bytes object into a proper JSON string by decoding it. Suppose you have a variable x:
import json
x = x.decode()       # turn the bytes object into a str
d = json.loads(x)    # convert the JSON string into a dictionary
print(d["name"])
I'm stuck. I have JSON-like data with characters such as ' inside the values, and a mix of ' and " in the syntax.
Example mixing double quotes and single quotes:
json ={
'key': "val_'_ue",
'secondkey': 'value'
}
With json.loads and json.dumps I get a str type, not a dict to iterate over. Any ideas how to fix this?
print(postParams)# = {'csrf-token': "TOKEN_INCLUDES_'_'_symbols", 'param2': 'params2value'}
jsn_dict2 = json.loads(json.dumps(postParams))
print(type(jsn_dict2)) # ERROR HERE why str and not dict
for key, val in jsn_dict2.items():
print("key="+str(key))
You don't need to dumps() data that is already a JSON string:
jsn_dict = json.loads(json.dumps(res))
should be :
jsn_dict = json.loads(res)
UPDATE
According to the comments, the data looks like this:
postParams = "{'csrf-token': \"TOKEN_INCLUDES_'_'_symbols\", 'add-to-your-blog-submit-button': 'add-to-your-blog-submit-button'}"
I found a library that can handle a malformed JSON string like this one.
First run:
pip install demjson
Then this code can help you:
from demjson import decode
data = decode(postParams)
data
>>> {'csrf-token': "TOKEN_INCLUDES_'_'_symbols",
'add-to-your-blog-submit-button': 'add-to-your-blog-submit-button'}
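Since that postParams string is really a Python dict literal (single-quoted keys, mixed quoting) rather than valid JSON, another option is ast.literal_eval from the standard library, with no extra install; a minimal sketch:
import ast

postParams = "{'csrf-token': \"TOKEN_INCLUDES_'_'_symbols\", 'add-to-your-blog-submit-button': 'add-to-your-blog-submit-button'}"
data = ast.literal_eval(postParams)  # safely evaluate the dict literal
print(type(data))  # <class 'dict'>
for key, val in data.items():
    print(key, "=", val)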
In your JSON you have missed the "," comma separating the two keys. The actual structure of the JSON is:
json_new ={
'key': "val_'_ue",
'secondkey': 'value'
}
Use
json_actual = json.dumps(json_new)
and to read it back,
json_read = json.loads(json_actual)
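Putting those two calls together (using the json_new dict from above):
import json

json_new = {
    'key': "val_'_ue",
    'secondkey': 'value'
}
json_actual = json.dumps(json_new)   # serialize the dict to a JSON string
json_read = json.loads(json_actual)  # parse it back into a dict
print(type(json_read))   # <class 'dict'>
print(json_read['key'])  # val_'_ue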
I am trying to run a scraper I found online but receive a ValueError: too many values to unpack on this line of code
k, v = piece.split("=")
This line is part of this function
def format_url(url):
# make sure URLs aren't relative, and strip unnecessary query args
u = urlparse(url)
scheme = u.scheme or "https"
host = u.netloc or "www.amazon.com"
path = u.path
if not u.query:
query = ""
else:
query = "?"
for piece in u.query.split("&"):
k, v = piece.split("=")
if k in settings.allowed_params:
query += "{k}={v}&".format(**locals())
query = query[:-1]
return "{scheme}://{host}{path}{query}".format(**locals())
If you have any input it would be appreciated, thank you.
Instead of parsing the URLs yourself, you can use the urlparse.parse_qs function:
>>> from urlparse import urlparse, parse_qs
>>> URL = 'https://someurl.com/with/query_string?i=main&mode=front&sid=12ab&enc=+Hello'
>>> parsed_url = urlparse(URL)
>>> parse_qs(parsed_url.query)
{'i': ['main'], 'enc': [' Hello'], 'mode': ['front'], 'sid': ['12ab']}
This is due to the fact that one of the pieces contains two or more '=' characters: in that case the split returns a list of three or more elements, which cannot be unpacked into the two variables.
You can solve that problem by splitting on at most one '=', passing a maxsplit argument to the .split(..) call:
k, v = piece.split("=", 1)
But we still have no guarantee that there is an '=' in the piece string at all.
We can, however, use the urllib.parse module in Python 3.x (urlparse in Python 2.x):
from urllib.parse import urlparse, parse_qsl
purl = urlparse(url)
quer = parse_qsl(purl.query)
for k,v in quer:
# ...
pass
Now we have decoded the query string into a list of key-value tuples that we can process separately. I would advise building the URL back up with urllib as well.
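A minimal sketch of that last suggestion, assuming the same kind of allowed-params filter as settings.allowed_params in the question:
from urllib.parse import urlparse, parse_qsl, urlencode, urlunparse

def format_url(url, allowed_params):
    u = urlparse(url)
    scheme = u.scheme or "https"
    host = u.netloc or "www.amazon.com"
    # keep only the allowed query parameters, in their original order
    query = urlencode([(k, v) for k, v in parse_qsl(u.query) if k in allowed_params])
    # urlunparse takes (scheme, netloc, path, params, query, fragment)
    return urlunparse((scheme, host, u.path, u.params, query, ""))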
You haven't shown any basic debugging: what is piece at the problem point? If it has more than a single = in the string, the split operation will return more than 2 values -- hence your error message.
If you want to split on only the first =, then use index to get the location, and grab the slices you need:
pos = piece.index('=')
k = piece[:pos]
v = piece[pos+1:]
I have a very long string as the output of a function, as follows:
tmp = <"last seen":1568,"reviews":[{"id":15869,"author":"abnbvg","changes":........>
How will I fetch the "id":15869 out of it?
The string content looks like JSON, so either use the json module or use a regular expression to extract the specific string you need.
The data looks like a JSON string. Use:
try:
import json
except ImportError:
import simplejson as json
tmp = '"last seen":1568,"reviews":[{"id":15869,"author":"abnbvg"}]'
data = json.loads('{{{}}}'.format(tmp))
>>> print data
{u'reviews': [{u'id': 15869, u'author': u'abnbvg'}], u'last seen': 1568}
>>> print data['reviews'][0]['id']
15869
Note that I wrapped the string in { and } to make a dictionary. You might not have to do that if the actual JSON string is already encapsulated with braces.
If id is the only thing you need from the string and it will always be something like {"id":15869,"author":"abnbvg"..., then you can go with a simple string split instead of a JSON conversion.
tmp = '"last seen":1568,"reviews" : [{"id":15869,"author":"abnbvg","changes":........'
tmp1 = tmp.split('"id":', 1)[1]
id = tmp1.split(",", 1)[0]
Please note that the tmp1 line may raise an IndexError if there is no "id" key in the string. You could use index -1 instead of 1 to sidestep the exception, but handling it explicitly lets you report that "id" is not found:
try:
tmp1 = tmp.split('"id":', 1)[1]
id = tmp1.split(",", 1)[0]
except IndexError:
print "id key is not present in the json"
id = None
If you really do need more variables from the JSON string, go with mhawke's solution of converting the JSON to a dictionary and getting the values. You can use ast.literal_eval:
from ast import literal_eval
tmp = '"last seen":1568,"reviews" : [{"id":15869,"author":"abnbvg","changes":........'
tmp_dict = literal_eval("""{%s}"""%(tmp))
print tmp_dict["reviews"][0]["id"]
In the second case, if you need to collect all the "id" values in the list, this will help:
id_list =[]
for id_dict in tmp_dict["reviews"]:
id_list.append(id_dict["id"])
print id_list
I have a section of a log file that looks like this:
"/log?action=End&env=123&id=8000&cat=baseball"
"/log?action=start&get=3210&rsa=456&key=golf"
I want to parse out each section so the results would look like this:
('/log?action=', 'End', 'env=123', 'id=8000', 'cat=baseball')
('/log?action=', 'start', 'get=3210', 'rsa=456', 'key=golf')
I've looked into regex and matching, but a lot of my logs have different sequences which leads me to believe that it is not possible. Any suggestions?
This is clearly a fragment of a URL, so the best way to parse it is to use URL parsing tools. The stdlib comes with urlparse, which does exactly what you want.
For example:
>>> import urlparse
>>> s = "/log?action=End&env=123&id=8000&cat=baseball"
>>> bits = urlparse.urlparse(s)
>>> variables = urlparse.parse_qs(bits.query)
>>> variables
{'action': ['End'], 'cat': ['baseball'], 'env': ['123'], 'id': ['8000']}
If you really want to get the format you asked for, you can use parse_qsl instead, and then join the key-value pairs back together. I'm not sure why you want the /log to be included in the first query variable, or the first query variable's value to be separate from its variable, but even that is doable if you insist:
>>> variables = urlparse.parse_qsl(s)
>>> result = (variables[0][0] + '=', variables[0][1]) + tuple(
'='.join(kv) for kv in variables[1:])
>>> result
('/log?action=', 'End', 'env=123', 'id=8000', 'cat=baseball')
If you're using Python 3.x, just change the urlparse to urllib.parse, and the rest is exactly the same.
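For reference, the same steps in Python 3 (only the imports and the print calls change):
from urllib.parse import urlparse, parse_qs, parse_qsl

s = "/log?action=End&env=123&id=8000&cat=baseball"
bits = urlparse(s)
print(parse_qs(bits.query))
# {'action': ['End'], 'env': ['123'], 'id': ['8000'], 'cat': ['baseball']}
variables = parse_qsl(s)
result = (variables[0][0] + '=', variables[0][1]) + tuple('='.join(kv) for kv in variables[1:])
print(result)
# ('/log?action=', 'End', 'env=123', 'id=8000', 'cat=baseball')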
You can split a couple times:
s = '/log?action=End&env=123&id=8000&cat=baseball'
L = s.split("&")
L[0:1]=L[0].split("=")
Output:
['/log?action', 'End', 'env=123', 'id=8000', 'cat=baseball']
It's a bit hard to say without knowing what the domain of possible inputs is, but here's a guess at what will work for you:
log = "/log?action=End&env=123&id=8000&cat=baseball\n/log?action=start&get=3210&rsa=456&key=golf"
logLines = [line.split("&") for line in log.split('\n')]
logLines = [tuple(line[0].split("=")+line[1:]) for line in logLines]
print logLines
OUTPUT:
[('/log?action', 'End', 'env=123', 'id=8000', 'cat=baseball'),
('/log?action', 'start', 'get=3210', 'rsa=456', 'key=golf')]
This assumes that you don't really need the "=" at the end of the first string.