I am new to Python and I am struggling with encoding.
I have a list of strings like this:
keys = ["u'part-00000-6edc0ee4-de74-4f82-9f8c-b4c965896224-c000.csv'",
" u'part-00001-6edc0ee4-de74-4f82-9f8c-b4c965896224-c000.csv'"]
I do this to encode
keys = [x.encode('UTF-8') for x in keys]
However, I am getting a "b" prepended to each item; the result is:
[b"u'part-00000-6edc0ee4-de74-4f82-9f8c-b4c965896224-c000.csv'",
b" u'part-00001-6edc0ee4-de74-4f82-9f8c-b4c965896224-c000.csv'"]
I thought it would be simpler to just encode with utf-8
What am I doing wrong?
You should first try fixing the method you use to obtain your original list of strings, but if you have no control on that, you can use the following:
>>> import ast
>>> [ast.literal_eval(i.strip()) for i in keys]
The result should be
[u'part-00000-6edc0ee4-de74-4f82-9f8c-b4c965896224-c000.csv',
u'part-00001-6edc0ee4-de74-4f82-9f8c-b4c965896224-c000.csv']
for Python 2, and
['part-00000-6edc0ee4-de74-4f82-9f8c-b4c965896224-c000.csv',
'part-00001-6edc0ee4-de74-4f82-9f8c-b4c965896224-c000.csv']
for Python 3.
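For reference, here is a minimal Python 3 sketch of the two behaviours discussed above. str.encode returns a bytes object, which is why the b prefix shows up, while ast.literal_eval parses the embedded u'...' literal back into a plain string. The keys list is copied from the question; the names encoded and cleaned are only illustrative.

```python
import ast

keys = ["u'part-00000-6edc0ee4-de74-4f82-9f8c-b4c965896224-c000.csv'",
        " u'part-00001-6edc0ee4-de74-4f82-9f8c-b4c965896224-c000.csv'"]

# Encoding turns each str into bytes; the b"..." prefix is just how bytes are displayed.
encoded = [x.encode("UTF-8") for x in keys]
print(type(encoded[0]).__name__)

# Each item is itself the text of a string literal, so parse it instead of encoding it.
# strip() removes the stray leading space before the literal is evaluated.
cleaned = [ast.literal_eval(x.strip()) for x in keys]
print(cleaned)
```

The b is not added to the data; the underlying problem is that each list item still contains the literal characters u'...' from an earlier repr-style conversion.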
I tried the code below, but it raised:
TypeError: string indices must be integers
from nsepython import *
import pandas as pd
import json
positions = nse_optionchain_scrapper ('HDFC')
json_decode = json.dumps(positions,indent = 4, sort_keys=True,separators =(". ", " = "))
print(json_decode['data'][0]['identifier'])
print(json_decode['filtered']['data'][0]['PE']['identifier'])
json_decode = json.dumps(positions,indent = 4, sort_keys=True,separators =(". ", " = "))
You can't build a JSON string (which is what json.dumps does) and then try to access part of the result as if it were the original data structure. Your json_decode is just a string, not a dict; as far as Python is concerned it has no structure beyond the individual characters that make it up.
If you want to access parts of the data, just use positions directly:
print(positions['data'][0]['identifier'])
You can encode just that bit to JSON if you like:
print(json.dumps(positions['data'][0]['identifier']))
but that's probably just a quoted string in this case.
So I'm not sure what your goal is. If you want to print out the JSON version of positions, great, just print it out. But the JSON form is for input and output only; it's not suitable for messing around with inside your Python code.
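To make the distinction concrete, here is a small sketch. The positions dict is a hypothetical stand-in for whatever nse_optionchain_scrapper returns (its real shape is not shown here), and the value "EXAMPLE-ID" is invented for illustration.

```python
import json

# Hypothetical stand-in for the scraper's return value; the real data is larger.
positions = {"data": [{"identifier": "EXAMPLE-ID"}]}

# json.dumps serialises the structure into one flat string.
as_text = json.dumps(positions, indent=4, sort_keys=True)
print(type(as_text).__name__)

# Indexing that string with a key reproduces the error from the question.
raised = False
try:
    as_text["data"]
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# The original object is still a dict and can be indexed normally.
print(positions["data"][0]["identifier"])

# json.loads turns the text back into an equivalent structure if one is needed.
round_trip = json.loads(as_text)
```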
My data.json is
{"a":[{"b":{"c":{ "foo1":1, "foo2":2, "foo3":3, "foo4":4}}}],"d":[{"e":{"bar1":1, "bar2":2, "bar3":3, "bar4":4}}]}
I am able to list the key/value pairs. My code is:
#! /usr/bin/python
import json
from pprint import pprint
with open('data2.json') as data_file:
    data = json.load(data_file)
pprint(data["d"][0]["e"])
Which gives me:
{u'bar1': 1, u'bar2': 2, u'bar3': 3, u'bar4': 4}
But I want to display only the keys, without the quotes and the u prefix, like this:
bar1, bar2, bar3, bar4
Can anybody suggest anything? It need not be in Python; a shell script would also do.
The keys of this object are instances of the unicode string class, so when the dict that holds them is printed, they appear as shown in your post.
This is because a dict builds its string form (__repr__ and/or __str__) from the repr of each object it contains. It shows you what objects reside in the dict, not what those objects look like when printed on their own. This is an important distinction, for example:
In [86]: print u'hi'
hi
In [87]: x = u'hi'
In [88]: x
Out[88]: u'hi'
In [89]: print x
hi
This should work for you, assuming that printing the keys together as one comma-separated unicode string is fine:
print ", ".join(data["d"][0]["e"])
You can achieve this using the keys member function from dict too, but it's not strictly necessary.
print ', '.join((data["d"][0]["e"].keys()))
data["d"][0]["e"] returns a dict. In Python 2, you can get the keys of that dict like this:
k = data["d"][0]["e"].keys()
print(", ".join(k))
In Python 3, keys() returns a view rather than a list. join accepts the view directly, but you can wrap it in a list if you need one:
k = list(data["d"][0]["e"].keys())
print(", ".join(k))
Even simpler, join will iterate over the keys of the dict.
print(", ".join(data["d"][0]["e"]))
Thanks to @thefourtheye for pointing this out.
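As a self-contained Python 3 check of the approach above, the following sketch loads the JSON text from the question with json.loads instead of reading a file. The names raw and joined are illustrative. It assumes Python 3.7 or later, where dicts keep insertion order.

```python
import json

# The JSON document from the question, inlined so the example needs no file.
raw = ('{"a":[{"b":{"c":{ "foo1":1, "foo2":2, "foo3":3, "foo4":4}}}],'
       '"d":[{"e":{"bar1":1, "bar2":2, "bar3":3, "bar4":4}}]}')
data = json.loads(raw)

# Iterating a dict yields its keys, so join can consume the dict directly.
joined = ", ".join(data["d"][0]["e"])
print(joined)  # bar1, bar2, bar3, bar4
```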
I'm trying to urlencode a dictionary in Python with urllib.urlencode. The problem is, I have to encode an array.
The result needs to be:
criterias%5B%5D=member&criterias%5B%5D=issue
#unquoted: criterias[]=member&criterias[]=issue
But the result I get is:
criterias=%5B%27member%27%2C+%27issue%27%5D
#unquoted: criterias=['member',+'issue']
I have tried several things, but I can't seem to get the right result.
import urllib
criterias = ['member', 'issue']
params = {
'criterias[]': criterias,
}
print urllib.urlencode(params)
If I use cgi.parse_qs to decode a correct query string, I get this as result:
{'criterias[]': ['member', 'issue']}
But if I encode that result, I get a wrong result back. Is there a way to produce the expected result?
The solution is far simpler than the ones listed above.
>>> import urllib
>>> params = {'criterias[]': ['member', 'issue']}
>>>
>>> print urllib.urlencode(params, True)
criterias%5B%5D=member&criterias%5B%5D=issue
Note the True, which sets the doseq parameter. See http://docs.python.org/library/urllib.html#urllib.urlencode for details.
As a side note, you do not need the [] for it to work as an array (which is why urllib does not include it). This means that you do not need to add the [] to all your array keys.
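The session above is Python 2. In Python 3 the same function lives in urllib.parse, and a sketch of the equivalent call, together with a round trip through parse_qs, might look like this (the names query and decoded are illustrative):

```python
from urllib.parse import urlencode, parse_qs

params = {"criterias[]": ["member", "issue"]}

# doseq=True emits one key=value pair per list item instead of the list's repr.
query = urlencode(params, doseq=True)
print(query)  # criterias%5B%5D=member&criterias%5B%5D=issue

# parse_qs reverses the encoding and collects repeated keys into lists.
decoded = parse_qs(query)
print(decoded)
```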
You can use a list of key-value pairs (tuples):
>>> urllib.urlencode([('criterias[]', 'member'), ('criterias[]', 'issue')])
'criterias%5B%5D=member&criterias%5B%5D=issue'
To abstract this out to work for any parameter dictionary and convert it into a list of tuples:
import urllib
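def url_encode_params(params=None):
    # Avoid a mutable default argument; treat None as an empty dict.
    if params is None:
        params = {}
    if not isinstance(params, dict):
        raise Exception("You must pass in a dictionary!")
    params_list = []
    for k, v in params.items():
        if isinstance(v, list):
            # Expand a list value into one (key, item) pair per item.
            params_list.extend([(k, x) for x in v])
        else:
            params_list.append((k, v))
    return urllib.urlencode(params_list)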
This should now work both for the example above and for a dictionary with a mix of strings and lists as values:
criterias = ['member', 'issue']
params = {
'criterias[]': criterias,
}
url_encode_params(params)
>>'criterias%5B%5D=member&criterias%5B%5D=issue'
Using a list comprehension over the values:
>>> criterias = ['member', 'issue']
>>> urllib.urlencode([('criterias[]', i) for i in criterias])
'criterias%5B%5D=member&criterias%5B%5D=issue'
>>>
Another option is the convention the AWS API uses for its GET URLs: params.0=foo&params.1=bar
However, the disadvantage is that you need to write the encoding and decoding code yourself to get back the result params=[foo, bar].
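A minimal Python 3 sketch of what that hand-written encoding and decoding could look like for the indexed name.N convention. The helper names encode_indexed and decode_indexed are hypothetical and not part of urllib or any AWS library.

```python
from urllib.parse import urlencode, parse_qsl


def encode_indexed(name, values):
    """Encode a list as name.0=..., name.1=... (hypothetical helper)."""
    pairs = [("%s.%d" % (name, i), v) for i, v in enumerate(values)]
    return urlencode(pairs)


def decode_indexed(name, query):
    """Collect name.N parameters from a query string, ordered by N."""
    prefix = name + "."
    indexed = []
    for key, value in parse_qsl(query):
        suffix = key[len(prefix):]
        if key.startswith(prefix) and suffix.isdigit():
            indexed.append((int(suffix), value))
    # Sort on the numeric index only, so params.10 comes after params.9.
    return [value for _, value in sorted(indexed, key=lambda pair: pair[0])]


query = encode_indexed("params", ["foo", "bar"])
print(query)  # params.0=foo&params.1=bar
print(decode_indexed("params", query))
```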
I have sample response with friends list from facebook:
[{u'uid': 513351886, u'name': u'Mohammed Hossein', u'pic_small': u'http://profile.ak.fbcdn.net/hprofile-ak-snc4/hs643.snc3/27383_513351886_4933_t.jpg'},
{u'uid': 516583220, u'name': u'Sim Salabim', u'pic_small': u'http://profile.ak.fbcdn.net/hprofile-ak-snc4/hs348.snc4/41505_516583220_5681339_t.jpg'}]
How can I go through this list and encode the keys of the dictionaries to ASCII? I've tried something like this:
response = simplejson.load(urllib.urlopen(REST_SERVER, data))
for k in response:
    for id, stuff in k.items():
        id.encode("ascii")
        logging.debug("id: %s" % id)
return response
But encoded keys are not saved and as a result I'm still getting unicode values.
First: do you really need to do this? The strings are in Unicode for a reason: you simply can't represent everything in plain ASCII that you can in Unicode. This probably won't be a problem for your dictionary keys 'uid', 'name' and 'pic_small'; but it probably won't be a problem to leave them as Unicode, either. (The 'simplejson' library does not know anything about your data, so it uses Unicode for every string - better safe than sorry.)
Anyway:
In Python, strings cannot be modified. The .encode method does not change the string; it returns a new string that is the encoded version.
What you want to do is produce a new dictionary, which replaces the keys with the encoded keys. We can do this by passing a generator of (encoded key, original value) pairs to the dict constructor.
That looks like:
dict((k.encode('ascii'), v) for (k, v) in original.items())
Similarly, we can use a list comprehension to apply this to every dictionary, and create the new list. (We can modify the list in-place, but this way is cleaner.)
response = simplejson.load(urllib.urlopen(REST_SERVER, data))
# We create the list of modified dictionaries, and re-assign 'response' to it:
response = [
    dict((k.encode('ascii'), v) for (k, v) in original.items()) # the modified version
    for original in response # of each original dictionary.
]
return response
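The two points above (encode returns a new object, and the dictionaries have to be rebuilt) can be checked with a short sketch. It is written for Python 3, where the encoded keys come out as bytes rather than Python 2 str. The response list is a trimmed, hand-written stand-in for the decoded JSON, not a live API call.

```python
# Trimmed stand-in for the decoded friends list from the question.
response = [
    {"uid": 513351886, "name": "Mohammed Hossein"},
    {"uid": 516583220, "name": "Sim Salabim"},
]

key = "uid"
encoded_key = key.encode("ascii")  # returns a new object
print(repr(key), repr(encoded_key))  # key itself is unchanged

# Rebuild every dictionary with encoded keys and the original values.
converted = [
    {k.encode("ascii"): v for k, v in original.items()}
    for original in response
]
print(converted[0])
```

In Python 3, text keys from a JSON decoder are already ordinary str objects, so a conversion like this is rarely needed there.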
Your other responses hint at this but don't come out and say it: in Python 2, dictionary lookup and string comparison transparently convert between Unicode and ASCII byte strings:
>>> x = {u'foo':'bar'} # unicode key, ascii value
>>> x['foo'] # look up by ascii
'bar'
>>> x[u'foo'] # or by unicode
'bar'
>>> x['foo'] == u'bar' # ascii value has a unicode equivalent
True
So for most uses of a dictionary converted from JSON, you don't usually need to worry about the fact that everything's Unicode.
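The session above relies on Python 2 behaviour. As a hedged point of comparison, this sketch shows that Python 3 does no such implicit conversion: str and bytes never compare equal, so a bytes key does not find a str entry.

```python
x = {"foo": "bar"}  # in Python 3 both key and value are text (str)

found_by_text = "foo" in x
found_by_bytes = b"foo" in x  # bytes and str are distinct types
values_equal = x["foo"] == b"bar"

print(found_by_text, found_by_bytes, values_equal)  # True False False
```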