I'm trying to read JSON from multiple REST endpoints using json.load. The key I'm interested in, however, has a different case depending on the endpoint (some are 'name', others are 'Name'). I've tried the following to get around it:
for x in urls:
    response_json = json.load(urllib.request.urlopen(x))
    try:
        projects.append(response_json['attributes']['Name'])
        print(response_json['attributes']['Name'])
    except:
        projects.append(response_json['attributes']['name'])
        print(response_json['attributes']['name'])
However, I'm getting KeyErrors for both permutations of the 'name' key. The code works fine if I remove the try...except and only specify one case, but then I need to know which case each URL uses. Why would that be? Is there a better approach?
You should only use try/except for error handling.
Instead of using try/except to branch your code, you can use a simple if-else condition check. It also lets you check for more conditions.
Keep in mind that the code below might not be exact since I don't have your JSON; however, it should guide you towards the solution.
if 'Name' in response_json['attributes']:
    print(response_json['attributes']['Name'])
elif 'name' in response_json['attributes']:
    print(response_json['attributes']['name'])
else:
    print(response_json['attributes'].keys())
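If you only need the value once, a chained dict.get is a more compact alternative. This is a sketch, not tested against your exact JSON:

attrs = response_json.get('attributes', {})
name = attrs.get('Name', attrs.get('name'))  # falls back to 'name', then to None
if name is not None:
    projects.append(name)
    print(name)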
I am trying to read sav files using pyreadstat in Python, but in some rare scenarios I get a UnicodeDecodeError because a string variable contains special characters.
To handle this, instead of loading the entire variable set I plan to load only the variables that do not trigger this error.
Below is the pseudo-code I have so far. It is not very efficient, since it checks each item of the list for an error using try/except.
# Read only the metadata to get information about the variables
df, meta = pyreadstat.read_sav('Test.sav', metadataonly=True)
columns = meta.column_names  # all variable names
result = []

for var in columns:
    print(var)
    try:
        df, meta = pyreadstat.read_sav('Test.sav', usecols=[str(var)])
        # No error means we can keep this variable
        result.append(var)
    except:
        pass

# Finally, load the sav with only the non-error variables
df, meta = pyreadstat.read_sav('Test.sav', usecols=result)
For a sav file with 1000+ variables this takes a long time to process.
I was thinking there might be a way to use a divide-and-conquer approach to do it faster. Below is my suggested approach, but I am not very good at implementing recursion algorithms. Could someone please help me with pseudo-code? It would be very helpful.
1. Take the list and try to read the sav file.
2. In case of no error, the output can be stored in result and then we read the sav file.
3. In case of error, split the list into 2 parts and run these again.
4. Step 3 needs to run again until we have lists that do not give any error.
Using the second approach, 90% of my sav files would be loaded on the first pass itself, so I think recursion is a good method.
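For reference, a minimal sketch of that divide-and-conquer idea, assuming the failure surfaces as a UnicodeDecodeError as in your case; the helper name readable_columns is made up for illustration:

import pyreadstat

def readable_columns(path, cols):
    # Try the whole batch first; on success, every column in it is fine.
    try:
        pyreadstat.read_sav(path, usecols=cols)
        return cols
    except UnicodeDecodeError:
        # A single failing column cannot be split further: drop it.
        if len(cols) == 1:
            return []
        mid = len(cols) // 2
        return (readable_columns(path, cols[:mid]) +
                readable_columns(path, cols[mid:]))

_, meta = pyreadstat.read_sav('Test.sav', metadataonly=True)
good = readable_columns('Test.sav', meta.column_names)
df, meta = pyreadstat.read_sav('Test.sav', usecols=good)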
You can try to reproduce the issue with the sav file here.
For this specific case I would suggest a different approach: you can pass an encoding argument to pyreadstat.read_sav to set the encoding manually. If you don't know which one it is, you can iterate over the list of encodings here: https://gist.github.com/hakre/4188459 to find out which one makes sense. For example:
import pyreadstat as p

# here codes is a list with all the encodings in the link mentioned before
for c in codes:
    try:
        df, meta = p.read_sav("Test.sav", encoding=c)
        print(c)
        print(df.head())
    except:
        pass
I did that, and there were a few that may potentially make sense, assuming the string is in a non-Latin alphabet. However, the most promising one is not in the list: encoding="UTF8" (the list contains "UTF-8", with a dash, and that one fails). Using UTF8 (no dash) I get this:
నేను గతంలో వాడిన బ
which according to Google Translate means "I used to come b" in Telugu. Not sure if that fully makes sense, but it's a way forward.
The advantage of this approach is that if you find the right encoding you will not be losing data, and reading the data will be fast. The disadvantage is that you may not find the right encoding.
If you cannot find the right encoding, you would still read the problematic columns very quickly, and you can discard them later in pandas by inspecting which character columns do not contain Latin characters. This will be much faster than the algorithm you were suggesting.
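As a sketch of that last step, assuming df is the DataFrame read with the fallback encoding (the helper name is made up for illustration):

def is_latin(s):
    # True if every character is in the Latin-1 range
    return all(ord(ch) < 256 for ch in s)

text_cols = df.select_dtypes(include='object').columns
bad_cols = [c for c in text_cols
            if not df[c].dropna().astype(str).map(is_latin).all()]
df = df.drop(columns=bad_cols)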
I'm doing some queries (inserts) to a database based on some input.
However, I don't always get all the data from the user. I would still like to insert the data that I did receive. I have a table with close to 10 columns, but in the data I also have some arrays.
When I try to insert something, I get an exception saying that, for example, input['name'] does not exist, and the query is not executed.
Is there some way to manage that quickly? If a variable isn't defined, simply throw a warning like in PHP rather than breaking the whole loop.
I'm new to Python and the only thing I can think of is to check every single variable, but I'd really hope there's something simpler and quicker than that.
Use input.get('name').
From the docs (https://docs.python.org/2/library/stdtypes.html#dict.get):
Return the value for key if key is in the dictionary, else default.
If default is not given, it defaults to None, so that this method never raises a KeyError.
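For example, a sketch with a hypothetical users table and a DB-API cursor; keys missing from input simply become None, which the database driver inserts as NULL:

# name, email, age and the users table are made-up examples
row = (input.get('name'), input.get('email'), input.get('age'))
cursor.execute("INSERT INTO users (name, email, age) VALUES (%s, %s, %s)", row)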
You should look into exception handling. It sounds like you need a try-except-else wherever you make use of input['name'].
I'm pretty new to Python (and the xlrd module), so my code is probably not nearly as compact as it could be. I'm just using it to analyse some data, so it's more important that I understand what I'm doing than that the code is as compact as possible (though I do hope to improve, so feel free to give me advice on the coding itself, provided you manage to explain it to a 'newbie' :p ).
That being said, here's my issue:
Context
I have an xlsx file with data on errors that people made when translating a text. The first column contains a code for the error relative to the text (conceptual errors), the second column contains a code for the translator that made the error. I want to create a dictionary in which the keys are the conceptual error codes, and the values are lists of the different translators that made that conceptual error.
A short fragment from the xlsx (to give you an idea of the codes in the two columns):
1722_Z1_CF5 1722_HT_EV_Z1_F1
1722_Z1_CF1 1722_PE_AL_Z1_F1
1722_Z1_CF9 1722_PE_EVC_Z1_F1
1722_Z1_CF5 1722_PE_LH_Z1_F1
As you can see, the conceptual error '1722_Z1_CF5' has been made by 2 different people ('1722_HT_EV_Z1_F1' and '1722_PE_LH_Z1_F1'). The dictionary for this fragment would look something like:
{'1722_Z1_CF5': ['1722_HT_EV_Z1_F1', '1722_PE_LH_Z1_F1'],
 '1722_Z1_CF1': ['1722_PE_AL_Z1_F1'],
 '1722_Z1_CF9': ['1722_PE_EVC_Z1_F1']}
Code
The code below is what I tried to do to create the dictionary.
def TranslatorsPerError(sheet):
    TotalConceptualErrors(sheet)
    TranslatorsPerError = {}
    for row_index in range(sheet.nrows):
        if sheet.cell(row_index,0).value in ConceptualErrors and sheet.cell(row_index,0).value not in TranslatorsPerError:
            TranslatorsPerError[str(sheet.cell(row_index,0).value)] = [str(sheet.cell(row_index,1).value),]
        if sheet.cell(row_index,0).value in ConceptualErrors and sheet.cell(row_index,0).value in TranslatorsPerError:
            TranslatorsPerError[str(sheet.cell(row_index,0).value)].append(str(sheet.cell(row_index,1).value))
    return TranslatorsPerError
'TotalConceptualErrors' is a function I created that returns a list ('ConceptualErrors') of the conceptual error codes from the first column, without duplicates (it also filters out some other information that was present in the first column, which is why I needed to call it first).
Problem
The problem is that this function keeps giving me an error: TypeError: argument of type 'Book' is not iterable
I know that problems with iterables can sometimes be solved by casting certain things to a different type, but I'm not sure how to solve this one. I tried using str() on different elements, but that didn't solve the problem. Maybe it has something to do with my code, maybe with the nature of dictionaries or xlrd (looking at the type 'Book', my guess would be the latter).
Any help or feedback on how to fix this would be greatly appreciated. If you need extra information to understand what's going on or what I'm looking for, please ask.
Where is ConceptualErrors being set?
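If TotalConceptualErrors returns that list, the likely fix is to capture its return value instead of relying on a global name. A sketch keeping your logic, with setdefault merging the two if branches:

def TranslatorsPerError(sheet):
    # Capture the returned list; otherwise the bare name may point at
    # something else entirely (such as an xlrd Book), which is not iterable.
    ConceptualErrors = TotalConceptualErrors(sheet)
    translators = {}
    for row_index in range(sheet.nrows):
        code = str(sheet.cell(row_index, 0).value)
        if code in ConceptualErrors:
            translators.setdefault(code, []).append(str(sheet.cell(row_index, 1).value))
    return translators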
See the following code:
if 'COMPONENTS' in prod.keys() and len(prod['COMPONENTS'])>0 and len(prod['COMPONENTS'][0])>1 and len(prod['COMPONENTS'][0][1])>0 and len(prod['COMPONENTS'][0][1][0])>2:
    names = prod['COMPONENTS'][0][1][0][2]
    if type(names) == list and len(names)>0 and type(names[0]) == str:
        # names is proper. Now fetch prices
        if len(prod['COMPONENTS'][0])>3:
            lnames = len(names)
            prices = [prod['COMPONENTS'][0][3][i][2][1][0][1] for i in range(0, lnames)]
See how I am using prod['COMPONENTS'][0][1][0][2] and prod['COMPONENTS'][0][3][i][2][1][0][1]. prod is a very deeply nested list, and I want to check whether an element exists at such an index.
I didn't find any way. Currently I am using a long condition in the if statement; see above how long they are. They are terrible.
So is there any way to check whether prod can satisfy the indexes ['COMPONENTS'][0][3][i][2][1][0][1]?
The simplest way is just to do it and catch the error:
try:
    names = prod['COMPONENTS'][0][1][0][2]
except LookupError:
    print "It failed"
    # Do whatever you need to do in case of failure
LookupError will catch missing indices in a list or missing keys in a dictionary.
Needless to say, though, you've already found the real problem: you're using an unwieldy and awkward data structure. It might be possible to bypass the problem by storing your data in a different way.
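If you want to keep the long paths but lose the long conditions, a small helper (made up for illustration, not a library function) can walk a path of keys and indices and return a default when any step is missing:

def deep_get(obj, path, default=None):
    # Walk keys/indices one step at a time; LookupError covers both
    # missing dict keys and out-of-range list indices.
    try:
        for step in path:
            obj = obj[step]
        return obj
    except LookupError:
        return default

names = deep_get(prod, ['COMPONENTS', 0, 1, 0, 2], default=[])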
I have a list of domains and I want to sort them based on TLD. What's the fastest way to do this?
Use the key parameter to .sort() to provide a function that can retrieve the proper data to sort by.
import urlparse

def get_tld_from_domain(domain):
    return urlparse.urlparse(domain).netloc.split('.')[-1]
list_of_domains.sort(key=get_tld_from_domain)
# or if you want to make a new list, instead of sorting the old one
sorted_list_of_domains = sorted(list_of_domains, key=get_tld_from_domain)
If you preferred, you could skip defining the function separately and just use a lambda, but defining it separately often makes your code easier to read, which is always a plus.
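For the record, the same sort as a one-liner with a lambda, equivalent to the function above:

list_of_domains.sort(key=lambda domain: urlparse.urlparse(domain).netloc.split('.')[-1])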
Also, remember that it is not trivial to get the TLD from a URL; please check this link on SO. In Python you can use the urlparse module to parse URLs.
As Gangadhar says, it's hard to know definitively which part of the netloc is the TLD, but in your case I would modify Amber's code slightly. This sorts on the entire domain: by the last level first, then the second-to-last level, and so on.
This may be good enough for what you need, without referring to external lists:
import urlparse

def get_reversed_domain(domain):
    return urlparse.urlparse(domain).netloc.split('.')[::-1]

sorted_list_of_domains = sorted(list_of_domains, key=get_reversed_domain)
Just reread the OP: if the list is already just domains, you can simply use
sorted_list_of_domains = sorted(list_of_domains, key=lambda x:x.split('.')[::-1])