sorting multi-language country names - python

I have a list of country names in different languages that I am attempting to sort by their country name. Currently, the sort is based on the index value.
Here is my truncated list of country names:
ADDRESS_COUNTRY_STYLE_TYPES = {}
for language_code in LANGUAGES.iterkeys():
    ADDRESS_COUNTRY_STYLE_TYPES[language_code] = OrderedDict()
if 'af' in LANGUAGES.iterkeys():
    ADDRESS_COUNTRY_STYLE_TYPES['af'][0] = " Kies 'n land of gebied"  # Select a country or territory
    ADDRESS_COUNTRY_STYLE_TYPES['af'][1] = "Afganistan"  # Afghanistan
    ADDRESS_COUNTRY_STYLE_TYPES['af'][2] = "Åland"  # Aland
    ADDRESS_COUNTRY_STYLE_TYPES['af'][3] = "Albanië"  # Albania
    ....
    ADDRESS_COUNTRY_STYLE_TYPES['af'][14] = "Australië"  # Australia
    ADDRESS_COUNTRY_STYLE_TYPES['af'][15] = "Oostenryk"  # Austria
    ADDRESS_COUNTRY_STYLE_TYPES['af'][16] = "Aserbeidjan"  # Azerbaijan
    ADDRESS_COUNTRY_STYLE_TYPES['af'][17] = "Bahamas"  # Bahamas
    ADDRESS_COUNTRY_STYLE_TYPES['af'][18] = "Bahrein"  # Bahrain
    ADDRESS_COUNTRY_STYLE_TYPES['af'][19] = "Bangladesj"  # Bangladesh
    ADDRESS_COUNTRY_STYLE_TYPES['af'][20] = "Barbados"  # Barbados
    ADDRESS_COUNTRY_STYLE_TYPES['af'][21] = "Wit-Rusland"  # Belarus
    ADDRESS_COUNTRY_STYLE_TYPES['af'][22] = "België"  # Belgium
    ....
Here is my code I have in my views.py file that calls the country names:
def get_address_country_style_types(available_languages, with_country_style_zero=True):
    address_country_style_types = {}
    preview_labels = {}
    for code, name in available_languages:
        address_country_style_types[code] = ADDRESS_COUNTRY_STYLE_TYPES[code].copy()
        if not with_country_style_zero:
            address_country_style_types[code].pop(0)
        preview_labels[code] = ADDRESS_DETAILS_LIVE_PREVIEW_LABELS[code]
        # in case preview labels are not defined for the language code
        # fall back to 'en', which should always be there
        if len(preview_labels[code]) == 0:
            preview_labels[code] = ADDRESS_DETAILS_LIVE_PREVIEW_LABELS['en']
    address_country_style_types = sorted(address_country_style_types, key=lambda x:x[1])
    return address_country_style_types, preview_labels
The above code only returns the index number in the HTML drop-down list. The issue is with the following line of code (or, more to the point, my lack of knowledge of how to get it working):
address_country_style_types = sorted(address_country_style_types, key=lambda x:x[1])
How do I return the sorted country list? Am I using lambda in the correct way here? Should I be using lambda here?
I have been working on this for several days and my coding skills are not very strong. I have read many related posts to no avail, so any help is appreciated.
I have read this blog about sorting a list of multilingual country names that appear in a form HTML select drop down list - which is essentially what I am attempting to do.
EDIT
Commenting out the line of code below in the code above does return a list of country names, but the country names are sorted by the index value not the country name.
address_country_style_types = sorted(address_country_style_types, key=lambda x:x[1])

I have failed to sort the multi-language country names programmatically.
Instead, I copied the list into Excel and hit the sort button (based on the translated country name; the index value stays the same), then copied the data back to the file. It works as expected, but it is a lot of work.
I hope that this helps someone.
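For anyone who wants to avoid the spreadsheet round trip, here is a minimal sketch of a programmatic sort. Calling sorted() on a dict iterates over its keys, so the line in the question sorts the language codes (by their second character) instead of the country names. The sketch sorts one language's OrderedDict by its values. It is written for Python 3 (the question's code is Python 2), the helper names are made up for illustration, and stripping accents is only a rough stand-in for real locale-aware collation.

```python
import unicodedata
from collections import OrderedDict


def accent_insensitive_key(name):
    """Sort key: trim whitespace, drop combining accents, ignore case."""
    decomposed = unicodedata.normalize("NFKD", name.strip())
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return without_accents.casefold()


def sort_by_country_name(countries):
    """Return a new OrderedDict ordered by country name, keeping each index."""
    return OrderedDict(
        sorted(countries.items(), key=lambda item: accent_insensitive_key(item[1]))
    )


# Hypothetical excerpt of the Afrikaans list from the question.
countries_af = OrderedDict([
    (1, "Afganistan"),
    (2, "Åland"),
    (3, "Albanië"),
    (15, "Oostenryk"),
    (16, "Aserbeidjan"),
])

sorted_af = sort_by_country_name(countries_af)
for index, name in sorted_af.items():
    print(index, name)
```

In the view, this would be applied to each language's dict inside the loop rather than to the outer dict of languages. The placeholder entry at index 0 would need to be removed before sorting and put back at the front afterwards.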

Related

How can I take the string of names and preferences & add them to the dictionary with names as the key?

My code as it is right now looks like this:
def read_in_movie_preference():
    """Read the movie data, and return a
    preference dictionary."""
    preference = {}
    movies = []
    # write code here:
    file_location="./data/"
    f = open(file_location+"preference.csv","r")
    df = f.readlines()
    #names as keys and preferences
    for line in df:
        name = line[1].strip("\n").split(",")
        prefs = line[2:].strip("\n").split(",")
        preference[line[1]] = line[2:]
    #print(test)
    #movie names
    movietitles = df[0].strip("\n").split(",")
    for movie in movietitles:
        movie=movie.rstrip()
    #can't seem to get rid of the spaces at the end
    movies+=movietitles[2:]
    print(movies)
    return [movies, preference]
I can't seem to get the movie titles into the list without spaces at the end of some of them, and I also can't add the names and preferences to the dictionary. I am supposed to do this task with basic Python and no pandas. I'm very stuck and would appreciate any help!
The dictionary should have names as keys and the preferences as numbers instead of strings, so it would theoretically look like this:
key: pref:
dennis, 0 1 0 1 0 etc.
This is what the data set looks like: [image]
here is the data pasted:
So the issue here is that you are calling rstrip on a copy of each title but never applying the result back to the original list.
The issue:
for movie in movietitles:
    movie = movie.rstrip()  # Rebinds the loop variable only; movietitles is unchanged
    # We still need to apply this back to movietitles
There are a couple of ways to fix this!
# Using indexing
for i in range(len(movietitles)):
    movietitles[i] = movietitles[i].rstrip()
Or we can do this inline with a list comprehension:
# Using list comprehension
movietitles = [movie.rstrip() for movie in movietitles]
As stated in the other answer, when working with CSV data it's recommended to use a CSV parser, but that is completely unnecessary at this scale! Hope this helps.
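The other half of the question, building the dictionary with names as keys and numeric preferences as values, can be sketched as below. The column layout is an assumption inferred from the question's own slicing (column 1 holds the name, columns 2 onward hold the preferences), and the sample rows are made up because the real data was not included.

```python
def parse_preferences(lines):
    """Return [movies, preference] from an iterable of CSV text lines."""
    # Split every non-empty line into its comma-separated fields.
    rows = [line.strip().split(",") for line in lines if line.strip()]
    header, data_rows = rows[0], rows[1:]

    # Movie titles start at column 2; strip stray spaces around each title.
    movies = [title.strip() for title in header[2:]]

    preference = {}
    for row in data_rows:
        name = row[1].strip()
        # Convert each preference from text to int.
        preference[name] = [int(value) for value in row[2:]]
    return [movies, preference]


# Hypothetical sample in the assumed layout: id, name, then one column per movie.
sample_lines = [
    "id,name,Movie A ,Movie B,Movie C \n",
    "1,dennis,0,1,0\n",
    "2,maria,1,1,-1\n",
]

movies, preference = parse_preferences(sample_lines)
print(movies)
print(preference["dennis"])
```

With a real file, the same function can be called as parse_preferences(f) inside a with open(...) as f: block, since a file object yields its lines one at a time.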

Display list sorted alphabetically

We're trying to create a function that takes as input some data containing an ID number, a name, and a number of columns with the grades for different assignments. It should sort the data alphabetically by name and then display it with an added column for the final grade (which we calculate with another function we made). We've tried writing the following code, but can't get it to work. The error message given is "names = GRADESdata[:,1].tolist() TypeError: string indices must be integers".
Can anyone help us to figure out how to get it working?
def listOfgrades(GRADESdata):
    names = GRADESdata[:,1].tolist()
    names = names.sort(names)
    assignments = GRADESdata[:,2::]
    final_grades = computeFinalGrades(GRADESdata)
    final_grades = np.array(final_grades.reshape(len(final_grades),1))
    List_of_grades = np.hstack((GRADESdata, final_grades))
    NOofColumns = np.size(GRADESdata,axis = 1)
    display = np.zeros(NOofColumns)
    for i in names:
        display = np.vstack((display,GRADESdata[GRADESdata[:,1] == i]))
    grades = display[1::,2:-1]
    gradesfinal = display[1::,-1]
    #Column titles
    c = {"Student ID": GRADESdata[1::,0], "Name": GRADESdata[1::,1]}
    for i in range(GRADESdata.shape[1]):
        c["Assign.{}".format(i+1)] = GRADESdata[:,i]
    c["Final grade"] = final_grades
    d = pd.DataFrame(c)
    print(d.to_string())
    display = np.array([student_list, names, assignments, final_grades])
    return display
The expected output is something like this (with the data below ofc):
ID number Name Assignment 1 Assignment 2 Final Grade
EDIT: the data input is a .csv file containing the following data:ID number,Name,Assignment 1,Assignment 2, etc.
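The comma in
names = GRADESdata[:,1].tolist()
is not valid when GRADESdata is a string. A string index must be an integer or a slice; the [:,1] form is a tuple index, which only multi-dimensional types such as numpy.ndarray accept.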
From looking at .tolist(), I assume the data structure you're supposed to use is numpy.ndarray.
I managed to replicate the error with the following code:
print("12354"[:,1].tolist())
which makes sense if you're using a file name as input - and that's your mistake.
In order to fix this problem, you need to implement a string parser at the beginning or outside the function.
Add the following to your code at the beginning:
with open(GRADESdata, "r") as file:
    data = file.read()
list1 = data.split("\n")  # Replace \n with the appropriate line separator
list2 = [e.split(",") for e in list1 if e]  # Skip empty lines, e.g. after a trailing newline
GRADESdata = np.array(list2)
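Once the file contents are parsed into rows, the sorting and the extra column can be sketched with the standard library alone. This is only an illustration under assumptions: the CSV layout follows the question's description (ID number, name, then assignment grades), the sample rows are made up, and mean_grade is a hypothetical stand-in for the asker's computeFinalGrades, which is not shown.

```python
import csv
import io


def sorted_grade_rows(csv_text, compute_final_grade):
    """Return the header plus the rows sorted by name, each with a final grade appended."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    rows = [row for row in reader if row]  # Ignore blank lines.

    # Column 1 holds the name; casefold() makes the sort case-insensitive.
    rows.sort(key=lambda row: row[1].casefold())

    result = [header + ["Final Grade"]]
    for row in rows:
        grades = [float(grade) for grade in row[2:]]
        result.append(row + [compute_final_grade(grades)])
    return result


def mean_grade(grades):
    """Hypothetical stand-in for computeFinalGrades: the plain average."""
    return sum(grades) / len(grades)


# Made-up sample data in the layout described in the question.
sample = "ID number,Name,Assignment 1,Assignment 2\ns2,Zoe,7,10\ns1,Adam,4,12\n"

table = sorted_grade_rows(sample, mean_grade)
for line in table:
    print(line)
```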

python return double entry in dictionary

I have been searching for hours on this problem and have tried everything I can think of, but I can't crack it; I am quite a dictionary noob.
I work with Maya and have clashing light names. This happens when you duplicate a group: all children keep the same names as before, so having an ALL_KEY in one group results in a clashing name with a key_char in another group.
I need to identify a clashing short name and return the long name, so I can print that the long name is a duplicate or even do a cmds.select.
Unfortunately, everything I find on this matter on the internet is about checking whether a list contains duplicate values, and only returns True or False, which is useless for me. So I tried list cleaning and list comparison, but I get stuck using a dictionary to keep long and short names at the same time.
I managed to fetch short names if they are duplicates and return them, but on the way the long name got lost, so of course I can't identify it clearly anymore.
import itertools
import fnmatch
import maya.cmds as mc
LIGHT_TYPES = ["spotLight", "areaLight", "directionalLight", "pointLight", "aiAreaLight", "aiPhotometricLight", "aiSkyDomeLight"]
#create dict
dblList = {'long' : 'short'}
for x in mc.ls (type=LIGHT_TYPES, transforms=True):
    y = x.split('|')[-1:][0]
    dblList['long','short'] = dblList.setdefault(x, y)
#reverse values with keys for easier detection
rev_multidict = {}
for key, value in dblList.items():
    rev_multidict.setdefault(value, set()).add(key)
#detect the doubles in the dict
#print [values for key, values in rev_multidict.items() if len(values) > 1]
flattenList = set(itertools.chain.from_iterable(values for key, values in rev_multidict.items() if len(values) > 1))
#so by now I got all the long names which clash in the scene already!
#means now I just need to make a for loop strip away the pipes and ask if the object is already in the list, then return the path with the pipe, and ask if the object is in lightlist and return the longname if so.
#but after many many hours I can't get this part working.
##as example until now print flattenList returns
#set([u'ALL_blockers|ALL_KEY', u'ABCD_0140|scSet', u'SARAH_TOPShape', u'ABCD_0140|scChars', u'ALL|ALL_KEY', u'|scChars', u'|scSet', u'|scFX', ('long', 'short'), u'ABCD_0140|scFX'])
#we see ALL_KEY is double! and that's exactly what I need returned as long name
#THIS IS THE PART THAT I CAN'T GET WORKING, CHECK IN THE LIST WHICH VALUES ARE DOUBLE IN THE LONGNAME AND RETURN THE SHORTNAME LIST.
#THE WHOLE DICTIONARY IS STILL COMPLETE AS
seen = set()
uniq = []
for x in dblList2:
    if x[0].split('|')[-1:][0] not in seen:
        uniq.append(x.split('|')[-1:][0])
        seen.add(x.split('|')[-1:][0])
Thanks for your help.
I'm going to take a stab with this. If this isn't what you want let me know why.
If I have a scene with a hierarchy like this:
group1
    nurbsCircle1
group2
    nurbsCircle2
group3
    nurbsCircle1
I can run this (adjust ls() if you need it for selection or whatnot):
conflictObjs = {}
objs = cmds.ls(shortNames = True, transforms = True)
for obj in objs:
    if len( obj.split('|') ) > 1:
        conflictObjs[obj] = obj.split('|')[-1]
And the output of conflictObjs will be:
# Dictionary of objects with non-unique short names
# {<long name>:<short name>}
{u'group1|nurbsCircle1': u'nurbsCircle1', u'group3|nurbsCircle1': u'nurbsCircle1'}
Showing me what objects don't have unique short names.
This will give you a list of all the lights which have duplicate short names, grouped by what the duplicated name is and including the full path of the duplicated objects:
def clashes_by_type(*types):
    long_names = cmds.ls(type = types, l=True) or []
    # get the parents from the lights, not using ls -type transform
    long_names = set(cmds.listRelatives(*long_names, p=True, f=True) or [])
    short_names = set([i.rpartition("|")[-1] for i in long_names])
    short_dict = dict()
    for sn in short_names:
        short_dict[sn] = [i for i in long_names if i.endswith("|"+ sn)]
    clashes = dict((k,v) for k, v in short_dict.items() if len(v) > 1)
    return clashes
clashes_by_type('directionalLight', 'ambientLight')
The main points to note:
Work down from long names. Short names are inherently unreliable!
When deriving the short names, include the last pipe so you don't get accidental overlaps of common names.
The values of short_dict will always be lists, since they are created by a comprehension.
Once you have a dict of (name, [objects with that short name]), it's easy to get clashes by looking for values longer than 1.
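The grouping step can be tried outside Maya with plain strings. The sketch below assumes you already have a list of long names (for example from cmds.ls with long names enabled); the function name and the sample paths are made up for illustration. It groups in a single pass with a defaultdict instead of scanning the list once per short name.

```python
from collections import defaultdict


def find_short_name_clashes(long_names):
    """Map each duplicated short name to the sorted list of long names using it."""
    by_short_name = defaultdict(list)
    for long_name in long_names:
        # Everything after the last pipe is the short name.
        short_name = long_name.rpartition("|")[-1]
        by_short_name[short_name].append(long_name)
    # Keep only the short names that occur more than once.
    return {
        short: sorted(paths)
        for short, paths in by_short_name.items()
        if len(paths) > 1
    }


# Hypothetical long names, loosely modelled on the scene in the question.
sample_paths = [
    "|ALL|ALL_KEY",
    "|ALL_blockers|ALL_KEY",
    "|ALL|key_char",
    "|SARAH_TOP",
]

clashes = find_short_name_clashes(sample_paths)
print(clashes)
```

Each value in the result is a list of full paths, so it could be handed to a selection command or printed as a report.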

Need help parsing XML with ElementTree

I'm trying to parse the following XML data:
http://pastebin.com/UcbQQSM2
This is just an example of the 2 types of data I will run into. Companies with the needed address information and companies without the needed information.
From the data I need to collect 3 pieces of information:
1) The Company name
2) The Company street
3) The Company zipcode
I'm able to do this with the following code:
#Creates list of Company names
CompanyList = []
for company in xmldata.findall('company'):
    name = company.find('name').text
    CompanyList.append(name)
#Creates list of Company zipcodes
ZipcodeList = []
for company in xmldata.findall('company'):
    contact_data = company.find('contact-data')
    address1 = contact_data.find('addresses')
    for address2 in address1.findall('address'):
        ZipcodeList.append(address2.find('zip').text)
#Creates list of Company streets
StreetList = []
for company in xmldata.findall('company'):
    contact_data = company.find('contact-data')
    address1 = contact_data.find('addresses')
    for address2 in address1.findall('address'):
        StreetList.append(address2.find('street').text)
But it doesn't really do what I want it to, and I can't figure out how to do what I want. I believe it will be some type of 'if' statement but I don't know.
The problem is that where I have:
for address2 in address1.findall('address'):
    ZipcodeList.append(address2.find('zip').text)
and
for address2 in address1.findall('address'):
    StreetList.append(address2.find('street').text)
It only adds to the list the places that actually have a street name or zipcode listed in the XML, but I need a placemark for the companies that also DON'T have that information listed so that my lists match up.
I hope this makes sense. Let me know if I need to add more information.
But, basically, I'm trying to find a way to say if there isn't a zipcode/street name for the Company put "None" and if there is then put the zipcode/street name.
Any help/guidance is appreciated.
Well, I am going to do a bad thing and suggest you use a conditional (ternary) operator.
StreetList.append(address2.find('street').text if address2.find('street').text else 'None')
So this statement says: append address2.find('street').text if it is not empty, else append 'None'. Note that it still assumes the street element exists. If the element may be missing entirely, address2.findtext('street') returns None instead of raising an AttributeError.
Additionally, you could create a new function to do the same test and call it in both places. Note that my Python is rusty, but this should get you close:
def returnNoneIfEmpty(testText):
    if testText:
        return testText
    else:
        return 'None'
Then just call it:
StreetList.append(returnNoneIfEmpty(address2.find('street').text))
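To keep the three pieces of information lined up even when a company has no address at all, one option is to build a single record per company instead of three separate lists. The sketch below is only an illustration: the sample XML is made up, with element names taken from the question's code (the pastebin data is not reproduced here), and it assumes only the first address of each company is needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element names follow the question's code.
SAMPLE_XML = """
<companies>
  <company>
    <name>Acme</name>
    <contact-data>
      <addresses>
        <address><street>1 Main St</street><zip>12345</zip></address>
      </addresses>
    </contact-data>
  </company>
  <company>
    <name>NoAddress Inc</name>
    <contact-data>
      <addresses/>
    </contact-data>
  </company>
</companies>
"""


def company_records(root):
    """Return one (name, street, zipcode) tuple per company, using 'None' for gaps."""
    records = []
    for company in root.findall("company"):
        name = company.findtext("name")
        # find() returns None when any step of the path is missing.
        address = company.find("contact-data/addresses/address")
        if address is None:
            street, zipcode = "None", "None"
        else:
            # findtext() returns None for a missing element; empty text also falls back.
            street = address.findtext("street") or "None"
            zipcode = address.findtext("zip") or "None"
        records.append((name, street, zipcode))
    return records


xmldata = ET.fromstring(SAMPLE_XML)
records = company_records(xmldata)
for record in records:
    print(record)
```

Because every company yields exactly one tuple, the name, street, and zipcode can never drift out of step the way three independently built lists can.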

Combining two lists and sorting through reference to dictionary Python

I have (what seems to me is) a pretty convoluted problem. I'm going to try to be as succinct as possible - though in order to understand the issue fully, you might have to click on my profile and look at the (only other) two questions I've posted on StackOverflow. In short: I have two lists -- one is comprised of email strings that contain a facility name, and a date of incident. The other is comprised of the facility ids for each email (I use one of the following regex functions to get this list). I've used Regex to be able to search each string for these pieces of information. The 3 Regex functions are:
def find_facility_name(incident):
    pattern = re.compile(r'Subject:.*?for\s(.+?)\n')
    findPat1 = re.search(pattern, incident)
    facility_name = findPat1.group(1)
    return facility_name
def find_date_of_incident(incident):
    pattern = re.compile(r'Date of Incident:\s(.+?)\n')
    findPat2 = re.search(pattern, incident)
    incident_date = findPat2.group(1)
    return incident_date
def find_facility_id(incident):
    pattern = re.compile('(\d{3})\n')
    findPat3 = re.search(pattern, incident)
    f_id = findPat3.group(1)
    return f_id
I also have a dictionary that is formatted like this:
d = {'001' : 'Facility #1', '002' : 'Another Facility'...etc.}
I'm trying to COMBINE the two lists and sort by the key values in the dictionary, followed by the Date of Incident. Since the key values are attached to the facility name, this should automatically cause emails from the same facilities to be grouped together. In order to do that, I've tried to use these functions:
def get_facility_ids(incident_list):
    '''(lst) -> lst
    Return a new list from incident_list that inserts the facility IDs from the
    get_facilities dictionary into each incident.
    '''
    f_id = []
    for incident in incident_list:
        find_facility_name(incident)
        for k in d:
            if find_facility_name(incident) == d[k]:
                f_id.append(k)
    return f_id
id_list = get_facility_ids(incident_list)
def combine_lists(L1, L2):
    combo_list = []
    for i in range(len(L1)):
        combo_list.append(L1[i] + L2[i])
    return combo_list
combination = combine_lists(id_list, incident_list)
def get_sort_key(incident):
    '''(str) -> tup
    Return a tuple from incident containing the facility id as the first
    value and the date of the incident as the second value.
    '''
    return (find_facility_id(incident), find_date_of_incident(incident))
final_list = sorted(combination, key=get_sort_key)
Here is an example of what my input might be and the desired output:
d = {'001' : 'Facility #1', '002' : 'Another Facility'...etc.}
input: first_list = ['email_1', 'email_2', etc.]
first output: next_list = ['facility_id_for_1+email_1', 'facility_id_for_2 + email_2', etc.]
DESIRED OUTPUT: FINAL_LIST = sorted(next_list, key=facility_id, date of incident)
The only problem is, the key values are not matching properly with what's found in each individual email string. Some DO, others are completely random. I have no idea why this is happening, but I have a feeling it has something to do with the way I'm combining the two lists. Can anyone help this lowly n00b? Thanks!!!
First off, I would suggest reversing your ID-to-name dictionary. Looking up a value by key is very fast but finding a key by value is very slow.
rd = { name: id_num for id_num, name in d.items() }
Then your first function can be replaced by a list comprehension:
id_list = [rd[find_facility_name(incident)] for incident in incident_list]
This might also expose why you're getting messed up values in your results. If an incident has a facility name that's not in your dictionary, this code will raise a KeyError (whereas your old function would just skip it).
Your combine function is very similar to Python's built in zip function. I'd replace it with:
combination = [id+incident for id, incident in zip(id_list, incident_list)]
However, since you're building the first list from the second one, it might make sense to build the combined version directly, rather than making separate lists and then combining them in a separate step. Here's an update to the list comprehension above that goes right to the combination result:
combination = [rd[find_facility_name(incident)] + incident
               for incident in incident_list]
To do the sort, you can use the ID string that we just prepended to the email message, rather than parsing to find the ID again:
combination.sort(key=lambda x: (x[0:3], find_date_of_incident(x)))
The 3 in the slice is based off of your example of "001" and "002" as the id values. If the actual ids are longer or shorter you'll need to adjust that.
So, here is what I think is going on. Please correct me if possible.
The 'incident_list' is a list of email strings. You go in and find the facility names in each email because you have an external dictionary that has the (key:value)=(facility id: facility name). From the dictionary, you can extract the facility id in this 'id_list'.
You combine the lists so that you get a list of strings [facility id + email,...]
Then you want to sort it by a tuple (facility id, date of incident).
It looks like you are searching for the facility id and the facility name twice. You can skip a step if they are the same. Then, the best way is to do it all at once with tuples:
incident_list = ['email1', 'email2',...]
unsorted_list = []
for email in incident_list:
    id = find_facility_id(email)
    date = find_date_of_incident(email)
    mytuple = ( id, date, id + email )
    unsorted_list.append(mytuple)
final_list = sorted(unsorted_list, key=lambda mytup:(mytup[0], mytup[1]))
Then you get a list of tuples sorted by the first element (id as a string) and then by the second element (date as a string). If you just need a list of strings (id + email), take the last element of each tuple:
FINALLIST = [ tup[-1] for tup in final_list ]
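Putting the pieces together, here is a self-contained sketch of the whole pipeline. Several things in it are assumptions: the sample emails and the name-to-id dictionary are made up, and the date format (month/day/year) is a guess. Parsing the date with datetime makes the second sort key chronological, which a plain string comparison would not guarantee for this format. The facility id is looked up by name rather than re-parsed from the combined string, since a pattern such as (\d{3})\n could match any three digits that happen to end a line; that may be one reason for the mismatched keys, though it cannot be confirmed without the real emails.

```python
import re
from datetime import datetime

# Hypothetical reversed dictionary: facility name -> facility id.
FACILITY_IDS_BY_NAME = {"Facility #1": "001", "Another Facility": "002"}


def find_facility_name(incident):
    """Extract the facility name from the subject line."""
    return re.search(r"Subject:.*?for\s(.+?)\n", incident).group(1)


def find_date_of_incident(incident):
    """Extract the raw date text from the 'Date of Incident' line."""
    return re.search(r"Date of Incident:\s(.+?)\n", incident).group(1)


def sort_incidents(incident_list, date_format="%m/%d/%Y"):
    """Return 'id + email' strings sorted by facility id, then by incident date."""
    keyed = []
    for incident in incident_list:
        # A KeyError here means the email names a facility missing from the dictionary.
        facility_id = FACILITY_IDS_BY_NAME[find_facility_name(incident)]
        incident_date = datetime.strptime(find_date_of_incident(incident), date_format)
        keyed.append((facility_id, incident_date, facility_id + incident))
    keyed.sort(key=lambda entry: (entry[0], entry[1]))
    return [entry[2] for entry in keyed]


# Made-up emails in an assumed format.
sample_emails = [
    "Subject: Report for Another Facility\nDate of Incident: 01/05/2014\n",
    "Subject: Report for Facility #1\nDate of Incident: 02/15/2014\n",
    "Subject: Report for Facility #1\nDate of Incident: 12/01/2013\n",
]

final_list = sort_incidents(sample_emails)
for entry in final_list:
    print(repr(entry))
```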
