CSV module sorted output unexpected - python

In the code below (for printing salaries in descending order, ordered by profession),
reader = csv.DictReader(open('salaries.csv','rb'))
rows = sorted(reader)
a={}
for i in xrange(len(rows)):
    if rows[i].values()[2]=='Plumbers':
        a[rows[i].values()[1]]=rows[i].values()[0]
t = [i for i in sorted(a, key=lambda key:a[key], reverse=True)]
p=a.values()
p.sort()
p.reverse()
for i in xrange(len(a)):
    print t[i]+","+p[i]
When I put 'Plumbers' in the conditional statement, the output for the plumbers' salaries is:
Tokyo,400
Delhi,300
London,100
And when I put 'Lawyers' in the same 'if' condition, the output is:
Tokyo,800
London,700
Delhi,400
The content of the CSV is:
City,Job,Salary
Delhi,Lawyers,400
Delhi,Plumbers,300
London,Lawyers,700
London,Plumbers,100
Tokyo,Lawyers,800
Tokyo,Plumbers,400
When I remove --> if rows[i].values()[2]=='Plumbers': <-- from the program,
it is supposed to print all the rows, but it prints only these 3:
Tokyo,400
Delhi,300
London,100
The output should look something like:
Tokyo,800
London,700
Delhi,400
Tokyo,400
Delhi,300
London,100
Where is the problem exactly?

First of all, your code works as described: it outputs the salaries in descending order. So it works as designed?
In passing, your sorting code seems overly complex. You don't need to split the location/salary pairs into two lists and sort them independently. For example:
>>> import operator
# Plumbers
>>> a
{'Delhi': '300', 'London': '100', 'Tokyo': '400'}
>>> [item for item in reversed(sorted(a.iteritems(),key=operator.itemgetter(1)))]
[('Tokyo', '400'), ('Delhi', '300'), ('London', '100')]
# Lawyers
>>> a
{'Delhi': '400', 'London': '700', 'Tokyo': '800'}
>>> [item for item in reversed(sorted(a.iteritems(),key=operator.itemgetter(1)))]
[('Tokyo', '800'), ('London', '700'), ('Delhi', '400')]
And to answer your last question, when you remove the 'if' statement: you are storing location vs. salary in a dictionary and a dictionary can't have duplicate keys. It will contain the last update for each location, which based on your input csv, is the salary for Plumbers.
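To make the overwriting concrete, here is a minimal sketch. It assumes Python 3 and inlines the six rows from the question as a string instead of opening salaries.csv. Keying on the (City, Job) pair rather than on City alone is just one possible way to keep all six rows; it is an illustration, not the asker's code.

```python
import csv
import io

# Sample data copied from the question; a real script would open salaries.csv instead.
CSV_TEXT = """City,Job,Salary
Delhi,Lawyers,400
Delhi,Plumbers,300
London,Lawyers,700
London,Plumbers,100
Tokyo,Lawyers,800
Tokyo,Plumbers,400
"""

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Keyed by city only: each city's later row overwrites the earlier one.
by_city = {row["City"]: row["Salary"] for row in rows}
print(by_city)

# Keyed by (city, job): every row keeps its own entry.
# Salaries are converted to int so they sort numerically, not as text.
by_city_job = {(row["City"], row["Job"]): int(row["Salary"]) for row in rows}

# Sort by job first, then by salary in descending order within each job.
ordered = sorted(by_city_job.items(), key=lambda item: (item[0][1], -item[1]))

for (city, job), salary in ordered:
    print("{},{}".format(city, salary))
```

With the sample data, the city-only dictionary ends up with three entries (the Plumbers rows), while the tuple-keyed one keeps all six.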

First of all, reset all indices to index - 1 as currently rows[i].values()[2] cannot equal Plumbers unless the DictReader is a 1-based index system.
Secondly, what is unique about the Tokyo in the first row of your desired output and the Tokyo in the fourth row? When you create a dict, using the same value as a key will result in overwriting whatever was previously associated with that key. You need some kind of unique identifier, such as Location.Profession for the key. You could simply do the following to get a key that will preserve all of your information:
key = ",".join([rows[i]['City'], rows[i]['Job']])

Related

Compare dictionary with a variable

I am trying to compare a dictionary value with a variable, but for some reason I can't output the part that I want from the dictionary.
The dictionary is an output from an HTML table.
This is the code that I use to parse the HTML table into a dictionary:
with open('output.csv') as fd:
    rd = csv.DictReader(fd, skipinitialspace=True)
    for row in rd:
        lista = { k: row[k] for k in row if k in ['Name', 'Clan Days']}
This is the output:
{'Name': 'SirFulgeruL2k19', 'Clan Days': '140'}
{'Name': 'Darius', 'Clan Days': '127'}
How do I compare, for example, the Clan Days value from the first dictionary with a value that I set in a variable and, if they match, get the name as a string so I can use it later in another line?
Assuming you first read the data into a list of dictionaries:
data = [{ k: row[k] for k in row if k in ['Name', 'Clan Days']}
        for row in rd]
You may use next() to search for the first dictionary in data matching the Clan Days value defaulting to None if no entries matched your search query:
desired_clan_days = '140'
clan_name = next((entry["Name"] for entry in data
                  if entry["Clan Days"] == desired_clan_days), None)
Now, next() would return you the first match, if you need all of the matches, just use a list comprehension:
clan_names = [entry["Name"] for entry in data
              if entry["Clan Days"] == desired_clan_days]
Note that this kind of search requires you to, in the worst case (entry not found), loop through all the entries in data. If this kind of search is the primary use case of this data structure, consider re-designing it to better fit the problem - e.g. having clan_days value as a key with a list of clan names:
data = {
    "140": ["SirFulgeruL2k19"],
    "127": ["Darius"]
}
In that state, getting a match would be a constant-time operation and as easy as data[desired_clan_days]. defaultdict(list) is something that would help you to make that transformation.
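A minimal sketch of that transformation, assuming the rows have already been read into a list of dictionaries like the ones printed in the question (the name names_by_days is made up for this example):

```python
from collections import defaultdict

# Rows as shown in the question's output (assumed to be already parsed from the CSV).
data = [
    {"Name": "SirFulgeruL2k19", "Clan Days": "140"},
    {"Name": "Darius", "Clan Days": "127"},
]

# Group names by their "Clan Days" value; a missing key starts out as an empty list.
names_by_days = defaultdict(list)
for entry in data:
    names_by_days[entry["Clan Days"]].append(entry["Name"])

desired_clan_days = "140"
# .get() avoids creating an empty entry when nothing matches.
print(names_by_days.get(desired_clan_days, []))
```

Note that the keys stay strings here because that is how csv.DictReader delivers them; convert with int() on both sides if you would rather compare numbers.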
Not really sure what exactly you want, but if it's just comparing a dictionary value to a variable and getting the Name part if they match, you would get something like this (the variable is called row here to avoid shadowing the built-in dict):
>>> row = {'Name': 'SirFulgeruL2k19', 'Clan Days': '140'}
>>> target = 140
>>> if int(row['Clan Days']) == target:
...     name = row['Name']
...
>>> name
'SirFulgeruL2k19'
Edit: Read your post too quickly, considering it's all the rows from a HTML table this code is too simple. Use alecxe's answer :)

Python Programming - to modify a list item in a dictionary

Question: # Write a program to modify the email addresses in the records dictionary to reflect this change
records = {57394: ['Suresh Datta', 'suresh#example.com'], 48539: ['ColetteBrowning', 'colette#example.com'], 58302: ['Skye Homsi','skye#example.com'], 48502: ['Hiroto Yamaguchi', 'hiroto#example.com'], 48291: ['Tobias Ledford', 'tobias#example.com'], 48293: ['Jin Xu', 'jin#example.com'], 23945: ['Joana Dias', 'joana#example.com'], 85823: ['Alton Derosa', 'alton#example.com']}
I have iterated through the dictionary, created a new list with the values, split each email at #, and was able to change the email from .com to .org.
My approach was to join the changed email and change the values of the dictionary. However, I keep getting a TypeError: sequence item 0: expected str instance, list found
My code:
lst2 = []
for value in records.values():
    lst2.append(value[1].split('#'))
for items in lst2:
    items[1] = 'examples.org'
for items in lst2:
    ','.join(lst2)
The issue is in your final for loop:
for items in lst2:
    ','.join(lst2)
You should be joining items, not lst2. However, even if you fix that it still won't work, because the joined string is never stored anywhere. You need to create a third list and add the values to it like this:
lst3 = []
for items in lst2:
    lst3.append('#'.join(items))
Then, lst3 will have the properly formatted emails.
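To show both points in one place, here is a small sketch. It assumes a shortened copy of records and the domain 'example.org' (the question's code spells it 'examples.org'), and it writes the new address straight back into the dictionary instead of building a separate list, which is one option among several.

```python
# A shortened copy of the records dictionary from the question.
records = {
    57394: ['Suresh Datta', 'suresh#example.com'],
    48539: ['ColetteBrowning', 'colette#example.com'],
}

# str.join() accepts only strings, so joining a list of lists raises TypeError.
try:
    ','.join([['suresh', 'example.org']])
except TypeError as exc:
    print(exc)

# Split each address at '#', swap the domain, and write the result back.
for fields in records.values():
    user, _old_domain = fields[1].split('#')
    fields[1] = '#'.join([user, 'example.org'])

print(records)
```

Mutating the inner lists while looping over records.values() is safe here because no keys are added or removed.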
You can also do this with a one-line list comprehension: replace '.com' with '.org', split each email at '#', and join the parts with ','. Try this:
print([','.join(i[1].replace('.com','.org').split('#')) for i in records.values()])
Output:
['suresh,example.org', 'colette,example.org', 'skye,example.org', 'hiroto,example.org', 'tobias,example.org', 'jin,example.org', 'joana,example.org', 'alton,example.org']
Or:
print(['#'.join(i[1].replace('.com','.org').split('#')) for i in records.values()])
Output:
['suresh#example.org', 'colette#example.org', 'skye#example.org', 'hiroto#example.org', 'tobias#example.org', 'jin#example.org', 'joana#example.org', 'alton#example.org']
Or, if you want to edit the dict:
print({k:[i.replace('.com','.org') for i in v] for k,v in records.items()})
Output:
{57394: ['Suresh Datta', 'suresh#example.org'], 48539: ['ColetteBrowning', 'colette#example.org'], 58302: ['Skye Homsi', 'skye#example.org'], 48502: ['Hiroto Yamaguchi', 'hiroto#example.org'], 48291: ['Tobias Ledford', 'tobias#example.org'], 48293: ['Jin Xu', 'jin#example.org'], 23945: ['Joana Dias', 'joana#example.org'], 85823: ['Alton Derosa', 'alton#example.org']}
Try the following code
records = {57394: ['Suresh Datta', 'suresh#example.com'], 48539: ['ColetteBrowning', 'colette#example.com'], 58302: ['Skye Homsi','skye#example.com'], 48502: ['Hiroto Yamaguchi', 'hiroto#example.com'], 48291: ['Tobias Ledford', 'tobias#example.com'], 48293: ['Jin Xu', 'jin#example.com'], 23945: ['Joana Dias', 'joana#example.com'], 85823: ['Alton Derosa', 'alton#example.com']}
for key, value in records.items():
    new_data = []
    for data in value:
        new_data.append(data.replace('.com', '.org'))
    records[key] = new_data
print(records)
{57394: ['Suresh Datta', 'suresh#example.org'], 48539: ['ColetteBrowning', 'colette#example.org'], 58302: ['Skye Homsi', 'skye#example.org'], 48502: ['Hiroto Yamaguchi', 'hiroto#example.org'], 48291: ['Tobias Ledford', 'tobias#example.org'], 48293: ['Jin Xu', 'jin#example.org'], 23945: ['Joana Dias', 'joana#example.org'], 85823: ['Alton Derosa', 'alton#example.org']}

One line conditional assignment if statement

This is the first question I am asking on this forum, so I welcome your feedback on making this more helpful to others.
Say I have this list:
IDs = ['First', 'Second', 'Third']
and this dictionary:
statistics = {('First', 'Name'):"FirstName", ('Second','Name'):"SecondName", ('Third','Name'):"ThirdName"}
Is there a shorter, easier to read one-liner than the following?
firstID = IDs[[statistics[ID,'Name'] for ID in IDs].index('FirstName')]
Many thanks
A more efficient (and probably more readable) approach would be this:
firstID = next(id for id in IDs if statistics[(id,'Name')]=='FirstName')
This defines a generator which checks the IDs in order and yields each ID whose entry in statistics equals "FirstName". next(...) is used to retrieve the first value from this generator. If no matching name is found, this will raise StopIteration.
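If you would rather get a fallback value than an exception, next() accepts a default as its second argument. A short sketch, where find_id is a hypothetical helper name introduced only for this example:

```python
IDs = ['First', 'Second', 'Third']
statistics = {
    ('First', 'Name'): "FirstName",
    ('Second', 'Name'): "SecondName",
    ('Third', 'Name'): "ThirdName",
}

def find_id(name):
    """Return the first ID whose name matches, or None if there is no match."""
    # The second argument to next() is returned instead of raising StopIteration.
    return next((i for i in IDs if statistics[(i, 'Name')] == name), None)

print(find_id('SecondName'))
print(find_id('Missing'))
```

Note the extra parentheses around the generator expression; they are required once next() receives more than one argument.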
# If you ever plan to change the order of IDs:
firstID = IDs[IDs.index('First')]
# If you are literally just looking for the first ID in IDs:
firstID = IDs[0]
If you look at your code in these two lines:
IDs = ['First', 'Second', 'Third']
firstID = IDs[[statistics[ID,'Name'] for ID in IDs].index('FirstName')]
The index of 'FirstName' in your newly created list will ALWAYS be equal to the index of 'First' in IDs. Because your list comprehension will iterate IDs in order and put the corresponding dict values in that order, you will always create 'FirstName' at the same index as 'First' appears in IDs. Therefore it is far more efficient simply to call it from that list using one of the above methods.

Finding modes for multiple dictionary keys

I currently have a Python dictionary with keys assigned to multiple values (which have come from a CSV), in a format similar to:
{
'hours': ['4', '2.4', '5.8', '2.4', '7'],
'name': ['Adam', 'Bob', 'Adam', 'John', 'Harry'],
'salary': ['55000', '30000', '55000', '30000', '80000']
}
(The actual dictionary is significantly larger in both keys and values.)
I am looking to find the mode* for each set of values, with the stipulation that sets where all values occur only once do not need a mode. However, I'm not sure how to go about this (and I can't find any other examples similar to this). I am also concerned about the different (implied) data types for each set of values (e.g. 'hours' values are floats, 'name' values are strings, 'salary' values are integers), though I have a rudimentary conversion function included but not used yet.
import csv
f = 'blah.csv'
# Conducts type conversion
def conversion(value):
    try:
        value = float(value)
    except ValueError:
        pass
    return value
reader = csv.DictReader(open(f))
# Places csv into a dictionary
csv_dict = {}
for row in reader:
    for column, value in row.iteritems():
        csv_dict.setdefault(column, []).append(value.strip())
*I'm wanting to attempt other types of calculations as well, such as averages and quartiles- which is why I'm concerned about data types- but I'd mostly like assistance with modes for now.
EDIT: the input CSV file can change; I'm unsure if this has any effect on potential solutions.
Ignoring all the csv file stuff, which seems tangential to your question, let's say you have a list salary. You can use the Counter class from collections to count the unique list elements.
From that you have a number of different options about how to get from a Counter to your mode.
For example:
from collections import Counter
salary = ['55000', '30000', '55000', '30000', '80000']
counter = Counter(salary)
# This returns all unique list elements and their count, sorted by count, descending
mc = counter.most_common()
print(mc)
# This returns the unique list elements and their count, where their count equals
# the count of the most common list element.
gmc = [(k,c) for (k,c) in mc if c == mc[0][1]]
print(gmc)
# If you just want an arbitrary (list element, count) pair that has the most occurrences
amc = counter.most_common()[0]
print(amc)
For the salary list in the code, this outputs:
[('55000', 2), ('30000', 2), ('80000', 1)] # mc
[('55000', 2), ('30000', 2)] # gmc
('55000', 2) # amc
Of course, for your case you'd probably use Counter(csv_dict["salary"]) instead of Counter(salary).
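Applied to the whole dictionary, that could look roughly like the sketch below. It assumes the sample csv_dict from the question and treats "no mode" as an empty list when every value in a column occurs only once; column_modes is a hypothetical helper name, not something from the collections module.

```python
from collections import Counter

# Sample data copied from the question.
csv_dict = {
    'hours': ['4', '2.4', '5.8', '2.4', '7'],
    'name': ['Adam', 'Bob', 'Adam', 'John', 'Harry'],
    'salary': ['55000', '30000', '55000', '30000', '80000'],
}

def column_modes(values):
    """Return all most-frequent values, or [] if every value occurs only once."""
    counts = Counter(values).most_common()
    top = counts[0][1] if counts else 0
    if top < 2:
        return []
    return [value for value, count in counts if count == top]

modes = {column: column_modes(values) for column, values in csv_dict.items()}
print(modes)
```

Because the values are only counted, not compared numerically, this works on the raw strings; type conversion would matter for averages and quartiles but is not needed for the mode as long as equal numbers are written identically in the CSV.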
I'm not sure I understand the question, but you could create a dictionary matching each desired mode to those keys, manually, or you could use the 'type' class by asking the values, then if the type returns a string ask other questions/parameters, like length of the item.

Print a tuple key dictionary with multiple values

I have a dictionary with the last and first names of the authors being the key, and the book, quantity, and price being the values. I want to print them out sorted in alphabetical order by the author name, and then by the book name.
The author is: Dickens, Charles
The title is: Hard Times
The qty is: 7
The price is: 27.00
----
The author is: Shakespeare, William
The title is: Macbeth
The qty is: 3
The price is: 7.99
----
The title is: Romeo And Juliet
The qty is: 5
The price is: 5.99
I'm very new to dictionaries and can't understand how you can sort a dictionary. My code so far is this:
def displayInventory(theInventory):
    theInventory = readDatabase(theInventory)
    for key in theInventory:
        for num in theInventory[key]:
            print("The author is", ' '.join(str(n) for n in key))
            print(' '.join(str(n) for n in num), '\n')
The dictionary, when printed, from which I read this looks like this:
defaultdict(<class 'list'>, {('Shakespeare', 'William'): [['Rome And Juliet', '5', '5.99'], ['Macbeth', '3', '7.99']], ('Dickens', 'Charles'): [['Hard Times', '7', '27.00']]})
fwiw, camelCase is very uncommon in Python; almost everything is written in snake_case. :)
I would do this:
for names, books in sorted(inventory.items()):
    for title, qty, price in sorted(books):
        print("The author is {0}".format(", ".join(names)))
        print(
            "The book is {0}, and I've got {1} of them for {2} each"
            .format(title, qty, price))
        print()
Ignoring for the moment that not everyone has a first and last name...
There are some minor tricks involved here.
First, inventory.items() produces (key, value) tuples. I can then sort those directly, because tuples sort element-wise: that is, (1, "z") sorts before (2, "a"). So Python will compare the keys first, and the keys are tuples themselves, so it'll compare last names and then first names. Exactly what you want.
I can likewise sort books directly because I actually want to sort by title, and the title is the first thing in each structure.
I can .join the names tuple directly, because I already know everything in it should be a string, and something is wrong if that's not the case.
Then I use .format() everywhere because str() is a bit ugly.
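If the element-wise comparison is new to you, here is a tiny sketch of it, using the inventory data printed in the question (copied verbatim, including its 'Rome And Juliet' spelling). Flattening everything into (last, first, title) tuples is just an alternative way to see the same ordering; it is not required for the loop above.

```python
# Inventory data as printed in the question.
inventory = {
    ('Shakespeare', 'William'): [['Rome And Juliet', '5', '5.99'], ['Macbeth', '3', '7.99']],
    ('Dickens', 'Charles'): [['Hard Times', '7', '27.00']],
}

# Tuples compare element by element, left to right.
print((1, "z") < (2, "a"))
print(('Dickens', 'Charles') < ('Shakespeare', 'William'))

# Flatten to (last, first, title) so a single sorted() call orders by author, then title.
flat = sorted(
    (last, first, title)
    for (last, first), books in inventory.items()
    for title, _qty, _price in books
)
for last, first, title in flat:
    print("{}, {}: {}".format(last, first, title))
```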
The key is to use sorted() to sort the dictionary by its keys, but then use sort() on the dictionary's values. This is necessary because your values are actually a list of lists and it seems you want only to sort them by the first value in each sub-list.
theInventory = {('Shakespeare', 'William'): [['Rome And Juliet', '5', '5.99'], ['Macbeth', '3', '7.99']], ('Dickens', 'Charles'): [['Hard Times', '7', '27.00']]}
for Author in sorted(theInventory.keys()):
    Author_Last_First = Author[0]+", "+Author[1]
    Titles = theInventory[Author]
    Titles.sort(key=lambda x: x[0])
    for Title in Titles:
        print("Author: "+str(Author_Last_First))
        print("Title: "+str(Title[0]))
        print("Qty: "+str(Title[1]))
        print("Price: "+str(Title[2]))
        print("\n")
Is that what you had in mind? You can of course always put this in a function to make calling it easier.
