I am practicing with a dataset of customers. Each customer has a first name, last name, city, age, gender and invoice number.
I want to create a dictionary keyed by each customer's first and last name, with the rest of the information stored as the value. A customer can have many invoices, so each customer should be counted only once and hold a list of invoice numbers.
City         FirstName  LastName  Gender  Age  InvoiceNum
NYC          Jane       Doe       Female  35   1023
NYC          Jane       Doe       Female  35   6523
Jersey City  John       Smith     Male    54   6985
Houston      Kay        Johnson   Female  45   2357
To do so, I want to create a for loop.
import csv

class Customers:
    city = ""
    age = 0
    invoices = []

f = open("customers.csv")
reader = csv.reader(f)
next(reader)  # skip the header row
customers = {}
for row in reader:
This is where I am stuck. For every row in reader, I want to check if the customer already exists. If it exists, I want to add the repeating invoice numbers. If it does not exist, this will be a new customer where I will have to append the other values (city, gender, age, single invoice number).
Desired Output:
There are 3 customers. 2 are female, 1 is male. Their average age is xxxx.
The count of customers does not repeat Jane Doe. The count of females does not repeat for Jane Doe. The average age does not sum Jane Doe's age twice.
I came up with this:
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List
@dataclass
class Customer:
    first_name: str = ''
    last_name: str = ''
    city: str = ''
    age: int = 0
    invoices: List = field(init=False, default_factory=list)

    def process_entry(self, **row):
        # Overwrite the scalar fields and accumulate the invoice number
        self.first_name = row['FirstName']
        self.last_name = row['LastName']
        self.city = row['City']
        self.age = row['Age']
        self.invoices.append(row['InvoiceNum'])

fake_reader = [
    {
        'FirstName': 'John',
        'LastName': 'Doe',
        'City': 'New York',
        'Age': 30,
        'InvoiceNum': 1
    },
    {
        'FirstName': 'John',
        'LastName': 'Doe',
        'City': 'New York',
        'Age': 30,
        'InvoiceNum': 2
    },
    {
        'FirstName': 'Clark',
        'LastName': 'Kent',
        'City': 'Metropolis',
        'Age': 35,
        'InvoiceNum': 3
    }
]
customers = defaultdict(Customer)
for row in fake_reader:
    customers[(row['FirstName'], row['LastName'])].process_entry(**row)

print(customers)
Output:
defaultdict(<class '__main__.Customer'>, {('John', 'Doe'): Customer(first_name='John', last_name='Doe', city='New York', age=30, invoices=[1, 2]), ('Clark', 'Kent'): Customer(first_name='Clark', last_name='Kent', city='Metropolis', age=35, invoices=[3])})
The "trick" here is to define the Customer class with default values, so that defaultdict can create an empty instance whose real values are then filled in by the process_entry method.
I think you're looking for something of the sort:
if name not in customers:
    customers[name] = [invoice]
else:
    customers[name].append(invoice)
This creates a key-value pair whose value is a list, which is appended to every time the for loop finds a new invoice for that name.
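The same check-then-append pattern can be written more compactly with dict.setdefault. A minimal sketch, assuming hypothetical (name, invoice) pairs in place of the real CSV rows:

```python
# Hypothetical (name, invoice) pairs standing in for rows read from the CSV
rows = [("Jane Doe", 1023), ("Jane Doe", 6523), ("John Smith", 6985)]

customers = {}
for name, invoice in rows:
    # setdefault returns the existing list, or stores and returns a new empty one
    customers.setdefault(name, []).append(invoice)

print(customers)
```

A collections.defaultdict(list) would behave the same way without the explicit setdefault call.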
Edit: update to match your csv file
customers = {}
# skip the header with next(reader) first; a csv.reader cannot be sliced
for row in reader:
    # csv.reader yields each row as a list of strings
    City, FirstName, LastName, Gender, Age, InvoiceNum = (value.strip() for value in row)
    newEntry = {'InvoiceNum': int(InvoiceNum), 'City': City, 'Gender': Gender, 'Age': int(Age)}
    if (FirstName, LastName) not in customers:
        customers[(FirstName, LastName)] = [newEntry]
    else:
        customers[(FirstName, LastName)].append(newEntry)
Any hashable value can be a dictionary key, so I chose a tuple of the first and last name.
Edit: I hope my answer points you in the right direction. I left the csv details to you, as your rows may not correspond to what I did there.
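To get from such a dictionary to the summary the question asks for, each key is counted once. A rough sketch, assuming a comma-separated version of the sample data held in a string (the real file layout may differ) and a hypothetical per-customer record layout:

```python
import csv
import io

# Assumed comma-separated version of the sample data from the question
sample = """City,FirstName,LastName,Gender,Age,InvoiceNum
NYC,Jane,Doe,Female,35,1023
NYC,Jane,Doe,Female,35,6523
Jersey City,John,Smith,Male,54,6985
Houston,Kay,Johnson,Female,45,2357
"""

customers = {}
for row in csv.DictReader(io.StringIO(sample)):
    key = (row["FirstName"], row["LastName"])
    # Create the record the first time a customer is seen, then only add invoices
    record = customers.setdefault(
        key, {"city": row["City"], "gender": row["Gender"],
              "age": int(row["Age"]), "invoices": []})
    record["invoices"].append(int(row["InvoiceNum"]))

# Every statistic is computed over unique customers, not over rows
total = len(customers)
females = sum(1 for c in customers.values() if c["gender"] == "Female")
males = sum(1 for c in customers.values() if c["gender"] == "Male")
average_age = sum(c["age"] for c in customers.values()) / total

print(f"There are {total} customers: {females} female, {males} male. "
      f"Average age: {average_age:.1f}")
```

Because the statistics are computed from the dictionary values, Jane Doe's second invoice adds to her invoice list without affecting the counts or the average.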
Related
I managed to get the 32 best scores. Now I am trying to get the indices of the 32 best students so that I can show who they are.
The link to my json file is here:
https://drive.google.com/file/d/1OOkX1hAD6Ot-I3h_DUM2gRqdSl5Hy2Pl/view
And the code is below:
import json
file_path = "C:/Users/User/Desktop/leksion 10/testim/u2/olympiad.json"
with open(file_path, 'r') as j:
    contents = json.loads(j.read())
print(contents)
print("\n================================================")
class Competitor:
    def __init__(self, first_name, last_name, country, point):
        self.first_name = first_name
        self.last_name = last_name
        self.country = country
        self.point = int(point)

    def __repr__(self):
        return f'{self.first_name} {self.last_name} {self.country} {self.point}'

olimpiade = []
for i in contents:
    olimpiade.append(Competitor(i.get('first_name'),
                                i.get('last_name'),
                                i.get('country'),
                                i.get('point')))
print(olimpiade)
print("\n================================================")
# The 32 best students advance to the second phase. Build a function that returns the second-phase competitors.
print("\n================================================")
print(type(olimpiade))
print(type(contents))
print(type(Competitor))
for i in contents:
    print(i)
print("\n================================================")
# list.sort() sorts in place and returns None, so use sorted() to get the sorted list
L = sorted(olimpiade, key=lambda x: x.point)
print(L)
I have tried this for example
pike=[]
for value in contents:
    pike.append(value['point'])
print(pike)
n = 32
pike.sort()
print(pike[-n:])
Using the data from your link, downloaded to the file 'olympiad.json'.
Code
import json
def best_students(lst, n=1):
    '''
    Top n students
    '''
    return sorted(lst,
                  key=lambda d: d['point'],  # sort based upon points
                  reverse=True)[:n]          # take the top n students

def best_students_by_country(lst, m=1):
    '''
    Top m students in each country
    '''
    # Sort by country
    by_country = sorted(lst, key=lambda d: d['country'])

    groups = []
    for d in by_country:
        if not groups:
            groups.append([])
        elif groups[-1][-1]['country'] != d['country']:
            groups.append([])  # start a new country group
        # Append student to the current country group
        groups[-1].append(d)

    # List comprehension for best m students in each group
    return [best_students(g, m) for g in groups]
Usage
# Deserialize json file
with open('olympiad.json', 'r') as f:
    data = json.load(f)
# Top two students overall
print(best_students(data, 2))
# Top two students by country
print(best_students_by_country(data, 2))
Outputs
[{'first_name': 'Harvey',
'last_name': 'Massey',
'country': 'Bolivia',
'point': 9999},
{'first_name': 'Barbra',
'last_name': 'Knight',
'country': 'Equatorial Guinea',
'point': 9998}]
[[{'first_name': 'Wade',
'last_name': 'Dyer',
'country': 'Afghanistan',
'point': 9822},
{'first_name': 'Terrell',
'last_name': 'Martin',
'country': 'Afghanistan',
'point': 8875}],
[{'first_name': 'Delaney',
'last_name': 'Buck',
'country': 'Albania',
'point': 9729},
{'first_name': 'Melton',
'last_name': 'Ford',
'country': 'Albania',
'point': 9359}],
...
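For a fixed cutoff such as the 32 best students, heapq.nlargest from the standard library is an alternative to sorting the whole list. A small sketch on made-up records (the names and points below are illustrative, not taken from the file, and top_n is a hypothetical helper name):

```python
import heapq

# Illustrative records with the same keys as the JSON file
students = [
    {"first_name": "A", "last_name": "One", "country": "X", "point": 50},
    {"first_name": "B", "last_name": "Two", "country": "Y", "point": 90},
    {"first_name": "C", "last_name": "Three", "country": "X", "point": 70},
]

def top_n(records, n):
    # Returns the n records with the highest 'point', best first
    return heapq.nlargest(n, records, key=lambda d: d["point"])

best = top_n(students, 2)
print([s["first_name"] for s in best])
```

Since the full records are returned rather than just the points, there is no need to look up indices afterwards.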
Here is how to build a useful dictionary from your data.
First, I am assuming all your values are in a list called texts, and that each value is a string.
We can get a list of country names from an external package:
pip install country-list
from country_list import countries_for_language
countries = dict(countries_for_language('en'))
countries = list(countries.values())
Initialise an empty dictionary and fill it:
scores_dict = {}
for i in texts:
    score = [int(s) for s in i.split() if s.isdigit()]
    for j in countries:
        if j in i:
            country = j
            try:
                scores_dict[country].extend(score)
            except KeyError:
                # first score seen for this country
                scores_dict[country] = score
This will give you a dictionary that looks like this
{'Albania': [5287],
'Bolivia': [1666],
'Croatia': [1201],
'Cyprus': [8508]}
From here, you can just iterate through each country to get top 5 students overall and top 5 students for each country.
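That last step could look roughly like this, assuming scores_dict has the shape shown above (the extra scores here are invented for illustration). Note that this dictionary holds scores only, so student names would have to be stored alongside them to identify who the students are.

```python
# Assumed shape: country -> list of integer scores (values are made up)
scores_dict = {
    "Albania": [5287, 9729, 9359],
    "Bolivia": [1666, 9999],
}

top_k = 2  # the text mentions 5; 2 keeps the example short

# Best scores within each country
top_per_country = {
    country: sorted(scores, reverse=True)[:top_k]
    for country, scores in scores_dict.items()
}

# Best scores overall, keeping the country alongside each score
all_scores = [(score, country)
              for country, scores in scores_dict.items()
              for score in scores]
top_overall = sorted(all_scores, reverse=True)[:top_k]

print(top_per_country)
print(top_overall)
```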
From your file I created a DataFrame in pandas.
sorted_all holds the overall ranking; ascending=False puts the highest scores first.
sorted_national sorts by country and then by points; the example selects the 7 best competitors from Mexico.
head() shows five rows by default.
import pandas as pd
df = pd.read_json('olympiad.json')
sorted_all = df.sort_values(by='point', ascending=False)
sorted_national = df.sort_values(['country','point'], ascending=[True, False])
print(sorted_all.head())
print(sorted_national.loc[sorted_national['country'] == 'Mexico'].head(7))
Output all
     first_name last_name            country  point
1453     Harvey    Massey            Bolivia   9999
3666     Barbra    Knight  Equatorial Guinea   9998
5228    Rebecca   Navarro            Tunisia   9994
338      Jolene     Pratt             Mexico   9993
5322    Barnett   Herrera            Comoros   9986
Output national Mexico
     first_name last_name country  point
338      Jolene     Pratt  Mexico   9993
5118      Doyle   Goodman  Mexico   9980
2967      Mindy    Watson  Mexico   9510
6074      Riley      Hall  Mexico   9426
5357       Leah   Collins  Mexico   8798
5596        Luz  Bartlett  Mexico   8592
3684    Annette     Perry  Mexico   8457
There should be a grade range and a grade for each student; that is what will help you filter the best students.
I am sorry if this was answered before, but I could not find any answer for this problem at all.
Let's say I have this class and list of objects:
class Person:
    def __init__(self, name, country, age):
        self.name = name
        self.country = country
        self.age = age

persons = [Person('Tom', 'USA', 20), Person('Matt', 'UK', 19), Person('Matt', 'USA', 20)]
Now I would like the user to search for a person by entering any combination of attribute values, and I want to return only the objects that match all of the entered values. For example, if the user enters 'Matt', 'USA' and no age, I want the program to return only the third person, who is named Matt and is from the USA, and not all three objects just because each of them matches some of the entered values.
My implementation currently uses an if statement with the or operator, which returns all the objects because or is satisfied as soon as one condition is True. That is the problem I am trying to solve.
Thanks in advance.
You can use a list comprehension for the task. For each attribute, the if condition should pass when the search value is None, and otherwise compare the search value with the object's attribute.
class Person:
    def __init__(self, name, country, age):
        self.name = name
        self.country = country
        self.age = age

    def __repr__(self):
        # use the instance attributes, not the global search values
        return "[{},{},{}]".format(self.name, self.country, str(self.age))

persons = [Person('Tom', 'USA', 20), Person('Matt', 'UK', 19), Person('Matt', 'USA', 20)]
name = "Matt"
country = "USA"
age = None
result = [
    p for p in persons
    if (name is None or p.name == name) and
    (country is None or p.country == country) and
    (age is None or p.age == age)
]
print(result) #[[Matt,USA,20]]
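The same idea generalizes to any set of attributes by passing the search values as keyword arguments. A sketch, where the search helper is a hypothetical name and not part of the answer above:

```python
class Person:
    def __init__(self, name, country, age):
        self.name = name
        self.country = country
        self.age = age

    def __repr__(self):
        return "[{},{},{}]".format(self.name, self.country, self.age)

def search(people, **criteria):
    # Keep only the criteria the user actually supplied (not None)
    wanted = {attr: value for attr, value in criteria.items() if value is not None}
    # A person matches only if every supplied attribute is equal
    return [p for p in people
            if all(getattr(p, attr) == value for attr, value in wanted.items())]

persons = [Person('Tom', 'USA', 20), Person('Matt', 'UK', 19), Person('Matt', 'USA', 20)]
result = search(persons, name='Matt', country='USA', age=None)
print(result)
```

Using all() is what enforces the "and" semantics the question asks for; with no criteria at all, every person matches.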
The file data used
Austin = null|Stone Cold Austin|996003892|987045321|Ireland
keller = null|Mathew Keller|02/05/2002|0199999999|0203140819|019607892|9801 2828 5596 0889
The Nested Dictionary
data = {'Austin': {'Full Name': 'Stone Cold Steve Austin', 'Contact Details': '996003892', 'Emergency Contact Number': '987045321', 'Country': 'Ireland'}}
The class and Object that I want to use to assign the dict data
class member2:
    def __init__(self, realname, phone, emergencyContact, country):
        self.realname = realname
        self.phone = phone
        self.emergencyContact = emergencyContact
        self.country = country
Assigning text file data into a nested dictionary
with open("something.txt", 'r') as f:
    for line in f:
        key, values = line.strip().split(" = ")  # note the space around =, to avoid trailing space in key
        values = values.split('|')
        data2 = {key: dict(zip(keys, values[1:]))}
        # To assign data to the class (NOT WORKING)
        member2.realname = data2[values[2]]
        print(member2)
        if key == username:
            data2 = {key: dict(zip(keys, values[1:]))}
Output
member2.realname = data2[values[2]]
KeyError: 'Stone Cold Steve Austin'
You are referring to a non-existent key, 'Stone Cold Steve Austin'.
Maybe you wish to access something like data2[key][keys[0]]:
keys = ["Full Name", "Contact Details", "Emergency Contact Number", "Country"]
with open("we.txt", 'r') as f:
    for line in f:
        key, values = line.strip().split(" = ")  # note the space around =, to avoid trailing space in key
        values = values.split('|')
        data2 = {key: dict(zip(keys, values[1:]))}
        print(data2[key][keys[0]])
Output:
Stone Cold Austin
Mathew Keller
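If the goal is to end up with objects rather than nested dictionaries, each parsed line can be passed to the class constructor instead of assigning to the class itself. A sketch, assuming every usable line has the four fields of the Austin example and reading from an in-memory string instead of the file (Member2 and members are illustrative names):

```python
import io

class Member2:
    def __init__(self, realname, phone, emergency_contact, country):
        self.realname = realname
        self.phone = phone
        self.emergency_contact = emergency_contact
        self.country = country

# Stand-in for the text file; one line in the format shown in the question
text = "Austin = null|Stone Cold Austin|996003892|987045321|Ireland\n"

members = {}
for line in io.StringIO(text):
    key, values = line.strip().split(" = ")
    fields = values.split('|')[1:]  # drop the leading 'null'
    if len(fields) == 4:  # skip lines that do not match the assumed layout
        members[key] = Member2(*fields)  # create an instance per line

print(members["Austin"].realname)
```

Creating an instance per line avoids overwriting a single class-level attribute on every iteration.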
I'm trying to take the First Column (Name) and the fourth column (is active) from a CSV file and do the following:
Create a single entry for the Company Name
If 'Is Active' is Yes, increment that company's active count.
If 'Is Active' is No, increment that company's inactive count, and at the end give me the 'is active' and 'is not active' totals for each company.
Data1 and Data2 fields are other columns that I don't care about at this time.
csv =
Name,Data1,Data2, Is Active:
Company 1,Data1,Data2,Yes
Company 1,Data1,Data2,Yes
Company 1,Data1,Data2,Yes
Company 2,Data1,Data2,Yes
Company 2,Data1,Data2,No
Company 2,Data1,Data2,Yes
Company 2,Data1,Data2,Yes
Company 3,Data1,Data2,No
Company 3,Data1,Data2,No
Ideal result would be in the format of:
Company name, Yes-count, no-count
I've started with csvreader to read the columns and I can put them into lists, but I'm unsure how to compare and consolidate names and counts after that.
Any help would be greatly appreciated.
One way to do this:
import csv

with open("your_csv_file", "r") as file:
    reader = csv.reader(file)
    _ = next(reader)  # skip header
    consolidated = {}
    for line in reader:
        company_name = line[0]
        is_active = line[3]
        if company_name not in consolidated:
            consolidated[company_name] = {"yes_count": 0, "no_count": 0}
        # anything other than "Yes" is counted as not active
        if is_active == "Yes":
            consolidated[company_name]["yes_count"] += 1
        else:
            consolidated[company_name]["no_count"] += 1
Sample Output:
>>> print(consolidated)
{
'Company 1': {'yes_count': 3, 'no_count': 0},
'Company 2': {'yes_count': 3, 'no_count': 1},
'Company 3': {'yes_count': 0, 'no_count': 2}
}
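An alternative sketch with collections.Counter, which also prints the "Company name, Yes-count, no-count" format from the question. It assumes the sample data is held in a string rather than a file, and it counts exact "Yes" and "No" values only:

```python
import csv
import io
from collections import Counter

# Sample data from the question, held in a string instead of a file
sample = """Name,Data1,Data2,Is Active
Company 1,Data1,Data2,Yes
Company 1,Data1,Data2,Yes
Company 1,Data1,Data2,Yes
Company 2,Data1,Data2,Yes
Company 2,Data1,Data2,No
Company 2,Data1,Data2,Yes
Company 2,Data1,Data2,Yes
Company 3,Data1,Data2,No
Company 3,Data1,Data2,No
"""

counts = Counter()
reader = csv.reader(io.StringIO(sample))
next(reader)  # skip header
for line in reader:
    # Count each (company, status) pair
    counts[(line[0], line[3])] += 1

companies = sorted({name for name, _ in counts})
# A Counter returns 0 for pairs that never occurred
lines = [f"{name}, {counts[(name, 'Yes')]}, {counts[(name, 'No')]}"
         for name in companies]
print("\n".join(lines))
```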
I am new to Python 2.7. I want to use the first column of employees as the key, look it up in the first column of dept, and generate the combined results.
Employees comes from a text file and dept comes from a database. I tried a lot but didn't get an easy answer. What is wrong with my code?
**Inputs :**
employees=['1','peter','london']
employees=['2','conor','london']
employees=['3','ciara','london']
employees=['4','rix','london']
dept=['1','account']
dept=['2','developer']
dept=['3','hr']
**Expected Output :**
results=['1','peter','london','account']
results=['2','conor','london','developer']
results=['3','ciara','london','hr']
results=['4','rix','london',null]
Your input makes no sense: each line overwrites the data from the previous one. It seems that the digits (as strings) are the keys, and that some default action must be taken when no info is found in dept.
To keep the spirit, just create 2 dictionaries, then use a dictionary comprehension to generate the result:
employees = dict()
dept = dict()
employees['1'] = ['peter','london']
employees['2'] = ['conor','london']
employees['3'] = ['ciara','london']
employees['4'] = ['rix','london']
dept['1']=['account']
dept['2']=['developer']
dept['3']=['hr']
result = {k:v+dept.get(k,[None]) for k,v in employees.items()}
print(result)
which yields a dictionary with all the info. Note that null is None in python:
{'1': ['peter', 'london', 'account'], '4': ['rix', 'london', None], '3': ['ciara', 'london', 'hr'], '2': ['conor', 'london', 'developer']}
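Since the employee rows come from a text file and the dept rows from a database, they will more likely arrive as lists of rows. A sketch, assuming each source has already been read into a list of lists (employee_rows and dept_rows are hypothetical names):

```python
# Assumed inputs: one inner list per row, first column is the key
employee_rows = [['1', 'peter', 'london'], ['2', 'conor', 'london'],
                 ['3', 'ciara', 'london'], ['4', 'rix', 'london']]
dept_rows = [['1', 'account'], ['2', 'developer'], ['3', 'hr']]

# Index the departments by their first column
dept = {row[0]: row[1] for row in dept_rows}

# Append the matching department, or None when there is no match
results = [row + [dept.get(row[0])] for row in employee_rows]

for r in results:
    print(r)
```

This keeps the original row order, which a plain dictionary does not guarantee in Python 2.7.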
You could go for a class. Consider this:
class Employee:
    def __init__(self, number, name, location, dept):
        self.number = str(number)
        self.name = name
        self.location = location
        self.dept = dept

    def data(self):
        return [self.number,
                self.name,
                self.location,
                self.dept]

peter = Employee(1, 'peter', 'london', 'account')
print(peter.data())
['1', 'peter', 'london', 'account']