How to structure complex query - python

I have three tables in a SQL database that look like this:
D: Location of dealers
dealer: zip: affiliate:
AAA 32313 Larry
BBB 32322 John
O: Sales record
customer: affiliate: zip: count:
John's Construction Larry 35331 3
Bill's Sales John 12424 300
Jim's Searching Larry 14422 32
Z: Zip distance database
zip1: zip2: dist:
35235 35235 20
32355 15553 14
I am trying to take table D (a list of dealers and their locations) and estimate each dealer's sales. I am doing this with table O, which lists every sale to a customer along with the customer's location. The logic we are working with is: for each dealer, look through table O and find the zip that minimizes distance. We assume that the dealer located closest to a sale is the one who made it.
I am having a lot of trouble setting up the SQL query to do this, and am wondering whether SQL is even the right tool for it. I know a little Python and a good amount of R. Any help is appreciated.
The query I am currently using:
SELECT d.rowid, d.dealer, d.affiliate, o.count, MIN(z.dist)
FROM database D, database O, zip z
WHERE d.Zip = z.zip1 AND o.zip = z.zip2
GROUP BY d.rowid

I modified your test data so that I could test the SQL query in R, using the sqldf library.
## Your Modified Test Data
LocationOfDealers <- data.frame(dealer = c("AAA", "BBB", "CCC"), zip = c(32313, 32322, 35235), affiliate = c("Larry", "John", "Larry"))
SalesRecord <- data.frame(customer=c("John's Construction", "Bill's Sales", "Jim's Searching", "Tim's Sales"), affiliate = c("Larry", "John", "Larry", "James"), zip = c(35331, 12424, 14422, 35235), count = c(3, 300, 32, 20))
ZipDistance <- data.frame(zip1=c(35235, 32355), zip2=c(35235, 15553), dist = c(20, 14))
#LocationOfDealers
# dealer zip affiliate
#1 AAA 32313 Larry
#2 BBB 32322 John
#3 CCC 35235 Larry
# SalesRecord
# customer affiliate zip count
# 1 John's Construction Larry 35331 3
# 2 Bill's Sales John 12424 300
# 3 Jim's Searching Larry 14422 32
# 4 Tim's Sales James 35235 20
# ZipDistance
# zip1 zip2 dist
# 1 35235 35235 20
# 2 32355 15553 14
## Sql query in R using sqldf
library(sqldf)
sqldf({"
  SELECT dealer, MIN(dist) AS Min_Dist, SUM(count) AS dealer_Sold
  FROM (
    SELECT *
    FROM LocationOfDealers D
    INNER JOIN ZipDistance Z ON D.zip = Z.zip1
    INNER JOIN SalesRecord O ON O.zip = Z.zip2
  )
  GROUP BY dealer
"})
### Only one dealer (CCC) has a zip that ZipDistance links to a customer zip, and its minimum distance is 20
# dealer Min_Dist dealer_Sold
#1 CCC 20 20
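Since the question also asks about Python, the "closest dealer gets the sale" logic can be sketched with the standard-library sqlite3 module. This is only a sketch under stated assumptions: the table names dealers, sales, and zip_dist are hypothetical stand-ins for D, O, and Z, and the distances are invented so that every customer zip has a distance to every dealer zip (the sample Z rows in the question do not cover those pairs).

```python
import sqlite3

# Hypothetical table names and invented distances, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dealers  (dealer TEXT, zip INTEGER, affiliate TEXT);
    CREATE TABLE sales    (customer TEXT, affiliate TEXT, zip INTEGER, "count" INTEGER);
    CREATE TABLE zip_dist (zip1 INTEGER, zip2 INTEGER, dist INTEGER);
""")
conn.executemany("INSERT INTO dealers VALUES (?, ?, ?)", [
    ("AAA", 32313, "Larry"),
    ("BBB", 32322, "John"),
])
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ("John's Construction", "Larry", 35331, 3),
    ("Bill's Sales", "John", 12424, 300),
    ("Jim's Searching", "Larry", 14422, 32),
])
# zip1 is a dealer zip, zip2 is a customer zip (invented distances).
conn.executemany("INSERT INTO zip_dist VALUES (?, ?, ?)", [
    (32313, 35331, 10), (32322, 35331, 25),
    (32313, 12424, 40), (32322, 12424, 15),
    (32313, 14422, 5),  (32322, 14422, 50),
])

# For each sale, keep only the dealer whose distance equals the minimum
# distance from any dealer to that sale's zip, then total the counts per dealer.
QUERY = """
SELECT d.dealer, SUM(s."count") AS estimated_sales
FROM sales AS s
JOIN zip_dist AS z ON z.zip2 = s.zip
JOIN dealers  AS d ON d.zip = z.zip1
WHERE z.dist = (
    SELECT MIN(z2.dist)
    FROM zip_dist AS z2
    JOIN dealers AS d2 ON d2.zip = z2.zip1
    WHERE z2.zip2 = s.zip
)
GROUP BY d.dealer
ORDER BY d.dealer
"""
estimated = conn.execute(QUERY).fetchall()
print(estimated)
```

With this approach, a sale that is equally close to two dealers is counted for both, so a tie-breaking rule would be needed if that can happen in the real data.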

Related

Hash Join Algorithm from MySQL in Python

Assume I want to write the following SQL Query:
SELECT Employee.Name, Employee.ID, InvitedToParty.Name, InvitedToParty.FavoriteFood
FROM Employee, InvitedToParty
WHERE Employee.Name = InvitedToParty.Name
given the required tables :
Employee
ID Name Birthday
1 Heiny 01.01.2000
2 Peter 10.10.1990
3 Sabrina 12.10.2015
.
.
InvitedToParty
Name FavoriteFood
Michael Pizza
Heiny Pizza
Sabrina Burger
George Pasta
.
.
.
Assume I have this information as two lists in Python inside a dictionary:
tables['Employee'].id = [1, 2, 3, ...]
tables['Employee'].Name = ['Heiny', 'Peter', 'Sabrina', ...]
I hope you get the idea. These keys of the dictionary have attributes, because I created a class for each table.
How can I write this query in Python? My initial idea was (pseudo):
match_counter = 0
for i, value in enumerate(table1.column):
    for j in range(len(table2.column)):
        if table2.column[j] == value:
            table2.column[j], table2.column[i] = table2.column[i], table2.column[]
            match_counter += 1
Then I would remove everything after the first 'match_counter' rows. But I am sure there must be a better way. Moreover, I do not even know whether this would give me the correct result.
The rough equivalent of your query is:
results = []
for row1 in table1:
    for row2 in table2:
        if row1.Name == row2.Name:
            # append takes a single argument, so collect the fields in a tuple
            results.append((row1.Name, row1.ID, row2.FavoriteFood))
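The title asks about a hash join specifically, so here is a minimal sketch of one. It assumes each table is a list of dictionaries (one per row) rather than the per-column lists from the question; the function name hash_join and the variable names are invented for illustration. The build phase indexes one table by the join key, and the probe phase looks up each row of the other table, so no nested scan over both tables is needed.

```python
from collections import defaultdict

# Sample rows taken from the question's tables (Birthday omitted).
employees = [
    {"ID": 1, "Name": "Heiny"},
    {"ID": 2, "Name": "Peter"},
    {"ID": 3, "Name": "Sabrina"},
]
invited = [
    {"Name": "Michael", "FavoriteFood": "Pizza"},
    {"Name": "Heiny", "FavoriteFood": "Pizza"},
    {"Name": "Sabrina", "FavoriteFood": "Burger"},
    {"Name": "George", "FavoriteFood": "Pasta"},
]


def hash_join(left, right, key):
    """Equi-join two lists of dict rows on `key` and return (left_row, right_row) pairs."""
    # Build phase: index the left table by the join key.
    index = defaultdict(list)
    for row in left:
        index[row[key]].append(row)
    # Probe phase: look up every right row in the index.
    pairs = []
    for right_row in right:
        for left_row in index.get(right_row[key], []):
            pairs.append((left_row, right_row))
    return pairs


# Project the columns named in the SELECT clause.
joined = [
    (e["Name"], e["ID"], p["Name"], p["FavoriteFood"])
    for e, p in hash_join(employees, invited, "Name")
]
print(joined)
```

Building the index on the smaller table keeps memory use down; the result order follows the probed table.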

How do I print all values of a dictionary in order?

I am a rookie. Collectively, I have about two weeks of experience with any sort of computer code.
I created a dictionary with some baseball player names, their batting order, and the position they play. I'm trying to get it to print out in columns with "order", "name", and "position" as headings, with the order numbers, names, and positions under those. Think spreadsheet kind of layout (Stack Overflow won't let me format it the way I want to in here).
order name position
1 A Baddoo LF
2 J Schoop 1B
3 R Grossman DH
I'm new here, so apparently you have to click the link to see what I wrote. Dodgy, I know...
As you can see, I have tried 257 times to get this thing to work. I've consulted the google, a python book, and other sources to no avail.
Here is the working code
for order in lineup["order"]:
    order -= 1
    position = lineup["position"][order]
    name = lineup["name"][order]
    print(order + 1, name, position)
Your Dictionary consists of 3 lists - if you want entry 0 you will have to access entry 0 in each list separately.
This would be an easier way to use a dictionary
players = {1: ["LF", "A Baddoo"],
           2: ["1B", "J Schoop"]}
for player in players:
    print(players[player])
Hope this helps
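To get the aligned columns the question asks for without changing the dictionary, the three lists can be walked in parallel with zip and padded with format specifiers. This is a minimal sketch using the three players shown in the question; the column widths (6 and 14) and the function name format_lineup are arbitrary choices made for illustration.

```python
lineup = {
    "order": [1, 2, 3],
    "name": ["A Baddoo", "J Schoop", "R Grossman"],
    "position": ["LF", "1B", "DH"],
}


def format_lineup(lineup):
    """Return a header line plus one left-aligned line per player."""
    lines = [f"{'order':<6}{'name':<14}{'position'}"]
    # zip pairs up entry 0 of each list, then entry 1, and so on.
    for order, name, position in zip(lineup["order"], lineup["name"], lineup["position"]):
        lines.append(f"{order:<6}{name:<14}{position}")
    return lines


table = format_lineup(lineup)
print("\n".join(table))
```

The `<6` and `<14` specifiers pad each value on the right, which is what keeps the columns lined up.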
pandas is good for creating the kind of table you describe. You can convert a dictionary into a DataFrame. The only catch is that all columns need to be the same length, which is not the case here: order, position, and name each have 9 items, while pitcher has only 1. So we pull the pitcher key out with pop. That leaves a dictionary with just the three columns mentioned above, plus a one-item list for the pitcher.
import pandas as pd
lineup = {'order': [1,2,3,4,5,6,7,8,9],
          'position': ['LF', '1B', 'RF','DH','3B','SS','2B','C','CF'],
          'name': ['A Baddoo','J Schoop','R Grossman','M Cabrera','J Candelario', 'H Castro','W Castro','D Garneau','D Hill'],
          'pitcher':['Matt Manning']}
pitching = lineup.pop('pitcher')
starting_lineup = pd.DataFrame(lineup)
print("Today's Detroit Tigers starting lineup is: \n", starting_lineup.to_string(index=False), "\nPitcher: ", pitching[0])
Output:
Today's Detroit Tigers starting lineup is:
order position name
1 LF A Baddoo
2 1B J Schoop
3 RF R Grossman
4 DH M Cabrera
5 3B J Candelario
6 SS H Castro
7 2B W Castro
8 C D Garneau
9 CF D Hill
Pitcher: Matt Manning

How to find college wise rankings and global rankings using pandas?

I have the following data frame:
Agent_Name college_name score college_local_ranking global_ranking
Anna Harvard 60 1 4
Mathew oxford 99 1 1
Angel IIT 65 3 6
I'm able to find the global ranking using the rank function.
df['global_ranking'] = df['score'].rank(ascending=False)
Please help me in finding local ranking based on the score of their college.
I tried this, but I'm getting an error.
df['college_local_ranking'] = df['score'].groupby(by = ['college_name']).rank()
Your command:
df['college_local_ranking'] = df['score'].groupby(by = ['college_name']).rank()
fails because df['score'] selects a single column first, so the groupby on college_name is applied to a Series that no longer contains that column.
The correct command would be:
df['college_local_ranking'] = df.groupby('college_name')['score'].rank()
Here is what could work for you:
df['college_local_ranking'] = df.groupby("college_name")["score"].rank(ascending=False)
Hope that helps
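To see what a per-college descending rank computes, here is a plain-Python sketch that does not use pandas. The extra agents (Ben, Olga) and their scores are invented so that some colleges have more than one agent, and the helper name rank_desc is hypothetical. The sketch assumes there are no tied scores; pandas' default rank method gives tied rows the average of their ranks, which this sketch does not reproduce.

```python
from collections import defaultdict

# (agent, college, score); Ben and Olga are invented rows for illustration.
rows = [
    ("Anna", "Harvard", 60),
    ("Ben", "Harvard", 75),
    ("Mathew", "oxford", 99),
    ("Olga", "oxford", 80),
    ("Angel", "IIT", 65),
]


def rank_desc(scores):
    """Map each score to its 1-based position in descending order (no ties assumed)."""
    ordered = sorted(scores, reverse=True)
    return {score: position for position, score in enumerate(ordered, start=1)}


# Collect the scores of each college, then rank inside each group.
by_college = defaultdict(list)
for _, college, score in rows:
    by_college[college].append(score)
local_rank = {college: rank_desc(scores) for college, scores in by_college.items()}

# The global ranking is the same operation over all scores at once.
global_rank = rank_desc([score for _, _, score in rows])

ranked = [
    (name, college, score, local_rank[college][score], global_rank[score])
    for name, college, score in rows
]
for line in ranked:
    print(line)
```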

Python CSV Import as Complex Nested List, Sort, and Output as Text or Back to CSV

I have a data structure, we'll call it an inventory, in CSV that looks similar to:
ResID,Building,Floor,Room,Resource
1.1.1.1,Central Park,Ground,Admin Office,Router
1.1.2.1,Central Park,Ground,Machine Closet,Router
1.3.1.1,Central Park,Mezzanine,Dungeon,Whip
2.1.3.1,Chicago,Roof,Pidgeon Nest,Weathervane
1.13.4.1,Central Park,Secret/Hidden Floor,c:\room,Site-to-site VPN for 1.1.1.1
1.2.1.1,Central Park,Balcony,Restroom,TP
And I am trying to get it to output in a sorted CSV, and in the format of a text file following the format:
1 Central Park
1.1 Ground
1.1.1 Admin Office
1.1.1.1 Router
1.1.2 Machine Closet
1.1.2.1 Router
1.2 Balcony
1.2.1 Restroom
1.2.1.1 TP
1.3 Mezzanine
1.3.1 Dungeon
1.3.1.1 Whip
1.13 Secret/Hidden Floor
1.13.4 c:\room
1.13.4.1 Site-to-site VPN for 1.1.1.1
2 Chicago
2.1 Roof
2.1.3 Pidgeon Nest
2.1.3.1 Weathervane
I envision a data structure similar to:
Building = {
    1 : 'Central Park',
    2 : 'Chicago'
}
Floor = {
    1 : {
        1 : 'Ground',
        2 : 'Balcony',
        3 : 'Mezzanine',
        13: 'Secret/Hidden Floor'
    },
    2 : {
        1 : 'Roof'
    }
}
Room = {
    1 : {
        1 : {
            1 : 'Admin Office',
            2 : 'Machine Closet'
        },
        2 : {
            1 : 'Restroom'
        },
        3 : {
            1 : 'Dungeon'
        }
... Hopefully by now you get the idea.
My complication is that I do not know if this is the best way to represent the data and then iterate over it as:
for buildingID in buildings:
    for floorID in floors[buildingID]:
        for roomID in rooms[buildingID][floorID]:
            for resource in resources[buildingID][floorID][roomID]:
                do stuff...
Or if there is a MUCH more sane way to represent the data in script, but I need the full document heading numbers AND names intact, and this is the only way I could visualize to do it at my skill level.
I am also at a loss for an effective way to generate this information and build it into the data structure from a CSV in this format.
This may seem trivial to some, but I am not a programmer by trade, and really only dabble on an infrequent basis.
My ultimate goal is to ingest the CSV into a sane data structure, sort it in ascending numerical order, and generate line entries in the text structure shown above, listing each building, floor, room, and resource only once and in context with each other. After that, it should be trivial for me to handle the output to text or back to sorted CSV.
Any recommendations would be GREATLY appreciated.
EDIT: SOLUTION
Leveraging my accepted answer below I was able to generate the following code. Thank you to the guy that deleted his answer and comments that simplified my sorting process too!
import csv

def getPaddedKey(line):
    # zero-pad every part of the id so that a string sort orders it numerically
    keyparts = line[0].split(".")
    keyparts = map(lambda x: x.rjust(5, '0'), keyparts)
    return '.'.join(keyparts)

def outSortedCSV(reader):
    with open(fOutName, 'w') as fOut:
        writer = csv.writer(fOut, delimiter=',')
        head = next(reader)
        writer.writerow(head)
        writer.writerows(sorted(reader, key=getPaddedKey))

s = set()
fInName = 'fIn.csv'
fOutName = 'fOut.csv'
with open(fInName, 'r') as fIn:
    reader = csv.reader(fIn, delimiter=',')
    outSortedCSV(reader)
    fIn.seek(0)  # rewind the file for a second pass
    next(fIn)  # skip the header line
    for row in reader:
        ids = row[0].split('.') # split the id
        for i in range(1, 5):
            s.add(('.'.join(ids[:i]), row[i])) # add a tuple with initial part of id and name
for e in sorted(list(s), key=getPaddedKey):
    print(e[0] + ' ' + e[1])
If you have no reason to build your proposed structure, you could simply add for each line the building, floor, room and resource along with its id to a set (to automatically eliminate duplicates). Then you convert the set to a list, sort it and you are done.
Possible Python code, assuming rd is a csv.reader on the inventory (*):
next(rd) # skip the header line
s = set()
for row in rd:
    ids = row[0].split('.') # split the id
    for i in range(1, 5):
        s.add(('.'.join(ids[:i]), row[i])) # add a tuple with initial part of id and name
l = list(s) # convert to a list
l.sort(key=lambda t: [int(p) for p in t[0].split('.')]) # sort numerically, so that 1.13 comes after 1.2
You now have a list of 2-tuples [('1', 'Central Park'), ('1.1', 'Ground'), ('1.1.1', 'Admin Office'), ...]. You can use it to build a new CSV or just print it as text:
for i in l:
    print(" ".join(i))
(*) In Python 3, you would use:
with open(inventory_path, newline='') as fd:
    rd = csv.reader(fd)
    ...
while in Python 2, it would be:
with open(inventory_path, "rb") as fd:
    rd = csv.reader(fd)
    ...
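Putting the set-based approach together, here is a self-contained Python 3 sketch that reads the sample inventory from an in-memory string instead of a file and sorts the IDs by their numeric parts, so that 1.13 lands after 1.2 and 1.3. The names CSV_TEXT and outline are invented for illustration, and the sketch assumes every ResID has exactly four dot-separated integer parts.

```python
import csv
import io

# Raw string so that the backslash in c:\room is kept literally.
CSV_TEXT = r"""ResID,Building,Floor,Room,Resource
1.1.1.1,Central Park,Ground,Admin Office,Router
1.1.2.1,Central Park,Ground,Machine Closet,Router
1.3.1.1,Central Park,Mezzanine,Dungeon,Whip
2.1.3.1,Chicago,Roof,Pidgeon Nest,Weathervane
1.13.4.1,Central Park,Secret/Hidden Floor,c:\room,Site-to-site VPN for 1.1.1.1
1.2.1.1,Central Park,Balcony,Restroom,TP
"""


def outline(rows):
    """Return unique (id_prefix, name) pairs sorted by the numeric parts of the id."""
    entries = set()
    for row in rows:
        ids = row[0].split(".")
        for depth in range(1, 5):
            # The prefix of length `depth` names the building, floor, room, or resource.
            entries.add((".".join(ids[:depth]), row[depth]))
    return sorted(entries, key=lambda e: tuple(int(p) for p in e[0].split(".")))


reader = csv.reader(io.StringIO(CSV_TEXT))
next(reader)  # skip the header line
lines = [f"{res_id} {name}" for res_id, name in outline(reader)]
print("\n".join(lines))
```

Comparing tuples of integers avoids the zero-padding step, and a shorter prefix such as (1,) always sorts before its children such as (1, 1).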
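Extract the IDs (assuming the CSV has already been loaded into a pandas DataFrame named df, with pandas imported as pd):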
ids = ['Building_id', 'Floor_id', 'Room_id', 'Resource_id']
labels = ['ResID', 'Building', 'Floor', 'Room', 'Resource']
df2 = df.join(pd.DataFrame(list(df['ResID'].str.split('.')), columns=ids))
df2
ResID Building Floor Room Resource Building_id Floor_id Room_id Resource_id
0 1.1.1.1 Central Park Ground Admin Office Router 1 1 1 1
1 1.1.2.1 Central Park Ground Machine Closet Router 1 1 2 1
2 1.3.1.1 Central Park Mezzanine Dungeon Whip 1 3 1 1
3 2.1.3.1 Chicago Roof Pidgeon Nest Weathervane 2 1 3 1
4 1.13.4.1 Central Park Secret/Hidden Floor c:\room Site-to-site VPN for 1.1.1.1 1 13 4 1
5 1.2.1.1 Central Park Balcony Restroom TP 1 2 1 1
Iterate over this, using a little helper method:
def pop_list(list_):
    while list_:
        yield list_[-1], list_.copy()
        list_.pop()
for (id_, remaining_ids), (label, remaining_labels) in zip(pop_list(ids), pop_list(labels)):
    print(label, ': ', df2.groupby(remaining_ids)[label].first())
returns
Resource : Building_id Floor_id Room_id Resource_id
1 1 1 1 Router
2 1 Router
13 4 1 Site-to-site VPN for 1.1.1.1
2 1 1 TP
3 1 1 Whip
2 1 3 1 Weathervane
Name: Resource, dtype: object
Room : Building_id Floor_id Room_id
1 1 1 Admin Office
2 Machine Closet
13 4 c:\room
2 1 Restroom
3 1 Dungeon
2 1 3 Pidgeon Nest
Name: Room, dtype: object
Floor : Building_id Floor_id
1 1 Ground
13 Secret/Hidden Floor
2 Balcony
3 Mezzanine
2 1 Roof
Name: Floor, dtype: object
Building : Building_id
1 Central Park
2 Chicago
Name: Building, dtype: object
Explanation
for (id_, remaining_ids), (label, remaining_labels) in zip(pop_list(ids), pop_list(labels)):
    print((id_, remaining_ids), (label, remaining_labels))
returns
('Resource_id', ['Building_id', 'Floor_id', 'Room_id', 'Resource_id']) ('Resource', ['ResID', 'Building', 'Floor', 'Room', 'Resource'])
('Room_id', ['Building_id', 'Floor_id', 'Room_id']) ('Room', ['ResID', 'Building', 'Floor', 'Room'])
('Floor_id', ['Building_id', 'Floor_id']) ('Floor', ['ResID', 'Building', 'Floor'])
('Building_id', ['Building_id']) ('Building', ['ResID', 'Building'])
So this just iterates over the different levels in your building structure
res = df2.groupby(remaining_ids)[label].first()
builds, for each level of your structure, a Series holding the items at that level, indexed by a (Multi)Index of the nested IDs down to that level. This is the information you want for your eventual data structure; it just needs to be transformed into a nested dict.
Building_id Floor_id
1 1 Ground
13 Secret/Hidden Floor
2 Balcony
3 Mezzanine
2 1 Roof
to text (no nesting)
res.index = res.index.to_series().apply('.'.join)
print(res)
1.1 Ground
1.13 Secret/Hidden Floor
1.2 Balcony
1.3 Mezzanine
2.1 Roof
Name: Floor, dtype: object
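If the nested structure from the question is still wanted, it can also be built directly from the CSV rows in plain Python. This is a minimal sketch with an assumed layout in which each node is a dict with a "name" and a "children" mapping keyed by the integer ID part; build_tree and that layout are invented for illustration and are not taken from the answers above. The rows are a subset of the sample inventory.

```python
rows = [
    ("1.1.1.1", "Central Park", "Ground", "Admin Office", "Router"),
    ("1.2.1.1", "Central Park", "Balcony", "Restroom", "TP"),
    ("1.13.4.1", "Central Park", "Secret/Hidden Floor", r"c:\room", "Site-to-site VPN for 1.1.1.1"),
    ("2.1.3.1", "Chicago", "Roof", "Pidgeon Nest", "Weathervane"),
]


def build_tree(rows):
    """Nest building -> floor -> room -> resource, keyed by the integer id parts."""
    tree = {}
    for res_id, *names in rows:
        level = tree
        # Walk one id part and one name per depth, creating nodes on first sight.
        for part, name in zip(res_id.split("."), names):
            node = level.setdefault(int(part), {"name": name, "children": {}})
            level = node["children"]
    return tree


tree = build_tree(rows)
for building_id in sorted(tree):
    print(building_id, tree[building_id]["name"])
    floors = tree[building_id]["children"]
    for floor_id in sorted(floors):
        print(f"{building_id}.{floor_id}", floors[floor_id]["name"])
```

Because the keys are integers, sorted() orders floor 13 after floors 1 and 2 without any padding.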

Can mysql query be simplified ... or is there a need for a function, procedure?

I am making a small program in Python with web.py as the web front end. For that code I have the SQL query below. The query is needed to build a web slideshow that shows names and titles of scale models. All bronze medals are to be shown on one page, all silver medals on one page, and each gold medal separately.
As I have it now is that with every tick on a next button in the browser, I will get the next index of the sql query. Say index 0 from the sql query is:
Name Surname Title Results Class_ID outcome
John Doe Test1 Bronze 1 John Doe with: Test1 Jane Doe with: Test1
And index 1 from the sql query is:
Name Surname Title Results Class_ID outcome
Jane Doe Test2 Silver 1 Jane Doe with: Test2 John Doe with: Test2
Model Test1 from John and Jane have bronze medals and are concatenated with GROUP_CONCAT to get all identical results from that particular class on one 'line'. The same goes for the silver medals. All gold medals are to be shown separately. I start with Class 1 and the bronze medals; the next index shows the silver medals of Class 1, then the gold ones, and all of that repeats for the 31 classes I have.
Question is:
Can the query below be simplified? All I have now is a query for one class. In total there are 31 different classes, so the query should run over all 31 classes to see whether medals were awarded and, if so, concatenate the bronze ones, then concatenate the silver ones, and show the gold ones separately. I tried to make a function in MySQL but could not get it to work...
SELECT naw.Name, naw.Surname, model.Title, model.Results, model.Class_ID,
GROUP_CONCAT(naw.Name, ' ',naw.Surname, ' with: ', model.Title SEPARATOR ' ') AS outcome
FROM naw
LEFT OUTER JOIN model ON model.User_ID = naw.User_ID
WHERE model.Results = 'Bronze' AND Class_ID = '1'
UNION
SELECT naw.Name, naw.Surname, model.Title, model.Results, model.Class_ID,
GROUP_CONCAT(naw.Name, ' ',naw.Surname, ' with: ', model.Title SEPARATOR ' ') AS outcome
FROM naw
LEFT OUTER JOIN model ON model.User_ID = naw.User_ID
WHERE model.Results = 'Silver' AND Class_ID = '1'
UNION
SELECT naw.Name, naw.Surname, model.Title, model.Results, model.Class_ID, model.ID
FROM naw
LEFT OUTER JOIN model ON model.User_ID = naw.User_ID
WHERE model.Results = 'Gold' AND Class_ID = '1'
Please feel free to ask if something is not explained correctly. This is my first time asking for help here.
Thanks and with kind Regards,
UPDATE - Problem solved
I was looking in completely the wrong direction. Below is the solution to my question.
SELECT naw.*, model.*,
    CASE
        WHEN Results = "Gold" THEN 4 + naw.User_ID
        WHEN Results = "Silver" THEN 2
        WHEN Results = "Bronze" THEN 3
        ELSE 4
    END AS pageCategory,
    GROUP_CONCAT(naw.Name, ' ',naw.Surname, ' with: ', model.Title SEPARATOR '<br>') AS categories
FROM naw
INNER JOIN model ON model.User_ID = naw.User_ID
INNER JOIN classes ON model.Class_ID = classes.Class_ID
WHERE Results <> 'None'
GROUP BY pageCategory, classes.Class_ID
ORDER BY classes.Class_ID, FIELD(Results, 'Bronze', 'Silver', 'Gold')
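Because the slideshow is driven from Python anyway, the paging could alternatively be done after fetching plain rows. The following is only a sketch of that idea, not the approach used above: it assumes the rows have already been fetched as (Class_ID, Name, Surname, Title, Results) tuples, and the sample rows, MEDAL_ORDER, and build_pages are invented for illustration.

```python
from itertools import groupby

# Page order inside a class: bronze first, then silver, then gold.
MEDAL_ORDER = {"Bronze": 0, "Silver": 1, "Gold": 2}

# Invented sample rows: (Class_ID, Name, Surname, Title, Results).
rows = [
    (1, "John", "Doe", "Test1", "Bronze"),
    (1, "Jane", "Doe", "Test1", "Bronze"),
    (1, "Jane", "Doe", "Test2", "Silver"),
    (1, "John", "Doe", "Test3", "Gold"),
    (1, "Jane", "Doe", "Test4", "Gold"),
    (2, "John", "Doe", "Test5", "Silver"),
]


def build_pages(rows):
    """Return slideshow pages as (class_id, medal, captions) tuples."""
    pages = []
    # groupby only merges adjacent rows, so sort by class and medal first.
    ordered = sorted(rows, key=lambda r: (r[0], MEDAL_ORDER[r[4]]))
    for (class_id, medal), group in groupby(ordered, key=lambda r: (r[0], r[4])):
        captions = [f"{name} {surname} with: {title}" for _, name, surname, title, _ in group]
        if medal == "Gold":
            # Every gold medal gets a page of its own.
            pages.extend((class_id, medal, [caption]) for caption in captions)
        else:
            # Bronze and silver medals of a class share one page each.
            pages.append((class_id, medal, captions))
    return pages


pages = build_pages(rows)
for page in pages:
    print(page)
```

Each "next" click in the browser would then simply advance an index into pages.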
