Django Query Multiple Models with ForeignKey and ManyToMany Field with Count - python

I have 3 DB tables (models) that I'm trying to query with the Django ORM to get the count of restaurants by cuisine type in every city.
Models:
class Restaurant(models.Model):
    name = models.CharField(max_length=100)
    # ...
    city = models.ForeignKey('City', on_delete=models.CASCADE)
    cuisine_types = models.ManyToManyField('Cuisinetype')

class City(models.Model):
    name = models.CharField(max_length=100)
    # ...

class Cuisinetype(models.Model):
    name = models.CharField(max_length=100)
    # ...
If I were using a raw SQL query, something like this would give me the desired results:
SELECT
  city.`name` AS city,
  cuisinetype.`cuisine`,
  COUNT(restaurant_cuisine_types.`cuisinetype_id`) AS total
FROM restaurant_cuisine_types
JOIN cuisinetype
  ON restaurant_cuisine_types.`cuisinetype_id` = cuisinetype.`id`
JOIN restaurant
  ON restaurant_cuisine_types.`restaurant_id` = restaurant.`id`
JOIN city
  ON restaurant.`city_id` = city.`id`
GROUP BY cuisinetype_id, city.name
ORDER BY city, cuisine
LIMIT 0, 1000;
RESULTS
city         cuisine        total
Albuquerque  American       5
Albuquerque  French         1
Albuquerque  Italian        1
Albuquerque  Southwest      2
Albuquerque  Steak          2
Atlanta      American       6
Atlanta      Asian          1
Atlanta      Continental    2
Atlanta      Fusion         1
Atlanta      International  1
Atlanta      Italian        1
...
So what is the Django way to get these results?
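A common way to express this grouping in the ORM (a minimal sketch, not from the original thread, assuming the models above with a name field on Cuisinetype): call .values() on the two related names and then .annotate() with Count, which turns the annotation into a GROUP BY over those columns.
from django.db.models import Count

# One row per (city, cuisine) pair with the number of restaurants in it;
# calling .values() before .annotate() makes Count aggregate per group
# rather than per restaurant.
totals = (
    Restaurant.objects
    .values('city__name', 'cuisine_types__name')
    .annotate(total=Count('id'))
    .order_by('city__name', 'cuisine_types__name')
)

for row in totals:
    print(row['city__name'], row['cuisine_types__name'], row['total'])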

Related

Set LIMIT 3 in SQL with CASE and LEFT JOIN?

I have working SQL code that retrieves all the scores of a hockey team. I would like to limit it to 3 results per club (LIMIT 3 or <= 3):
my = cursor_test.execute('''SELECT Next.ClubHome,
    CASE
        WHEN Next.ClubHome = Result.ClubHome THEN Result.ScoreHome
        WHEN Next.ClubHome = Result.ClubAway THEN Result.ScoreAway
    END AS score
    FROM NextMatch Next
    LEFT JOIN ResultMatch Result ON Next.ClubHome IN (Result.ClubHome, Result.ClubAway)
''')

for row in my.fetchall():
    print(row)
Let me explain the question better: look at the upcoming Chicago, New York and Dallas matches in the NextMatch table; they are featured in ClubHome.
NEXTMATCH
ClubHome  ClubAway     Tournament
Chicago   Minnesota    NHL
New York  Los Angeles  NHL
Dallas    Vegas Gold   NHL
In the ResultMatch table, I would like to retrieve the last 3 overall scores of Chicago, New York and Dallas (ScoreHome or ScoreAway). So I would like this output:
Chicago: 2
Chicago: 0
Chicago: 1
New York: 2
New York: 3
New York: 2
Dallas: 4
Dallas: 3
Dallas: 1
RESULTMATCH
ClubHome  ClubAway     Tournament  Round  ScoreHome  ScoreAway
Toronto   CHICAGO      NHL         8      1          2
New York  Vegas        NHL         8      2          3
CHICAGO   Dallas       NHL         7      0          4
Ottawa    New York     NHL         7      3          3
CHICAGO   Buffalo Sab  NHL         6      1          0
Vegas     CHICAGO      NHL         6      4          2
New York  Dallas       NHL         5      2          3
Dallas    Buffalo Sab  NHL         5      1          2
The following code may be USEFUL for the solution; however, it only retrieves the last 3 ScoreHome results (and not the ScoreAway):
x = cursor2.execute('''SELECT ClubHome, ScoreHome
    FROM (SELECT NextMatch.ClubHome, NextMatch.ClubAway, ResultMatch.ScoreHome,
                 ROW_NUMBER() OVER (PARTITION BY NextMatch.ClubHome ORDER BY ResultMatch.ScoreHome DESC) AS rn
          FROM NextMatch
          INNER JOIN ResultMatch ON NextMatch.ClubHome = ResultMatch.ClubHome) t
    WHERE rn <= 3
    ORDER BY ClubHome ASC''')
How can I modify my first code and add LIMIT 3 or <= 3 to get the output shown in the example? Thank you
If you want to do it in SQL only, rather than filtering the results in Python, you could use the window function ROW_NUMBER:
SELECT clubHome, score FROM (
    SELECT Next.clubhome,
           CASE
               WHEN Next.ClubHome = Result.ClubHome THEN Result.ScoreHome
               WHEN Next.ClubHome = Result.ClubAway THEN Result.ScoreAway
           END AS score,
           ROW_NUMBER() OVER (PARTITION BY next.clubHome ORDER BY round DESC) rowNum
    FROM nextmatch Next
    JOIN resultmatch Result ON Next.clubhome IN (Result.clubhome, Result.clubaway)
) WHERE rowNum <= 3;
SQLFiddle: https://www.db-fiddle.com/f/xrLpLwSu783AQHrwD8Fq4t/0
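To plug this back into the asker's Python setup, here is a minimal sketch assuming a SQLite database file hockey.db that contains the NextMatch and ResultMatch tables above (SQLite supports window functions from version 3.25):
import sqlite3

# Assumed database file; swap in your own connection/cursor.
conn = sqlite3.connect('hockey.db')

# Same window-function query as the answer above: number each club's
# results by round, newest first, and keep the top 3 per club.
rows = conn.execute('''
    SELECT ClubHome, score FROM (
        SELECT Next.ClubHome,
               CASE
                   WHEN Next.ClubHome = Result.ClubHome THEN Result.ScoreHome
                   WHEN Next.ClubHome = Result.ClubAway THEN Result.ScoreAway
               END AS score,
               ROW_NUMBER() OVER (PARTITION BY Next.ClubHome
                                  ORDER BY Result.Round DESC) AS rowNum
        FROM NextMatch Next
        JOIN ResultMatch Result
          ON Next.ClubHome IN (Result.ClubHome, Result.ClubAway)
    ) WHERE rowNum <= 3
''').fetchall()

for club, score in rows:
    print(f'{club}: {score}')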

Get the number of IDs that have the same combination of distinct values in the 'locations' column

I have a table with ids and locations they have been to.
id  Location
1   Maryland
1   Iowa
2   Maryland
2   Texas
3   Georgia
3   Iowa
4   Maryland
4   Iowa
5   Maryland
5   Iowa
5   Texas
I'd like to perform a query that would allow me to get the number of ids per combination.
In this example table, the output would be -
Maryland, Iowa - 2
Maryland, Texas - 1
Georgia, Iowa - 1
Maryland, Iowa, Texas - 1
My original thought was to add up the ASCII values of each id's distinct locations, then count how many ids share each sum and work out which combination corresponds to it. I was not able to do that, as SQL Server would not let me cast an nvarchar to a numeric data type. Is there any other way I could use SQL to get the number of ids per combination? Using Python to get the number of ids per combination is also acceptable; however, SQL is preferred.
If you want to solve this in SQL and you are running SQL Server 2017 or later, you can use a CTE to aggregate the locations for each id using STRING_AGG, and then count the occurrences of each aggregated string:
WITH all_locations AS (
    SELECT STRING_AGG(Location, ', ') WITHIN GROUP (ORDER BY Location) AS aloc
    FROM locations
    GROUP BY id
)
SELECT aloc, COUNT(*) AS cnt
FROM all_locations
GROUP BY aloc
ORDER BY cnt, aloc
Output:
aloc                   cnt
Georgia, Iowa          1
Iowa, Maryland, Texas  1
Maryland, Texas        1
Iowa, Maryland         2
Note that I have applied an ordering to the STRING_AGG to ensure that someone who visits Maryland and then Iowa is treated the same way as someone who visits Iowa and then Maryland. If this is not the desired behaviour, simply delete the WITHIN GROUP clause.
Demo on dbfiddle
Use groupby + agg + value_counts:
new_df = df.groupby('id')['Location'].agg(list).str.join(', ').value_counts().reset_index()
Output:
>>> new_df
index Location
0 Maryland, Iowa 2
1 Maryland, Texas 1
2 Georgia, Iowa 1
3 Maryland, Iowa, Texas 1
Let us do groupby with join, then value_counts:
df.groupby('id')['Location'].agg(', '.join).value_counts()
Out[938]:
join
Maryland, Iowa 2
Georgia, Iowa 1
Maryland, Iowa, Texas 1
Maryland, Texas 1
dtype: int64
Use a frozenset to aggregate, which guarantees unique groups regardless of visiting order:
df.groupby('id')['Location'].agg(frozenset).value_counts()
Output:
(Maryland, Iowa) 2
(Texas, Maryland) 1
(Georgia, Iowa) 1
(Texas, Maryland, Iowa) 1
Name: Location, dtype: int64
Or a sorted string join:
df.groupby('id')['Location'].agg(lambda x: ', '.join(sorted(x))).value_counts()
Output:
Iowa, Maryland 2
Maryland, Texas 1
Georgia, Iowa 1
Iowa, Maryland, Texas 1
Name: Location, dtype: int64

Validate a dataframe based on another dataframe?

I have two dataframes:
Table1 and Table2 (shown as images in the original post; both contain Country, City and Initiative columns).
How to find:
1. The country-city combinations that are present only in Table2 but not Table1. Here [India-Mumbai] is the output.
2. For each country-city combination that's present in both tables, the "Initiatives" that are present in Table2 but not Table1. Here the output is {"India-Bangalore": [Textile, Irrigation], "USA-Texas": [Irrigation]}.
To answer the first question, we can use the merge method and keep only the NaN rows:
>>> df_merged = pd.merge(df_1, df_2, on=['Country', 'City'], how='left', suffixes = ['_1', '_2'])
>>> df_merged[df_merged['Initiative_2'].isnull()][['Country', 'City']]
Country City
13 India Mumbai
For the next question, we first need to remove the NaN rows from the previously merged DataFrame:
>>> df_both_table = df_merged[~df_merged['Initiative_2'].isnull()]
>>> df_both_table
Country City Initiative_1 Initiative_2
0 India Bangalore Plants Plants
1 India Bangalore Plants Textile
2 India Bangalore Plants Irrigation
3 India Bangalore Industries Plants
4 India Bangalore Industries Textile
5 India Bangalore Industries Irrigation
6 India Bangalore Roads Plants
7 India Bangalore Roads Textile
8 India Bangalore Roads Irrigation
9 USA Texas Plants Plants
10 USA Texas Plants Irrigation
11 USA Texas Roads Plants
12 USA Texas Roads Irrigation
Then, we can filter on the rows where Initiative_1 and Initiative_2 differ, and use a groupby to get the list of Initiative_2:
>>> df_unique_initiative_2 = df_both_table[~(df_both_table['Initiative_1'] == df_both_table['Initiative_2'])]
>>> df_list_initiative_2 = df_unique_initiative_2.groupby(['Country', 'City'])['Initiative_2'].unique().reset_index()
>>> df_list_initiative_2
Country City Initiative_2
0 India Bangalore [Textile, Irrigation, Plants]
1 USA Texas [Irrigation, Plants]
We do the same, but this time on Initiative_1, to get that list as well:
>>> df_list_initiative_1 = df_unique_initiative_2.groupby(['Country', 'City'])['Initiative_1'].unique().reset_index()
>>> df_list_initiative_1
Country City Initiative_1
0 India Bangalore [Plants, Industries, Roads]
1 USA Texas [Plants, Roads]
Finally, we use set differences to remove the remaining redundant Initiative_1 elements and get the expected result:
>>> df_list_initiative_2['Initiative'] = (df_list_initiative_2['Initiative_2'].map(set)-df_list_initiative_1['Initiative_1'].map(set)).map(list)
>>> df_list_initiative_2[['Country', 'City', 'Initiative']]
Country City Initiative
0 India Bangalore [Textile, Irrigation]
1 USA Texas [Irrigation]
Alternative approach (df1 your Table1, df2 your Table2):
combos_1, combos_2 = set(zip(df1.Country, df1.City)), set(zip(df2.Country, df2.City))
in_2_but_not_in_1 = [f"{country}-{city}" for country, city in combos_2 - combos_1]
initiatives = {
    f"{country}-{city}": (
        set(df2.Initiative[df2.Country.eq(country) & df2.City.eq(city)])
        - set(df1.Initiative[df1.Country.eq(country) & df1.City.eq(city)])
    )
    for country, city in combos_1 & combos_2
}
Results:
['India-Delhi']
{'India-Bangalore': {'Irrigation', 'Textile'}, 'USA-Texas': {'Irrigation'}}
I think you got this part wrong: "The country-city combinations that are present only in Table2 but not Table1. Here [India-Mumbai] is the output". The combination India-Mumbai is not present in Table2, is it?

Pandas aggregate data by same ID and comma separate values in column

I have data such as the following:
ID  Category
1   Finance
2   Computer Science
3   Data Science
1   Marketing
2   Finance
My goal is to aggregate the common IDs into one row and put the differing categories into one column, separated by commas, such as the following:
ID  Category
1   Finance, Marketing
2   Computer Science, Finance
3   Data Science
How would I go about this using Pandas?
Edit:
I also have other columns for the IDs that I would like to keep. For example:
ID  Category          Location
1   Finance           New York
2   Computer Science  Los Angeles
3   Data Science      Austin
1   Marketing         New York
2   Finance           Los Angeles
Since the additional data in the other columns is the same for all rows with the same ID (e.g. ID 1 has the same location in every instance, as does ID 2), I would like to keep that data rather than drop any columns, like this:
ID  Category                   Location
1   Finance, Marketing         New York
2   Computer Science, Finance  Los Angeles
3   Data Science               Austin
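One way to get there (a minimal sketch, assuming the example data above): comma-join Category within each ID and take the first Location, which is safe because Location is constant per ID.
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 2, 3, 1, 2],
    'Category': ['Finance', 'Computer Science', 'Data Science',
                 'Marketing', 'Finance'],
    'Location': ['New York', 'Los Angeles', 'Austin',
                 'New York', 'Los Angeles'],
})

# One row per ID: comma-join the categories, keep the first Location
# (the question states Location is identical for all rows of an ID).
out = (df.groupby('ID', as_index=False)
         .agg({'Category': ', '.join, 'Location': 'first'}))
print(out)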

Merge dataframes inside a dictionary of dataframes

I have a dictionary of dataframes such as:
{'table_1':             name    color   type
                        Banana  Yellow  Fruit,
 'another_table_1':     city    state   country
                        Atlanta Georgia United States,
 'and_another_table_1': firstname middlename lastname
                        John      Patrick    Snow,
 'table_2':             name    color   type
                        Apple   Red     Fruit,
 'another_table_2':     city      state    country
                        Arlington Virginia United States,
 'and_another_table_2': firstname middlename lastname
                        Alex      Justin     Brown,
 'table_3':             name    color   type
                        Lettuce Green   Vegetable,
 'another_table_3':     city    state   country
                        Dallas  Texas   United States,
 'and_another_table_3': firstname middlename lastname
                        Michael   Alex       Smith}
I would like to merge these dataframes together based on their names so that in the end I will have only 3 dataframes:
table
name color type
Banana Yellow Fruit
Red Apple Fruit
Lettuce Green Vegetable
another_table
city state country
Atlanta Georgia United States
Arlington Virginia United States
Dallas Texas United States
and_another_table
firstname middlename lastname
John Patrick Snow
Alex Justin Brown
Michael Alex Smith
Based on my initial research, it seems like this should be possible in Python by:
- using .split, a dictionary comprehension and itertools.groupby to group the dataframes inside the dictionary by key name,
- creating a dictionary of dictionaries from these grouped results,
- looping through these dictionaries with the pandas.concat function to combine the grouped dataframes.
I don't have a lot of experience with Python and I am a bit lost on how to actually code this.
I have reviewed the "How to group similar items in a list?" and "Merge dataframes in a dictionary" posts, but they were not as helpful because in my case the length of the dataframe names varies. Also, I do not want to hardcode any dataframe names, because there are more than 1,000 of them.
Here is one way:
Given this dictionary of dataframes:
dd = {'table_1': pd.DataFrame({'Name': ['Banana'], 'color': ['Yellow'], 'type': ['Fruit']}),
      'table_2': pd.DataFrame({'Name': ['Apple'], 'color': ['Red'], 'type': ['Fruit']}),
      'another_table_1': pd.DataFrame({'city': ['Atlanta'], 'state': ['Georgia'], 'Country': ['United States']}),
      'another_table_2': pd.DataFrame({'city': ['Arlinton'], 'state': ['Virginia'], 'Country': ['United States']}),
      'and_another_table_1': pd.DataFrame({'firstname': ['John'], 'middlename': ['Patrick'], 'lastnme': ['Snow']}),
      'and_another_table_2': pd.DataFrame({'firstname': ['Alex'], 'middlename': ['Justin'], 'lastnme': ['Brown']}),
      }
tables = set([i.rsplit('_', 1)[0] for i in dd.keys()])
dict_of_dfs = {i:pd.concat([dd[x] for x in dd.keys() if x.startswith(i)]) for i in tables}
Outputs a new dictionary of combined tables:
dict_of_dfs['table']
# Name color type
# 0 Banana Yellow Fruit
# 0 Apple Red Fruit
dict_of_dfs['another_table']
# city state Country
# 0 Atlanta Georgia United States
# 0 Arlinton Virginia United States
dict_of_dfs['and_another_table']
# firstname middlename lastnme
# 0 John Patrick Snow
# 0 Alex Justin Brown
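One caveat with the startswith() grouping above: if one base name happens to be a prefix of another (say table and a hypothetical table_extra), frames can land in the wrong group. A minimal sketch of an exact-match variant, reusing the dd dictionary from above:
import pandas as pd
from collections import defaultdict

# Group by the exact base name (everything before the last underscore)
# instead of a startswith() test, so prefix collisions cannot occur.
grouped = defaultdict(list)
for key, frame in dd.items():
    base = key.rsplit('_', 1)[0]   # 'another_table_2' -> 'another_table'
    grouped[base].append(frame)

merged = {base: pd.concat(frames, ignore_index=True)
          for base, frames in grouped.items()}

print(merged['table'])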
Another way, using defaultdict from collections, is to create a list of combined dataframes:
from collections import defaultdict
import pandas as pd

dd = {'table_1': pd.DataFrame({'Name': ['Banana'], 'color': ['Yellow'], 'type': ['Fruit']}),
      'table_2': pd.DataFrame({'Name': ['Apple'], 'color': ['Red'], 'type': ['Fruit']}),
      'another_table_1': pd.DataFrame({'city': ['Atlanta'], 'state': ['Georgia'], 'Country': ['United States']}),
      'another_table_2': pd.DataFrame({'city': ['Arlinton'], 'state': ['Virginia'], 'Country': ['United States']}),
      'and_another_table_1': pd.DataFrame({'firstname': ['John'], 'middlename': ['Patrick'], 'lastnme': ['Snow']}),
      'and_another_table_2': pd.DataFrame({'firstname': ['Alex'], 'middlename': ['Justin'], 'lastnme': ['Brown']}),
      }

tables = set([i.rsplit('_', 1)[0] for i in dd.keys()])

# Collect each group's frames, then concatenate per group.
d = defaultdict(list)
for i in tables:
    for k in dd.keys():
        if k.startswith(i):
            d[i].append(dd[k])

l_of_dfs = [pd.concat(d[i]) for i in d.keys()]
print(l_of_dfs[0])
print('\n')
print(l_of_dfs[1])
print('\n')
print(l_of_dfs[2])
Output:
city state Country
0 Atlanta Georgia United States
0 Arlinton Virginia United States
firstname middlename lastnme
0 John Patrick Snow
0 Alex Justin Brown
Name color type
0 Banana Yellow Fruit
0 Apple Red Fruit
