What is the best way to insert data into two MySQL tables from DataFrames using SQLAlchemy when there is a foreign key constraint? I'm inserting daily data across two tables, and I want the ID for a specific date to be the same across both tables.
Table 1
| ID | Day | Info1 |
| -- | ------------ | ------ |
| 0 | 2022-01-01 | apple |
| 1 | 2022-01-03 | banana |
| 2 | 2022-01-02 | mango |
Table 2
| ID | Day | Info2 |
| -- | ------------ | ------ |
| 0 | 2022-01-01 | green |
| 1 | 2022-01-03 | yellow |
| 2 | 2022-01-02 | orange |
Thanks in advance.
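One approach, sketched below with a placeholder connection string and the table names from the example: assign the shared ID client-side (continuing from the parent table's current maximum rather than relying on AUTO_INCREMENT in both tables) and write both frames inside a single transaction, parent table first so the foreign key is satisfied.
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://user:password@localhost/mydb")  # placeholder DSN

# One day's worth of new data for each table
df1 = pd.DataFrame({"Day": ["2022-01-04"], "Info1": ["pear"]})
df2 = pd.DataFrame({"Day": ["2022-01-04"], "Info2": ["green"]})

with engine.begin() as conn:  # a single transaction covers both inserts
    # Continue the ID sequence from whatever is already in the parent table
    start = conn.execute(text("SELECT COALESCE(MAX(ID), -1) + 1 FROM table1")).scalar()
    df1.insert(0, "ID", range(start, start + len(df1)))
    # Reuse the same IDs in the child frame, matched on Day
    df2 = df2.merge(df1[["ID", "Day"]], on="Day")
    df1.to_sql("table1", conn, if_exists="append", index=False)  # parent first
    df2.to_sql("table2", conn, if_exists="append", index=False)  # child rows reference table1.ID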
I have the GridDB Python client running on my Ubuntu computer. I would like to get the columns having null values using a GridDB query. I know it's possible to get the rows with null values, but I want the columns this time.
Take, for example, the timeseries table below:
| timestamp           | value1 | value2 | value3 | output |
|---------------------|--------|--------|--------|--------|
| 2021-06-24 12:00:22 | 1.3819 | 2.4214 |        | 0      |
| 2021-06-25 11:55:23 | 4.8726 | 6.2324 | 9.3424 | 1      |
| 2021-06-26 05:40:53 | 6.1313 |        | 5.4648 | 0      |
| 2021-06-27 08:24:19 | 6.7543 |        | 9.7967 | 0      |
| 2021-06-28 13:34:51 | 3.5713 | 1.4452 |        | 1      |
The solution should return the value2 and value3 columns. Thanks in advance!
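TQL works row-wise, so one workaround is to do the column check client-side. A minimal sketch, assuming you have already fetched the container's rows into a pandas DataFrame (the frame below just stands in for the query result):
import pandas as pd

# Stand-in for the rows fetched from GridDB
df = pd.DataFrame({
    "value1": [1.3819, 4.8726, 6.1313, 6.7543, 3.5713],
    "value2": [2.4214, 6.2324, None, None, 1.4452],
    "value3": [None, 9.3424, 5.4648, 9.7967, None],
    "output": [0, 1, 0, 0, 1],
})

# Columns that contain at least one null
null_columns = df.columns[df.isna().any()].tolist()
print(null_columns)  # ['value2', 'value3']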
I need to join multiple tables, but I can't get the join in Python to behave as expected. I need to left join table 2 to table 1 without overwriting the existing data in the "geometry" column of table 1. What I'm trying to achieve is something like a VLOOKUP in Excel: I want to pull matching values from my other tables (~10) into table 1 without overwriting what is already there. Is there a better way? Below is what I tried:
TABLE 1
| ID | BLOCKCODE | GEOMETRY |
| -- | --------- | -------- |
| 1 | 123 | ABC |
| 2 | 456 | DEF |
| 3 | 789 | |
TABLE 2
| ID | GEOID | GEOMETRY |
| -- | ----- | -------- |
| 1 | 123 | |
| 2 | 456 | |
| 3 | 789 | GHI |
TABLE 3 (What I want)
| ID | BLOCKCODE | GEOID | GEOMETRY |
| -- | --------- |----- | -------- |
| 1 | 123 | 123 | ABC |
| 2 | 456 | 456 | DEF |
| 3 | | 789 | GHI |
What I'm getting
| ID | GEOID | GEOMETRY_X | GEOMETRY_Y |
| -- | ----- | -------- | --------- |
| 1 | 123 | ABC | |
| 2 | 456 | DEF | |
| 3 | 789 | | GHI |
join = pd.merge(table1, table2, how="left", left_on="BLOCKCODE", right_on="GEOID")
When I try this:
join = pd.merge(table1, table2, how="left", left_on=["BLOCKCODE", "GEOMETRY"], right_on=["GEOID", "GEOMETRY"])
I get this:
TABLE 1
| ID | BLOCKCODE | GEOMETRY |
| -- | --------- | -------- |
| 1 | 123 | ABC |
| 2 | 456 | DEF |
| 3 | 789 | |
You could try:
# Rename the BLOCKCODE column in table1 to match table2's GEOID column.
# This is necessary for the next step, since update() aligns on column names.
table1 = table1.rename(columns={"BLOCKCODE": "GEOID"})
# update() writes table2's non-NA values into table1 (aligned on the index),
# so existing values are only replaced where table2 actually has data;
# pass overwrite=False if you want to fill only the NaN cells in table1.
table1.update(table2)
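For reference, a self-contained sketch of that approach on the sample data (update() aligns on the index, so both frames keep the default RangeIndex here):
import pandas as pd

table1 = pd.DataFrame({"ID": [1, 2, 3], "BLOCKCODE": [123, 456, 789],
                       "GEOMETRY": ["ABC", "DEF", None]})
table2 = pd.DataFrame({"ID": [1, 2, 3], "GEOID": [123, 456, 789],
                       "GEOMETRY": [None, None, "GHI"]})

table1 = table1.rename(columns={"BLOCKCODE": "GEOID"})
table1.update(table2)  # only table2's non-NA GEOMETRY value ("GHI") is written
print(table1)  # GEOMETRY is now ABC, DEF, GHI; nothing existing was overwritten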
I'm observing odd behaviour while performing fuzzy_left_join from the fuzzymatcher library. Trying to join two DataFrames, the left one with 5217 records and the right one with 8734, only 71 records come back with a best_match_score, which seems really odd. To get better results I even removed all the numbers and kept only alphabetical characters in the joining columns. In the merged table the id column from the right table is NaN, which is also a strange result.
Left table, join column "amazon_s3_name". First item: limonig
+------+---------+-------+-----------+------------------------------------+
| id | product | price | category | amazon_s3_name |
+------+---------+-------+-----------+------------------------------------+
| 1 | A | 1.49 | fruits | limonig |
| 8964 | B | 1.39 | beverages | studencajfuzelimonilimonetatrevaml |
| 9659 | C | 2.79 | beverages | studencajfuzelimonilimtreval |
+------+---------+-------+-----------+------------------------------------+
Right table, join column "amazon_s3_name". Last item: limoni
+------+----------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+
| id | picture | amazon_s3_name |
+------+----------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+
| 191 | https://s3.eu-central-1.amazonaws.com/groceries.pictures/images/AhmadCajLimonIDjindjifil20X2G.jpg | ahmadcajlimonidjindjifilxg |
| 192 | https://s3.eu-central-1.amazonaws.com/groceries.pictures/images/AhmadCajLimonIDjindjifil20X2G40g.jpg | ahmadcajlimonidjindjifilxgg |
| 204 | https://s3.eu-central-1.amazonaws.com/groceries.pictures/images/Ahmadcajlimonidjindjifil20x2g40g00051265.jpg | ahmadcajlimonidjindjifilxgg |
| 1608 | https://s3.eu-central-1.amazonaws.com/groceries.pictures/images/Cajstudenfuzetealimonilimonovatreva15lpet.jpg | cajstudenfuzetealimonilimonovatrevalpet |
| 4689 | https://s3.eu-central-1.amazonaws.com/groceries.pictures/images/Lesieursalatensosslimonimaslinovomaslo.jpg | lesieursalatensosslimonimaslinovomaslo |
| 4690 | https://s3.eu-central-1.amazonaws.com/groceries.pictures/images/Lesieursalatensosslimonimaslinovomaslo05l500ml01301150.jpg | lesieursalatensosslimonimaslinovomaslolml |
| 4723 | https://s3.eu-central-1.amazonaws.com/groceries.pictures/images/Limoni.jpg | limoni |
+------+----------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+
Merged table. As we can see, best_match_score is NaN:
+----+------------------+-----------+------------+-------+----------+----------------------+------------+---------------------+-------------+----------------------+
| id | best_match_score | __id_left | __id_right | price | category | amazon_s3_name_left | image_left | amazon_s3_name_left | image_right | amazon_s3_name_right |
+----+------------------+-----------+------------+-------+----------+----------------------+------------+---------------------+-------------+----------------------+
| 0 | NaN | 0_left | None | 1.49 | Fruits | Limoni500g09700112 | NaN | limonig | NaN | NaN |
| 2 | NaN | 2_left | None | 1.69 | Bio | Morkovi1kgbr09700132 | NaN | morkovikgbr | NaN | NaN |
+----+------------------+-----------+------------+-------+----------+----------------------+------------+---------------------+-------------+----------------------+
You could give polyfuzz a try. Use the setup from the library's examples, for instance the TF-IDF matcher (BERT embeddings are also supported), then run:
from polyfuzz import PolyFuzz

model = PolyFuzz("TF-IDF")
model.match(df1["amazon_s3_name"].tolist(), df2["amazon_s3_name"].tolist())
df1['To'] = model.get_matches()['To']
then merge:
df1.merge(df2, left_on='To', right_on='amazon_s3_name')
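get_matches() also returns a Similarity column, so you could drop weak pairs before merging; the 0.8 cutoff below is an arbitrary illustration and assumes both frames keep their default integer index:
matches = model.get_matches()
df1['To'] = matches['To'].where(matches['Similarity'] > 0.8)  # weak matches become NaN
df1.merge(df2, left_on='To', right_on='amazon_s3_name')  # rows with NaN 'To' simply won't match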
I'm stuck on a little problem with Python and a pandas DataFrame.
I want to make a pivot table that counts some related items.
I have a dataframe with this structure:
+-----+------------+-----------+-----------+
| ID | Item_Type | Basket_ID | OwnerName |
+-----+------------+-----------+-----------+
| 3 | Basket | | |
| 336 | ChickenEgg | 3 | Henk |
| 841 | SomeEgg | 3 | Henk |
| 671 | EasterEgg | 3 | Piet |
| 9 | Basket | | |
| 336 | Orange | 9 | Piet |
| 841 | Banana | 9 | Piet |
| 671 | Strawberry | 9 | Herman |
| 888 | Apple | 9 | Herman |
| 821 | Apricots | 9 | NaN |
+-----+------------+-----------+-----------+
I want to count how many items are related to the ‘Basket’ item (the parent) and how often each ‘OwnerName’ appears with that basket.
I want my dataframe to look like the one below.
It shows the total count of the items related to the parent Item_Type ‘Basket’ and how often each name appears.
It also shows the total number of owners and the items without an owner.
+----+-----------+-------------------+------------+------------+--------------+--------------+------------------+
| ID | Item_Type | Total_Items_Count | Henk_Count | Piet_Count | Herman_Count | Total_Owners | Total_NaN_Values |
+----+-----------+-------------------+------------+------------+--------------+--------------+------------------+
| 3 | Basket | 3 | 2 | 1 | 0 | 3 | |
| 9 | Basket | 5 | 0 | 2 | 2 | 4 | 1 |
+----+-----------+-------------------+------------+------------+--------------+--------------+------------------+
Answering your question requires multiple steps, but the core idea is that you should use pivot_table.
The table is conceptually a multilevel index: Basket_ID is the high-level index and ID is the more granular one. The first thing to do is remove the rows where Basket_ID is missing, so that the granularity of the table is consistent.
Let's say you named your dataframe df.
# Preparation steps
df = df[~df["Basket_ID"].isna()]  # remove the rows that shouldn't be counted
df.loc[df["OwnerName"].isna(), "OwnerName"] = "unknown"  # set missing owners to a placeholder
# Make a pivot table
df = df.pivot_table(index=["Basket_ID"], columns=["OwnerName"], values=["Item_Type"], aggfunc="count").fillna(0)
From there onwards you should be able to calculate your remaining columns, for example as sketched below.
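A sketch of those remaining columns, continuing from the pivot above (column names follow the expected output; "unknown" is the placeholder set in the preparation step):
pivot = df  # the pivoted frame from the previous step
pivot.columns = pivot.columns.droplevel(0)  # drop the "Item_Type" level from the MultiIndex
pivot["Total_Items_Count"] = pivot.sum(axis=1)
pivot["Total_NaN_Values"] = pivot.get("unknown", 0)
# Every row either has an owner or was marked "unknown"
pivot["Total_Owners"] = pivot["Total_Items_Count"] - pivot["Total_NaN_Values"]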
I am trying to filter and count employee attendance.
I have a table like this:
attendance table:
+-----+-----------------------+----------+-----------+
| id | time | status | emp_id |
+-----+-----------------------+----------+-----------+
| 1 | 2018-04-17 7:03:40 | 1 | 1 |
| 2 | 2018-04-18 7:10:50 | 1 | 1 |
| 3 | 2018-04-19 5:05:32 | 1 | 1 |
| 4 | 2018-04-20 7:07:44 | 1 | 1 |
| 5 | 2018-04-18 7:10:50 | 1 | 2 |
+-----+-----------------------+----------+-----------+
My objective is to filter all the attendance data.
I got a solution with
models.model.objects.filter(emp_id=1).count()
which returns 4 in my case.
But I also need to filter so that the hour of time is greater than 6.
I tried adding datetime.datetime(time).time() > 6 to the filter, but it doesn't work.
Can anyone help or suggest how to make this happen? Is there another way to do it (for example with a for loop), or is it not possible?
You can use __hour to filter by the hour:
MyModel.objects.filter(
emp_id=1,
time__hour__gt=6
).count()
You can also filter like this:
model.objects.filter(time__hour__gte=6).filter(emp_id=1).count()
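Note that __hour, like Django's other datetime lookups, is evaluated in the current time zone when USE_TZ is True, so results can shift relative to the raw UTC values stored in the database.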