What is the best way to insert data into two MySQL tables from DataFrames using SQLAlchemy when there is a foreign key constraint? I'm inserting daily data across two tables, and I want the ID for a specific date to be the same across both tables.
Table 1
| ID | Day | Info1 |
| -- | ------------ | ------ |
| 0 | 2022-01-01 | apple |
| 1 | 2022-01-03 | banana |
| 2 | 2022-01-02 | mango |
Table 2
| ID | Day | Info2 |
| -- | ------------ | ------ |
| 0 | 2022-01-01 | green |
| 1 | 2022-01-03 | yellow |
| 2 | 2022-01-02 | orange |
Thanks in advance.
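One approach, sketched below with a placeholder connection string and the table names from the example: assign the shared ID client-side (continuing from the parent table's current maximum rather than relying on AUTO_INCREMENT in both tables) and write both frames inside a single transaction, parent table first so the foreign key is satisfied.
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("mysql+pymysql://user:password@localhost/mydb")  # placeholder DSN

# One day's worth of new data for each table
df1 = pd.DataFrame({"Day": ["2022-01-04"], "Info1": ["pear"]})
df2 = pd.DataFrame({"Day": ["2022-01-04"], "Info2": ["green"]})

with engine.begin() as conn:  # a single transaction covers both inserts
    # Continue the ID sequence from whatever is already in the parent table
    start = conn.execute(text("SELECT COALESCE(MAX(ID), -1) + 1 FROM table1")).scalar()
    df1.insert(0, "ID", range(start, start + len(df1)))
    # Reuse the same IDs in the child frame, matched on Day
    df2 = df2.merge(df1[["ID", "Day"]], on="Day")
    df1.to_sql("table1", conn, if_exists="append", index=False)  # parent first
    df2.to_sql("table2", conn, if_exists="append", index=False)  # child rows reference table1.ID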
I have the GridDB Python client running on my Ubuntu computer. I would like to get the columns having null values using a GridDB query. I know it's possible to get the rows with null values, but I want the columns this time.
Take, for example, the timeseries table below:
| timestamp           | value1 | value2 | value3 | output |
|---------------------|--------|--------|--------|--------|
| 2021-06-24 12:00:22 | 1.3819 | 2.4214 |        | 0      |
| 2021-06-25 11:55:23 | 4.8726 | 6.2324 | 9.3424 | 1      |
| 2021-06-26 05:40:53 | 6.1313 |        | 5.4648 | 0      |
| 2021-06-27 08:24:19 | 6.7543 |        | 9.7967 | 0      |
| 2021-06-28 13:34:51 | 3.5713 | 1.4452 |        | 1      |
The solution should return the value2 and value3 columns. Thanks in advance!
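TQL works row-wise, so one workaround is to do the column check client-side. A minimal sketch, assuming you have already fetched the container's rows into a pandas DataFrame (the frame below just stands in for the query result):
import pandas as pd

# Stand-in for the rows fetched from GridDB
df = pd.DataFrame({
    "value1": [1.3819, 4.8726, 6.1313, 6.7543, 3.5713],
    "value2": [2.4214, 6.2324, None, None, 1.4452],
    "value3": [None, 9.3424, 5.4648, 9.7967, None],
    "output": [0, 1, 0, 0, 1],
})

# Columns that contain at least one null
null_columns = df.columns[df.isna().any()].tolist()
print(null_columns)  # ['value2', 'value3']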
I need to join multiple tables, but I can't get the join in Python to behave as expected. I need to left join table 2 to table 1 without overwriting the existing data in the "geometry" column of table 1. What I'm trying to achieve is something like a VLOOKUP in Excel: I want to pull matching values from my other tables (~10) into table 1 without overwriting what is already there. Is there a better way? Below is what I tried:
TABLE 1
| ID | BLOCKCODE | GEOMETRY |
| -- | --------- | -------- |
| 1 | 123 | ABC |
| 2 | 456 | DEF |
| 3 | 789 | |
TABLE 2
| ID | GEOID | GEOMETRY |
| -- | ----- | -------- |
| 1 | 123 | |
| 2 | 456 | |
| 3 | 789 | GHI |
TABLE 3 (What I want)
| ID | BLOCKCODE | GEOID | GEOMETRY |
| -- | --------- |----- | -------- |
| 1 | 123 | 123 | ABC |
| 2 | 456 | 456 | DEF |
| 3 | | 789 | GHI |
What I'm getting
| ID | GEOID | GEOMETRY_X | GEOMETRY_Y |
| -- | ----- | -------- | --------- |
| 1 | 123 | ABC | |
| 2 | 456 | DEF | |
| 3 | 789 | | GHI |
join = pd.merge(table1, table2, how="left", left_on="BLOCKCODE", right_on="GEOID")
When I try this:
join = pd.merge(table1, table2, how="left", left_on=["BLOCKCODE", "GEOMETRY"], right_on=["GEOID", "GEOMETRY"])
I get this:
TABLE 1
| ID | BLOCKCODE | GEOMETRY |
| -- | --------- | -------- |
| 1 | 123 | ABC |
| 2 | 456 | DEF |
| 3 | 789 | |
You could try:
# Rename the BLOCKCODE column in table1 to match table2's GEOID column.
# This is necessary for the next step, since update() aligns on column names.
table1 = table1.rename(columns={"BLOCKCODE": "GEOID"})
# update() writes table2's non-NA values into table1 (aligned on the index),
# so existing values are only replaced where table2 actually has data;
# pass overwrite=False if you want to fill only the NaN cells in table1.
table1.update(table2)
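For reference, a self-contained sketch of that approach on the sample data (update() aligns on the index, so both frames keep the default RangeIndex here):
import pandas as pd

table1 = pd.DataFrame({"ID": [1, 2, 3], "BLOCKCODE": [123, 456, 789],
                       "GEOMETRY": ["ABC", "DEF", None]})
table2 = pd.DataFrame({"ID": [1, 2, 3], "GEOID": [123, 456, 789],
                       "GEOMETRY": [None, None, "GHI"]})

table1 = table1.rename(columns={"BLOCKCODE": "GEOID"})
table1.update(table2)  # only table2's non-NA GEOMETRY value ("GHI") is written
print(table1)  # GEOMETRY is now ABC, DEF, GHI; nothing existing was overwritten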
I'm observing odd behaviour while performing fuzzy_left_join from the fuzzymatcher library. Trying to join two DataFrames, the left one with 5217 records and the right one with 8734, only 71 records come back with a best_match_score, which seems really odd. To get better results I even removed all the numbers and kept only alphabetical characters in the joining columns. In the merged table the id column from the right table is NaN, which is also a strange result.
Left table, join column "amazon_s3_name". First item: limonig
+------+---------+-------+-----------+------------------------------------+
| id | product | price | category | amazon_s3_name |
+------+---------+-------+-----------+------------------------------------+
| 1 | A | 1.49 | fruits | limonig |
| 8964 | B | 1.39 | beverages | studencajfuzelimonilimonetatrevaml |
| 9659 | C | 2.79 | beverages | studencajfuzelimonilimtreval |
+------+---------+-------+-----------+------------------------------------+
Right table, join column "amazon_s3_name". Last item: limoni
+------+----------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+
| id | picture | amazon_s3_name |
+------+----------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+
| 191 | https://s3.eu-central-1.amazonaws.com/groceries.pictures/images/AhmadCajLimonIDjindjifil20X2G.jpg | ahmadcajlimonidjindjifilxg |
| 192 | https://s3.eu-central-1.amazonaws.com/groceries.pictures/images/AhmadCajLimonIDjindjifil20X2G40g.jpg | ahmadcajlimonidjindjifilxgg |
| 204 | https://s3.eu-central-1.amazonaws.com/groceries.pictures/images/Ahmadcajlimonidjindjifil20x2g40g00051265.jpg | ahmadcajlimonidjindjifilxgg |
| 1608 | https://s3.eu-central-1.amazonaws.com/groceries.pictures/images/Cajstudenfuzetealimonilimonovatreva15lpet.jpg | cajstudenfuzetealimonilimonovatrevalpet |
| 4689 | https://s3.eu-central-1.amazonaws.com/groceries.pictures/images/Lesieursalatensosslimonimaslinovomaslo.jpg | lesieursalatensosslimonimaslinovomaslo |
| 4690 | https://s3.eu-central-1.amazonaws.com/groceries.pictures/images/Lesieursalatensosslimonimaslinovomaslo05l500ml01301150.jpg | lesieursalatensosslimonimaslinovomaslolml |
| 4723 | https://s3.eu-central-1.amazonaws.com/groceries.pictures/images/Limoni.jpg | limoni |
+------+----------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+
Merged table. As we can see, best_match_score is NaN:
+----+------------------+-----------+------------+-------+----------+----------------------+------------+---------------------+-------------+----------------------+
| id | best_match_score | __id_left | __id_right | price | category | amazon_s3_name_left | image_left | amazon_s3_name_left | image_right | amazon_s3_name_right |
+----+------------------+-----------+------------+-------+----------+----------------------+------------+---------------------+-------------+----------------------+
| 0 | NaN | 0_left | None | 1.49 | Fruits | Limoni500g09700112 | NaN | limonig | NaN | NaN |
| 2 | NaN | 2_left | None | 1.69 | Bio | Morkovi1kgbr09700132 | NaN | morkovikgbr | NaN | NaN |
+----+------------------+-----------+------------+-------+----------+----------------------+------------+---------------------+-------------+----------------------+
You could give polyfuzz a try. Use the setup from the library's examples, for instance the TF-IDF matcher (BERT embeddings are also supported), then run:
from polyfuzz import PolyFuzz

model = PolyFuzz("TF-IDF")
model.match(df1["amazon_s3_name"].tolist(), df2["amazon_s3_name"].tolist())
df1['To'] = model.get_matches()['To']
then merge:
df1.merge(df2, left_on='To', right_on='amazon_s3_name')
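get_matches() also returns a Similarity column, so you could drop weak pairs before merging; the 0.8 cutoff below is an arbitrary illustration and assumes both frames keep their default integer index:
matches = model.get_matches()
df1['To'] = matches['To'].where(matches['Similarity'] > 0.8)  # weak matches become NaN
df1.merge(df2, left_on='To', right_on='amazon_s3_name')  # rows with NaN 'To' simply won't match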
I'm stuck on a little problem with Python and a pandas DataFrame.
I want to make a pivot table that counts some related items.
I have a dataframe with this structure:
+-----+------------+-----------+-----------+
| ID | Item_Type | Basket_ID | OwnerName |
+-----+------------+-----------+-----------+
| 3 | Basket | | |
| 336 | ChickenEgg | 3 | Henk |
| 841 | SomeEgg | 3 | Henk |
| 671 | EasterEgg | 3 | Piet |
| 9 | Basket | | |
| 336 | Orange | 9 | Piet |
| 841 | Banana | 9 | Piet |
| 671 | Strawberry | 9 | Herman |
| 888 | Apple | 9 | Herman |
| 821 | Apricots | 9 | NaN |
+-----+------------+-----------+-----------+
I want to count how many items are related to the ‘Basket’ item (the parent) and how often each ‘OwnerName’ appears with that basket.
I want my dataframe to look like the one below.
It shows the total count of the items related to the parent Item_Type ‘Basket’ and how often each name appears.
It also shows the total number of owners and the items without an owner.
+----+-----------+-------------------+------------+------------+--------------+--------------+------------------+
| ID | Item_Type | Total_Items_Count | Henk_Count | Piet_Count | Herman_Count | Total_Owners | Total_NaN_Values |
+----+-----------+-------------------+------------+------------+--------------+--------------+------------------+
| 3 | Basket | 3 | 2 | 1 | 0 | 3 | |
| 9 | Basket | 5 | 0 | 2 | 2 | 4 | 1 |
+----+-----------+-------------------+------------+------------+--------------+--------------+------------------+
Answering your question requires multiple steps, but the core idea is that you should use pivot_table.
The table is conceptually a multilevel index: Basket_ID is the high-level index and ID is the more granular one. The first thing to do is remove the rows where Basket_ID is missing, so that the granularity of the table is consistent.
Let's say you named your dataframe df.
# Preparation steps
df = df[~df["Basket_ID"].isna()]  # remove the rows that shouldn't be counted
df.loc[df["OwnerName"].isna(), "OwnerName"] = "unknown"  # set missing owners to a placeholder
# Make a pivot table
df = df.pivot_table(index=["Basket_ID"], columns=["OwnerName"], values=["Item_Type"], aggfunc="count").fillna(0)
From there onwards you should be able to calculate your remaining columns, for example as sketched below.
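A sketch of those remaining columns, continuing from the pivot above (column names follow the expected output; "unknown" is the placeholder set in the preparation step):
pivot = df  # the pivoted frame from the previous step
pivot.columns = pivot.columns.droplevel(0)  # drop the "Item_Type" level from the MultiIndex
pivot["Total_Items_Count"] = pivot.sum(axis=1)
pivot["Total_NaN_Values"] = pivot.get("unknown", 0)
# Every row either has an owner or was marked "unknown"
pivot["Total_Owners"] = pivot["Total_Items_Count"] - pivot["Total_NaN_Values"]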
I am trying to filter and count employee attendance.
I have a table like this:
attendance table:
+-----+-----------------------+----------+-----------+
| id | time | status | emp_id |
+-----+-----------------------+----------+-----------+
| 1 | 2018-04-17 7:03:40 | 1 | 1 |
| 2 | 2018-04-18 7:10:50 | 1 | 1 |
| 3 | 2018-04-19 5:05:32 | 1 | 1 |
| 4 | 2018-04-20 7:07:44 | 1 | 1 |
| 5 | 2018-04-18 7:10:50 | 1 | 2 |
+-----+-----------------------+----------+-----------+
My objective is to filter all the attendance data.
I got a solution with
models.model.objects.filter(emp_id=1).count()
which returns 4 in my case.
But I also need to filter so that the hour of time is greater than 6.
I tried adding datetime.datetime(time).time() > 6 to the filter, but it doesn't work.
Can anyone help or suggest how to make this happen? Is there another way to do it (for example with a for loop), or is it not possible?
You can use __hour to filter by the hour:
MyModel.objects.filter(
emp_id=1,
time__hour__gt=6
).count()
You can also filter like this:
model.objects.filter(time__hour__gte=6).filter(emp_id=1).count()
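Note that __hour, like Django's other datetime lookups, is evaluated in the current time zone when USE_TZ is True, so results can shift relative to the raw UTC values stored in the database.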