I have the GridDB Python client running on my Ubuntu machine. I would like to get the columns that contain null values using a GridDB query. I know it's possible to get the rows with null values, but this time I want the columns.
Take, for example, the time-series table below:
| timestamp           | value1 | value2 | value3 | output |
|---------------------|--------|--------|--------|--------|
| 2021-06-24 12:00:22 | 1.3819 | 2.4214 |        | 0      |
| 2021-06-25 11:55:23 | 4.8726 | 6.2324 | 9.3424 | 1      |
| 2021-06-26 05:40:53 | 6.1313 |        | 5.4648 | 0      |
| 2021-06-27 08:24:19 | 6.7543 |        | 9.7967 | 0      |
| 2021-06-28 13:34:51 | 3.5713 | 1.4452 |        | 1      |
The solution should return the value2 and value3 columns. Thanks in advance!
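As far as I know, the query language itself doesn't report which columns contain nulls, so one practical route is to fetch the rows with the Python client and let pandas do the column-wise check. A minimal sketch, with the GridDB fetch stubbed out by a hard-coded frame (the client/container setup is assumed):
'''
import pandas as pd

# In practice you would fetch the rows from your container, e.g.
#   query = container.query("SELECT *")
#   rs = query.fetch()
# and collect them into a DataFrame; df below stands in for that result.
df = pd.DataFrame({
    "value1": [1.3819, 4.8726, 6.1313, 6.7543, 3.5713],
    "value2": [2.4214, 6.2324, None, None, 1.4452],
    "value3": [None, 9.3424, 5.4648, 9.7967, None],
    "output": [0, 1, 0, 0, 1],
})

# Columns with at least one null value
null_columns = df.columns[df.isnull().any()].tolist()
print(null_columns)  # ['value2', 'value3']
'''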
I need to join multiple tables, but I can't get the join in pandas to behave as expected. I need to left-join table 2 to table 1 without overwriting the existing data in the "geometry" column of table 1. What I'm trying to achieve is something like a VLOOKUP in Excel: I want to pull matching values from my other tables (~10) into table 1 without overwriting what is already there. Is there a better way? Below is what I tried.
TABLE 1
| ID | BLOCKCODE | GEOMETRY |
| -- | --------- | -------- |
| 1 | 123 | ABC |
| 2 | 456 | DEF |
| 3 | 789 | |
TABLE 2
| ID | GEOID | GEOMETRY |
| -- | ----- | -------- |
| 1 | 123 | |
| 2 | 456 | |
| 3 | 789 | GHI |
TABLE 3 (What I want)
| ID | BLOCKCODE | GEOID | GEOMETRY |
| -- | --------- |----- | -------- |
| 1 | 123 | 123 | ABC |
| 2 | 456 | 456 | DEF |
| 3 | | 789 | GHI |
What I'm getting
| ID | GEOID | GEOMETRY_X | GEOMETRY_Y |
| -- | ----- | -------- | --------- |
| 1 | 123 | ABC | |
| 2 | 456 | DEF | |
| 3 | 789 | | GHI |
That result comes from:
join = pd.merge(table1, table2, how="left", left_on="BLOCKCODE", right_on="GEOID")
When I try this:
join = pd.merge(table1, table2, how="left", left_on=["BLOCKCODE", "GEOMETRY"], right_on=["GEOID", "GEOMETRY"])
I get table 1 back unchanged:
| ID | BLOCKCODE | GEOMETRY |
| -- | --------- | -------- |
| 1 | 123 | ABC |
| 2 | 456 | DEF |
| 3 | 789 | |
You could try:
# Rename the BLOCKCODE column in table1 to match table2's key
# column, so the two frames share the same column names.
table1 = table1.rename(columns={"BLOCKCODE": "GEOID"})

# Align both frames on the key, then fill only the NaN cells in
# table1 from table2 (overwrite=False keeps existing data intact).
table1 = table1.set_index("GEOID")
table1.update(table2.set_index("GEOID"), overwrite=False)
table1 = table1.reset_index()
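Since there are ~10 lookup tables, the same pattern can run in a loop. A sketch, assuming every lookup table carries the same GEOID key column:
'''
# The ~10 lookup frames, each keyed by GEOID; extend the list as needed.
lookup_tables = [table2]

base = table1.rename(columns={"BLOCKCODE": "GEOID"}).set_index("GEOID")
for other in lookup_tables:
    # Fill only the gaps; values already present in base win.
    base.update(other.set_index("GEOID"), overwrite=False)
base = base.reset_index()
'''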
I'm observing odd behaviour while performing fuzzy_left_join from the fuzzymatcher library. When joining two DataFrames, the left one with 5217 records and the right one with 8734, only 71 records come back with a best_match_score, which seems really odd. To get better results I even removed all the numbers and left only alphabetic characters in the join columns. In the merged table the id column from the right table is NaN, which is also a strange result.
Left table, join column "amazon_s3_name"; first item: limonig
+------+---------+-------+-----------+------------------------------------+
| id | product | price | category | amazon_s3_name |
+------+---------+-------+-----------+------------------------------------+
| 1 | A | 1.49 | fruits | limonig |
| 8964 | B | 1.39 | beverages | studencajfuzelimonilimonetatrevaml |
| 9659 | C | 2.79 | beverages | studencajfuzelimonilimtreval |
+------+---------+-------+-----------+------------------------------------+
Right table, join column "amazon_s3_name"; last item: limoni
+------+----------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+
| id | picture | amazon_s3_name |
+------+----------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+
| 191 | https://s3.eu-central-1.amazonaws.com/groceries.pictures/images/AhmadCajLimonIDjindjifil20X2G.jpg | ahmadcajlimonidjindjifilxg |
| 192 | https://s3.eu-central-1.amazonaws.com/groceries.pictures/images/AhmadCajLimonIDjindjifil20X2G40g.jpg | ahmadcajlimonidjindjifilxgg |
| 204 | https://s3.eu-central-1.amazonaws.com/groceries.pictures/images/Ahmadcajlimonidjindjifil20x2g40g00051265.jpg | ahmadcajlimonidjindjifilxgg |
| 1608 | https://s3.eu-central-1.amazonaws.com/groceries.pictures/images/Cajstudenfuzetealimonilimonovatreva15lpet.jpg | cajstudenfuzetealimonilimonovatrevalpet |
| 4689 | https://s3.eu-central-1.amazonaws.com/groceries.pictures/images/Lesieursalatensosslimonimaslinovomaslo.jpg | lesieursalatensosslimonimaslinovomaslo |
| 4690 | https://s3.eu-central-1.amazonaws.com/groceries.pictures/images/Lesieursalatensosslimonimaslinovomaslo05l500ml01301150.jpg | lesieursalatensosslimonimaslinovomaslolml |
| 4723 | https://s3.eu-central-1.amazonaws.com/groceries.pictures/images/Limoni.jpg | limoni |
+------+----------------------------------------------------------------------------------------------------------------------------+--------------------------------------------+
Merged table: as we can see, best_match_score is NaN
+----+------------------+-----------+------------+-------+----------+----------------------+------------+---------------------+-------------+----------------------+
| id | best_match_score | __id_left | __id_right | price | category | amazon_s3_name_left | image_left | amazon_s3_name_left | image_right | amazon_s3_name_right |
+----+------------------+-----------+------------+-------+----------+----------------------+------------+---------------------+-------------+----------------------+
| 0 | NaN | 0_left | None | 1.49 | Fruits | Limoni500g09700112 | NaN | limonig | NaN | NaN |
| 2 | NaN | 2_left | None | 1.69 | Bio | Morkovi1kgbr09700132 | NaN | morkovikgbr | NaN | NaN |
+----+------------------+-----------+------------+-------+----------+----------------------+------------+---------------------+-------------+----------------------+
You could give polyfuzz a try. Use the setup from its examples, e.g. a TF-IDF or BERT matcher, then run:
from polyfuzz import PolyFuzz

model = PolyFuzz("TF-IDF").match(df1["amazon_s3_name"].tolist(), df2["amazon_s3_name"].tolist())
df1["To"] = model.get_matches()["To"]
then merge:
merged = df1.merge(df2, left_on="To", right_on="amazon_s3_name")
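get_matches() also returns a Similarity column, so you can drop weak matches before merging instead of ending up with the NaN-heavy rows you saw from fuzzymatcher. A sketch; the 0.8 cutoff is an arbitrary choice:
'''
matches = model.get_matches()  # columns: From, To, Similarity
good = matches[matches["Similarity"] >= 0.8]

# Attach the match, then pull in the right-hand table's columns.
merged = (df1.merge(good, left_on="amazon_s3_name", right_on="From")
             .merge(df2, left_on="To", right_on="amazon_s3_name",
                    suffixes=("_left", "_right")))
'''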
I have multiple DataFrames, let's say:
| Date | Name | Value |
|:---- |:------:| -----:|
|01.01 | A | 20 |
|02.01 | B | Null |
|03.01 | C | 10 |
and
| Date | Name | Value_2 |
|:---- |:------:| -------:|
|01.01 | A | 20 |
|02.01 | B | 10 |
|03.01 | C | 10 |
I want to make the Value_2 column red, then merge these tables as:
| Date | Name | Value |Value_2 |
|:---- |:------:| -----:|-------:|
|01.01 | A | 20 | 10 |
|02.01 | B | Null | 10 |
|03.01 | C | 10 | 10 |
and then replace the null value in Value with the one from Value_2.
Without formatting, my code works perfectly, but once I color the Value_2 column with
'''
col_ref = {'Value_2': 'color: red'}
df = df.style.apply(
    lambda x: pd.DataFrame(col_ref, index=df.index, columns=df.columns).fillna(''),
    axis=None,
)
'''
I can't do merging or dropping anymore. I know that after using style it is no longer a DataFrame, so I can't do these operations, but is there another way to do this? The goal is to save the final form to Excel and be able to see where the data came from (Value or Value_2).
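One common way around this (not from the thread, just the usual pattern) is to finish all DataFrame operations first and apply the Styler only as the last step before writing to Excel, recording which cells were filled so they can be colored. A sketch using the sample data above:
'''
import pandas as pd

df1 = pd.DataFrame({"Date": ["01.01", "02.01", "03.01"],
                    "Name": ["A", "B", "C"],
                    "Value": [20, None, 10]})
df2 = pd.DataFrame({"Date": ["01.01", "02.01", "03.01"],
                    "Name": ["A", "B", "C"],
                    "Value_2": [10, 10, 10]})

merged = df1.merge(df2, on=["Date", "Name"], how="left")
# Remember which Value cells were null before filling them from Value_2.
filled = merged["Value"].isna()
merged.loc[filled, "Value"] = merged.loc[filled, "Value_2"]

# Style last: color Value_2 red, plus the Value cells that came from it.
def highlight(data):
    styles = pd.DataFrame("", index=data.index, columns=data.columns)
    styles["Value_2"] = "color: red"
    styles.loc[filled, "Value"] = "color: red"
    return styles

merged.style.apply(highlight, axis=None).to_excel("merged.xlsx", index=False)
'''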
| Store | Date       | Weekly_Sales | Holiday_Flag | Temperature | Fuel_Price | CPI        | Unemployment |
|-------|------------|--------------|--------------|-------------|------------|------------|--------------|
| 1     | 05-02-2010 | 1643690.90   | 0            | 42.31       | 2.572      | 211.096358 | 8.106        |
| 1     | 12-02-2010 | 1641957.44   | 1            | 38.51       | 2.548      | 211.242170 | 8.106        |
| 1     | 19-02-2010 | 1611968.17   | 0            | 39.93       | 2.514      | 211.289143 | 8.106        |
| 1     | 26-02-2010 | 1409727.59   | 0            | 46.63       | 2.561      | 211.319643 | 8.106        |
| 1     | 05-03-2010 | 1554806.68   | 0            | 46.50       | 2.625      | 211.350143 | 8.106        |
The Store column values range from 1 to 40. How do I get the store with the maximum Weekly_Sales?
There are many ways to do this, and you haven't shown how you load the data into Python or what format it's in, which makes the question hard to answer precisely. I suggest you look into pandas or NumPy as data-analysis libraries. If the data is stored in .csv format, or even a Python dictionary, you could try the following:
import pandas as pd

df = pd.read_csv('file.csv', header=0)
# df = pd.DataFrame.from_dict(dct)  # if the data is in a dictionary instead

value = df.Weekly_Sales.max()     # the largest single weekly sales figure
index = df.Weekly_Sales.idxmax()  # row index of that figure
store = df.loc[index, 'Store']    # the store that figure belongs to
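If "maximum Weekly_Sales" means the highest total across all weeks rather than a single record, a groupby gets you there. A short sketch:
'''
# Total sales per store, then the store id with the highest total.
totals = df.groupby('Store')['Weekly_Sales'].sum()
best_store = totals.idxmax()
'''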
There are two columns, both categorical. I want to group by the first column, age_group, and have each distinct value of the other column, Engagement_category, become its own column.
I did the groupby on the first column, age_group.
| age_group | Engagement_category |
|--------------|---------------------|
| 21-26 | Not Engaged |
| 27-32 | Disengaged |
| 33-38 | Engaged |
| 39-44 | Disengaged |
| 45-50 | Not Engaged |
| 50 and Above | Engaged |
group = df.groupby('age_group')
The required output is below:
| age_group | Engaged | Nearly Engaged | Not Engaged | Disengaged |
|-----------|---------|----------------|-------------|------------|
| 21-26 | 3 | 4 | 1 | 1 |
| 27-32 | 4 | 0 | 4 | 0 |
| 33-38 | 2 | 0 | 1 | 1 |
Thank you.
You want to group by both columns, count, and then unstack Engagement_category so each category becomes its own column. Try this:
df.groupby(['age_group', 'Engagement_category']).size().unstack(fill_value=0)
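pd.crosstab does the same thing in one call, and is arguably more readable here:
'''
import pandas as pd

# Rows: age_group values; columns: one per Engagement_category; cells: counts.
pd.crosstab(df['age_group'], df['Engagement_category'])
'''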