How do I create a one-to-many relationship using Python?
I have Excel Table A, which shares a common field (X) with Excel Table B.
The X values in Table A are unique. The X values in Table B appear multiple times.
I want code that goes through the values in Table B and, every time there is a match with Table A, outputs the joined row to Table C.
Someone on this forum suggested using
tableA.merge(tableB, left_on='x1', right_on='X2') but it does not work for what I require.
As an example, if I have a value of 10 in the X field of Table A, it can appear multiple times in Table B. Every time it appears in Table B, I want a join done with Table A.
I solved this using pd.merge(tableB, tableA, on=['X'])
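For anyone landing here, a minimal sketch of that merge on made-up data. The column names `name` and `qty` and all the values are assumptions for illustration; only `X`, `tableA` and `tableB` come from the question. The optional `validate="many_to_one"` argument makes pandas raise an error if X turns out not to be unique in Table A.

```python
import pandas as pd

# Hypothetical data: X is unique in tableA and repeated in tableB.
tableA = pd.DataFrame({"X": [10, 20, 30], "name": ["ten", "twenty", "thirty"]})
tableB = pd.DataFrame({"X": [10, 20, 10, 40, 10], "qty": [1, 2, 3, 4, 5]})

# One output row per row of tableB that has a match in tableA.
# validate="many_to_one" checks that X is unique on the right-hand side.
tableC = pd.merge(tableB, tableA, on="X", how="inner", validate="many_to_one")

print(tableC)
```

The row of Table B with X = 40 has no partner in Table A, so the inner join drops it; `how="left"` would keep it with missing values in the Table A columns.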
I have two queries that will be run repeatedly to feed a report and some charts, so I need to make sure the process is tight. The first query has 25 columns and will yield 25-50 rows from a massive table. The second query returns another 25 columns (a couple of them matching the first) and 25 to 50 rows from another massive table.
The desired end result is a single document in which Query 1 (Problem) and Query 2 (Problem tasks) are matched on a common column (Problem ID), so that row 1 is the problem, rows 2-4 are its tasks, row 5 is the next problem, rows 6-9 are its tasks, etc. I realize I could do this manually by running the two queries and then combining them in Excel by hand, but I am looking for an elegant process that could be reused in my absence without too much overhead.
I was exploring inserts, UNION ALL, and CROSS JOIN, but the two queries have different columns that contain different critical data elements to be returned. I am also exploring setting up a Python job to do this by importing the CSVs and interlacing the results, but I am an early data science student and not yet much past creating charts from imported CSVs.
Any suggestions on how I might attack this challenge? Thanks for the help.
Picture of desired end result.
You can do it with something like
INSERT INTO target_table (<columns...>)
SELECT <your first query>
UNION
SELECT <your second query>
And then to retrieve data
SELECT * from target_table
WHERE <...>
ORDER BY problem_id, task_id
Just ensure both queries return the same columns, i.e. the columns you want to populate in target_table, probably using fixed default values (e.g. the first query may return a default task_id by including NULL as task_id in the column list)
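Since the question also mentions a Python job, here is a rough sketch of the same idea using the standard-library sqlite3 module. The table names `problems` and `tasks`, their columns, and the sample rows are all invented for illustration; the real queries would go where the two SELECTs are. It skips the intermediate target_table and orders the UNION directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for the two source tables.
conn.executescript("""
CREATE TABLE problems (problem_id INTEGER, open_time TEXT, summary TEXT);
CREATE TABLE tasks (problem_id INTEGER, task_id INTEGER, date_opened TEXT, summary TEXT);
INSERT INTO problems VALUES (1, '2024-01-01', 'disk full'), (2, '2024-01-03', 'login fails');
INSERT INTO tasks VALUES (1, 1, '2024-01-01', 'clean logs'),
                         (1, 2, '2024-01-02', 'add disk'),
                         (2, 1, '2024-01-03', 'reset cache');
""")

# Both SELECTs return the same column list; the problem rows get NULL as task_id.
# In SQLite, NULL sorts before any number, so each problem precedes its tasks.
rows = conn.execute("""
SELECT problem_id, NULL AS task_id, open_time AS opened, summary FROM problems
UNION ALL
SELECT problem_id, task_id, date_opened AS opened, summary FROM tasks
ORDER BY problem_id, task_id
""").fetchall()

for row in rows:
    print(row)
```

Where NULL sorts differs between database engines, so on another engine the ORDER BY may need an explicit NULLS FIRST or a sentinel task_id such as 0.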
Thanks for the feedback @gimix. I ended up aliasing the columns that I was able to line up between the two tables (open_time vs. date_opened, etc.) so they all matched, and selected '' for the null values where I needed to. I unioned the two SELECT statements as suggested, then finally realized I could just insert my filtering queries twice as subqueries. It will now be nice and quickly repeatable for pulling and dropping into Excel twice per week. Thank you!
I've got a query that I'm trying to form using SQLAlchemy against a postgres DB.
Here's my query:
select id, array_remove(ARRAY[a.value1, b.value2, etc.], null) FROM table t
JOIN a on t.id = a.id
JOIN b on t.id = b.id
This provides a return where the second column is an array of values from different tables. The goal is to have those columns represented in a single column value separated by a comma.
In SQLAlchemy, I'm generating those tables on the fly, so I have a list of the Column objects themselves. When building my query, I've got the joins and whatnot down, but how can I structure the code so that I pass in my list of columns and get the "ARRAY[column1, column2, etc.]" I expect to see in SQL?
Here's where I'm at so far:
my_query.add_columns(func.array_remove(ARRAY(id_cols), null))
Neither the ARRAY type nor the array function (literal) appears to take a list of columns. I tried using func.cast to an array, and that also didn't work because it wouldn't take a list of columns. Using a string list of column names isn't ideal because these columns might conflict in name... I guess the fully qualified name might be okay, but that also seems difficult to get with SQLAlchemy.
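You need to use the PostgreSQL array literal instead of the ARRAY type.
from sqlalchemy import func, null
from sqlalchemy.dialects.postgresql import array
# add_columns() returns a new query, so keep the result; null() renders SQL NULL
my_query = my_query.add_columns(func.array_remove(array(id_cols), null()))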
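To check what this renders to without a live database, the expression can be compiled against the PostgreSQL dialect. This is only a sketch: the tables `a` and `b` and their columns are made up to mirror the question's SQL, and the `select(col, col)` calling style assumes SQLAlchemy 1.4 or newer.

```python
from sqlalchemy import Column, Integer, MetaData, Table, func, null, select
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import array

# Hypothetical tables standing in for the ones generated on the fly.
metadata = MetaData()
a = Table("a", metadata, Column("id", Integer), Column("value1", Integer))
b = Table("b", metadata, Column("id", Integer), Column("value2", Integer))

# A plain Python list of Column objects, as in the question.
id_cols = [a.c.value1, b.c.value2]

expr = func.array_remove(array(id_cols), null())
stmt = select(a.c.id, expr).select_from(a.join(b, a.c.id == b.c.id))

# Compile to a string using the PostgreSQL dialect; no connection is needed.
sql = str(stmt.compile(dialect=postgresql.dialect()))
print(sql)
```

Because the array is built from Column objects rather than name strings, each element is rendered with its table qualifier, which sidesteps the name-conflict worry in the question.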
Suppose you have columns with two categories (Columns_A and Columns_B)
and two measures (Value1 and Value2), which come from different tables, though that doesn't matter.
Normally the matrix table looks like this:
But what I need is to swap the column headers and the values in the first two rows, like this:
In other words, I need the categories broken out under every value.
Everything in one image (my dataset) :)
Do you have any ideas, please?
Maybe in python? (I guess)
Thanks
Create a relationship between the two tables on the Category column, and then merge the two tables by following the steps in the screenshot. (Use a full outer join when merging.)
Then perform an unpivot operation to get the result set below.
Now, in the visualization tab, select the matrix and set it up as below.
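Since the question asks about Python, the same merge-then-unpivot steps can be sketched in pandas. The screenshots are not available here, so the table contents and the column name `Category` as the join key are assumptions; only Columns_A, Columns_B, Value1 and Value2 come from the question.

```python
import pandas as pd

# Hypothetical source tables, one per measure, sharing a Category column.
t1 = pd.DataFrame({"Category": ["Columns_A", "Columns_B"], "Value1": [10, 20]})
t2 = pd.DataFrame({"Category": ["Columns_A", "Columns_B"], "Value2": [1.5, 2.5]})

# Step 1: full outer join on Category.
merged = t1.merge(t2, on="Category", how="outer")

# Step 2: unpivot the measure columns into (Category, Measure, Value) rows.
long = merged.melt(id_vars="Category", var_name="Measure", value_name="Value")

# Step 3: put the measure on the top header level and the category beneath it.
wide = long.set_index(["Measure", "Category"])["Value"].to_frame().T

print(wide)
```

The resulting frame has a two-level column header, with Value1 and Value2 on top and the categories repeated under each.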
I have a list/array of strings:
l = ['jack','jill','bob']
Now I need to create a table in sqlite3 for Python so that I can insert this array into a column called "Names". I do not want multiple rows with one name per row. I want a single row that contains the array exactly as shown above, and I want to be able to retrieve it in exactly the same format. How can I insert an array as an element in a db? What am I supposed to declare as the data type of the array when creating the table? Like:
c.execute("CREATE TABLE names(id text, names ??)")
How do I insert values too? Like:
c.execute("INSERT INTO names VALUES(?,?)",(id,l))
EDIT: I am being so foolish. I just realized that I can have multiple entries for the id and use a query to extract all relevant names. Thanks anyway!
You can store an array in a single field if you generate a string representation of it, e.g. using the pickle module. Then, when you read the row, you can unpickle it. Pickle converts many (but not all) complex objects into a byte string from which the object can be restored. But that is most likely not what you want to do: you won't be able to do anything with the data in the table except select the rows and then unpickle the array. You won't be able to search.
If you want to store anything of varying length (or of fixed length, but with many instances of similar things), you would not want to put that in a column or in multiple columns. Think vertically, not horizontally, meaning: don't think about columns, think about rows. For storing a vector with any number of components, a table is a good tool.
It is a little difficult to explain from the little detail you give, but you should think about creating a second table and putting all the names there for every row of your first table. You'd need some key in your first table, that you can use for your second table, too:
c.execute("CREATE TABLE first_table(id INTEGER, text VARCHAR(255))")  # plus any additional fields
c.execute("CREATE TABLE names_table(id INTEGER, num INTEGER, name VARCHAR(255))")
With this you can still store whatever information you have, except the names, in first_table, and store the array of names in names_table. Just use the same id as in first_table, and use num to store the index positions inside the array. You can then later get the array back by doing something like
SELECT name FROM names_table
WHERE id=?
ORDER BY num
to read the array of names for any of your rows in first_table.
That's a pretty normal way to store arrays in a DB.
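Put together, the two-table approach might look like the following sketch. It uses an in-memory database, and the row id of 1 and the text "some other data" are placeholders for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE first_table(id INTEGER PRIMARY KEY, text VARCHAR(255))")
c.execute("CREATE TABLE names_table(id INTEGER, num INTEGER, name VARCHAR(255))")

names = ['jack', 'jill', 'bob']
row_id = 1  # placeholder key shared by both tables

c.execute("INSERT INTO first_table VALUES (?, ?)", (row_id, "some other data"))
# One row per array element; num records the position inside the array.
c.executemany(
    "INSERT INTO names_table VALUES (?, ?, ?)",
    [(row_id, num, name) for num, name in enumerate(names)],
)
conn.commit()

# Rebuild the list in its original order.
c.execute("SELECT name FROM names_table WHERE id=? ORDER BY num", (row_id,))
restored = [row[0] for row in c.fetchall()]

# Unlike a pickled blob, individual names remain searchable.
c.execute("SELECT id FROM names_table WHERE name=?", ("jill",))
owners = [row[0] for row in c.fetchall()]
```

Here `restored` ends up equal to the original list, and `owners` lists the ids of every row in first_table whose array contains 'jill'.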
This is not the way to go. You should consider creating another table for names with foreign key to names.
You could pickle/marshal/json your array and store it as binary/varchar/jsonfield in your database.
Something like:
import json
names = ['jack','jill','bill']
snames = json.dumps(names)
# bind the JSON string as a parameter instead of concatenating it into the SQL
c.execute("INSERT INTO nametable (names) VALUES (?)", (snames,))
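A fuller round trip with JSON, as a sketch against the table layout from the question. The TEXT column type and the id value "group1" are assumptions; SQLite has no array type, so the list is stored as a JSON string.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
c = conn.cursor()
# Store the serialized list in a plain TEXT column.
c.execute("CREATE TABLE names(id TEXT, names TEXT)")

l = ['jack', 'jill', 'bob']
c.execute("INSERT INTO names VALUES (?, ?)", ("group1", json.dumps(l)))
conn.commit()

# Read the string back and decode it into a Python list again.
c.execute("SELECT names FROM names WHERE id = ?", ("group1",))
restored = json.loads(c.fetchone()[0])
```

This gives back a list equal to the one that was stored, but the names cannot be searched or indexed individually the way they can with a separate names table.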
Hey all,
I have two databases. One has 145,000 rows and approx. 12 columns. The other has around 40,000 rows and 5 columns. I am trying to compare them based on the values of two columns. For example, if in CSV#1 column 1 says 100-199 and column 2 says Main St (meaning that this row falls within the 100 block of Main Street), how would I go about comparing that with a similar pair of columns in CSV#2? I need to compare every row in CSV#1 to each single row in CSV#2. If there is a match, I need to append the 5 columns of each matching row to the end of the row of CSV#2. Thus CSV#2's number of columns will grow significantly and have repeat entries; it doesn't matter how the columns are ordered. Any advice on how to compare two columns with another two columns in a separate database and then iterate across all rows? I've been using Python and the csv module so far for the rest of the work, but this part of the problem has me stumped.
Thanks in advance
-John
1. A csv file is NOT a database. A csv file is just rows of text chunks; a proper database (like PostgreSQL or MySQL or SQL Server or SQLite or many others) gives you proper data types and table joins and indexes and row iteration and proper handling of multiple matches and many other things which you really don't want to rewrite from scratch.
2. How is it supposed to know that Address("100-199")==Address("Main Street")? You will have to come up with some sort of knowledge base which transforms each bit of text into a canonical address or address range which you can then compare; see "Where is a good Address Parser", but be aware that it deals with singular addresses (not address ranges).
Edit:
Thanks to Sven; if you were using a real database, you could do something like
SELECT
User.firstname, User.lastname, User.account, Order.placed, Order.fulfilled
FROM
User
INNER JOIN Order ON
User.streetnumber=Order.streetnumber
AND User.streetname=Order.streetname
if streetnumber and streetname are exact matches; otherwise you still need to consider point #2 above.
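As a rough sketch of the exact-match case in Python, the CSV rows can be loaded into an in-memory SQLite database and joined on both columns. The column names (`block`, `street`, `owner`, `incident`) and the sample rows are invented for illustration, and the inline strings stand in for the real files, which would be opened with `open(...)` instead.

```python
import csv
import io
import sqlite3

# Hypothetical stand-ins for the two CSV files.
csv1 = io.StringIO("block,street,owner\n100-199,Main St,Ann\n200-299,Oak Ave,Bob\n")
csv2 = io.StringIO(
    "block,street,incident\n"
    "100-199,Main St,noise\n"
    "100-199,Main St,theft\n"
    "300-399,Elm Rd,fire\n"
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (block TEXT, street TEXT, owner TEXT)")
conn.execute("CREATE TABLE t2 (block TEXT, street TEXT, incident TEXT)")

for source, table in ((csv1, "t1"), (csv2, "t2")):
    reader = csv.reader(source)
    next(reader)  # skip the header row
    conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", reader)

# An index on the join columns avoids scanning all of t1 for every row of t2.
conn.execute("CREATE INDEX idx_t1 ON t1 (block, street)")

# Each row of t2 is extended with the columns of every matching row of t1.
matches = conn.execute("""
SELECT t2.block, t2.street, t2.incident, t1.owner
FROM t2
INNER JOIN t1 ON t1.block = t2.block AND t1.street = t2.street
ORDER BY t2.incident
""").fetchall()
```

This only finds rows whose two columns match character for character; "Main St" versus "Main Street" would still need the normalization described in point 2.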