How to fetch a Postgres table in parallel using Python

I have a table named "order_lines". Since it has almost 7 million rows, it takes me around 30 minutes to pull it and dump it into a CSV. I was hoping I could pull it in parallel with Python to reduce the load time. My end objective is to replicate it in Redshift.
Can someone please suggest methods to pull this table in Python?
Thanks in advance!

You can use the query below, with admin privileges:
COPY (select col1,col2 from your_table) TO 'some_file_location/filename.csv' DELIMITER ',' CSV HEADER;
Or, from the command prompt on the server:
psql -U user -d db_name -c "Copy (Select * From your_table) To STDOUT With CSV HEADER DELIMITER ',';" > filename_data.csv
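If you specifically want to pull the table in parallel from Python, one option is to run several COPY statements at once, each over a different primary-key range, and write each range to its own file. Below is a minimal sketch with psycopg2 and ProcessPoolExecutor; the connection string, the integer key column id, and the range boundaries are assumptions you would adapt to order_lines.

# A rough sketch: dump non-overlapping id ranges of order_lines in parallel
# worker processes, one CSV file per range.
import concurrent.futures
import psycopg2

DSN = "dbname=mydb user=me host=localhost"   # assumed connection string

def dump_range(job):
    lo, hi, path = job
    sql = (f"COPY (SELECT * FROM order_lines WHERE id >= {lo} AND id < {hi}) "
           "TO STDOUT WITH CSV")
    conn = psycopg2.connect(DSN)
    try:
        with conn.cursor() as cur, open(path, "w") as f:
            cur.copy_expert(sql, f)          # stream the range straight to disk
    finally:
        conn.close()
    return path

if __name__ == "__main__":
    # Example boundaries; size them from min(id)/max(id) of order_lines.
    bounds = [(1, 2_000_000), (2_000_000, 4_000_000),
              (4_000_000, 6_000_000), (6_000_000, 8_000_000)]
    jobs = [(lo, hi, f"order_lines_{i}.csv") for i, (lo, hi) in enumerate(bounds)]
    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as pool:
        for path in pool.map(dump_range, jobs):
            print("wrote", path)

Splitting the export into several files also suits the Redshift side, since Redshift's COPY loads multiple files from S3 in parallel.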

Related

Execute the Snowflake SQLs from a file and store output in different CSVs

I am very new to Python and the Python Snowflake connector and need help understanding how this can be done.
Scenario: I have a list of SQL queries that I have put in a text file, e.g. 5 SQL queries. I want to execute them one by one in Snowflake and store the result of each SQL in a different output file.
Please let me know your ideas and thoughts.
Thanks in advance.
You can try using the CLI tool SnowSQL for this.
This is the SQL file my_example.sql that you need to run:
> cat my_example.sql
select current_database() as database,
current_timestamp() as timestamp,
current_warehouse() as warehouse;
Configure SnowSQL to connect to your account; the SnowSQL documentation describes how to do this.
> snowsql
* SnowSQL * v1.2.21
Type SQL statements or !help
USER1#COMPUTE_WH#SF_DEMO_DB.SF_DEMO_SCHEMA>
USER1#COMPUTE_WH#SF_DEMO_DB.SF_DEMO_SCHEMA>!set output_format=csv
USER1#COMPUTE_WH#SF_DEMO_DB.SF_DEMO_SCHEMA>!spool my_example.log
USER1#COMPUTE_WH#SF_DEMO_DB.SF_DEMO_SCHEMA>!source my_example.sql
"DATABASE","TIMESTAMP","WAREHOUSE"
"SF_DEMO_DB","2022-08-10 03:37:12.236 -0700","COMPUTE_WH"
1 Row(s) produced. Time Elapsed: 0.116s
USER1#COMPUTE_WH#SF_DEMO_DB.SF_DEMO_SCHEMA>!spool off
USER1#COMPUTE_WH#SF_DEMO_DB.SF_DEMO_SCHEMA>!exit
Goodbye!
The spool file my_example.log is the CSV output of your SQL file my_example.sql
> cat my_example.log
"DATABASE","TIMESTAMP","WAREHOUSE"
"SF_DEMO_DB","2022-08-10 03:37:12.236 -0700","COMPUTE_WH"

Is there a SQLite equivalent to COPY from PostgreSQL?

I have local tab delimited raw data files "...\publisher.txt" and "...\field.txt" that I would like to load into a local SQLite database. The corresponding tables are already defined in the local database. I am accessing the database through the python-sql library in an ipython notebook. Is there a simple way to load these text files into the database?
The CLI function 'readfile' doesn't seem to work in the Python context:
INSERT INTO Pub(k,p) VALUES('pubFile.txt',readfile('pubFile.txt'));
Throws error:
(sqlite3.OperationalError) no such function: readfile
[SQL: INSERT INTO Pub(k,p) VALUES('pubFile.txt',readfile('pubFile.txt'));]
(Background on this error at: http://sqlalche.me/e/e3q8)
No, there is no such command in SQLite (any longer). The feature was removed and has been replaced by the SQLite CLI's .import command.
See the official documentation:
The COPY command is available in SQLite version 2.8 and earlier. The COPY command has been removed from SQLite version 3.0 due to complications in trying to support it in a mixed UTF-8/16 environment. In version 3.0, the command-line shell contains a new command .import that can be used as a substitute for COPY.
The COPY command is an extension used to load large amounts of data into a table. It is modeled after a similar command found in PostgreSQL. In fact, the SQLite COPY command is specifically designed to be able to read the output of the PostgreSQL dump utility pg_dump so that data can be easily transferred from PostgreSQL into SQLite.
Sample code to load a text file into an SQLite database via the CLI:
sqlite3 test.db ".import test.txt test_table_name"
Alternatively, you may read the input file into a string and then insert it:
import sqlite3

sql = "INSERT INTO Pub (k, p) VALUES ('pubFile.txt', ?)"
with open("pubFile.txt", "r") as myfile:
    data = myfile.read()

conn = sqlite3.connect("local.db")  # or reuse your existing connection
cur = conn.cursor()
cur.execute(sql, (data,))
conn.commit()
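If the intent is instead to load the rows of the tab-delimited file into an already-defined table (one database row per line, rather than the whole file as a single value), a small sketch with the standard csv module could look like this; the database file name and the two-column target table are assumptions.

# A rough sketch: parse the tab-delimited file and insert one database row
# per line, assuming the target table Pub has exactly two columns (k, p).
import csv
import sqlite3

conn = sqlite3.connect("local.db")              # assumed database file name
with open("pubFile.txt", newline="") as f:
    reader = csv.reader(f, delimiter="\t")      # tab-delimited input
    conn.executemany("INSERT INTO Pub (k, p) VALUES (?, ?)", reader)
conn.commit()
conn.close()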

COPY Postgres table to CSV output, paginated over n files using python

I'm using psycopg2 to export Postgres data to CSV files (not all at once, 100 000 rows at a time). I'm currently using LIMIT OFFSET, but obviously this is slow on a 100M-row database. Is there a faster way to keep track of the offset each iteration?
for i in range(0, 100000000, 100000):
    query = ("COPY (SELECT * FROM users LIMIT %s OFFSET %s) "
             "TO STDOUT DELIMITER ',' CSV HEADER;" % (100000, i))
The code is run in a loop, incrementing i.
Let me suggest a different approach: copy the whole table and split it afterward. Something like:
COPY users TO STDOUT DELIMITER ',' CSV HEADER
And finally, from bash, execute the split command (by the way, you could call it from inside your Python script):
split -l 100000 --numeric-suffixes users.csv users_chunks_
It'll generate a series of files called users_chunks_00, users_chunks_01, etc.
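If you'd rather keep the chunking in Python, another option is to drop OFFSET entirely and chunk by the primary key, which an index can satisfy directly. A rough sketch with psycopg2, assuming users has an integer primary key id and an assumed connection string; each file holds up to 100 000 rows when the ids have no gaps, fewer otherwise.

# A rough sketch: chunk by id range instead of OFFSET, so each COPY uses the
# primary-key index and no rows are re-scanned between iterations.
import psycopg2

CHUNK = 100_000
conn = psycopg2.connect("dbname=mydb user=me")   # assumed connection string
with conn.cursor() as cur:
    cur.execute("SELECT min(id), max(id) FROM users")
    lo, hi = cur.fetchone()
    start, part = lo, 0
    while start <= hi:
        part += 1
        sql = (f"COPY (SELECT * FROM users WHERE id >= {start} "
               f"AND id < {start + CHUNK}) TO STDOUT WITH CSV")
        with open(f"users_part_{part}.csv", "w") as f:
            cur.copy_expert(sql, f)
        start += CHUNK
conn.close()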

Periodically outputting SQL table to a file

I'm trying to write a Python script that continuously updates a file with the contents of a table in my database.
The table in the database is changing continuously, so I need to periodically update this file as well. I could do a select * query and get all the entries, but what would be great is if I could get the output table with the formatting of .mode column and .headers on applied.
What I've tried is creating a SQL cursor and executing ".output file.txt", but that gives me an SQLite syntax error. I also tried calling os.sys("sqlite3 dbname.db 'select * from table;' > file.txt") from the script, but that doesn't work either ("'module' object is not callable").
Is there a way for me to get the nicely formatted SQLite table?
import subprocess

# Write the query output as CSV to file.txt using the sqlite3 CLI.
with open('file.txt', 'w') as f:
    subprocess.call(['sqlite3', 'dbname.db', '-csv', 'select * from table'], stdout=f)
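To get the .mode column / .headers on layout you asked about, and to refresh the file periodically, the same idea can be wrapped in a loop using the CLI's -header and -column switches; the 60-second interval and file names are assumptions.

# A rough sketch: rewrite file.txt every 60 seconds with the CLI's column
# layout (-header -column matches .headers on / .mode column).
import subprocess
import time

while True:
    with open('file.txt', 'w') as f:
        subprocess.call(
            ['sqlite3', 'dbname.db', '-header', '-column', 'select * from table'],
            stdout=f)
    time.sleep(60)                               # assumed refresh interval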

how to import file csv without using bulk insert query?

I have tried importing a CSV file using BULK INSERT, but it failed. Is there another way, in a query, to import a CSV file without using BULK INSERT?
So far this is my query, but it uses BULK INSERT:
bulk insert [dbo].[TEMP]
from 'C:\Inetpub\vhosts\topimerah.org\httpdocs\SNASPV0374280960.txt'
with (firstrow = 2, fieldterminator = '~', rowterminator = ' ');
My answer is to stick with BULK INSERT:
1. Make sure you have the bulkadmin permission on the server.
2. Use a SQL authentication login for the BULK INSERT operation (for me, Windows authentication logins have mostly not worked).
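If BULK INSERT stays blocked by permissions, one alternative sketch is to read the file on the client with Python and insert the rows through pyodbc; the connection string, the column list of [dbo].[TEMP], and the three placeholders are assumptions to replace with your real schema.

# A rough sketch: read the '~'-delimited file on the client and insert the
# rows with pyodbc; the column list of [dbo].[TEMP] is a placeholder.
import csv
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;"
    "DATABASE=mydb;UID=user;PWD=secret")         # assumed connection details
cursor = conn.cursor()
cursor.fast_executemany = True                   # batch the inserts

with open(r"C:\Inetpub\vhosts\topimerah.org\httpdocs\SNASPV0374280960.txt",
          newline="") as f:
    reader = csv.reader(f, delimiter="~")        # '~' field terminator, as above
    next(reader)                                 # skip the header (firstrow = 2)
    rows = list(reader)

# Replace col1, col2, col3 with the real columns of [dbo].[TEMP].
cursor.executemany("INSERT INTO dbo.TEMP (col1, col2, col3) VALUES (?, ?, ?)", rows)
conn.commit()
conn.close()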
