I have the code below using the Cassandra Python driver for pagination.
I tried both overriding the query's fetch size and setting the session's default_fetch_size, but neither works; the results are always all rows from the table. What am I missing?
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement
# setup
cluster = Cluster(["10.40.10.xxx","10.40.10.xxx","10.40.22.xxx","10.40.22.xxx"])
session = cluster.connect()
session.set_keyspace("ad_realtime")
# session.default_fetch_size = 10
query = "SELECT * from campaign_offset"
statement = SimpleStatement(query, fetch_size=10)
results = session.execute(statement)
for row in results:
    print(row)
Paging in the Python driver doesn't mean getting only part of your query's result. It means fetching that result a few rows at a time.
Your code
for row in results:
    print(row)
is invoking the paging machinery. Basically this creates an iterator which only requests fetch_size rows at a time out of the result set defined by your query.
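If you want to see the paging happen, you can drive it by hand; a minimal sketch using the statement from the question:
statement = SimpleStatement("SELECT * from campaign_offset", fetch_size=10)
results = session.execute(statement)
while True:
    for row in results.current_rows:  # only the ten rows of the current page
        print(row)
    if not results.has_more_pages:
        break
    results.fetch_next_page()  # synchronously fetch the next page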
If you want to restrict the actual result, use LIMIT and WHERE clauses in the query itself.
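For example, to get ten rows total rather than ten rows per page:
statement = SimpleStatement("SELECT * from campaign_offset LIMIT 10")
results = session.execute(statement)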
You can get the rows of the current page by using current_rows, like:
for row in results.current_rows:
    print(row)
The following code might help you fetch the results in a paginated way:
from cassandra.cluster import ResultSet
from cassandra.query import Statement

def fetch_rows(stmt: Statement, fetch_size: int = 10):
    stmt.fetch_size = fetch_size
    rs: ResultSet = session.execute(stmt)
    while True:
        yield from rs.current_rows  # only the rows of the current page
        if not rs.has_more_pages:
            break
        print('-----------------------------------------')  # page boundary
        rs = session.execute(stmt, paging_state=rs.paging_state)
def execute():
    query = "SELECT col1, col2 FROM my_table WHERE some_partition_key='part_val1' AND some_clustering_col='clus_val1'"
    ps = session.prepare(query)
    for row in fetch_rows(ps, 20):
        print(row)
        # Process the row and perform the desired operation
I'm using psycopg2 to connect my Flask application to my Postgres database, and I'm running into a weird situation.
conn = self.engine.connect().execution_options(autocommit=True)
sqlstr = f""" SELECT * FROM name_table WHERE id = 22 """
res = conn.execute(sqlstr)
logger.info(res.fetchone())
conn.close()
self.engine.dispose()
The thing is, res.fetchone() returns None, but there is one row with data when I run the same query directly in pgAdmin 4.
The same fetchone() returns data when more than one row is present in the result.
Can anyone help me with this?
I am currently writing an application in IronPython/WPF that will heavily use SQL SELECT and SET statements in production.
I have successfully connected to the server and can grab the data I want via queries. However, I am having issues parsing the response data. See the code below:
import clr
clr.AddReference('System.Data')
from System.Data import *
query = 'sqlquery'
conn = SqlClient.SqlConnection(--sql connection properties--)
conn.Open()
result = SqlClient.SqlCommand(query, conn)
data = result.ExecuteReader()
while data.Read():
print(data[0])
data.Close()
conn.Close()
The issue I am having is that print(data[0]) is required to print any of the SQL response; a plain print(data) returns:
<System.Data.SqlClient.SqlDataReader object at 0x00000000000002FF [System.Data.SqlClient.SqlDataReader]>
However, print(data[0]) only returns the first column of the row (the row's index in SQL), data[1] the next column, and so on.
I would like to access all the data from the row (where rows can have variable lengths, come from different queries, etc.).
How could I get this to work?
EDIT:
I have successfully extracted all the data from one row of the response with the following code:
for i in range(1, data.FieldCount):
    print(data.GetName(i))
    print(data.GetValue(i))
I just need to determine how to perform this iteration over all returned rows so I can pass them to a dict/DataGrid; see the sketch below.
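A minimal sketch of that iteration, assuming an open SqlDataReader named data as in the question (the rows list is illustrative, and all columns including column 0 are taken here):
# Build one dict per row, keyed by column name
rows = []
while data.Read():
    row = {}
    for i in range(data.FieldCount):
        row[data.GetName(i)] = data.GetValue(i)
    rows.append(row)
data.Close()
Each entry in rows can then be handed to a DataGrid or any other consumer.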
I have found a solution that works for me: utilizing the SqlDataAdapter and DataSets to store/view the data later as required.
def getdataSet(self, query):
    conn = SqlClient.SqlConnection(---sql cred---)
    conn.Open()
    result = SqlClient.SqlCommand(query, conn)
    data = SqlClient.SqlDataAdapter(result)
    ds = DataSet()
    data.Fill(ds)
    conn.Close()
    return ds

def main(self):
    ds = self.getdataSet("query")
    self.maindataGrid.ItemsSource = ds.Tables[0].DefaultView
I'm trying to read data from an Oracle DB.
I have to read in Python the results of a simple SELECT that returns a million rows.
I use the fetchall() function, changing the arraysize property of the cursor.
select_qry = db_functions.read_sql_file('src/data/scripts/03_perimetro_select.sql')
dsn_tns = cx_Oracle.makedsn(ip, port, sid)
con = cx_Oracle.connect(user, pwd, dsn_tns)
start = time.time()
cur = con.cursor()
cur.arraysize = 1000
cur.execute('select * from bigtable where rownum < 10000')
res = cur.fetchall()
# print res # uncomment to display the query results
elapsed = (time.time() - start)
print(elapsed, " seconds")
cur.close()
con.close()
If I remove the condition where rownum < 10000, the Python environment freezes and the fetchall() function never ends.
After some trials I found a limit for this precise select: it works up to 50k rows, but it fails if I select 60k rows.
What is causing this problem? Do I have to find another way to fetch this amount of data, or is the problem the ODBC connection? How can I test it?
Consider running in batches using Oracle's ROWNUM. To combine the batches back into a single object, append to a growing list. The code below assumes the total row count for the table is 1 million. Adjust as needed:
table_row_count = 1000000
batch_size = 10000

# PREPARED STATEMENT (ROWNUM is assigned after the inner ORDER BY)
sql = """SELECT t.* FROM
             (SELECT sub_t.*, ROWNUM AS row_num
              FROM (SELECT * FROM bigtable ORDER BY primary_id) sub_t) t
         WHERE t.row_num BETWEEN :LOWER_BOUND AND :UPPER_BOUND"""

data = []
for lower_bound in range(1, table_row_count + 1, batch_size):
    # BIND PARAMS WITH BOUND LIMITS (ROWNUM starts at 1)
    cur.execute(sql, {'LOWER_BOUND': lower_bound,
                      'UPPER_BOUND': lower_bound + batch_size - 1})
    for row in cur.fetchall():
        data.append(row)
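Note that every batch re-executes the inner query, so the ORDER BY on a unique column like primary_id matters: without a stable ordering, consecutive batches could overlap or skip rows.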
You are probably running out of memory on the computer running cx_Oracle. Don't use fetchall(), because it requires cx_Oracle to hold the entire result set in memory. Use something like this to fetch batches of records:
cursor = connection.cursor()
cursor.execute("select employee_id from employees")
res = cursor.fetchmany(numRows=3)
print(res)
res = cursor.fetchmany(numRows=3)
print(res)
Stick the fetchmany() calls in a loop, process each batch of rows in your app before fetching the next set of rows, and exit the loop when there is no more data.
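A minimal sketch of that loop (the batch size is illustrative, and process() stands in for your own row handling):
cursor = connection.cursor()
cursor.arraysize = 1000  # tune for performance
cursor.execute("select employee_id from employees")
while True:
    rows = cursor.fetchmany(numRows=1000)  # one batch at a time
    if not rows:
        break  # no more data
    for row in rows:
        process(row)  # process() is a hypothetical handler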
Whatever solution you use, tune cursor.arraysize to get the best performance.
The suggestion already given, to repeat the query and select subsets of rows, is also worth considering. If you are using Oracle Database 12c, there is newer (easier) syntax, like SELECT * FROM mytab ORDER BY id OFFSET 5 ROWS FETCH NEXT 5 ROWS ONLY.
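A sketch of paging with that syntax, using bind variables for the bounds (the page size is illustrative):
page_size = 10000
offset = 0
while True:
    cursor.execute(
        "SELECT * FROM mytab ORDER BY id "
        "OFFSET :off ROWS FETCH NEXT :lim ROWS ONLY",
        off=offset, lim=page_size)
    rows = cursor.fetchall()
    if not rows:
        break
    # ... process this page of rows ...
    offset += page_size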
PS cx_Oracle does not use ODBC.
I've been learning Python recently and have learned how to connect to a database and retrieve data using MySQLdb. However, all the examples show how to get multiple rows of data. I want to know how to retrieve only one row.
This is my current method.
cur.execute("SELECT number, name FROM myTable WHERE id='" + id + "'")
results = cur.fetchall()
number = 0
name = ""
for result in results:
number = result['number']
name = result['name']
It seems redundant to do for result in results: since I know there is only going to be one result.
How can I just get one row of data without using the for loop?
.fetchone() to the rescue:
result = cur.fetchone()
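For example, a small sketch building on the question's code (the %s placeholder is the standard MySQLdb way to bind id safely, and dictionary access assumes the same dictionary cursor as in the question):
cur.execute("SELECT number, name FROM myTable WHERE id = %s", (id,))
result = cur.fetchone()
if result is not None:  # fetchone() returns None when no row matched
    number = result['number']
    name = result['name']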
Or use .pop():
if results:
    result = list(results).pop()  # fetchall() may return a tuple, so convert first
    number = result['number']
    name = result['name']
I have data in a pandas DataFrame which I am storing in a SQLite database using Python. When I try to query the tables in it, I am able to get the results but without the column names. Can someone please guide me?
sql_query = """Select date(report_date), insertion_order_id, sum(impressions), sum(clicks), (sum(clicks)+0.0)/sum(impressions)*100 as CTR
from RawDailySummaries
Group By report_date, insertion_order_id
Having report_date like '2014-08-12%' """
cursor.execute(sql_query)
query1 = cursor.fetchall()
for i in query1:
    print(i)
Below is the output that I get:
(u'2014-08-12', 10187, 2024, 8, 0.3952569169960474)
(u'2014-08-12', 12419, 15054, 176, 1.1691244851866613)
What do I need to do to display the results in a tabular form with column names?
In DB-API 2.0 compliant clients, cursor.description is a sequence of 7-item sequences of the form (<name>, <type_code>, <display_size>, <internal_size>, <precision>, <scale>, <null_ok>), one for each column, as described here. Note that description will be None if the execute statement did not produce a result set.
If you want to create a list of the column names, you can use a list comprehension like this: column_names = [i[0] for i in cursor.description], then do with them whatever you'd like.
Alternatively, you can set the row_factory parameter of the connection object to something that provides column names with the results. An example of a dictionary-based row factory for SQLite is found here, and you can see a discussion of the sqlite3.Row type below that.
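A minimal sketch of the sqlite3.Row approach (the database path and the query are illustrative):
import sqlite3

conn = sqlite3.connect("my_database.db")  # illustrative path
conn.row_factory = sqlite3.Row  # rows become addressable by column name
cursor = conn.cursor()
cursor.execute("SELECT insertion_order_id, sum(impressions) AS impressions "
               "FROM RawDailySummaries GROUP BY insertion_order_id")
for row in cursor.fetchall():
    print(row.keys())  # the column names
    print(row["insertion_order_id"], row["impressions"])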
Step 1: Select your engine, like pyodbc, SQLAlchemy, etc.
Step 2: Establish connection
cursor = connection.cursor()
Step 3: Execute SQL statement
cursor.execute("Select * from db.table where condition=1")
Step 4: Extract the headers from cursor.description
headers = [i[0] for i in cursor.description]
print(headers)
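To pair the headers with the rows, you can zip them into dicts (a small sketch):
rows = [dict(zip(headers, row)) for row in cursor.fetchall()]
print(rows[:3])  # the first few rows as name -> value dicts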
Try pandas' .read_sql(). I can't check it right now, but it should be something like:
pd.read_sql(sql_query, connection)
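Spelled out (a sketch; the DataFrame picks up the column names from the query automatically):
import pandas as pd

df = pd.read_sql(sql_query, connection)
print(df.head())  # tabular display with headers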
Here is sample code using cx_Oracle that should do what is expected:
import cx_Oracle

def test_oracle():
    connection = cx_Oracle.connect('user', 'password', 'tns')
    try:
        cursor = connection.cursor()
        cursor.execute('SELECT day_no, area_code, start_date FROM dic.b_td_m_area WHERE rownum < 10')
        # only print the header
        title = [i[0] for i in cursor.description]
        print(title)
        # full column info
        for x in cursor.description:
            print(x)
    finally:
        cursor.close()
        connection.close()

if __name__ == "__main__":
    test_oracle()