Fetching a long text column - python

I've got a problem with fetching a result from a MySQL database with Python 3.6.
The long text column holds a NumPy array that was converted to a string.
When I check the database and look into the column img_array, everything is fine: all the data is written.
Next I try to retrieve the text column like this:
con = .. # SQL connection which is successful and working fine
cur = con.cursor() # Getting the cursor
cur.execute('SELECT img_array FROM table WHERE id = 1')
result = cur.fetchone()[0] # Result is a tuple with the array at 0
print(result)
[136 90 87 ... 66 96 125]
The problem here is that the ... appears literally in the result, as if it were part of the string, so I'm missing all the values in between.
When I try the following it works just fine:
cur.execute('SELECT img_array FROM table LIMIT 1')
result = cur.fetchone()[0] # this gives me the entire string in the DB
print(result)
# The entire array will be printed here without missing values
I really don't know how to fetch this column with a WHERE clause via Python.
Any ideas?
EDIT: OK, my last edit was wrong... I've checked it again and the buffered cursor doesn't change anything. I'm confused because it seemed to work.
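For reference, a minimal sketch of how the array could be stored and fetched without losing values, assuming the ... comes from converting the array to a string (str() summarizes arrays longer than 1000 elements with ...) and assuming the column can hold binary data (BLOB/LONGBLOB). my_table, the credentials and the pymysql driver are placeholders, not the original setup:

import numpy as np
import pymysql  # assumption: any DB-API MySQL driver with the %s paramstyle works the same way

con = pymysql.connect(host='XXX', user='XXX', password='XXX', db='XXX')
cur = con.cursor()

arr = np.arange(2000, dtype=np.uint8)              # stand-in for the image array
cur.execute("INSERT INTO my_table (id, img_array) VALUES (%s, %s)",
            (1, arr.tobytes()))                    # raw bytes: lossless, no "..." truncation
con.commit()

cur.execute("SELECT img_array FROM my_table WHERE id = %s", (1,))
restored = np.frombuffer(cur.fetchone()[0], dtype=np.uint8)  # rebuild the array on fetch
con.close()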

Related

extra data after last expected column psycopg2

I'm losing my sanity over this error. I've uploaded dozens of tables before, but this one keeps giving me the error in the title.
I have a dataframe with 11 columns and an SQL table already set up for it. All column names match.
df_rates = df_rates.replace('\t', '', regex=True)
data_to_upload_output = io.StringIO()  # Create object to store csv output in
df_rates.to_csv(data_to_upload_output, sep='\t', header=False, index=False, date_format='%Y-%m-%d')  # Write df_rates to csv
data_to_upload_output.seek(0)  # Return to start of file
conn = psycopg2.connect(host='xxxxx-xxx-x-x',
                        dbname='xxxx',
                        user=uid,
                        password=pwd,
                        port=xxxx,
                        options="-c search_path=dbo,development")
db_table = 'sandbox.gm_dt_input_dist_rates'
with conn:
    with conn.cursor() as cur:
        cur.copy_from(data_to_upload_output, db_table, null='', columns=df_rates.columns)  # null values become '', columns should be lowercase, at least for PostgreSQL
        conn.commit()
conn.close()
The error continues saying:
CONTEXT: COPY gm_dt_input_dist_rates, line 43:
"IE00B44CGS96 USD 0.9088 0.9088 10323906 97.2815 97.2815 2022-05-12 2022-05-11 cfsad 2022-05-20"
Which makes me think the "\t" hasn't been recognized. But this same code works perfectly for all the other tables I'm uploading. I've checked posts with the same error, but I couldn't find a way to apply their solutions to what I'm experiencing.
Thanks for your help!
It is much appreciated, have a great weekend!
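Not a confirmed diagnosis, but a minimal sketch of the same upload done with copy_expert and an explicit COPY statement; it reuses df_rates, db_table, data_to_upload_output and conn from the code above, and spelling out the column list tends to make "extra data after last expected column" easier to pin down to a specific field:

# Sketch only: FORMAT text expects tab-delimited input (matching sep='\t' above)
# and NULL '' matches null='' from copy_from. Column names are lowercased because
# PostgreSQL folds unquoted identifiers to lowercase.
cols = ", ".join(c.lower() for c in df_rates.columns)
copy_sql = f"COPY {db_table} ({cols}) FROM STDIN WITH (FORMAT text, NULL '')"

data_to_upload_output.seek(0)  # rewind the buffer before copying
with conn:
    with conn.cursor() as cur:
        cur.copy_expert(copy_sql, data_to_upload_output)
conn.close()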

Python Iterate over rows and update SQL Server table

My Python code works up to this point and returns several rows. I need to take each row and process it in a loop in Python. The first row works fine and does its trick, but the second row never runs. Clearly I am not looping correctly: I believe I am not iterating over each row in the results. Here is the code:
for row in results:
    print(row[0])

F:\FinancialResearch\SEC\myEdgar\sec-edgar-filings\A\10-K\0000014693-21-000091\full-submission.txt
F:\FinancialResearch\SEC\myEdgar\sec-edgar-filings\A\10-K\0000894189-21-001890\full-submission.txt
F:\FinancialResearch\SEC\myEdgar\sec-edgar-filings\A\10-K\0000894189-21-001895\full-submission.txt
for row in results:
    with open(row[0], 'r') as f:
        contents = f.read()
    bill = row
    for x in range(0, 3):
        VanHalen = 'Hello'
        cnxn1 = pyodbc.connect('Driver={SQL Server};'
                               'Server=XXX;'
                               'Database=00010KData;'
                               'Trusted_Connection=yes;')
        curs1 = cnxn1.cursor()
        curs1.execute('''
            Update EdgarComments SET Comments7 = ? WHERE FullPath = ?
            ''', (VanHalen, bill))
        curs1.commit()
        curs1.close()
        cnxn1.close()
        print(x)
Error: ('HY004', '[HY004] [Microsoft][ODBC SQL Server Driver]Invalid SQL data type (0) (SQLBindParameter)')
The bill variable that you are storing in the FullPath column contains the whole row - is this what you want?
I would normally expect the file path (row[0]) to be stored, given the column name FullPath.
Since this is an invalid-type error on the binding parameters, you can always check the type of the bill variable before inserting and make sure it is a type the SQL driver accepts - usually you want to convert unusual types to strings before using them as binding parameters.
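A minimal sketch of that suggestion, assuming the file path in row[0] is what belongs in FullPath (server and database names are the question's placeholders, and `results` is the row set from the question):

import pyodbc

cnxn1 = pyodbc.connect('Driver={SQL Server};'
                       'Server=XXX;'
                       'Database=00010KData;'
                       'Trusted_Connection=yes;')
curs1 = cnxn1.cursor()

for row in results:
    full_path = str(row[0])              # bind the path, not the whole row
    with open(full_path, 'r') as f:
        contents = f.read()
    comment = 'Hello'                    # stand-in for the real comment text
    curs1.execute(
        "UPDATE EdgarComments SET Comments7 = ? WHERE FullPath = ?",
        (str(comment), full_path))       # plain str values keep SQLBindParameter happy

cnxn1.commit()
curs1.close()
cnxn1.close()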

can I get only the updated data from database instead of all the data

I am using sqlite3 with Python 3 and I want to read only the newly added data from the database. What I mean by that: the database already has 2 rows of data and I add 2 more rows; how can I read only those new rows instead of all the rows?
Note: indexing may not help here because the number of rows being added will change.
def read_all():
    cur = con.cursor()
    cur.execute("SELECT * FROM CVT")
    rows = cur.fetchall()
    # print(rows[-1])
    assert cur.rowcount == len(rows)
    lastrowids = range(cur.lastrowid - cur.rowcount + 1, cur.lastrowid + 1)
    print(lastrowids)
If you insert rows one by one, like this:
cursor.execute('INSERT INTO foo (xxxx) VALUES (xxxx)')
you can then retrieve the last inserted row's id:
last_inserted_id = cursor.lastrowid
BUT it will work ONLY if you insert a single row with execute(). It will return None if you use it after an executemany().
If you are trying to get multiple ids of rows that were inserted at the same time, see that answer, which may help you.
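If the goal is reading the newly added rows themselves rather than their ids, one option is to remember the highest rowid already read and select past it. A minimal sketch, assuming rows are only appended (never deleted) and the CVT table keeps its implicit rowid; the database file name is a placeholder:

import sqlite3

con = sqlite3.connect("example.db")
last_seen_rowid = 0                    # highest rowid we have already read

def read_new_rows():
    """Return only the rows added since the previous call."""
    global last_seen_rowid
    cur = con.cursor()
    cur.execute("SELECT rowid, * FROM CVT WHERE rowid > ?", (last_seen_rowid,))
    rows = cur.fetchall()
    if rows:
        last_seen_rowid = rows[-1][0]  # remember the newest rowid seen so far
    return rows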

How to update row with blob data in sqlite3?

I'm trying to update an existing row in my database with blob data, and I cannot understand how to do this. Is only insert available? Insert works well:
b = requests.get(url=url)
img = b.content
con = sqlite3.connect(db)
cur = con.cursor()
cur.execute('replace INTO byte(b) where n = 1 VALUES (?)', [img])
con.commit()
con.close()
This gives me a new row with blob data, but I need to update an existing one, and if I try some update code it gives me errors:
cur.execute('update byte set b = {}'.format(img))
Well, I found a way: first convert the bytes to a hex string and update the DB with it, then select the hex and convert it back to bytes. So the question may be closed.
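For the record, a parameterized UPDATE should also work without going through a hex string, since sqlite3 binds a bytes value straight into a BLOB column. A minimal sketch reusing the table and column names from the question; the image bytes and database file are placeholders:

import sqlite3

img = b"\x89PNG..."                    # stand-in for the downloaded image bytes (b.content)
con = sqlite3.connect("my.db")
cur = con.cursor()
# The ? placeholder accepts bytes directly, so no hex conversion is needed.
cur.execute("UPDATE byte SET b = ? WHERE n = ?", (img, 1))
con.commit()
con.close()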

Retrieve huge data table from MySQL within Jupyter notebook

I'm currently trying to fetch 100 million rows from a MySQL table using a Jupyter Notebook. I have made some attempts with pymysql.cursors to open a MySQL connection. I tried to use batches in order to speed up the selection process, because selecting all the rows at once is too heavy an operation. Here is my test:
import pymysql.cursors

# Connect to the database
connection = pymysql.connect(host='XXX',
                             user='XXX',
                             password='XXX',
                             db='XXX',
                             charset='utf8mb4',
                             cursorclass=pymysql.cursors.DictCursor)
try:
    with connection.cursor() as cursor:
        print(cursor.execute("SELECT count(*) FROM `table`"))
        count = cursor.fetchone()[0]
        batch_size = 50
        for offset in xrange(0, count, batch_size):
            cursor.execute(
                "SELECT * FROM `table` LIMIT %s OFFSET %s",
                (batch_size, offset))
            for row in cursor:
                print(row)
finally:
    connection.close()
For now the test should just print out each row (not very useful in itself), but in my opinion the best solution would be to store everything in a pandas dataframe.
Unfortunately, when I run it I get this error:
KeyError                                  Traceback (most recent call last)
in ()
        print(cursor.execute("SELECT count(*) FROM `table`"))
---->   count = cursor.fetchone()[0]
        batch_size = 50

KeyError: 0
Does anyone have an idea of what the problem might be?
Maybe using chunksize would be a better idea?
Thanks in advance!
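A likely explanation for the KeyError (an assumption, not part of the original post): with DictCursor every row comes back as a dict keyed by column name, so fetchone()[0] fails. The count can be read by name instead, for example:

cursor.execute("SELECT COUNT(*) AS cnt FROM `table`")
count = cursor.fetchone()["cnt"]   # DictCursor rows are dicts, not tuples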
UPDATE
I have rewritten the code without batch_size, storing the query result in a pandas dataframe. It finally seems to run, but of course the execution time is pretty much 'infinite', given that the volume of data is 100 million rows:
connection = pymysql.connect(user='XXX', password='XXX', database='XXX', host='XXX')
try:
    with connection.cursor() as cursor:
        query = "SELECT * FROM `table`"
        cursor.execute(query)
        cursor.fetchall()
        df = pd.read_sql(query, connection)
finally:
    connection.close()
What would be a correct approach to speed up the process? Maybe passing chunksize=250 as a parameter?
Also, if I print the type of df, the output says it is a generator, so it's not actually a dataframe.
If I print df, the output is:
<generator object _query_iterator at 0x11358be10>
How can I get the data in dataframe format? If I print the result of fetchall() I can see the correct output of the query, so up to that point everything works as expected.
If I try to call DataFrame() with the result of fetchall() I get:
ValueError: DataFrame constructor not properly called!
Another UPDATE
I was able to output the result by iterating pd.read_sql like this:
chunks = []
for chunk in pd.read_sql(query, connection, chunksize=250):
    chunks.append(chunk)

result = pd.concat(chunks, ignore_index=True)
print(type(result))
# print(result)
And finally I got just one dataframe called result.
Now the questions are:
Is it possible to query all the data without a LIMIT?
What exactly influences the process benchmark?
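For reference, a minimal sketch of the chunked approach through a SQLAlchemy engine (credentials are placeholders and 100_000 is just an example chunk size, not a recommendation from the original post). It needs no manual LIMIT/OFFSET loop, and a larger chunksize than 250 cuts down the number of round trips; note that concatenating every chunk still needs enough memory for all the rows, so aggregating per chunk is usually the better choice:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://user:password@host/dbname")

chunks = []
for chunk in pd.read_sql("SELECT * FROM `table`", engine, chunksize=100_000):
    chunks.append(chunk)        # or process/aggregate each chunk here instead of keeping it

result = pd.concat(chunks, ignore_index=True)
print(result.shape)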
