As part of my Python program, I have created a method which runs SQL queries on a Db2 server. Here it is:
import ibm_db as db   # low-level ibm_db driver (exec_immediate / fetch_assoc)
import pandas as pd

def run_query(c, query, return_results=False):
    # Execute the query on the given connection
    stmt = db.exec_immediate(c, query)
    if return_results:
        # Build a dict of lower-case column names -> list of values,
        # then hand it to pandas
        df = {}
        row = db.fetch_assoc(stmt)
        for key in [key.lower() for key in row.keys()]:
            df[key] = []
        while row:
            for key in [key.lower() for key in row.keys()]:
                df[key].append(row[key.upper()])
            row = db.fetch_assoc(stmt)
        return pd.DataFrame(df)
It uses the ibm_db API library, and its goal is to run an SQL query. If results are wanted, it converts the result set into a pandas DataFrame for use in the program. When I run the program to print the returned DataFrame with print(run_query(conn, "SELECT * FROM ppt_products;", True)), it does not print anything but instead exits with this error: Process finished with exit code 136 (interrupted by signal 8: SIGFPE) (by the way, I am using PyCharm Professional). However, when I debug the program with the pydev debugger in PyCharm, the program runs smoothly and prints the desired output, which should look like:
         id brand model           url
0      2392   sdf  rtsg  asdfasdfasdf
1  23452345   sdf  rtsg  asdfasdfasdf
2      6245   sdf  rtsg  asdfasdfasdf
3      8467   sdf  rtsg  asdfasdfasdf
I tried debugging the floating-point exception but could only find solutions for Python 2 involving a module called fpectl, which can be used to turn floating-point exceptions on and off.
I would appreciate any assistance.
The error was only occurring in PyCharm. When I ran the script from the command line, the error did not occur, which leads me to believe the problem may lie in the JetBrains mechanism for running scripts. Thank you data_henrik for the suggestion to use pandas.read_sql; it simplified getting the result sets from the SQL queries.
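For reference, a minimal sketch of what that read_sql approach looks like (the connection string values are placeholders; ibm_db_dbi is the DB-API wrapper that ships alongside ibm_db):

import ibm_db_dbi
import pandas as pd

# ibm_db_dbi exposes a DB-API 2.0 connection, which is what pandas expects
conn = ibm_db_dbi.connect(
    "DATABASE=mydb;HOSTNAME=myhost;PORT=50000;PROTOCOL=TCPIP;UID=myuser;PWD=mypassword;",
    "", ""
)
df = pd.read_sql("SELECT * FROM ppt_products", conn)
print(df)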
I am new to SQLAlchemy and need a way to run a script whenever a new entry is added to a table. I am currently using the following method to get the task done, but I am sure there has to be a more efficient way.
I am using Python 2 for my project and MS SQL Server as the database.
Suppose my table is carData and I add a new row for car details from the website; the new car data is added to carData. My code works as follows:
class CarData:
    <fields for table class>

with session_scope() as session:
    car_data = session.query(CarData)
    reference_df = pd.read_sql_query(car_data.statement, car_data.session.bind)

while True:
    with session_scope() as session:
        new_df = pd.read_sql_query(car_data.statement, car_data.session.bind)
        if len(new_df) > len(reference_df):
            print "New Car details added"
            <code to get the id of new row added>
            <run script>
            reference_df = new_df
    sleep(10)
The above is of course a much simplified version of the code that I am using, but the idea is to have a reference point and then keep checking every 10 seconds whether there is a new entry. However, even when using session_scope() I have seen connection issues after a few days, as this script is supposed to run indefinitely.
Is there a better way to know that a new row has been added, get the id of the new row, and run the required script?
I believe the error you've described is a connectivity issue with the database, e.g. a temporary network problem:
OperationalError: TCP Provider: Error code 0x68
So what you need to do is cater for this with error handling:
try:
    new_df = pd.read_sql_query(car_data.statement, car_data.session.bind)
except:
    print("Problem with query, will try again shortly")
I am trying to pass custom input to my Lambda function (Python 3.7 runtime) in JSON format from the rule set up in CloudWatch.
However, I am facing difficulty accessing elements of the input correctly.
Here's what the CW rule looks like.
Here is what the lambda function is doing.
import sqlalchemy  # Package for accessing SQL databases via Python
import psycopg2
import pandas as pd
from datetime import date

def lambda_handler(event, context):
    today = date.today()
    engine = sqlalchemy.create_engine("postgresql://some_user:userpassword@som_host/some_db")
    con = engine.connect()
    dest_table = "dest_table"
    print(event)
    s = {'upload_date': today, 'data': 'Daily Ingestion Data', 'status': event["data"]}  # Error points here
    ingestion = pd.DataFrame(data=[s])
    ingestion.to_sql(dest_table, con, schema="some_schema", if_exists="append", index=False, method="multi")
When I test the function with the default test event values, the print(event) statement prints the default test values ("key1": "value1"), but the call that adds data to the database, ingestion.to_sql(), still gets the payload from the input ("Testing Input Data") inserted into the database successfully.
However, the Lambda function still shows an error while running at event["data"]: a KeyError.
1) Am I accessing the constant JSON input the right way?
2) If not, why is the data still being ingested as intended despite the error being thrown at that line of code?
3) The data is ingested when the function is triggered as per the schedule expression, but when I test the event it shows a KeyError. Is it the test event, which is not similar to the actual input, that is causing this error?
There is a lot of documentation and many articles on how to pass input, but I could not find anything that shows how to access that input inside the function. I have been stuck at this point for a while, and it frustrates me that this does not seem to be documented anywhere.
I would really appreciate it if someone could give me some clarity on this process.
Thanks in advance
Edit:
Image of the monitoring Logs:
[ERROR] KeyError: 'data' Traceback (most recent call last): File "/var/task/test.py"
I am writing this answer based on the comments.
The syntax that was originally written is valid, and I am able to access the data correctly. What was needed was to adjust the function's timeout, as it was constantly hitting that threshold, along with some changes to the iteration.
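For anyone hitting the same KeyError: the Constant (JSON text) configured on the CloudWatch rule is delivered as the event dict itself, so the default console test event ({"key1": "value1"}) simply does not contain a "data" key. A hedged sketch of a defensive access pattern (the fallback behaviour is my own choice, not from the answer):

def lambda_handler(event, context):
    # The rule's constant input, e.g. {"data": "Daily Ingestion Data"},
    # arrives here as the event dict.
    status = event.get("data")
    if status is None:
        # Happens when the function is invoked with a test event that does not
        # mirror the rule's input, e.g. the default {"key1": "value1"}.
        print("No 'data' key in event: %s" % event)
        return
    print(status)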
For some time I have been getting the following error (warning?) when working with Jupyter Notebook:
ERROR! Session/line number was not unique in database. History logging moved to new session <XXXX>
where <XXXX> is a number, e.g. 9149.
As the same error has been reported for Spyder (Spyder's Warning: "Session/line number not unique in database"), my guess is that there is some problem with the IPython kernel's history logging.
The question is: could there be any relation between running my code and the error?
Is it likely that the error is caused by my code? I touch the IPython API as follows:
import IPython

def beep():
    IPython.display.display(
        IPython.display.Audio(url="http://www.w3schools.com/html/horse.ogg", autoplay=True)
    )

def play_sound(self, etype, value, tb, tb_offset=None):
    self.showtraceback((etype, value, tb), tb_offset=tb_offset)
    beep()

get_ipython().set_custom_exc((Exception,), play_sound)
I use the beep() function in my code. I also work with large data, which results in MemoryError exceptions.
And more importantly, may the error affect my code's behaviour (given that I do not try to access the logs)?
[EDIT]
It seems the issue is different from Spyder's Warning: "Session/line number not unique in database", as I am able to reproduce it with Jupyter Notebook but not with Spyder.
This is only a partial answer - the bounty is still eligible.
The error does depend on my code - at least when there is a SyntaxError.
I have reproduced it with the three following cells:
In [31]: print(1)
1

In [31]: print 2
  File "<ipython-input-32-9d8034018fb9>", line 1
    print 2
          ^
SyntaxError: Missing parentheses in call to 'print'

In [32]: print(2)
2
ERROR! Session/line number was not unique in database. History logging moved to new session 7
As you can see, the line counter was not increased for the second cell (the one with the syntax error).
Inspired by #zwer's comment, I have queried the $HOME/.ipython/profile_default/history.sqlite database:
sqlite> select session, line, source from history where line > 30;
6|31|print(1)
6|32|print 2
7|32|print(2)
It is clear that the line counter for the second cell was increased in the database, but not in the notebook.
Thus, when the third cell was executed successfully, the notebook attempted to store its source with the same line number, which violated the PRIMARY KEY constraint:
sqlite> .schema history
CREATE TABLE history
(session integer, line integer, source text, source_raw text,
PRIMARY KEY (session, line));
As a result, a failsafe was triggered which issued the warning and created a new session.
I guess the issue does not affect my code's behaviour; however, I lack a credible source for that statement.
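To make the failure mode concrete, here is a small self-contained reproduction of the constraint violation against an in-memory SQLite database with the same schema (my own illustration, not IPython's actual code path):

import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE history "
    "(session integer, line integer, source text, source_raw text, "
    "PRIMARY KEY (session, line))"
)
con.execute("INSERT INTO history VALUES (6, 32, 'print 2', 'print 2')")
try:
    # The same (session, line) pair again: this is what the third cell effectively attempts
    con.execute("INSERT INTO history VALUES (6, 32, 'print(2)', 'print(2)')")
except sqlite3.IntegrityError as exc:
    # IPython's history manager reacts to this by moving to a new session instead of crashing
    print("IntegrityError:", exc)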
I experienced the same error when trying to run some asyncio code in a Jupyter notebook. The gist was like this (it might make sense to those familiar with asyncio):
cell #1
output = loop.run_until_complete(future)
cell #2
print(output)
If I ran the two cells one after the other, I would get the OP's error.
Merged into a single cell like so, it ran cleanly:
cell #1
output = loop.run_until_complete(future)
print(output)
This problem arises in Jupyter Notebook when cells end up with the same line number.
What you can do - if you are in Jupyter Notebook - is simply restart the kernel.
The error will then be resolved.
I need to run an async query using the gcloud Python BigQuery library. Furthermore, I need to run the query using the beta Standard SQL instead of the default Legacy SQL. According to the documentation here, here, and here, I believe I should be able to just set the use_legacy_sql property on the job to False. However, this still results in an error due to the query being processed against Legacy SQL. How do I successfully use this property to indicate which SQL dialect I want the query to be processed with?
Example Python code below:
stdz_table = stdz_dataset.table('standardized_table1')
job_name = 'asyncjob-test'
query = """
SELECT TIMESTAMP('2016-03-30 10:32:15', 'America/Chicago') AS special_date
FROM my_dataset.my_table_20160331;
"""
stdz_job = bq_client.run_async_query(job_name, query)
stdz_job.use_legacy_sql = False
stdz_job.allow_large_results = True
stdz_job.create_disposition = 'CREATE_IF_NEEDED'
stdz_job.destination = stdz_table
stdz_job.write_disposition = 'WRITE_TRUNCATE'
stdz_job.begin()

# wait for job to finish
while True:
    stdz_job.reload()
    if stdz_job.state == 'DONE':
        # print use_legacy_sql value, and any errors (will be None if job executed successfully)
        print stdz_job.use_legacy_sql
        print json.dumps(stdz_job.errors)
        break
    time.sleep(1)
This outputs:
False
[{"reason": "invalidQuery", "message": "2.20 - 2.64: Bad number of arguments. Expected 1 arguments.", "location": "query"}]
which is the same error you'd get if you ran it in the BigQuery console using Legacy SQL. When I copy-paste the query into the BigQuery console and run it using Standard SQL, it executes fine. Note: the error location (2.20 - 2.64) might not be exactly correct for the query above, since it is a sample and I have obfuscated some of my personal info in it.
The use_legacy_sql property did not exist as of version 0.17.0, so you would have needed to check out the current master branch. However, it does now exist as of release 0.18.0, so after upgrading gcloud-python via pip you should be good to go.
I am trying to run a very simple Python script via Hive and Hadoop.
This is my script:
#!/usr/bin/env python
import sys

for line in sys.stdin:
    line = line.strip()
    nums = line.split()
    i = nums[0]
    print i
And I want to run it on the following table:
hive> select * from test;
OK
1 3
2 2
3 1
Time taken: 0.071 seconds
hive> desc test;
OK
col1 int
col2 string
Time taken: 0.215 seconds
I am running:
hive> select transform (col1, col2) using './proba.py' from test;
But I always get something like:
...
2011-11-18 12:23:32,646 Stage-1 map = 0%, reduce = 0%
2011-11-18 12:23:58,792 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201110270917_20215 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
I have tried many different modifications of this procedure, but I constantly fail. :(
Am I doing something wrong, or is there a problem with my hive/hadoop installation?
A few things I'd check for if I were debugging this:
1) Is the Python file set to be executable (chmod +x file.py)?
2) Make sure the Python file is in the same place on all machines. Probably better - put the file in HDFS; then you can use "using 'hdfs://path/to/file.py'" instead of a local path.
3) Take a look at your job on the Hadoop dashboard (http://master-node:9100); if you click on a failed task it will give you the actual Java error and stack trace so you can see what actually went wrong with the execution.
4) Make sure Python is installed on all the slave nodes! (I always overlook this one.)
Hope that helps...
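To act on points 1 and 4 quickly, it can also help to run the transform script locally with the same tab-separated rows Hive's TRANSFORM would pipe to it, so errors in proba.py itself can be ruled out before re-running the Hive job. A rough sketch (the interpreter name and the path to proba.py are assumptions):

import subprocess

sample_rows = "1\t3\n2\t2\n3\t1\n"    # mirrors the two columns of the test table
result = subprocess.run(
    ["python2", "./proba.py"],         # the script uses a Python 2 print statement
    input=sample_rows,
    capture_output=True,
    text=True,
)
print(result.stdout)   # expected: the first column of every row (1, 2, 3)
print(result.stderr)   # a traceback here would also kill the Hive mapper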
Check hive.log and/or the log from the hadoop job (job_201110270917_20215 in your example) for a more detailed error message.
"FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask" is a generic error that hive returns when something goes wong in the underlying map/reduce task. You need to go to hive log files(located on the HiveServer2 machine) and find the actual exception stack trace.