AWS Lambda OSError(30, 'Read-only file system') - python

I am trying to run tabula-py on AWS Lambda with the Python 3.7 runtime. The code is quite straightforward:
import tabula

def main(event, context):
    try:
        print(event['Url'])
        df = tabula.read_pdf(event['Url'])
        print(str(df))
        return {
            "StatusCode": 200,
            "ResponseCode": 0,
            "ResponseMessage": str(df)
        }
    except Exception as e:
        print('exception = %r' % e)
        return {
            "ResponseCode": 1,
            "ErrorMessage": str(e)
        }
As you can see, there's just one real line of code, the tabula.read_pdf() call. I am not writing files anywhere, yet I am getting the exception exception = OSError(30, 'Read-only file system')
FYI, the tabula details are available here
Here is what I've already tried, without success:
Verified that the URL is read correctly. Also tried a hard-coded link in the code.
Searched Google, Stack Overflow & Co., but did not find anything that solves this issue.
Removed the __pycache__ directory from the ZIP before uploading it to update the code. Also ensured no OS-specific local directories are in the Lambda deployment package.
Any help will be highly appreciated.

tabula-py writes a local copy of the PDF to the filesystem before parsing it, and on Lambda everything outside /tmp is read-only. You can try a different PDF table-scraping package for now, such as camelot.
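If you want to stay with tabula-py, one workaround sketch (assuming event['Url'] points directly at a PDF): download the file into /tmp yourself and hand tabula a local path, so every write lands in the only writable directory:

import os
import urllib.request

import tabula

def main(event, context):
    # /tmp is the only writable path on Lambda; stage the PDF there.
    local_path = os.path.join('/tmp', 'input.pdf')
    urllib.request.urlretrieve(event['Url'], local_path)
    df = tabula.read_pdf(local_path)
    return {"StatusCode": 200, "ResponseCode": 0, "ResponseMessage": str(df)}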

Related

Could not load the file : Error when calling a dll in python project

We have a Python library (let's call it TestLibrary), packaged as a 'whl' file.
We are consuming that library in another Python project (Main Project, Flask-based).
In TestLibrary, we call a DLL (C#, .NET Standard 2.0) that has a few encryption methods which return encrypted data.
This test library gives an error when those encryption methods are called from TestLibrary.
How can we consume those DLLs in TestLibrary and get the data in the main project?
# Below code is in TestLibrary
import clr

def get_encrypted_data():
    try:
        clr.AddReference('folder/dlls/EncryptionDLL')
        from EncryptionDLL import EncryptionClass
        encryptionObj = EncryptionClass()
        encryptedData = encryptionObj.Encrypt('Data', 'Encryption Key')
        return encryptedData
    except Exception as e:
        return e

# Below code is in the Flask application
# pip install TestLibrary
from TestLibrary import get_encrypted_data

encryptedData = get_encrypted_data()  # Error here, not able to read dll
I have tried it with PythonNet and a LibMono installation. It works fine in a POC created with only that DLL in Python.
When we place it in another library and consume that library, we get the error.
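A likely culprit, though the question doesn't confirm it, is the relative path passed to clr.AddReference: it resolves against the consuming process's working directory, not against TestLibrary's install location, so it only works in the standalone POC. A minimal sketch of resolving the DLL directory relative to the module file instead (the folder layout is assumed from the snippet above):

import os
import sys

import clr  # pythonnet

# Resolve the DLL directory relative to this module, not the caller's CWD.
_DLL_DIR = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'folder', 'dlls')

def get_encrypted_data():
    # Make the assembly discoverable, then reference it by name.
    if _DLL_DIR not in sys.path:
        sys.path.append(_DLL_DIR)
    clr.AddReference('EncryptionDLL')
    from EncryptionDLL import EncryptionClass
    return EncryptionClass().Encrypt('Data', 'Encryption Key')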

Create file system/container if not found

I'm trying to export a CSV to Azure Data Lake Storage, but when the file system/container does not exist, the code breaks. I have also read through the documentation but cannot seem to find anything helpful for this situation.
How do I go about creating a container in Azure Data Lake Storage if the container specified by the user does not exist?
Current Code:
try:
    file_system_client = service_client.get_file_system_client(file_system="testfilesystem")
except Exception:
    file_system_client = service_client.create_file_system(file_system="testfilesystem")
Traceback:
(FilesystemNotFound) The specified filesystem does not exist.
RequestId:XXXX
Time:2021-03-31T13:39:21.8860233Z
The try/except pattern should not be used here, since the Azure Data Lake Gen2 library has a built-in exists() method on file_system_client.
First, make sure you've installed the latest version of the library: azure-storage-file-datalake 12.3.0. If you're not sure which version you're using, run pip show azure-storage-file-datalake to check the current version.
Then you can use the code below:
from azure.storage.filedatalake import DataLakeServiceClient

service_client = DataLakeServiceClient(
    account_url="{}://{}.dfs.core.windows.net".format("https", "xxx"),
    credential="xxx")

# get_file_system_client will not throw an error if the file system
# does not exist, if you're using the latest library (12.3.0).
file_system_client = service_client.get_file_system_client("filesystem333")
print("the file system exists: " + str(file_system_client.exists()))

# Create the file system if it does not exist.
if not file_system_client.exists():
    file_system_client.create_file_system()
    print("the file system is created.")

# other code
I've tested it locally and it works successfully.
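To finish the original CSV export, the same client can then write into the (now guaranteed) file system; a short sketch, assuming df is the pandas DataFrame being exported:

# Write the CSV into the file system checked/created above.
csv_bytes = df.to_csv(index=False).encode('utf-8')
file_client = file_system_client.create_file("export.csv")
file_client.append_data(csv_bytes, offset=0, length=len(csv_bytes))
file_client.flush_data(len(csv_bytes))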

Unable to run another python script in azure function using python

I have created an Event Grid-triggered Azure Function in Python. I have deployed my solution to Azure successfully and the execution works fine, but I have an issue with calling another Python script in the same folder. My code is given below:
import os, json, subprocess
import logging
import azure.functions as func

def main(event: func.EventGridEvent):
    try:
        correctionsMessages = event.get_json()
        for correctionMessage in correctionsMessages:
            strMessage = json.dumps(correctionMessage)
            full_path_to_script = os.path.join(os.path.dirname(os.path.realpath(__file__)), correctionMessage['ScriptName'] + '.py')
            logging.info('Script Path: %s', full_path_to_script)
            logging.info('Parameter: %s', strMessage)
            subprocess.check_call('python ' + full_path_to_script + ' ' + json.dumps(strMessage))
        result = json.dumps({
            'id': event.id,
            'data': event.get_json(),
            'topic': event.topic,
            'subject': event.subject,
            'event_type': event.event_type,
        })
        logging.info('Python EventGrid trigger processed an event: %s', result)
    except Exception as e:
        logging.info('Error: %s', e)
The above code gives an error at subprocess.check_call. The error is "Error: [Errno 2] No such file or directory: 'python /home/site/wwwroot/Detections/Script1.py'". Script1.py is in the same folder as __init__.py. When I run this function locally, it works absolutely fine.
In my experience, the error is caused by the subprocess.check_call function not knowing the path of the python executable, not by the Script1.py path.
In your local Azure Functions development environment, the Python path is configured in your local environment variables, so subprocess.check_call can invoke python by searching the paths in the environment. In the cloud, however, there is no python path pre-configured in the environment; only the Azure Functions host knows the real absolute path of Python.
So the solution is to find out the real absolute path of Python and use it instead of python in your code.
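A minimal sketch of that fix: sys.executable always holds the absolute path of the interpreter running the current function, so it can stand in for the bare python command (passing a list instead of one string also avoids shell-parsing surprises):

import subprocess
import sys

# sys.executable is the absolute path of the interpreter running this function.
subprocess.check_call([sys.executable, full_path_to_script, strMessage])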
However, on the Azure Functions Python runtime, I don't think it's a good idea to use subprocess.check_call to spawn a child process to process a given message. The safer and more correct way is to define a function in Script1.py, or directly in __init__.py, and pass the given message to it as a parameter to achieve the same result.
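For illustration, a sketch of that in-process approach; the run function name is hypothetical, and it assumes Script1.py sits next to __init__.py in the function folder:

# Script1.py: expose the work as a plain function.
def run(message):
    # ... process the message ...
    return message

# __init__.py: call it directly instead of spawning a child process.
import json
import azure.functions as func
from .Script1 import run

def main(event: func.EventGridEvent):
    for correctionMessage in event.get_json():
        run(json.dumps(correctionMessage))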

Connect Oracle from AWS Lambda

I have a Lambda function that needs to use pandas, sqlalchemy, and cx_Oracle.
Installing and packaging all these libraries together exceeds the 250MB deployment package limit of AWS Lambda.
I would like to include only the .zip of the Oracle Basic Light Package, then extract and use it at runtime.
What I have tried
My project is structured as follows:
cx_Oracle-7.2.3.dist-info/
dateutil/
numpy/
pandas/
pytz/
six-1.12.0.dist-info/
sqlalchemy/
SQLAlchemy-1.3.8.egg-info/
cx_Oracle.cpython-36m-x86_64-linux-gnu.so
instantclient-basiclite-linux.x64-19.3.0.0.0dbru.zip
main.py
six.py
template.yml
In main.py, I run the following:
import json, traceback, os
import sqlalchemy as sa
import pandas as pd

def main(event, context):
    try:
        unzip_oracle()
        return {'statusCode': 200,
                'body': json.dumps(run_query()),
                'headers': {'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*'}}
    except:
        em = traceback.format_exc()
        print("Error encountered. Error is: \n" + str(em))
        return {'statusCode': 500,
                'body': str(em),
                'headers': {'Content-Type': 'application/json', 'Access-Control-Allow-Origin': '*'}}

def unzip_oracle():
    print('extracting oracle drivers and copying results to /var/task/lib')
    os.system('unzip /var/task/instantclient-basiclite-linux.x64-19.3.0.0.0dbru.zip -d /tmp')
    print('extraction steps complete')
    os.system('export ORACLE_HOME=/tmp/instantclient_19_3')

def get_db_connection():
    return sa.engine.url.URL('oracle+cx_oracle',
                             username='do_not_worry', password='about_any',
                             host='of_these', port=1521,
                             query=dict(service_name='details'))

def run_query():
    query_text = """SELECT * FROM dont_worry_about_it"""
    conn = sa.create_engine(get_db_connection())
    print('Connected')
    df = pd.read_sql(sa.text(query_text), conn)
    print(df.shape)
    return df.to_json(orient='records')
This returns the error:
sqlalchemy.exc.DatabaseError: (cx_Oracle.DatabaseError) DPI-1047: Cannot locate a 64-bit Oracle Client library: "libclntsh.so: cannot open shared object file: No such file or directory". See https://oracle.github.io/odpi/doc/installation.html#linux for help
(Background on this error at: http://sqlalche.me/e/4xp6)
What I have also tried
I've tried:
Adding
Environment:
  Variables:
    ORACLE_HOME: /tmp
    LD_LIBRARY_PATH: /tmp
to template.yml and redeploying. Same error as above.
Adding os.system('export LD_LIBRARY_PATH=/tmp/instantclient_19_3') into the python script. Same error as above.
Many cp and ln commands, which are forbidden in Lambda outside of the /tmp folder. Same error as above.
One way that works, but is bad
If I make a folder called lib/ in the Lambda package and include an odd assortment of libaio.so.1, libclntsh.so, etc. files, the function works as expected, for some reason. I ended up with this:
<all the other libraries and files as above>
lib/
-libaio.so.1
-libclntsh.so
-libclntsh.so.10.1
-libclntsh.so.11.1
-libclntsh.so.12.1
-libclntsh.so.18.1
-libclntsh.so.19.1
-libclntshcore.so.19.1
-libipc1.so
-libmql1.so
-libnnz19.so
-libocci.so
-libocci.so.10.1
-libocci.so.11.1
-libocci.so.12.1
-libocci.so.18.1
-libocci.so.19.1
-libociicus.so
-libons.so
However, I chose these files through trial and error and don't want to go through this again.
Is there a way to unzip instantclient-basiclite-linux.x64-19.3.0.0.0dbru.zip in Lambda at runtime, and make Lambda see/use it to connect to an Oracle database?
I am not by any means an expert at Python, but these lines seem very strange to me:
print('extracting oracle drivers and copying results to /var/task/lib')
os.system('unzip /var/task/instantclient-basiclite-linux.x64-19.3.0.0.0dbru.zip -d /tmp')
print('extraction steps complete')
os.system('export ORACLE_HOME=/tmp/instantclient_19_3')
Normally, you will have very limited access to OS-level APIs with Lambda. And even when you do, they can behave in ways you do not expect. (Think of it this way: who owns the unzip binary? Files created by that command would be visible/invokable by whom?)
I see you mentioned that you have no issue extracting the files, which is also a bit strange.
My only answer for you is:
1/ Try to "bring your own" tools (unzip, etc.); see the sketch after this list.
2/ Never rely on bare OS-level calls like os.system('export ...'); always use the full path.
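On point 1/, the simplest way to bring your own unzip is Python's standard-library zipfile module, which removes the dependency on an unzip binary existing in the Lambda image; a minimal sketch:

import zipfile

# Extract the bundled Instant Client into /tmp (the only writable path
# on Lambda) without shelling out to an external unzip binary.
with zipfile.ZipFile('/var/task/instantclient-basiclite-linux.x64-19.3.0.0.0dbru.zip') as zf:
    zf.extractall('/tmp')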
Looking again at your question, it seems the way you specify the environment variables is conflicting:
ORACLE_HOME: /tmp
Should it not be:
Environment:
  Variables:
    ORACLE_HOME: /tmp/instantclient_19_3
    LD_LIBRARY_PATH: /tmp/instantclient_19_3
Also, see: How to access an AWS Lambda environment variable from Python
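On point 2/, note that os.system('export ORACLE_HOME=...') cannot work at all: each os.system call runs in its own short-lived shell, and the export dies with that shell. Variables set in template.yml, by contrast, are visible to the function process; a short sketch of reading them:

import os

# Environment variables declared in template.yml are visible to the process.
oracle_home = os.environ.get('ORACLE_HOME')          # e.g. /tmp/instantclient_19_3
ld_library_path = os.environ.get('LD_LIBRARY_PATH')
print(oracle_home, ld_library_path)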

AWS Lambda package deployment

I'm trying to deploy a Python .zip package as an AWS Lambda.
I chose the hello-python blueprint.
I created the first Lambda with the inline code; after that, I tried to switch to uploading a development .zip.
The package I used is a .zip containing a single file called hello_python.py, with the same code as the default inline sample, shown below:
from __future__ import print_function
import json

print('Loading function')

def lambda_handler(event, context):
    #print("Received event: " + json.dumps(event, indent=2))
    print("value1 = " + event['key1'])
    print("value2 = " + event['key2'])
    print("value3 = " + event['key3'])
    return event['key1']  # Echo back the first key value
    #raise Exception('Something went wrong')
After I click "save and test", nothing happens except a weird red ribbon, with no other substantive error message. The logs and the run results do not change when I modify the source, repackage, and upload it again.
Lambda functions require a handler in the format <FILE-NAME-NO-EXTENSION>.<FUNCTION-NAME>. In your case the handler is set to lambda_function.lambda_handler (the default value assigned by AWS Lambda). However, you've named your file hello_python.py, so AWS Lambda is looking for a Python file named lambda_function.py and finding nothing.
To fix this either:
Rename your hello_python.py file to lambda_function.py
Modify your lambda function handler to be hello_python.lambda_handler
You can see an example of how this works in the documentation where they create a python function called my_handler() inside the file hello_python.py, and they create a lambda function to call it with the handler hello_python.my_handler.
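If you'd rather script the second option than click through the console, the handler can also be changed with boto3 (the function name below is hypothetical); a minimal sketch:

import boto3

client = boto3.client('lambda')
# Handler format: <file-name-no-extension>.<function-name>
client.update_function_configuration(
    FunctionName='my-hello-python',
    Handler='hello_python.lambda_handler',
)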
