Using PyRserve to call library(foreign) in Python

I am trying to load an SPSS file into a pandas DataFrame in Python, and am looking for easier ways to do it given recent developments in running R code from the Python environment, which led me to PyRserve.
After connecting to PyRserve,
import pyRserve
conn = pyRserve.connect()
one can run basic R code such as
conn.eval('3+5') #output = 8.0
However, if possible in PyRserve, how does one import an R library to load a data frame with R code like the following,
library(foreign)
dat<-read.spss("/path/spss_file.sav", to.data.frame=TRUE)
and hopefully get it into a pandas DataFrame? Any thoughts are appreciated!

# import pyRserve
import pyRserve

# open the pyRserve connection
conn = pyRserve.connect()

# load your R script into a variable (you can even write functions)
test_r_script = '''
library(foreign)
dat <- read.spss("/path/spss_file.sav",
                 to.data.frame=TRUE)
dat  # return the data frame explicitly
'''

# evaluate the script over the connection
variable = conn.eval(test_r_script)
print(variable)

# close the pyRserve connection
conn.close()
My apologies for not explaining it properly. I am adding my GitHub link so you can see more examples; I think I have explained it more fully there:
https://github.com/shintojoseph1234/Rserve-and-pyRserve
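To get the result into a pandas DataFrame, something like the following sketch may work. It is untested, and the exact return type depends on your pyRserve version: an R data.frame typically comes back as a TaggedList of column vectors, so the keys/zip access below is an assumption to verify by inspecting the result interactively.

import pandas as pd
import pyRserve

conn = pyRserve.connect()
result = conn.eval('''
library(foreign)
dat <- read.spss("/path/spss_file.sav", to.data.frame=TRUE)
dat
''')
# assumption: result is list-like with one entry per column and a .keys
# attribute holding the column names
df = pd.DataFrame({name: list(col) for name, col in zip(result.keys, result)})
conn.close()
print(df.head())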

Related

Create a Java UDF that uses the geoip2 library with the database in an S3 bucket

Correct me if I'm wrong, but my understanding of UDFs in Snowpark is that you can send the UDF function from your IDE and it will be executed inside Snowflake. I have a database file called GeoLite2-City.mmdb staged in an S3 bucket on my Snowflake account, and I would like to use it to retrieve information about an IP address. So my strategy was to:
1. Register a UDF that returns a response string, from my IDE (PyCharm).
2. Create a main function that simply queries the database for the IP address and returns a response.
The problem is: how can the UDF and my code see the staged file at
s3://path/GeoLite2-City.mmdb
in my bucket? In my case I simply used the bare file name (with geoip2.database.Reader('GeoLite2-City.mmdb') as reader:), assuming it will eventually be found, since stage_location='@AWS_CSV_STAGE' is the same stage where the UDF will be saved. But I'm not sure I understand correctly what the stage_location option refers to exactly.
At the moment I get the following error:
"Cannot add package geoip2 because Anaconda terms must be accepted by ORGADMIN to use Anaconda 3rd party packages. Please follow the instructions at https://docs.snowflake.com/en/developer-guide/udf/python/udf-python-packages.html#using-third-party-packages-from-anaconda."
Am I importing geoip2.database correctly in order to use it with Snowpark and a UDF?
Do I import it by writing session.add_packages('geoip2')?
Thank you for clearing my doubts.
The instructions I'm following about geoip2 are here:
https://geoip2.readthedocs.io/en/latest/
my code:
from snowflake.snowpark import Session
import geoip2.database
from snowflake.snowpark.functions import col
import logging
from snowflake.snowpark.types import IntegerType, StringType

logger = logging.getLogger()
logger.setLevel(logging.INFO)

session = None
user = '*********'
password = '*********'
account = '*********'
warehouse = '*********'
database = '*********'
schema = '*********'
role = '*********'

print("Connecting")
cnn_params = {
    "account": account,
    "user": user,
    "password": password,
    "warehouse": warehouse,
    "database": database,
    "schema": schema,
    "role": role,
}

def first_udf():
    with geoip2.database.Reader('GeoLite2-City.mmdb') as reader:
        response = reader.city('203.0.113.0')
        print(response.country.iso_code)
        return response

try:
    print('session..')
    session = Session.builder.configs(cnn_params).create()
    session.add_packages('geoip2')
    session.udf.register(
        func=first_udf,
        return_type=StringType(),
        input_types=[StringType()],
        is_permanent=True,
        name='SNOWPARK_FIRST_UDF',
        replace=True,
        stage_location='@AWS_CSV_STAGE',
    )
    session.sql('SELECT SNOWPARK_FIRST_UDF()').show()
except Exception as e:
    print(e)
finally:
    if session:
        session.close()
        print('connection closed..')
    print('done.')
UPDATE
I'm trying to solve it using a Java UDF, since my staging area already has the 'geoip2-2.8.0.jar' library staged. If I could import its methods to get the country of an IP it would be perfect; the problem is that I don't know exactly how to do it. I'm trying to follow these instructions: https://maxmind.github.io/GeoIP2-java/.
I want to query the database and get the ISO code of the country as output, and I want to do it in a Snowflake worksheet.
CREATE OR REPLACE FUNCTION GEO()
returns varchar not null
language java
imports = ('@AWS_CSV_STAGE/lib/geoip2-2.8.0.jar', '@AWS_CSV_STAGE/geodata/GeoLite2-City.mmdb')
handler = 'Geo.test'
as
$$
import java.io.File;
import java.net.InetAddress;
import com.maxmind.geoip2.DatabaseReader;
import com.maxmind.geoip2.model.CityResponse;
import com.maxmind.geoip2.record.Country;

class Geo {
    public static String test() throws Exception {
        File database = new File("geodata/GeoLite2-City.mmdb");
        DatabaseReader reader = new DatabaseReader.Builder(database).build();
        InetAddress ipAddress = InetAddress.getByName("128.101.101.101");
        CityResponse response = reader.city(ipAddress);
        Country country = response.getCountry();
        return country.getIsoCode();
    }
}
$$;
SELECT GEO();
This will be more complicated than it looks:
To use session.add_packages('geoip2') in Snowflake you need to accept the Anaconda terms. This is easy if you can ask your account admin.
But then you can only get the packages that Anaconda has added to Snowflake this way. The list is https://repo.anaconda.com/pkgs/snowflake/, and I don't see geoip2 there yet.
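You can also check availability from a Snowpark session by querying the information_schema.packages view; a quick sketch, assuming a session opened as in the question:

pkgs = session.sql(
    "select package_name, version from information_schema.packages "
    "where language = 'python' and package_name = 'geoip2'")
pkgs.show()  # an empty result means Anaconda has not added geoip2 yet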
So you will need to package your own Python code (until Anaconda sees enough requests for geoip2 in the wishlist). I described the process here: https://medium.com/snowflake/generating-all-the-holidays-in-sql-with-a-python-udtf-4397f190252b.
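For a pure-Python dependency, the mechanism is Session.add_import plus a regular import inside the UDF. A minimal sketch, using a hypothetical pure-Python module my_geo.py that you zip and stage yourself (geoip2 itself won't work this way, as explained next):

# hypothetical: my_geo.zip contains a pure-Python module my_geo.py
session.add_import('@AWS_CSV_STAGE/my_geo.zip')

def country_udf(ip: str) -> str:
    import my_geo                     # resolved from the uploaded zip at run time
    return my_geo.lookup_country(ip)  # hypothetical helper function

session.udf.register(func=country_udf, return_type=StringType(),
                     input_types=[StringType()], name='COUNTRY_UDF',
                     is_permanent=True, replace=True,
                     stage_location='@AWS_CSV_STAGE')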
But wait! GeoIP2 is not pure Python, so you will need to wait until Anaconda packages the C extension libmaxminddb. That will be harder, as you can see their docs don't offer a straightforward way like other pip-installable C libraries.
So this will be complicated.
There are alternative paths, like a commercial provider of this functionality (like the one I describe here: https://medium.com/snowflake/new-in-snowflake-marketplace-monetization-315aa90b86c).
There are other approaches to get this done without using a paid dataset, but I haven't written about them yet - someone else might before I get to it.
Btw, years ago I wrote something like this for BigQuery (https://cloud.google.com/blog/products/data-analytics/geolocation-with-bigquery-de-identify-76-million-ip-addresses-in-20-seconds), but today I was notified that Google recently deleted the tables that I had shared with the world (https://twitter.com/matthew_hensley/status/1598386009129058315).
So it's time to rebuild in Snowflake. But who (me?) and when is still a question.

Get SMQ queues in Python?

I'm using the PYRFC library and so far I've managed to connect to SAP.
conn = Connection(ashost='xxxxxxxxx', sysnr='02', client='100', user='xxxxx', passwd='xxxxxxxx.')
result = conn.call('STFC_CONNECTION', REQUTEXT=u'Hello SAP!')
print (result)
I did the connection test and everything was OK. But now I'm trying to execute the queues created in SAP.
I performed some tests, trying to simulate pressing F8, but without success.
Is there any way to do this execution via Python?
You cannot query SMQ1/SMQ2 results directly; for an extended explanation read my answer here.
To get the SMQ results in Python you need to find a function module with equivalent functionality, and good news, pa-pam!, I did it for you: it is the TRFC_QIN_GET_CURRENT_QUEUES function module.
You can find how to call it in any pyrfc tutorial. A probable sample:
def func():
    import pyrfc
    from pyrfc import Connection

    conn = pyrfc.Connection(user='', passwd='', ashost='', etc.)
    queue_name = '*'
    client = '100'
    result = conn.call("TRFC_QIN_GET_CURRENT_QUEUES",
                       qname=queue_name,
                       client=client)
    print(result['QVIEW'])
All you need to put here is the queue name (the qname parameter) and the client value (your SAP client number).
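pyrfc returns table parameters as lists of dictionaries, so iterating the returned queues could look like the sketch below; the field name QNAME is an assumption based on the SAP structure TRFCQVIEW and should be verified on your system:

for row in result['QVIEW']:
    print(row['QNAME'])  # assumed field holding the queue name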

Weblogic: Import messages to a Uniform Distributed Queue from exported xml file using WLST

I am trying to write a script to import messages to a Uniform Distributed queue in Weblogic using WLST but I am unable to find a solution that specifically caters to my requirement.
Let me explain the requirement:
I have error queues that store failed messages. I have exported them as an XML file (using WLST) and segregated them, based on the different error codes in the message headers, into smaller XML files which need to be imported into the main queue for reprocessing (not using the Admin console).
I am sure there is a way to achieve this, as I am able to import the segregated XML files using the import option in the Admin console, which works like a charm, but I have no idea how it is actually done, so I cannot implement it as a script.
I have explored a few options, like exporting the files as a binary SER file, which works, but it cannot be used to filter out only the retryable messages.
The WLST method importMessages() only accepts a CompositeData array. Any method to convert/create the required CompositeData array from the XML files would also be a great solution to the issue.
I agree it is not very simple and intuitive.
You have 2 solutions:
pure WLST code
Java code using the JMS API
If you want to write pure WLST code, here is a code sample that will help you. The code creates and publishes n messages into a queue.
The buildJMSMessage() function is responsible for creating a text message.
from javax.management.openmbean import CompositeData
from weblogic.jms.extensions import JMSMessageInfo, JMSMessageFactoryImpl
import jarray
...

def buildJMSMessage(text):
    handle = 1
    state = 1
    XidString = None
    sequenceNumber = 1
    consumerID = None
    wlmessage = JMSMessageFactoryImpl.getFactory().createTextMessage(text)
    destinationName = ""
    bodyIncluded = True
    msg = JMSMessageInfo(handle, state, XidString, sequenceNumber, consumerID,
                         wlmessage, destinationName, bodyIncluded)
    return msg
....

quantity = 10
messages = jarray.zeros(quantity, CompositeData)
for i in range(0, quantity):
    messages[i] = buildJMSMessage('Test message #' + str(i)).toCompositeData()
queue.importMessages(messages, False)
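The elided code must also supply the queue variable: the destination runtime MBean of the target queue, taken from the domain runtime tree. A hedged sketch, in which every name in the MBean path is a placeholder for your own domain:

connect('weblogic', 'password', 't3://adminhost:7001')
domainRuntime()
# server, JMS server, and module!queue names below are placeholders
queue = getMBean('/ServerRuntimes/MS1/JMSRuntime/MS1.jms'
                 + '/JMSServers/MyJMSServer/Destinations/MyModule!MyQueue')

After this, the importMessages() call above applies.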

Comparing plumber to other options for making a chart with R in a Python script

I plan on making a chart with ggplot in a Python script. These are details about the project:
I have a script that runs on a remote machine and I can install anything within reason on the machine
The script runs in python and has data that I want to visualize stored as a dictionary
The script runs daily and the data always has the same structure
I think my best bet is to do this:
Write an R script that takes the data and creates the ggplot visualization
Use plumber to create a REST API for my script
Send a call to the REST API and get a PNG of my plot in return
I'm also familiar with ggpy by yhat, and I'm even wondering if I can install R on the machine and just send code directly to it for processing, without having RStudio.
Would plumber be a recommended and secure implementation?
This is a reproducible example:
my_data = [{"Chicago": "30"}, {"New York": "50"}], [{"Cincinatti": "70"}, {"Green Bay": "95"}]
**{this is the part that's missing}**
library(ggplot2)
my_data %>% ggplot(aes(city_name, value)) + geom_col()
png("my_bar_chart.png", my_data)
As mentioned in the comments, most of your question should be answered here: Using R in Python with Rpy2: how to ggplot2?.
You can load ggplot with:
import rpy2.robjects.packages as packages
import rpy2.robjects.lib.ggplot2 as ggp2
assuming of course you have ggplot2 + dependencies available.
Then you can almost use R syntax, except that you put ggp2. in front of every command.
E.g.: the Python equivalent of ggplot(mtcars) would be ggp2.ggplot(mtcars).
Your example (not tested):
my_data = [{"Chicago": "30"}, {"New York": "50"}], [{"Cincinatti": "70"}, {"Green Bay": "95"}]

import rpy2.robjects.packages as packages
import rpy2.robjects.lib.ggplot2 as ggp2
from rpy2.robjects import r

# my_data must first be converted to an R data frame r_data with columns
# city_name and value (see the conversion sketch below)
plot = (ggp2.ggplot(r_data)
        + ggp2.aes_string(x='city_name', y='value')
        + ggp2.geom_col())
plot.plot()
r("dev.copy(png, 'my_data.png')")
r("dev.off()")
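The missing step is turning my_data into the R data frame r_data used above. One hedged way is via pandas plus rpy2's pandas converter (the converter API differs between rpy2 versions):

import pandas as pd
from rpy2.robjects import pandas2ri

# flatten the nested structure into (city_name, value) rows
rows = [(city, int(val))
        for group in my_data
        for record in group
        for city, val in record.items()]
pdf = pd.DataFrame(rows, columns=['city_name', 'value'])

pandas2ri.activate()           # older rpy2 API; newer versions use localconverter
r_data = pandas2ri.py2ri(pdf)  # py2rpy() in rpy2 >= 3.0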

How to assign a 2d libreoffice calc named range to a python variable. Can do it in Libreoffice Basic

I can't seem to find a simple answer to the question. I have this successfully working in LibreOffice Basic:
NamedRange = ThisComponent.NamedRanges.getByName("transactions_detail")
RefCells = NamedRange.getReferredCells()
Set MainRange = RefCells.getDataArray()
Then I iterate over MainRange and pull out the rows I am interested in.
Can I do something similar in a python macro? Can I assign a 2d named range to a python variable or do I have to iterate over the range to assign the individual cells?
I am new to Python but hope to convert my iteration-intensive macro function to Python in hopes of making it faster.
Any help would be much appreciated.
Thanks.
LibreOffice can be manipulated from Python with the library pyuno. The documentation of pyuno is unfortunately incomplete, but going through this tutorial may help.
To get started:
Python-Uno, the library to communicate via Uno, is already in the LibreOffice Python's path. To initialize your context, type the following lines in your Python shell:
import socket  # only needed on win32-OOo3.0.0
import uno

# get the uno component context from the PyUNO runtime
localContext = uno.getComponentContext()

# create the UnoUrlResolver
resolver = localContext.ServiceManager.createInstanceWithContext(
    "com.sun.star.bridge.UnoUrlResolver", localContext)

# connect to the running office
ctx = resolver.resolve(
    "uno:socket,host=localhost,port=2002;urp;StarOffice.ComponentContext")
smgr = ctx.ServiceManager

# get the central desktop object
desktop = smgr.createInstanceWithContext("com.sun.star.frame.Desktop", ctx)

# access the current document
model = desktop.getCurrentComponent()
Then to get a named range and access the data as an array, you can use the following methods:
NamedRange = model.NamedRanges.getByName("Test Name")
RefCells = NamedRange.getReferredCells()
MainRange = RefCells.getDataArray()
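To mirror the Basic loop, the tuple of row tuples returned by getDataArray() can be filtered directly in Python, for example:

# MainRange is a tuple of row tuples, one per spreadsheet row
for row in MainRange:
    if row[0] == 'some_value':  # hypothetical filter on the first column
        print(row)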
However, I am unsure that this will result in a noticeable performance gain.
