How to use a socket as source in PyFlink? - python

I would like to use a socket stream as input for my Flink workflow in Python. This works in Scala with the socketTextStream() method, for instance:
val stream = senv.socketTextStream("localhost", 9000, '\n')
I cannot find an equivalent in PyFlink, although it is briefly mentioned in the documentation. Any help is much appreciated.

The StreamExecutionEnvironment in PyFlink does not currently provide a socketTextStream API; it is only supported in the Java StreamExecutionEnvironment.
Maybe we can use the add_source API instead, wrapping the Java source function:
from pyflink.common.typeinfo import Types
from pyflink.datastream.functions import SourceFunction
custom_source = SourceFunction("org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction")
ds = self.env.add_source(custom_source, type_info=Types.STRING())
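For reference, a socket text source connects to a host and port as a client and splits the incoming bytes into records at a delimiter. The following is a minimal plain-Python sketch of that behaviour, useful for checking what your socket actually sends before wiring it into Flink. It is not PyFlink API: the function name read_socket_lines and its parameters are made up for illustration.

```python
import socket
from typing import Iterator


def read_socket_lines(sock: socket.socket, delimiter: str = "\n",
                      encoding: str = "utf-8") -> Iterator[str]:
    """Yield delimiter-separated records from a connected socket until it closes."""
    sep = delimiter.encode(encoding)
    buffer = b""
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # the peer closed the connection
            break
        buffer += chunk
        # Emit every complete record currently held in the buffer.
        while sep in buffer:
            record, buffer = buffer.split(sep, 1)
            yield record.decode(encoding)
    if buffer:  # trailing data without a final delimiter
        yield buffer.decode(encoding)
```

Assuming something is serving text on localhost:9000 (for example `nc -lk 9000`), you could try it with `for line in read_socket_lines(socket.create_connection(("localhost", 9000))): print(line)`.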

Related

Can someone suggest alternative to HdfsSensor for airflow python3?

I am trying to listen for changes in HDFS to trigger my ETL pipeline in Airflow using HdfsSensor in Python 3. I get the following error because snakebite does not support Python 3:
This HDFSHook implementation requires snakebite, but '
ImportError: This HDFSHook implementation requires snakebite, but snakebite is not compatible with Python 3
Thanks to the suggestion by @AyushGoyal, I solved the same problem using WebHdfsSensor. This sensor works like HdfsSensor, so you can simply swap the class names. Just make sure that:
you pass the connection ID via the webhdfs_conn_id parameter (in HdfsSensor the parameter was named hdfs_conn_id)
the port you use to connect to the name node is the WebHDFS (HTTP) port, 50070 by default on Hadoop 2.x, and not the RPC port 8020
The rest is the same!
Example:
from airflow.sensors.web_hdfs_sensor import WebHdfsSensor
file_sensor = WebHdfsSensor(
    task_id='check_if_data_is_ready',
    filepath="some_file_path",
    webhdfs_conn_id='hdfs_conn_id',
    poke_interval=10,
    timeout=5,
    dag=dag,
    env={
        'JAVA_HOME': '/usr/java/latest'
    }
)

How to programmatically create topics using kafka-python?

I am getting started with Kafka and am fairly new to Python. I am using the kafka-python library to communicate with my Kafka broker. I now need to create a topic dynamically from my code. From the docs, I can see that I can call the create_topics() method to do so, but I am not sure how to get an instance of the class that provides it; I cannot work this out from the docs.
Can someone help me with this?
You first need to create an instance of KafkaAdminClient. The following should do the trick for you:
from kafka.admin import KafkaAdminClient, NewTopic
admin_client = KafkaAdminClient(
    bootstrap_servers="localhost:9092",
    client_id='test'
)
topic_list = [NewTopic(name="example_topic", num_partitions=1, replication_factor=1)]
admin_client.create_topics(new_topics=topic_list, validate_only=False)
Alternatively, you can use the confluent_kafka client, which is a lightweight wrapper around librdkafka:
from confluent_kafka.admin import AdminClient, NewTopic
# librdkafka configuration keys use dots, not underscores
admin_client = AdminClient({"bootstrap.servers": "localhost:9092"})
topic_list = [NewTopic("example_topic", 1, 1)]
admin_client.create_topics(topic_list)

How to prevent LDAP-injection in ldap3 for python3

I'm writing some Python 3 code using the ldap3 library and I'm trying to prevent LDAP injection. The OWASP injection-prevention cheat sheet recommends using a safe/parameterized API (among other things). However, I can't find a safe API or safe method for composing search queries in the ldap3 docs. Most of the search queries in the docs use hard-coded strings, like this:
conn.search('dc=demo1,dc=freeipa,dc=org', '(objectclass=person)')
and I'm trying to avoid the need to compose queries in a manner similar to this:
conn.search(search, '(accAttrib=' + accName + ')')
Additionally, there seems to be no mention of 'injection' or 'escaping' or similar concepts in the docs. Does anyone know if this is missing in this library altogether or if there is a similar library for Python that provides a safe/parameterized API? Or has anyone encountered and solved this problem before?
A final point: I've seen the other StackOverflow questions that point out how to use whitelist validation or escaping as a way to prevent LDAP-injection and I plan to implement them. But I'd prefer to use all three methods if possible.
I was a little surprised that the documentation doesn't seem to mention this. However, there is a utility function, escape_filter_chars, which I believe is what you are looking for:
from ldap3.utils import conv
attribute = conv.escape_filter_chars("bar)", encoding=None)
query = "(foo={0})".format(attribute)
conn.search(search, query)
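To make clear what this kind of escaping does, here is a minimal pure-Python sketch of the backslash-hex escaping that RFC 4515 defines for filter values. The helper names escape_ldap_filter_value and build_equality_filter are hypothetical and not part of ldap3; in real code, prefer the library's own escape_filter_chars.

```python
# Characters that RFC 4515 requires to be escaped inside a filter value.
_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_ldap_filter_value(value: str) -> str:
    """Hypothetical helper: backslash-hex escape LDAP filter metacharacters."""
    # Escaping character by character avoids double-escaping the backslash.
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


def build_equality_filter(attribute: str, value: str) -> str:
    """Build '(attribute=value)' with the value escaped."""
    return "({0}={1})".format(attribute, escape_ldap_filter_value(value))


malicious = "admin)(|(objectClass=*"
print(build_equality_filter("accAttrib", malicious))
# prints: (accAttrib=admin\29\28|\28objectClass=\2a)
```

The injected parentheses and wildcard end up as literal characters inside a single equality match, so they can no longer change the structure of the filter.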
I believe the best way to prevent LDAP injection in python3 is to use the Abstraction Layer.
Example code:
from ldap3 import ObjectDef, Reader
# First create a connection to LDAP to use.
# I use a function that creates my connection (abstracted here).
conn = self.connect_to_ldap()
o = ObjectDef('groupOfUniqueNames', conn)
query = 'Common Name: %s' % cn
r = Reader(conn, o, 'dc=example,dc=org', query)
r.search()
Notice how the query is abstracted? The query would error if someone tried to inject a search here. Also, this search is protected by a Reader instead of a Writer. The ldap3 documentation goes through all of this.

Modules or functions to obtain information from a network interface in Python

I wrote a small application that I run from a Linux terminal to track how much data I upload and download during an Internet session (I store the information in MongoDB). Currently I read the upload and download figures from the system monitor and enter them by hand. I would like to automate this so that the application reads the transferred data directly from the network interface I use to connect to the Internet (ppp0 in my case), but I cannot find a way to do this in Python. I assume Python has a module for this, but so far my research has not turned one up.
Do you know of any module, function or similar that lets me do this in Python?
Any example?
Thanks in advance.
Well, I'll answer my own question.
I found this recipe in the PyAr community. It fits like a glove and does what I wanted without extra commands or other applications.
After slightly modifying the code to better suit my application and adding a function that converts bytes to megabytes, I ended up with this:
def bytestomb(b):
    mb = float(b) / (1024*1024)
    return mb
def bytessubidatransferidos():
    interface = 'ppp0'
    for line in open('/proc/net/dev', 'r'):
        if interface in line:
            data = line.split('%s:' % interface)[1].split()
            tx_bytes = (data[8])
            return bytestomb(tx_bytes)
def bytesbajadatransferidos():
    interface = 'ppp0'
    for line in open('/proc/net/dev', 'r'):
        if interface in line:
            data = line.split('%s:' % interface)[1].split()
            rx_bytes = (data[0])
            return bytestomb(rx_bytes)
print(bytessubidatransferidos())
print(bytesbajadatransferidos())
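The same idea can be sketched for Python 3 with a single parser that returns the counters for every interface at once. The names parse_net_dev, bytes_to_mb and SAMPLE are made up for this example, and the sample text is invented data laid out like /proc/net/dev so that the snippet runs anywhere.

```python
from typing import Dict, Tuple


def parse_net_dev(text: str) -> Dict[str, Tuple[int, int]]:
    """Map interface name -> (rx_bytes, tx_bytes) from /proc/net/dev content."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:  # the two header lines contain no colon
            continue
        name, fields = line.split(":", 1)
        values = fields.split()
        # Field 0 is received bytes, field 8 is transmitted bytes.
        counters[name.strip()] = (int(values[0]), int(values[8]))
    return counters


def bytes_to_mb(count: int) -> float:
    return count / (1024 * 1024)


# Invented sample data in the /proc/net/dev layout.
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:  104857      10    0    0    0     0          0         0   104857      10    0    0    0     0       0          0
  ppp0: 2097152     200    0    0    0     0          0         0  1048576     100    0    0    0     0       0          0
"""

rx, tx = parse_net_dev(SAMPLE)["ppp0"]
print(bytes_to_mb(rx), bytes_to_mb(tx))  # 2.0 1.0
```

On a live Linux system you would pass `open('/proc/net/dev').read()` instead of SAMPLE. Looking the interface up by exact name also avoids accidental substring matches that `if interface in line` allows.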

Discovery of web services using Python

I have several devices on a network, and each of them exposes a web service. I am trying to use a library to discover the presence and identity of these devices from a Python script. Are there any modules that would help me with this problem? The only one I have found is ws-discovery for Python.
And if this is the only module does anyone have any example Python script using ws-discovery?
Thanks for any help.
Unfortunately I've never used ws-discovery myself, but there seems to be a Python project which implements it:
https://pypi.org/project/WSDiscovery/
From their documentation here's a short example on how to use it:
wsd = WSDiscovery()
wsd.start()
ttype = QName("abc", "def")
ttype1 = QName("namespace", "myTestService")
scope1 = Scope("http://myscope")
ttype2 = QName("namespace", "myOtherTestService_type1")
scope2 = Scope("http://other_scope")
xAddr = "localhost:8080/abc"
wsd.publishService(types=[ttype], scopes=[scope2], xAddrs=[xAddr])
ret = wsd.searchServices()
for service in ret:
    print(service.getEPR() + ":" + service.getXAddrs()[0])
wsd.stop()
Are you tied to ws-discovery? If not, you might want to consider the Bonjour protocol, aka ZeroConf and DNS-SD. The protocol is relatively widely implemented. I've never used Python to do the advertising or discovery, but there is a project that implements an API: http://code.google.com/p/pybonjour/
As I said, I have no direct experience with this project and merely point it out as an alternative to ws-discovery.
