How to programmatically create topics using kafka-python? - python

I am getting started with Kafka and fairly new to Python. I am using the kafka-python library to communicate with my Kafka broker. Now I need to create a topic dynamically from my code. From the docs I can see that I can call the create_topics() method to do so, but I am not sure how to get an instance of the class that exposes it. I am not able to work this out from the docs.
Can someone help me with this?

You first need to create an instance of KafkaAdminClient. The following should do the trick for you:
from kafka.admin import KafkaAdminClient, NewTopic

admin_client = KafkaAdminClient(
    bootstrap_servers="localhost:9092",
    client_id='test'
)

topic_list = [NewTopic(name="example_topic", num_partitions=1, replication_factor=1)]
admin_client.create_topics(new_topics=topic_list, validate_only=False)
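If the topic may already exist, a guarded version can help; this is a sketch assuming your kafka-python version surfaces per-topic errors as exceptions (kafka.errors.TopicAlreadyExistsError is part of the library):
from kafka.admin import KafkaAdminClient, NewTopic
from kafka.errors import TopicAlreadyExistsError

admin_client = KafkaAdminClient(bootstrap_servers="localhost:9092", client_id='test')
try:
    admin_client.create_topics(
        new_topics=[NewTopic(name="example_topic", num_partitions=1, replication_factor=1)],
        validate_only=False
    )
except TopicAlreadyExistsError:
    # Topic is already there; ignore so that setup code stays idempotent
    pass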
Alternatively, you can use the confluent_kafka client, which is a lightweight wrapper around librdkafka (note that its config keys use dots, e.g. "bootstrap.servers"):
from confluent_kafka.admin import AdminClient, NewTopic

admin_client = AdminClient({"bootstrap.servers": "localhost:9092"})
topic_list = [NewTopic("example_topic", num_partitions=1, replication_factor=1)]
admin_client.create_topics(topic_list)
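One difference worth noting: in confluent_kafka, create_topics() is asynchronous and returns a dict mapping each topic name to a future, so you typically wait on the futures to confirm creation:
futures = admin_client.create_topics(topic_list)
for topic, future in futures.items():
    try:
        future.result()  # block until the broker has processed the request
        print("Topic {} created".format(topic))
    except Exception as e:
        print("Failed to create topic {}: {}".format(topic, e))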

Related

How to use a socket as source in PyFlink?

I would like to use a socket stream as input for my Flink workflow in Python. This works in Scala with the socketTextStream() method, for instance:
val stream = senv.socketTextStream("localhost", 9000, '\n')
I cannot find an equivalent in PyFlink, although it is briefly mentioned in the documentation. Any help is much appreciated.
StreamExecutionEnvironment in PyFlink does not provide a socketTextStream API right now; it is only available in the Java StreamExecutionEnvironment.
Maybe you can use the add_source API instead, wrapping the Java SocketTextStreamFunction (see the fuller sketch below):
custom_source = SourceFunction("org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction")
ds = env.add_source(custom_source, type_info=Types.STRING())
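Since SocketTextStreamFunction has no no-argument constructor, wrapping just the class name may not be enough. A fuller sketch, untested and assuming your PyFlink version lets SourceFunction wrap a Py4J Java object (the host and port values are illustrative):
from pyflink.common.typeinfo import Types
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.functions import SourceFunction
from pyflink.java_gateway import get_gateway

env = StreamExecutionEnvironment.get_execution_environment()

# Build the Java SocketTextStreamFunction(host, port, delimiter, maxRetry)
# through the Py4J gateway, then wrap it as a Python SourceFunction
j_socket_source = get_gateway().jvm \
    .org.apache.flink.streaming.api.functions.source \
    .SocketTextStreamFunction("localhost", 9000, "\n", 0)
custom_source = SourceFunction(j_socket_source)

ds = env.add_source(custom_source, type_info=Types.STRING())
ds.print()
env.execute("socket-source-job")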

boto3 eks client how to generate presigned url

I'm trying to update a Docker image within a deployment in EKS. I'm running Python code from a Lambda function. However, I don't know how to use generate_presigned_url(). What should I pass as the ClientMethod parameter?
import boto3
client = boto3.client("eks")
url = client.generate_presigned_url()
These are the client methods you can pass as ClientMethod for EKS:
'associate_encryption_config'
'associate_identity_provider_config'
'can_paginate'
'create_addon'
'create_cluster'
'create_fargate_profile'
'create_nodegroup'
'delete_addon'
'delete_cluster'
'delete_fargate_profile'
'delete_nodegroup'
'describe_addon'
'describe_addon_versions'
'describe_cluster'
'describe_fargate_profile'
'describe_identity_provider_config'
'describe_nodegroup'
'describe_update'
'disassociate_identity_provider_config'
'generate_presigned_url'
'get_paginator'
'get_waiter'
'list_addons'
'list_clusters'
'list_fargate_profiles'
'list_identity_provider_configs'
'list_nodegroups'
'list_tags_for_resource'
'list_updates'
'tag_resource'
'untag_resource'
'update_addon'
'update_cluster_config'
'update_cluster_version'
'update_nodegroup_config'
'update_nodegroup_version'
You can get more information about these methods in the documentation here: https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/eks.html#client
After over two weeks I suppose you've found your answer; anyway, the ClientMethod mentioned (and not really well explained in the boto3 docs) is just one of the methods you can call on the EKS client itself. I honestly think this is what KnowledgeGainer was trying to say by listing all the methods: you basically just pick one. That gives you the presigned URL.
For example, here I'm using a method that doesn't require any additional arguments, list_clusters:
>>> import boto3
>>> client = boto3.client("eks")
>>> client.generate_presigned_url("list_clusters")
'https://eks.eu-west-1.amazonaws.com/clusters?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAQKOXLHHBFT756PNG%2F20210528%2Feu-west-1%2Feks%2Faws4_request&X-Amz-Date=20210528T014603Z&X-Amz-Expires=3600&X-Amz-SignedHeaders=host&X-Amz-Signature=d25dNCC17013ad9bc75c04b6e067105c23199c23cbadbbbeForExample'
If the method requires any additional arguments, you add those into Params as a dictionary:
>>> method_params = {'name': <your_cluster_name>}
>>> client.generate_presigned_url('describe_cluster', Params=method_params)
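Putting it together, a complete call would look like the following; the cluster name is hypothetical, and ExpiresIn is the optional parameter controlling how long the URL stays valid:
import boto3

client = boto3.client("eks")
method_params = {"name": "my-cluster"}  # hypothetical cluster name
url = client.generate_presigned_url(
    "describe_cluster",
    Params=method_params,
    ExpiresIn=900  # seconds; defaults to 3600
)
print(url)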

Is there a way to create log compacted topic using kafka-python library?

kafka-python contains multiple modules to create/delete topics and also to pass configuration while doing so.
Is there a way to add additional configuration to the following method?
NewTopic(name=topicname, num_partitions=1, replication_factor=1)
Yes, it's possible to create a compacted topic with kafka-python.
The NewTopic constructor accepts a topic_configs argument to specify the topic configurations.
For example:
from kafka import KafkaAdminClient
from kafka.admin import NewTopic
admin = KafkaAdminClient(bootstrap_servers=['localhost:9092'])
topic = NewTopic('bar', 1, 1, topic_configs={'cleanup.policy': 'compact'})
response = admin.create_topics([topic])
print(response)
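If you want to confirm the setting took effect, kafka-python's admin client can read the topic configuration back with describe_configs(); a small sketch (the exact response layout varies by version):
from kafka.admin import ConfigResource, ConfigResourceType

# Read back the topic config to check that cleanup.policy is 'compact'
resource = ConfigResource(ConfigResourceType.TOPIC, 'bar')
configs = admin.describe_configs(config_resources=[resource])
print(configs)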

Confluent-Kafka Python: How to list all topics programmatically

I have a Kafka broker running on my local system. To communicate with the broker from my Django-based web application I am using the confluent-kafka wrapper. However, browsing through the admin API, I could not find any API for listing Kafka topics. (The topics are created programmatically and are dynamic.)
Is there any way I can list them from within my program?
The requirement is that, if my worker reboots, all assigned consumers listening to those topics have to be re-initialized, so I want to loop over all the topics and assign a consumer to each.
Here is how to do it:
>>> from confluent_kafka.admin import AdminClient
>>> conf = {'bootstrap.servers': 'vps01:9092,vps02:9092,vps03:9092'}
>>> kadmin = AdminClient(conf)
>>> kadmin.list_topics().topics # Returns a dict(). See example below.
{'topic01': TopicMetadata(topic01, 3 partitions),}
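If you only need the topic names, they are the keys of that dict:
topic_names = list(kadmin.list_topics().topics.keys())
print(topic_names)  # e.g. ['topic01']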
I hope that helps.
There are two ways you can achieve it. The first is via a Consumer:
from confluent_kafka import Consumer

consumer_config = {
    'bootstrap.servers': 'localhost:9092',
    'group.id': 'group1'
}
consumer = Consumer(consumer_config)
consumer.list_topics().topics
This will return a dictionary that looks something like this:
{'raw_request': TopicMetadata(raw_request, 4 partitions),
'raw_requests': TopicMetadata(raw_requests, 1 partitions),
'idw': TopicMetadata(idw, 1 partitions),
'__consumer_offsets': TopicMetadata(__consumer_offsets, 50 partitions)}
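Since the dict also includes internal topics such as __consumer_offsets, you may want to filter those out before assigning consumers, for example:
# Skip Kafka-internal topics, whose names start with a double underscore
user_topics = [name for name in consumer.list_topics().topics
               if not name.startswith('__')]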
The other way is already posted by @NYCeyes.
Try searching again for list_topics
https://docs.confluent.io/5.2.1/clients/confluent-kafka-python/index.html#confluent_kafka.Consumer.list_topics
https://docs.confluent.io/5.2.1/clients/confluent-kafka-python/index.html#confluent_kafka.Producer.list_topics

Who created an Amazon EC2 instance using Boto and Python?

I want to know who created a particular instance. I am using CloudTrail to find out the statistics, but I am not able to get the particular statistic of who created that instance. I am using Python and Boto3 to find out the details.
I am using this code, the lookup_events() call from CloudTrail in boto3, to extract the information about an instance:
ct_conn = sess.client(service_name='cloudtrail',region_name='us-east-1')
events=ct_conn.lookup_events()
I found the solution to the above problem using the lookup_events() function:
import boto3
import json

ct_conn = boto3.client(service_name='cloudtrail', region_name='us-east-1')
events_dict = ct_conn.lookup_events(LookupAttributes=[{'AttributeKey': 'ResourceName', 'AttributeValue': 'i-xxxxxx'}])
for data in events_dict['Events']:
    json_file = json.loads(data['CloudTrailEvent'])
    print(json_file['userIdentity']['userName'])
@Karthik - here is a sample of creating the session:
import boto3
import json
import os

session = boto3.Session(region_name='us-east-1',
                        aws_access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
                        aws_secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY'])
ct_conn = session.client(service_name='cloudtrail', region_name='us-east-1')
events_dict = ct_conn.lookup_events(LookupAttributes=[{'AttributeKey': 'ResourceName', 'AttributeValue': 'i-xxx'}])
for data in events_dict['Events']:
    json_file = json.loads(data['CloudTrailEvent'])
    print(json_file['userIdentity']['userName'])
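Note that lookup_events() returns at most 50 events per call, so for older instances you likely need pagination. A sketch using boto3's built-in paginator and filtering on the RunInstances event (the instance id is a placeholder, and userName may be absent for role-based identities):
import boto3
import json

ct_conn = boto3.client('cloudtrail', region_name='us-east-1')
paginator = ct_conn.get_paginator('lookup_events')
pages = paginator.paginate(LookupAttributes=[
    {'AttributeKey': 'ResourceName', 'AttributeValue': 'i-xxx'}])
for page in pages:
    for event in page['Events']:
        detail = json.loads(event['CloudTrailEvent'])
        # RunInstances is the API call that actually created the instance
        if detail.get('eventName') == 'RunInstances':
            print(detail['userIdentity'].get('userName'))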
