I've created an EC2 instance with the AWS CDK in Python. I've added a security group with ingress rules for IPv4 and IPv6 on port 22. The keypair that I specified (with the help of this Stack Overflow question) has been used with other EC2 instances set up through the console with no issue.
Everything appears to be running, but my connection keeps timing out. I went through Amazon's checklist of the usual causes, but none of those common issues seems to be the problem (at least to me).
Why can't I connect with my SSH keypair to the instance I made with the AWS CDK? I suspect the KeyName I am overriding is not the correct property name in Python, but I can't find it in the CDK docs.
Code included below.
vpc = ec2.Vpc.from_lookup(self, "VPC", vpc_name=os.getenv("VPC_NAME"))
sec_group = ec2.SecurityGroup(self, "SG", vpc=vpc, allow_all_outbound=True)
sec_group.add_ingress_rule(ec2.Peer.any_ipv4(), connection=ec2.Port.tcp(22))
sec_group.add_ingress_rule(ec2.Peer.any_ipv6(), connection=ec2.Port.tcp(22))
instance = ec2.Instance(
    self,
    "name",
    vpc=vpc,
    instance_type=ec2.InstanceType.of(ec2.InstanceClass.T2, ec2.InstanceSize.MICRO),
    machine_image=ec2.AmazonLinuxImage(
        generation=ec2.AmazonLinuxGeneration.AMAZON_LINUX_2
    ),
    security_group=sec_group,
)
instance.instance.add_property_override("KeyName", os.getenv("KEYPAIR_NAME"))
elastic_ip = ec2.CfnEIP(self, "EIP", domain="vpc", instance_id=instance.instance_id)
This is an issue with internet reachability, not your SSH key.
By default, your instance is placed into a private subnet (docs), so it will not have inbound connectivity from the internet.
Place it into a public subnet and it should work.
Also, you don't have to use any overrides to set the key - use the built-in key_name argument. And you don't have to create the security group - use the connections abstraction. Here's the complete code:
vpc = ec2.Vpc.from_lookup(self, "VPC", vpc_name=os.getenv("VPC_NAME"))
instance = ec2.Instance(
    self,
    "name",
    vpc=vpc,
    instance_type=ec2.InstanceType.of(ec2.InstanceClass.T2, ec2.InstanceSize.MICRO),
    machine_image=ec2.AmazonLinuxImage(
        generation=ec2.AmazonLinuxGeneration.AMAZON_LINUX_2
    ),
    key_name=os.getenv("KEYPAIR_NAME"),
    vpc_subnets=ec2.SubnetSelection(subnet_type=ec2.SubnetType.PUBLIC),
)
instance.connections.allow_from_any_ipv4(ec2.Port.tcp(22))
elastic_ip = ec2.CfnEIP(self, "EIP", domain="vpc", instance_id=instance.instance_id)
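Not part of the original answer, but as a convenience you could export the Elastic IP as a stack output so you know which address to SSH to after deployment. A sketch, assuming CDK v2 (where CfnOutput is imported from aws_cdk; on v1 it lives in aws_cdk.core):
from aws_cdk import CfnOutput

# The Ref of an AWS::EC2::EIP resource resolves to the allocated public IP.
CfnOutput(self, "InstancePublicIp", value=elastic_ip.ref)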
Related
I am working on a Python 3.6 script that provisions a network and subnet using the openstacksdk module on OSP16. The subnet I need to add must have its gateway set to none. The OpenStack CLI documentation says the --gateway switch has several options: the default is 'auto', or you can specify an IP address, or you can enter none. However, if I pass none via the Python module I get an error:
Invalid input for gateway_ip. Reason: 'none' is not a valid IP address.
The CLI commands that work look like this:
openstack network create NW-1-PVT
openstack subnet create NW-1-PVTv6 --subnet-range FD02:F160:02:06:0:10F:0:0/64 --no-dhcp --gateway none --ip-version 6 --network NW-1-PVT
When I use the Python module I get an error stating that the gateway IP is invalid, even though none is a valid gateway value (at least for the CLI).
# Create the network
netw = conn.network.create_network(
    name='NW-1-PVT'
)
# Create the subnet
subn = conn.network.create_subnet(
    name='NW-1-PVTv6',
    network_id=netw.id,
    ip_version='6',
    cidr='FD02:F160:02:06:0:10F:0:0/64',
    gateway_ip='none',
    is_dhcp_enabled=False
)
Any assistance is appreciated.
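For what it's worth (not a confirmed answer): the CLI translates --gateway none into a null gateway_ip in the Neutron request rather than the literal string 'none', so one thing worth trying with the SDK is passing the Python value None instead of the string. A sketch under that assumption:
# Sketch: pass the Python literal None (sent as JSON null), not the string 'none'.
subn = conn.network.create_subnet(
    name='NW-1-PVTv6',
    network_id=netw.id,
    ip_version='6',
    cidr='FD02:F160:02:06:0:10F:0:0/64',
    gateway_ip=None,          # None, not 'none'
    is_dhcp_enabled=False
)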
I am facing a problem to make my Apache Beam pipeline work on Cloud Dataflow, with DataflowRunner.
The first step of the pipeline is to connect to an external Postgresql server hosted on a VM which is only externally accessible through SSH, port 22, and extract some data. I can't change these firewalling rules, so I can only connect to the DB server via SSH tunneling, aka port-forwarding.
In my code I make use of the python library sshtunnel. It works perfectly when the pipeline is launched from my development computer with DirectRunner:
from sshtunnel import open_tunnel

with open_tunnel(
    (user_options.ssh_tunnel_host, user_options.ssh_tunnel_port),
    ssh_username=user_options.ssh_tunnel_user,
    ssh_password=user_options.ssh_tunnel_password,
    remote_bind_address=(user_options.dbhost, user_options.dbport)
) as tunnel:
    with beam.Pipeline(options=pipeline_options) as p:
        (p | "Read data" >> ReadFromSQL(
                host=tunnel.local_bind_host,
                port=tunnel.local_bind_port,
                username=user_options.dbusername,
                password=user_options.dbpassword,
                database=user_options.dbname,
                wrapper=PostgresWrapper,
                query=select_query
            )
            | "Format CSV" >> DictToCSV(headers)
            | "Write CSV" >> WriteToText(user_options.export_location)
        )
The same code, launched with DataflowRunner inside a non-default VPC (all ingress denied, no egress restrictions, CloudNAT configured), fails with this message:
psycopg2.OperationalError: could not connect to server: Connection refused Is the server running on host "0.0.0.0" and accepting TCP/IP connections on port 41697? [while running 'Read data/Read']
So, obviously something is wrong with my tunnel but I cannot spot what exactly. I was beginning to wonder whether a direct SSH tunnel setup was even possible through CloudNAT, until I found this blog post: https://cloud.google.com/blog/products/gcp/guide-to-common-cloud-dataflow-use-case-patterns-part-1 stating:
A core strength of Cloud Dataflow is that you can call external services for data enrichment. For example, you can call a micro service to get additional data for an element.
Within a DoFn, call-out to the service (usually done via HTTP). You have full control to make any type of connection that you choose, so long as the firewall rules you set up within your project/network allow it.
So it should be possible to set up this tunnel! I don't want to give up, but I don't know what to try next. Any ideas?
Thanks for reading
Problem solved! I can't believe I spent two full days on this... I was looking in completely the wrong direction.
The issue was not with some Dataflow or GCP networking configuration, and as far as I can tell...
You have full control to make any type of connection that you choose, so long as the firewall rules you set up within your project/network allow it
is true.
The problem was, of course, in my code: it was only revealed in a distributed environment. I had made the mistake of opening the tunnel from the main pipeline processor instead of from the workers. So the SSH tunnel was up, but not between the workers and the target server, only between the main pipeline and the target!
To fix this, I had to change my requesting DoFn to wrap the query execution with the tunnel:
class TunnelledSQLSourceDoFn(sql.SQLSourceDoFn):
    """Wraps SQLSourceDoFn in a ssh tunnel"""

    def __init__(self, *args, **kwargs):
        self.dbport = kwargs["port"]
        self.dbhost = kwargs["host"]
        self.args = args
        self.kwargs = kwargs
        super().__init__(*args, **kwargs)

    def process(self, query, *args, **kwargs):
        # Remote side of the SSH tunnel
        remote_address = (self.dbhost, self.dbport)
        ssh_tunnel = (self.kwargs['ssh_host'], self.kwargs['ssh_port'])
        with open_tunnel(
            ssh_tunnel,
            ssh_username=self.kwargs["ssh_user"],
            ssh_password=self.kwargs["ssh_password"],
            remote_bind_address=remote_address,
            set_keepalive=10.0
        ) as tunnel:
            forwarded_port = tunnel.local_bind_port
            self.kwargs["port"] = forwarded_port
            source = sql.SQLSource(*self.args, **self.kwargs)
            sql.SQLSouceInput._build_value(source, source.runtime_params)
            logging.info("Processing - {}".format(query))
            for records, schema in source.client.read(query):
                for row in records:
                    yield source.client.row_as_dict(row, schema)
As you can see, I had to override some bits of the pysql_beam library.
In the end, each worker opens its own tunnel for each request. It's probably possible to optimize this behavior, but it's enough for my needs.
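If the per-request tunnels ever become a bottleneck, one possible optimization (my sketch, not part of the original solution, assuming a Beam SDK recent enough to have DoFn.setup/teardown and the same sshtunnel parameters) is to open a single tunnel per DoFn instance in setup() and close it in teardown(), so each worker reuses one tunnel across bundles:
import apache_beam as beam
from sshtunnel import SSHTunnelForwarder

class TunnelPerWorkerDoFn(beam.DoFn):
    """Keeps one SSH tunnel per DoFn instance instead of one per element."""

    def __init__(self, ssh_host, ssh_port, ssh_user, ssh_password, db_host, db_port):
        self.ssh_host = ssh_host
        self.ssh_port = ssh_port
        self.ssh_user = ssh_user
        self.ssh_password = ssh_password
        self.db_host = db_host
        self.db_port = db_port
        self.tunnel = None

    def setup(self):
        # Runs once per DoFn instance on the worker, before any bundle.
        self.tunnel = SSHTunnelForwarder(
            (self.ssh_host, self.ssh_port),
            ssh_username=self.ssh_user,
            ssh_password=self.ssh_password,
            remote_bind_address=(self.db_host, self.db_port),
            set_keepalive=10.0,
        )
        self.tunnel.start()

    def process(self, query):
        # Query through 127.0.0.1:<local_bind_port>, e.g. by building the
        # SQLSource as in the snippet above; this yield is only a placeholder.
        yield (query, self.tunnel.local_bind_port)

    def teardown(self):
        # Runs when the DoFn instance is discarded.
        if self.tunnel is not None:
            self.tunnel.stop()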
I currently have a .csv file in an S3 bucket that I'd like to append to a table in a Redshift database using a Python script. I have a separate file parser and upload to S3 that work just fine.
The code I have for connecting to and copying into the table is below. I get the following error message:
OperationalError: (psycopg2.OperationalError) could not connect to server: Connection timed out (0x0000274C/10060)
Is the server running on host "redshift_cluster_name.unique_here.region.redshift.amazonaws.com" (18.221.51.45) and accepting
TCP/IP connections on port 5439?
I can confirm the following:
Port is 5439
Not encrypted
Cluster name/DB name/username/password are all correct
Publicly accessible set to "Yes"
What should I be fixing to make sure I can connect my file in S3 to Redshift? Thank you all for any help you can provide.
Also, I have looked around on Stack Overflow and ServerFault, but those posts either deal with MySQL to Redshift or their solutions (like the linked ServerFault CIDR solution) did not work.
Thank you for any help!
import sqlalchemy as sa
from sqlalchemy.orm import sessionmaker

DATABASE = "db"
USER = "user"
PASSWORD = "password"
HOST = "redshift_cluster_name.unique_here.region.redshift.amazonaws.com"
PORT = "5439"
SCHEMA = "public"
S3_FULL_PATH = 's3://bucket/file.csv'
ARN_CREDENTIALS = 'arn:aws:iam::aws_id:role/myRedshiftRole'
REGION = 'region'
############ CONNECTING AND CREATING SESSIONS ############
connection_string = f"redshift+psycopg2://{USER}:{PASSWORD}@{HOST}:{PORT}/{DATABASE}"
engine = sa.create_engine(connection_string)
session = sessionmaker()
session.configure(bind=engine)
s = session()
SetPath = f"SET search_path TO {SCHEMA}"
s.execute(SetPath)
###########################################################
############ RUNNING COPY ############
copy_command = f'''
copy category from '{S3_FULL_PATH}'
credentials 'aws_iam_role={ARN_CREDENTIALS}'
delimiter ',' region '{REGION}';
'''
s.execute(copy_command)
s.commit()
######################################
#################CLOSE SESSION################
s.close()
##############################################
Connecting via a Python program would require the same connectivity as connecting from an SQL Client.
I created a new cluster so I could document the process for you.
Here are the steps I took:
Created a VPC with a CIDR of 10.0.0.0/16. I didn't really need to create another VPC, but I wanted to avoid any problems with prior configurations.
Created a Subnet in the VPC with CIDR of 10.0.0.0/24.
Created an Internet Gateway and attached it to the VPC.
Edited the default Route Table to send 0.0.0.0/0 traffic to the Internet Gateway. (I'm only creating a public subnet, so I don't need a route table for a private subnet.)
Created a Redshift Cluster Subnet Group with the single subnet I created.
Launched a 1-node Redshift cluster into the Cluster Subnet Group. Publicly accessible = Yes, default Security Group.
Went back to the VPC console to edit the Default Security Group. Added an Inbound rule for Redshift from Anywhere.
Waited for the Cluster to become ready.
I then used DbVisualizer to log in to the database. Success!
The above steps made a publicly-available Redshift cluster and I connected to it from my computer on the Internet.
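Once the cluster is reachable like this, a minimal connectivity check from Python (a sketch only; the endpoint and credentials below are placeholders) could look like:
import psycopg2

# Placeholder endpoint and credentials; substitute your own values.
conn = psycopg2.connect(
    host="redshift_cluster_name.unique_here.region.redshift.amazonaws.com",
    port=5439,
    dbname="db",
    user="user",
    password="password",
    connect_timeout=10,
)
with conn.cursor() as cur:
    cur.execute("SELECT 1")
    print(cur.fetchone())
conn.close()
If this times out, it is still a networking problem (security group, route table, or public accessibility), not a COPY or SQLAlchemy issue.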
I am trying to get all the IPs (attached to VMs) from an azure subscription.
I have pulled all the VMs using
compute_client = ComputeManagementClient(credentials, subscription_id)
network_client = NetworkManagementClient(credentials,subscription_id)
for vm in compute_client.virtual_machines.list_all():
    print(vm.network_profile.network_interface)
But the network_profile object seems to only be a pointer. I have read through the documentation and cannot figure out how to link each VM to its attached IP addresses.
I came across this: Is there any python API which can get the IP address (internal or external) of Virtual machine in Azure
But it seems that something has changed.
I am able to resolve the IPs of a machine only if I know the name of the public IP address object (and not all of them have public IPs).
I need to be able to take this network_interface and resolve the IP on it
So it seems that in order to get the IPs, you need to parse the URI given in vm.network_profile.network_interfaces. Then use the resource group and the NIC name to get the IP with network_client.network_interfaces.get().
The code I used is below:
compute_client = ComputeManagementClient(credentials, subscription_id)
network_client = NetworkManagementClient(credentials, subscription_id)

def get_private(compute_client, network_client):
    for vm in compute_client.virtual_machines.list_all():
        for interface in vm.network_profile.network_interfaces:
            # The NIC name is the last segment of the resource ID;
            # segment 4 is the resource group it lives in.
            name = " ".join(interface.id.split('/')[-1:])
            group = "".join(interface.id.split('/')[4])
            try:
                thing = network_client.network_interfaces.get(group, name).ip_configurations
                for x in thing:
                    print(x.private_ip_address)
            except:
                print("nope")

try:
    get_private(compute_client, network_client)
except:
    print("Auth failed on " + subscription_id)
In this example you could also do x.public_ip_address to get the public IPs
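One caveat (my addition, not from the original answer): x.public_ip_address on an ip_configuration is usually just a reference to the PublicIPAddress resource, so resolving the literal address can take one more lookup. A sketch for the inner loop of get_private above, parsing the reference ID the same way as the NIC name:
for x in thing:
    print(x.private_ip_address)
    if x.public_ip_address is not None:
        pub_group = x.public_ip_address.id.split('/')[4]   # resource group
        pub_name = x.public_ip_address.id.split('/')[-1]   # public IP name
        pub = network_client.public_ip_addresses.get(pub_group, pub_name)
        print(pub.ip_address)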
As you said, indeed something has changed, but not much.
First, as below, NetworkManagementClientConfiguration has been removed; see the details in the link.
network_client = NetworkManagementClient(credentials,subscription_id)
Second, according to the source code, the parameter public_ip_address_name is the name of the subnet (in the sample below), no longer the VM name.
# Resource Group
GROUP_NAME = 'azure-sample-group-virtual-machines'
# Network
SUBNET_NAME = 'azure-sample-subnet'
PUBLIC_IP_NAME = SUBNET_NAME
public_ip_address = network_client.public_ip_addresses.get(GROUP_NAME, PUBLIC_IP_NAME)
Then you can also get the private_ip_address and public_ip_address via the IPConfiguration from the PublicIPAddress:
print(public_ip_address.ip_configuration.private_ip_address)
print(public_ip_address.ip_configuration.public_ip_address)
I am using the txloadbalancer Twisted API in my application and it works great. I have one problem though: I can't figure out a way to add hosts to a running instance.
I use this function for now:
# pm is a ProxyManager
def addServiceToPM(pm, service):
    if isinstance(service, model.HostMapper):
        [service] = model.convertMapperToModel([service])
    for groupName, group in pm.getGroups(service.name):
        proxiedHost = service.getGroup(groupName).getHosts()[0][1]
        pm.getGroup(service.name, groupName).addHost(proxiedHost)
        tracker = HostTracking(group)
        scheduler = schedulers.schedulerFactory(group.lbType, tracker)
        pm.addTracker(service.name, groupName, tracker)
and run it with a new host
addServiceToPM(pm, HostMapper(proxy='127.0.0.1:8080', lbType=roundr,
                              host='host2', address='127.0.0.1:10002'))
This adds the host correctly to the tracker, but not to the proxy service, so it is not used in the load balancing. Does anyone have a clue about how to do this?
So, I ended up staring at the source code until the answer appeared.
If you, as in my case, want to add a new host to an existing proxy and group, you use:
def addHostToLB(pm, proxy, group, newHost, newHostName):
    tracker = pm.getTracker(proxy, group)
    tracker.newHost(newHost, newHostName)
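For completeness, a call mirroring the question's setup might look like this (the group name and new host values are hypothetical placeholders):
# 'mygroup' is a placeholder for the group name configured on the proxy.
addHostToLB(pm, '127.0.0.1:8080', 'mygroup', '127.0.0.1:10003', 'host3')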