Python write to CSV - python

I want to export the user and last-login logs to a CSV file, but in the file I only receive the last line from the connection and not the whole SSH response.
import yaml
import os
import functools
import datetime

csv_file = open(filename, 'w+')
csv_file.write("%s,%s,%s,%s\n" % ('name', 'ssh_ec2user', 'ssh_centosuser', 'ssh_nginx_log'))
csv_file.flush()
for instance in running_instances:
    if (instance.tags == None or instance.tags == ""): continue
    for tag in instance.tags:
        if 'Name' in tag['Key']:
            name = tag['Value']
            print(name)
            instance_private_ip = (instance.private_ip_address)
            print(instance_private_ip)
            ssh_ec2user = os.system("ssh -t -t -i %s -n -o StrictHostKeyChecking=no ec2-user@%s 'sudo touch last.txt;sudo chmod 777 last.txt;sudo last > last.txt; sudo grep -v user last.txt |head -n3'" % (identity_file, instance_private_ip))
            ssh_centosuser = os.system("ssh -t -t -i %s -n -o StrictHostKeyChecking=no centos@%s 'sudo touch last.txt;sudo chmod 777 last.txt;sudo last > last.txt; sudo grep -v centos last.txt |head -n3'" % (identity_file, instance_private_ip))
            ssh_nginx_log = "test nginx"
            print(ssh_ec2user, ssh_centosuser, ssh_nginx_log)
            csv_file.write("\'%s\',\'%s\',\'%s\',\'%s\'\n" % (name, ssh_ec2user, ssh_centosuser, ssh_nginx_log))
            csv_file.flush()
For example, per line I need to receive:
user pts/0 172.21.0.114 Thu Jan 25 12:30 - 13:38 (01:08)
user pts/0 172.21.2.130 Wed Jan 17 15:11 - 15:17 (00:05)
user pts/0 172.21.2.130 Wed Jan 17 09:27 - 09:46 (00:18)
Connection to 1.1.1.1 closed.
65280 0
test nginx
and in the file I only receive:
65280 0
How can I get the whole answer onto the same line:
user pts/0 172.21.0.114 Thu Jan 25 12:30 - 13:38 (01:08)
user pts/0 172.21.2.130 Wed Jan 17 15:11 - 15:17 (00:05)
user pts/0 172.21.2.130 Wed Jan 17 09:27 - 09:46 (00:18)
Connection to 1.1.1.1 closed.
65280 0
Thanks.

Use the csv library.
import csv
writer = csv.writer(csv_file)
writer.writerow(['name' , 'ssh_ec2user' , 'ssh_centosuser' , 'ssh_nginx_log'])
...
writer.writerow([name,ssh_ec2user,ssh_centosuser,ssh_nginx_log])
The output will not be on the same line, but it will be correctly escaped, so if you open it with Excel, OpenOffice Calc, or a similar program it will be displayed correctly.
You can also have ',' characters in your strings without messing up the format.
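For completeness: os.system() returns the command's exit status (that is where the 65280 0 in the file comes from), not the text the command printed, so the SSH output never reaches the write() call. Below is a minimal sketch of capturing the output with subprocess and writing it as one quoted CSV field; ssh_capture, last_logs.csv, and the simplified remote command are illustrative, while identity_file, instance_private_ip, and name are the variables from the question's loop:
import csv
import subprocess

def ssh_capture(identity_file, user, host, remote_cmd):
    # Run a command over ssh and return its combined stdout/stderr as one string.
    cmd = ["ssh", "-i", identity_file, "-o", "StrictHostKeyChecking=no",
           "%s@%s" % (user, host), remote_cmd]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    out, _ = proc.communicate()
    return out.strip()

with open("last_logs.csv", "w") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["name", "ssh_ec2user", "ssh_centosuser", "ssh_nginx_log"])
    # ... inside the instance loop from the question:
    ssh_ec2user = ssh_capture(identity_file, "ec2-user", instance_private_ip,
                              "sudo last | grep -v user | head -n3")
    ssh_centosuser = ssh_capture(identity_file, "centos", instance_private_ip,
                                 "sudo last | grep -v centos | head -n3")
    # Newlines inside a field are fine: csv quotes the field, so each row stays one record.
    writer.writerow([name, ssh_ec2user, ssh_centosuser, "test nginx"])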


How to convert test log file to json in a prescribed way

I have a log file, shown below. I am trying to take the first server's details (192.168.1.1) and check when it connected and disconnected, then go to the second server's details (192.168.1.2) and check when it connected and disconnected. In the same way I need to determine the connect and disconnect times of all servers.
str_ = '''Jan 23 2016 11:30:08AM - ssh 22 192.168.1.1 connected
Jan 23 2016 12:04:56AM - ssh 22 192.168.1.2 connected
Jan 23 2016 2:18:32PM - ssh 22 192.168.1.2 disconnected
Jan 23 2016 5:16:09PM - un x Dos attack from 201.10.0.4
Jan 23 2016 10:43:44PM - ssh 22 192.168.1.1 disconnected
Feb 1 2016 1:40:28AM - ssh 22 192.168.1.1 connected
Feb 1 2016 2:21:52AM - un x Dos attack from 201.168.123.1
Mar 29 2016 2:13:07PM - ssh 22 192.168.1.1 disconnected'''
How can I convert my log file into JSON?
My expected output:
{1:{192.168.1.1:[(connected,Jan 23 2016 11:30:08AM),(disconnected,Jan 23 2016 10:43:44PM)]},
2:{192.168.1.2:[(connected,Jan 23 2016 12:04:56AM),(disconnected,Jan 23 2016 2:18:32PM)]},
3:{192.168.1.1:[(connected,Feb 1 2016 1:40:28AM),(disconnected,Mar 29 2016 2:13:07PM )]},
4:{Dos:[201.10.0.4,201.168.123.1]}}
My pseudocode:
import json
import re

i = 1
result = {}
with open('test.log') as f:
    lines = f.readlines()
    for line in lines:
        r = line.split(' ')
        #result[i] = {}
        i += 1
print(result)
with open('data.json', 'w') as fp:
    json.dump(result, fp)
Why do you need a dict keyed by entry numbers {1: xxx, 2: yyy, 3: zzz}? I'd advise using just a list instead - [xxx, yyy, zzz]. You can get an entry by index and so on. Technically, JSON can't use numbers as keys (see the small example after these notes).
There is no logic in your pseudocode to group connected and disconnected events.
Some lines of the log don't have connect/disconnect info, so you need some logic for that too.
lines = f.readlines(); for line in lines: may eat lots of memory for large log files; just use for line in f:
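A tiny check of the point about numeric keys:
import json

# Integer keys are silently coerced to strings when dumping...
print(json.dumps({1: 'xxx', 2: 'yyy'}))               # {"1": "xxx", "2": "yyy"}
# ...so they do not round-trip back to ints when loading.
print(json.loads(json.dumps({1: 'xxx', 2: 'yyy'})))   # {'1': 'xxx', '2': 'yyy'}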
So, I think you need something like:
import json
import re

result = []
opened = {}
with open('test.log') as f:
    for line in f:
        date, rest = line.split(' - ', 1)
        rest, last = rest.strip().rsplit(' ', 1)
        ip = rest.rsplit(' ', 1)[1]
        if last == 'connected':
            entry = {ip: [(last, date)]}
            opened[ip] = entry
            result.append(entry)
        elif last == 'disconnected':
            opened[ip][ip].append((last, date))
            del opened[ip]

print(result)
with open('data.json', 'w') as fp:
    json.dump(result, fp)
It works for your sample, but it needs more error checking for other logs.
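The expected output also contains a Dos entry; here is a sketch of one way to collect those too, under the same parsing assumptions (and still with minimal error checking):
import json

result = []
opened = {}
dos_ips = []

with open('test.log') as f:
    for line in f:
        date, rest = line.split(' - ', 1)
        rest, last = rest.strip().rsplit(' ', 1)
        if last == 'connected':
            ip = rest.rsplit(' ', 1)[1]
            entry = {ip: [(last, date)]}
            opened[ip] = entry
            result.append(entry)
        elif last == 'disconnected':
            ip = rest.rsplit(' ', 1)[1]
            opened[ip][ip].append((last, date))
            del opened[ip]
        elif 'Dos attack from' in rest:
            dos_ips.append(last)   # the token after 'from' is the attacker IP

if dos_ips:
    result.append({'Dos': dos_ips})

with open('data.json', 'w') as fp:
    json.dump(result, fp)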

Airflow BashOperator OSError: [Errno 2] No such file or directory

I keep getting the same error from a scheduled BashOperator that is currently back-filling (it's over a month "behind").
[2018-06-10 22:06:33,558] {base_task_runner.py:115} INFO - Running: ['bash', '-c', u'airflow run dag_name task_name 2018-03-14T00:00:00 --job_id 50 --raw -sd DAGS_FOLDER/dag_file.py']
Traceback (most recent call last):
File "/anaconda/bin//airflow", line 27, in <module>
args.func(args)
File "/anaconda/lib/python2.7/site-packages/airflow/bin/cli.py", line 387, in run
run_job.run()
File "/anaconda/lib/python2.7/site-packages/airflow/jobs.py", line 198, in run
self._execute()
File "/anaconda/lib/python2.7/site-packages/airflow/jobs.py", line 2512, in _execute
self.task_runner.start()
File "/anaconda/lib/python2.7/site-packages/airflow/task_runner/bash_task_runner.py", line 29, in start
self.process = self.run_command(['bash', '-c'], join_args=True)
File "/anaconda/lib/python2.7/site-packages/airflow/task_runner/base_task_runner.py", line 120, in run_command
universal_newlines=True
File "/anaconda/lib/python2.7/subprocess.py", line 394, in __init__
errread, errwrite)
File "/anaconda/lib/python2.7/subprocess.py", line 1047, in _execute_child
raise child_exception
OSError: [Errno 2] No such file or directory
[2018-06-10 22:06:33,633] {sequential_executor.py:47} ERROR - Failed to execute task Command 'airflow run dag_name task_name 2018-03-14T00:00:00 --local -sd /var/lib/airflow/dags/dag_file.py' returned non-zero exit status 1.
I remember seeing something that suggested this might be a permissions issue, but I can't figure out which permissions might be involved.
I'm using a systemd configuration--and at my wit's end--I've taken to running the airflow webserver and scheduler as root.
I can take the list in the first line and enter it verbatim in an ipython shell as args to a subprocess.Popen instance (as it is in airflow/task_runner/base_task_runner.py; save no envs) and not only does it run but it correctly informs the airflow db that the task is complete. I can do this as user Airflow, root, or ubuntu.
I've added /anaconda/bin to the PATH in .bashrc for Airflow, root, ubuntu, and /etc/bash.bashrc in addition to the value for AIRFLOW_HOME which is also in my env file /etc/airflow.
This is what my systemd entry looks like:
[Unit]
Description=Airflow scheduler daemon
After=network.target postgresql.service mysql.service redis.service rabbitmq-server.service
Wants=postgresql.service mysql.service redis.service rabbitmq-server.service
[Service]
EnvironmentFile=/etc/airflow
User=root
Group=root
Type=simple
ExecStart=/anaconda/bin/airflow scheduler
Restart=always
RestartSec=5s
[Install]
WantedBy=multi-user.target
My env file:
PATH=$PATH:/anaconda/bin/
AIRFLOW_HOME=/var/lib/airflow
AIRFLOW_CONFIG=$AIRFLOW_HOME/airflow.cfg
Using apache-airflow==1.9.0 and desperate for a solution. Thanks in advance.
Airflow.cfg:
[core]
airflow_home = /var/lib/airflow
dags_folder = /var/lib/airflow/dags
base_log_folder = /var/lib/airflow/logs
remote_log_conn_id =
encrypt_s3_logs = False
logging_level = INFO
logging_config_class =
log_format = [%%(asctime)s] {%%(filename)s:%%(lineno)d} %%(levelname)s - %%(message)s
simple_log_format = %%(asctime)s %%(levelname)s - %%(message)s
executor = SequentialExecutor
sql_alchemy_conn = {actual value hidden}
sql_alchemy_pool_size = 5
sql_alchemy_pool_recycle = 3600
parallelism = 4
dag_concurrency = 2
dags_are_paused_at_creation = True
non_pooled_task_slot_count = 16
max_active_runs_per_dag = 1
load_examples = False
plugins_folder = /var/lib/airflow/plugins
fernet_key = {actual value hidden}
donot_pickle = False
dagbag_import_timeout = 30
task_runner = BashTaskRunner
default_impersonation =
security =
unit_test_mode = False
task_log_reader = file.task
enable_xcom_pickling = True
killed_task_cleanup_time = 60
[cli]
api_client = airflow.api.client.local_client
endpoint_url = http://localhost:8080
[api]
auth_backend = airflow.api.auth.backend.default
[operators]
default_owner = root
default_cpus = 1
default_ram = 512
default_disk = 512
default_gpus = 0
[webserver]
base_url = http://localhost:8080
web_server_host = 0.0.0.0
web_server_port = 8080
web_server_ssl_cert =
web_server_ssl_key =
web_server_worker_timeout = 120
worker_refresh_batch_size = 1
worker_refresh_interval = 60
secret_key = temporary_key
workers = 1
worker_class = sync
access_logfile = -
error_logfile = -
expose_config = False
authenticate = False
filter_by_owner = False
owner_mode = user
dag_default_view = tree
dag_orientation = LR
demo_mode = False
log_fetch_timeout_sec = 5
hide_paused_dags_by_default = False
page_size = 100
[email]
email_backend = airflow.utils.email.send_email_smtp
[smtp]
smtp_host = localhost
smtp_starttls = True
smtp_ssl = False
smtp_port = 25
smtp_mail_from = airflow@example.com
[celery]
...
[dask]
cluster_address = 127.0.0.1:8786
[scheduler]
job_heartbeat_sec = 120
scheduler_heartbeat_sec = 120
run_duration = -1
min_file_process_interval = 0
dag_dir_list_interval = 300
print_stats_interval = 300
child_process_log_directory = /var/lib/airflow/logs/scheduler
scheduler_zombie_task_threshold = 900
catchup_by_default = True
max_tis_per_query = 0
statsd_on = False
statsd_host = localhost
statsd_port = 8125
statsd_prefix = airflow
max_threads = 1
authenticate = False
[ldap]
...
[mesos]
...
[kerberos]
...
[github_enterprise]
...
[admin]
hide_sensitive_variable_fields = True
Adding ls -hal output:
root@ubuntu:/var/lib/airflow# ls -hal /var
total 52K
drwxr-xr-x 13 root root 4.0K Jun 3 11:58 .
root@ubuntu:/var/lib/airflow# ls -hal /var/lib
total 164K
drwxr-xr-x 42 root root 4.0K Jun 10 19:00 .
root@ubuntu:/var/lib/airflow# ls -hal
total 40K
drwxr-xr-x 4 airflow airflow 4.0K Jun 11 06:41 .
drwxr-xr-x 42 root root 4.0K Jun 10 19:00 ..
-rw-r--r-- 1 airflow airflow 13K Jun 11 06:41 airflow.cfg
-rw-r--r-- 1 airflow airflow 579 Jun 10 19:00 airflow.conf
drwxr-xr-x 2 airflow airflow 4.0K Jun 10 21:27 dags
drwxr-xr-x 4 airflow airflow 4.0K Jun 10 20:31 logs
-rw-r--r-- 1 airflow airflow 1.7K Jun 10 19:00 unittests.cfg
root@ubuntu:/var/lib/airflow# ls -hal dags/
total 16K
drwxr-xr-x 2 airflow airflow 4.0K Jun 10 21:27 .
drwxr-xr-x 4 airflow airflow 4.0K Jun 11 06:41 ..
-rw-r--r-- 1 airflow airflow 3.4K Jun 10 21:26 dag_file.py
-rw-r--r-- 1 airflow airflow 1.7K Jun 10 21:27 dag_file.pyc
and contents of dag_file.py:
import airflow
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta
default_args = {
'owner': 'root',
'run_as': 'root',
'depends_on_past': True,
'start_date': datetime(2018, 2, 20),
'email': ['myemail@gmail.com'],
'email_on_failure': False,
'email_on_retry': False,
'retries': 1,
'retry_delay': timedelta(minutes=5),
'end_date': datetime(2018, 11, 15),
}
env = {
'PSQL': '{obscured}',
'PATH': '/anaconda/bin/:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin',
'PWD': '/home/ubuntu/{obs1}/',
'HOME': '/home/ubuntu',
'PYTHONPATH': '/home/ubuntu/{obs1}',
}
dag = DAG(
'dag_name',
default_args=default_args,
description='',
schedule_interval=timedelta(days=1))
t1 = BashOperator(
env=env,
task_id='dag_file',
bash_command='export PYTHONPATH=/home/ubuntu/{obs1} && /anaconda/bin/ipython $PYTHONPATH/{obs2}/{obs3}.py {{ ds }}',
dag=dag)
And I remind you that this runs correctly as airflow, root, and ubuntu: airflow run dag_name dag_file 2018-03-17T00:00:00 --job_id 55 --raw -sd DAGS_FOLDER/dag_file.py
It looks like a Python version mismatch; edit your .bashrc with the proper Python version and run:
source .bashrc
This should resolve your issue.
In my case we use:
export PATH="/opt/miniconda3/bin":$PATH
To check, you can also run airflow directly with the matching interpreter:
/opt/miniconda3/bin/python /opt/miniconda3/bin/airflow
This is how I used to run airflow.
On Airflow v1.10.0 you just specify the file path, without the trailing space anymore.
Example:
compact_output_task = BashOperator(**{
'task_id': 'compact_output',
'bash_command': './compact_output.sh',
'xcom_push': True,
})
Systemd's EnvironmentFile won't expand the variables inside it, so your PATH will only end up pointing at /anaconda/bin. If you just want to extend your PATH, it's better to use:
ExecStart=/bin/bash -c 'PATH=/path/to/venv/bin/:$PATH exec /path/to/airflow scheduler'
This solved my "No such file or directory" issue, because airflow couldn't find the binary that I was calling inside my bash operator.
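In the unit file from the question, that change would look roughly like this (a sketch; the paths are the ones the question already uses):
[Service]
EnvironmentFile=/etc/airflow
User=root
Group=root
Type=simple
# Let a shell expand PATH, since systemd does not expand $PATH inside EnvironmentFile
ExecStart=/bin/bash -c 'PATH=/anaconda/bin:$PATH exec /anaconda/bin/airflow scheduler'
Restart=always
RestartSec=5s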

Bad HTTP response returned from the server. Code 500

I have a problem using pywinrm on Linux to get a PowerShell session.
I have read several posts and questions about this, but none of them solved my problem.
The error is in the Kerberos authentication. This is my krb5.conf:
[libdefaults]
default_realm = DOMAIN.COM.BR
ticket_lifetime = 24000
clock-skew = 300
dns_lookup_kdc = true

# [realms]
# LABCORP.CAIXA.GOV.BR = {
#   kdc = DOMAIN.COM.BR
#   kdc = DOMAIN.COM.BR
#   admin_server = DOMAIN.COM.BR
#   default_domain = DOMAIN.COM.BR
# }

[logging]

default = FILE:/var/log/krb5libs.log
kdc = FILE:/var/log/krb5kdc.log
admin_server = FILE:/var/log/kadmind.log

# [domain_realm]
# .DOMAIN.COM.BR = DOMAIN.COM.BR
# server.com = DOMAIN.COM.BR
My /etc/resolv.conf is:
search DOMAIN.COM.BR
nameserver IP
And my python code is:
import winrm
s = winrm.Session(
    'DOMAIN.COM.BR',
    transport='kerberos',
    auth=('my_active_directory_user', 'my_active_directory_password'),
    server_cert_validation='ignore')
r = s.run_cmd('ipconfig', ['/all'])
And the server return this error:
winrm.exceptions.WinRMTransportError: ('http', 'Bad HTTP response returned from server. Code 500')
The port of the server is open. I see with nmap:
5985/tcp open wsman
I can ping and resolv the name of the server:
$ ping DOMAIN.COM.BR
PING DOMAIN.COM.BR (IP) 56(84) bytes of data.
64 bytes from IP: icmp_seq=2 ttl=127 time=0.410 ms
64 bytes from IP: icmp_seq=2 ttl=127 time=0.410 ms
I can use kinit without problem to get the ticket:
$ kinit my_active_directory_user@DOMAIN.COM.BR
And, list the ticket:
$ klist
Ticket cache: FILE:/tmp/krb5cc_1000
Default principal: my_active_directory_user@DOMAIN.COM.BR
Valid starting Expires Service principal
05-09-2017 10:23:52 05-09-2017 17:03:50 krbtgt/DOMAIN.COM.BR@DOMAIN.COM.BR
What kind of problem is that?
Another solution is to add this line with allow_weak_crypto to your krb5.conf file:
[libdefaults]
***
allow_weak_crypto = true
***

python + run system command with variables

I need to run a system command from Python.
I have Python version 2.4.3.
I tried the following; in this example, ls -ltr | grep Aug:
#!/usr/bin/python
import commands
Month = "Aug"
status,output = commands.getstatusoutput(" ls -ltr | grep Month " )
print output
How do I insert the Month variable into the command, so that grep does this:
| grep Aug
I also tried this:
status,output = commands.getstatusoutput( " ls -ltr | grep {} ".format(Month) )
but I get the following error
Traceback (most recent call last):
File "./stamm.py", line 14, in ?
status,output = commands.getstatusoutput( " ls -ltr | grep {} ".format(Month) )
AttributeError: 'str' object has no attribute 'format'
str.format() was only added in Python 2.6, so it is not available on 2.4; use string concatenation or %-formatting instead:
import commands
Month = "Aug"
status,output = commands.getstatusoutput(" ls -ltr | grep '" + Month + "'")
print output
Or a couple of other possibilities are:
status,output = commands.getstatusoutput("ls -ltr | grep '%s'" % Month)
or
status,output = commands.getstatusoutput(" ls -ltr | grep \"" + Month + "\"")
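If the variable might ever contain spaces or shell metacharacters, it is safer to quote it before building the command string. A small sketch (assuming pipes.quote is present on your 2.4 install; it has shipped with the pipes module for a long time, though it was only documented later):
import commands
import pipes  # pipes.quote on old Python 2; shlex.quote on Python 3

Month = "Aug"
safe = pipes.quote(Month)
status, output = commands.getstatusoutput("ls -ltr | grep %s" % safe)
print output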
You don't need to run the shell; the subprocess module is available in Python 2.4:
#!/usr/bin/env python
from subprocess import Popen, PIPE
Month = "Aug"
grep = Popen(['grep', Month], stdin=PIPE, stdout=PIPE)
ls = Popen(['ls', '-ltr'], stdout=grep.stdin)
output = grep.communicate()[0]
statuses = [ls.wait(), grep.returncode]
See How do I use subprocess.Popen to connect multiple processes by pipes?
Note: you could implement it in pure Python:
#!/usr/bin/env python
import os
from datetime import datetime
def month(filename):
    return datetime.fromtimestamp(os.path.getmtime(filename)).month
Aug = 8
files = [f for f in os.listdir('.') if month(f) == Aug]
print(files)
See also, How do you get a directory listing sorted by creation date in python?

stdout and stderror from fabric task wrapped in celery

I'm running a simple task, triggered from a Django view:
task = mock_deploy.delay()
mock_deploy is defined as:
from celery.decorators import task as ctask
from project.fabscripts.task.mock import *

@ctask(name="mock_deploy")
def mock_deploy():
    print "hi form celery task b4 mockdeploy 1234"
    output = execute(mock_deploy2)
    return "out: %s" % (output)
And the fabric task itself is defined as:
@task
def mock_deploy2():
    lrun("ls -l /")
    lrun("ifconfig eth0")
    # i need to get the full output from those commands and save them to db
And now... I was trying to substitute stdout by overwriting fabric's execute function:
def execute(task):
    output = StringIO()
    error = StringIO()
    sys.stdout = output
    sys.stderr = error
    task()
    sys.stdout = sys.__stdout__
    sys.stderr = sys.__stderr__
    return (output.getvalue(), error.getvalue())
I also tried substituting stdout within the fabric task. No matter what I did, the only output I was getting was the first line of "what fabric wants to do":
out: [localhost] local: ls -l /
Then the whole output of the ls command was printed perfectly fine in the celery log, except for the one missing line out: [localhost] local: ls -l / (the one I managed to get as output).
[2012-06-14 21:33:56,587: DEBUG/MainProcess] TaskPool: Apply <function execute_and_trace at 0x36710c8> (args:('mock_deploy', '2a90d920-130a-4942-829b-87f4d5ebe80f', [], {}) kwargs:{'hostname': 's16079364', 'request': {'retries': 0, 'task': 'mock_deploy', 'utc': False, 'loglevel': 10, 'delivery_info': {'routing_key': u'celery', 'exchange': u'celery'}, 'args': [], 'expires': None, 'is_eager': False, 'eta': None, 'hostname': 's16079364', 'kwargs': {}, 'logfile': None, 'id': '2a90d920-130a-4942-829b-87f4d5ebe80f'}})
[2012-06-14 21:33:56,591: DEBUG/MainProcess] Task accepted: mock_deploy[2a90d920-130a-4942-829b-87f4d5ebe80f] pid:22214
hi form celery task b4 mockdeploy 1234
total 3231728
-rw-r--r-- 1 root root 3305551148 2012-06-13 14:43 dumpling.sql
drwxr-xr-x 2 root root 4096 2012-05-09 17:42 bin
drwxr-xr-x 4 root root 4096 2012-02-14 15:21 boot
drwxr-xr-x 2 root root 4096 2012-03-09 14:10 build
drwxr-xr-x 2 root root 4096 2010-05-11 19:58 cdrom
-rw------- 1 root root 2174976 2012-05-23 11:23 core
drwxr-xr-x 15 root root 4080 2012-06-11 12:55 dev
drwxr-xr-x 135 root root 12288 2012-06-14 21:15 etc
drwxr-xr-x 6 root root 77 2012-05-21 14:41 home
...
A horrible, horrible workaround would be wrapping the fabric run command to add a "> /tmp/logfile.log" to each command, and then retrieving the file with scp once the task is finished...
My question, in short, is: how do I get the full output of a fabric task when it is triggered with celery?
The following did the trick:
@ctask(name="mock_deploy")
def mock_deploy():
    env.roledefs.update({'remote': ['root@1.1.1.1']})
    output = StringIO()
    sys.stdout = output
    execute(mock_deploy2)
    sys.stdout = sys.__stdout__
    return output.getvalue()
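One caveat with that approach: if execute() raises, sys.stdout stays pointed at the StringIO for the rest of the worker process. A slightly safer variant of the same idea, restoring stdout in a finally block (a sketch; ctask, env, execute, and mock_deploy2 are the same names used in the snippets above):
import sys
from StringIO import StringIO  # Python 2, matching the rest of this stack

@ctask(name="mock_deploy")
def mock_deploy():
    env.roledefs.update({'remote': ['root@1.1.1.1']})
    output = StringIO()
    sys.stdout = output
    try:
        execute(mock_deploy2)
    finally:
        # Always restore the real stdout, even if the fabric task fails.
        sys.stdout = sys.__stdout__
    return output.getvalue()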
