Python wildcard search - python

I have a Lambda python function that I inherited which searches and reports on installed packages on EC2 instances. It pulls this information from SSM Inventory where the results are output to an S3 bucket. All of the installed packages have specific names until now. Now we need to report on Palo Alto Cortex XDR. The issue I'm facing is that this product includes the version number in the name and we have different versions installed. If I use the exact name (i.e. Cortex XDR 7.8.1.11343) I get reporting on that particular version but not others. I want to use a wild card to do this. I import regex (import re) on line 7 and then I change line 71 to xdr=line['Cortex*']) but it gives me the following error. I'm a bit new to Python and coding so any explanation as to what I'm doing wrong would be helpful.
File "/var/task/SoeSoftwareCompliance/RequiredSoftwareEmail.py", line 72, in build_html
xdr=line['Cortex*'])
import configparser
import logging
import csv
import json
from jinja2 import Template
import boto3
import re
# config
config = configparser.ConfigParser()
config.read("config.ini")
# logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
# #TODO
# refactor common_csv_header so that we use one with variable
# so that we write all content to one template file.
def build_html(account=None,
ses_email_address=None,
recipient_email=None):
"""
:param recipient_email:
:param ses_email_address:
:param account:
"""
account_id = account["id"]
account_alias = account["alias"]
linux_ec2s = []
windows_ec2s = []
ec2s_not_in_ssm = []
excluded_ec2s = []
# linux ec2s html
with open(f"/tmp/{account_id}_linux_ec2s_required_software_report.csv", "r") as fp:
lines = csv.DictReader(fp)
for line in lines:
if line["platform-type"] == "Linux":
item = dict(id=line['instance-id'],
name=line['instance-name'],
ip=line['ip-address'],
ssm=line['amazon-ssm-agent'],
cw=line['amazon-cloudwatch-agent'],
ch=line['cloudhealth-agent'])
# skip compliant linux ec2s where are values are found
compliance_status = not all(item.values())
if compliance_status:
linux_ec2s.append(item)
# windows ec2s html
with open(f"/tmp/{account_id}_windows_ec2s_required_software_report.csv", "r") as fp:
lines = csv.DictReader(fp)
for line in lines:
if line["platform-type"] == "Windows":
item = dict(id=line['instance-id'],
name=line['instance-name'],
ip=line['ip-address'],
ssm=line['Amazon SSM Agent'],
cw=line['Amazon CloudWatch Agent'],
ch=line['CloudHealth Agent'],
mav=line['McAfee VirusScan Enterprise'],
trx=line['Trellix Agent'],
xdr=line['Cortex*'])
# skip compliant windows ec2s where are values are found
compliance_status = not all(item.values())
if compliance_status:
windows_ec2s.append(item)
# ec2s not found in ssm
with open(f"/tmp/{account_id}_ec2s_not_in_ssm.csv", "r") as fp:
lines = csv.DictReader(fp)
for line in lines:
item = dict(name=line['instance-name'],
id=line['instance-id'],
ip=line['ip-address'],
pg=line['patch-group'])
ec2s_not_in_ssm.append(item)
# display or hide excluded ec2s from report
display_excluded_ec2s_in_report = json.loads(config.get("settings", "display_excluded_ec2s_in_report"))
if display_excluded_ec2s_in_report == "true":
with open(f"/tmp/{account_id}_excluded_from_compliance.csv", "r") as fp:
lines = csv.DictReader(fp)
for line in lines:
item = dict(id=line['instance-id'],
name=line['instance-name'],
pg=line['patch-group'])
excluded_ec2s.append(item)
# pass data to html template
with open('templates/email.html') as file:
template = Template(file.read())
# pass parameters to template renderer
html = template.render(
linux_ec2s=linux_ec2s,
windows_ec2s=windows_ec2s,
ec2s_not_in_ssm=ec2s_not_in_ssm,
excluded_ec2s=excluded_ec2s,
account_id=account_id,
account_alias=account_alias)
# consolidated html with multiple tables
tables_html_code = html
client = boto3.client('ses')
client.send_email(
Destination={
'ToAddresses': [recipient_email],
},
Message={
'Body': {
'Html':
{'Data': tables_html_code}
},
'Subject': {
'Charset': 'UTF-8',
'Data': f'SOE | Software Compliance | {account_alias}',
},
},
Source=ses_email_address,
)
print(tables_html_code)

If I understand your problem correctly, you are getting a KeyError exception because Python does not support wildcards out of the box. A csv.DictReader creates a standard Python dictionary for each row in csv. Python's dictionary is just an associative array without pattern matching.
You can implement this by regex, though. If you have a dictionary line and you don't know the full name of a key you are looking for, you can solve it by re.search function.
line = {'Cortex XDR 7.8.1.11343': 'Some value you are looking for'}
val = next(v for k, v in line.items() if re.search('Cortex.+', k))
print(val) # 'Some value you are looking for'
Be aware that this assumes that a line dictionary contains at least one item that matches the 'Cortex.+' pattern and returns the first match. You would have to refactor this a bit to change this.

1. import os - missing in the code
2. def build_html(account=None -> When the account is pass with Nonetype and below error will thrown in account["id"] and account["alias"].
Ex:
Traceback (most recent call last):
File "C:\Users\pro\Documents\project\pywilds.py", line 134, in <module>
build_html(account=None)
File "C:\Users\pro\Documents\project\pywilds.py", line 33, in build_html
account_id = account["id"]
TypeError: 'NoneType' object is not subscriptable
I hope it helps..

Related

Saving file format after editing it with ConfigParser

i am using ConfigParser to write some modification in a configuration file, basically what i am doing is :
retrieve my urls from an api
write them in my config file
but after the edit, i noticed that the file format has changed :
Before the edit :
[global_tags]
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 10000
[[inputs.cpu]]
percpu = true
totalcpu = true
[[inputs.prometheus]]
urls= []
interval = "140s"
[inputs.prometheus.tags]
exp = "exp"
After the edit :
[global_tags]
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 10000
[[inputs.cpu]
percpu = true
totalcpu = true
[[inputs.prometheus]
interval = "140s"
response_timeout = "120s"
[inputs.prometheus.tags]
exp = "snmp"
the offset changed and all the comments that were in the file has been deleted, my code :
edit = configparser.ConfigParser(strict=False, allow_no_value=True, empty_lines_in_values=False)
edit.read("file.conf")
edit.set("[section]", "urls", str(urls))
print(edit)
# Write changes back to file
with open('file.conf', 'w') as configfile:
edit.write(configfile)
I have already tried : SafeConfigParser, RawConfigParser but it doesn't work.
when i do a print(edit.section()), here is what i get : ['global_tags', 'agent', '[inputs.cpu', , '[inputs.prometheus', 'inputs.prometheus.tags']
Is there any help please ?
Here's an example of a "filter" parser that retains all other formatting but changes the urls line in the agent section if it comes across it:
import io
def filter_config(stream, item_filter):
"""
Filter a "config" file stream.
:param stream: Text stream to read from.
:param item_filter: Filter function; takes a section and a line and returns a filtered line.
:return: Yields (possibly) filtered lines.
"""
current_section = None
for line in stream:
stripped_line = line.strip()
if stripped_line.startswith('['):
current_section = stripped_line.strip('[]')
elif not stripped_line.startswith("#") and " = " in stripped_line:
line = item_filter(current_section, line)
yield line
def urls_filter(section, line):
if section == "agent" and line.strip().startswith("urls = "):
start, sep, end = line.partition(" = ")
return start + sep + "hi there..."
return line
# Could be a disk file, just using `io.StringIO()` for self-containedness here
config_file = io.StringIO("""
[global_tags]
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 10000
# HELLO! THIS IS A COMMENT!
metric_buffer_limit = 100000
urls = ""
[other]
urls = can't touch this!!!
""")
for line in filter_config(config_file, urls_filter):
print(line, end="")
The output is
[global_tags]
[agent]
interval = "10s"
round_interval = true
metric_batch_size = 10000
# HELLO! THIS IS A COMMENT!
metric_buffer_limit = 100000
urls = hi there...
[other]
urls = can't touch this!!!
so you can see all comments and (mis-)indentation was preserved.
The problem is that you're passing brackets with the section name, which is unnecessary:
edit.set("[section]", "urls", str(urls))
See this example from the documentation:
import configparser
config = configparser.RawConfigParser()
# Please note that using RawConfigParser's set functions, you can assign
# non-string values to keys internally, but will receive an error when
# attempting to write to a file or when you get it in non-raw mode. Setting
# values using the mapping protocol or ConfigParser's set() does not allow
# such assignments to take place.
config.add_section('Section1')
config.set('Section1', 'an_int', '15')
config.set('Section1', 'a_bool', 'true')
config.set('Section1', 'a_float', '3.1415')
config.set('Section1', 'baz', 'fun')
config.set('Section1', 'bar', 'Python')
config.set('Section1', 'foo', '%(bar)s is %(baz)s!')
# Writing our configuration file to 'example.cfg'
with open('example.cfg', 'w') as configfile:
config.write(configfile)
But, anyway, it won't preserve the identation, nor will it support nested sections; you could try the YAML format, which does allow to use indentation to separate nested sections, but it won't keep the exact same indentation when saving, but, do you really need it to be the exact same? Anyway, there are various configuration formats out there, you should study them to see what fits your case better.

Running Google Cloud DocumentAI sample code on Python returned the error 503

I am trying the example from the Google repo:
https://github.com/googleapis/python-documentai/blob/HEAD/samples/snippets/quickstart_sample.py
I have an error:
metadata=[('x-goog-request-params', 'name=projects/my_proj_id/locations/us/processors/my_processor_id'), ('x-goog-api-client', 'gl-python/3.8.10 grpc/1.38.1 gax/1.30.0 gapic/1.0.0')]), last exception: 503 DNS resolution failed for service: https://us-documentai.googleapis.com/v1/
My full code:
from google.cloud import documentai_v1 as documentai
import os
# TODO(developer): Uncomment these variables before running the sample.
project_id= '123456789'
location = 'us' # Format is 'us' or 'eu'
processor_id = '1a23345gh823892' # Create processor in Cloud Console
file_path = 'document.jpg'
os.environ['GRPC_DNS_RESOLVER'] = 'native'
def quickstart(project_id: str, location: str, processor_id: str, file_path: str):
# You must set the api_endpoint if you use a location other than 'us', e.g.:
opts = {}
if location == "eu":
opts = {"api_endpoint": "eu-documentai.googleapis.com"}
client = documentai.DocumentProcessorServiceClient(client_options=opts)
# The full resource name of the processor, e.g.:
# projects/project-id/locations/location/processor/processor-id
# You must create new processors in the Cloud Console first
name = f"projects/{project_id}/locations/{location}/processors/{processor_id}:process"
# Read the file into memory
with open(file_path, "rb") as image:
image_content = image.read()
document = {"content": image_content, "mime_type": "image/jpeg"}
# Configure the process request
request = {"name": name, "raw_document": document}
result = client.process_document(request=request)
document = result.document
document_pages = document.pages
# For a full list of Document object attributes, please reference this page: https://googleapis.dev/python/documentai/latest/_modules/google/cloud/documentai_v1beta3/types/document.html#Document
# Read the text recognition output from the processor
print("The document contains the following paragraphs:")
for page in document_pages:
paragraphs = page.paragraphs
for paragraph in paragraphs:
print(paragraph)
paragraph_text = get_text(paragraph.layout, document)
print(f"Paragraph text: {paragraph_text}")
def get_text(doc_element: dict, document: dict):
"""
Document AI identifies form fields by their offsets
in document text. This function converts offsets
to text snippets.
"""
response = ""
# If a text segment spans several lines, it will
# be stored in different text segments.
for segment in doc_element.text_anchor.text_segments:
start_index = (
int(segment.start_index)
if segment in doc_element.text_anchor.text_segments
else 0
)
end_index = int(segment.end_index)
response += document.text[start_index:end_index]
return response
def main ():
quickstart (project_id = project_id, location = location, processor_id = processor_id, file_path = file_path)
if __name__ == '__main__':
main ()
FYI, on the Google Cloud website it stated that the endpoint is:
https://us-documentai.googleapis.com/v1/projects/123456789/locations/us/processors/1a23345gh823892:process
I can use the web interface to run DocumentAI so it is working. I just have the problem with Python code.
Any suggestion is appreciated.
I would suspect the GRPC_DNS_RESOLVER environment variable to be the root cause. Did you try with the corresponding line commented out? Why was it added in your code?

How to evaluate CDK `CfnOutput` (CloudFormation output) values?

I'm writing a tool that utilitizes CloudFormation outputs after a cdk deploy and then sets up the development environment with config files based on those outputs.
At the end of each core infrastructure component (auth, db, webapp, storage, etc.), I have a CfnOutput construct like the following:
cdk.CfnOutput(
self, 'UserPoolID',
value=self.user_pool.user_pool_id,
)
Which outputs something like
Stack.AuthUserPoolIDABC1234 = s1lvgk44ul23ahfd91p4rdngnf
My goal is to get that value (s1lvgk44ul23ahfd91p4rdngnf) into a configuration file config.js, along with other values from other CloudFormation outputs.
So I wrote a wrapper around CfnOutput like the following:
import os
def cfn_output(scope, prefix, name, value):
cdk.CfnOutput(
scope, name,
value=value,
)
# Save name and value to flat files so that we can read them in other processes
os.makedirs('.tmp', exist_ok=True)
with open(os.path.join('.tmp', f'{prefix}{name}.txt'), 'w') as f:
f.write(value)
And so I used it instead of CfnOutput like so:
cfn_output(
scope=self,
prefix='Auth',
name='UserPoolID',
value=self.user_pool.user_pool_id
)
When I run cdk synth, the file generated (.tmp/AuthUserPoolID.txt) has this content:
${Token[TOKEN.249]}
which is obviously not s1lvgk44ul23ahfd91p4rdngnf as a I expected.
Any solutions or workarounds to getting that token resolved into something usable, or perhaps a different solution altogether?
Instead I decided to use the SDK to get the evaluated outputs from the CloudFormation stack.
# Prepare
cloudformation = boto3.client('cloudformation')
stack_name = 'Stack'
# Get stack outputs
res = cloudformation.describe_stacks(StackName=stack_name)
outputs = res['Stacks'][0]['Outputs']
mp = {
'ApiURL': '',
'AuthUserPoolClientID': '',
'AuthUserPoolID': '',
'DatabaseName': '',
'StorageHostingBucketName': '',
'WebappURL': '',
}
# Parse stack output names
for output in outputs:
ok = output['OutputKey']
ov = output['OutputValue']
for k in mp:
if ok.startswith(k):
mp[k] = ov
# Generate config.js data
data = {
'endpoint': mp['ApiURL'],
'userPoolId': mp['AuthUserPoolID'],
'userPoolClientId': mp['AuthUserPoolClientID'],
}
json_data = json.dumps(data, separators=(',', ':'))
text = f'window.config={json_data}'
# Write ac.js
configjs = os.path.join(os.path.dirname(__file__), '../static/config.js')
with open(configjs, 'w') as f:
f.write(text)

Using jsonpath in api.ai in case we have already our json file to query in?

import json
def makeWebhookResult(req):
if req.get("result").get("action") != "Phdapp":
return {}
result = req.get("result")
parameters = result.get("parameters")
Progr = parameters.get("PhDsubjects")
time = parameters.get("PhdTime")
Levp = parameters.get("PhDDegLevp")
with open('Sheet1.json') as f:
data = f.read()
jsondata = json.loads(data)
match = jsonpath.jsonpath(jsondata,
'$.features[[?(#.ProgramName == Progr && #.Level == Levp && #.StartDate == time)]].UniversityName')
speech = "This is the universities you were looking for " + match
This is the part of my python code which have errors i can't figure it out, I have an intent with action which is "Phdapp" with three parameters that i need to use their values in my jsonpath querying from "sheet1.json" file in the same repository on GitHub in json format. But i can't get data from my intent neither accessing my son file for querying...is it because api.ai is not compatible with jsonpath or it is the problem of my code! or if there is a best to use which easier it can be my pleasure to know it. Thanks

Fetching language detection from Google api

I have a CSV with keywords in one column and the number of impressions in a second column.
I'd like to provide the keywords in a url (while looping) and for the Google language api to return what type of language was the keyword in.
I have it working manually. If I enter (with the correct api key):
http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&key=myapikey&q=merde
I get:
{"responseData": {"language":"fr","isReliable":false,"confidence":6.213709E-4}, "responseDetails": null, "responseStatus": 200}
which is correct, 'merde' is French.
so far I have this code but I keep getting server unreachable errors:
import time
import csv
from operator import itemgetter
import sys
import fileinput
import urllib2
import json
E_OPERATION_ERROR = 1
E_INVALID_PARAMS = 2
#not working
def parse_result(result):
"""Parse a JSONP result string and return a list of terms"""
# Deserialize JSON to Python objects
result_object = json.loads(result)
#Get the rows in the table, then get the second column's value
# for each row
return row in result_object
#not working
def retrieve_terms(seedterm):
print(seedterm)
"""Retrieves and parses data and returns a list of terms"""
url_template = 'http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&key=myapikey&q=%(seed)s'
url = url_template % {"seed": seedterm}
try:
with urllib2.urlopen(url) as data:
data = perform_request(seedterm)
result = data.read()
except:
sys.stderr.write('%s\n' % 'Could not request data from server')
exit(E_OPERATION_ERROR)
#terms = parse_result(result)
#print terms
print result
def main(argv):
filename = argv[1]
csvfile = open(filename, 'r')
csvreader = csv.DictReader(csvfile)
rows = []
for row in csvreader:
rows.append(row)
sortedrows = sorted(rows, key=itemgetter('impressions'), reverse = True)
keys = sortedrows[0].keys()
for item in sortedrows:
retrieve_terms(item['keywords'])
try:
outputfile = open('Output_%s.csv' % (filename),'w')
except IOError:
print("The file is active in another program - close it first!")
sys.exit()
dict_writer = csv.DictWriter(outputfile, keys, lineterminator='\n')
dict_writer.writer.writerow(keys)
dict_writer.writerows(sortedrows)
outputfile.close()
print("File is Done!! Check your folder")
if __name__ == '__main__':
start_time = time.clock()
main(sys.argv)
print("\n")
print time.clock() - start_time, "seconds for script time"
Any idea how to finish the code so that it will work? Thank you!
Try to add referrer, userip as described in the docs:
An area to pay special attention to
relates to correctly identifying
yourself in your requests.
Applications MUST always include a
valid and accurate http referer header
in their requests. In addition, we
ask, but do not require, that each
request contains a valid API Key. By
providing a key, your application
provides us with a secondary
identification mechanism that is
useful should we need to contact you
in order to correct any problems. Read
more about the usefulness of having an
API key
Developers are also encouraged to make
use of the userip parameter (see
below) to supply the IP address of the
end-user on whose behalf you are
making the API request. Doing so will
help distinguish this legitimate
server-side traffic from traffic which
doesn't come from an end-user.
Here's an example based on the answer to the question "access to google with python":
#!/usr/bin/python
# -*- coding: utf-8 -*-
import json
import urllib, urllib2
from pprint import pprint
api_key, userip = None, None
query = {'q' : 'матрёшка'}
referrer = "https://stackoverflow.com/q/4309599/4279"
if userip:
query.update(userip=userip)
if api_key:
query.update(key=api_key)
url = 'http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&%s' %(
urllib.urlencode(query))
request = urllib2.Request(url, headers=dict(Referer=referrer))
json_data = json.load(urllib2.urlopen(request))
pprint(json_data['responseData'])
Output
{u'confidence': 0.070496580000000003, u'isReliable': False, u'language': u'ru'}
Another issue might be that seedterm is not properly quoted:
if isinstance(seedterm, unicode):
value = seedterm
else: # bytes
value = seedterm.decode(put_encoding_here)
url = 'http://...q=%s' % urllib.quote_plus(value.encode('utf-8'))

Categories

Resources