I have a CSV file that I need to format before I can send the data to Zabbix, but I have encountered a problem with one of the CSV files I have to use.
This is an example of the part of the file I have problems with:
Some_String_That_ineed_01[ifcontainsCell01removehere],xxxxxxxxx02
Some_String_That_ineed_01vtcrmp01[ifcontainsCell01removehere]-[1],aass
These are two lines from the file; the other lines I have already treated.
I need to check whether Cell01 is in the line:
if 'Cell01' in h: do something.
I need to remove all the content between the [ ] (the brackets included) when they contain the word Cell01, and leave only this:
Some_String_That_ineed_01,xxxxxxxxx02
Some_String_That_ineed_01vtcrmp01-[1],aass
Everything else my script already handles easily. There must be a better way than what I have in mind, which is to use h.split on the first [, split again on the ,, remove the content I don't want, and then concatenate the leftover strings. I can't simply use replace because I need to keep this data ([1]).
Later on, with the desired result, I will pass this to zabbix_sender as the item and item key. I already have the hosts, timestamps, and values.
You should use a regexp (re module):
import re

s = "Some_String_That_ineed_01[ifcontainsCell01removehere],xxxxxxxxx02"
# [^\]]* keeps the match inside one pair of brackets, so a later
# group like [1] is not swallowed the way a greedy .* would
replaced = re.sub(r'\[[^\]]*Cell01[^\]]*\]', '', s)
print(replaced)
will return:
[root#somwhere test]# python replace.py
Some_String_That_ineed_01,xxxxxxxxx02
You can also experiment with the pattern on an online regex tester.
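To confirm that the pattern leaves the trailing [1] intact, here is a minimal sketch run over both problem lines from the question:
import re

lines = [
    "Some_String_That_ineed_01[ifcontainsCell01removehere],xxxxxxxxx02",
    "Some_String_That_ineed_01vtcrmp01[ifcontainsCell01removehere]-[1],aass",
]
for h in lines:
    if 'Cell01' in h:
        # remove only the bracket group that contains Cell01
        h = re.sub(r'\[[^\]]*Cell01[^\]]*\]', '', h)
    print(h)
# Some_String_That_ineed_01,xxxxxxxxx02
# Some_String_That_ineed_01vtcrmp01-[1],aass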
I am quite new to Python and I would like to ask the following.
Let's say for example i have the following .txt file:
USERNAME -- example.name
SERVER -- server01
COMPUTERNAME -- computer01
I would like to search through this document for the three keywords USERNAME, SERVER, and COMPUTERNAME, and when I find them, extract their values, i.e. "example.name", "server01", and "computer01" respectively for each line.
Is this possible? I have already tried looking lines up by number, but I would much prefer searching by keywords instead.
Question 2: Somewhere in this .txt file there is a line with the keyword Adresses:, which has multiple values listed on separate lines, like so:
Adresses:
6001:8000:1080:142::12
8002:2013:2380:110::53
9007:2013:2380:117::80
.
.
Would there be any way to get all of the listed addresses as a result, not just the first one? The number of addresses is dynamic, so it may change in the future.
With this one I honestly have no idea how to begin, and I'd appreciate any kind of hints or pointers in the right direction.
Thank you very much for your time and attention!
Like this:
with open("filename.txt") as f:
for x in f:
a = x.split(" -- ")
print(a[1])
If the line with a given value always starts with the keyword, you can try something like this:
with open('file.txt', 'r') as file:
    for line in file:
        if line.startswith('keyword'):
            keyword, value = line.split(' -- ')
To gather all the addresses, I'd initialize a list of addresses beforehand, then add the line
addresses.append(value)
inside the if statement.
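Putting both ideas together, here is a minimal sketch, assuming the file looks exactly like the samples above (the name file.txt is a placeholder):
addresses = []
in_addresses = False
with open('file.txt', 'r') as file:
    for line in file:
        line = line.strip()
        if ' -- ' in line:
            keyword, value = line.split(' -- ')  # e.g. USERNAME -- example.name
            in_addresses = False
            print(keyword, value)
        elif line.startswith('Adresses:'):
            in_addresses = True  # the following lines are addresses
        elif in_addresses and ':' in line:  # IPv6-looking lines under Adresses:
            addresses.append(line)
print(addresses)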
Your best friend for this kind of task will be the str.split function. You can put your data in a dict which will map keywords to values:
data = {}  # Create a new dict
with open('data.txt') as file:  # Open the file containing the data
    lines = file.read().split('\n')  # Split the file into lines
    for line in lines:  # For each line
        keyword, value = line.split(' -- ')  # Extract keyword and value
        data[keyword] = value  # Put it in the dict
Then, you can access your values with data['USERNAME'], for example. This method will work on any document containing a key-value association on each line (even if you have more than 3 keywords). However, it will not work if the same text file contains the addresses in the format you mentioned.
If you want to include the addresses in the same file, you'll need to adapt the code. You can, for example, check whether the split line contains two elements (a key-value pair on the same line, like USERNAME) or only one (a key whose values span multiple lines, like Adresses:). Then you can store all the different addresses in a list and bind that list to the data dict. It's not a problem to have a dict of the form:
{
    'USERNAME': 'example.name',
    'Addresses': ['6001:8000:1080:142::12', '8002:2013:2380:110::53']
}
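A minimal sketch of that adaptation, assuming every multi-line key ends with a colon the way Adresses: does:
data = {}
current_list = None
with open('data.txt') as file:
    for line in file:
        line = line.strip()
        if not line or line == '.':
            continue  # skip blank lines and the filler dots
        splitted_line = line.split(' -- ')
        if len(splitted_line) == 2:  # key-value on the same line
            data[splitted_line[0]] = splitted_line[1]
            current_list = None
        elif line.endswith(':'):  # key whose values follow on later lines
            current_list = data.setdefault(line[:-1], [])
        elif current_list is not None:
            current_list.append(line)  # one of the listed addresses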
While I can manipulate a CSV file with Python without an issue if it's strictly comma delimited, I'm running into a massive problem with the format I'm working with. It's comma delimited, but the last column consists of a mess of about six commas within the following structure:
"{""EvidenceDetails"": [{""MitigationString"": """", ""Criticality"": 2, ""Timestamp"": ""2018-05-07T13:51:02.000Z"", ""CriticalityLabel"": ""Suspicious"", ""EvidenceString"": ""1 sighting on 1 source: item. Most recent item: Item4: item. I've never seen this IP before. Most recent link (May 7, 2018): link"", ""Rule"": ""Recent""}, {""MitigationString"": """", ""Criticality"": 2, ""Timestamp"": ""2018-05-09T05:32:41.316Z"", "etc"}]}"
The other columns are standard comma separation, but this one column is a mess. I need to pull out only the YYYY-MM-DD part of the timestamps, nothing else, but I can't seem to figure out a way to strip out the unnecessary characters.
Any suggestions? I'm working with Python specifically, but if there's something else I should look into, let me know!
Thanks!
Rather than splitting/stripping, it may be easier to use a regular expression to extract the datestamps you want directly.
Here is an example with the line you provided in your question:
import re
pattern_to_use = "[0-9]{4}-[0-9]{2}-[0-9]{2}"
string_to_search = """""{""EvidenceDetails"": [{""MitigationString"": """", ""Criticality"": 2, ""Timestamp"": ""2018-05-07T13:51:02.000Z"", ""CriticalityLabel"": ""Suspicious"", ""EvidenceString"": ""1 sighting on 1 source: item. Most recent item: Item4: item. I've never seen this IP before. Most recent link (May 7, 2018): link"", ""Rule"": ""Recent""}, {""MitigationString"": """", ""Criticality"": 2, ""Timestamp"": ""2018-05-09T05:32:41.316Z"", "etc"}]}"""""
print(re.findall(pattern_to_use, string_to_search))
This will print a list containing the datestamps, in the order they appeared in the string (i.e. ['2018-05-07', '2018-05-09']).
See The Python 3 Docs for more information on regular expressions.
You're looking at JSON format, so try using the json module:
import json

# if the data is in a file
with open('your filename here', 'r') as f:
    data = json.load(f)

# if the data is stored in a string variable
data = json.loads(stringvar)
The data variable should now contain your data in a more easily accessible format.
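Once the JSON is parsed, pulling just the YYYY-MM-DD part of each timestamp is a plain dictionary walk. Here is a minimal sketch, assuming the EvidenceDetails/Timestamp layout from the sample in your question and a hypothetical input.csv; the csv module unescapes the doubled quotes in the last column for you:
import csv
import json

with open('input.csv', newline='') as f:
    for row in csv.reader(f):
        details = json.loads(row[-1])  # the last column holds the JSON blob
        for item in details['EvidenceDetails']:
            # '2018-05-07T13:51:02.000Z' -> '2018-05-07'
            print(item['Timestamp'][:10])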
I have data as follows:
data['url']
http://hostname.com/aaa/uploads/2013/11/a-b-c-d.jpg https://www.aaa.com/
http://hostname.com/bbb/uploads/2013/11/e-f-g-h.gif https://www.aaa.com/
http://hostname.com/ccc/uploads/2013/11/e-f-g-h.png http://hostname.com/ccc/uploads/2013/11/a-a-a-a.html
http://hostname.com/ddd/uploads/2013/11/w-e-r-t.ico
http://hostname.com/ddd/uploads/2013/11/r-t-y-u.aspx https://www.aaa.com/
http://hostname.com/bbb/uploads/2013/11/t-r-w-q.jpeg https://www.aaa.com/
I want to find the formats such as .jpg, .gif, .png, .ico, .aspx, .html, and .jpeg, and parse backwards from each until a "/" is found. I also want to check for multiple occurrences throughout the string. My output should be:
data['parsed']
a-b-c-d
e-f-g-h
e-f-g-h a-a-a-a
w-e-r-t
r-t-y-u
t-r-w-q
Instead of writing individual commands for each of the formats, I am wondering whether there is a way to write everything as a single command.
Can anybody help me write these commands? I am new to regex and any help would be appreciated.
This builds a list of (name, extension) pairs:
import re

results = []
for link in data:
    matches = re.search(r'/(\w-\w-\w-\w)\.(\w{2,})\b', link)
    if matches:  # skip any line without a match
        results.append((matches.group(1), matches.group(2)))
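Note that re.search only returns the first match per line, so a line holding two file URLs (like the e-f-g-h a-a-a-a row in your data) would lose the second name. A findall variant of the same pattern, sketched here, collects every match:
import re

line = "http://hostname.com/ccc/uploads/2013/11/e-f-g-h.png http://hostname.com/ccc/uploads/2013/11/a-a-a-a.html"
pairs = re.findall(r'/(\w-\w-\w-\w)\.(\w{2,})\b', line)
print(pairs)  # [('e-f-g-h', 'png'), ('a-a-a-a', 'html')]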
This pattern returns the file names. I have just used one of your URLs to demonstrate; for more, you could simply append the matches to a list of results:
import re
url = "http://hostname.com/ccc/uploads/2013/11/e-f-g-h.png http://hostname.com/ccc/uploads/2013/11/a-a-a-a.html"
p = r'((?:[a-z]-){3}[a-z])\.'
matches = re.findall(p, url)
>>> print('\n'.join(matches))
e-f-g-h
a-a-a-a
This assumes the URLs all have the general form you provided.
You might try this:
data['parse'] = re.findall(r'[^/]+\.[a-z]+ ',data['url'])
That will pick out all of the file names with their extensions. If you want to remove the extensions, the code above returns a list which you can then process with a list comprehension and re.sub, like so (the \s*$ allows for the trailing space captured by the pattern above):
[re.sub(r'\.[a-z]+\s*$', '', exp) for exp in data['parse']]
Use the str.join function to create a string, as demonstrated in Totem's answer.
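Putting the two steps together on one of the sample rows (the data dict with a 'url' key is assumed from the question):
import re

data = {'url': 'http://hostname.com/aaa/uploads/2013/11/a-b-c-d.jpg https://www.aaa.com/'}
data['parse'] = re.findall(r'[^/]+\.[a-z]+ ', data['url'])
data['parsed'] = ' '.join(re.sub(r'\.[a-z]+\s*$', '', exp) for exp in data['parse'])
print(data['parsed'])  # a-b-c-d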
I am basically trying to continue this unanswered question about parsing a Fortigate config file:
Reading a fortigate configuration file with Python
The root problem is that this config contains a number of records like this:
edit 1
    set srcintf "port26"
    set dstintf "port25"
    set srcaddr "all"
    set dstaddr "all"
    set action accept
    set utm-status enable
    set schedule "always"
    set service "ANY"
    set av-profile "default"
    set nat enable
    set central-nat enable
next
I would like to get the output for each ACL on a single line so I can import them into a CSV. The problem is that each record can have a variable number of lines, and the indentation marks subsections of the preceding line. The other post does get some of it right, but it doesn't handle the indentation. I have come up with some workarounds that replace whitespace with arbitrary characters, but I don't know whether there is a method to read the number of tabs/spaces and use that to indicate positioning.
Thanks
So I have managed to read your text and turn it into a dictionary in Python. It is pretty simple. You basically have to do something along the lines of:
conFigFile = open('./config.txt')
data = dict()
record = 0
for line in conFigFile:
    if line.find('edit') >= 0:
        record = int(line.replace('edit', ''))
        data[record] = {}
    if line.find('set') >= 0:
        line = line.replace('set', '').strip()
        print(line)
        key, val = line.split(' ', 1)  # split on the first space only
        data[record][key] = val
conFigFile.close()
This will produce a dictionary which will then allow you to make calls such as:
>>> data[1]['nat']
'enable'
>>> data[1].keys()
['nat', 'service', 'schedule', 'central-nat', 'srcaddr', 'av-profile', 'dstintf', 'srcintf', 'action', 'dstaddr', 'utm-status']
So now it is possible to generate a csv file:
csvFile = open('./data.csv', 'w')
records = data.keys()
for record in records:
    values = data[record].keys()
    valList = ['Record', str(record)]
    for val in values:
        valList.append(val)
        valList.append(data[record][val])
    csvFile.write(",".join(valList) + "\n")  # newline so each record gets its own row
csvFile.close()
Which produces the csv file:
Record,1,nat,enable,service,"ANY",schedule,"always",central-nat,enable,srcaddr,"all",av-profile,"default",dstintf,"port25",srcintf,"port26",action,accept,dstaddr,"all",utm-status,enable
If you really want to count the spaces before the line, you can do something like the following:
>>> a=' test: one '
>>> a.count(' ') #count all spaces
11
>>> (len(a) - len(a.lstrip())) #count leading spaces
5
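Building on that, a minimal sketch that turns the leading-space count into a nesting depth while walking the config (the four-space indent width is an assumption; adjust it to match the real file):
with open('./config.txt') as conFigFile:
    for line in conFigFile:
        if line.strip():
            depth = (len(line) - len(line.lstrip())) // 4  # leading spaces -> depth
            print(depth, line.strip())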
I am trying to find Chinese words in two different files, but it didn't work, so I tried to search for the words in the same file I got them from, and it seems it doesn't find them either. How is that possible?
chin_split = codecs.open("CHIN_split.txt","r+",encoding="utf-8")
I used this for the regex code:
import re
for n in re.findall(ur'[\u4e00-\u9fff]+',chin_split.read()):
print n in re.findall(ur'[\u4e00-\u9fff]+',chin_split.read())
How come I only get False printed?
FYI, I tried this and it works:
for x in [1,2,3,4,5,6,6]:
    print x in [1,2,3,4,5,6,6]
BTW, chin_split contains words in English, Hebrew, and Chinese.
Some lines from chin_split.txt:
he daodan 核导弹 טיל גרעיני
hedantou 核弹头 ראש חץ גרעיני
helu 阖庐 "ביתו, מעונו
helu 阖庐 שם מלך וו בתקופת ה'אביב והסתיו'"
huiwu 会晤 להיפגש עם
You are reading from a file descriptor multiple times, and that is what's wrong.
The first chin_split.read() yields all the content, but the later calls (inside the loop) just return an empty string.
That loop makes no sense as written, but if you want to keep it, save the file content in a variable first.
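A minimal fix along those lines, keeping the question's loop shape and its Python 2 style:
import re
import codecs

chin_split = codecs.open("CHIN_split.txt", "r+", encoding="utf-8")
content = chin_split.read()  # read the file once
chin_split.close()

words = re.findall(ur'[\u4e00-\u9fff]+', content)
for n in words:
    print n in words  # now prints True for every word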