python - How to extract strings from each line in text file?


I have a text file listing the active monitors that were detected.
I want to extract specific data from each line and put it in a list.
The text file looks like this:
[EnumerateDevices]: Enumerating Devices.
DISPLAY\LGD03D7\4&ACE0355&1&UID68092928 : Generic PnP Monitor
DISPLAY\ABCF206\4&ACE0355&1&UID51249920 : Generic PnP Monitor
//
// here can be more monitors...
//
2 matching device(s) found.
I need to get the number after "UID" in the middle of each line: 68092928, 51249920, ...
I thought of doing the following:
a. go through each line of the text
b. check whether the string "UID" exists
c. if it exists, split the line (this is where I don't know what to do: split by " " or by "&"?)
Can you suggest a good approach? I don't understand how to get the number after "UID" (for example, if the next number is longer than the previous ones).
How can I write code that does: "If you see the UID string, get all the data until the first blank"?
Any idea?
Thanks

I would use a regular expression to extract the UID,
e.g.
import re
regexp = re.compile(r'UID(\d+)')
# raw strings (r'...', r"""...""") keep backslashes such as \4 from being treated as escape sequences
file = r"""[EnumerateDevices]: Enumerating Devices.
DISPLAY\LGD03D7\4&ACE0355&1&UID68092928 : Generic PnP Monitor
DISPLAY\ABCF206\4&ACE0355&1&UID51249920 : Generic PnP Monitor
//
// here can be more monitors...
//
2 matching device(s) found."""
print(re.findall(regexp, file))

Use regular expressions:
import re
p = re.compile(r'.*UID(\d+)')
with open('infile') as infile:
    for line in infile:
        m = p.match(line)
        if m:
            # group(1) is the run of digits captured after "UID"
            print(m.group(1))

You can use the split() method.
s = "hello this is a test"
words = s.split(" ")
print(words)
The output of the above snippet is a list containing: ['hello', 'this', 'is', 'a', 'test']
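In your case, you can split on the substring "UID" and grab the second element of the resulting list. That element starts with the number you're looking for, followed by " : Generic PnP Monitor", so split it once more on whitespace to keep just the number.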
See docs here: https://docs.python.org/2/library/string.html#string.split
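Applied to the monitor lines, a minimal sketch of this split-based approach might look like the following. It assumes the file has already been read into a string; the names `text` and `uids_by_split` are hypothetical. Because the piece after "UID" still carries " : Generic PnP Monitor", a second split() on whitespace isolates the number.

```python
def uids_by_split(text):
    """Return the UID numbers (as strings) from every line that contains 'UID'."""
    uids = []
    for line in text.splitlines():
        if "UID" in line:
            # maxsplit=1: cut only at the first 'UID'
            after_uid = line.split("UID", 1)[1]   # e.g. '68092928 : Generic PnP Monitor'
            # keep everything up to the first blank
            uids.append(after_uid.split()[0])
    return uids


# hypothetical input: the sample file contents from the question
text = r"""[EnumerateDevices]: Enumerating Devices.
DISPLAY\LGD03D7\4&ACE0355&1&UID68092928 : Generic PnP Monitor
DISPLAY\ABCF206\4&ACE0355&1&UID51249920 : Generic PnP Monitor
2 matching device(s) found."""

print(uids_by_split(text))  # ['68092928', '51249920']
```

Calling int() on each item would give numbers instead of strings, and the length of the number does not matter because the cut happens at the first blank.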

Assuming txt holds the whole file contents, this is a bit esoteric, but a list comprehension does the trick:
[this.split("UID")[1].split()[0] for this in txt.split("\n") if "UID" in this]
The output is, I presume, the list you are looking for: ['68092928', '51249920']
Explanations:
split the text into rows (split("\n"));
select only the rows with UID inside (for this in ... if "UID" in this);
split each remaining row on "UID";
you want to keep only the part after UID, hence the [1];
the resulting string contains the ID and some text separated by a space, so a second split(), which splits on whitespace by default, followed by [0] keeps just the ID.

>>> for line in s.splitlines():
...     line = line.strip()
...     if "UID" in line:
...         tmp = line.split("UID")
...         uid = tmp[1].split(':')[0].strip()
...         print("UID " + uid)
...
UID 68092928
UID 51249920

You can use the find() method:
idx = line.find('UID')
if idx != -1:
    # skip the three characters of 'UID' and stop at the first blank
    print(line[idx + 3:].split()[0])
Docs https://docs.python.org/2/library/string.html#string.find

This works if you read the whole file at once; if you read it line by line, just change the first line to iterate over line.split() instead:
for elem in file.split():
    if 'UID' in elem:
        print(elem.split('UID')[1])
split() will already have stripped the surrounding "junk", so each elem that contains the 'UID' string is ready to be converted with int() or simply printed as a string.

Related

python 3 parsing a semicolon separated very long string to remove each second element

I'm pretty new to Python and am looking for a way to get the following result from a long string.
I read in the lines of a text file, where each line looks like this:
; 2:55:12;PuffDG;66,81; Puff4OG;66,75; Puff3OG;35,38;
After processing, the data should be stored in another text file like this
(short example):
2:55:12;66,81;66,75;35,38;
The real string is much longer, but always follows the same pattern:
; 2:55:12;PuffDG;66,81; Puff4OG;66,75; Puff3OG;35,38; Puff2OG;30,25; Puff1OG;29,25; PuffFB;23,50; ....
So this means:
remove the leading semicolon
keep the second element
remove the third element
keep the fourth element
remove the fifth element
keep the sixth element
and so on.
The number of elements can vary, so I guess as a first step I have to parse the string to get the number of elements, then loop through the string and assign each part that should be kept to a variable.
I have tried some variations of the command .split() but with no success.
Would it be easier to store all elements in a list and then for-loop through the list, keeping and dropping elements?
If yes, how would this look, so that at the end I have a file with
lines like this:
2:55:12 ; 66,81 ; 66,75 ; 35,38 ;
2:56:12 ; 67,15 ; 74,16 ; 39,15 ;
etc.
Best regards, Stefan

This solution works independently of the content between the semicolons.
One line, though it's a bit messier:
result = ' ; '.join(string.split(';')[1::2])
Getting rid of the leading semicolon:
Just slice it off!
string = string[2:]
Splitting by semicolon & keeping every second element:
Given the sliced string, we can split by semicolon:
arr = string.split(';')[::2]
The [::2] means to take every second element, starting with index 0. Because the leading "; " is already gone, this keeps the first, third, fifth (etc.) fields, i.e. the time stamp and the values. The one-liner above skips the slicing, which is why it starts at index 1 with [1::2].
Resulting string
To produce the string result you want, simply .join:
result = ' ; '.join(arr)
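Since the question asks for a whole file to be converted, here is a sketch that applies the same [1::2] slicing to every line and writes the "2:55:12 ; 66,81 ; ..." layout asked for. The helper names `keep_values` and `convert` and the file paths passed to `convert` are placeholders, not something from the question.

```python
def keep_values(line):
    """Keep the 2nd, 4th, 6th, ... semicolon-separated fields of one line."""
    # the leading ';' produces an empty first field, so [1::2] starts at the time stamp
    fields = line.rstrip('\n').split(';')[1::2]
    # strip stray spaces such as ' 2:55:12' and join in the requested layout
    return ' ; '.join(field.strip() for field in fields) + ' ;'


def convert(in_path, out_path):
    """Hypothetical driver: convert every non-blank line of in_path into out_path."""
    with open(in_path) as src, open(out_path, 'w') as dst:
        for line in src:
            if line.strip():  # skip blank lines
                dst.write(keep_values(line) + '\n')


print(keep_values("; 2:55:12;PuffDG;66,81; Puff4OG;66,75; Puff3OG;35,38;\n"))
# 2:55:12 ; 66,81 ; 66,75 ; 35,38 ;
```

Because the slice does not depend on how many fields a line has, the longer lines from the real log work the same way.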

A regex-based solution, which operates on the original input:
import re
inp = "; 2:55:12;PuffDG;66,81; Puff4OG;66,75; Puff3OG;35,38;"
output = re.sub(r'\s*[A-Z][^;]*?;', '', inp)[2:]
print(output)
This prints:
2:55:12;66,81;66,75;35,38;

This shows how to do it for one line of input, assuming the same pattern repeats every time:
input_str = "; 2:55:12;PuffDG;66,81; Puff4OG;66,75; Puff3OG;35,38;"
output_list = input_str.split(';')[1::2]  # create a list with the numbers of interest
with open('output.txt', 'w') as f:  # open a text file to write to (closed automatically)
    # write each value followed by the separator
    for out in output_list:
        f.write(f"{out.strip()} ; ")
    # end the line
    f.write("\n")

Thank you very much for the quick responses. You are awesome.
Your solutions are very compact.
In the meantime I found another solution, but it needs more lines of code.
Best regards, Stefan
I'm not familiar with how to insert code as a proper code section,
so I'm adding it as plain text:
fobj = open(r"C:\Users\Stefan\AppData\Local\Programs\Python\Python38-32\Heizung_2min.log")
wobj = open(r"C:\Users\Stefan\AppData\Local\Programs\Python\Python38-32\Heizung_number_2min.log","w")
for line in fobj:
    # use the loop variable; calling fobj.readline() here as well would skip every other line
    TextLine = line
    print(TextLine)
    myList = TextLine.split(';')
    TextLine = ""
    for index, item in enumerate(myList):
        # keep only the odd-indexed fields: the time stamp and the values
        if index % 2 == 1:
            TextLine += item
            TextLine += ";"
    TextLine += '\n'
    print(TextLine)
    wobj.write(TextLine)
fobj.close()
wobj.close()

Get the full word(s) by knowing only just a part of it

I am searching through a text file line by line and I want to get back all strings that contain the prefix AAAXX1234. For example, in my text file I have these lines:
Hello my ID is [123423819::AAAXX1234_3412] #I want that(AAAXX1234_3412)
Hello my ID is [738281937::AAAXX1234_3413:AAAXX1234_4212] #I want both of them(AAAXX1234_3413, AAAXX1234_4212)
Hello my ID is [123423819::XXWWF1234_3098] #I don't care about that
The code I have just checks whether the line starts with "Hello my ID is":
with open(file_hrd,'r',encoding='utf-8') as hrd:
    hrd=hrd.readlines()
    for line in hrd:
        if line.startswith("Hello my ID is"):
            pass  # do something

Try this:
import re
with open(file_hrd,'r',encoding='utf-8') as hrd:
    res = []
    for line in hrd:
        res += re.findall(r'AAAXX1234_\d+', line)
print(res)
Output:
['AAAXX1234_3412', 'AAAXX1234_3413', 'AAAXX1234_4212']

I'd suggest parsing your lines and extracting the information into meaningful parts. That way, you can then use a simple startswith on the ID part of your line. In addition, this lets you control where you look for these prefixes, e.g. in case the lines contain additional data that could also theoretically contain something that looks like an ID.
Something like this:
if line.startswith('Hello my ID is '):
    idx_start = line.index('[')
    idx_end = line.index(']', idx_start)
    idx_separator = line.index(':', idx_start, idx_end)
    num = line[idx_start + 1:idx_separator]
    ids = line[idx_separator + 2:idx_end].split(':')
    print(num, ids)
This would give you the following output for your three example lines:
123423819 ['AAAXX1234_3412']
738281937 ['AAAXX1234_3413', 'AAAXX1234_4212']
123423819 ['XXWWF1234_3098']
With that information, you can then check the ids for a prefix:
# any() takes a single iterable, so pass it a generator expression
if any(x.startswith('AAAXX1234') for x in ids):
    print('do something')
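Putting the parsing and the prefix check together, one possible sketch is a small function that returns only the IDs starting with the prefix. The function name `matching_ids` is hypothetical, and it assumes every relevant line follows the "Hello my ID is [number::id:id...]" layout from the question.

```python
def matching_ids(line, prefix='AAAXX1234'):
    """Return the IDs inside [...] that start with prefix, or [] for other lines."""
    if not line.startswith('Hello my ID is '):
        return []
    idx_start = line.index('[')
    idx_end = line.index(']', idx_start)
    idx_separator = line.index(':', idx_start, idx_end)
    # skip the '::' separator; the IDs themselves are separated by single ':'
    ids = line[idx_separator + 2:idx_end].split(':')
    return [x for x in ids if x.startswith(prefix)]


lines = [
    "Hello my ID is [123423819::AAAXX1234_3412]",
    "Hello my ID is [738281937::AAAXX1234_3413:AAAXX1234_4212]",
    "Hello my ID is [123423819::XXWWF1234_3098]",
]
found = []
for line in lines:
    found.extend(matching_ids(line))
print(found)  # ['AAAXX1234_3412', 'AAAXX1234_3413', 'AAAXX1234_4212']
```

Unlike a plain text search, this only looks inside the brackets, so an ID-like string elsewhere in the line is ignored.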

Using regular expressions through the re module and its findall() function should be enough:
import re
prefix = 'AAAXX1234'
with open('file.txt') as file:
    lines = file.read().splitlines()
output = list()
for line in lines:
    # rf-string: {prefix} is substituted, while \d is passed to the regex unchanged
    output.extend(re.findall(rf'{prefix}_\d+', line))
print(output)

You can do it with findall and the regex r'AAAXX1234_[0-9]+'. It finds every part of the string that starts with AAAXX1234_ and then grabs all of the digits after it. Change + to * if you also want it to match 'AAAXX1234_' on its own.
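A minimal sketch of that findall call, run on the sample lines from the question (with the # annotations removed and held in a list instead of a file), also shows the difference between + and *:

```python
import re

# sample lines from the question, kept in memory for the example
lines = [
    "Hello my ID is [123423819::AAAXX1234_3412]",
    "Hello my ID is [738281937::AAAXX1234_3413:AAAXX1234_4212]",
    "Hello my ID is [123423819::XXWWF1234_3098]",
]

found = []
for line in lines:
    # '+' requires at least one digit after the underscore
    found.extend(re.findall(r'AAAXX1234_[0-9]+', line))
print(found)  # ['AAAXX1234_3412', 'AAAXX1234_3413', 'AAAXX1234_4212']

# with '*' a bare prefix followed by no digits is matched as well
print(re.findall(r'AAAXX1234_[0-9]*', "ids: AAAXX1234_ and AAAXX1234_77"))
# ['AAAXX1234_', 'AAAXX1234_77']
```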

Match a string which is few lines above another line where the first string was matched

So, I have this text file which is huge. I need to look for a string, and when I match it, I need to go a few lines back (above the current line), search for another string and extract some information from the line that contains the second string. How can I do this in Python using regex match?
I am trying to do something like this:
import re
substr1 = re.compile("ACT",re.IGNORECASE)
substr2 = re.compile(vector,re.IGNORECASE)
try:
    with open (filepath, 'rt') as in_file:
        for linenum, line in enumerate(in_file):
            if substr2.search(line) != None:
                print(linenum,line)
                # Code to trace back a few lines to look for substr1
                break
except FileNotFoundError: # If the file is not found,
    print("file not found.") # print an error message.
It is kind of like I want to read backward once I match the first string and look for the first occurrence of the second string. The number of lines in between varies, so I don't think I can use the deque option. I am totally new to Python.
Any help is appreciated, thank you!
I am adding an example of the log file that I am reading:
X 123
X 1234
X 12345
Vector1
----
-----
-----
X 1231
X 12344
X 123456
vector a
vector b
vector c
vector d
-------
-------
Vector
----
-----
-----
X 1233
X 12345
X 123451
Vector2
String 1 : Vector
String 2 : X
Output should be X 123456

You do not need to backtrack. Instead, just search forward in a smarter manner. If you search for substr1 first, the only issue that could happen is that more occurrences of substr1 will be found before you find substr2. The way to handle that is to keep updating the match of substr1 as you go.
From your description, it does not appear that you need regex at all. Instead, you appear to be looking for simple string containment tests.
substr1 = 'X'
substr2 = 'Vector'
with open(filepath, 'rt') as in_file:
    matched = None
    for linenum, line in enumerate(in_file, start=1):
        if substr1 in line:
            matched = line
        elif matched and line.rstrip() == substr2:
            # Process the second string
            print(matched)
            break
Lines read from a file keep their trailing newline (and your sample also has trailing whitespace), which is why the comparison uses line.rstrip() == substr2 rather than line == substr2. line.startswith(substr2) is tempting, but on your sample it would also match "Vector1" and print X 12345 instead of X 123456.
Minor fixes:
start=1 will make your line numbers start with 1, which is probably what you want.
If you want to compare against None, the proper way is is not None instead of != None. Additionally, regex.search returns a match object when a match occurs (and None otherwise), and a match object is always truthy, so the idiomatic check is simply if substr2.search(line):, without even is not None.
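The forward search can be wrapped in a function and checked against the sample log without opening a file. This is a sketch: `last_match_before` is a hypothetical name, and it uses the exact-line comparison (after rstrip()) so that "Vector1" and "vector a" are not mistaken for "Vector".

```python
def last_match_before(lines, substr1, substr2):
    """Return the closest line containing substr1 above the first line equal to substr2."""
    matched = None
    for line in lines:
        line = line.rstrip()  # drop '\n' and trailing spaces
        if substr1 in line:
            matched = line    # keep overwriting: we want the closest one above
        elif matched is not None and line == substr2:
            return matched
    return None


# the sample log from the question
log = """X 123
X 1234
X 12345
Vector1
----
-----
-----
X 1231
X 12344
X 123456
vector a
vector b
vector c
vector d
-------
-------
Vector
----
-----
-----
X 1233
X 12345
X 123451
Vector2"""

print(last_match_before(log.splitlines(), 'X', 'Vector'))  # X 123456
```

With a real file, pass the open file object instead of log.splitlines(), since iterating over a file also yields one line at a time and only one earlier line is ever kept in memory.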

Parsing paragraph out of text file in Python?

I am trying to parse certain paragraphs out of multiple text files and store them in a list. All the text files have a format similar to this:
MODEL NUMBER: A123
MODEL INFORMATION: some info about the model
DESCRIPTION: This will be a description of the Model. It
could be multiple lines but an empty line at the end of each.
CONCLUSION: Sold a lot really profitable.
Now I can pull out the information where it's on one line, but I am having trouble when I encounter something that spans multiple lines (like 'DESCRIPTION'). The length of the description is not known, but I know it ends with an empty line (which would mean checking for '\n'). This is what I have so far:
import os
dir = 'Test'
DESCRIPTION = []
for files in os.listdir(dir):
    if files.endswith('.txt'):
        with open(dir + '/' + files) as File:
            reading = File.readlines()
            for num, line in enumerate(reading):
                if 'DESCRIPTION:' in line:
                    Start_line = num
                if len(line.strip()) == 0:
                    pass  # not finished yet: this is where the blank lines should be collected
I don't know if it's the best approach, but what I was trying to do with if len(line.strip()) == 0: is to create a list of blank-line numbers and then find the first one greater than Start_line. I came across bisect, which might help with that.
In the end, I would like my data to look like this when I print DESCRIPTION:
['DESCRIPTION: Description from file 1',
'DESCRIPTION: Description from file 2',
'DESCRIPTION: Description from file 3']
Thanks.

Regular expression. Think about it this way: you have a pattern that will allow you to cut any file into pieces you will find palatable: "newline followed by capital letter"
re.split is your friend
Take a string
"THE
BEST things
in life are
free
IS
YET
TO
COME"
As a string:
import re
p = "THE\nBEST things\nin life are\nfree\nIS\nYET\nTO\nCOME"
c = re.split('\n(?=[A-Z])', p)
Which produces list c
['THE', 'BEST things\nin life are\nfree', 'IS', 'YET', 'TO', 'COME']
I think you can take it from there: this splits each of your files into a list of strings, with each string being its own section (including its sub-contents), and from there you can find the "DESCRIPTION" element and store it. It's important to note that the regex recognizes the PATTERN "newline and then capital letter" but only CUTS at the newline: the newline is outside the lookahead brackets, so the split consumes it, while the capital letter inside (?=...) is only checked and stays at the start of the next piece.
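Applied to the format from the question, a sketch might look like this. It assumes each file is small enough to read in one go and that no continuation line of a section starts with a capital letter (such a line would be cut off as a section of its own). `extract_description` and `collect_descriptions` are hypothetical helper names.

```python
import os
import re

# sample file contents in the format from the question (assumed; real files may differ)
sample = (
    "MODEL NUMBER: A123\n"
    "MODEL INFORMATION: some info about the model\n"
    "DESCRIPTION: This will be a description of the Model. It\n"
    "could be multiple lines but an empty line at the end of each.\n"
    "\n"
    "CONCLUSION: Sold a lot really profitable.\n"
)


def extract_description(text):
    """Return the DESCRIPTION section of one file as a single line, or None."""
    # cut at every newline followed by a capital letter; the letter stays
    # with the next section because it sits inside the lookahead
    for section in re.split(r'\n(?=[A-Z])', text):
        if section.startswith('DESCRIPTION:'):
            # drop the trailing blank line and join continuation lines with spaces
            return ' '.join(section.split())
    return None


def collect_descriptions(directory):
    """Gather the DESCRIPTION of every .txt file in directory."""
    descriptions = []
    for name in sorted(os.listdir(directory)):
        if name.endswith('.txt'):
            with open(os.path.join(directory, name)) as f:
                desc = extract_description(f.read())
            if desc is not None:
                descriptions.append(desc)
    return descriptions


print(extract_description(sample))
# DESCRIPTION: This will be a description of the Model. It could be multiple lines but an empty line at the end of each.
```

collect_descriptions('Test') would then produce the list of 'DESCRIPTION: ...' strings asked for in the question, one per file that has such a section.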

Get a value from a string in python

Program Details:
I am writing a Python program that needs to look through a text file for the line:
Found mode 1 of 12: EV= 1.5185449E+04, f= 19.612545, T= 0.050988.
Problem:
After the program has found that line, it should store the line in an array and get the value 19.612545 from f= 19.612545.
Question:
So far I have been able to store the line in an array after finding it. However, I am not sure what to use to search through the stored string and extract the value of f. Does anyone have any suggestions or tips on how to accomplish this?

Depending on how you want to go about it, CosmicComputer is right to refer you to regular expressions. If your syntax is this simple, you could always do something like:
line = 'Found mode 1 of 12: EV= 1.5185449E+04, f= 19.612545, T= 0.050988.'
splitByComma=line.split(',')
fValue = splitByComma[1].replace('f= ', '').strip()
print(fValue)
Results in 19.612545 being printed (still a string though).
Split your line by commas, grab the 2nd chunk, and break out the f value. Error checking and conversions left up to you!
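For the regular-expression route CosmicComputer mentioned, a sketch could search for "f=" and convert the captured number with float(). The pattern is an assumption about the line format: it expects optional spaces after "f=" and a decimal number, optionally with an exponent such as E+04. `get_f_value` is a hypothetical helper name.

```python
import re

# \b keeps 'f=' from matching at the end of a longer name; the group captures the number
F_PATTERN = re.compile(r'\bf=\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)')


def get_f_value(line):
    """Return the value after 'f=' as a float, or None if the line has none."""
    match = F_PATTERN.search(line)
    return float(match.group(1)) if match else None


line = 'Found mode 1 of 12: EV= 1.5185449E+04, f= 19.612545, T= 0.050988.'
print(get_f_value(line))  # 19.612545
```

Unlike the comma split, this does not depend on f being the second comma-separated field, and the result is already a number rather than a string.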

Using regular expressions here is madness. Just use str.find as follows (where string is the name of the variable that holds your string):
index = string.find('f=')
index = index + 2  # skip over "f="
string = string[index:]  # cut off the part you don't need
string = string.split(',')  # split the remaining string on commas
your_value = string[0].strip()  # take the first field and drop the leading space
I know it's ugly, but it's nothing compared with RE.
