I am making a webscraping tool which gets the amount of players on a game server.
At the moment the most efficient method of doing this is to use Requests and BS4, to write the HTML source to a txt file, then search that file for
" / "
Unfortunately my HTML contains two forward slashes with spaces on either side, so I need to be able to do something like
"%d / %d"
so that it only matches the one with integers. I do not know the values on either side; I just need it to pick only the one with integers in it.
prange = list(range(0, 65))
searchfile = open("data.txt", "r")
for line in searchfile:
    if " / " in line:
        print(line)
searchfile.close()
Thanks in advance!
You can try using re to find the required pattern:
>>> import re
>>> re.search(r'(\d+)\s+/\s+(\d+)', 'dsdsd 111 / 222 dsdsds').groups()
('111', '222')
What you want is to use a regex to search for a specific pattern in your document.
re.search(r'(\d) / (\d)', your_text) returns the first occurrence of X / Y where X and Y are 1-digit numbers (use re.findall if you need every occurrence). If you want more than one digit, take a look at the regex syntax and write something like r'(\d+) / (\d+)'.
With your example, you should have:
import re

prange = list(range(0, 65))
searchfile = open("data.txt", "r")
for line in searchfile:
    m = re.search(r'(\d+ / \d+)', line)
    if m:
        print(line)
searchfile.close()
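If the page text is already in memory, the same pattern can be applied to it directly and the numbers pulled out as integers. This is only a minimal sketch: the function name find_player_counts and the sample HTML fragment are made up for illustration, and it assumes the counts always appear as two integers with whitespace around the slash.

```python
import re

# Two integers separated by a slash with whitespace on both sides.
COUNT_PATTERN = re.compile(r'(\d+)\s+/\s+(\d+)')

def find_player_counts(text):
    """Return every 'X / Y' pair found in text as a tuple of ints."""
    return [(int(a), int(b)) for a, b in COUNT_PATTERN.findall(text)]

# Hypothetical page fragment: only the second ' / ' has integers around it.
sample = '<td>Map / Mode</td><td>Players: 12 / 64</td>'
print(find_player_counts(sample))
```

The first " / " is skipped because it has no digits next to it, which is the filtering the question asks for.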
I am searching through a text file line by line and I want to get back all strings that contain the prefix AAAXX1234. For example, my text file has these lines:
Hello my ID is [123423819::AAAXX1234_3412] #I want that(AAAXX1234_3412)
Hello my ID is [738281937::AAAXX1234_3413:AAAXX1234_4212] #I want both of them(AAAXX1234_3413, AAAXX1234_4212)
Hello my ID is [123423819::XXWWF1234_3098] #I don't care about that
The code I have only checks whether the line starts with "Hello my ID is":
with open(file_hrd,'r',encoding='utf-8') as hrd:
    hrd=hrd.readlines()
    for line in hrd:
        if line.startswith("Hello my ID is"):
            pass  # do something
Try this:
import re

with open(file_hrd,'r',encoding='utf-8') as hrd:
    res = []
    for line in hrd:
        res += re.findall(r'AAAXX1234_\d+', line)
print(res)
Output:
['AAAXX1234_3412', 'AAAXX1234_3413', 'AAAXX1234_4212']
I’d suggest parsing your lines and extracting the information into meaningful parts. That way, you can use a simple startswith on the ID part of each line. It also lets you control where you look for these prefixes, e.g. in case the lines contain additional data that could theoretically contain something that looks like an ID.
Something like this:
if line.startswith('Hello my ID is '):
    idx_start = line.index('[')
    idx_end = line.index(']', idx_start)
    idx_separator = line.index(':', idx_start, idx_end)
    num = line[idx_start + 1:idx_separator]
    ids = line[idx_separator + 2:idx_end].split(':')
    print(num, ids)
This would give you the following output for your three example lines:
123423819 ['AAAXX1234_3412']
738281937 ['AAAXX1234_3413', 'AAAXX1234_4212']
123423819 ['XXWWF1234_3098']
With that information, you can then check the ids for a prefix:
if any(x.startswith('AAAXX1234') for x in ids):
    print('do something')
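Put together, the parsing approach could look roughly like the sketch below. The helper names parse_id_line and ids_with_prefix are invented for this example, and it assumes every matching line contains a bracketed part of the form [number::id:id...].

```python
def parse_id_line(line, marker='Hello my ID is '):
    """Return (num, ids) for a matching line, or None for other lines."""
    if not line.startswith(marker):
        return None
    idx_start = line.index('[')
    idx_end = line.index(']', idx_start)
    # Split "number::id1:id2" into the number and the id part.
    num, _, rest = line[idx_start + 1:idx_end].partition('::')
    return num, rest.split(':')

def ids_with_prefix(lines, prefix='AAAXX1234'):
    """Collect every ID that starts with prefix, in file order."""
    found = []
    for line in lines:
        parsed = parse_id_line(line)
        if parsed:
            found.extend(i for i in parsed[1] if i.startswith(prefix))
    return found

lines = [
    'Hello my ID is [123423819::AAAXX1234_3412]',
    'Hello my ID is [738281937::AAAXX1234_3413:AAAXX1234_4212]',
    'Hello my ID is [123423819::XXWWF1234_3098]',
]
print(ids_with_prefix(lines))
```

Note that line.index raises ValueError when a bracket is missing, so malformed lines would need their own handling.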
Using regular expressions through the re module and its findall() function should be enough:
import re

with open('file.txt') as file:
    prefix = 'AAAXX1234'
    lines = file.read().splitlines()
    output = list()
    for line in lines:
        output.extend(re.findall(rf'{prefix}_\d+', line))
You can do it with findall and the regex r'AAAXX1234_[0-9]+'. It finds all parts of the string that start with AAAXX1234_ and then grabs all of the digits after it. Change + to * if you want it to match 'AAAXX1234_' on its own as well.
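The difference between + and * is easiest to see side by side. A small sketch, using a made-up input line that contains one full ID, one bare prefix and one unrelated ID:

```python
import re

# Hypothetical line: a full ID, a bare prefix, and an unrelated ID.
line = 'ids: AAAXX1234_3413, AAAXX1234_ and XXWWF1234_3098'

# '+' requires at least one digit after the underscore.
with_plus = re.findall(r'AAAXX1234_[0-9]+', line)
# '*' also accepts zero digits, so the bare prefix matches too.
with_star = re.findall(r'AAAXX1234_[0-9]*', line)

print(with_plus)
print(with_star)
```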
I am trying to replace a number in a string with another number. For instance, I have the string "APU12_24F" and I want to add 7 to the second number to make it "APU12_31F".
Right now I am simply able to locate the number in which I'm interested by using string.split.
I can't figure out how to edit the new strings which this produces.
def main():
    f=open("edita15888_debug.txt", "r")
    fl = f.readlines()
    for x in fl:
        if ("APU12" in x):
            list_string=split_string_APU12(x)
            print(list_string)
    return

def split_string_APU12(string):
    # Split the string based on APU12_
    list_string = string.split("APU12_")
    return list_string

main()
The output for this makes sense, as I'll get something like ['', '24F\n']. I now just need to change the 24 to 31 and then put it back into the original string.
Feel free to let me know if there is a better approach to this. I'm very new to python and everything I can find online with the available search/replace functions doesn't seem to do what I'd need them to do. Thank you!
Assuming the pattern is an underscore followed by one or more digits, you can replace it with a regex:
import re
re.sub(r"_(\d+)", lambda r: '_'+str(int(r.group(1)) + 7),'APU12_24F')
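The same idea, wrapped in a function so it can be applied to each line of a file. This is a sketch only: the function name and the amount parameter are invented here, and it assumes the number to change always follows an underscore.

```python
import re

def add_to_number_after_underscore(text, amount=7):
    """Add amount to every run of digits that follows an underscore."""
    def bump(match):
        # match.group(1) holds the digits without the underscore.
        return '_' + str(int(match.group(1)) + amount)
    return re.sub(r'_(\d+)', bump, text)

print(add_to_number_after_underscore('APU12_24F'))
```

Because the replacement is a function, re.sub calls it once per match, so each number is incremented from its own value. Leading zeros are not preserved, since the digits go through int().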
This isn't generalized because I'm not sure what the rest of the data looks like but maybe something like this should work:
def main():
    f=open("edita15888_debug.txt", "r")
    fl = f.readlines()
    for x in fl:
        if ("APU12" in x):
            list_string=split_string_APU12(x)
            list_string = int(list_string[1].split('F')[0]) + 7
            list_string = "APU12_" + str(list_string)
            print(list_string)
    return

def split_string_APU12(string):
    # Split the string based on APU12_
    list_string = string.split("APU12_")
    return list_string

main()
I'm assuming your strings will be of the format
APU12_##...F
(where ##... means a number with a variable count of digits, and F could be any letter, but just one). If so, you could do something like this:
# Notice the use of context managers
# I would recommend learning about this for working with files
with open('edita15888_debug.txt', 'r') as f:
    fl = f.readlines()

your_quantity = 7  # This may be your +7, for example
new_strings = []
for line in fl:
    # Drop the trailing newline so the last character is the letter
    beg, end = line.strip().split('_')
    # This splits the end part into number + character
    number, char = int(end[:-1]), end[-1]
    # Here goes your operation on the number
    number += your_quantity
    # Now joining back everything together
    new_strings.append(beg + '_' + str(number) + char)
And this would yield you the same list of strings but with the numbers before the last letter modified as you need.
I hope this helps you!
I assumed you need to add seven to the number that follows an underscore. I hope this function is helpful:
import re

def add_seven_to_number_after_underscore_in_a_string(aString):
    regex = re.compile(r'_(\d+)')
    match = regex.search(aString)
    if match is None:
        return aString  # no underscore + digits, nothing to change
    return regex.sub('_' + str(int(match.group(1)) + 7), aString)
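One thing to be aware of with the function above: the replacement string is computed once, from the first match, and sub() then applies it to every match. For a string with a single underscore-number pair this makes no difference, but with several it does. A sketch with a made-up two-ID string, comparing it to a per-match callback:

```python
import re

pattern = re.compile(r'_(\d+)')
text = 'APU12_24F APU12_30F'  # hypothetical input with two numbers

# Replacement built once from the first match, then used for all matches.
first = pattern.search(text)
static = pattern.sub('_' + str(int(first.group(1)) + 7), text)

# Replacement computed separately for each match.
per_match = pattern.sub(lambda m: '_' + str(int(m.group(1)) + 7), text)

print(static)
print(per_match)
```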
I'm new to the world of Python and I'm trying to extract values from multiple text files. I can open up the files fine with a loop, but I'm looking for a straightforward way to search for a string and then return the value after it.
My results text files look like this:
SUMMARY OF RESULTS
Max tip rotation =,-18.1921,degrees
Min tip rotation =,-0.3258,degrees
Mean tip rotation =,-7.4164,degrees
Max tip displacement =,6.9956,mm
Min tip displacement =,0.7467,mm
Mean tip displacement = ,2.4321,mm
Max Tsai-Wu FC =,0.6850
Max Tsai-Hill FC =,0.6877
So I want to be able to search for, say, 'Max Tsai-Wu FC =,' and have it return 0.6850.
I want to be able to search for the string, as the position of each variable might change at a later date.
Sorry for posting such an easy question, I just can't seem to find a straightforward, robust way of finding it.
Any help would be greatly appreciated!
Matt
You can make use of regex:
import re

regexp = re.compile(r'Max Tsai-Wu.*?([0-9.-]+)')
with open('input.txt') as f:
    for line in f:
        match = regexp.match(line)
        if match:
            print(match.group(1))
prints:
0.6850
UPD: getting results into the list
import re

regexp = re.compile(r'Max Tsai-Wu.*?([0-9.-]+)')
result = []
with open('input.txt') as f:
    for line in f:
        match = regexp.match(line)
        if match:
            result.append(match.group(1))
My favorite way is to test if the line starts with the desired text:
keyword = 'Max Tsai-Wu'
if line.startswith(keyword):
And then split the line using the commas and return the value
try:
    return float(line.split(',')[1])
except ValueError:
    # treat the error, e.g. skip the line
    return None
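The two fragments above fit together into one small function. This is just a sketch: the name find_value is made up, the sample lines are copied from the question, and returning None on a bad or missing value is one possible choice among several.

```python
def find_value(lines, keyword):
    """Return the number after the first comma on the line starting with keyword."""
    for line in lines:
        if line.startswith(keyword):
            try:
                return float(line.split(',')[1])
            except (IndexError, ValueError):
                # No comma on the line, or the field is not a number.
                return None
    return None  # keyword not found

sample = [
    'SUMMARY OF RESULTS',
    'Max tip rotation =,-18.1921,degrees',
    'Max Tsai-Wu FC =,0.6850',
]
print(find_value(sample, 'Max Tsai-Wu'))
```

Since it searches by label, it keeps working if the order of the lines changes later.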
You can use regular expression to find both name and value:
import re

RE_VALUE = re.compile(r'(.*?)\s*=,(.*?),')

def test():
    line = 'Max tip rotation =,-18.1921,degrees'
    rx = RE_VALUE.search(line)
    if rx:
        print('[%s] value: [%s]' % (rx.group(1), rx.group(2)))

test()
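This way, reading the file line by line, you can fill a dictionary.
My regex relies on the fact that the value sits between commas, so a line with no unit after the value (such as the Tsai-Wu lines) will not match.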
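Filling a dictionary could look something like the sketch below. Note that it uses a slightly different pattern from the one above (RE_LINE and parse_results are names invented here): the unit is made optional and spaces around the "=" are tolerated, so that the sample lines from the question all parse. It assumes every value is a valid float.

```python
import re

# Name, '=', comma, value, then an optional ',unit' part.
RE_LINE = re.compile(r'^(.*?)\s*=\s*,([^,\s]+)(?:,(\S+))?')

def parse_results(lines):
    """Map each result name to a (value, unit) tuple; unit may be None."""
    results = {}
    for line in lines:
        m = RE_LINE.match(line)
        if m:
            name, value, unit = m.groups()
            results[name] = (float(value), unit)
    return results

sample = [
    'SUMMARY OF RESULTS',
    'Max tip rotation =,-18.1921,degrees',
    'Mean tip displacement = ,2.4321,mm',
    'Max Tsai-Wu FC =,0.6850',
]
results = parse_results(sample)
print(results['Max Tsai-Wu FC'])
```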
If the files aren't that big, you could simply do:
import re

files = []  # fill in your list of file paths
for f in files:
    with open(f) as myfile:
        print(re.search(r'Max Tsai-Wu.*?=,(.+)', myfile.read()).group(1))
I have a large file with several lines as given below. I want to read in only those lines which have the _INIT pattern in them, strip the _INIT from the name, and save only the OSD_MODE_15_H part in a variable. Then I need to read the corresponding hex value, 8'h00 in this case, strip the 8'h from it, replace it with 0x, and save the result in a variable.
I have been trying to strip off the _INIT, the spaces and the =, and the code is becoming really messy.
localparam OSD_MODE_15_H_ADDR = 16'h038d;
localparam OSD_MODE_15_H_INIT = 8'h00
Can you suggest a lean and clean method to do this?
Thanks!
The following solution uses a regular expression (compiled to speed searching up) to match the relevant lines and extract the needed information. The expression uses named groups "id" and "hexValue" to identify the data we want to extract from the matching line.
import re

expression = r"(?P<id>\w+?)_INIT\s*?=.*?'h(?P<hexValue>[0-9a-fA-F]*)"
regex = re.compile(expression)

def getIdAndValueFromInitLine(line):
    mm = regex.search(line)
    if mm is None:
        return None # Not the ..._INIT parameter or line was empty or other mismatch happened
    else:
        return (mm.groupdict()["id"], "0x" + mm.groupdict()["hexValue"])
EDIT: If I understood the next task correctly, you need to find the hexvalues of those INIT and ADDR lines whose IDs match and make a dictionary of the INIT hexvalue to the ADDR hexvalue.
regex = r"(?P<init_id>\w+?)_INIT\s*?=.*?'h(?P<initValue>[0-9a-fA-F]*)"
init_dict = {}
# finditer yields match objects, which is what groupdict() needs
for x in re.finditer(regex, lines):
    init_dict[x.groupdict()["init_id"]] = "0x" + x.groupdict()["initValue"]
regex = r"(?P<addr_id>\w+?)_ADDR\s*?=.*?'h(?P<addrValue>[0-9a-fA-F]*)"
addr_dict = {}
for y in re.finditer(regex, lines):
    addr_dict[y.groupdict()["addr_id"]] = "0x" + y.groupdict()["addrValue"]
init_to_addr_hexvalue_dict = {init_dict[x] : addr_dict[x] for x in init_dict.keys() if x in addr_dict}
Even if this is not what you actually need, having init and addr dictionaries might make your goal easier to achieve. If there are several _INIT (or _ADDR) lines with the same ID and different hexvalues, then the above dict approach will not work in a straightforward way.
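As a rough illustration of the pairing idea, here is a single-pass variant run on the two sample lines from the question. It is only a sketch: the pattern PARAM and the function init_to_addr are invented for this example, and it assumes every value is written as a width, then 'h, then hex digits.

```python
import re

# One pattern for both kinds of line; 'kind' records which suffix matched.
PARAM = re.compile(
    r"(?P<id>\w+?)_(?P<kind>INIT|ADDR)\s*=\s*\d+'h(?P<hex>[0-9a-fA-F]+)")

def init_to_addr(text):
    """Map each INIT hex value to the ADDR hex value with the same ID."""
    values = {'INIT': {}, 'ADDR': {}}
    for m in PARAM.finditer(text):
        values[m.group('kind')][m.group('id')] = '0x' + m.group('hex')
    return {values['INIT'][k]: values['ADDR'][k]
            for k in values['INIT'] if k in values['ADDR']}

sample = """localparam OSD_MODE_15_H_ADDR = 16'h038d;
localparam OSD_MODE_15_H_INIT = 8'h00
"""
print(init_to_addr(sample))
```

The same caveat applies: if two IDs share an INIT value, the later one overwrites the earlier one in the result.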
Try something like this. I'm not sure what all your requirements are, but this should get you close:
with open(someFile, 'r') as infile:
    for line in infile:
        if '_INIT' in line:
            apostropheIndex = line.find("'h")
            clean_hex = '0x' + line[apostropheIndex + 2:].strip()
In the case of "16'h038d;", clean_hex would be "0x038d;" (need to remove the ";" somehow) and in the case of "8'h00", clean_hex would be "0x00"
Edit: if you want to guard against characters like ";" you could do this and test if a character is alphanumeric:
clean_hex = '0x' + ''.join([s for s in line[apostropheIndex + 2:] if s.isalnum()])
You can use a regular expression and the re.findall() function. For example, to generate a list of tuples with the data you want just try:
import re

with open("your_file") as f:
    lines = f.read()
regex = r"([\w]+?)_INIT\s*=\s*\d+'h([\da-fA-F]*)"
res = [(x[0], "0x"+x[1]) for x in re.findall(regex, lines)]
print(res)
The regular expression is very specific for your input example. If the other lines in the file are slightly different you may need to change it a bit.
I am trying to increment a version number using regex but I can't seem to get the hang of regex at all. I'm having trouble with the symbols in the string I am trying to read and change. The code I have so far is:
version_file = "AssemblyInfo.cs"
read_file = open(version_file).readlines()
write_file = open(version_file, "w")
r = re.compile(r'(AssemblyFileVersion\s*(\s*"\s*)(\S+))\s*"\s*')
for l in read_file:
    m1 = r.match(l)
    if m1:
        VERSION_ID=map(int,m1.group(2).split("."))
        VERSION_ID[2]+=1 # increment version
        l = r.sub(r'\g<1>' + '.'.join(['%s' % (v) for v in VERSION_ID]), l)
    write_file.write(l)
write_file.close()
The string I am trying to read and change is:
[assembly: AssemblyFileVersion("1.0.0.0")]
What I would like written to the file is:
[assembly: AssemblyFileVersion("1.0.0.1")]
So basically I want to increment the build number by one.
Can anyone help me fix my regular expression? I seem to have trouble getting to grips with regular expressions that have to get around symbols.
Thanks for any help.
If you specify the version as "1.0.0.*" then AFAIK it gets updated on each build automagically, at least if you're using Visual Studio.NET.
I'm not sure regex is your best bet, but one way of doing it would be this:
import re

# Don't bother matching everything, just the bits that matter.
pat = re.compile(r'AssemblyFileVersion.*\.(\d+)"')
# ... lines omitted which set up read_file, write_file etc.
for line in read_file:
    m = pat.search(line)
    if m:
        start, end = m.span(1)
        line = line[:start] + str(int(line[start:end]) + 1) + line[end:]
    write_file.write(line)
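An alternative to slicing by span is to let re.sub rebuild the line through a callback. This is a sketch under the assumption that the version always has four dot-separated numeric parts; the names VERSION and bump_build are made up for the example. The literal parentheses and quotes are escaped or written out so that they match the symbols in the attribute.

```python
import re

# Group 1: everything up to the last dot; group 2: build number; group 3: closing '")'.
VERSION = re.compile(r'(AssemblyFileVersion\("\d+\.\d+\.\d+\.)(\d+)("\))')

def bump_build(line):
    """Increment the last component of an AssemblyFileVersion attribute."""
    return VERSION.sub(
        lambda m: m.group(1) + str(int(m.group(2)) + 1) + m.group(3), line)

line = '[assembly: AssemblyFileVersion("1.0.0.0")]'
print(bump_build(line))
```

Lines that do not contain the attribute, including an AssemblyVersion line, pass through unchanged, so the function can be applied to every line of the file.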
Good luck with regex.
If I had to do the same, I'd convert the string to an int by removing the dots, add one and convert back to a string.
Well, I'd also have used an integer version number in the first place.