Parsing a file by keywords - Python

I need help with parsing a file in Python. There is a file with this content:
============ | another peace mushroom enroll point trip sort notice hobby bacon exact slab | 0xb34a47885262f9d8673dc77de7b583961134f09fb03620b29d282c32ee6932be | 0xD0b2612a6eE3111114b43b25322C6F08A251D38D | Total: 47.62874464666479$ | | | Tokens eth: | 20.608732$ MANA | | Protocols cro: | 17.840052$ VVS Finance | 8.953779$ V3S Finance
============ | road vocal tissue faint wonder host forget canvas jump brisk latin trigger | 0x72e164aa187feaff7cb28a74b7ff800a0dfe916594c70f141069669e9df5a23b | 0xC7dFe558ed09F0f3b72eBb0A04e9d4e99af0bd0D | Total:
22.908481672796988$ | | | Tokens eth: | 22.376087$ SOS
============ | spend easy harsh benefit correct arch draft similar music car glad roof | 0xbce666bca3c862a2ee44651374f95aca677de16b4922c6d5e7d922cc0ac42a3d | 0x5870923a244f52fF2D119fbf5525421E32EC006e | Total: 9.077030269778557$ | | | Tokens eth: | 8.942218$ SOS
============
The separators between records are the characters ============ and I need to parse these records and output only the data between separators where the Total is greater than 20. I will be grateful for help!
I tried the option below, but I still need the remaining values between the ============ signs, provided that the Total is greater than 20.
with open('total.txt') as file:
    data_file = file.readlines()
# print(data_file)
for i in data_file:
    replace_data = i.strip('| ')
    replace_space = replace_data.strip('\n')
    remove_whitespace = replace_space.strip('\n')
    if 'Total:' in remove_whitespace:
        print(remove_whitespace)
That puts all the values on separate lines; there is an example in the photo.

Consume the file line by line. If a line starts with '=' then you're starting a new section, in which case you need to check the list of lines you've previously built up, and also check whether a Total greater than 20 was found.
Try this:
outlist = []
flag = False

def dump(list_, flag_):
    if list_ and flag_:
        print('\n'.join(list_))
    return [], False

with open('total.txt') as file:
    for line in map(str.strip, file):
        if line.startswith('='):
            outlist, flag = dump(outlist, flag)
        else:
            tokens = line.split()
            if len(tokens) == 3 and tokens[1] == 'Total:':
                try:
                    flag = float(tokens[2][:-1]) > 20.0
                except ValueError:
                    pass
            outlist.append(line)
dump(outlist, flag)
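An alternative approach (just a sketch, not the answer above) is to split the whole file content on the separator and inspect each chunk with a regular expression. This assumes every record carries exactly one Total: <number>$ field; the \s* in the pattern lets the amount sit on the line after Total:, as in the second record of the file. The record names in the sample are made up for illustration:

```python
import re

SEPARATOR = '============'
TOTAL_RE = re.compile(r'Total:\s*([\d.]+)\$')

def records_over(text, threshold=20.0):
    # Split on the separator and keep chunks whose Total exceeds the threshold.
    for chunk in text.split(SEPARATOR):
        match = TOTAL_RE.search(chunk)
        if match and float(match.group(1)) > threshold:
            yield chunk.strip()

sample = ('============ | alpha | 0xabc | Total: 47.62$ | Tokens eth: | 20.60$ MANA '
          '============ | beta | 0xdef | Total: 9.07$ | Tokens eth: | 8.94$ SOS '
          '============')
for record in records_over(sample):
    print(record)
```

With the real file you would pass open('total.txt').read() instead of the sample string.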


Set all lines to the same width - even lines with emojis on them

I'm teaching myself Python and I'm writing a program that uses Pytube to retrieve information about a YouTube video.
The information to get from the YouTube video is:
Title
Author
Description
I want to set this information in the following structure:
Simplified example - info extracted from YouTube video_id: kqtD5dpn9C8
┌──────────────────────────────────────────────────────────────┐
| |
| 📕 Get my FREE Python cheat sheet: http://bit_ly/2Gp80s6 |
| |
| Courses: https://codewithmosh.com |
| Twitter: https://twitter.com/moshhamedani |
| Facebook: https://www.facebook.com/programmingwithmosh/ |
| Blog: http://programmingwithmosh.com |
| |
| #Python, #MachineLearning, #WebDevelopment |
| |
| 📔 Python Exercises for Beginners: https://goo_gl/1XnQB1 |
| |
| ⭐ My Favorite Python Books |
| - Python Crash Course: https://amzn.to/2GqMdjG |
| |
| TABLE OF CONTENT |
| |
| 0:00:00 Introduction |
| 0:00:30 What You Can Do With Python |
| 0:57:43 Tuples |
| |
└──────────────────────────────────────────────────────────────┘
The problem I'm currently facing is as follows:
When formatting the "description" info, some lines have emojis on them; the code I share below pads every line of the description to the same length; however, due to the emoji, an additional "space" is shown and the format is visually broken.
Code:
# Line # 1 - without emojis:
item = "#Python, #MachineLearning, #WebDevelopment"
emptyLine = "| |"
# Difference between the length of the line and the empty line.
# Used to pad the line under construction with spaces and the closing "|" char.
diffChars = len(emptyLine) - len(item)
item = item + (" " * (diffChars - 4)) + " |"
print("| " + item)
# ----------------------------
Result:
| #Python, #MachineLearning, #WebDevelopment |
# Line # 2 - with emoji:
item = "📔 Python Exercises for Beginners: https://goo_gl/1XnQB1"
emptyLine = "| |"
# Difference between the length of the line and the empty line.
# Used to pad the line under construction with spaces and the closing "|" char.
diffChars = len(emptyLine) - len(item)
if diffChars % 2 == 0:
    diffChars = diffChars - 1
# print("Qty characters difference (2): " + str(diffChars))
item = item + (" " * (diffChars - 4)) + " |"
print("| " + item)
# ----------------------------
Result:
| 📔 Python Exercises for Beginners: https://goo_gl/1XnQB1 |
# Line # 3 - without emojis:
item = "0:25:59 Operator Precedence "
emptyLine = "| |"
# Difference between the length of the line and the empty line.
# Used to pad the line under construction with spaces and the closing "|" char.
diffChars = len(emptyLine) - len(item)
if diffChars % 2 == 0:
    diffChars = diffChars - 1
# print("Qty characters difference (2): " + str(diffChars))
item = item + (" " * (diffChars - 4)) + " |"
print("| " + item)
Result:
| 0:25:59 Operator Precedence |
Results of the previous lines:
| #Python, #MachineLearning, #WebDevelopment |
| 📔 Python Exercises for Beginners: https://goo_gl/1XnQB1 | # <= line with diff. in size.
| 0:25:59 Operator Precedence |
As you can see in the results above, the line with the "📔" emoji shows a slight difference at the end: it has an additional "space", breaking the straight vertical border.
Is there any way to set all lines to the same width - even lines with emojis on them?
PS: I'm using Google Colab (link to my notebook with the code so far - n.b. it is in Spanish) and w3schools.com, and in both environments I get the same results. You can copy and paste the previous code into Google Colab or into the "try-it" feature of w3schools.com.
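No answer is recorded here, but the root cause is that len() counts code points while most emoji render two terminal columns wide. A minimal sketch of width-aware padding, using unicodedata.east_asian_width from the standard library and assuming the display follows the East Asian Width convention for emoji (display_width, boxed and inner_width are hypothetical names, not from the question):

```python
import unicodedata

def display_width(text):
    # Wide ('W') and Fullwidth ('F') characters occupy two display columns.
    return sum(2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1
               for ch in text)

def boxed(item, inner_width=62):
    # Pad by display width (not len) so every line closes at the same column.
    padding = inner_width - display_width(item)
    return '| ' + item + ' ' * padding + ' |'

print(boxed("#Python, #MachineLearning, #WebDevelopment"))
print(boxed("📔 Python Exercises for Beginners: https://goo_gl/1XnQB1"))
```

The third-party wcwidth package handles more edge cases (combining marks, zero-width joiners) than this sketch does.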

How to identify and print a pattern inside an ASCII file in Python 2?

I am trying to develop a program that can read patterns from a txt file using Python 2.x. This pattern is supposed to be a bug:
| |
###O
| |
And the pattern doesn't include the whitespace.
So far I have come up with a way to open the txt file, read it, and process the data inside it, but I can't think of a way to make Python treat this pattern as a single unit instead of counting each character. I've tried regular expressions, but that ended up showing output similar to this:
| |
###O
| |
| |
###O
| |
| |
###O
| |
Instead of just reporting how many occurrences of the pattern were detected in the file, for example:
There were 3 occurrences.
Update: so far I have this:
file = open('bug.txt', 'r')
data = file.read() #read content from file to a string
occurrences = data.count('| |\n\'###O\'\n| |\n')
print('Number of occurrences of the pattern:', occurrences)
But this is not working. The file itself contains the pattern 3 times, but with whitespace in between; the whitespace is not part of the pattern. When I paste the pattern from the file it breaks across lines, and if I collapse it to | | ###O | | it shows 0 occurrences, because that is not really the pattern.
It depends on how you store your ASCII data, but if you convert it to a string you can use the Python str.count() method.
For example:
# define string
ascii_string = "| | ###O | | | | ###O | | | | ###O | |"
pattern = "| | ###O | |"
count = ascii_string.count(pattern)
# print count in python 3+
print("Occurrences:", count)
# print count in python 2.7
print "Occurrences:", count
This will result in:
Occurrences: 3
>>> import re
>>> data = '''| |
... ###O
... | |
... | |
... ###O
... | |
... | |
... ###O
... | |'''
>>> result = re.findall('[ ]*\| \|\n[ ]*###O\n[ ]*\| \|', data)
>>> len(result)
3
>>>
Result being occurrences.
How to do it from a file:
import re
with open('some file.txt') as fd:
data = fd.read()
result = re.findall('[ ]*\| \|\n[ ]*###O\n[ ]*\| \|', data)
len(result)
An alternative way of doing it, to accommodate the edit to the question:
>>> data = '''| |
... ###O
... | |
... | |
... ###O
... | |
... | |
... ###O
... | |
... | | ###O | |'''
>>> data.replace('\n', '').replace(' ', '').count('||###O||')
4
>>>
I solved the problem this way.
def somefuntion(file_name):
    ascii_str = ''
    with open(file_name, 'r') as reader:
        for line in reader.readlines():
            for character in line.replace('\n', '').replace(' ', ''):
                ascii_str += str(ord(character))
    return ascii_str

if __name__ == "__main__":
    bug = somefuntion('bug.txt')
    landscape = somefuntion('landscape.txt')
    print(landscape.count(bug))
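As a final note, the count-based attempt from the update can also be made to work by stripping spaces line by line before counting. A minimal sketch, assuming the bug glyphs themselves never contain spaces:

```python
def count_bugs(text):
    # Drop spaces within each line so indentation cannot break the match,
    # but keep the line structure intact.
    squeezed = '\n'.join(line.replace(' ', '') for line in text.splitlines())
    return squeezed.count('||\n###O\n||')

sample = '| |\n ###O\n| |\n\n  | |\n###O\n | |\n'
print(count_bugs(sample))  # 2
```

Unlike the replace-everything approach above, this keeps newlines, so a single-line '| | ###O | |' is not counted as a bug.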

Only the last line of a multiline file / string is printed

I searched a bit on Stack Overflow and stumbled on different answers, but nothing fit my situation...
I got a map.txt file like this:
+----------------------+
| |
| |
| |
| test |
| |
| |
| |
+------------------------------------------------+
| | |
| | |
| | |
| Science | Bibliothek |
| | |
| | |
| | |
+----------------------+-------------------------+
when I want to print it using this:
def display_map():
    s = open("map.txt").read()
    return s

print display_map()
it just prints:
+----------------------+-------------------------+
When I try the same method with another text file like:
line 1
line 2
line 3
it works perfectly.
What am I doing wrong?
I guess this file uses the CR (Carriage Return) character (Ascii 13, or '\r') for newlines; on Windows and Linux this would just move the cursor back to column 1, but not move the cursor down to the beginning of a new line.
(Of course such line terminators would not survive copy-paste to Stack Overflow, which is why this cannot be replicated).
You can debug strange characters in a string with repr:
print(repr(read_map()))
It will print out the string with all special characters escaped.
If you see \r in the repred string, you could try this instead:
def read_map():
    with open('map.txt') as f:  # with ensures the file is closed properly
        return f.read().replace('\r', '\n')  # replace \r with \n
Alternatively, supply the U flag to open for universal newlines, which converts '\r', '\r\n' and '\n' all to '\n' upon reading, regardless of the underlying operating system's conventions:
def read_map():
    with open('map.txt', 'rU') as f:
        return f.read()
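Note that the 'U' mode flag is a Python 2 idiom; it was deprecated in Python 3 and removed in Python 3.11. In Python 3, text-mode open() applies universal newlines by default through the newline parameter, so a sketch of the equivalent would be:

```python
def read_map(path='map.txt'):
    # newline=None (the default) enables universal newlines: '\r' and
    # '\r\n' are both translated to '\n' when the file is read.
    with open(path, newline=None) as f:
        return f.read()
```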

String to CSV file using Python

I have the following string
string = "OGC Number | LT No | Job /n 9625878 | EPP3234 | 1206545/n" and continues on
I am trying to write it to a .CSV file where it will look like this:
OGC Number | LT No | Job
------------------------------
9625878 | EPP3234 | 1206545
9708562 | PGP43221 | 1105482
9887954 | BCP5466 | 1025454
where each newline in the string is a new row
where each "|" in the string is a new column
I am having trouble getting the formatting right.
I think I need to use:
string.split('/n')
string.split('|')
Thanks.
Windows 7, Python 2.6
Untested:
text="""
OGC Number | LT No | Job
------------------------------
9625878 | EPP3234 | 1206545
9708562 | PGP43221 | 1105482
9887954 | BCP5466 | 1025454"""
import csv

lines = text.splitlines()
with open('outputfile.csv', 'wb') as fout:
    csvout = csv.writer(fout)
    csvout.writerow([col.strip() for col in lines[0].split('|')])  # header
    for row in lines[2:]:  # content (skip the dashed separator line)
        csvout.writerow([col.strip() for col in row.split('|')])
If you are interested in using a third-party module, PrettyTable is very useful and has a nice set of features for dealing with and printing tabular data.
EDIT: Oops, I misunderstood your question!
The code below will use two regular expressions to do the modifications.
import re
str="""OGC Number | LT No | Job
------------------------------
9625878 | EPP3234 | 1206545
9708562 | PGP43221 | 1105482
9887954 | BCP5466 | 1025454
"""
# just setup above
# remove all lines with at least 4 dashes
str=re.sub( r'----+\n', '', str )
# replace all pipe symbols with their
# surrounding spaces by single semicolons
str=re.sub( r' +\| +', ';', str )
print str
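Applying the same idea to the string from the question directly (assuming the '/n' in the question stands for the newline character '\n'), a minimal sketch that builds the CSV in memory:

```python
import csv
import io

raw = ("OGC Number | LT No | Job\n"
       "9625878 | EPP3234 | 1206545\n"
       "9708562 | PGP43221 | 1105482\n")

out = io.StringIO()
writer = csv.writer(out)
for line in raw.splitlines():
    # Each '|' marks a column boundary; strip the padding around the cells.
    writer.writerow([cell.strip() for cell in line.split('|')])

print(out.getvalue())
```

To write a real file in Python 3, replace io.StringIO() with open('outputfile.csv', 'w', newline='').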

Python -- how to read and change specific fields from file? (specifically, numbers)

I just started learning Python scripting yesterday and I've already gotten stuck. :(
So I have a data file with a lot of different information in various fields.
Formatted basically like...
Name (tab) Start# (tab) End# (tab) A bunch of fields I need but do not do anything with
Repeat
I need to write a script that takes the start and end numbers and adds/subtracts a number to each, depending on whether another field says + or -.
I know that I can replace words with something like this:
x = open("infile")
y = open("outfile", "a")
while 1:
    line = x.readline()
    if not line: break
    line = line.replace("blah", "blahblahblah")
    y.write(line + "\n")
y.close()
But I've looked in all sorts of places and I can't figure out how to extract specific fields from each line, read one field, and change other fields. I read that you can read the lines into arrays, but I can't seem to find out how to do that.
Any help would be great!
EDIT:
Example of a line from the data here: (Each | represents a tab character)
| |
V V
chr21 | 33025905 | 33031813 | ENST00000449339.1 | 0 | **-** | 33031813 | 33031813 | 0 | 3 | 1835,294,104, | 0,4341,5804,
chr21 | 33036618 | 33036795 | ENST00000458922.1 | 0 | **+** | 33036795 | 33036795 | 0 | 1 | 177, | 0,
The second and third columns (indicated by arrows) would be the ones that I'd need to read/change.
You can use csv to do the splitting, although for these sorts of problems, I usually just use str.split:
with open(infile) as fin, open('outfile', 'w') as fout:
    for line in fin:
        # use line.split('\t', 3) if the name field can contain spaces
        name, start, end, rest = line.split(None, 3)
        # Do something to change start and end here.
        # Note that `start` and `end` are strings, but they can easily be
        # converted using the `int` or `float` builtins.
        fout.write('\t'.join((name, start, end, rest)))
csv is nice if you want to split lines like this:
this is a "single argument"
into:
['this','is','a','single argument']
but it doesn't seem like you need that here.
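To make the "do something to change start and end" step concrete: a minimal sketch, assuming tab-separated fields with the strand sign in the sixth column (as in the sample lines) and a hypothetical offset of 100:

```python
OFFSET = 100  # hypothetical adjustment amount

def shift_fields(line, offset=OFFSET):
    # Columns: name, start, end, ...; the strand sign is field 6 (index 5).
    fields = line.rstrip('\n').split('\t')
    sign = 1 if fields[5] == '+' else -1
    fields[1] = str(int(fields[1]) + sign * offset)
    fields[2] = str(int(fields[2]) + sign * offset)
    return '\t'.join(fields) + '\n'

row = 'chr21\t33036618\t33036795\tENST00000458922.1\t0\t+\t33036795\t33036795\n'
print(shift_fields(row))
```

The same function slots into the read/write loop from the answer above in place of the placeholder comment.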
