Only the last line of a multiline file / string is printed - python

I searched a bit on Stack Overflow and stumbled on different answers, but nothing fitted my situation...
I have a map.txt file like this:
+----------------------+
|                      |
|                      |
|                      |
|         test         |
|                      |
|                      |
|                      |
+------------------------------------------------+
|                      |                         |
|                      |                         |
|                      |                         |
|       Science        |       Bibliothek        |
|                      |                         |
|                      |                         |
|                      |                         |
+----------------------+-------------------------+
When I want to print it using this:
def display_map():
    s = open("map.txt").read()
    return s

print display_map()
it just prints:
+----------------------+-------------------------+
When I try the same method with another text file like:
line 1
line 2
line 3
it works perfectly.
What am I doing wrong?

I guess this file uses the CR (carriage return) character (ASCII 13, or '\r') for newlines; on Windows and Linux terminals this just moves the cursor back to column 1, but does not move it down to the beginning of a new line.
(Of course such line terminators would not survive copy-paste to Stack Overflow, which is why this cannot be replicated).
You can debug strange characters in a string with repr:
print(repr(read_map()))
It will print out the string with all special characters escaped.
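For example, with a small in-memory string (not the OP's file) you can see both why a CR-separated text misbehaves when printed and how repr exposes it:
sample = "line 1\rline 2\rline 3"
print(sample)        # most terminals overwrite each line, so only "line 3" stays visible
print(repr(sample))  # 'line 1\rline 2\rline 3' - the \r characters are now easy to spot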
If you see \r in the repr'd string, you could try this instead:
def read_map():
    with open('map.txt') as f:  # with ensures the file is closed properly
        return f.read().replace('\r', '\n')  # replace \r with \n
Alternatively, supply the U flag to open for universal newlines, which converts '\r', '\r\n' and '\n' all to '\n' on reading, regardless of the underlying operating system's conventions:
def read_map():
    with open('map.txt', 'rU') as f:
        return f.read()
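A Python 3 side note: the 'U' flag is deprecated there, since universal-newline translation is already the default for text mode, so a plain open() gives the same behaviour:
def read_map():
    # Python 3 text mode translates '\r' and '\r\n' to '\n' by default
    with open('map.txt') as f:
        return f.read()

print(read_map())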

Related

Parsing a file by keywords

I need help with a file parser in Python. There is a file with this content:
============ | another peace mushroom enroll point trip sort notice hobby bacon exact slab | 0xb34a47885262f9d8673dc77de7b583961134f09fb03620b29d282c32ee6932be | 0xD0b2612a6eE3111114b43b25322C6F08A251D38D | Total: 47.62874464666479$ | | | Tokens eth: | 20.608732$ MANA | | Protocols cro: | 17.840052$ VVS Finance | 8.953779$ V3S Finance
============ | road vocal tissue faint wonder host forget canvas jump brisk latin trigger | 0x72e164aa187feaff7cb28a74b7ff800a0dfe916594c70f141069669e9df5a23b | 0xC7dFe558ed09F0f3b72eBb0A04e9d4e99af0bd0D | Total:
22.908481672796988$ | | | Tokens eth: | 22.376087$ SOS
============ | spend easy harsh benefit correct arch draft similar music car glad roof | 0xbce666bca3c862a2ee44651374f95aca677de16b4922c6d5e7d922cc0ac42a3d | 0x5870923a244f52fF2D119fbf5525421E32EC006e | Total: 9.077030269778557$ | | | Tokens eth: | 8.942218$ SOS
============
The separator between records is the character sequence ============, and I need to parse these records and output only the data between separators where Total is greater than 20. I will be grateful for help!
Here is what I tried, but I need the remaining values between the === separators, provided that Total is greater than 20.
with open('total.txt') as file:
    data_file = file.readlines()
    # print(data_file)
    for i in data_file:
        replace_data = i.strip('| ')
        replace_space = replace_data.strip('\n')
        remove_whitespace = replace_space.strip('\n')
        if 'Total:' in remove_whitespace:
            print(remove_whitespace)
This prints all the values, each on a new line; an example is in the photo.
Consume the file line by line. If a line starts with '=' then you're starting a new section; at that point you need to check the list of lines you've previously built up and also check whether a Total greater than 20 was found.
Try this:
outlist = []
flag = False

def dump(list_, flag_):
    if list_ and flag_:
        print('\n'.join(list_))
    return [], False

with open('total.txt') as file:
    for line in map(str.strip, file):
        if line.startswith('='):
            outlist, flag = dump(outlist, flag)
        else:
            tokens = line.split()
            if len(tokens) == 3 and tokens[1] == 'Total:':
                try:
                    flag = float(tokens[2][:-1]) > 20.0
                except ValueError:
                    pass
            outlist.append(line)

dump(outlist, flag)
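To see how the dump/flag logic behaves, here is a self-contained run on a small synthetic sample laid out the way the answer assumes (one field per line, records separated by '=' lines); the Total values are taken from the question, the rest is illustrative:
# synthetic sample, not the OP's real file
sample = """\
============
| another peace mushroom enroll point
| Total: 47.62874464666479$
| 20.608732$ MANA
============
| road vocal tissue faint wonder
| Total: 9.077030269778557$
| 8.942218$ SOS
============
"""

def dump(list_, flag_):
    if list_ and flag_:
        print('\n'.join(list_))
    return [], False

outlist, flag = [], False
for line in map(str.strip, sample.splitlines()):
    if line.startswith('='):
        outlist, flag = dump(outlist, flag)
    else:
        tokens = line.split()
        if len(tokens) == 3 and tokens[1] == 'Total:':
            try:
                flag = float(tokens[2][:-1]) > 20.0
            except ValueError:
                pass
        outlist.append(line)
dump(outlist, flag)
# only the first record is printed, because only its Total exceeds 20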

how to identify and print a pattern inside an ascii file in python 2?

I am trying to develop a program that can read patterns from a txt file using Python 2.x. This pattern is supposed to be a bug:
| |
###O
| |
And the pattern doesn't include the whitespaces.
So far I have come up with a way to open the txt file, read it and process the data inside of it, but I can't think of a way to make Python treat this pattern as one unit instead of counting each character. I've tried regular expressions, but they ended up showing output similar to this:
| |
###O
| |
| |
###O
| |
| |
###O
| |
Instead of just saying how many occurrences of this pattern were detected inside the file, for example:
There were 3 occurrences.
Update: So far I've got this:
file = open('bug.txt', 'r')
data = file.read() #read content from file to a string
occurrences = data.count('| |\n\'###O\'\n| |\n')
print('Number of occurrences of the pattern:', occurrences)
But this is not working. The file itself contains the pattern 3 times, but with whitespace in between; the whitespace is not part of the pattern, and when I try to paste the pattern from the file it breaks across lines. If I collapse it to | | ###O | | it shows 0 occurrences, because that is not really the pattern.
It depends on how you store your ASCII data, but if you convert it to a string you can use the python .count() function.
For example:
# define string
ascii_string = "| | ###O | | | | ###O | | | | ###O | |"
pattern = "| | ###O | |"
count = ascii_string.count(pattern)
# print count in python 3+
print("Occurrences:", count)
# print count in python 2.7
print "Occurrences:", count
This will result in:
Occurrences: 3
>>> import re
>>> data = '''| |
... ###O
... | |
... | |
... ###O
... | |
... | |
... ###O
... | |'''
>>> result = re.findall('[ ]*\| \|\n[ ]*###O\n[ ]*\| \|', data)
>>> len(result)
3
>>>
Result being occurrences.
How to do it from a file:
import re

with open('some file.txt') as fd:
    data = fd.read()

result = re.findall('[ ]*\| \|\n[ ]*###O\n[ ]*\| \|', data)
len(result)
An alternative way of doing it, to accommodate the edit to the OP:
>>> data = '''| |
... ###O
... | |
... | |
... ###O
... | |
... | |
... ###O
... | |
... | | ###O | |'''
>>> data.replace('\n', '').replace(' ', '').count('||###O||')
4
>>>
I solved the problem this way.
def somefuntion(file_name):
    ascii_str = ''
    with open(file_name, 'r') as reader:
        for line in reader.readlines():
            for character in line.replace('\n', '').replace(' ', ''):
                ascii_str += str(ord(character))
    return ascii_str

if __name__ == "__main__":
    bug = somefuntion('bug.txt')
    landscape = somefuntion('landscape.txt')
    print(landscape.count(bug))

what are the arguments for converting a double-up pdf page into a one column pdf page using ghostscript

I need the arguments for Ghostscript in order to convert a double-up PDF page to a simple one-column PDF page.
the input:
+--------+--------+
|        |        |
|        |        |
|        |        |
|   1    |   2    |
|        |        |
|        |        |
+--------+--------+
the output:
+--------+
|        |
|   1    |
|        |
|        |
|        |
|        |
+--------+
+--------+
|        |
|   2    |
|        |
|        |
|        |
|        |
+--------+
Based on these two posts, post1 and post2, I created this code:
import sys
import locale
import ghostscript

args = [
    "-ooutput.pdf",
    "-sDEVICE=pdfwrite",
    "-g2807x5950"
    "-fpdfFile.pdf"
]

# arguments have to be bytes, encode them
encoding = locale.getpreferredencoding()
args = [a.encode(encoding) for a in args]

ghostscript.Ghostscript(*args)
I expected a 2-page PDF file, but a fatal error was raised.
Edit: this is the error message
[screenshot of the error message: "Device pdfwrite requires an output file but no file was specified"]
If you read the text, it says "Device pdfwrite requires an output file but no file was specified". So that tells you that -o was ignored, or there was some other problem with it.
I suspect you are using the Ghostscript DLL, rather than forking a process, in which case you have to set argv[0] to a dummy value. The reason is that, when running a C program, argv[0] is the name of the executable. So the args processing skips over the 0th element of the args array.
This is covered in the Ghostscript documentation here
NB there also looks to be a missing ',' in your argument list, but I don't speak Python so I could be wrong.
You probably need to change your args to something like:
args = [
    "MyApp",
    "-o output.pdf",
    "-sDEVICE=pdfwrite",
    "-g2807x5950",
    "-fpdfFile.pdf"
]
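Putting that suggestion together with the question's own encode-and-call code gives something like this untested sketch; "MyApp" is only a dummy placeholder for argv[0], and the device, geometry and file names are taken from the question:
import locale
import ghostscript  # python-ghostscript bindings, as used in the question

args = [
    "MyApp",            # dummy argv[0]; Ghostscript's argument parsing starts at index 1
    "-ooutput.pdf",
    "-sDEVICE=pdfwrite",
    "-g2807x5950",      # note the comma that was missing in the question's list
    "-fpdfFile.pdf",
]

# the Ghostscript API expects byte strings, so encode them first
encoding = locale.getpreferredencoding()
args = [a.encode(encoding) for a in args]

ghostscript.Ghostscript(*args)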

String to Csv file using Python

I have the following string
string = "OGC Number | LT No | Job /n 9625878 | EPP3234 | 1206545/n" and continues on
I am trying to write it to a .CSV file where it will look like this:
OGC Number | LT No | Job
------------------------------
9625878 | EPP3234 | 1206545
9708562 | PGP43221 | 1105482
9887954 | BCP5466 | 1025454
where each newline in the string is a new row
where each "|" in the sting is a new column
I am having trouble getting the formatting.
I think I need to use:
string.split('/n')
string.split('|')
Thanks.
Windows 7, Python 2.6
Untested:
text="""
OGC Number | LT No | Job
------------------------------
9625878 | EPP3234 | 1206545
9708562 | PGP43221 | 1105482
9887954 | BCP5466 | 1025454"""
import csv
lines = text.splitlines()
with open('outputfile.csv', 'wb') as fout:
csvout = csv.writer(fout)
csvout.writerow(lines[0]) # header
for row in lines[2:]: # content
csvout.writerow([col.strip() for col in row.split('|')])
If you are interested in using a third-party module, PrettyTable is very useful and has a nice set of features for dealing with and printing tabular data.
EDIT: Oops, I misunderstood your question!
The code below will use two regular expressions to do the modifications.
import re

str = """OGC Number | LT No | Job
------------------------------
9625878 | EPP3234 | 1206545
9708562 | PGP43221 | 1105482
9887954 | BCP5466 | 1025454
"""
# just setup above

# remove all lines with at least 4 dashes
str = re.sub(r'----+\n', '', str)

# replace all pipe symbols with their
# surrounding spaces by single semicolons
str = re.sub(r' +\| +', ';', str)

print str
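Since the question's data actually arrives as a single Python string rather than a file, here is a minimal sketch that feeds such a string straight into the csv module (assuming the "/n" in the question stands for real newline characters; 'wb' is used because the asker is on Python 2.6):
import csv

# sample built from the question's values, with real newlines
data = "OGC Number | LT No | Job\n9625878 | EPP3234 | 1206545\n9708562 | PGP43221 | 1105482\n"

with open('outputfile.csv', 'wb') as fout:  # the Python 2 csv module wants binary mode
    writer = csv.writer(fout)
    for row in data.splitlines():
        writer.writerow([col.strip() for col in row.split('|')])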

Extracting each line from a file and passing it as a variable to "foreach" loop

Could somebody help me figure out a simple way of doing this using any script? I will be running the script on Linux.
1) I have a file1 which has the following lines:
(Bank8GntR[3] | Bank8GntR[2] | Bank8GntR[1] | Bank8GntR[0] ),
(Bank7GntR[3] | Bank7GntR[2] | Bank7GntR[1] | Bank7GntR[0] ),
(Bank6GntR[3] | Bank6GntR[2] | Bank6GntR[1] | Bank6GntR[0] ),
(Bank5GntR[3] | Bank5GntR[2] | Bank5GntR[1] | Bank5GntR[0] ),
2) I need the contents of file1 to be modified as follows and written to a file2:
(Bank15GntR[3] | Bank15GntR[2] | Bank15GntR[1] | Bank15GntR[0] ),
(Bank14GntR[3] | Bank14GntR[2] | Bank14GntR[1] | Bank14GntR[0] ),
(Bank13GntR[3] | Bank13GntR[2] | Bank13GntR[1] | Bank13GntR[0] ),
(Bank12GntR[3] | Bank12GntR[2] | Bank12GntR[1] | Bank12GntR[0] ),
So I have to:
read each line from the file1,
use "search" using regular expression,
to match Bank[0-9]GntR,
replace \1 with "7 added to number matched",
insert it back into the line,
write the line into a new file.
How about something like this in Python:
import re

# a function that adds 7 to a matched group.
# groups 1 and 2; we grabbed (Bank) to avoid catching the digits in brackets.
def plus7(matchobj):
    return '%s%d' % (matchobj.group(1), int(matchobj.group(2)) + 7)

# iterate over the input file, writing modified lines to the output file.
with open('in.txt') as fhi, open('out.txt', 'w') as fho:
    for line in fhi:
        fho.write(re.sub('(Bank)(\d+)', plus7, line))
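To see the substitution on its own, the same plus7 callback applied to one of the sample lines from the question turns the Bank8 entries into Bank15:
import re

def plus7(matchobj):
    return '%s%d' % (matchobj.group(1), int(matchobj.group(2)) + 7)

line = "(Bank8GntR[3] | Bank8GntR[2] | Bank8GntR[1] | Bank8GntR[0] ),"
print(re.sub(r'(Bank)(\d+)', plus7, line))
# (Bank15GntR[3] | Bank15GntR[2] | Bank15GntR[1] | Bank15GntR[0] ),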
Assuming you don't have to use python, you can do this using awk:
cat test.txt | awk 'match($0, /Bank([0-9]+)GntR/, nums) { d=nums[1]+7; gsub(/Bank[0-9]+GntR\[/, "Bank" d "GntR["); print }'
This gives the desired output.
The point here is that match will match your data and allows capturing groups which you can use to extract out the number. As awk supports arithmetic, you can then add 7 within awk and then do a replacement on all the values in the rest of the line. Note, I've assumed all the values in the line have the same digit in them.
