I am trying to format a print statement to align a specific column.
Currently my output is:
0 - Rusty Bucket (40L bucket - quite rusty) = $0.00
1 - Golf Cart (Tesla powered 250 turbo) = $195.00*
2 - Thermomix (TM-31) = $25.50*
3 - AeroPress (Great coffee maker) = $5.00
4 - Guitar (JTV-59) = $12.95
The output I am looking for is:
0 - Rusty Bucket (40L bucket - quite rusty) = $0.00
1 - Golf Cart (Tesla powered 250 turbo)     = $195.00*
2 - Thermomix (TM-31)                       = $25.50*
3 - AeroPress (Great coffee maker)          = $5.00
4 - Guitar (JTV-59)                         = $12.95
Here is the code I am currently using for the print:
def list_items():
    count = 0
    print("All items on file (* indicates item is currently out):")
    for splitline in all_lines:
        in_out = splitline[3]
        dollar_sign = "= $"
        daily_price = "{0:.2f}".format(float(splitline[2]))
        if in_out == "out":
            in_out = str("*")
        else:
            in_out = str("")
        print(count, "- {} ({}) {}{}{}".format(splitline[0], splitline[1], dollar_sign, daily_price, in_out))
        count += 1
I have tried using formatting such as:
print(count, "- {:>5} ({:>5}) {:>5}{}{}".format(splitline[0], splitline[1], dollar_sign, daily_price, in_out))
but I have never been able to get just the one column to align. Any help or suggestions would be greatly appreciated! I am also using Python 3.x.
To note, I am using tuples to contain all the information, with all_lines being the master list, as it were. The information is originally read from a CSV. Apologies for the horrible naming conventions; I am trying to work on functionality first.
Sorry if this has been answered elsewhere; I have tried looking.
EDIT: Here is the code I'm using to build all_lines:
import csv

open_file = open('items.csv', 'r+')
all_lines = []
for line in open_file:
    splitline = line.strip().split(',')
    all_lines.append((splitline[0], splitline[1], splitline[2], splitline[3]))
And here is the csv file information:
Rusty Bucket,40L bucket - quite rusty,0.0,in
Golf Cart,Tesla powered 250 turbo,195.0,out
Thermomix,TM-31,25.5,out
AeroPress,Great coffee maker,5.0,in
Guitar,JTV-59,12.95,in
You should look at str.ljust(width[, fillchar]):
>>> '(TM-31)'.ljust(15)
'(TM-31)        '  # padded to width 15
Then extract the variable-length "{} ({})" part and pad it to the necessary width.
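For example, a minimal sketch reusing the all_lines tuples from the question (the width of 45 is an assumption; pick anything at least as wide as the longest name-plus-description):

# Sketch: left-justify the variable-length "count - name (description)" part
# so the "= $" column lines up.
for count, splitline in enumerate(all_lines):
    in_out = "*" if splitline[3] == "out" else ""
    daily_price = "{0:.2f}".format(float(splitline[2]))
    left = "{} - {} ({})".format(count, splitline[0], splitline[1])
    print("{} = ${}{}".format(left.ljust(45), daily_price, in_out))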
What you are looking for may be:
[EDIT] I have now also included a tabulate version, since it may give you more flexibility.
import csv
from tabulate import tabulate

open_file = open('items.csv', 'r+')
all_lines = []
for line in open_file:
    splitline = line.strip().split(',')
    all_lines.append((splitline[0], splitline[1], splitline[2], splitline[3]))
#print(all_lines)

count = 0
new_lines = []
for splitline in all_lines:
    in_out = splitline[3]
    dollar_sign = "= $"
    daily_price = "{0:.2f}".format(float(splitline[2]))
    if in_out == "out":
        in_out = "*"
    else:
        in_out = ""
    str2 = '(' + splitline[1] + ')'
    print(count, "- {:<30} {:<30} {}{:<30} {:<10}".format(splitline[0], str2, dollar_sign, daily_price, in_out))
    new_lines.append([splitline[0], str2, dollar_sign, daily_price, in_out])
    count += 1

print(tabulate(new_lines, tablefmt="plain"))
print()
print(tabulate(new_lines, tablefmt="plain", numalign="left"))
I do not like the idea of controlling the printing format myself.
In this case, I would leverage a tabulating library such as Tabulate.
The two key points are:
keep the data in a table (e.g. a list of lists)
select the proper printing format with the tablefmt parameter (see the minimal sketch below)
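A minimal sketch of just those two points, with two rows taken from the question's data:

from tabulate import tabulate

# Each inner list is one row; tabulate works out the column widths itself.
rows = [
    ["Rusty Bucket", "(40L bucket - quite rusty)", "= $", "0.00", ""],
    ["Golf Cart", "(Tesla powered 250 turbo)", "= $", "195.00", "*"],
]
print(tabulate(rows, tablefmt="plain"))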
I have a .txt file with 3 different columns. The first one is just numbers. The second one is numbers that go from 0 to 7. The final one is a sentence-like string. I want to keep them in different lists so I can match them by their numbers. I want to write a function. How can I separate them into different lists without mixing them up?
The example of .txt:
1234 0 my name is
6789 2 I am coming
2346 1 are you new?
1234 2 Who are you?
1234 1 how's going on?
And I have to keep them like this:
----1----
1234 0 my name is
1234 1 how's going on?
1234 2 Who are you?
----2----
2346 1 are you new?
----3-----
6789 2 I am coming
What I've tried so far:
inputfile=open('input.txt','r').read()
m_id=[]
p_id=[]
packet_mes=[]
input_file=inputfile.split(" ")
print(input_file)
input_file=line.split()
m_id=[int(x) for x in input_file if x.isdigit()]
p_id=[x for x in input_file if not x.isdigit()]
With your current approach, you are reading the entire file as a single string and performing a split on a single space (you'd much rather split on newlines instead, because each line is separated by a newline). Furthermore, you're not segregating your data into separate columns properly.
You have 3 columns. You can split each line into 3 parts using str.split(None, 2). The None means splitting on any run of whitespace. Each group will be stored as key-list pairs inside a dictionary. Here I use an OrderedDict in case you need to maintain order, but you can just as easily declare o = {} as a normal dictionary with the same grouping (but no order!).
from collections import OrderedDict

o = OrderedDict()
with open('input.txt', 'r') as f:
    for line in f:
        i, j, k = line.strip().split(None, 2)
        o.setdefault(i, []).append([int(i), int(j), k])

print(dict(o))
{'1234': [[1234, 0, 'my name is'],
[1234, 2, 'Who are you?'],
[1234, 1, "how's going on?"]],
'6789': [[6789, 2, 'I am coming']],
'2346': [[2346, 1, 'are you new?']]}
Always use the with...as context manager when working with file I/O - it makes for clean code. Also, note that for larger files, iterating over each line is more memory efficient.
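If you also want the grouped output shown in the question, here is a minimal sketch building on the o dictionary above (the exact separator format is an assumption based on the question):

# Sketch: print each group under a numbered separator, sorting the keys
# numerically and each group's rows by the second column.
for n, key in enumerate(sorted(o, key=int), 1):
    print("----{}----".format(n))
    for i, j, k in sorted(o[key], key=lambda row: row[1]):
        print(i, j, k)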
Maybe you want something like this:
import re

# Collect data from the input file
h = {}
with open('input.txt', 'r') as f:
    for line in f:
        res = re.match(r"^(\d+)\s+(\d+)\s+(.*)$", line)
        if res:
            if not res.group(1) in h:
                h[res.group(1)] = []
            h[res.group(1)].append((res.group(2), res.group(3)))

# Output the result
for i, x in enumerate(sorted(h.keys())):
    print("-------- %s -----------" % (i + 1))
    for y in sorted(h[x]):
        print("%s %s %s" % (x, y[0], y[1]))
The result is as follows (add more newlines if you like):
-------- 1 -----------
1234 0 my name is
1234 1 how's going on?
1234 2 Who are you?
-------- 2 -----------
2346 1 are you new?
-------- 3 -----------
6789 2 I am coming
It's based on regexes (the re module in Python). This is a good tool when you want to match simple line-based patterns.
Here it relies on spaces as column separators, but it can just as easily be adapted for fixed-width columns.
The result is collected in a dictionary of lists, each list containing tuples (pairs) of position and text.
The program waits until output time to sort the items.
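For instance, a fixed-width variant would only change the pattern (a sketch; the widths of 4 digits and 1 digit are assumptions taken from the sample data):

import re

# Sketch: the same grouping match, but with fixed-width columns
# (4 digits, 1 digit, then free text).
line = "1234 0 my name is"
res = re.match(r"^(\d{4})\s(\d)\s(.*)$", line)
if res:
    print(res.group(1), res.group(2), res.group(3))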
It's quite ugly code, but it's easy to understand.
raw = []
with open("input.txt", "r") as file:
    for x in file:
        raw.append(x.strip().split(None, 2))

raw = sorted(raw)

title = raw[0][0]
refined = []
cluster = []
for x in raw:
    if x[0] == title:
        cluster.append(x)
    else:
        refined.append(cluster)
        cluster = []
        title = x[0]
        cluster.append(x)
refined.append(cluster)

for number, group in enumerate(refined, 1):  # start numbering at 1 to match the desired output
    print("-" * 10 + str(number) + "-" * 10)
    for line in group:
        print(*line)
I am a non-programmer geographer; I have heard some programming concepts but am very much a newbie :-)
I need to read six rows of environmental data, 1000 lines at the most each time.
Each row holds two-digit numbers (0 to 99), a summer issue, only positive numbers.
Once I read them, I need to display the numbers 0 to 99 vertically with the number of occurrences of each reading for each of the six rows:
0 = 230.....0 = 3........0 = 230......0 = 123......0 = 223......0 = 334
1 = 67......1 = 657......1 = 627......1 = 767......1 = 467......1 = 337
2 = 762.....2 = 328......2 = 987......2 = 326......2 = 32.......2 = 123
.
.
99 = 3.....99 = 34.......99 = 1.......99 = 89......99 = 78......99 = 123
If I can get this far I will feel great. Once I learn how to do this and can look at the data, I can decide what makes sense to run next: Excel, graphs, statistics, statistics in R, getting the numbers into a matrix to manipulate from there, etc. This is my first time, so I am figuring it out as I go.
Any help will be much appreciated,
Adolfo
I am working on research for the restoration of the Quebrada Verde watershed in Valparaiso, Chile.
from array import array
import sys

if len(sys.argv) > 1:
    count = array('H', [0] * 100)
    file = open(sys.argv[1], 'r')
    if file:
        for line in file:
            count[int(line)] += 1
        file.close()
        for a in range(100):
            print(a, count[a], sep='\t')
    else:
        print('unable to open the file')
else:
    print('usage: python', sys.argv[0], ' file')
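The snippet above tallies a single value per line. A sketch of a six-column version (assuming the six readings on each line are separated by whitespace) could look like this:

import sys

# Sketch: one occurrence table per column, for lines holding six
# whitespace-separated readings in the range 0..99.
counts = [[0] * 100 for _ in range(6)]
with open(sys.argv[1], 'r') as f:
    for line in f:
        for col, value in enumerate(line.split()[:6]):
            counts[col][int(value)] += 1

for n in range(100):
    print('\t'.join('{} = {}'.format(n, counts[col][n]) for col in range(6)))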
import csv
import datetime

with open('soundTransit1_remote_rawMeasurements_15m.txt', 'r') as infile, open('soundTransit1.txt', 'w') as outfile:
    inr = csv.reader(infile, delimiter='\t')
    #ouw = csv.writer(outfile,delimiter=' ')
    for row in inr:
        d = datetime.datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S')
        s = 1
        p = int(row[5])
        nr = [format(s,'02')+format(d.year,'04')+format(d.month,'02')+format(d.day,'02')+format(d.hour,'02')+format(d.minute,'02')+format(int(p*0.2),'04')]
        outfile.writelines(nr+'/n')
Using the above script, I have read in a .txt file and reformatted it as 'nr' so it looks like this:
['012015072314000000']
['012015072313450000']
['012015072313300000']
['012015072313150000']
['012015072313000000']
['012015072312450000']
['012015072312300000']
['012015072312150000']
..etc.
I now need to print it to my new .txt file, but Python is not allowing me to write 'nr' with line breaks after each entry, I think because the data is in strings. I get this error:
TypeError: can only concatenate list (not "str") to list
Is there another way to do this?
You are trying to combine a list with a string, which cannot work. Simply don't create a list in nr.
import csv
import datetime

with open('soundTransit1_remote_rawMeasurements_15m.txt', 'r') as infile, open('soundTransit1.txt', 'w') as outfile:
    inr = csv.reader(infile, delimiter='\t')
    #ouw = csv.writer(outfile,delimiter=' ')
    for row in inr:
        d = datetime.datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S')
        s = 1
        p = int(row[5])
        nr = "{:02d}{:%Y%m%d%H%M}{:04d}\n".format(s, d, int(p * 0.2))
        outfile.write(nr)
There is no need to put your string into a list; just use outfile.write() here and build a string without a list:
nr = format(s,'02') + format(d.year,'04') + format(d.month, '02') + format(d.day, '02') + format(d.hour, '02') + format(d.minute, '02') + format(int(p*0.2), '04')
outfile.write(nr + '\n')
Rather than use 7 separate format() calls, use str.format():
nr = '{:02}{:%Y%m%d%H%M}{:04}\n'.format(s, d, int(p * 0.2))
outfile.write(nr)
Note that I formatted the datetime object with one formatting operation, and I included the newline into the string format.
You appear to have hard-coded the s value; you may as well put that into the format directly:
nr = '01{:%Y%m%d%H%M}{:04}\n'.format(d, int(p * 0.2))
outfile.write(nr)
Together, that updates your script to:
import csv
import datetime

with open('soundTransit1_remote_rawMeasurements_15m.txt', 'r') as infile,\
        open('soundTransit1.txt', 'w') as outfile:
    inr = csv.reader(infile, delimiter='\t')
    for row in inr:
        d = datetime.datetime.strptime(row[0], '%Y-%m-%d %H:%M:%S')
        p = int(int(row[5]) * 0.2)
        nr = '01{:%Y%m%d%H%M}{:04}\n'.format(d, p)
        outfile.write(nr)
Take into account that the csv module works better if you follow the guidelines about opening files; in Python 2 you need to open the file in binary mode ('rb'), in Python 3 you need to set the newline parameter to ''. That way the module can control newlines correctly and supports including newlines in column values.
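For example, in Python 3 the file the reader uses would be opened like this (a minimal sketch of just that change):

import csv

# Python 3 sketch: pass newline='' so the csv module controls newlines itself
# (and can support newlines embedded in quoted column values).
with open('soundTransit1_remote_rawMeasurements_15m.txt', 'r', newline='') as infile:
    inr = csv.reader(infile, delimiter='\t')
    for row in inr:
        print(row[0])  # placeholder; the real processing is shown above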
I am coding a Python script that collects words from a text file (a PDB file) and later gathers them into phrases. However, as I am just a beginner in programming, I'm having huge difficulties doing it. I know how to do it only one line at a time. I wish you guys could give me some help. Please.
The text has information about the sites of a protein. Each site has four dedicated lines of information, as you can see below:
REMARK 800
REMARK 800 SITE_IDENTIFIER: CC1
REMARK 800 EVIDENCE_CODE: SOFTWARE
REMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE EDO A 326
REMARK 800
REMARK 800 SITE_IDENTIFIER: DF8
REMARK 800 EVIDENCE_CODE: AUTHOR
REMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE HEM T 238
REMARK 800
REMARK 800 SITE_IDENTIFIER: FC7
REMARK 800 EVIDENCE_CODE: SOFTWARE
REMARK 800 SITE_DESCRIPTION: BINDING SITE FOR RESIDUE NAG D 1001
#and so on ...
An extended example can be seen at the following link (search for "REMARK 800"):
http://www.pdb.org/pdb/files/3HDL.pdb
As observed:
The 1st line has nothing. (It just separates one entry from the next.)
The 2nd has the SITE_IDENTIFIER. (e.g. CC1)
The 3rd, the EVIDENCE_CODE. (e.g. SOFTWARE)
The 4th, some RESIDUE information. (e.g. EDO A 326)
This pattern is seen in a great part of the text.
What I want to do is gather some words from three of the four consecutive dedicated lines, in such a way that they are put together in one phrase. The necessary information is the SITE_IDENTIFIER, the EVIDENCE_CODE, and 3 words from the SITE_DESCRIPTION. Thus, concerning the text excerpt above, the resulting phrases would be something like this:
CC1 SOFTWARE EDO A 326
DF8 AUTHOR HEM T 238
FC7 SOFTWARE NAG D 1001
#and so on...
Is it possible to do? If so, can you guys imagine how can I do this?
I tried doing it this way, but I feel like it is not going to work at all:
name_file = "3HDL.pdb"
pdb_file = open(name_file, "r")
for line in pdb_file:
    list = line.split()
    list_2 = []
    for j in range(0, 15):
        list_2.append("")
    if (list[0] == "REMARK" and list[1] == "800"):
        j = 0
        while not j == len(list):
            list_2[j] = list[j]
            j += 1
    n = 1
    if (list_2[0] == "REMARK" and list_2[1] == "800" and list_2[2] == "SITE_IDENTIFIER:"):
        n += 1
        print("Site", str(n) + ":", list_2[3])
print("ok" + "\n")
As you can see, I am really a beginner.
Sorry about any grammar problems and thank you very much.
How about something like this:
import re

f = open("3HDL.pdb", "r")
for line in f:
    m = re.search(r"REMARK 800 SITE_IDENTIFIER: (.+)", line)
    if m:
        site_id = m.group(1).strip()
    else:
        m = re.search(r"REMARK 800 EVIDENCE_CODE: (.+)", line)
        if m:
            evidence_code = m.group(1).strip()
        else:
            m = re.search(r"REMARK 800 SITE_DESCRIPTION: (.+)", line)
            if m:
                site_descrip = m.group(1).strip()
                print(site_id, evidence_code, site_descrip)
f.close()
Or, if you want to avoid using the regex module:
f = open("3HDL.pdb", "r")
for line in f:
    if line.startswith("REMARK 800"):
        if line.startswith("SITE_IDENTIFIER:", 11):
            site_id = line[28:].rstrip()
        elif line.startswith("EVIDENCE_CODE:", 11):
            evidence_code = line[26:].rstrip()
        elif line.startswith("SITE_DESCRIPTION:", 11):
            site_descrip = line[29:].rstrip()
            print(site_id, evidence_code, site_descrip)
f.close()
Here we assume the content wanted is the last word of lines 2 and 3, plus the last 3 words of line 4.
name_file = "3HDL.pdb"
pdb_file = open(name_file, "r")

output = []
for linenum, line in enumerate(pdb_file):
    if linenum % 4 == 0:
        continue
    elif linenum % 4 == 1:
        output.append(line.split()[-1])
    elif linenum % 4 == 2:
        output.append(line.split()[-1])
    elif linenum % 4 == 3:
        output.extend(line.split()[-3:])

# each site contributed 6 items, so step through the list 6 at a time
for i in range(0, len(output), 6):
    print(' '.join(output[i:i + 6]))
So the question basically gives me 19 DNA sequences and wants me to make a basic text table. The first column has to be the sequence ID, the second column the length of the sequence, the third the number of "A"s, the 4th the number of "G"s, the 5th "C", the 6th "T", the 7th %GC, and the 8th whether or not the sequence contains "TGA". Then I get all these values and write a table to "dna_stats.txt".
Here is my code:
fh = open("dna.fasta", "r")
Acount = 0
Ccount = 0
Gcount = 0
Tcount = 0
seq = 0
alllines = fh.readlines()
for line in alllines:
    if line.startswith(">"):
        seq += 1
        continue
    Acount += line.count("A")
    Ccount += line.count("C")
    Gcount += line.count("G")
    Tcount += line.count("T")
    genomeSize = Acount + Gcount + Ccount + Tcount
    percentGC = (Gcount + Ccount) * 100.00 / genomeSize
    print "sequence", seq
    print "Length of Sequence", len(line)
    print Acount, Ccount, Gcount, Tcount
    print "Percent of GC", "%.2f" % (percentGC)
    if "TGA" in line:
        print "Yes"
    else:
        print "No"
    fh2 = open("dna_stats.txt", "w")
    for line in alllines:
        splitlines = line.split()
        lenstr = str(len(line))
        seqstr = str(seq)
        fh2.write(seqstr + "\t" + lenstr + "\n")
I found that you have to convert the variables into strings. I have all of the values calculated correctly when I print them out in the terminal. However, I keep getting only 19 in the first column, when it should go 1, 2, 3, 4, 5, etc. to represent all of the sequences. When I tried it with the other variables, it just wrote the totals for the whole file. I started trying to make the table but have not finished it.
So my biggest issue is that I don't know how to get the values of the variables for each specific line.
I am new to Python and programming in general, so any tips or tricks or anything at all will really help.
I am using Python version 2.7.
Well, your biggest issue:
for line in alllines:  #1
    ...
    fh2 = open("dna_stats.txt", "w")
    for line in alllines:  #2
        ....
Indentation matters. This says "for every line (#1), open a file and then loop over every line again(#2)..."
De-indent those things.
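A sketch of the corrected structure, in the same elided notation as above:

for line in alllines:  #1
    ...                            # per-line counting and printing stays here

fh2 = open("dna_stats.txt", "w")   # opened once, after loop #1 has finished
for line in alllines:  #2
    ....                           # the table-writing loop now runs on its own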
This puts the info in a dictionary as you go and allows DNA sequences to span multiple lines.
from __future__ import division  # ensure things like 1/2 are 0.5 rather than 0
from collections import defaultdict

fh = open("dna.fasta", "r")
alllines = fh.readlines()
fh2 = open("dna_stats.txt", "w")

seq = 0
data = dict()
for line in alllines:
    if line.startswith(">"):
        seq += 1
        data[seq] = defaultdict(int)  # default value is zero when a key is missing, so we can do += 1 without initializing first
        data[seq]['seq'] = seq
        previous_line_end = ""  # "TGA" might be split across lines
        continue
    data[seq]['Acount'] += line.count("A")
    data[seq]['Ccount'] += line.count("C")
    data[seq]['Gcount'] += line.count("G")
    data[seq]['Tcount'] += line.count("T")
    line_over = previous_line_end + line[:3]
    data[seq]['hasTGA'] = data[seq]['hasTGA'] or ("TGA" in line) or ("TGA" in line_over)
    previous_line_end = str.strip(line[-4:])  # save the tail of this line for the next one, dropping the newline character

for seq in data.keys():
    # totals are computed once per sequence, after all its lines have been counted
    data[seq]['genomeSize'] = data[seq]['Acount'] + data[seq]['Gcount'] + data[seq]['Ccount'] + data[seq]['Tcount']
    data[seq]['percentGC'] = (data[seq]['Gcount'] + data[seq]['Ccount']) * 100.00 / data[seq]['genomeSize']
    s = '%(seq)d, %(genomeSize)d, %(Acount)d, %(Gcount)d, %(Ccount)d, %(Tcount)d, %(percentGC).2f, %(hasTGA)s\n'
    fh2.write(s % data[seq])

fh.close()
fh2.close()