Ordering a file - python

I have this code
archivo = open("archivo.csv", "r")
for i in range(10):
    for reg in archivo:
        if archivo[reg] < archivo[reg+1]:
            x = archivo[reg]
            archivo[reg] = archivo[reg+1]
            archivo[reg+1] = x
archivo.close()
archivo = open("archivo.csv", "w")
archivo.write(reg)
What I want is to sort the file alphabetically and save it sorted, but I get several errors. The main one says that the file object has no attribute __getitem__, and I didn't find anything like it on the web. Can someone help me?
Input looks like
Matt | 7 | 8
John | 9 | 6
Jim | 6 | 7

I have modified the source CSV file to be comma separated. So archivo.csv looks like
Matt,7,8
John,9,6
Jim,6,7
Now, to read this file, Python already has a standard module called csv. Using it we can read and write CSV files reliably.
from csv import reader, writer

archivo = reader(open("archivo.csv", "r"))
a = sorted(archivo)
archivo1 = writer(open("archivo1.csv", "w"))
for row in a:
    archivo1.writerow(row)
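Note that sorted(archivo) compares whole rows, which effectively sorts by the first column. If you want to sort on a different column, sorted takes a key function. A small sketch using the sample rows from the question:

```python
# Rows in the same layout as archivo.csv: name, score1, score2
rows = [["Matt", "7", "8"], ["John", "9", "6"], ["Jim", "6", "7"]]

# key picks the value each row is sorted on; row[0] is the name column
by_name = sorted(rows, key=lambda row: row[0])

# csv fields are strings, so convert with int() for numeric columns;
# otherwise "10" would sort before "9"
by_score = sorted(rows, key=lambda row: int(row[1]))

print(by_name[0])    # ['Jim', '6', '7']
print(by_score[-1])  # ['John', '9', '6']
```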

Related

Pickle persistence

I started using Python recently and I'm trying to make a program that manipulates data using pickle. However, I would like my file to look kind of like this:
CODE | PIECE | PRICE
line one 1 1 1,00
line two 2 2 2,00
That is, 1 right under CODE, 1 right under PIECE and 1,00 right under PRICE, and so on until it gets to 50 rows.
Here's the question: is there any way to do this using pickle? Like:
import pickle

columns = int(input('Number of columns : '))  # which would be 3 (code, piece and price)
data = []
for i in range(columns):
    raw = input('Enter data ' + str(i) + ' : ')
    data.append(raw)
file = open('file.dat', 'wb')
pickle.dump(data, file)
file.close()
Obviously, it cannot be done using input, so is there some way to do this?
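No answer is recorded here, but as a sketch of one option: pickle can serialize any Python object, so the rows can be collected into a single list of tuples and dumped in one call. The file name and sample values below are made up for illustration:

```python
import pickle

# Made-up rows in the (code, piece, price) layout from the question
rows = [(1, 1, "1,00"), (2, 2, "2,00")]

# Dump the whole list at once...
with open("file.dat", "wb") as f:
    pickle.dump(rows, f)

# ...and load it back the same way
with open("file.dat", "rb") as f:
    loaded = pickle.load(f)

for code, piece, price in loaded:
    print(code, piece, price)
```

Keep in mind that pickle writes a binary format; if the file itself needs to look like the human-readable table above, a text format such as csv is a better fit.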

Python print .psl format without quotes and commas

I am working on a Linux system using python3 with a file in .psl format, common in genetics. This is a tab-separated file that contains some cells with comma-separated values. A small example file with some of the features of a .psl is below.
input.psl
1 2 3 x read1 8,9, 2001,2002,
1 2 3 mt read2 8,9,10 3001,3002,3003
1 2 3 9 read3 8,9,10,11 4001,4002,4003,4004
1 2 3 9 read4 8,9,10,11 4001,4002,4003,4004
I need to filter this file to extract only regions of interest. Here, I extract only rows with a value of 9 in the fourth column.
import csv

def read_psl_transcripts():
    psl_transcripts = []
    with open("input.psl") as input_psl:
        csv_reader = csv.reader(input_psl, delimiter='\t')
        for line in csv_reader:
            # Extract only rows matching chromosome of interest
            if '9' == line[3]:
                psl_transcripts.append(line)
    return psl_transcripts
I then need to be able to print or write these selected lines in a tab-delimited format matching the input file, with no additional quotes or commas added. I can't seem to get this part right, and additional brackets, quotes and commas are always added. Below is an attempt using print().
outF = open("output.psl", "w")
for line in read_psl_transcripts():
    print(str(line).strip('"\''), sep='\t')
Any help is much appreciated. Below is the desired output.
1 2 3 9 read3 8,9,10,11 4001,4002,4003,4004
1 2 3 9 read4 8,9,10,11 4001,4002,4003,4004
You might be able to solve your problem with a simple awk one-liner:
awk '$4 == 9' input.psl > output.psl
But with python you could solve it like this:
write_psl = open("output.psl", "w")
with open("input.psl") as file:
    for line in file:
        splitted_line = line.split()
        if splitted_line[3] == '9':
            out_line = '\t'.join(splitted_line)
            write_psl.write(out_line + "\n")
write_psl.close()
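Alternatively, staying with the csv module the question started from: csv.writer with delimiter='\t' writes rows back without adding quotes, because the commas inside cells are not the delimiter. A sketch using an in-memory file so it runs standalone:

```python
import csv
import io

# In-memory stand-in for input.psl (tab-separated; commas live inside cells)
data = ("1\t2\t3\tx\tread1\t8,9,\t2001,2002,\n"
        "1\t2\t3\t9\tread3\t8,9,10,11\t4001,4002,4003,4004\n")

out = io.StringIO()
reader = csv.reader(io.StringIO(data), delimiter="\t")
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
for row in reader:
    if row[3] == "9":          # keep only chromosome 9
        writer.writerow(row)   # written tab-separated, no quoting added

print(out.getvalue(), end="")
```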

How can I get records from an array into a table in python?

I have an xml file, with some data that I am extracting and placing in a numpy record array. I print the array and I see the data is in the correct location. I am wondering how I can take that information in my numpy record array and place it in a table. Also I am getting the letter b when I print my record, how do I fix that?
Xml data
<instance name="uart-0" module="uart_16550" offset="000014"/>
<instance name="uart-1" offset="000020" module="uart_16650"/>
Code in python
inst_rec = np.zeros(5, dtype=[('name','a20'),('module','a20'),('offset','a5')])
i = 0
for node in xml_file.iter():
    if node.tag == "instance":
        attribute = node.attrib.get('name')
        inst_rec[i] = (node.attrib.get('name'), node.attrib.get('module'), node.attrib.get('offset'))
        i = i + 1
for x in range(0, 5):
    print(inst_rec[x])
Output
(b'uart-0', b'uart_16550', b'00001')
(b'uart-1', b'uart_16650', b'00002')
You are using Python3, which uses unicode strings. It displays byte strings with the b. The xml file may also be bytes, for example, encoding='UTF-8'.
You can get rid of the b, by passing the strings through decode() before printing.
More on writing csv files in Py3
Numpy recarray writes byte literals tags to my csv file?
In my tests, I can simplify the display by making the inst_rec array use unicode strings ('U20')
import numpy as np
import xml.etree.ElementTree as ET

tree = ET.parse('test.xml')
root = tree.getroot()
# inst_rec = np.zeros(2, dtype=[('name','a20'),('module','a20'),('offset','a5')])
inst_rec = np.zeros(2, dtype=[('name','U20'),('module','U20'),('offset','U5')])
i = 0
for node in root.iter():
    if node.tag == "instance":
        attribute = node.attrib.get('name')
        rec = (node.attrib.get('name'), node.attrib.get('module'), node.attrib.get('offset'))
        inst_rec[i] = rec  # no need to decode
        i = i + 1
# simple print of the array
print(inst_rec)
# row by row print
for x in range(inst_rec.shape[0]):
    print(inst_rec[x])
# formatted row by row print
for rec in inst_rec:
    print('%20s,%20s, %5s' % tuple(rec))
# write a csv file
np.savetxt('test.out', inst_rec, fmt=['%20s','%20s','%5s'], delimiter=',')
producing
[('uart-0', 'uart_16550', '00001') ('uart-1', 'uart_16650', '00002')]
('uart-0', 'uart_16550', '00001')
('uart-1', 'uart_16650', '00002')
uart-0, uart_16550, 00001
uart-1, uart_16650, 00002
and
1703:~/mypy$ cat test.out
uart-0, uart_16550,00001
uart-1, uart_16650,00002
As ASCII table display
# formatted row by row print
print('----------------------------------------')
for rec in inst_rec:
    print('| %20s | %20s | %5s |' % tuple(rec))
print('----------------------------------------')
If you want anything fancier you need to specify the display tool - html, rich text, etc.
with the add-on package prettytable:
import prettytable

pp = prettytable.PrettyTable()
pp.field_names = inst_rec.dtype.names
for rec in inst_rec:
    pp.add_row(rec)
print(pp)
produces
+--------+------------+--------+
| name | module | offset |
+--------+------------+--------+
| uart-0 | uart_16550 | 00001 |
| uart-1 | uart_16650 | 00002 |
+--------+------------+--------+
In Python 3 I am still using the unicode dtype. prettytable will display the b prefix if any of the strings are bytes.
To avoid printing b'xxx', try this:
print (', '.join(y.decode() for y in inst_rec[x]))

Parsing and adding content of a file in Python

What I am doing is that I have a file which contains some data as follows:
ben | 2 | 40
germany | 6 | 60
What I need as output is:
ben | 2 | 40
germany | 6 | 60
field 1 = 8
field 2 = 100
Please suggest some way to move ahead in Python.
This has the aroma of a homework assignment to me, so I'm going to try to stick to providing some pointers rather than an outright solution.
You can use Python's open() function to open a file. The resulting object can be iterated over in a loop, like for line in myfile:. When you're done with the file you should call myfile.close(), and you could re-open it in "append" mode to write the answer at the end.
Each line will be a string, and you can call line.split('|') to get the line into chunks. I like to use multiple assignment: name, col1, col2 = line.split('|'). You will probably need to use int() to coerce the numbers from string format to integer format so that you can add them up.
I think that's probably a pretty reliable start, right?
there's probably a more elegant way to do this...
results = [0, 0]
with open("/path/to/file.txt") as f:
    for line in f:
        values = line.split("|")
        results[0] += int(values[1])
        results[1] += int(values[2])
print("field 1 = " + str(results[0]), "field 2 = " + str(results[1]))
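To also reproduce the exact output from the question (the original lines followed by the totals), the same loop can keep the lines while summing. A sketch using an in-memory string instead of a file path:

```python
import io

# In-memory stand-in for the input file from the question
data = "ben | 2 | 40\ngermany | 6 | 60\n"

lines = []
totals = [0, 0]
for line in io.StringIO(data):
    lines.append(line.rstrip("\n"))
    _, col1, col2 = line.split("|")
    totals[0] += int(col1)  # int() tolerates the surrounding spaces
    totals[1] += int(col2)

output = "\n".join(lines + ["field 1 = %d" % totals[0],
                            "field 2 = %d" % totals[1]])
print(output)
```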

Python -- how to read and change specific fields from file? (specifically, numbers)

I just started learning python scripting yesterday and I've already gotten stuck. :(
So I have a data file with a lot of different information in various fields.
Formatted basically like...
Name (tab) Start# (tab) End# (tab) A bunch of fields I need but do not do anything with
Repeat
I need to write a script that takes the start and end numbers, and add/subtract a number accordingly depending on whether another field says + or -.
I know that I can replace words with something like this:
x = open("infile")
y = open("outfile","a")
while 1:
line = f.readline()
if not line: break
line = line.replace("blah","blahblahblah")
y.write(line + "\n")
y.close()
But I've looked at all sorts of different places and I can't figure out how to extract specific fields from each line, read one field, and change other fields. I read that you can read the lines into arrays, but can't seem to find out how to do it.
Any help would be great!
EDIT:
Example of a line from the data here: (Each | represents a tab character)
chr21 | 33025905 | 33031813 | ENST00000449339.1 | 0 | **-** | 33031813 | 33031813 | 0 | 3 | 1835,294,104, | 0,4341,5804,
chr21 | 33036618 | 33036795 | ENST00000458922.1 | 0 | **+** | 33036795 | 33036795 | 0 | 1 | 177, | 0,
The second and third columns are the ones I'd need to read/change; the bold sixth column holds the + or -.
You can use csv to do the splitting, although for these sorts of problems, I usually just use str.split:
with open(infile) as fin, open('outfile', 'w') as fout:
    for line in fin:
        # use line.split('\t', 3) if the name field can contain spaces
        name, start, end, rest = line.split(None, 3)
        # do something to change start and end here.
        # Note that `start` and `end` are strings, but they can easily be
        # converted using the `int` or `float` builtins.
        fout.write('\t'.join((name, start, end, rest)))
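As a hedged sketch of the "change start and end" step, assuming the task is to shift both coordinates by a fixed amount whose direction follows the +/- column (the offset of 10 is a made-up value; the sixth column holds the +/- flag in the sample data):

```python
def shift_coords(line, offset=10):
    """Shift the start/end columns of one tab-separated line.

    The sixth column holds '+' or '-' and decides the direction;
    the offset default is a made-up example value.
    """
    fields = line.rstrip("\n").split("\t")
    delta = offset if fields[5] == "+" else -offset
    fields[1] = str(int(fields[1]) + delta)  # start
    fields[2] = str(int(fields[2]) + delta)  # end
    return "\t".join(fields)

row = ("chr21\t33025905\t33031813\tENST00000449339.1\t0\t-"
       "\t33031813\t33031813\t0\t3\t1835,294,104,\t0,4341,5804,")
print(shift_coords(row))
```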
csv is nice if you want to split lines like this:
this is a "single argument"
into:
['this','is','a','single argument']
but it doesn't seem like you need that here.
