Parsing and adding content of a file in Python - python

I have a file that contains data in the following format:
ben | 2 | 40
germany | 6 | 60
What I need as output is:
ben | 2 | 40
germany | 6 | 60
field 1 = 8
field 2 = 100
Please suggest a way to approach this in Python.

This has the aroma of a homework assignment to me, so I'm going to try to stick to providing some pointers rather than an outright solution.
You can use Python's open() function to open a file. The resulting object can be iterated over in a loop, like for line in myfile:. When you're done with the file you should call myfile.close(), and you could re-open it in "append" mode to write the answer at the end.
Each line will be a string, and you can call line.split('|') to get the line into chunks. I like to use multiple assignment: name, col1, col2 = line.split('|'). You will probably need to use int() to coerce the numbers from string format to integer format so that you can add them up.
I think that's probably a pretty reliable start, right?
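For reference, the pointers above can be put together roughly as follows. This is only a sketch under assumptions: the function names column_totals and append_column_totals are made up for illustration, every data line is assumed to have exactly three '|'-separated fields, and lines without a '|' are skipped.

```python
def column_totals(lines):
    """Sum the 2nd and 3rd '|'-separated fields over an iterable of lines."""
    totals = [0, 0]
    for line in lines:
        if "|" not in line:
            continue  # skip blank lines or previously appended totals
        name, col1, col2 = line.split("|")
        # int() ignores the surrounding spaces and the trailing newline
        totals[0] += int(col1)
        totals[1] += int(col2)
    return totals


def append_column_totals(path):
    """Read the file once, then re-open it in append mode to add the totals."""
    with open(path) as f:
        lines = f.readlines()
    totals = column_totals(lines)
    with open(path, "a") as f:
        # Make sure the totals start on their own line
        if lines and not lines[-1].endswith("\n"):
            f.write("\n")
        f.write("field 1 = {}\nfield 2 = {}\n".format(*totals))
    return totals
```

Calling append_column_totals on the example file would add the two "field" lines at the end of it.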

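There's probably a more elegant way to do this...
results = [0, 0]
# Raw string so the backslashes in the path are not treated as escape sequences
with open(r"\path\to\file.txt") as f:
    for line in f:
        values = line.split("|")
        # int() tolerates the spaces and newline around each number
        results[0] += int(values[1])
        results[1] += int(values[2])
print("field 1 = " + str(results[0]))
print("field 2 = " + str(results[1]))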

Related

Pickle persistence

I started using Python recently and I'm trying to write a program that stores data using pickle. However, I would like my file to look something like this:
CODE | PIECE | PRICE
line one 1 1 1,00
line two 2 2 2,00
In other words, 1 goes right under CODE, 1 right under PIECE, and 1,00 right under PRICE, and so on until line 50.
Here's the question: is there any way to do this using pickle? Like:
import pickle
columns = int(input('Number of columns : ')) # Which would be 3 (code, piece and price)
data = []
for i in range(columns):
    raw = input('Enter data '+str(i)+' : ')
    data.append(raw)
file = open('file.dat', 'wb')
pickle.dump(data, file)
file.close()
Obviously, it cannot be done using input, so is there some way to do this?
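One possible direction, sketched under assumptions: pickle writes Python objects in a binary format, so file.dat itself will not look like a table when opened in a text editor. What can be done is to build the 50 rows in a loop instead of with input(), pickle the whole list, and format it as a table after loading it back. The helper names save_rows, load_rows and format_table are hypothetical, and the row contents follow one reading of the example in the question.

```python
import pickle


def save_rows(path, rows):
    """Pickle the whole list of rows into one binary file."""
    with open(path, "wb") as f:
        pickle.dump(rows, f)


def load_rows(path):
    """Load the list of rows back from the pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)


def format_table(rows):
    """Render rows as text under a CODE | PIECE | PRICE header."""
    lines = ["CODE | PIECE | PRICE"]
    for code, piece, price in rows:
        lines.append("{} | {} | {}".format(code, piece, price))
    return "\n".join(lines)


# Rows 1..50, with the price written with a decimal comma as in the question
rows = [(n, n, "{},00".format(n)) for n in range(1, 51)]
```

A typical use would be save_rows('file.dat', rows) followed later by print(format_table(load_rows('file.dat'))).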

Separate lines in Python

I have a .txt file with 3 columns. The first one is just numbers. The second one is a number from 0 to 7. The last one is a sentence-like string. I want to keep them in separate lists so that I can match them up by their numbers. I want to write a function for this. How can I separate them into different lists without scrambling them?
An example of the .txt:
1234 0 my name is
6789 2 I am coming
2346 1 are you new?
1234 2 Who are you?
1234 1 how's going on?
And I have to keep them like this:
----1----
1234 0 my name is
1234 1 how's going on?
1234 2 Who are you?
----2----
2346 1 are you new?
----3-----
6789 2 I am coming
What I've tried so far:
inputfile=open('input.txt','r').read()
m_id=[]
p_id=[]
packet_mes=[]
input_file=inputfile.split(" ")
print(input_file)
input_file=line.split()
m_id=[int(x) for x in input_file if x.isdigit()]
p_id=[x for x in input_file if not x.isdigit()]
With your current approach, you are reading the entire file as one string and splitting it on spaces (you should split on newlines instead, because each record sits on its own line). Furthermore, you are not separating your data into columns properly.
You have 3 columns. You can split each line into 3 parts using str.split(None, 2). Passing None splits on runs of whitespace, and the 2 limits it to two splits so the sentence stays in one piece. Each group is stored as a key-list pair inside a dictionary. Here I use an OrderedDict in case you need to maintain order, but you can just as easily declare o = {} as a normal dictionary with the same grouping (plain dicts only guarantee insertion order from Python 3.7 onward).
from collections import OrderedDict
o = OrderedDict()
with open('input.txt', 'r') as f:
    for line in f:
        i, j, k = line.strip().split(None, 2)
        o.setdefault(i, []).append([int(i), int(j), k])
print(dict(o))
{'1234': [[1234, 0, 'my name is'],
[1234, 2, 'Who are you?'],
[1234, 1, "how's going on?"]],
'6789': [[6789, 2, 'I am coming']],
'2346': [[2346, 1, 'are you new?']]}
Always use the with...as context manager when working with file I/O - it makes for clean code. Also, note that for larger files, iterating over each line is more memory efficient.
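To get from that dictionary to the numbered layout in the question, something along these lines might work. It is a sketch, not the answerer's code: group_messages and render are illustrative names, the ids are converted to int so that groups sort numerically, and the header format is copied from the question.

```python
from collections import defaultdict


def group_messages(lines):
    """Group (position, text) pairs by the id in the first column."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        m_id, p_id, text = line.strip().split(None, 2)
        groups[int(m_id)].append((int(p_id), text))
    return groups


def render(groups):
    """Return output lines: groups ordered by id, rows ordered by position."""
    out = []
    for n, m_id in enumerate(sorted(groups), start=1):
        out.append("----{}----".format(n))
        for p_id, text in sorted(groups[m_id]):
            out.append("{} {} {}".format(m_id, p_id, text))
    return out


sample = [
    "1234 0 my name is\n",
    "6789 2 I am coming\n",
    "2346 1 are you new?\n",
    "1234 2 Who are you?\n",
    "1234 1 how's going on?\n",
]
```

With a real file, the list sample would be replaced by the open file object, for example print("\n".join(render(group_messages(f)))) inside a with block.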
Maybe you want something like this:
import re
# Collect data from the input file
h = {}
with open('input.txt', 'r') as f:
    for line in f:
        # id, position, then the rest of the line as the text
        res = re.match(r"^(\d+)\s+(\d+)\s+(.*)$", line)
        if res:
            if res.group(1) not in h:
                h[res.group(1)] = []
            h[res.group(1)].append((res.group(2), res.group(3)))
# Output result
for i, x in enumerate(sorted(h.keys())):
    print("-------- %s -----------" % (i+1))
    for y in sorted(h[x]):
        print("%s %s %s" % (x, y[0], y[1]))
The result is as follows (add more newlines if you like):
-------- 1 -----------
1234 0 my name is
1234 1 how's going on?
1234 2 Who are you?
-------- 2 -----------
2346 1 are you new?
-------- 3 -----------
6789 2 I am coming
It's based on regexes (the re module in Python). This is a good tool when you want to match simple line-based patterns.
Here it relies on spaces as column separators, but it can just as easily be adapted for fixed-width columns.
The results are collected in a dictionary of lists, each list containing tuples (pairs) of position and text.
The program defers sorting until output time.
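As an illustration of the fixed-width adaptation mentioned above, the pattern could pin down the column widths instead of accepting any run of spaces. The layout here (a 4-digit id, one space, a 1-digit position, one space, then free text) is an assumption read off the sample data, and parse_fixed is a hypothetical helper name.

```python
import re

# Assumed layout: 4-digit id, space, 1-digit position, space, free text
FIXED = re.compile(r"^(\d{4}) (\d) (.*)$")


def parse_fixed(line):
    """Return (id, position, text) as strings, or None if the line does not fit."""
    m = FIXED.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2), m.group(3)
```

Lines that do not match the assumed layout return None, so the caller can skip or report them.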
It's quite ugly code, but it's easy to understand.
raw = []
with open("input.txt", "r") as file:
    for x in file:
        raw.append(x.strip().split(None, 2))
# Sorting puts rows with the same first column next to each other
raw = sorted(raw)
title = raw[0][0]
refined = []
cluster = []
for x in raw:
    if x[0] == title:
        cluster.append(x)
    else:
        # First column changed: close the current cluster and start a new one
        refined.append(cluster)
        cluster = []
        title = x[0]
        cluster.append(x)
refined.append(cluster)
# Number the groups from 1, as in the requested output
for number, group in enumerate(refined, 1):
    print("-"*10+str(number)+"-"*10)
    for line in group:
        print(*line)
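The manual clustering loop above does by hand what itertools.groupby does for sorted input, so a shorter variant is possible. This is a sketch with an illustrative function name (cluster_rows); like the code above, it compares the columns as strings.

```python
from itertools import groupby


def cluster_rows(lines):
    """Sort the split rows, then group consecutive rows sharing the first column."""
    rows = sorted(line.strip().split(None, 2) for line in lines if line.strip())
    return [list(group) for _, group in groupby(rows, key=lambda row: row[0])]


sample = [
    "1234 0 my name is\n",
    "6789 2 I am coming\n",
    "2346 1 are you new?\n",
    "1234 2 Who are you?\n",
    "1234 1 how's going on?\n",
]
clusters = cluster_rows(sample)
```

groupby only merges adjacent items, which is why the rows are sorted first.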

Ordering a file

I have this code
archivo=open("archivo.csv","r")
for i in range(10):
    for reg in archivo:
        if archivo[reg] < archivo[reg+1]:
            x = archivo[reg]
            archivo[reg] = archivo[reg+1]
            archivo[reg+1] = x
archivo.close()
archivo = open("archivo.csv","w")
archivo.write(reg)
What I want is to sort the file alphabetically and save it sorted, but I get several errors. The main one says that the file has no attribute __getitem__, and I couldn't find anything similar on the web. Can someone help me?
Input looks like
Matt | 7 | 8
John | 9 | 6
Jim | 6 | 7
I have modified the source CSV file to be comma separated. So archivo.csv looks like
Matt,7,8
John,9,6
Jim,6,7
To read this file, Python already has a standard module called csv. Using it, we can read and write CSV reliably.
from csv import reader, writer

# Read all rows, then sort them (rows compare on the first column first)
with open("archivo.csv", "r", newline="") as src:
    a = sorted(reader(src))

# Write the sorted rows to a new file
with open("archivo1.csv", "w", newline="") as dst:
    writer(dst).writerows(a)
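If changing the source file is not an option, the csv module can probably handle the original pipe-separated layout directly through its delimiter parameter. The sketch below works on text in memory so that it is easy to try out; sort_pipe_rows is an illustrative name, and the cells are stripped because the original input has spaces around each '|'.

```python
import csv
import io


def sort_pipe_rows(text):
    """Sort '|'-separated rows by their first column and return them as text."""
    reader = csv.reader(io.StringIO(text), delimiter="|")
    # Strip the spaces around each cell; skip empty lines
    rows = [[cell.strip() for cell in row] for row in reader if row]
    rows.sort()
    out = io.StringIO()
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    writer.writerows(rows)
    return out.getvalue()
```

Note that the comparison is a plain string comparison, so it is case-sensitive.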

Not able to read rpt file using Python 3

I am trying to read a .rpt file using this Python code:
>>> with open(r'C:\Users\lenovo-pc\Desktop\training2.rpt','r',encoding = 'utf-8', errors = 'replace') as d:
...     count = 0
...     for i in d.readlines():
...         count = count + 1
...         print(i+"\n")
...
...
u
i
d
|
e
x
p
i
d
|
n
a
m
e
|
d
o
m
a
i
n
The output shown above is the result I am getting.
Kindly let me know how I can read the .rpt file using Python 3.
This is, indeed, strange behavior. While I cannot easily reproduce the error without knowing the format of the .rpt file, here are some hints about what might be going wrong. I assume the file looks something like this:
uid|expid|name|domain
...
Which can be read and printed with the following code:
with open(r'C:\Users\lenovo-pc\Desktop\training2.rpt','r',encoding = 'utf-8', errors = 'replace') as rfile:
    count = 0
    for line in rfile:
        count += 1
        print(line.strip()) # this removes white spaces, line breaks etc.
However, the problem seems to be that you iterate over the string of the first line in your file instead of over the lines in the file. That would produce the pattern you see, as the print() function adds a line break (in addition to the one you add manually). This leaves you with one character per line (followed by two line breaks).
>>> for i in "foo":
...     print(i+"\n")
f
o
o
Make sure you did not reuse variable names from earlier in the session and do not overwrite the file object.
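The difference can be checked without the real file. In this sketch, io.StringIO stands in for the opened .rpt file, and the second data row is invented purely for illustration.

```python
import io


def iterate(source):
    """Return the items a for-loop would see when looping over source."""
    return [item for item in source]


header = "uid|expid|name|domain\n"
# Stand-in for the opened file; the second row is made-up sample data
as_file = io.StringIO(header + "1|2|foo|example.org\n")

lines = iterate(as_file)  # a file-like object yields whole lines
chars = iterate(header)   # a plain string yields single characters
```

If the loop variable in the session turns out to hold single characters, the object being iterated is a string rather than the file object.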

Python -- how to read and change specific fields from file? (specifically, numbers)

I just started learning python scripting yesterday and I've already gotten stuck. :(
So I have a data file with a lot of different information in various fields.
Formatted basically like...
Name (tab) Start# (tab) End# (tab) A bunch of fields I need but do not do anything with
Repeat
I need to write a script that takes the start and end numbers and adds or subtracts a number, depending on whether another field says + or -.
I know that I can replace words with something like this:
x = open("infile")
y = open("outfile","a")
while 1:
    line = x.readline()
    if not line: break
    line = line.replace("blah","blahblahblah")
    # readline() keeps the trailing newline, so none is added here
    y.write(line)
x.close()
y.close()
But I've looked at all sorts of different places and I can't figure out how to extract specific fields from each line, read one field, and change other fields. I read that you can read the lines into arrays, but can't seem to find out how to do it.
Any help would be great!
EDIT:
Example of a line from the data here: (Each | represents a tab character)
| |
V V
chr21 | 33025905 | 33031813 | ENST00000449339.1 | 0 | **-** | 33031813 | 33031813 | 0 | 3 | 1835,294,104, | 0,4341,5804,
chr21 | 33036618 | 33036795 | ENST00000458922.1 | 0 | **+** | 33036795 | 33036795 | 0 | 1 | 177, | 0,
The second and third columns (indicated by arrows) would be the ones that I'd need to read/change.
You can use csv to do the splitting, although for these sorts of problems, I usually just use str.split:
with open('infile') as fin, open('outfile','w') as fout:
    for line in fin:
        # use line.split('\t', 3) if the name field can contain spaces
        name, start, end, rest = line.split(None, 3)
        # do something to change start and end here.
        # Note that `start` and `end` are strings, but they can easily be changed
        # using the `int` or `float` builtins.
        # `rest` still ends with the line's newline, so none is added here
        fout.write('\t'.join((name, start, end, rest)))
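The "change start and end" step might look like the following. This is a sketch resting on several assumptions the question leaves open: the +/- marker is taken to be the sixth tab-separated field (as in the example lines), the same offset is applied to both numbers, and "+" means add while "-" means subtract. shift_line is a hypothetical helper name.

```python
def shift_line(line, offset):
    """Add offset to start/end on '+' lines and subtract it on '-' lines."""
    fields = line.rstrip("\n").split("\t")
    # Assumed columns: start = 2nd field, end = 3rd field, strand = 6th field
    start, end, strand = int(fields[1]), int(fields[2]), fields[5]
    if strand == "+":
        start, end = start + offset, end + offset
    elif strand == "-":
        start, end = start - offset, end - offset
    fields[1], fields[2] = str(start), str(end)
    return "\t".join(fields) + "\n"
```

Inside the loop above, the write would then become fout.write(shift_line(line, 100)) for an offset of 100.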
csv is nice if you want to split lines like this:
this is a "single argument"
into:
['this','is','a','single argument']
but it doesn't seem like you need that here.
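For completeness, the quoted-field splitting described above can be reproduced with csv.reader by setting the delimiter to a space. split_quoted is an illustrative name, not part of the csv module.

```python
import csv


def split_quoted(line):
    """Split on spaces, keeping double-quoted text together as one field."""
    # csv.reader accepts any iterable of strings, so a one-item list works
    return next(csv.reader([line], delimiter=" "))
```

The quote characters themselves are removed from the resulting field.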
