Formatting output of CSV data? - python

I'm fairly new to python and made something that had this output:
(The text is in a csv file so so:
1,A
2,B
3,C etc)
Number Letter
1 A
2 B
3 C
26 Z
Unfortunately, I spent a good amount of time making it using a complicated method in which I manually made spaces like this:
Updated Code rn
fx = int(input('Number?\n'))
f=open('nums.txt','r')
lines=f.readlines()
line = lines[fx - 1]
with open('nums.txt','r') as f:
for i, line in enumerate(f):
if i >= 5:
break
NUM, LTR, SMB = line.rsplit(',', 1)
print(NUM.ljust(13) + LTR.ljust(13) + SMB)
How do I get it to make 3 columns? Right now it comes up with a
ValueError: not enough values to unpack (expected 3, got 2)
So is there a simpler method of achieving this that doesn't move the strings around like this:
Number Letter
1 A
2 B
3 C
26 Z #< string moves with spaces.

For simple alignment, you can use ljust or rjust. There is also no need to read the entire file for each line you want to process:
with open('numberletter','r') as f:
for i, line in enumerate(f):
if i >= 5:
break
number, letter = line.rsplit(',', 1)
print(number.ljust(13) + letter)
For more complex output formatting, look at str.format() and the formatting syntax

You can use sys module for that.
import sys
a=[1,"A"]
sys.stdout.write("%-6s %-50s " % (a[0],a[1]))

Related

I'm reading into a 256 byte string. I want to skip it, if it's all binary zeros (\x00) Is there a single test?

Totally new to python. Trying to parse a file but not all records contain data. I want to skip the records that are all hex 00.
if record == ('\x00' * 256): from a sample of print("-"*80))
gave a Syntax error, hey I said I was new. :)
Thanks for the reply, I'm using 2.7 and reading like this....
with open(testfile, "rb") as f:
counter = 0
while True:
record = f.read(256)
counter += 1
Your example looks to be very close. I'm not sure about Python 2, but in Python 3 you should specify that a string is binary.
I would do something like:
empty = b'\x00' * 256
if record == empty:
print('skipped this line')
Remember that Python 2 uses print statements, so you should do print 'skipped this line' instead.

Separate lines in Python

I have a .txt file. It has 3 different columns. The first one is just numbers. The second one is numbers which starts with 0 and it goes until 7. The final one is a sentence like. And I want to keep them in different lists because of matching them for their numbers. I want to write a function. How can I separate them in different lists without disrupting them?
The example of .txt:
1234 0 my name is
6789 2 I am coming
2346 1 are you new?
1234 2 Who are you?
1234 1 how's going on?
And I have keep them like this:
----1----
1234 0 my name is
1234 1 how's going on?
1234 2 Who are you?
----2----
2346 1 are you new?
----3-----
6789 2 I am coming
What I've tried so far:
inputfile=open('input.txt','r').read()
m_id=[]
p_id=[]
packet_mes=[]
input_file=inputfile.split(" ")
print(input_file)
input_file=line.split()
m_id=[int(x) for x in input_file if x.isdigit()]
p_id=[x for x in input_file if not x.isdigit()]
With your current approach, you are reading the entire file as a string, and performing a split on a whitespace (you'd much rather split on newlines instead, because each line is separated by a newline). Furthermore, you're not segregating your data into disparate columns properly.
You have 3 columns. You can split each line into 3 parts using str.split(None, 2). The None implies splitting on space. Each group will be stored as key-list pairs inside a dictionary. Here I use an OrderedDict in case you need to maintain order, but you can just as easily declare o = {} as a normal dictionary with the same grouping (but no order!).
from collections import OrderedDict
o = OrderedDict()
with open('input.txt', 'r') as f:
for line in f:
i, j, k = line.strip().split(None, 2)
o.setdefault(i, []).append([int(i), int(j), k])
print(dict(o))
{'1234': [[1234, 0, 'my name is'],
[1234, 2, 'Who are you?'],
[1234, 1, "how's going on?"]],
'6789': [[6789, 2, 'I am coming']],
'2346': [[2346, 1, 'are you new?']]}
Always use the with...as context manager when working with file I/O - it makes for clean code. Also, note that for larger files, iterating over each line is more memory efficient.
Maybe you want something like that:
import re
# Collect data from inpu file
h = {}
with open('input.txt', 'r') as f:
for line in f:
res = re.match("^(\d+)\s+(\d+)\s+(.*)$", line)
if res:
if not res.group(1) in h:
h[res.group(1)] = []
h[res.group(1)].append((res.group(2), res.group(3)))
# Output result
for i, x in enumerate(sorted(h.keys())):
print("-------- %s -----------" % (i+1))
for y in sorted(h[x]):
print("%s %s %s" % (x, y[0], y[1]))
The result is as follow (add more newlines if you like):
-------- 1 -----------
1234 0 my name is
1234 1 how's going on?
1234 2 Who are you?
-------- 2 -----------
2346 1 are you new?
-------- 3 -----------
6789 2 I am coming
It's based on regexes (module re in python). This is a good tool when you want to match simple line based patterns.
Here it relies on spaces as columns separators but it can as easily be adapted for fixed width columns.
The results is collected in a dictionary of lists. each list containing tuples (pairs) of position and text.
The program waits output for sorting items.
It's a quite ugly code but it's quite easy to understand.
raw = []
with open("input.txt", "r") as file:
for x in file:
raw.append(x.strip().split(None, 2))
raw = sorted(raw)
title = raw[0][0]
refined = []
cluster = []
for x in raw:
if x[0] == title:
cluster.append(x)
else:
refined.append(cluster)
cluster = []
title = x[0]
cluster.append(x)
refined.append(cluster)
for number, group in enumerate(refined):
print("-"*10+str(number)+"-"*10)
for line in group:
print(*line)

Convert string (letter) from file text to integer

I'm the new one to machine learning. I got some problem when trying to use int for letters. I use Python 3.5 on Mac OS. This is my code:
def file2matrix(filename):
fr = open(filename)
numberOfLines = len(fr.readlines())
returnMat = zeros((numberOfLines, 3))
classLabelVector = []
fr = open(filename)
index=0
for line in fr.readlines():
line = line.strip()
listFromLine1 = line.split('\t')
listFromLine = zeros(3)
i = 0
for value in listFromLine1:
if value.isdigit():
valueAsInt = int(value)
listFromLine[i] = valueAsInt
i += 1
returnMat[index, :] = listFromLine[0:3]
classLabelVector.append(int(listFromLine1[-1]))
index += 1
return returnMat, classLabelVector
This is my txt file:
23 8 1 f
7 8 5 j
5 9 1 j
6 6 6 f
This is the error:
classLabelVector.append(int(listFromLine1[-1])) ValueError: invalid literal for int() with base 10: 'f'
Can anybody help me with these problems?
If I understand your desired outcome correctly, you want to return a list with n lists in it. Each list will be along the line of [23. 8. 1.]. Then you want a second list that takes the last number of each list like this: [1, 5, 1, 6].
Assuming this is all correct, the reason you are getting classLabelVector.append(int(listFromLine1[-1])) ValueError: invalid literal for int() with base 10: 'f' is because you are not returning any numbers, but a string instead. I found 3 issues that should fix the error.
First, I found no '\t' in your text document. I instead used listFromLine1 = line.split(' ') and it split based on spaces. This could simply be from the way it copied over when you posted, though.
Second, when you assign a value for each position in listFromLine you then ignore it and append from listFromLine1 which you did nothing to, so it remains a string.
Third, try using if value.isnumeric(): instead of if value.isdigit():.
Fixing these few problems should get the program working. Also, you open the file and run fr.readlines() twice and never tell it to close. Your making the program work twice for the same information. You should try to rewrite it to only open once and use with open() as fr: because it will close when done.
EDIT: if you want the second list to be the letters instead [f, j, j, f] then keep it as listFromLine1 and use str() instead of int(): classLabelVector.append(str(listFromLine1[-1]))

matching and dispalying specific lines through python

I have 15 lines in a log file and i want to read the 4th and 10 th line for example through python and display them on output saying this string is found :
abc
def
aaa
aaa
aasd
dsfsfs
dssfsd
sdfsds
sfdsf
ssddfs
sdsf
f
dsf
s
d
please suggest through code how to achieve this in python .
just to elaborate more on this example the first (string or line is unique) and can be found easily in logfile the next String B comes within 40 lines of the first one but this one occurs at lots of places in the log file so i need to read this string withing the first 40 lines after reading string A and print the same that these strings were found.
Also I cant use with command of python as this gives me errors like 'with' will become a reserved keyword in Python 2.6. I am using Python 2.5
You can use this:
fp = open("file")
for i, line in enumerate(fp):
if i == 3:
print line
elif i == 9:
print line
break
fp.close()
def bar(start,end,search_term):
with open("foo.txt") as fil:
if search_term in fil.readlines()[start,end]:
print search_term + " has found"
>>>bar(4, 10, "dsfsfs")
"dsfsfs has found"
#list of random characters
from random import randint
a = list(chr(randint(0,100)) for x in xrange(100))
#look for this
lookfor = 'b'
for element in xrange(100):
if lookfor==a[element]:
print a[element],'on',element
#b on 33
#b on 34
is one easy to read and simple way to do it. Can you give part of your log file as an example? There are other ways that may work better :).
after edits by author:
The easiest thing you can do then is:
looking_for = 'findthis' i = 1 for line in open('filename.txt','r'):
if looking_for == line:
print i, line
i+=1
it's efficient and easy :)

Python regex question

I'm having some problems figuring out a solution to this problem.
I want to read from a file on a per line basis and analyze whether that line has one of two characters (1 or 0). I then need to sum up the value of the line and also find the index value (location) of each of the "1" character instances.
so for example:
1001
would result in:
line 1=(count:2, pos:[0,3])
I tried a lot of variations of something like this:
r=urllib.urlopen(remote-resouce)
list=[]
for line in lines:
for m in re.finditer(r'1',line):
list.append((m.start()))
I'm having two issues:
1) I thought that the best solution would be to iterate through each line and then use a regex finditer function. My issue here is that I keep failing to write a for loop that works. Despite my best efforts, I keep returning the results as one long list, rather than a multidimensional array of dictionaries.
Is this approach the right one? If so, how do I write the correct for loop?
If not, what else should I try?
Perhaps do it without regex:
import urllib
url='http://stackoverflow.com/questions/5158168/python-regex-question/5158341'
f=urllib.urlopen(url)
for linenum,line in enumerate(f):
print(line)
locations=[pos for pos,char in enumerate(line) if char=='1']
print('line {n}=(count:{c}, pos:{l})'.format(
n=linenum,
c=len(locations),
l=locations
))
Using regexes here is probably a bad idea. You can see if a 1 or 0 is in a line of text with '0' in line or '1' in line, and you can get the count with line.count('1').
Finding all of the locations of 1s does require iterating through the string, I believe.
Unubtu's code works fine. I tested it on a sample file which also has all 0's for a particular line. Here is the complete code -
#! /usr/bin/python
2
3 # Write a program to read a text file which has 1's and 0's on each line
4 # For each line count the number of 1's and their position and print it
5
6 import sys
7
8 def countones(infile):
9 f = open(infile,'r')
10 for linenum, line in enumerate(f):
11 locations = [pos for pos,char in enumerate(line) if char == '1']
12 print('line {n}=(count:{c}, pos:{l})'.format(n=linenum,c=len(locations),l= locations))
13
14
15 def main():
16 infile = './countones.txt'
17 countones(infile)
18
19 # Standard boilerplate to call the main() function to begin the program
20 if __name__ == '__main__':
21 main()
Input file -
1001
110001
111111
00001
010101
00000
Result -
line 0=(count:2, pos:[0, 3])
line 1=(count:3, pos:[0, 1, 5])
line 2=(count:6, pos:[0, 1, 2, 3, 4, 5])
line 3=(count:1, pos:[4])
line 4=(count:3, pos:[1, 3, 5])
line 5=(count:0, pos:[])

Categories

Resources