Text files help (Python 3)

Students.txt
64 Mary Ryan
89 Michael Murphy
22 Pepe
78 Jenny Smith
57 Patrick James McMahon
89 John Kelly
22 Pepe
74 John C. Reilly
My code
f = open("students.txt","r")
for line in f:
    words = line.strip().split()
    mark = (words[0])
    name = " ".join(words[1:])
    for i in (mark):
        print(i)
The output I'm getting is
6
4
8
9
2
2
7
8
etc...
My expected output is
64
89
22
78
etc..
I'm just curious to know how I would print the whole number, not just a single digit at a time.
Any help would be much appreciated.

As far as I can see, each line of your text file starts with an integer followed by a name, and you want your code to print only the whole integer.
You can use this code:
f = open("Students.txt","r")
for line in f:
    l = line.split(" ")
    print(l[0])

In Python, when you do this:
for i in (mark):
    print(i)
and mark is of type string, you are asking Python to iterate over each character in the string. So, if your string holds a number such as "64" and you iterate over it, you'll get one digit at a time.
I believe in your code the line
mark = (words[0])name = " ".join(words[1:])
is a typo. If you fix that we can help you with what's missing (it's most likely a statement like mark = something.split(), but I'm not sure what something is based on the code).
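To make the difference concrete, here is a minimal sketch. The sample line is taken from the question's file; the variable names digits and number are only illustrative. Iterating over a string yields its characters, while printing the string, or converting it with int(), keeps the number whole.

```python
line = "64 Mary Ryan\n"
words = line.strip().split()
mark = words[0]                 # still a string: "64"

# Iterating over a string yields one character at a time
digits = [ch for ch in mark]
print(digits)                   # ['6', '4']

# Printing the string itself keeps the number whole
print(mark)                     # 64

# Convert to int only if arithmetic is needed
number = int(mark)
print(number + 1)               # 65
```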

You should use a context manager when you open files so that they are closed automatically when the with block ends. Also, mark should be a list to which you append the first element of each split line. All together it will look like this:
with open("students.txt","r") as f:
    mark = []
    for line in f:
        mark.append(line.strip().split()[0])
for i in mark:
    print(i)

The line
for i in (mark):
is the same as this, because mark is a string and the parentheses alone do not create a tuple:
for i in mark:
I believe you want mark to be an element of some iterable. You can create a tuple with a single item by adding a trailing comma:
for i in (mark,):
and this should give what you want.
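A small sketch of the two spellings side by side, assuming mark holds the string "64" as in the question (the list names are made up for illustration):

```python
mark = "64"

# Parentheses alone do not create a tuple, so this iterates over the string
same_as_string = [i for i in (mark)]

# The trailing comma creates a one-item tuple, so this iterates over the tuple
one_item_tuple = [i for i in (mark,)]

print(same_as_string)   # ['6', '4']
print(one_item_tuple)   # ['64']
```

If only the value is needed, print(mark) without any loop does the same job.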

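In your line:
line.strip().split()
you're not telling the string explicitly to split on a space; called without an argument, split() splits on any run of whitespace. To split on single spaces explicitly, try the following:
str(line).strip().split(" ")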
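For comparison, a short sketch of how the two calls behave. The sample line is a made-up variant of one from the question's file, with two spaces after the mark so that the difference shows up:

```python
# Note the two spaces after 57 (added here for illustration only)
line = "57  Patrick James McMahon\n"

# No argument: split on any run of whitespace, never producing empty strings
default_split = line.strip().split()

# Explicit " ": split on every single space, so doubled spaces give empty strings
space_split = line.strip().split(" ")

print(default_split)   # ['57', 'Patrick', 'James', 'McMahon']
print(space_split)     # ['57', '', 'Patrick', 'James', 'McMahon']
```

In both cases the first element is the whole mark, so either call works for printing it.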

A quick one with list comprehensions:
with open("students.txt","r") as f:
    mark = [line.strip().split()[0] for line in f]
for i in mark:
    print(i)

Related

How to print out nicely formatted tables from a dictionary

For the sake of practicing how to be more comfortable and fluent in working with dictionaries, I have written a little program that reads the content of a file and adds it to a dictionary as a key: value pair. This is no problem, but when I got curious about how to print the content out again in the same format as the table in the datafile using for-loops, I ran into trouble.
My question is: How can I print out the content of the dictionary onto the terminal using for-loops?
The datafile is:
Name Age School
Anne 10 Eiksmarka
Tom 15 Marienlyst
Vidar 18 Persbråten
Ayla 18 Kongshavn
Johanne 17 Wang
Silje 16 Eikeli
Per 19 UiO
Ali 25 NTNU
My code is:
infile = open("table.dat", "r")
data = {}
headers = infile.readline().split()
for i in range(len(headers)):
    data[headers[i]] = []
for line in infile:
    words = line.split()
    for i in range(len(headers)):
        data[headers[i]].append(words[i])
infile.close()
I would like to print the data back to the terminal. Ideally, the printout should look something like this:
Name Age School
Anne 10 Eiksmarka
Tom 15 Marienlyst
Vidar 18 Persbråten
Ayla 18 Kongshavn
Johanne 17 Wang
Silje 16 Eikeli
Per 19 UiO
Ali 25 NTNU
If someone can help me with this, I would be grateful.
The easiest solution is to use a library such as Tabulate. Here is an example of its output (you can customize it further):
>>> from tabulate import tabulate
>>> table = [["Sun",696000,1989100000],["Earth",6371,5973.6],
... ["Moon",1737,73.5],["Mars",3390,641.85]]
>>> print(tabulate(table))
-----  ------  -------------
Sun    696000     1.9891e+09
Earth    6371  5973.6
Moon     1737    73.5
Mars     3390   641.85
-----  ------  -------------
Otherwise, if you MUST use your own custom for-loop, you can add tabs to fix how it looks, as in:
print(a+"\t") where \t is the horizontal tab escape character
Edit: An example of how this can be utilized is below:
infile = open("table.dat", "r")
data = {}
headers = infile.readline().split()
for i in range(len(headers)):
    data[headers[i]] = []
for line in infile:
    words = line.split()
    for i in range(len(headers)):
        data[headers[i]].append(words[i])
        print(words[i],end= '\t')
    print()
infile.close()
Things to note:
1- For each field, we use print(...,end= '\t'), which ends the output with a tab instead of a new line. We might also consider adding more tabs (e.g. end='\t\t') or spaces, or any other formatting such as a separator character (e.g. end='\t|\t').
2- After each line, we use print(), which prints only a new line, moving the cursor down to the next row.
Take a look at the .ljust, .rjust and .center methods of str. Consider the following simple example:
d = {"Alpha": 1, "Beta": 10, "Gamma": 100, "ExcessivelyLongName": 1}
for key, value in d.items():
    print(key.ljust(5), str(value).rjust(3))
output
Alpha   1
Beta   10
Gamma 100
ExcessivelyLongName   1
Note that ljust pads with spaces (by default) to reach the specified width and does nothing if the name is already longer than that. Also, as the values are integers, they need to be converted to str first if you want to use one of the mentioned methods.
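Applied to the dictionary of columns from the question, the ljust idea could look like the sketch below. It is only one possible approach: format_table is a hypothetical helper, the data is a shortened in-memory copy of the question's table, and each column width is taken to be the longest entry in that column (header included).

```python
# Shortened copy of the question's data, laid out as the question's code builds it
data = {
    "Name": ["Anne", "Tom", "Vidar"],
    "Age": ["10", "15", "18"],
    "School": ["Eiksmarka", "Marienlyst", "Persbråten"],
}

def format_table(columns):
    """Return the table as a list of text lines with left-aligned columns."""
    headers = list(columns)
    # Each column is as wide as its longest cell, header included
    widths = {}
    for h in headers:
        widths[h] = max(len(cell) for cell in [h] + columns[h])
    lines = []
    header_line = ""
    for h in headers:
        header_line += h.ljust(widths[h]) + "  "
    lines.append(header_line.rstrip())
    # All columns have the same length, so any of them gives the row count
    for row in range(len(columns[headers[0]])):
        row_line = ""
        for h in headers:
            row_line += columns[h][row].ljust(widths[h]) + "  "
        lines.append(row_line.rstrip())
    return lines

for text in format_table(data):
    print(text)
```

Computing the widths from the data avoids the problem seen with ExcessivelyLongName above, where a fixed width is too small for one of the entries.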
You can do this using pandas, although the styling isn't exactly the same as yours:
import pandas as pd
with open('filename.csv') as f:
    headers, *data = map(str.split, f.readlines())
df = pd.DataFrame(dict(zip(headers, zip(*data))))
print(df.to_string(index=False))
Name Age School
Anne 10 Eiksmarka
Tom 15 Marienlyst
Vidar 18 Persbråten
Ayla 18 Kongshavn
Johanne 17 Wang
Silje 16 Eikeli
Per 19 UiO
Ali 25 NTNU

Computational tractability of algorithm for matching names in two files in python

So I have two .txt files that I'm trying to match up. The first .txt file is just lines of about 12,500 names.
John Smith
Jane Smith
Joe Smith
The second .txt file also contains lines with names (that might repeat) but also extra info, about 17GB total.
584 19423 john smith John Smith 79946792 5 5 11 2016-06-24
584 19434 john smith John Smith 79923732 5 4 11 2018-03-14
584 19423 jane smith Jane Smith 79946792 5 5 11 2016-06-24
My goal is to find all the names from File 1 in File 2, and then spit out the File 2 lines that contain any of those File 1 names.
Here is my python code:
with open("Documents/File1.txt", "r") as t:
    terms = [x.rstrip('\n') for x in t]
with open("Documents/File2.txt", "r") as f, open("Documents/matched.txt","w") as w:
    for line in f:
        if any([term in line for term in terms]):
            w.write(line)
So this code definitely works, but it has been running for 3 days (and is still going!!!). I did some back-of-the-envelope calculations, and I'm very worried that my algorithm is computationally intractable (or hyper inefficient) given the size of the data.
Would anyone be able to provide feedback re: (1) whether this is actually intractable and/or extremely inefficient and if so (2) what an alternative algorithm might be?
Thank you!!
First, when testing membership, set and dict are going to be much, much faster, so terms should be a set:
with open("Documents/File1.txt", "r") as t:
    terms = set(line.strip() for line in t)
Next, I would split each line into a list and check whether the name is in the set, rather than checking whether each member of the set occurs in the line: every such substring search is O(N), where N is the length of the line, and you repeat it for every term. This way you can directly pick out the columns (via slicing) that contain the first and last name:
with open("Documents/File2.txt", "r") as f, open("Documents/matched.txt","w") as w:
    for line in f:
        # split the line on whitespace
        names = line.split()
        # Your names seem to occur here
        name = ' '.join(names[4:6])
        if name in terms:
            w.write(line)
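The approach can be tried on a small in-memory sample before running it on the 17GB file. This is a sketch under a few assumptions: io.StringIO stands in for the two files, the "Bob Jones" line is invented so that one line fails to match, and the name is assumed to sit in columns 4 and 5 as in the sample lines from the question.

```python
import io

# Stand-ins for File1.txt and File2.txt (the Bob Jones line is made up)
names_file = io.StringIO("John Smith\nJane Smith\nJoe Smith\n")
big_file = io.StringIO(
    "584 19423 john smith John Smith 79946792 5 5 11 2016-06-24\n"
    "584 19434 bob jones Bob Jones 79923732 5 4 11 2018-03-14\n"
    "584 19423 jane smith Jane Smith 79946792 5 5 11 2016-06-24\n"
)

# A set gives constant-time membership tests on average
terms = {line.strip() for line in names_file}

matched = []
for line in big_file:
    fields = line.split()
    name = " ".join(fields[4:6])   # first and last name columns
    if name in terms:              # one hash lookup per line
        matched.append(line)

print(len(matched))   # 2
```

With about 12,500 terms, the original loop does up to 12,500 substring searches per line, while this version does one split and one set lookup per line.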

How do you split a list by space in python?

How do you split a list by space? The code below reads a file with 4 lines of 7 numbers separated by spaces. When it takes the file and then splits it, it splits it by digit, so if I print item[0], 5 will print instead of 50. Here is the code:
def main():
    filename = input("Enter the name of the file: ")
    infile = open(filename, "r")
    for i in range(4):
        data = infile.readline()
        print(data)
        item = data.split()
        print(data[0])
main()
the file looks like this
50 60 15 100 60 15 40 /n
100 145 20 150 145 20 45 /n
50 245 25 120 245 25 50 /n
100 360 30 180 360 30 55 /n
split takes as its argument the separator you want to split your string on.
I invite you to read the documentation of the methods you are using. :)
EDIT: By the way, readline returns a string, not a list.
However, split does return a list.
import nltk
tokens = nltk.word_tokenize(TextInTheFile)
Try this once you have opened that file.
TextInTheFile is a variable
There's not a lot wrong with what you are doing, except that you are printing the wrong thing.
Instead of
print(data[0])
use
print(item[0])
data[0] is the first character of the string you read from file. You split this string into a variable called item so that's what you should print.
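A minimal sketch of the difference, using the first line of the sample file as a plain string (the numbers list is an illustrative extra, for the case where integers rather than strings are needed):

```python
data = "50 60 15 100 60 15 40\n"   # what readline() returns: one string
item = data.split()                # a list of the whitespace-separated pieces

print(data[0])   # 5   (first character of the string)
print(item[0])   # 50  (first element of the list)

# Convert the pieces to integers if arithmetic is needed
numbers = [int(x) for x in item]
print(numbers[3])   # 100
```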

Find and copy a line using regex in Python

I am new to this forum and to programming and apologize in advance if I violate any of the forum rules. I have researched this extensively, but I couldn't find a solution for my problem.
So I have a very long file that has this general structure:
data="""
20.020001 563410 9
20.520001 577410 20
21.022001 591466 9
21.522001 605466 120
23.196001 652338 2
25.278001 710634 7
25.780001 724690 144
26.280001 738690 9
26.782001 752746 40
27.282001 766746 9
27.784001 780802 140
29.372001 825266 2
31.458001 883674 7
31.958002 897674 8
32.458002 911674 9
32.958002 925674 10
"""
I imported the file using
with open(r"C:\blablabla\text.txt", 'r+') as infile:
    data = infile.read()
Now I am trying to use a regular expression to find all lines that end with 140 through 146, so I did this:
items=re.findall('.......................14[0-6]\n',data,re.MULTILINE)
for x in items:
    print(x)
This works, but when I now try to copy those lines that contain the regular expression,
for x in items:
    if items in data:
        data.write(items)
I get the following error:
if items in data:
TypeError: 'in <string>' requires string as left operand, not list
I understand what the problem is, but I don't know how to solve it. How can I feed the left operand a string when the outcome of my regex is a list?
Any help is much appreciated!
You should simply handle each line separately:
data = infile.readlines()
for line in data:
    if re.match('.......................14[0-6]\n', line):
        print(line[:-1])
The last character of the line is a trailing newline, which would be duplicated by the one that print adds.
You can read the file line by line:
data=""
with open("file.txt", 'r+') as infile:
    for line in infile:
        if (146 >= int(line.split()[-1]) >= 140) :
            data = data + line
print(data)
Your regex can be simplified further:
re.findall('.*?14[0-6]\n', data)
To overcome your further problems:
items = re.findall('.*?14[0-6]\n',data)
result = ""
for x in items:
    result += str(x)
print(result)
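The TypeError in the question comes from testing the whole list against the string; each match returned by findall is already a string taken from data, so the matches can be collected directly. Below is a sketch of one way to do it, using a shortened copy of the sample data. The pattern is a variant of the ones above, not the asker's original: it relies on re.MULTILINE so that ^ and $ anchor at line boundaries, and it assumes the last column is preceded by a space or tab.

```python
import re

# Shortened copy of the sample data from the question
data = """20.020001 563410 9
25.780001 724690 144
27.784001 780802 140
31.958002 897674 8
"""

# With re.MULTILINE, ^ and $ match at the start and end of every line
pattern = re.compile(r"^.*[ \t]14[0-6]$", re.MULTILINE)

matches = pattern.findall(data)   # a list of strings, one per matching line
result = "\n".join(matches)       # a single string, ready to print or write
print(result)
```

Requiring whitespace before 14 keeps values such as 2140 from matching, which the shorter patterns would accept.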

Return the average mark for all student in that Section

I know this was asked already, but the answers were super unclear.
The first requirement is to open a file (sadly I have no idea how to do that)
The second requirement is a section of code that does the following:
Each line represents a single student and consists of a student number, a name, a section code and a midterm grade, all separated by whitespace
So I don't think I can target that element, since everything is separated by whitespace?
Here is an excerpt of the file, showing line structure
987654322 Xu Carolyn L0101 19.5
233432555 Jones Billy Andrew L5101 16.0
555432345 Patel Amrit L0101 13.5
888332441 Fletcher Bobby L0201 18
777998713 Van Ryan Sarah Jane L5101 20
877633234 Zhang Peter L0102 9.5
543444555 Martin Joseph L0101 15
876543222 Abdolhosseini Mohammad Mazen L0102 18.5
I was provided the following hints:
Notice that the number of names per student varies.
Use rstrip() to get rid of extraneous whitespace at the end of the lines.
I don't understand the second hint.
This is what I have so far:
counter = 0
elements = -1
for sets in the_file
    elements = elements + 1
    if elements = 3
I know it has something to do with readlines() and targeting the section code.
marks = [float(line.strip().split()[-1]) for line in open('path/to/input/file')]
average = sum(marks)/len(marks)
Hope this helps
Open and writing to files
strip method
Something like this?
data = {}
with open(filename) as f:  # open the file
    for line in f:  # go through the file line by line
        # split the line on whitespace; this also drops the trailing newline
        stData = line.split()
        if not stData:
            continue  # skip blank lines
        # the name can span several words, so count the last two fields from the end
        studentN = stData[0]
        studentName = ' '.join(stData[1:-2])
        sectionCode, midtermGrade = stData[-2], stData[-1]
        # build the dict: key is a section code, value is a list of student records
        data.setdefault(sectionCode, []).append([studentN, studentName, float(midtermGrade)])
# compute the average grade for each section
for k, v in data.items():  # items() gives a (key, value) pair on each iteration
    print('Section: ' + k + ' Average grade: ' + str(sum(x[2] for x in v) / len(v)))
more or less:
infile = open('grade_file.txt', 'r')
score = 0
n = 0
for line in infile.readlines():
    score += float(line.rstrip().split()[-1])
    n += 1
avg = score / n
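Since the title asks for the average of one section, here is a sketch of a function that returns it. The name section_average and its parameters are assumptions, not part of the assignment; the sample lines are taken from the excerpt in the question. It uses rstrip() as the hint suggests, and extended unpacking to absorb a name of any length.

```python
def section_average(lines, section):
    """Return the average midterm grade for one section, or None if it has no students."""
    total = 0.0
    count = 0
    for line in lines:
        parts = line.rstrip().split()
        if len(parts) < 4:
            continue  # skip blank or malformed lines
        # The starred name soaks up however many name words there are
        student_number, *name_parts, code, grade = parts
        if code == section:
            total += float(grade)
            count += 1
    return total / count if count else None

sample = [
    "987654322 Xu Carolyn L0101 19.5\n",
    "233432555 Jones Billy Andrew L5101 16.0\n",
    "555432345 Patel Amrit L0101 13.5\n",
    "777998713 Van Ryan Sarah Jane L5101 20\n",
]

print(section_average(sample, "L0101"))   # 16.5
```

An open file object can be passed in place of the list, since iterating over a file also yields one line at a time.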
