Suppose I have a text file containing the following, where the number on the left says how many times the character on the right should be repeated:
2 a
1 *
3 $
How would I get this output in the fastest time?
aa*$$$
This is my code, but has N^2 complexity:
f = open('a.txt')
line = ''
for item in f:
    item2 = item.split()
    num = int(item2[0])
    for i in range(num):
        line += item2[1]
print(line)
f.close()
KISS
with open('file.txt') as f:
    for line in f:
        count, char = line.strip().split(' ')
        print char * int(count),
Just print immediately:
for item in open('a.txt'):
    num, char = item.strip().split()
    print(int(num) * char, end='')
print()  # Newline
You can multiply strings to repeat them in Python:
"foo" * 3 gives you foofoofoo.
parts = []
with open("a.txt") as f:
    for line in f:
        n, c = line.rstrip().split(" ")
        parts.append(c * int(n))
print("".join(parts))
You can print directly but the code above lets you get the output you want in a string if you care about that.
Using a list then joining is more efficient than using += on a string because strings are immutable in Python. This means that a new string must be created for each +=. Of course printing immediately avoids this issue.
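As a minimal sketch of the list-then-join idea described above, assuming the input lines are already available as an iterable of strings (the helper name expand_counts is hypothetical and not part of the original answer):

```python
def expand_counts(lines):
    """Build the expanded string from lines of the form '<count> <char>'."""
    parts = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        count, char = line.split()
        parts.append(char * int(count))
    # A single join at the end avoids creating a new string on every +=
    return "".join(parts)


print(expand_counts(["2 a\n", "1 *\n", "3 $\n"]))  # aa*$$$
```

Passing an open file object instead of the list should work the same way, since a file iterates over its lines.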
You can try like this,
f = open('a.txt')
print ''.join(int(item.split()[0]) * item.split()[1] for item in f.readlines())
Your code is actually O(sum(n_i)) where n_i is the number in the row i. You can't do any better and none of the solutions in the other answers do, even if they might be faster than yours.
So I have this problem I need to solve. I have a file that goes something like:
11-08-2012;1485;10184;7,53;31;706;227;29;6;1102
12-08-2012;2531;10272;7,59;25;695;222;26;22;1234
13-08-2012;1800;13418;8,66;46;714;203;50;6;0757
14-08-2012;2009;11237;9,43;81;655;246;49;7;1783
I need to read the "1485", then the "2531", then the "1800", and so on to the end of the file, and finally sum them up. How do I do that? Below is how I tried to approach the problem with a while loop, but I seem to be lost. Can anyone help?
while True:
    f.seek(12)
    text=f.read(4)
    text=f.readline()
    if(text==""):
        break
return text
There are a number of ways to do this: with numpy, pandas, simple coroutines and so on. I am adding the one closest to your approach.
total = 0
with open('exmplefile.txt','r') as f:
    for line in f:
        elements = line.split(';')
        num_of_interest = int(elements[1])
        # you can add a print if you want
        total += num_of_interest
print(total)
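The same split-on-semicolon idea can also be sketched with the standard csv module, which accepts a custom delimiter. This is only an illustration: the function name sum_second_field is hypothetical, and io.StringIO stands in for the real file, using the four sample rows from the question.

```python
import csv
import io


def sum_second_field(lines):
    """Sum the second ';'-separated field of every non-empty row."""
    reader = csv.reader(lines, delimiter=';')
    return sum(int(row[1]) for row in reader if row)


# Sample rows copied from the question; io.StringIO stands in for the real file
sample = io.StringIO(
    "11-08-2012;1485;10184;7,53;31;706;227;29;6;1102\n"
    "12-08-2012;2531;10272;7,59;25;695;222;26;22;1234\n"
    "13-08-2012;1800;13418;8,66;46;714;203;50;6;0757\n"
    "14-08-2012;2009;11237;9,43;81;655;246;49;7;1783\n"
)
print(sum_second_field(sample))  # 7825
```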
This solution finds the positions of the first and second occurrences of the delimiter, in this case ;, and converts the text between them.
with open(filename,'r') as file:
    file_list = file.readlines()
sum = 0
for line in file_list:
    loc = line.find(';')
    first_loc = loc + 1
    last_loc = loc +line[loc+1:].find(';')+1
    sum = sum + int(line[first_loc:last_loc])
print(sum)
Try this
mylist = []
for string in file:
    mynum = string.split(';')[1]
    mylist.append(mynum)
sum([int(i) for i in mylist])
This solution handles the case where the four-digit number is not the second field on the line:
with open("path/to/file") as f:
    f1 = f.readlines()
sum = 0
for line in f1:
    lineInArray = line.split(';')
    for digit in lineInArray:
        if len(digit.strip()) == 4 and digit.strip().isnumeric():
            sum += int(digit)
I just started learning Python a few weeks ago and I want to write a function which opens a file, counts and adds up the characters in each line and prints that those equal the total number of characters in the file.
For example, given a file test1.txt:
lineLengths('test1.txt')
The output should be:
15+20+23+24+0=82 (+0 optional)
This is what I have so far:
def lineLengths(filename):
    f=open(filename)
    lines=f.readlines()
    f.close()
    answer=[]
    for aline in lines:
        count=len(aline)
It does what I want it to do, but I don't know how to include all of the numbers added together when I have the function print.
If you only want to print the sum of the length of each line, you can do it like so:
def lineLengths(filename):
    with open(filename) as f:
        answer = []
        for aline in f:
            answer.append(len(aline))
    print("%s = %s" % ("+".join(str(c) for c in answer), sum(answer)))
If you however also need to track lengths of all the individual lines, you can append the length for each line in your answer list by using the append method and then print the sum by using sum(answer)
Try this :
f=open(filename)
mylist = f.read().splitlines()
sum([len(i) for i in mylist])
Simple as this:
sum(map(len, open(filename)))
open(filename) returns an iterator that passes through each line, each of which is run through the len function, and the results are summed.
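The one-liner above never closes the file explicitly and yields only the total. A hedged sketch that closes the file with a with block and also builds the "a+b=total" string the question asks for might look like this; format_line_lengths is a hypothetical name, and the temporary file with made-up contents exists only for the demo.

```python
import os
import tempfile


def format_line_lengths(filename):
    """Return a string such as '3+5=8' built from the length of each line."""
    with open(filename) as f:
        lengths = [len(line) for line in f]  # newline characters are counted
    return "+".join(map(str, lengths)) + "=" + str(sum(lengths))


# Demo on a throwaway file with made-up contents
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("ab\nabcd\n")
print(format_line_lengths(tmp.name))  # 3+5=8
os.remove(tmp.name)
```

Using len(line.rstrip('\n')) instead would leave the newline characters out of the counts, if that is what the assignment expects.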
Once you read lines from file you can count sum using:
sum([len(aline) for aline in lines])
Separate your problem into functions: one responsible for returning the total sum of the line lengths, and another for formatting the per-line lengths.
def read_file(file):
    with open(file) as f:
        lines = f.readlines()
    return lines
def format_line_sum(lines):
    lines_in_str = []
    for line in lines:
        lines_in_str.append(str(len(line)))
    return "+".join(lines_in_str)
def lines_length(file):
    lines = read_file(file)
    total_sum = 0
    for line in lines:
        total_sum += len(line)
    return format_line_sum(lines) + "=" + str(total_sum)
And to use:
print(lines_length('file1.txt'))
Assuming your output is literal, something like this should work.
You can use python sum() function when you figure out how to add numbers to the list
def lineLengths(filename):
    with open(filename) as f:
        line_lengths = [len(l.rstrip()) for l in f]
    summ = '+'.join(map(str, line_lengths))  # can only join strings
    return sum(line_lengths), summ
total_chars, summ = lineLengths(filename)
print("{} = {}".format(summ, total_chars))
This should have the output you want : x+y+z=a
def lineLengths(filename):
    count=[]
    with open(filename) as f: #this is an easier way to open/close a file
        for line in f:
            count.append(len(line))
    print('+'.join(str(x) for x in count) + "=" + str(sum(count)))
I am just learning python and need some help for my class assignment.
I have a file with text and numbers in it. Some lines have from one to three numbers and others have no numbers at all.
I need to:
Extract numbers only from the file using regex
Find the sum of all the numbers
I used regex to extract out all the numbers. I am trying to get the total sum of all the numbers but I am just getting the sum of each line that had numbers. I have been battling with different ways to do this assignment and this is the closest I have gotten to getting it right.
I know I am missing some key parts but I am not sure what I am doing wrong.
Here is my code:
import re
text = open('text_numbers.txt')
for line in text:
    line = line.strip()
    y = re.findall('([0-9]+)',line)
    if len(y) > 0:
        print sum(map(int, y))
The result I get is something like this
(each is a sum of a line):
14151
8107
16997
18305
3866
And it needs to be one sum like this (sum of all numbers):
134058
import re
import numpy as np
text = open('text_numbers.txt')
final = []
for line in text:
    line = line.strip()
    y = re.findall('([0-9]+)',line)
    if len(y) > 0:
        lineVal = sum(map(int, y))
        final.append(lineVal)
        print "line sum = {0}".format(lineVal)
print "Final sum = {0}".format(np.sum(final))
Is that what you're looking for?
I don't know much Python, but I can give a simple solution.
Try this
import re
hand = open('text_numbers.txt')
x=list()
for line in hand:
    y=re.findall('[0-9]+',line)
    x=x+y
sum=0
for i in x:
    sum=sum + int(i)
print sum
This is my first attempt at answering with regular expressions. I find reading other people's code a great skill to practise.
import re # import regular expressions
chuck_text = open("regex_sum_286723.txt")
numbers = []
Total = 0
for line in chuck_text:
    nmbrs = re.findall('[0-9]+', line)
    numbers = numbers + nmbrs
for n in numbers:
    Total = Total + float(n)
print "Total = ", Total
Thanks to Beer for the list-comprehension one-liner, though his 'r' does not seem to be needed and I am not sure what it does. It reads beautifully, whereas I get more confused reading two sets of loops like the ones in my answer.
import re
print sum([int(i) for i in re.findall('[0-9]+',open("regex_sum_286723.txt").read())])
import re
text = open('text_numbers.txt')
data=text.read()
print sum(map(int,re.findall(r"\b\d+\b",data)))
Use .read to get content in string format
import re
sample = open ('text_numbers.txt')
total =0
dignum = 0
for line in sample:
    line = line.rstrip()
    dig= re.findall('[0-9]+', line)
    if len(dig) >0:
        dignum += len(dig)
        linetotal= sum(map(int, dig))
        total += linetotal
print 'The number of digits are: '
print dignum
print 'The sum is: '
print total
print 'The sum ends with: '
print total % 1000
import re
print sum([int(i) for i in re.findall('[0-9]+',open(raw_input('What is the file you want to analyze?\n'),'r').read())])
You can compact it into one line, but this is only for fun!
Here is my solution to this problem.
import re
file = open('text_numbers.txt')
sum = 0
for line in file:
    line = line.rstrip()
    line = re.findall('([0-9]+)', line)
    for i in line:
        i = int(i)
        sum += i
print(sum)
In the first for loop, each line is replaced by the list of number strings found in it. The second for loop converts those elements from strings to integers so that I can sum them.
import re
fl=open('regex_sum_7469.txt')
ls=[]
for x in fl: #create a list in the list
    x=x.rstrip()
    print x
    t= re.findall('[0-9]+',x) #all numbers
    for d in t: #inner loop, because the list is empty for lines without numbers
        ls.append(int(d))
print (sum(ls))
Here is my code:
import re
f = open('regex_sum_text.txt', 'r').read().strip()
y = re.findall('[0-9]+', f)
l = [int(s) for s in y]
s = sum(l)
print(s)
another shorter way is:
with open('regex_sum_text.txt', 'r') as f:
    total = sum(map(int, re.findall(r'[0-9]+', f.read())))
print(total)
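The shorter variants above share one idea: run the pattern over the whole text once and sum the matches. A small sketch of that idea as a reusable function, with sum_numbers as a hypothetical name and a made-up sample string rather than the assignment file:

```python
import re


def sum_numbers(text):
    """Sum every run of decimal digits found in text."""
    # The r prefix marks a raw string; it makes no difference for this
    # pattern, which contains no backslashes.
    return sum(int(match) for match in re.findall(r'[0-9]+', text))


# Made-up sample text, not the assignment file
sample = "Why 3 and 12 apples?\nNo numbers here\n7 more, then 100"
print(sum_numbers(sample))  # 122
```

With a real file, the text would come from f.read() inside a with open(...) block, as in the answer above.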
import re
print(sum(int(value) for value in re.findall('[0-9]+', open('regex_sum_1128122.txt').read())))
Write a program that reads the contents of a random text file. The program should create a dictionary in which the keys are individual words found in the file and the values are the number of times each word appears.
How would I go about doing this?
def main():
    c = 0
    dic = {}
    words = set()
    inFile = open('text2', 'r')
    for line in inFile:
        line = line.strip()
        line = line.replace('.', '')
        line = line.replace(',', '')
        line = line.replace("'", '') #strips the punctuation
        line = line.replace('"', '')
        line = line.replace(';', '')
        line = line.replace('?', '')
        line = line.replace(':', '')
        words = line.split()
        for x in words:
            for y in words:
                if x == y:
                    c += 1
                    dic[x] = c
    print(dic)
    print(words)
    inFile.close()
main()
Sorry for the vague question. Never asked any questions here before. This is what I have so far. Also, this is the first ever programming I've done so I expect it to be pretty terrible.
with open('path/to/file') as infile:
    # code goes here
That's how you open a file
for line in infile:
    # code goes here
That's how you read a file line-by-line
line.strip().split()
That's how you split a line into (white-space separated) words.
some_dictionary['abcd']
That's how you access the key 'abcd' in some_dictionary.
Questions for you:
What does it mean if you can't access the key in a dictionary?
What error does that give you? Can you catch it with a try/except block?
How do you increment a value?
Is there some function that GETS a default value from a dict if the key doesn't exist?
For what it's worth, there's also a function that does almost exactly this, but since this is pretty obviously homework it won't fulfill your assignment requirements anyway. It's in the collections module. If you're interested, try and figure out what it is :)
There are at least three different approaches to adding a new word to the dictionary and counting the number of occurrences in the file.
def add_element_check1(my_dict, elements):
    for e in elements:
        if e not in my_dict:
            my_dict[e] = 1
        else:
            my_dict[e] += 1
def add_element_check2(my_dict, elements):
    for e in elements:
        if e not in my_dict:
            my_dict[e] = 0
        my_dict[e] += 1
def add_element_except(my_dict, elements):
    for e in elements:
        try:
            my_dict[e] += 1
        except KeyError:
            my_dict[e] = 1
my_words = {}
with open('pathtomyfile.txt', 'r') as in_file:
    for line in in_file:
        words = [word.strip().lower() for word in line.strip().split()]
        add_element_check1(my_words, words)
        #or add_element_check2(my_words, words)
        #or add_element_except(my_words, words)
If you are wondering which is the fastest, the answer is: it depends on how often a given word occurs in the file. If most words occur many times, so that a KeyError is raised only rarely, the try-except version would likely be the best choice in your case.
I have done some simple benchmarks here
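For comparison, a fourth variant can be sketched with dict.get, which returns a default when the key is missing and so needs neither a membership test nor a try-except. This is only an illustration and was not part of the benchmarks mentioned above; count_words is a hypothetical name, and the input is assumed to be an iterable of lines.

```python
def count_words(lines):
    """Count lowercased, whitespace-separated words using dict.get."""
    counts = {}
    for line in lines:
        for word in line.lower().split():
            # get() returns 0 when the word has not been seen yet
            counts[word] = counts.get(word, 0) + 1
    return counts


print(count_words(["the cat", "The dog the end"]))
# {'the': 3, 'cat': 1, 'dog': 1, 'end': 1}
```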
This is a perfect job for Python's built-in collections module. From it, you can import Counter, which is a dictionary subclass made for just this.
How you want to process your data is up to you. One way to do this would be something like this
from collections import Counter
# Open your file and split by white spaces
with open("yourfile.txt","r") as infile:
    textData = infile.read()
# Replace characters you don't want with empty strings
textData = textData.replace(".","")
textData = textData.replace(",","")
textList = textData.split(" ")
# Put your data into the counter container datatype
dic = Counter(textList)
# Print out the results
for key,value in dic.items():
    print "Word: %s\n Count: %d\n" % (key,value)
Hope this helps!
Matt
My function doesn't work as it is supposed to. I keep getting 'True' when all line[0] are less than line[2]. I know this is pretty trivial, but it's an exercise I've taken on to better understand files and for loops.
def contains_greater_than(filename):
    """
    (str) --> bool
    The text file of which <filename> is the name contains multiple lines.
    Each line consists of two integer numbers, separated by a space.
    This returns True iff in at least one of those lines, the first number
    is larger than the second one.
    """
    lines = open(filename).readlines()
    for line in lines:
        if line[0] > line[2]:
            return True
    return False
my data:
3 6
3 7
3 8
2 9
3 20
Having been thoroughly schooled in my over-thought previous answer, may I offer this far simpler solution which still short-circuits as intended:
for line in lines:
    x, y = line.split()
    if int(x) > int(y): return True
return False
line[0] = "3", line[1] = " ", line[2] = "2"
for the last line of your data ("3 20"), so '3' > '2' is True.
You need to do
split_line = line.split()
then
numbers = [int(x) for x in split_line]
then look at numbers[0] and numbers[1]
1) You are comparing strings that you need to convert to integers
2) You will only grab the first and third character (so, you won't get the 0 in 20)
Instead use
first, second = map(int, line.split())
if first > second:
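To see both points on the last line of the sample data, here is a small illustrative snippet; the variable names are only for the demo.

```python
line = "3 20"

# Indexing the string yields single characters, not the numbers
print(line[0], line[2])        # 3 2
print(line[0] > line[2])       # True: the character "3" sorts after "2"

# Splitting and converting compares the actual values
first, second = map(int, line.split())
print(first, second)           # 3 20
print(first > second)          # False
```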
Here's a whole-hog functional rewrite. Hope this is enlightening ;-)
import functools
def line_iter(fname):
    with open(fname) as inf:
        for line in inf:
            line = line.strip()
            if line:
                yield line
def any_line(fn, fname):
    return any(fn(line) for line in line_iter(fname))
def is_greater_than(line):
    a,b = [int(i) for i in line.split()]
    return a > b
contains_greater_than = functools.partial(any_line, is_greater_than)
"3 20" is a string, just do map(int, LINE.split()) before.
But how do you want to compare 2 numbers with 2 numbers?
The main problem is that you are comparing characters of the line, not the values of the two numbers on each one. This can be avoided by first splitting the line into white-space-separated words, and then turning those into integer values for the comparison by applying the int() function to each one:
def contains_greater_than(filename):
    with open(filename) as inf:
        for line in inf:
            a, b = map(int, line.split())
            if a > b:
                return True
    return False
print(contains_greater_than('comparison_data.txt'))
This can all be done very succinctly in Python using the built-in any() function with a couple of generator expressions:
def contains_greater_than(filename):
    with open(filename) as inf:
        return any(a > b for a, b in (map(int, line.split()) for line in inf))
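As a hedged illustration of the any() version, the sketch below takes an iterable of lines rather than a filename so it can be tried without a file on disk. The name contains_greater_than_lines and the blank-line filter are assumptions added for the demo; io.StringIO stands in for the data file from the question.

```python
import io


def contains_greater_than_lines(lines):
    """Return True if any 'a b' line has a > b; stops at the first match."""
    pairs = (map(int, line.split()) for line in lines if line.strip())
    return any(a > b for a, b in pairs)


data = io.StringIO("3 6\n3 7\n3 8\n2 9\n3 20\n")
print(contains_greater_than_lines(data))                        # False
print(contains_greater_than_lines(io.StringIO("3 6\n9 2\n")))   # True
```

Because any() consumes the generator lazily, reading stops as soon as one qualifying line is found.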