Find the smallest float in a file and printing the line - python

I have a data file like this:
1 13.4545
2 10.5578
3 12.5578
4 5.224
I am trying to find the line with the smallest float number and print or write the entire line (including the integer) to another file. so i get this:
4 5.224
I have this but does not work:
with open(file) as f:
small = map(float, line)
mini = min(small)
print mini
also tried using this:
with open(file) as f:
mylist = [[line.strip(),next(f).strip()] for line in f]
minimum = min(mylist, key = lambda x: float(x[1]))
print minimum

Using your data file, we can iterate over each line of the file inside min since min takes an iterator:
>>> with open(fn) as f:
... print min(f)
...
1 13.4545
Obviously, that is using the ascii value of the integer for determining min.
Python's min takes a key function:
def kf(s):
return float(s.split()[1])
with open(fn) as f:
print min(f, key=kf)
Or:
>>> with open(fn) as f:
... print min(f, key=lambda line: float(line.split()[1]))
...
4 5.224
The advantage (in both versions) is that the file is processed line by line -- no need to read the entire file into memory.
The entire line is printed but only the float part is used to determine the min value of that line.
To fix YOUR version, the issue is your first list comprehension. Your version has next() in it which you probably thought was the next number. It isn't: It is the next line:
>>> with open(fn) as f:
... mylist = [[line.strip(),next(f).strip()] for line in f]
...
>>> mylist
[['1 13.4545', '2 10.5578'], ['3 12.5578', '4 5.224']]
The first list comprehension should be:
>>> with open(fn) as f:
... mylist=[line.split() for line in f]
...
>>> mylist
[['1', '13.4545'], ['2', '10.5578'], ['3', '12.5578'], ['4', '5.224']]
Then the rest will work OK (but you will have the split list in this case -- not the line -- to print):
>>> minimum=min(mylist, key = lambda x: float(x[1]))
>>> minimum
['4', '5.224']

You were quite near, this is the minimal edit needed
with open(fl) as f: # don't use file as variable name
line = [i.strip().split() for i in f] # Get the lines as separate line no and value
line = [(x[0],float(x[1])) for x in line] # Convert the second value in the file to float
m = min(line,key = lambda x:x[1]) # find the minimum float value, that is the minimum second argument.
print "{} {}".format(m[0],m[1]) # print it. Hip Hip Hurray \o/

a=open('d.txt','r')
d=a.readlines()
m=float(d[0].split()[1])
for x in d[1:]:
if float(x.split()[1])<m:
m=float(x.split()[1])
print m

map:
map(function, iterable, ...)
Apply function to every item of iterable and return a list of the results. Link
Demo:
>>> map(float , ["1.9", "2.0", "3"])
[1.9, 2.0, 3.0]
>>> map(float , "1.9")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: .
>>>
Read input file by csv module because structure of input file is fixed.
Set small and small_row variables to None.
Read file row by row.
Type Casting of second item from the row, from string to float.
Check small variable is None or less then second item of row.
If yes the assign small value and small_row respectivly
Demo:
import csv
file1 = "1_d.txt"
small = None
small_row = None
with open(file1) as fp:
root = csv.reader(fp, delimiter=' ')
for row in root:
item = float(row[1])
if small==None or small>item:
small = item
small_row = row
print "small_row:", small_row
Output:
$ python 3.py
small_row: ['4', '5.224']

Related

Sorting dictionary elements from file with capitalization of first letter

Hi i am writing a program to read a dictionary from file and then it capital the first letter and sort it alphabetically
What i am trying is to convert dictionary into list. Then i have tried for loop but somehow it is not working
my file contains
banana:123
sweet:32
nutella:23
my code:
f = open("test.txt", "r")
lst=[]
for line in f.readlines():
data,price = line.split(":")
lst.append([data, (price)])
f.close()
now when i try for loop to get the index it gives me error
for i in range(lst[0:len(lst)][0]):
it gives me error:
TypeError: 'list' object cannot be interpreted as an integer
My question is getting confusing... The main thing i need to be done is that the first letter of each line becomes capital and they get sorted in file alphabetically.
also please tell me what i am doing wrong in this for loop... I want index to start from [0] to the len[lst]
for the range issue just use
for i in range(len(lst)):
you provide a list instead of an integer
for the sorting use this
from operator import itemgetter
f = open("test.txt", "r")
lst = []
for line in f.readlines():
data, price = line.split(":")
lst.append([data, (price)])
f.close()
lst = sorted(lst, key=itemgetter(1))
print map(lambda x: x[0] + ":" + x[1], map(lambda x: [x[0].capitalize(), x[1]], lst))
.....
.....
and write to the file
Just use sorted over mapping capitalize:
with open("test.txt", "r") as f:
result = sorted(map(lambda x: x.capitalize().split(":"), f.readlines()))
Here you have a live example
The problem is range, in your case, only needs an int input. You are supplying a list input. Specifically, lst[0:len(lst)][0] outputs the first element of lst, which is a list itself.
To loop over your list of lists by row, you should use range(len(lst)):
for i in range(len(lst)):
See the docs for more information on range syntax.

Python: Read text file into dict and ignore comments

I am trying to put the following text file into a dictionary but I would like any section starting with '#' or empty lines ignored.
My text file looks something like this:
# This is my header info followed by an empty line
Apples 1 # I want to ignore this comment
Oranges 3 # I want to ignore this comment
#~*~*~*~*~*~*~*Another comment~*~*~*~*~*~*~*~*~*~*
Bananas 5 # I want to ignore this comment too!
My desired output would be:
myVariables = {'Apples': 1, 'Oranges': 3, 'Bananas': 5}
My Python code reads as follows:
filename = "myFile.txt"
myVariables = {}
with open(filename) as f:
for line in f:
if line.startswith('#') or not line:
next(f)
key, val = line.split()
myVariables[key] = val
print "key: " + str(key) + " and value: " + str(val)
The error I get:
Traceback (most recent call last):
File "C:/Python27/test_1.py", line 11, in <module>
key, val = line.split()
ValueError: need more than 1 value to unpack
I understand the error but I do not understand what is wrong with the code.
Thank you in advance!
Given your text:
text = """
# This is my header info followed by an empty line
Apples 1 # I want to ignore this comment
Oranges 3 # I want to ignore this comment
#~*~*~*~*~*~*~*Another comment~*~*~*~*~*~*~*~*~*~*
Bananas 5 # I want to ignore this comment too!
"""
We can do this in 2 ways. Using regex, or using Python generators. I would choose the latter (described below) as regex is not particularly fast(er) in such cases.
To open the file:
with open('file_name.xyz', 'r') as file:
# everything else below. Just substitute `for line in lines` with
# `for line in file.readline()`
Now to create a similar, we split the lines, and create a list:
lines = text.split('\n') # as if read from a file using `open`.
Here is how we do all you want in a couple of lines:
# Discard all comments and empty values.
comment_less = filter(None, (line.split('#')[0].strip() for line in lines))
# Separate items and totals.
separated = {item.split()[0]: int(item.split()[1]) for item in comment_less}
Lets test:
>>> print(separated)
{'Apples': 1, 'Oranges': 3, 'Bananas': 5}
Hope this helps.
This doesn't exactly reproduce your error, but there's a problem with your code:
>>> x = "Apples\t1\t# This is a comment"
>>> x.split()
['Apples', '1', '#', 'This', 'is', 'a', 'comment']
>>> key, val = x.split()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: too many values to unpack
Instead try:
key = line.split()[0]
val = line.split()[1]
Edit: and I think your "need more than 1 value to unpack" is coming from the blank lines. Also, I'm not familiar with using next() like this. I guess I would do something like:
if line.startswith('#') or line == "\n":
pass
else:
key = line.split()[0]
val = line.split()[1]
To strip comments, you could use str.partition() which works whether the comment sign is present or not in the line:
for line in file:
line, _, comment = line.partition('#')
if line.strip(): # non-blank line
key, value = line.split()
line.split() may raise an exception in this code too—it happens if there is a non-blank line that does not contain exactly two whitespace-separated words—it is application depended what you want to do in this case (ignore such lines, print warning, etc).
You need to ignore empty lines and lines starting with # splitting the remaining lines after either splitting on # or using rfind as below to slice the string, an empty line will have a new line so you need and line.strip() to check for one, you cannot just split on whitespace and unpack as you have more than two elements after splitting including what is in the comment:
with open("in.txt") as f:
d = dict(line[:line.rfind("#")].split() for line in f
if not line.startswith("#") and line.strip())
print(d)
Output:
{'Apples': '1', 'Oranges': '3', 'Bananas': '5'}
Another option is to split twice and slice:
with open("in.txt") as f:
d = dict(line.split(None,2)[:2] for line in f
if not line.startswith("#") and line.strip())
print(d)
Or splitting twice and unpacking using an explicit loop:
with open("in.txt") as f:
d = {}
for line in f:
if not line.startswith("#") and line.strip():
k, v, _ = line.split(None, 2)
d[k] = v
You can also use itertools.groupby to group the lines you want.
from itertools import groupby
with open("in.txt") as f:
grouped = groupby(f, lambda x: not x.startswith("#") and x.strip())
d = dict(next(v).split(None, 2)[:2] for k, v in grouped if k)
print(d)
To handle where we have multiple words in single quotes we can use shlex to split:
import shlex
with open("in.txt") as f:
d = {}
for line in f:
if not line.startswith("#") and line.strip():
data = shlex.split(line)
d[data[0]] = data[1]
print(d)
So changing the Banana line to:
Bananas 'north-side disabled' # I want to ignore this comment too!
We get:
{'Apples': '1', 'Oranges': '3', 'Bananas': 'north-side disabled'}
And the same will work for the slicing:
with open("in.txt") as f:
d = dict(shlex.split(line)[:2] for line in f
if not line.startswith("#") and line.strip())
print(d)
If the format of the file is correctly defined you can try a solution with regular expressions.
Here's just an idea:
import re
fruits = {}
with open('fruits_list.txt', mode='r') as f:
for line in f:
match = re.match("([a-zA-Z0-9]+)[\s]+([0-9]+).*", line)
if match:
fruit_name, fruit_amount = match.groups()
fruits[fruit_name] = fruit_amount
print fruits
UPDATED:
I changed the way of reading lines taking care of large files. Now I read line by line and not all in one. This improves the memory usage.

Python- how to convert lines in a .txt file to dictionary elements?

Say I have a file "stuff.txt" that contains the following on separate lines:
q:5
r:2
s:7
I want to read each of these lines from the file, and convert them to dictionary elements, the letters being the keys and the numbers the values.
So I would like to get
y ={"q":5, "r":2, "s":7}
I've tried the following, but it just prints an empty dictionary "{}"
y = {}
infile = open("stuff.txt", "r")
z = infile.read()
for line in z:
key, value = line.strip().split(':')
y[key].append(value)
print(y)
infile.close()
try this:
d = {}
with open('text.txt') as f:
for line in f:
key, value = line.strip().split(':')
d[key] = int(value)
You are appending to d[key] as if it was a list. What you want is to just straight-up assign it like the above.
Also, using with to open the file is good practice, as it auto closes the file after the code in the 'with block' is executed.
There are some possible improvements to be made. The first is using context manager for file handling - that is with open(...) - in case of exception, this will handle all the needed tasks for you.
Second, you have a small mistake in your dictionary assignment: the values are assigned using = operator, such as dict[key] = value.
y = {}
with open("stuff.txt", "r") as infile:
for line in infile:
key, value = line.strip().split(':')
y[key] = (value)
print(y)
Python3:
with open('input.txt', 'r', encoding = "utf-8") as f:
for line in f.readlines():
s=[] #converting strings to list
for i in line.split(" "):
s.append(i)
d=dict(x.strip().split(":") for x in s) #dictionary comprehension: converting list to dictionary
e={a: int(x) for a, x in d.items()} #dictionary comprehension: converting the dictionary values from string format to integer format
print(e)

How to read specific lines of a large csv file

I am trying to read some specific rows of a large csv file, and I don't want to load the whole file into memory. The index of the specific rows are given in a list L = [2, 5, 15, 98, ...] and my csv file looks like this:
Col 1, Col 2, Col3
row11, row12, row13
row21, row22, row23
row31, row32, row33
...
Using the ideas mentioned here I use the following command to read the rows
with open('~/file.csv') as f:
r = csv.DictReader(f) # I need to read it as a dictionary for my purpose
for i in L:
for row in enumerate(r):
print row[i]
I immediately get the following error:
IndexError Traceback (most recent call last)
<ipython-input-25-78951a0d4937> in <module>()
6 for i in L:
7 for row in enumerate(r):
----> 8 print row[i]
IndexError: tuple index out of range
Question 1. It seems like my use of the for loops here is obviously wrong. Any ideas on how to fix this?
On the other hand, the following gets the job done, but it's too slow:
def read_csv_line(line_number):
with open("~/file.csv") as f:
r = csv.DictReader(f)
for i, line in enumerate(r):
if i == (line_number - 2):
return line
return None
for i in L:
print read_csv_line(i)
Question 2. Any idea on how to improve this basic method of going through the whole file until I reach row i then print it?
A file doesn't have "lines" or "rows". What you consider a "line" is "what is found between two newline characters". As such you cannot read the nth line without reading the lines before it, as you couldn't count the newline characters.
Answer 1: if you consider your example, but with L=[9], unrolling your loops would give:
i=9
row = (0, {'Col 2': 'row12', 'Col 3': 'row13', 'Col 1': 'row11'})
As you can see, row is a tuple with two members, calling row[i] means row[9], hence the IndexError.
Answer 2: This is very slow because you are reading the file up to the line number every time. In your example, you read the first 2 lines, then the first 5, then the first 15, then the first 98, etc. So you've read the first 5 lines 3 times. You could create a generator that only returns the lines you want (beware, line numbers would be 0-indexed):
def read_my_lines(csv_reader, lines_list):
for line_number, row in enumerate(csv_reader):
if line_number in lines_list:
yield line_number, row
So when you want to process the lines, you would do:
L = [2, 5, 15, 98, ...]
with open('~/file.csv') as f:
r = csv.DictReader(f)
for line_number, line in read_my_lines(r, L):
do_something_with_line(line)
* Edit *
This could further be improved to stop reading the file when you've read all the lines you wanted:
def read_my_lines(csv_reader, lines_list):
# make sure every line number shows up only once:
lines_set = set(lines_list)
for line_number, row in enumerate(csv_reader):
if line_number in lines_set:
yield line_number, row
lines_set.remove(line_number)
# Stop when the set is empty
if not lines_set:
raise StopIteration
Assuming L is a list containing the line numbers you want, you could do :
with open("~/file.csv") as f:
r = csv.DictReader(f)
for i, line in enumerate(r):
if i in L: # or (i+2) in L: from your second example
print line
That way :
you read the file only once
you do not load the whole file in memory
you only get the lines you are interested in
The only caveat is that you read whole file even if L = [3]
for row in enumerate(r):
will pull tuples. You are then trying to select your ith element from a 2 element tuple.
for example
>> for i in enumerate({"a":1, "b":2}): print i
(0, 'a')
(1, 'b')
Additionally, since dictionaries are hash tables, your initial order is not necessarily preserved. for instance:
>>list({"a":1, "b":2, "c":3, "d":5})
['a', 'c', 'b', 'd']
Just to sum up the great ideas, I ended up using something like this: L can be sorted relatively quickly, and in my case it was actually already sorted. So, instead of several membership checks in L it pays off to sort it and then only check each index against the first entry of it. Here is my piece of code:
count=0
with open('~/file.csv') as f:
r = csv.DictReader(f)
for row in r:
count += 1
if L == []:
break
elif count == L[0]:
print (row)
L.pop(0)
Note that this stops as soon as we've gone through L once.

Change the display of a list took from text file

I have this code wrote in Python:
with open ('textfile.txt') as f:
list=[]
for line in f:
line = line.split()
if line:
line = [int(i) for i in line]
list.append(line)
print(list)
This actually read integers from a text file and put them in a list.But it actually result as :
[[10,20,34]]
However,I would like it to display like:
10 20 34
How to do this? Thanks for your help!
You probably just want to add the items to the list, rather than appending them:
with open('textfile.txt') as f:
list = []
for line in f:
line = line.split()
if line:
list += [int(i) for i in line]
print " ".join([str(i) for i in list])
If you append a list to a list, you create a sub list:
a = [1]
a.append([2,3])
print a # [1, [2, 3]]
If you add it you get:
a = [1]
a += [2,3]
print a # [1, 2, 3]!
with open('textfile.txt') as f:
lines = [x.strip() for x in f.readlines()]
print(' '.join(lines))
With an input file 'textfiles.txt' that contains:
10
20
30
prints:
10 20 30
It sounds like you are trying to print a list of lists. The easiest way to do that is to iterate over it and print each list.
for line in list:
print " ".join(str(i) for i in line)
Also, I think list is a keyword in Python, so try to avoid naming your stuff that.
If you know that the file is not extremely long, if you want the list of integers, you can do it at once (two lines where one is the with open(.... And if you want to print it your way, you can convert the element to strings and join the result via ' '.join(... -- like this:
#!python3
# Load the content of the text file as one list of integers.
with open('textfile.txt') as f:
lst = [int(element) for element in f.read().split()]
# Print the formatted result.
print(' '.join(str(element) for element in lst))
Do not use the list identifier for your variables as it masks the name of the list type.

Categories

Resources