Python: read lines from a file and sort ints descending

So I got this text file looking like this:
PID TTY TIME CMD
1000 pts/2 00:00:00 aash
9000 pts/2 00:00:00 bash
3000 pts/2 00:00:00 cash
What I want to end up with is some kind of collection where I save (PID, CMD) pairs sorted by PID descending.
So it would look like this:
[(9000, 'bash'), (3000, 'cash'), (1000, 'aash')]
Any Ideas?
This is how I read the file and save it in a dictionary:

result = {}
with open('newfile.txt') as f:
    next(f)  # skip the header line
    for line in f:
        result[line.split()[3]] = int(line.split()[0])
Appreciate any kind of help! Thanks in advance !

So this is the solution:

import collections

result = {}
with open('newfile.txt') as f:
    next(f)
    for line in f:
        result[line.split()[3]] = int(line.split()[0])
print(collections.OrderedDict(sorted(result.items(), key=lambda t: t[1])))

This is what it prints out:
OrderedDict([('aash', 1000), ('cash', 3000), ('bash', 9000)])

If you need to end up with a list, it is best to read the data into a list and then sort it. Here is how:

lst = []
with open('newfile.txt') as f:
    next(f)
    for line in f:
        if line.strip():  # watch out for empty lines
            a, b, c, d = line.split()
            lst.append((int(a), d))
lst = sorted(lst)
print(lst)

Output:
[(1000, 'aash'), (3000, 'cash'), (9000, 'bash')]

sorted() sorts by the first item in each tuple, so you can use it in its basic form; pass reverse=True to sort descending, as the question asks.
If what you need is a dictionary where the keys are sorted, then you can use OrderedDict, just import it and add another line to the code:
from collections import OrderedDict
and then
d = OrderedDict(lst)
print(d)
And here is the result:
OrderedDict([(1000, 'aash'), (3000, 'cash'), (9000, 'bash')])
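To get the descending order the question originally asked for, a small sketch (reusing the same list of (PID, CMD) tuples) would be:

```python
lst = [(1000, 'aash'), (3000, 'cash'), (9000, 'bash')]
# reverse=True makes sorted() order descending by the first tuple element (the PID)
desc = sorted(lst, reverse=True)
print(desc)  # [(9000, 'bash'), (3000, 'cash'), (1000, 'aash')]
```

As an aside, in Python 3.7+ plain dicts preserve insertion order, so dict(desc) keeps this descending order without needing OrderedDict.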

Related

how to read file line by line in python

I'm trying to read the text file below in Python. I'm struggling to print its contents as key/value pairs; it's not working as expected:
test.txt
productId1 ProdName1,ProdPrice1,ProdDescription1,ProdDate1
productId2 ProdName2,ProdPrice2,ProdDescription2,ProdDate2
productId3 ProdName3,ProdPrice3,ProdDescription3,ProdDate3
productId4 ProdName4,ProdPrice4,ProdDescription4,ProdDate4
myPython.py

import sys
with open('test.txt') as f
    lines = list(line.split(' ',1) for line in f)
    for k,v in lines.items();
        print("Key : {0}, Value: {1}".format(k,v))
I'm trying to parse the text file and trying to print key and value separately. Looks like I'm doing something wrong here. Need some help to fix this?
Thanks!
You're needlessly storing a list. Loop, split and print:

with open('test.txt') as f:
    for line in f:
        k, v = line.rstrip().split(' ', 1)
        print("Key : {0}, Value: {1}".format(k, v))
This should work, with a list comprehension:

with open('test.txt') as f:
    lines = [line.split(' ', 1) for line in f]
for k, v in lines:
    print("Key: {0}, Value: {1}".format(k, v))
You can make a dict right off the bat with a dict comprehension and then iterate over it to print as you wanted. What you had done was create a list, which does not have an items() method.

with open('notepad.txt') as f:
    d = {line.split(' ')[0]: line.split(' ')[1] for line in f}
for k, v in d.items():
    print("Key : {0}, Value: {1}".format(k, v))
lines is a list of lists, so the right way to finish the job is:

with open('test.txt') as f:
    lines = list(line.split(' ', 1) for line in f)
for k, v in lines:
    print("Key : {0}, Value: {1}".format(k, v))
Perhaps I am reading too much into your description, but I see one key, a space, and a comma-delimited list of other fields. If I interpret that as there being data for those items that is comma-delimited, then I would conclude you want a dictionary of dictionaries. That would lead to code like:

data_keys = 'ProdName', 'ProdPrice', 'ProdDescription', 'ProdDate'
with open('test.txt') as f:
    for line in f:
        id, values = line.strip().split()  # splits on whitespace by default
        keyed_values = zip(data_keys, values.split(','))
        print(dict([('key', id)] + list(keyed_values)))
You can use the f.readlines() function, which returns a list of the lines in file f. Note that lines is itself a list, so you iterate over it directly rather than calling items():

with open('test.txt') as f:
    lines = list(line.split(' ', 1) for line in f.readlines())
for k, v in lines:
    print("Key : {0}, Value: {1}".format(k, v))

Python: Read text file into dict and ignore comments

I am trying to put the following text file into a dictionary but I would like any section starting with '#' or empty lines ignored.
My text file looks something like this:
# This is my header info followed by an empty line
Apples 1 # I want to ignore this comment
Oranges 3 # I want to ignore this comment
#~*~*~*~*~*~*~*Another comment~*~*~*~*~*~*~*~*~*~*
Bananas 5 # I want to ignore this comment too!
My desired output would be:
myVariables = {'Apples': 1, 'Oranges': 3, 'Bananas': 5}
My Python code reads as follows:
filename = "myFile.txt"
myVariables = {}
with open(filename) as f:
    for line in f:
        if line.startswith('#') or not line:
            next(f)
        key, val = line.split()
        myVariables[key] = val
        print "key: " + str(key) + " and value: " + str(val)
The error I get:
Traceback (most recent call last):
File "C:/Python27/test_1.py", line 11, in <module>
key, val = line.split()
ValueError: need more than 1 value to unpack
I understand the error but I do not understand what is wrong with the code.
Thank you in advance!
Given your text:
text = """
# This is my header info followed by an empty line
Apples 1 # I want to ignore this comment
Oranges 3 # I want to ignore this comment
#~*~*~*~*~*~*~*Another comment~*~*~*~*~*~*~*~*~*~*
Bananas 5 # I want to ignore this comment too!
"""
We can do this in two ways: using regex, or using Python generators. I would choose the latter (described below), as regex is not particularly faster in such cases.
To open the file:

with open('file_name.xyz', 'r') as file:
    # everything else below; just substitute `for line in lines`
    # with `for line in file`

Now, to simulate a file, we split the text into lines and create a list:

lines = text.split('\n')  # as if read from a file using `open`
Here is how we do all you want in a couple of lines:
# Discard all comments and empty values.
comment_less = filter(None, (line.split('#')[0].strip() for line in lines))
# Separate items and totals.
separated = {item.split()[0]: int(item.split()[1]) for item in comment_less}
Let's test:
>>> print(separated)
{'Apples': 1, 'Oranges': 3, 'Bananas': 5}
Hope this helps.
This doesn't exactly reproduce your error, but there's a problem with your code:
>>> x = "Apples\t1\t# This is a comment"
>>> x.split()
['Apples', '1', '#', 'This', 'is', 'a', 'comment']
>>> key, val = x.split()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: too many values to unpack
Instead try:
key = line.split()[0]
val = line.split()[1]
Edit: I think your "need more than 1 value to unpack" is coming from the blank lines. Also, I'm not familiar with using next() like this; I guess I would do something like:

if line.startswith('#') or line == "\n":
    pass
else:
    key = line.split()[0]
    val = line.split()[1]
To strip comments, you can use str.partition(), which works whether or not the comment sign is present in the line:

for line in file:
    line, _, comment = line.partition('#')
    if line.strip():  # non-blank line
        key, value = line.split()

line.split() may still raise an exception in this code; that happens if a non-blank line does not contain exactly two whitespace-separated words. It is application-dependent what you want to do in that case (ignore such lines, print a warning, etc.).
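One way to handle such lines, sketched here with an in-memory list of lines rather than a real file, is to catch the ValueError and skip (or warn about) the offending line:

```python
def parse_config(lines):
    """Parse 'key value  # comment' lines, skipping blanks and malformed lines."""
    result = {}
    for line in lines:
        data, _, _comment = line.partition('#')  # strip any trailing comment
        if not data.strip():
            continue  # blank or comment-only line
        try:
            key, value = data.split()
        except ValueError:
            continue  # not exactly two fields; could warn instead of skipping
        result[key] = int(value)
    return result

sample = ["# header\n", "\n", "Apples 1 # note\n",
          "not a valid line at all\n", "Bananas 5\n"]
print(parse_config(sample))  # {'Apples': 1, 'Bananas': 5}
```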
You need to ignore empty lines and lines starting with #, then split the remaining lines after either splitting on "#" or using rfind as below to slice the string. An empty line still contains a newline, so you need `and line.strip()` to check for one, and you cannot just split on whitespace and unpack directly, because after splitting you have more than two elements once the comment is included:
with open("in.txt") as f:
    d = dict(line[:line.rfind("#")].split() for line in f
             if not line.startswith("#") and line.strip())
    print(d)
Output:
{'Apples': '1', 'Oranges': '3', 'Bananas': '5'}
Another option is to split twice and slice:

with open("in.txt") as f:
    d = dict(line.split(None, 2)[:2] for line in f
             if not line.startswith("#") and line.strip())
    print(d)
Or split twice and unpack using an explicit loop:

with open("in.txt") as f:
    d = {}
    for line in f:
        if not line.startswith("#") and line.strip():
            k, v, _ = line.split(None, 2)
            d[k] = v
You can also use itertools.groupby to group the lines you want.

from itertools import groupby

with open("in.txt") as f:
    grouped = groupby(f, lambda x: not x.startswith("#") and x.strip())
    d = dict(next(v).split(None, 2)[:2] for k, v in grouped if k)
    print(d)
To handle lines where we have multiple words in single quotes, we can use shlex to split:

import shlex

with open("in.txt") as f:
    d = {}
    for line in f:
        if not line.startswith("#") and line.strip():
            data = shlex.split(line)
            d[data[0]] = data[1]
    print(d)
So changing the Banana line to:
Bananas 'north-side disabled' # I want to ignore this comment too!
We get:
{'Apples': '1', 'Oranges': '3', 'Bananas': 'north-side disabled'}
And the same works for the slicing version:

with open("in.txt") as f:
    d = dict(shlex.split(line)[:2] for line in f
             if not line.startswith("#") and line.strip())
    print(d)
If the format of the file is well defined, you can try a solution with regular expressions. Here's just an idea:

import re

fruits = {}
with open('fruits_list.txt', mode='r') as f:
    for line in f:
        match = re.match(r"([a-zA-Z0-9]+)\s+([0-9]+).*", line)
        if match:
            fruit_name, fruit_amount = match.groups()
            fruits[fruit_name] = fruit_amount
print fruits

UPDATED: I changed the way of reading lines to take care of large files. Now I read line by line instead of all at once, which improves memory usage.

Why is my code recording into the file only when I run it a second time?

My goal is to count how often each word occurs. When I run my code I am supposed to:
read in strings from the file
split every line into words
add these words to the dictionary
sort the keys and add them to the list
write the string that consists of keys and corresponding values into the file
When I run the code for the first time, it does not write anything into the file, but I see the result on my screen; the file is empty. Only when I run the code a second time do I see content recorded in the file.
Why is that happening?
#read in the file
fileToRead = open('../folder/strings.txt')
fileToWrite = open('../folder/count.txt', 'w')
d = {}
#iterate over every line in the file
for line in fileToRead:
    listOfWords = line.split()
    #iterate over every word in the list
    for word in listOfWords:
        if word not in d:
            d[word] = 1
        else:
            d[word] = d.get(word) + 1
#sort the keys
listF = sorted(d)
#iterate over sorted keys and write them in the file with appropriate value
for word in listF:
    string = "{:<18}\t\t\t{}\n".format(word, d.get(word))
    print string
    fileToWrite.write(string)
A minimalistic version:

import collections

with open('strings.txt') as f:
    d = collections.Counter(s for line in f for s in line.split())
with open('count.txt', 'a') as f:
    for word in sorted(d.iterkeys()):
        string = "{:<18}\t\t\t{}\n".format(word, d[word])
        print string,
        f.write(string)
A couple of changes: I think you meant 'a' (append to file) instead of 'w' (overwrite the file each time) in open('count.txt', 'a'). Please also try to use the with statement for reading and writing files, as it automatically closes the file descriptor after the read/write is done.
#read in the file
fileToRead = open('strings.txt')
d = {}
#iterate over every line in the file
for line in fileToRead:
    listOfWords = line.split()
    #iterate over every word in the list
    for word in listOfWords:
        if word not in d:
            d[word] = 1
        else:
            d[word] = d.get(word) + 1
#sort the keys
listF = sorted(d)
#iterate over sorted keys and write them in the file with appropriate value
with open('count.txt', 'a') as fileToWrite:
    for word in listF:
        string = "{:<18}\t\t\t{}\n".format(word, d.get(word))
        print string,
        fileToWrite.write(string)
When you do file.write(some_data), it writes the data into a buffer, not directly to the file on disk. The data only reaches the disk when the buffer is flushed, which happens when the buffer fills up, when you call file.flush(), or when you do file.close().

f = open('some_temp_file.txt', 'w')
f.write("booga boo!")
# nothing necessarily written to disk yet
f.close()
# flushes the buffer and writes to disk
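If you need the data on disk before closing, file.flush() forces the buffer out. A minimal sketch (the file name is illustrative):

```python
f = open('some_temp_file.txt', 'w')
f.write("booga boo!")
f.flush()  # push the buffered data to the OS so other readers can see it

# the data is now visible to a separate reader, even though f is still open
with open('some_temp_file.txt') as g:
    print(g.read())  # booga boo!

f.close()
```

For a hard guarantee that the bytes hit the physical disk, os.fsync(f.fileno()) can be called after flush().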
The better way to do this would be to store the path in a variable rather than the file object. Then you can open the file (and close it again) on demand.

read_path = '../folder/strings.txt'
write_path = '../folder/count.txt'

This also allows you to use the with keyword, which handles file opening and closing much more elegantly.

read_path = '../folder/strings.txt'
write_path = '../folder/count.txt'

d = dict()
with open(read_path) as inf:
    for line in inf:
        for word in line.split():
            d[word] = d.get(word, 0) + 1
            # remember dict.get's default value! Saves a conditional
# since we've left the block, `inf` is closed by here

sorted_words = sorted(d)
with open(write_path, 'w') as outf:
    for word in sorted_words:
        s = "{:<18}\t\t\t{}\n".format(word, d.get(word))
        # don't shadow the stdlib `string` module
        # also: why are you using both fixed width AND tab-delimiters in the same line?
        print(s)  # not sure why you're doing this, but okay...
        outf.write(s)
# since we leave the block, the file closes automagically.
That said, there are a couple of things you could do to make this a little better in general. First off: counting how many of something are in a container is a job for collections.Counter.
In [1]: from collections import Counter
In [2]: Counter('abc')
Out[2]: Counter({'a': 1, 'b': 1, 'c': 1})
and Counters can be added together with the expected behavior
In [3]: Counter('abc') + Counter('cde')
Out[3]: Counter({'c': 2, 'a': 1, 'b': 1, 'd': 1, 'e': 1})
and also sorted the same way you'd sort a dictionary with keys
In [4]: sorted((Counter('abc') + Counter('cde')).items(), key=lambda kv: kv[0])
Out[4]: [('a', 1), ('b', 1), ('c', 2), ('d', 1), ('e', 1)]
Put those all together and you could do something like this (note that sum() needs a Counter() start value, since its default start of 0 cannot be added to a Counter):

from collections import Counter

read_path = '../folder/strings.txt'
write_path = '../folder/count.txt'

with open(read_path) as inf:
    results = sum((Counter(line.split()) for line in inf), Counter())
with open(write_path, 'w') as outf:
    for word, count in sorted(results.items(), key=lambda kv: kv[0]):
        s = "{:<18}\t\t\t{}\n".format(word, count)
        outf.write(s)

Python- how to convert lines in a .txt file to dictionary elements?

Say I have a file "stuff.txt" that contains the following on separate lines:
q:5
r:2
s:7
I want to read each of these lines from the file, and convert them to dictionary elements, the letters being the keys and the numbers the values.
So I would like to get
y ={"q":5, "r":2, "s":7}
I've tried the following, but it just prints an empty dictionary "{}"
y = {}
infile = open("stuff.txt", "r")
z = infile.read()
for line in z:
    key, value = line.strip().split(':')
    y[key].append(value)
print(y)
infile.close()
Try this:

d = {}
with open('text.txt') as f:
    for line in f:
        key, value = line.strip().split(':')
        d[key] = int(value)

You are appending to d[key] as if it were a list. What you want is to just straight-up assign it, like the above.
Also, using with to open the file is good practice, as it auto-closes the file after the code in the with block is executed.
There are some possible improvements to be made. The first is using a context manager for file handling, that is, with open(...); in case of an exception, this handles all the needed cleanup for you.
Second, you have a small mistake in your dictionary assignment: values are assigned using the = operator, as in dict[key] = value.

y = {}
with open("stuff.txt", "r") as infile:
    for line in infile:
        key, value = line.strip().split(':')
        y[key] = value
print(y)
Python 3:

with open('input.txt', 'r', encoding="utf-8") as f:
    for line in f.readlines():
        s = []  # collect the fields of this line in a list
        for i in line.split(" "):
            s.append(i)
        d = dict(x.strip().split(":") for x in s)  # convert the list to a dictionary
        e = {a: int(x) for a, x in d.items()}  # convert the values from strings to integers
        print(e)

How to convert a file into a dictionary?

I have a file comprising two columns, i.e.,
1 a
2 b
3 c
I wish to read this file to a dictionary such that column 1 is the key and column 2 is the value, i.e.,
d = {1:'a', 2:'b', 3:'c'}
The file is small, so efficiency is not an issue.
d = {}
with open("file.txt") as f:
    for line in f:
        (key, val) = line.split()
        d[int(key)] = val
This will leave the key as a string:

with open('infile.txt') as f:
    d = dict(x.rstrip().split(None, 1) for x in f)
You can also use a dict comprehension like:

with open("infile.txt") as f:
    d = {int(k): v for line in f for (k, v) in [line.strip().split(None, 1)]}
def get_pair(line):
    key, sep, value = line.strip().partition(" ")
    return int(key), value

with open("file.txt") as fd:
    d = dict(get_pair(line) for line in fd)
By dictionary comprehension:

d = {line.split()[0]: line.split()[1] for line in open("file.txt")}

Or by pandas (using the first column as the index so its values become the keys):

import pandas as pd
d = pd.read_csv("file.txt", delimiter=" ", header=None, index_col=0)[1].to_dict()
Simple Option
Most methods for storing a dictionary use JSON, pickle, or line-by-line reading. Provided you're not editing the dictionary outside of Python, this simple method should suffice even for complex dictionaries, although pickle is better for larger ones. Be aware that eval executes arbitrary code, so only use this on files you trust.

x = {1: 'a', 2: 'b', 3: 'c'}
f = 'file.txt'
print(x, file=open(f, 'w'))  # file.txt >>> {1: 'a', 2: 'b', 3: 'c'}
y = eval(open(f, 'r').read())
print(x == y)  # >>> True
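Since JSON is mentioned above, here is a sketch of the same round trip using the json module, which avoids eval entirely. Note that JSON object keys are always strings, so the integer keys come back as '1', '2', '3':

```python
import json

x = {1: 'a', 2: 'b', 3: 'c'}
with open('file.json', 'w') as f:
    json.dump(x, f)  # int keys are serialized as strings
with open('file.json') as f:
    y = json.load(f)
print(y)  # {'1': 'a', '2': 'b', '3': 'c'}
```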
If you love one-liners, try:

d=eval('{'+re.sub('\'[\s]*?\'','\':\'',re.sub(r'([^'+input('SEP: ')+',]+)','\''+r'\1'+'\'',open(input('FILE: ')).read().rstrip('\n').replace('\n',',')))+'}')

Input FILE = path to file, SEP = key-value separator character (this also needs import re).
Not the most elegant or efficient way of doing it, but quite interesting nonetheless :)
IMHO it's a bit more pythonic to use generators (you probably need Python 2.7+ for this):

with open('infile.txt') as fd:
    pairs = (line.split(None) for line in fd)
    res = {int(pair[0]): pair[1] for pair in pairs if len(pair) == 2 and pair[0].isdigit()}

This will also filter out lines that don't start with an integer or don't contain exactly two items.
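The filtering claim above can be checked with an in-memory list of hypothetical lines, including a couple of malformed ones:

```python
# 'oops' has one field; 'x 3' has a non-integer key; both are dropped
lines = ["1 a\n", "oops\n", "x 3\n", "2 b\n"]
pairs = (line.split(None) for line in lines)
res = {int(pair[0]): pair[1] for pair in pairs
       if len(pair) == 2 and pair[0].isdigit()}
print(res)  # {1: 'a', 2: 'b'}
```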
I had a requirement to take values from a text file and use them as key/value pairs. The text file has its content in the form key = value, so I used the split method with "=" as the separator and wrote the code below:

d = {}
file = open("filename.txt")
for x in file:
    f = x.split("=")
    d.update({f[0].strip(): f[1].strip()})

By using the strip method, any spaces before or after the "=" separator are removed, and you will have the expected data in dictionary format.
import re

my_file = open('file.txt', 'r')
d = {}
for i in my_file:
    g = re.search(r'(\d+)\s+(.*)', i)  # match a line containing an int and a string
    d[int(g.group(1))] = g.group(2)
Here's another option...

import csv
import os

events = {}
for line in csv.reader(open(os.path.join(path, 'events.txt'), "rb")):
    if line[0][0] == "#":
        continue
    events[line[0]] = line[1] if len(line) == 2 else line[1:]
