I'm trying to read the text file below in Python and get key/value pairs in the output, but it's not working as expected:
test.txt
productId1 ProdName1,ProdPrice1,ProdDescription1,ProdDate1
productId2 ProdName2,ProdPrice2,ProdDescription2,ProdDate2
productId3 ProdName3,ProdPrice3,ProdDescription3,ProdDate3
productId4 ProdName4,ProdPrice4,ProdDescription4,ProdDate4
myPython.py
import sys
with open('test.txt') as f
    lines = list(line.split(' ',1) for line in f)
    for k,v in lines.items();
        print("Key : {0}, Value: {1}".format(k,v))
I'm trying to parse the text file and print the key and value separately. It looks like I'm doing something wrong here. Can someone help me fix this?
Thanks!
You're needlessly storing a list.
Just loop, split and print:
with open('test.txt') as f:
    for line in f:
        k, v = line.rstrip().split(' ',1)
        print("Key : {0}, Value: {1}".format(k,v))
This should work, with a list comprehension:
with open('test.txt') as f:
    lines = [line.split(' ',1) for line in f]
    for k, v in lines:
        print("Key: {0}, Value: {1}".format(k, v))
You can make a dict right off the bat with a dict comprehension and then iterate over it to print as you wanted. What you had done was create a list, which does not have an items() method.
with open('notepad.txt') as f:
    d = {line.split(' ')[0]:line.split(' ')[1] for line in f}
    for k,v in d.items():
        print("Key : {0}, Value: {1}".format(k,v))
lines is a list of lists, so the good way to finish the job is:
import sys
with open('test.txt') as f:
    lines = list(line.split(' ',1) for line in f)
    for k,v in lines:
        print("Key : {0}, Value: {1}".format(k,v))
Perhaps I am reading too much into your description, but I see one key, a space, and a comma-delimited list of other fields. If I interpret that as there being data for those items, comma-delimited, then I would conclude you want a dictionary of dictionaries. That would lead to code like:
data_keys = 'ProdName', 'ProdPrice', 'ProdDescription', 'ProdDate'
with open('test.txt') as f:
    for line in f:
        id, values = line.strip().split()  # splits automatically on whitespace
        keyed_values = zip(data_keys, values.split(','))
        print(dict([('key', id)] + list(keyed_values)))
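For the first line of test.txt, and assuming Python 3.7+ where dicts keep insertion order, this prints something like:
{'key': 'productId1', 'ProdName': 'ProdName1', 'ProdPrice': 'ProdPrice1', 'ProdDescription': 'ProdDescription1', 'ProdDate': 'ProdDate1'}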
You can use the f.readlines() function, which returns a list of the lines in the file f. I changed the code to use f.readlines() in line 3.
import sys
with open('test.txt') as f:
    lines = list(line.split(' ',1) for line in f.readlines())
    for k,v in lines:
        print("Key : {0}, Value: {1}".format(k,v))
Related
I'm working on a file with thousands of lines and trying to find which lines are duplicated exactly two times.
from collections import Counter
with open('log.txt') as f:
    string = f.readlines()
    c = Counter(string)
    print c
It gives me the counts of all duplicated lines, but I need to get only the lines repeated exactly two times.
You're printing all the strings, not just the repeated ones. To print only the lines repeated twice, print the strings that have a count of two:
from collections import Counter
with open('log.txt') as f:
    string = f.readlines()
    c = Counter(string)
    for line, count in c.items():
        if count==2:
            print(line)
The Counter object also tells you how often each line occurs.
You can filter it using e.g. a list comprehension.
This will print all lines that occur exactly two times in the file:
from collections import Counter
with open('log.txt') as f:
    string = f.readlines()
    print([k for k,v in Counter(string).items() if v == 2])
If you want to have all repeated lines (lines duplicated two or more times)
with open('log.txt') as f:
    string = f.readlines()
    print([k for k,v in Counter(string).items() if v > 1])
You could use Counter.most_common i.e.
from collections import Counter
with open('log.txt') as f:
    c = Counter(f)
    print(c.most_common(1))
This prints the Counter entry with the highest count.
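If you specifically want the lines that occur exactly twice rather than the single most frequent entry, a small sketch building on the same Counter (my own variation, not part of the original answer) could be:

from collections import Counter

with open('log.txt') as f:
    c = Counter(f)

# most_common() with no argument returns all (line, count) pairs sorted by count.
twice = [line for line, count in c.most_common() if count == 2]
print(twice)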
So I got this text file looking like this:
PID TTY TIME CMD
1000 pts/2 00:00:00 aash
9000 pts/2 00:00:00 bash
3000 pts/2 00:00:00 cash
What I want to end up with is some kind of dictionary where I save (PID, CMD) pairs sorted by PID descending.
So it would look like this:
[(9000,bash),(3000,cash),(1000,aash)]
Any Ideas?
This is how I read the file and save it in a dictionary:
result = {}
with open('newfile.txt') as f:
    next(f)  # skipping first line
    for line in f:
        result[line.split()[3]] = int(line.split()[0])
Appreciate any kind of help! Thanks in advance !
So this is the solution:
import collections
result = {}
with open('newfile.txt') as f:
    next(f)
    for line in f:
        result[line.split()[3]] = int(line.split()[0])
print(collections.OrderedDict(sorted(result.items(), key=lambda t: t[1])))
This is what it prints out:
OrderedDict([('aash', 1000), ('cash', 3000), ('bash', 9000)])
If you need to end up with a list, then it is best to read the data into a list and then sort it. Here is how:
lst = []
with open('newfile.txt') as f:
    next(f)
    for line in f:
        if line.strip() != '':  # watch out for empty lines
            a, b, c, d = line.split()
            lst.append((int(a), d))
lst = sorted(lst)
print(lst)
====
[(1000, 'aash'), (3000, 'cash'), (9000, 'bash')]
sorted() sorts by the first item of each tuple, so you can use it in its basic form.
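Since the question asks for PID descending, one small variation (my assumption about the desired order, not part of the original answer) is to replace the sorted(lst) line above with:

lst = sorted(lst, reverse=True)  # highest PID first: [(9000, 'bash'), (3000, 'cash'), (1000, 'aash')]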
If what you need is a dictionary where the keys are sorted, then you can use OrderedDict, just import it and add another line to the code:
from collections import OrderedDict
and then
d = OrderedDict(lst)
print(d)
And here is the result:
OrderedDict([(1000, 'aash'), (3000, 'cash'), (9000, 'bash')])
I am trying to put the following text file into a dictionary, but I would like anything starting with '#' and any empty lines to be ignored.
My text file looks something like this:
# This is my header info followed by an empty line
Apples 1 # I want to ignore this comment
Oranges 3 # I want to ignore this comment
#~*~*~*~*~*~*~*Another comment~*~*~*~*~*~*~*~*~*~*
Bananas 5 # I want to ignore this comment too!
My desired output would be:
myVariables = {'Apples': 1, 'Oranges': 3, 'Bananas': 5}
My Python code reads as follows:
filename = "myFile.txt"
myVariables = {}
with open(filename) as f:
    for line in f:
        if line.startswith('#') or not line:
            next(f)
        key, val = line.split()
        myVariables[key] = val
        print "key: " + str(key) + " and value: " + str(val)
The error I get:
Traceback (most recent call last):
File "C:/Python27/test_1.py", line 11, in <module>
key, val = line.split()
ValueError: need more than 1 value to unpack
I understand the error but I do not understand what is wrong with the code.
Thank you in advance!
Given your text:
text = """
# This is my header info followed by an empty line
Apples 1 # I want to ignore this comment
Oranges 3 # I want to ignore this comment
#~*~*~*~*~*~*~*Another comment~*~*~*~*~*~*~*~*~*~*
Bananas 5 # I want to ignore this comment too!
"""
We can do this in two ways: using regex, or using Python generators. I would choose the latter (described below), as regex is not particularly fast in such cases.
To open the file:
with open('file_name.xyz', 'r') as file:
    # everything else below. Just substitute `for line in lines` with
    # `for line in file`.
Now, to simulate reading from a file, we split the text into a list of lines:
lines = text.split('\n') # as if read from a file using `open`.
Here is how we do all you want in a couple of lines:
# Discard all comments and empty values.
comment_less = filter(None, (line.split('#')[0].strip() for line in lines))
# Separate items and totals.
separated = {item.split()[0]: int(item.split()[1]) for item in comment_less}
Let's test:
>>> print(separated)
{'Apples': 1, 'Oranges': 3, 'Bananas': 5}
Hope this helps.
This doesn't exactly reproduce your error, but there's a problem with your code:
>>> x = "Apples\t1\t# This is a comment"
>>> x.split()
['Apples', '1', '#', 'This', 'is', 'a', 'comment']
>>> key, val = x.split()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ValueError: too many values to unpack
Instead try:
key = line.split()[0]
val = line.split()[1]
Edit: I think your "need more than 1 value to unpack" error is coming from the blank lines. Also, I'm not familiar with using next() like this. I would do something like:
if line.startswith('#') or line == "\n":
    pass
else:
    key = line.split()[0]
    val = line.split()[1]
To strip comments, you could use str.partition(), which works whether or not the comment sign is present in the line:
for line in file:
    line, _, comment = line.partition('#')
    if line.strip():  # non-blank line
        key, value = line.split()
line.split() may raise an exception in this code too; it happens if there is a non-blank line that does not contain exactly two whitespace-separated words. What to do in that case (ignore such lines, print a warning, etc.) is application dependent.
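For illustration, a minimal sketch of one way to guard against that, mirroring the loop above and assuming the policy is simply to skip blank or malformed lines (that policy is my assumption, not the answer's):

for line in file:
    line, _, comment = line.partition('#')
    parts = line.split()
    if len(parts) != 2:  # blank or malformed line: skip it
        continue
    key, value = parts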
You need to ignore empty lines and lines starting with #, and strip the comment from the remaining lines before splitting them, either by splitting on # or by using rfind as below to slice the string. An empty line still contains a newline, so you need and line.strip() to check for one. You cannot simply split on whitespace and unpack, because after splitting you have more than two elements, including what is in the comment:
with open("in.txt") as f:
d = dict(line[:line.rfind("#")].split() for line in f
if not line.startswith("#") and line.strip())
print(d)
Output:
{'Apples': '1', 'Oranges': '3', 'Bananas': '5'}
Another option is to split twice and slice:
with open("in.txt") as f:
d = dict(line.split(None,2)[:2] for line in f
if not line.startswith("#") and line.strip())
print(d)
Or splitting twice and unpacking using an explicit loop:
with open("in.txt") as f:
d = {}
for line in f:
if not line.startswith("#") and line.strip():
k, v, _ = line.split(None, 2)
d[k] = v
You can also use itertools.groupby to group the lines you want: each line that is not a comment and not blank gets a truthy key (its stripped content), so taking next(v) from each truthy group yields the data lines.
from itertools import groupby
with open("in.txt") as f:
    grouped = groupby(f, lambda x: not x.startswith("#") and x.strip())
    d = dict(next(v).split(None, 2)[:2] for k, v in grouped if k)
    print(d)
To handle values made up of multiple words in single quotes, we can use shlex to split:
import shlex
with open("in.txt") as f:
    d = {}
    for line in f:
        if not line.startswith("#") and line.strip():
            data = shlex.split(line)
            d[data[0]] = data[1]
    print(d)
So changing the Banana line to:
Bananas 'north-side disabled' # I want to ignore this comment too!
We get:
{'Apples': '1', 'Oranges': '3', 'Bananas': 'north-side disabled'}
And the same will work for the slicing:
with open("in.txt") as f:
d = dict(shlex.split(line)[:2] for line in f
if not line.startswith("#") and line.strip())
print(d)
If the format of the file is well defined, you can try a solution with regular expressions.
Here's just an idea:
import re
fruits = {}
with open('fruits_list.txt', mode='r') as f:
    for line in f:
        match = re.match(r"([a-zA-Z0-9]+)[\s]+([0-9]+).*", line)
        if match:
            fruit_name, fruit_amount = match.groups()
            fruits[fruit_name] = fruit_amount
print fruits
UPDATED:
I changed the way the lines are read to take care of large files. Now I read line by line instead of all at once, which improves memory usage.
Say I have a file "stuff.txt" that contains the following on separate lines:
q:5
r:2
s:7
I want to read each of these lines from the file, and convert them to dictionary elements, the letters being the keys and the numbers the values.
So I would like to get
y ={"q":5, "r":2, "s":7}
I've tried the following, but it just prints an empty dictionary "{}"
y = {}
infile = open("stuff.txt", "r")
z = infile.read()
for line in z:
    key, value = line.strip().split(':')
    y[key].append(value)
print(y)
infile.close()
try this:
d = {}
with open('text.txt') as f:
    for line in f:
        key, value = line.strip().split(':')
        d[key] = int(value)
You are appending to d[key] as if it were a list. What you want is to just assign the value directly, as above.
Also, using with to open the file is good practice, as it automatically closes the file after the code in the with block is executed.
There are some possible improvements to be made. The first is using a context manager for file handling, that is, with open(...): in case of an exception, it will handle closing the file for you.
Second, you have a small mistake in your dictionary assignment: values are assigned with the = operator, as in y[key] = value.
y = {}
with open("stuff.txt", "r") as infile:
    for line in infile:
        key, value = line.strip().split(':')
        y[key] = value
print(y)
Python3:
with open('input.txt', 'r', encoding="utf-8") as f:
    for line in f.readlines():
        s = []  # collect the key:value tokens on this line into a list
        for i in line.split(" "):
            s.append(i)
        d = dict(x.strip().split(":") for x in s)  # build a dict of strings from the tokens
        e = {a: int(x) for a, x in d.items()}  # convert the values from strings to integers
        print(e)
I have a file comprising two columns, i.e.,
1 a
2 b
3 c
I wish to read this file to a dictionary such that column 1 is the key and column 2 is the value, i.e.,
d = {1:'a', 2:'b', 3:'c'}
The file is small, so efficiency is not an issue.
d = {}
with open("file.txt") as f:
    for line in f:
        (key, val) = line.split()
        d[int(key)] = val
This will leave the key as a string:
with open('infile.txt') as f:
    d = dict(x.rstrip().split(None, 1) for x in f)
You can also use a dict comprehension like:
with open("infile.txt") as f:
d = {int(k): v for line in f for (k, v) in [line.strip().split(None, 1)]}
def get_pair(line):
    key, sep, value = line.strip().partition(" ")
    return int(key), value

with open("file.txt") as fd:
    d = dict(get_pair(line) for line in fd)
By dictionary comprehension
d = { line.split()[0] : line.split()[1] for line in open("file.txt") }
Or By pandas
import pandas as pd
d = pd.read_csv("file.txt", delimiter=" ", header=None, index_col=0)[1].to_dict()
Simple Option
Most methods for storing a dictionary use JSON, pickle, or line-by-line reading. Provided you're not editing the dictionary outside of Python, this simple method should suffice even for complex dictionaries, although pickle will be better for larger dictionaries.
x = {1:'a', 2:'b', 3:'c'}
f = 'file.txt'
print(x, file=open(f,'w')) # file.txt >>> {1:'a', 2:'b', 3:'c'}
y = eval(open(f,'r').read())
print(x==y) # >>> True
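Since JSON is mentioned above, here is a minimal sketch of the same round trip using the standard json module instead of eval (the file name file.json is my own choice; note that JSON stores keys as strings, so integer keys need converting back):

import json

x = {1: 'a', 2: 'b', 3: 'c'}

with open('file.json', 'w') as f:
    json.dump(x, f)  # keys are written out as strings

with open('file.json') as f:
    y = {int(k): v for k, v in json.load(f).items()}  # restore integer keys

print(x == y)  # >>> True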
If you love one liners, try:
d=eval('{'+re.sub('\'[\s]*?\'','\':\'',re.sub(r'([^'+input('SEP: ')+',]+)','\''+r'\1'+'\'',open(input('FILE: ')).read().rstrip('\n').replace('\n',',')))+'}')
Input FILE = Path to file, SEP = Key-Value separator character
Not the most elegant or efficient way of doing it, but quite interesting nonetheless :)
IMHO a bit more pythonic to use generators (probably you need 2.7+ for this):
with open('infile.txt') as fd:
    pairs = (line.split(None) for line in fd)
    res = {int(pair[0]):pair[1] for pair in pairs if len(pair) == 2 and pair[0].isdigit()}
This will also filter out lines that do not start with an integer or do not contain exactly two items.
I had a requirement to take values from a text file and use them as key/value pairs. I have content in the text file in the form key = value, so I used the split method with "=" as the separator and wrote the code below:
d = {}
file = open("filename.txt")
for x in file:
    f = x.split("=")
    d.update({f[0].strip(): f[1].strip()})
By using the strip method, any spaces before or after the "=" separator are removed, and you will have the expected data in dictionary format.
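For example, with a hypothetical filename.txt containing the two lines host = localhost and port = 8080, the loop above would give:

print(d)  # {'host': 'localhost', 'port': '8080'}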
import re
my_file = open('file.txt','r')
d = {}
for i in my_file:
    g = re.search(r'(\d+)\s+(.*)', i)  # match a line containing an int and a string
    if g:
        d[int(g.group(1))] = g.group(2)
Here's another option...
import csv
import os

events = {}
for line in csv.reader(open(os.path.join(path, 'events.txt'), "rb")):
    if line[0][0] == "#":
        continue
    events[line[0]] = line[1] if len(line) == 2 else line[1:]