Adding characters from an external file to a dictionary - python

In python, how would i select a single character from a txt document that contains the following:
A#
M*
N%
(on seperate lines)...and then update a dictionary with the letter as the key and the symbol as the value.
The closest i have got is:
ftwo = open ("clues.txt", "r")
for lines in ftwo.readlines():
for char in lines:
I'm pretty new to coding so cant work it out!

Supposing that each line contains extactly two characters (first the key, then the value):
with open('clues.txt', 'r') as f:
myDict = {a[0]: a[1] for a in f}
If you have empty lines in your input file, you can filter these out:
with open('clues.txt', 'r') as f:
myDict = {a[0]: a[1] for a in f if a.strip()}

First, you'll want to read each line one at a time:
my_dict = {}
with open ("clues.txt", "r") as ftwo:
for line in ftwo:
# Then, you'll want to put your elements in a dict
my_dict[line[0]] = line[1]

Related

left padding with python

I have following data and link combination of 100000 entries
dn:id=2150fccc-beb8-42f8-b201-182a6bf5ddfe,ou=test,dc=com
link:545214569
dn:id=ffa55959-457d-49e6-b4cf-a34eff8bbfb7,ou=test,dc=com
link:32546897
dn:id=3452a4c3-b768-43f5-8f1e-d33c14787b9b,ou=test,dc=com
link:6547896541
I am trying to write a program in python 2.7 to add left padding zeros if value of link is less than 10 .
Eg:
545214569 --> 0545214569
32546897 --> 0032546897
can you please guide me what am i doing wrong with the following program :
with open("test.txt", "r") as f:
line=f.readline()
line1=f.readline()
wordcheck = "link"
wordcheck1= "dn"
for wordcheck1 in line1:
with open("pad-link.txt", "a") as ff:
for wordcheck in line:
with open("pad-link.txt", "a") as ff:
key, val = line.strip().split(":")
val1 = val.strip().rjust(10,'0')
line = line.replace(val,val1)
print (line)
print (line1)
ff.write(line1 + "\n")
ff.write('%s:%s \n' % (key, val1))
The usual pythonic way to pad values in Python is by using string formatting and the Format Specification Mini Language
link = 545214569
print('{:0>10}'.format(link))
Your for wordcheck1 in line1: and for workcheck in line: aren't doing what you think. They iterate one character at a time over the lines and assign that character to the workcheck variable.
If you only want to change the input file to have leading zeroes, this can be simplified as:
import re
# Read the whole file into memory
with open('input.txt') as f:
data = f.read()
# Replace all instances of "link:<digits>", passing the digits to a function that
# formats the replacement as a width-10 field, right-justified with zeros as padding.
data = re.sub(r'link:(\d+)', lambda m: 'link:{:0>10}'.format(m.group(1)), data)
with open('output.txt','w') as f:
f.write(data)
output.txt:
dn:id=2150fccc-beb8-42f8-b201-182a6bf5ddfe,ou=test,dc=com
link:0545214569
dn:id=ffa55959-457d-49e6-b4cf-a34eff8bbfb7,ou=test,dc=com
link:0032546897
dn:id=3452a4c3-b768-43f5-8f1e-d33c14787b9b,ou=test,dc=com
link:6547896541
i don't know why you have to open many times. Anyway, open 1 time, then for each line, split by :. the last element in list is the number. Then you know what lenght the digits should consistently b, say 150, then use zfill to padd the 0. then put the lines back by using join
for line in f.readlines():
words = line.split(':')
zeros = 150-len(words[-1])
words[-1] = words[-1].zfill(zeros)
newline = ':'.join(words)
# write this line to file

Read multiple files, search for string and store in a list

I am trying to search through a list of files, look for the words 'type' and the following word. then put them into a list with the file name. So for example this is what I am looking for.
File Name, Type
[1.txt, [a, b, c]]
[2.txt, [a,b]]
My current code returns a list for every type.
[1.txt, [a]]
[1.txt, [b]]
[1.txt, [c]]
[2.txt, [a]]
[2.txt, [b]]
Here is my code, i know my logic will return a single value into the list but I'm not sure how to edit it to it will just be the file name with a list of types.
output = []
for file_name in find_files(d):
with open(file_name, 'r') as f:
for line in f:
line = line.lower().strip()
match = re.findall('type ([a-z]+)', line)
if match:
output.append([file_name, match])
Learn to categorize your actions at the proper loop level.
In this case, you say that you want to accumulate all of the references into a single list, but then your code creates one output line per reference, rather than one per file. Change that focus:
with open(file_name, 'r') as f:
ref_list = []
for line in f:
line = line.lower().strip()
match = re.findall('type ([a-z]+)', line)
if match:
ref_list.append(match)
# Once you've been through the entire file,
# THEN you add a line for that file,
# with the entire reference list
output.append([file_name, ref_list])
You might find it useful to use a dict here instead
output = {}
for file_name in find_files(d):
with open(file_name, 'r') as f:
output[file_name] = []
for line in f:
line = line.lower().strip()
match = re.findall('type ([a-z]+)', line)
if match:
output[file_name].append(*match)

Text to dictionary doesn't work

I have the following text file in the same folder as my Python Code.
78459581
Black Ballpoint Pen
12345670
Football
49585922
Perfume
83799715
Shampoo
I have written this Python code.
file = open("ProductDatabaseEdit.txt", "r")
d = {}
for line in file:
x = line.split("\n")
a=x[0]
b=x[1]
d[a]=b
print(d)
This is the result I receive.
b=x[1] # IndexError: list index out of range
My dictionary should appear as follows:
{"78459581" : "Black Ballpoint Pen"
"12345670" : "Football"
"49585922" : "Perfume"
"83799715" : "Shampoo"}
What am I doing wrong?
A line is terminated by a linebreak, thus line.split("\n") will never give you more than one line.
You could cheat and do:
for first_line in file:
second_line = next(file)
You can simplify your solution by using a dictionary generator, this is probably the most pythonic solution I can think of:
>>> with open("in.txt") as f:
... my_dict = dict((line.strip(), next(f).strip()) for line in f)
...
>>> my_dict
{'12345670': 'Football', '49585922': 'Perfume', '78459581': 'Black Ballpoint Pen', '83799715': 'Shampoo'}
Where in.txt contains the data as described in the problem. It is necessary to strip() each line otherwise you would be left with a trailing \n character for your keys and values.
You need to strip the \n, not split
file = open("products.txt", "r")
d = {}
for line in file:
a = line.strip()
b = file.next().strip()
# next(file).strip() # if using python 3.x
d[a]=b
print(d)
{'12345670': 'Football', '49585922': 'Perfume', '78459581': 'Black Ballpoint Pen', '83799715': 'Shampoo'}
What's going on
When you open a file you get an iterator, which will give you one line at a time when you use it in a for loop.
Your code is iterating over the file, splitting every line in a list with \n as the delimiter, but that gives you a list with only one item: the same line you already had. Then you try to access the second item in the list, which doesn't exist. That's why you get the IndexError: list index out of range.
How to fix it
What you need is this:
file = open('products.txt','r')
d = {}
for line in file:
d[line.strip()] = next(file).strip()
In every loop you add a new key to the dictionary (by assigning a value to a key that didn't exist yet) and assign the next line as the value. The next() function is just telling to the file iterator "please move on to the next line". So, to drive the point home: in the first loop you set first line as a key and assign the second line as the value; in the second loop iteration, you set the third line as a key and assign the fourth line as the value; and so on.
The reason you need to use the .strip() method every time, is because your example file had a space at the end of every line, so that method will remove it.
Or...
You can also get the same result using a dictionary comprehension:
file = open('products.txt','r')
d = {line.strip():next(file).strip() for line in file}
Basically, is a shorter version of the same code above. It's shorter, but less readable: not necessarily something you want (a matter of taste).
In my solution i tried to not use any loops. Therefore, I first load the txt data with pandas:
import pandas as pd
file = pd.read_csv("test.txt", header = None)
Then I seperate keys and values for the dict such as:
keys, values = file[0::2].values, file[1::2].values
Then, we can directly zip these two as lists and create a dict:
result = dict(zip(list(keys.flatten()), list(values.flatten())))
To create this solution I used the information as provided in [question]: How to remove every other element of an array in python? (The inverse of np.repeat()?) and in [question]: Map two lists into a dictionary in Python
You can loop over a list two items at a time:
file = open("ProductDatabaseEdit.txt", "r")
data = file.readlines()
d = {}
for line in range(0,len(data),2):
d[data[i]] = data[i+1]
Try this code (where the data is in /tmp/tmp5.txt):
#!/usr/bin/env python3
d = dict()
iskey = True
with open("/tmp/tmp5.txt") as infile:
for line in infile:
if iskey:
_key = line.strip()
else:
_value = line.strip()
d[_key] = _value
iskey = not iskey
print(d)
Which gives you:
{'12345670': 'Football', '49585922': 'Perfume', '78459581': 'Black Ballpoint Pen', '83799715': 'Shampoo'}

How to fetch these below values in python

I have a file in the below format
.aaa b/b
.ddd e/e
.fff h/h
.lop m/n
I'm trying to read this file. My desired output is if I find ".aaa" I should get b/b, if I find ".ddd" I should get e/e and so on.
I know how to fetch 1st column and 2nd column but I don't know how to compare them and fetch the value. This is what I've written.
file = open('some_file.txt')
for line in file:
fields = line.strip().split()
print (fields[0]) #This will give 1st column
print (fields[1]) # This will give 2nd column
This is not the right way of doing things. What approach follow?
Any time you want to do lookups, a dictionary is going to be your friend.
You could write a function to load the data into a dictionary:
def load_data(filename):
result = dict()
with open(filename, 'r') as f:
for line in f:
k,v = line.strip().split() # will fail if not exactly 2 fields
result[k] = v
return result
And then use it to perform your lookups like this:
data = load_data('foo.txt')
print data['.aaa']
It sounds like what you may want is to build a dictionary mapping column 1 to column 2. You could try:
file = open('some_file.txt')
field_dict = {}
for line in file:
fields = line.strip().split()
field_dict[fields[0]] = fields[1]
Then in your other code, when you see '.ddd' you can simply get the reference from the dictionary (e.g. field_dict['.ddd'] should return 'e/e')
Just do splitting on each line according to the spaces and check whether the first item matches the word you gave. If so then do printing the second item from the list.
word = input("Enter the word to search : ")
with open(file) as f:
for line in f:
m = line.strip().split()
if m[0] == word:
print m[1]

Iterate over a CSV file Python

I have a CSV file that looks like this
a,b,c
d1,g4,4m
t,35,6y
mm,5,m
I'm trying to replace all the m's and y's preceded by a number with 'month' and 'year' respectively. I'm using the following script.
import re,csv
out = open ("out.csv", "wb")
file = "in.csv"
with open(file, 'r') as f:
reader = csv.reader(f)
for ss in reader:
s = str(ss)
month_pair = (re.compile('(\d\s*)m'), 'months')
year_pair = (re.compile('(\d\s*)y'), 'years')
def substitute(s, pairs):
for (pattern, substitution) in pairs:
match = pattern.search(s)
if match:
s = pattern.sub(match.group(1)+substitution, s)
return s
pairs = [month_pair, year_pair]
print (substitute(s, pairs))
It does replace but it does that only on the last row, ignoring the ones before it. How can I have it iterate over all the rows and write to another csv file?
You can use positive look-behind :
>>> re.sub(r'(?<=\d)m','months',s)
'a,b,c\nd1,g4,4months\nt,35,6y\nmm,5,m'
>>> re.sub(r'(?<=\d)y','years',s)
'a,b,c\nd1,g4,4m\nt,35,6years\nmm,5,m'
In this line
print (substitute(s, pairs))
your variable s is only the last line in your file. Note how you update s in your file reading to be the current line.
Solutions (choose one):
You could try another for-loop to iterate over all lines.
Or move the substitution into the for-loop where you read the lines of the file. This is definitely the better solution!
You can easily lookup how to write a new file or change the file you are working on.

Categories

Resources