Python: read a text file into a dictionary - python

I really hope you can help me since I'm quite new to Python.
I have a simple text file, without any columns, just rows. Something like this:
Bob
Opel
Mike
Ford
Rodger
Renault
Mary
Volkswagen
Note that in the text file the names and the cars are without the additional enter. I had to this, otherwise, StackOverflow would project the names next to each other.
The idea is to create a dictionary out of the text file to get a format like this:
{[Bob : Opel], [Mike : Ford], [Rodger : Renault], [Mary : Volkswagen]}
Can you guys help me out and give an example on how to do this? Would be much appreciated!

you can open a file and iterate through the lines with readlines
lines, dict = open('file.dat', 'r').readlines(), {}
for i in range(len(lines)/2):
dict[lines[i*2]] = lines[i*2+1]
print dict

with open('test.txt', 'r') as file_:
dict = {}
array = list(filter(lambda x: x != '', file_.read().split('\n')))
for i in range(0, len(array), 2):
dict[array[i]] = array[i+1]
print(dict)

Related

Python: Writing peoples scores to individual lines

I have a task where I need to record peoples scores in a text file. My Idea was to set it out like this:
Jon: 4, 1, 3
Simon: 1, 3, 6
This has the name they inputted along with their 3 last scores (Only 3 should be recorded).
Now for my question; Can anyone point me in the right direction to do this? Im not asking for you to write my code for me, Im simply asking for some tips.
Thanks.
Edit: Im guessing it would look something like this: I dont know how I'd add scores after their first though like above.
def File():
score = str(Name) + ": " + str(correct)
File = open('Test.txt', 'w+')
File.write(score)
File.close()
Name = input("Name: ")
correct = input("Number: ")
File()
You could use pandas to_csv() function and store your data in a dictionary. It will be much easier than creating your own format.
from pandas import DataFrame, read_csv
import pandas as pd
def tfile(names):
df = DataFrame(data = names, columns = names.keys())
with open('directory','w') as f:
f.write(df.to_string(index=False, header=True))
names = {}
for i in xrange(num_people):
name = input('Name: ')
if name not in names:
names[name] = []
for j in xrange(3):
score = input('Score: ')
names[name].append(score)
tfile(names)
Simon Jon
1 4
3 1
6 3
This should meet your text requirement now. It converts it to a string and then writes the string to the .txt file. If you need to read it back in you can use pandas read_table(). Here's a link if you want to read about it.
Since you are not asking for the exact code, here is an idea and some pointers
Collect the last three scores per person in a list variable called last_three
do something like:
",".join(last_three) #this gives you the format 4,1,3 etc
write to file an entry such as
name + ":" + ",".join(last_three)
You'll need to do this for each "line" you process
I'd recommend using with clause to open the file in write mode and process your data (as opposed to just an "open" clause) since with handles try/except/finally problems of opening/closing file handles...So...
with open(my_file_path, "w") as f:
for x in my_formatted_data:
#assuming x is a list of two elements name and last_three elems (example: [Harry, [1,4,5]])
name, last_three = x
f.write(name + ":" + ",".join(last_three))
f.write("\n")# a new line
In this way you don't really need to open/close file as with clause takes care of it for you

Comparing csv files in python to see what is in both

I have 2 csv files that I want to compare one of which is a master file of all the countries and then another one that has only a few countries. This is an attempt I made for some rudimentary testing:
char = {}
with open('all.csv', 'rb') as lookupfile:
for number, line in enumerate(lookupfile):
chars[line.strip()] = number
with open('locations.csv') as textfile:
text = textfile.read()
print text
for char in text:
if char in chars:
print("Country found {0} found in row {1}".format(char, chars[char]))
I am trying to get a final output of the master file of countries with a secondary column indicating if it came up in the other list
Thanks !
Try this:
Write a function to turn the CSV into a Python dictionary containing as keys each of the country you found in the CSV. It can just look like this:
{'US':True, 'UK':True}
Do this for both CSV files.
Now, iterate over the dictionary.keys() for the csv you're comparing against, and just check to see if the other dictionary has the same key.
This will be an extremely fast algorithm because dictionaries give us constant time lookup, and you have a data structure which you can easily use to see which countries you found.
As Eric mentioned in comments, you can also use set membership to handle this. This may actually be the simpler, better way to do this:
set1 = set() # A new empty set
set1.add("country")
if country in set:
#do something
You could use exactly the same logic as the original loop:
with open('locations.csv') as textfile:
for line in textfile:
if char.strip() in chars:
print("Country found {0} found in row {1}".format(char, chars[char]))

Find and replace strings in Excel (.xlsx) using Python

I am trying to replace a bunch of strings in an .xlsx sheet (~70k rows, 38 columns). I have a list of the strings to be searched and replaced in a file, formatted as below:-
bird produk - bird product
pig - pork
ayam - chicken
...
kuda - horse
The word to be searched is on the left, and the replacement is on the right (find 'bird produk', replace with 'bird product'. My .xlsx sheet looks something like this:-
name type of animal ID
ali pig 3483
abu kuda 3940
ahmad bird produk 0399
...
ahchong pig 2311
I am looking for the fastest solution for this, since I have around 200 words in the list to be searched, and the .xlsx file is quite large. I need to use Python for this, but I am open to any other faster solutions.
Edit:- added sheet example
Edit2:- tried some python codes to read the cells, took quite a long time to read. Any pointers?
from xlrd import open_workbook
wb = open_workbook('test.xlsx')
for s in wb.sheets():
print ('Sheet:',s.name)
for row in range(s.nrows):
values = []
for col in range(s.ncols):
print(s.cell(row,col).value)
Thank you!
Edit3:- I finally figured it out. Both VBA module and Python codes work. I resorted to .csv instead to make things easier. Thank you! Here is my version of the Python code:-
import csv
###### our dictionary with our key:values. ######
reps = {
'JUALAN (PRODUK SHJ)' : 'SALE( PRODUCT)',
'PAMERAN' : 'EXHIBITION',
'PEMBIAKAN' : 'BREEDING',
'UNGGAS' : 'POULTRY'}
def replace_all(text, dic):
for i, j in reps.items():
text = text.replace(i, j)
return text
with open('test.csv','r') as f:
text=f.read()
text=replace_all(text,reps)
with open('file2.csv','w') as w:
w.write(text)
I would copy the contents of your text file into a new worksheet in the excel file and name that sheet "Lookup." Then use text to columns to get the data in the first two columns of this new sheet starting in the first row.
Paste the following code into a module in Excel and run it:
Sub Replacer()
Dim w1 As Worksheet
Dim w2 As Worksheet
'The sheet with the words from the text file:
Set w1 = ThisWorkbook.Sheets("Lookup")
'The sheet with all of the data:
Set w2 = ThisWorkbook.Sheets("Data")
For i = 1 To w1.Range("A1").CurrentRegion.Rows.Count
w2.Cells.Replace What:=w1.Cells(i, 1), Replacement:=w1.Cells(i, 2), LookAt:=xlPart, _
SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
ReplaceFormat:=False
Next i
End Sub
Make 2 arrays
A[bird produk, pig, ayam, kuda] //words to be changed
B[bird product, pork, chicken, horse] //result after changing the word
Now check each row of your excel and compare it with every element of A. If i matches then replace it with corresponding element of B.
for example
// not actual code something like pseudocode
for (i=1 to no. of rows.)
{
for(j=1 to 200)
{
if(contents of row[i] == A[j])
then contents of row[i]=B[j] ;
break;
}
}
To make it fast you have to stop the current iteration as soon as the word is replaced and check the next row.
Similar idea to #coder_A 's, but use a dictionary to do the "translation" for you, where the keys are the original words and the value for each key is what it gets translated to.
For reading and writing xls with Python, use xlrd and xlwt, see http://www.python-excel.org/
A simple xlrd example:
from xlrd import open_workbook
wb = open_workbook('simple.xls')
for s in wb.sheets():
print 'Sheet:',s.name
for row in range(s.nrows):
values = []
for col in range(s.ncols):
print(s.cell(row,col).value)
and for replacing target text, use a dict
replace = {
'bird produk': 'bird product',
'pig': 'pork',
'ayam': 'chicken'
...
'kuda': 'horse'
}
Dict will give you O(1)(most of the time, if keys don't collide) time complexity when checking membership using 'text' in replace. there's no way to get better performance than that.
Since I don't know what your bunch of strings look like, this answer may be inaccurate or incomplete.

python string manipulation

Going to re-word the question.
Basically I'm wondering what is the easiest way to manipulate a string formatted like this:
Safety/Report/Image/489
or
Safety/Report/Image/490
And sectioning off each word seperated by a slash(/), and storing each section(token) into a store so I can call it later. (Reading in about 1200 cells from a CSV file).
The answer for your question:
>>> mystring = "Safety/Report/Image/489"
>>> mystore = mystring.split('/')
>>> mystore
['Safety', 'Report', 'Image', '489']
>>> mystore[2]
'Image'
>>>
If you want to store data from more than one string, then you have several options depending on how do you want to organize it. For example:
liststring = ["Safety/Report/Image/489",
"Safety/Report/Image/490",
"Safety/Report/Image/491"]
dictstore = {}
for line, string in enumerate(liststring):
dictstore[line] = string.split('/')
print dictstore[1][3]
print dictstore[2][3]
prints:
490
491
In this case you can use in the same way a dictionary or a list (a list of lists) for storage. In case each string has a especial identifier (one better than the line number), then the dictionary is the option to choose.
I don't quite understand your code and don't have too much time to study it, but I thought that the following might be helpful, at least if order isn't important ...
in_strings = ['Safety/Report/Image/489',
'Safety/Report/Image/490',
'Other/Misc/Text/500'
]
out_dict = {}
for in_str in in_strings:
level1, level2, level3, level4 = in_str.split('/')
out_dict.setdefault(level1, {}).setdefault(
level2, {}).setdefault(
level3, []).append(level4)
print out_dict
{'Other': {'Misc': {'Text': ['500']}}, 'Safety': {'Report': {'Image': ['489', '490']}}}
If your csv is line seperated:
#do something to load the csv
split_lines = [x.strip() for x in csv_data.split('\n')]
for line_data in split_lines:
split_parts = [x.strip() for x in line_data.split('/')]
# do something with individual part data
# such as some_variable = split_parts[1] etc
# if using indexes, I'd be sure to catch for index errors in case you
# try to go to index 3 of something with only 2 parts
check out the python csv module for some importing help (I'm not too familiar).

Searching CSV Files (Python)

I've made this CSV file up to play with.. From what I've been told before, I'm pretty sure this CSV file is valid and can be used in this example.
Basically I have this CSV file 'book_list.csv':
name,author,year
Lord of the Rings: The Fellowship of the Ring,J. R. R. Tolkien,1954
Nineteen Eighty-Four,George Orwell,1984
Lord of the Rings: The Return of the King,J. R. R. Tolkien,1954
Animal Farm,George Orwell,1945
Lord of the Rings: The Two Towers, J. R. R. Tolkien, 1954
And I also have this text file 'search_query.txt', whereby I put in keywords or search terms I want to search for in the CSV file:
Lord
Rings
Animal
I've currently come up with some code (with the help of stuff I've read) that allows me to count the number of matching entries. I then have the program write a separate CSV file 'results.csv' which just returns either 'Matching' or ' '.
The program then takes this 'results.csv' file and counts how many 'Matching' results I have and it prints the count.
import csv
import collections
f1 = file('book_list.csv', 'r')
f2 = file('search_query.txt', 'r')
f3 = file('results.csv', 'w')
c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)
input = [row for row in c2]
for booklist_row in c1:
row = 1
found = False
for input_row in input:
results_row = []
if input_row[0] in booklist_row[0]:
results_row.append('Matching')
found = True
break
row = row + 1
if not found:
results_row.append('')
c3.writerow(results_row)
f1.close()
f2.close()
f3.close()
d = collections.defaultdict(int)
with open("results.csv", "rb") as info:
reader = csv.reader(info)
for row in reader:
for matches in row:
matches = matches.strip()
if matches:
d[matches] += 1
results = [(matches, count) for matches, count in d.iteritems() if count >= 1]
results.sort(key=lambda x: x[1], reverse=True)
for matches, count in results:
print 'There are', count, 'matching results'+'.'
In this case, my output returns:
There are 4 matching results.
I'm sure there is a better way of doing this and avoiding writing a completely separate CSV file.. but this was easier for me to get my head around.
My question is, this code that I've put together only returns how many matching results there are.. how do I modify it in order to return the ACTUAL results as well?
i.e. I want my output to return:
There are 4 matching results.
Lord of the Rings: The Fellowship of the Ring
Lord of the Rings: The Return of the King
Animal Farm
Lord of the Rings: The Two Towers
As I said, I'm sure there's a much easier way to do what I already have.. so some insight would be helpful. :)
Cheers!
EDIT: I just realized that if my keywords were in lower case, it won't work.. is there a way to avoid case-sensitivity?
Throw away the query file and get your search terms from sys.argv[1:] instead.
Throw away your output file and use sys.stdout instead.
Append matched booklist titles to a result_list. The result_row that you currently have has a rather misleading name. The count that you want is len(result_list). Print that. Then print the contents of result_list.
Convert your query words to lowercase once (before you start reading the input file). As you read each book_list row, convert its title to lowercase. Do your your matching with the lowercase query words and the lowercase title.
Overall plan:
Read in the entire book list csv into a dictionary of {title: info}.
Read in the questions csv. For each keyword, filter the dictionary:
[key for key, value in books.items() if "Lord" in key]
say. Do what you will with the results.
If you want, put the results in another csv.
If you want to deal with casing issues, try turning all the titles to lowercase ("FOO".lower()) when you store them in the dictionary.

Categories

Resources