Google search from Python app

I'm trying to take an input file, read each line, search Google with that line, and print the search results from the query. I get the first search result, which is from Wikipedia, which is great, but then I get this error:
File "test.py", line 24, in <module>
    dictionary[str(lineToRead)].append(str(i))
KeyError: 'mouse'
The input file pets.txt looks like this:
cat
dog
bird
mouse
inputFile = open("pets.txt", 'r')  # Makes File object
outputFile = open("results.csv", "w")
dictionary = {}  # Our "hash table"
compare = "https://en.wikipedia.org/wiki/"  # urls will compare against this string

for line in inputFile.read().splitlines():
    # ---- testing ---
    print line
    lineToRead = line
inputFile.close()

from googlesearch import GoogleSearch
gs = GoogleSearch(lineToRead)
#gs.results_per_page = 5
#results = gs.get_results()
for i in gs.top_urls():
    print i  # check to make sure this is printing out url's
    compare2 = i
    if compare in compare2:  # compare the two url's
        dictionary[str(lineToRead)].append(str(i))  # write out query string to dictionary key & append the urls

for i in dictionary:
    print i
    outputFile.write(str(i))
    for j in dictionary[i]:
        print j
        outputFile.write(str(j))
#outputFile.write(str(i))  # write results for the query string to the results file.
#to check if hash works print key /n print values /n print : /n print /n
#-----------------------------------------------------------------------------

Jeremy Banks is right. If you write dictionary[str(lineToRead)].append(str(i)) without first initializing a value for dictionary[str(lineToRead)], you will get a KeyError.
It looks like you have an additional bug: the value of lineToRead will always be mouse, since you have already looped through and closed your input file before searching for anything. Most likely you want to search for every word in inputFile (i.e. cat, dog, bird, mouse).
To fix this, we can write the following (assuming you want to keep a list of matching URLs as the value in the dictionary for each search term):
for line in inputFile.read().splitlines():  # loop through each line in input file
    lineToRead = line
    dictionary[str(lineToRead)] = []  # initialize to empty list
    gs = GoogleSearch(lineToRead)     # search for the current word, not just the last one
    for i in gs.top_urls():
        print i  # check to make sure this is printing out url's
        compare2 = i
        if compare in compare2:  # compare the two url's
            dictionary[str(lineToRead)].append(str(i))  # append the matching urls under the query-string key
inputFile.close()
You can delete the for loop you wrote for 'testing' the inputFile.
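As a side note (not part of the original answer): collections.defaultdict avoids the explicit initialization entirely, since the empty list is created on first access. A minimal sketch, reusing the question's GoogleSearch and Wikipedia-prefix check:
from collections import defaultdict
from googlesearch import GoogleSearch

compare = "https://en.wikipedia.org/wiki/"
dictionary = defaultdict(list)
with open("pets.txt") as inputFile:
    for line in inputFile.read().splitlines():
        gs = GoogleSearch(line)
        for url in gs.top_urls():
            if compare in url:
                dictionary[line].append(url)  # the empty list is created automatically for new keys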


I'm trying to find words from a text file in another text file

I built a simple graphical user interface (GUI) with basketball info to make finding information about players easier. It uses data scraped with the requests library and works well, but there is a problem: the list of players lives inside my code and has to be compared against the scraped data, so whenever I want to add or remove names I have to edit the code in my IDE. I'd like to keep the player names in an external text file instead, for flexibility when managing them.
#This is how the players list looks in the code.
basketball = ['Adebayo, Bam', 'Allen, Jarrett', 'Antetokounmpo, Giannis' ... #and many others]
#This is what the info in the scraped file looks like:
Charlotte Hornets,"Ball, LaMelo",Out,"Injury/Illness - Bilateral Ankle, Wrist; Soreness (L Ankle, R Wrist)"
"Hayward, Gordon",Available,Injury/Illness - Left Hamstring; Soreness
"Martin, Cody",Out,Injury/Illness - Left Knee; Soreness
"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
"Okogie, Josh",Questionable,Injury/Illness - Nasal; Fracture,
#The rest of the code is working well; this is the final part, where it uses the list to write out the players that were found in both files.
with open("freeze.csv",'r') as freeze:
for word in basketball:
if word in freeze:
freeze.write(word)
# Up to this point I get the correct output, but now I need the list 'basketball' in a text file so I can iterate over it the same way
# I tried different solutions but none of them worked for me
with open('final_G_league.csv') as text, open('freeze1.csv') as filter_words:
    st = set(map(str.rstrip, filter_words))
    txt = next(text).split()
    out = [word for word in txt if word not in st]
# This one gives me the first line of the scraped text
import csv
file1 = open("final_G_league.csv", 'r')
file2 = open("freeze1.csv", 'r')
data_read1 = csv.reader(file1)
data_read2 = csv.reader(file2)
# convert the data to a list
data1 = [data for data in data_read1]
data2 = [data for data in data_read2]
for i in range(len(data1)):
    if data1[i] != data2[i]:
        print("Line " + str(i) + " is a mismatch.")
        print(f"{data1[i]} doesn't match {data2[i]}")
file1.close()
file2.close()
# This one returns a list with a bunch of names and a list index error.
file1 = open('final_G_league.csv', 'r')
file2 = open('freeze_list.txt', 'r')
list1 = file1.readlines()
list2 = file2.readlines()
for i in list1:
    for j in list2:
        if j in i:
# I also tried the answers in this post:
# https://stackoverflow.com/questions/31343457/filter-words-from-one-text-file-in-another-text-file
Let's assume we have the following input files:
freeze_list.txt - comma separated list of filter words (players) enclosed in quotes:
'Adebayo, Bam', 'Allen, Jarrett', 'Antetokounmpo, Giannis', 'Anthony, Cole', 'Anunoby, O.G.', 'Ayton, Deandre',
'Banchero, Paolo', 'Bane, Desmond', 'Barnes, Scottie', 'Barrett, RJ', 'Beal, Bradley', 'Booker, Devin', 'Bridges, Mikal',
'Brown, Jaylen', 'Brunson, Jalen', 'Butler, Jimmy', 'Forbes, Bryn'
final_G_league.csv - scraped lines that we want to filter, using words from the freeze_list.txt file:
Charlotte Hornets,"Ball, LaMelo",Out,"Injury/Illness - Bilateral Ankle, Wrist; Soreness (L Ankle, R Wrist)"
"Hayward, Gordon",Available,Injury/Illness - Left Hamstring; Soreness
"Martin, Cody",Out,Injury/Illness - Left Knee; Soreness
"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
"Okogie, Josh",Questionable,Injury/Illness - Nasal; Fracture,
I would split the responsibilities of the script into code segments to make it more readable and manageable:
Define constants (later you could make them parameters)
Read filter words from a file
Filter scrapped lines
Dump output to a file
The constants:
FILTER_WORDS_FILE_NAME = "freeze_list.txt"
SCRAPPED_FILE_NAME = "final_G_league.csv"
FILTERED_FILE_NAME = "freeze.csv"
Read filter words from a file:
with open(FILTER_WORDS_FILE_NAME) as filter_words_file:
    filter_words = eval('(' + filter_words_file.read() + ')')
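One caution (my note, not the original answer's): eval will execute whatever is in the file. If the file isn't fully trusted, ast.literal_eval parses the same quoted, comma-separated list safely:
import ast

with open(FILTER_WORDS_FILE_NAME) as filter_words_file:
    filter_words = ast.literal_eval('(' + filter_words_file.read() + ')')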
Filter lines from the scraped file:
matched_lines = []
with open(SCRAPPED_FILE_NAME) as scrapped_file:
    for line in scrapped_file:
        # Check if any of the keywords is found in the line
        for filter_word in filter_words:
            if filter_word in line:
                matched_lines.append(line)
                # stop checking other words for performance and
                # to avoid sending the same line multiple times to the output
                break
Dump filtered lines into a file:
with open(FILTERED_FILE_NAME, "w") as filtered_file:
    for line in matched_lines:
        filtered_file.write(line)
The output freeze.csv after running the above segments in sequence is:
"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
Suggestion
I'm not sure why you have chosen to store the filter words in a comma-separated list. I would prefer a plain list of names - one per line.
freeze_list.txt:
Adebayo, Bam
Allen, Jarrett
Antetokounmpo, Giannis
Butler, Jimmy
Forbes, Bryn
The reading becomes straightforward:
with open(FILTER_WORDS_FILE_NAME) as filter_words_file:
    filter_words = [word.strip() for word in filter_words_file]
The output freeze.csv is the same:
"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
If file2 is just a list of names and you want to extract the rows in the first file whose name column matches a name in the list:
I suggest you make the "freeze" file a text file with one name per line and remove the single quotes from the names; then it can be parsed more easily.
You can then do something like this to match the names from one file against the other.
import csv
# convert the names data to a list
with open("freeze1.txt", 'r') as file2:
    names = [s.strip() for s in file2]
print("names:", names)
# next open league data and extract rows with matching names
with open("final_G_league.csv", 'r') as file1:
    reader = csv.reader(file1)
    next(reader)  # skip header
    for row in reader:
        if row[0] in names:
            # print the matching name
            print(row[0])
If the names don't match exactly as they appear in the final_G_league file, you may need to adjust accordingly, such as doing a case-insensitive match or normalizing the names ("last, first" vs "first last"), etc.
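For instance, a hedged sketch of such normalization (the normalize helper is illustrative, not part of the answer above): lowercase both sides and reorder "Last, First" into "first last" before comparing.
def normalize(name):
    # lowercase and turn "Last, First" into "first last" so both formats compare equal
    name = name.strip().strip("'").lower()
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        name = first + " " + last
    return name

# build the lookup set with normalized names ...
names = {normalize(s) for s in open("freeze1.txt")}
# ... and compare normalized row values: if normalize(row[0]) in names: ...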

Rewriting Single Words in a .txt with Python

I need to create a database, using Python and a .txt file.
Creating new items is no problem; the inside of the Datenbank.txt looks like this:
Index Objektname Objektplace Username
e.g.:
1 Pen Office Daniel
2 Saw Shed Nic
6 Shovel Shed Evelyn
4 Knife Room6 Evelyn
I get the index from a QR scanner (OpenCV) and the other information is gained via Tkinter Entry widgets. If an object is already saved in the database, you should be able to rewrite its Objektplace and Username.
My Problems now are the following:
If I scan the code with the index 6, how do I navigate to that entry, even if it's not in line 6, without causing a problem with the Room6?
How do I, for example, only replace the "Shed" from index 4 when that object is moved to e.g. Room6?
Same goes for the usernames.
Up until now I've tried different methods, but nothing has worked so far.
The last try looked something like this:
def DBChange():
    # Removes unwanted bits from the scanned code
    data2 = data.replace("'", "")
    Index = data2.replace("b", "")
    # Gets the Data from the Entry-Widgets
    User = Nutzer.get()
    Einlagerungsort = Ort.get()
    # Adds a whitespace at the end of the Entrys to separate them
    Userlen = len(User)
    User2 = User.ljust(Userlen)
    Einlagerungsortlen = len(Einlagerungsort) + 1
    Einlagerungsort2 = Einlagerungsort.ljust(Einlagerungsortlen)
    # Navigate to the exact line of the scanned Index and replace the words
    # for the place and the user ONLY in this line
    file = open("Datenbank.txt", "r+")
    lines = file.readlines()
    for word in lines[Index].split():
        List.append(word)
    checkWords = (List[2], List[3])
    repWords = (Einlagerungsort2, User2)
    for line in file:
        for check, rep in zip(checkWords, repWords):
            line = line.replace(check, rep)
        file.write(line)
    file.close()
    Return()
Thanks in advance
I'd suggest using Pandas to read and write your text file. That way you can just use the index to select the appropriate line. And if there is no specific reason to use your text format, I would switch to csv for ease of use.
import pandas as pd

def DBChange():
    # Removes unwanted bits from the scanned code
    # I haven't changed this part, since I guess you need this for some input data
    data2 = data.replace("'", "")
    Indexnr = int(data2.replace("b", ""))  # as an int, so it can be matched against the Index column
    # Gets the Data from the Entry-Widgets
    User = Nutzer.get()
    Einlagerungsort = Ort.get()
    # I removed the lines here. This isn't necessary when using csv and Pandas
    # read in the csv file, using the Index column as the row label
    # (the scanned index is not necessarily the line number)
    df = pd.read_csv("Datenbank.csv", index_col="Index")
    # Select line with index and replace value
    df.loc[Indexnr, 'Username'] = User
    df.loc[Indexnr, 'Objektplace'] = Einlagerungsort
    # Write back to csv
    df.to_csv("Datenbank.csv")
    Return()
Since I can't reproduce your specific problem, I haven't tested it. But something like this should work.
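For reference, the Datenbank.csv this assumes would look something like the following (a hypothetical layout, derived from the table in the question):
Index,Objektname,Objektplace,Username
1,Pen,Office,Daniel
2,Saw,Shed,Nic
6,Shovel,Shed,Evelyn
4,Knife,Room6,Evelyn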
Edit
To read and write the text file, use ' ' as the separator. (I assume all values contain no spaces, and your text file currently uses one space between values.)
reading:
df = pd.read_csv('Datenbank.txt', sep=' ')
Writing:
df.to_csv('Datenbank.txt', sep=' ')
First of all, this is a terrible way to store data. My suggestion is not particularly good code either; don't do this in production!
newlines = []
for line in lines:
    entry = line.split()
    if entry[0] == Index:
        # line now is the correct line
        # index 2 is the place, index 0 the ID, etc.
        entry[2] = Einlagerungsort2
    newlines.append(" ".join(entry))
# Now write newlines back to the file
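To fill in that final comment, the surrounding read and write-back might look like this (a sketch, assuming the same Datenbank.txt as above and the loop just shown in between):
with open("Datenbank.txt") as f:
    lines = f.readlines()
# ... build newlines as above ...
with open("Datenbank.txt", "w") as f:
    f.write("\n".join(newlines) + "\n")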

How do I search for a line of text in a whole text file

How do I read every line of a file in Python and check if that line is in another one of the lines of the same text?
I've created hashes of 2000 images and stored them in the same text file. To find whether a duplicate image exists, I want to cross-check the hashes of all the images generated.
Below is the code, and the list into which I have extracted the data:
with open('hash_info.txt') as f:
    content = f.readlines()
['fbefcfdf961f1919\n', 'aecc9e9696961f2f\n', 'cc1c9c9c1c1e272f\n', 'a4ce9e9e9793134b\n', 'e2e7e7e7e7e7e763\n', 'e64fcbcfcf8f0f27\n', '9c1c3c3c3e1e1e9c\n', 'c8cc9cb43e3c3b1b\n', 'cccd9e9e9e1e1f9f\n', 'ccce9e9e9ece0e4e\n', 'a6a7cbcfcf071736\n', 'f69c9c3c3636373b\n', 'ec9c9cbc3c26272b\n', 'f0cccc9c8c0e3f3b\n', '4c9c363e3e3e1e5d\n', '9c9cbc3e3c3c376f\n', 'f5ccce9e9e9e1f2c\n', 'cccc8c9ccc9ccdca\n', 'dc98ac2c363e5e5f\n', 'f2e7e7e7e7e76746\n', '9a9a1e3e3e3e373f\n', 'cc8c9e9e8ecece8f\n', 'db9f9f1e363e9e9e\n', 'e4cece8e9ececfcf\n', 'cecede9f9bce8f8f\n', 'b8ce4e4e9f1b1b29\n', 'ece6e6e7efcf0d05\n', 'cd8e9696b732163f\n', 'cece9e9ecececfcd\n', 'cc9d9f9f9f8dcdd9\n', '992d2c2c3c3ebe9e\n', 'e6e6cece8f2d2939\n', 'eccfcfcfcf4f6f7d\n', 'e6cecfcfcfefcec6\n', 'edf8e4cecece4e0e\n', 'e9d6e6e7e7a76667\n', 'edcecfcfcfcfcecf\n', 'a5a6c6ce8e0f43c7\n', '3a3e7c7c3d3e3f2f\n', 'cc9c963c361f173f\n', '8c9c9c9d9d9d1a9a\n', 'f0cc8e9e9e9f9d9e\n', '989c3c3c1c2e6e5b\n', 'f0989c1c9e1e1adb\n', 'f09c9c9c9c9e9e9f\n', 'e6ce4e1e86333309\n', 'a6cece9e8f0f0f2f\n', 'e8cccc9cccdc8d8c\n', 'f0ecced6969f0f2d\n', 'e0d89c3c3c3d3d1f\n', 'e6e7c7cfc7c64e4f\n', 'a6cf4b0f0e073739\n', 'cececececccf4b5b\n', 'a6c6cfcfcfc6c6c6\n', 'f0fcf3e3e3e3f303\n', 'f9f2e7e7cbcfcf97\n','fbefcfdf961f1919\n', 'f3e7e5e5e7e5c7c3\n', 'b3e7e7c7c7070f1e\n', 'cb9d97963e3f3325\n', '9b1e2c1c1e1e2e2b\n', '9d9e969f9f9f9f0f\n', 'e6a7a7e7e666666c\n', 'c64e9e9b0b072727\n','fbefcfdf961f1919\n', 'c7cfcfcfcfc7ce86\n', 'e6cecfcfcfc7c745\n', 'e6e6cecececfcfcf\n', 'cbcd9f9f9e1f3a7a\n', 'ccce9ecececec646\n', 'f1c7cfdf9f970325\n', '989d9c1c1e9e9f1f\n', '9c9e1c1e9e9d9c9a\n', '5f3d7656de5d3b1f\n', '5f3d76565e5d3b1f\n']
Below is the text file containing the same kind of hashes:
33393cccde1b3f7b
71fb989ed79f3b79
78b0a3a34c7c3737
67781c5e9fcc1f4c
313c2ccf4b5f5f7f
ece8cc9c9696171f
f4ec8c9c9c9c1e1e
e8cc94b68c9c1ece
d89c36161c9c1e3f
ecccdacececf6d6d
a4cecbcacf87173d
f9f3e7ebcbc74707
d9e5c7cbd34b4f4d
e4ece6e3cbdb8f1d
ccde9a9ecccecfad
e6e6ced293d6cfc6
cc8c9c989ccc8e8b
f2ccc696cecfcfcf
cc8c9a9a9ececfcd
cc9c9c9cdc9c9ff3
How I solved it
import mmap

def check_dup(hash):
    f = open('hash_text_file.txt')
    s = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    if s.find(hash.rstrip()) != -1:  # rstrip to remove \n
        print("Duplicate Image")
        return False
    else:
        return True
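One caveat worth adding (my note): on Python 3, mmap.find expects bytes, so the hash has to be encoded first. A sketch of the same function for Python 3:
import mmap

def check_dup(hash):
    with open('hash_text_file.txt') as f:
        s = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        if s.find(hash.rstrip().encode()) != -1:  # mmap.find needs bytes on Python 3
            print("Duplicate Image")
            return False
    return True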
# Opens the text document
file = open("Old.txt", "r")
# Reads the text document and splits it into a list with each line being an element
lines = file.read().split("\n")
new_lines = []
# Iterate over the list of lines
for line in lines:
    # If the line is not already in new_lines (the list of unique lines), add it
    # This makes sure that no line exists twice in the list
    if line not in new_lines:
        new_lines.append(line)
# Open a new text file
file_new = open("New.txt", "w")
# Add each line of our new unique lines list to the text file
file_new.write("\n".join(new_lines))
file_new.close()
file.close()
I took some of your sample data, cleaned the "\n" from it, converted it to a set, and tested membership with in/not in like this:
data = ['fbefcfdf961f1919\n', 'aecc9e9696961f2f\n', 'cc1c9c9c1c1e272f\n',
        'a4ce9e9e9793134b\n', 'e2e7e7e7e7e7e763\n']

# create a set from your data, lookups are faster that way
cleaned = set(x.strip("\n") for x in data)

for testMe in ['not in it', 'fbefcfdf961f1919']:  # supply your list of "new" things
    if testMe in cleaned:
        print "Got a duplicate: " + testMe
    else:
        print "Unique: " + testMe
        # append to hash-file ("a" appends; "w+" would truncate it each time)
        with open("hash_info.txt", "a") as f:  # if you have 1000 new hashes this
            f.write(testMe + "\n")             # reopens the file 1000 times (see below)
To compare a large amount of new data to your existing data you should put the new data in a set as well:
newSet = set( ... your data here ... )
and use set operations to get all that are not yet in your cleaned set:
thingsToAddToFile = newSet - cleaned  # subtracts from newSet all known ones; only
                                      # new ones will be in thingsToAddToFile

# add them all to your existing ones by appending them:
with open("hash_info.txt", "a") as f:
    f.write("\n".join(thingsToAddToFile) + "\n")  # joins all in the set and appends '\n' at the end
See https://docs.python.org/2/library/sets.html:
x in s                        test x for membership in s
x not in s                    test x for non-membership in s
s.issubset(t)       s <= t    test whether every element in s is in t
s.issuperset(t)     s >= t    test whether every element in t is in s
s.union(t)          s | t     new set with elements from both s and t
s.intersection(t)   s & t     new set with elements common to s and t
s.difference(t)     s - t     new set with elements in s but not in t
s.symmetric_difference(t)
                    s ^ t     new set with elements in either s or t but not both
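A quick illustration of the difference operation with hashes like the ones above:
cleaned = set(['fbefcfdf961f1919', 'aecc9e9696961f2f'])
newSet = set(['fbefcfdf961f1919', 'ffffffffffffffff'])
print newSet - cleaned  # set(['ffffffffffffffff']) - only the unseen hash remains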

Linear search to find spelling errors in Python

I'm working on learning Python with Program Arcade Games and I've gotten stuck on one of the labs.
I'm supposed to compare each word of a text file (http://programarcadegames.com/python_examples/en/AliceInWonderLand200.txt) to find if it is not in the dictionary file (http://programarcadegames.com/python_examples/en/dictionary.txt) and then print it out if it is not. I am supposed to use a linear search for this.
The problem is that even words I know are not in the dictionary file aren't being printed out. Any help would be appreciated.
My code is as follows:
# Imports regular expressions
import re

# This function takes a line of text and returns
# a list of words in the line
def split_line(line):
    split = re.findall('[A-Za-z]+(?:\'\"[A-Za-z]+)?', line)
    return split

# Opens the dictionary text file and adds each line to an array, then closes the file
dictionary = open("dictionary.txt")
dict_array = []
for item in dictionary:
    dict_array.append(split_line(item))
print(dict_array)
dictionary.close()

print("---Linear Search---")
# Opens the text for the first chapter of Alice in Wonderland
chapter_1 = open("AliceInWonderland200.txt")
# Breaks down the text by line
for each_line in chapter_1:
    # Breaks down each line to a single word
    words = split_line(each_line)
    # Checks each word against the dictionary array
    for each_word in words:
        i = 0
        # Continues as long as there are more words in the dictionary and no match
        while i < len(dict_array) and each_word.upper() != dict_array[i]:
            i += 1
        # if no match was found print the word being checked
        if not i <= len(dict_array):
            print(each_word)
# Closes the first chapter file
chapter_1.close()
Something like this should do (pseudo code):
sampleDict = {}
for each word in AliceInWonderLand200.txt:
    sampleDict[word] = True

actualWords = {}
for each word in dictionary.txt:
    actualWords[word] = True

for each word in sampleDict:
    if not (word in actualWords):
        # Oh no! word isn't in the dictionary
A set may be more appropriate than a dict, since the values in the sample aren't important. This should get you going, though.
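For what it's worth (my addition, not part of the original answer), the posted code fails for two reasons: dict_array contains lists (each split_line(item) result), so each_word.upper() never equals dict_array[i]; and `not i <= len(dict_array)` can never be true, because i stops at len(dict_array). A runnable set-based sketch, reusing the question's split_line:
import re

def split_line(line):
    return re.findall('[A-Za-z]+(?:\'\"[A-Za-z]+)?', line)

# build a set of upper-cased dictionary words (dictionary.txt has one word per line)
with open("dictionary.txt") as dictionary:
    known_words = set(word.upper() for item in dictionary for word in split_line(item))

# print every word in the chapter that is not in the dictionary
with open("AliceInWonderland200.txt") as chapter_1:
    for each_line in chapter_1:
        for each_word in split_line(each_line):
            if each_word.upper() not in known_words:
                print(each_word)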

How to save regular expression objects in a dictionary?

For a first-semester task I'm supposed to write a script that finds first and last names in a file and displays them in the order (last name, first name) next to the original entry (first name, last name).
The file has one entry per line, which looks as follows: "Srđa Slobodan ĐINIC POPOVIC".
My questions are probably basic, but I'm stuck:
How can I save all the entries of the file in a hash (multi-part first names / multi-part last names)? With re.compile() and re.search() I only manage to get one result. With re.findall() I get all of them, but I can't group() them, and I get encoding errors.
How can I connect the original name entry (last name/first name) to the new entry (first name/last name)?
import re, codecs

file = codecs.open('FILE.tsv', encoding='utf-8')
test = file.read()
list0 = test.rstrip()
for word in list0:
    p = re.compile('(([A-Z]+\s\-?)+)')
    u = re.compile('((\(?[A-Z][a-z]+\)?\s?-?\.?)+)')
    hash1 = {}
    hash1[p.search(test).group()] = u.search(test).group()
    hash2 = {}
    hash2[u.search(test).group()] = p.search(test).group()
    print hash1, '\t', hash2
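No answer was posted, but here is a minimal sketch of one approach (an assumption on my part: the last-name parts are the ALL-CAPS words, as in the example entry). Processing the file line by line avoids searching the whole text at once, sidesteps the findall/group() problem, and builds the dictionary as you go:
# -*- coding: utf-8 -*-
import codecs

names = {}
with codecs.open('FILE.tsv', encoding='utf-8') as f:
    for line in f:
        entry = line.strip()
        if not entry:
            continue
        words = entry.split()
        # heuristic: last-name parts are fully upper-case, first names are not
        first = [w for w in words if not w.isupper()]
        last = [w for w in words if w.isupper()]
        names[entry] = u' '.join(last) + u', ' + u' '.join(first)

for original, reordered in names.items():
    print original, '\t', reordered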
