How to scan a file for keywords?


So I was bored, and since I couldn't come up with anything to code myself, my friend suggested I write an anti-cheat tool. It's supposed to dump the web browser's history and then search through it for keywords. That way, when a user is under investigation, instead of having them screenshare while someone clicks through every file and manually checks the browser history, this would automate the task. Anyway, here's the code I've written so far:
from browser_history import get_history
output = get_history()
his = output.histories
outputfile = open("demo.txt", "w")
print(his, file=outputfile)
outputfile.close()
with open('demo.txt') as f:
    if "hack" in f.read():
        print("True")
It works, but I also want it to read keywords from a file and print each keyword that has been found. For example, if the user has searched for "minecraft cheat" or something similar, it should print that it found a search for "minecraft cheat".
I'm sorry if it's a dumb question, but I have spent quite a while looking and can't find a good tutorial on it. Also, while testing just now I noticed that for some reason it doesn't print any of today's history, only yesterday's. So if anyone knows a good way to get the history, I'd love to hear suggestions on how to improve the code.

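You just need to make a small change in how you write and read the file, so that each history entry ends up on its own line:
from browser_history import get_history
output = get_history()
his = output.histories
with open("demo.txt", "w") as f:
    for entry in his:
        # write() only accepts strings, so convert each history entry to text first
        f.write(str(entry) + "\n")
with open("demo.txt", "r") as f:
    for line in f:
        if "hack" in line:
            print("True")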
But since 'his' is already a list, you could read directly from it instead of storing it in a file first, but it's up to you!
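As a rough sketch of searching the list in memory, assuming each entry in `his` can be turned into text with `str()` (for example a timestamp and URL tuple): search queries are usually stored URL-encoded in the history (spaces become `+` or `%20`), so decoding with `urllib.parse.unquote_plus` lets a keyword like "minecraft cheat" match. The function name `find_keywords` and the sample entries are made up for illustration.

```python
from datetime import datetime
from urllib.parse import unquote_plus


def find_keywords(histories, keywords):
    """Return (keyword, entry) pairs for every keyword found in a history entry.

    Each entry is turned into text with str(), so the exact tuple layout does not
    matter. URL escapes are decoded ('+' and '%20' become spaces) and matching is
    case-insensitive, so "minecraft cheat" matches ...?q=minecraft+cheat.
    """
    found = []
    for entry in histories:
        text = unquote_plus(str(entry)).lower()
        for keyword in keywords:
            if keyword.lower() in text:
                found.append((keyword, entry))
    return found


if __name__ == "__main__":
    # Made-up sample entries; with the real library you would pass output.histories
    sample = [
        (datetime(2021, 1, 2, 10, 0), "https://www.google.com/search?q=minecraft+cheat"),
        (datetime(2021, 1, 2, 10, 5), "https://example.com/news"),
    ]
    for keyword, entry in find_keywords(sample, ["minecraft cheat", "hack"]):
        print(keyword, "found in", entry[1])
```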

I'd never heard of the browser-history library before. It's a pretty neat idea, and you could have a lot of fun with this project making gradual improvements to your program. Enjoy!

A small addition to the above answer, as I think you were saying you wanted to search for multiple keywords and print the keyword found rather than just "True". You could iterate over a list of keywords for each line as follows:
from browser_history import get_history
output = get_history()
his = output.histories
outputfile = open("demo.txt", "w")
print(his, file=outputfile)
outputfile.close()
keywords = ['hack', 'minecraft cheat', 'i love cheating']
with open("demo.txt", "r") as f:
    for line in f:
        for keyword in keywords:
            if keyword in line:
                print(keyword, "found")
It is important that the keyword loop is inside the line loop here: there can be lots of lines, and this way the file is read only once instead of once per keyword.
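The question also asked for the keywords to come from a file. A minimal sketch, assuming a hypothetical `keywords.txt` with one keyword per line (blank lines ignored) and the `demo.txt` dump written above; both helper names are made up for illustration.

```python
def load_keywords(path):
    """Read one keyword per line, skipping blank lines; lower-cased for matching."""
    with open(path) as f:
        return [line.strip().lower() for line in f if line.strip()]


def scan_file(path, keywords):
    """Return the keywords (in order of first appearance) that occur in the file.

    Keywords should already be lower-case (load_keywords does that). The keyword
    loop sits inside the line loop, so the file is read only once.
    """
    found = []
    with open(path) as f:
        for line in f:
            lowered = line.lower()
            for keyword in keywords:
                if keyword in lowered and keyword not in found:
                    found.append(keyword)
    return found


# Example with hypothetical file names (keywords.txt: one search term per line):
# for keyword in scan_file("demo.txt", load_keywords("keywords.txt")):
#     print(keyword, "found")
```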
By the way, I had a look at the browser-history documentation but couldn't work out why it doesn't return all of the history. For me it only goes back as far as 27 December, but my actual browser history goes back much further. Good luck getting to the bottom of that.

Related

WhatsApp Chat Analyzer with Python, what next?

I'm pretty new to Python, and since I don't learn much by just retyping the code the tutorial author shows, I figured it would be better to actually build something, so I decided on a WhatsApp chat analyzer.
I only got so far using Google, and now I'm stuck again. For reference, I'm using this website, which explains how to make the chat analyzer but doesn't actually give you any code.
This is what I've managed to do so far: read and print out the .txt file.
f = open(r"chat.txt","r+", encoding='utf-8')
file_contents = f.read()
print(file_contents)
That just outputs the entire .txt file of chats.
Next, the website says I should count the total number of messages and the total number of words.
It suggests doing something along these lines:
Strings are treated as lists. So you can do a search like this:
if "- Paridhi:" in chat_line:
    counter+=1

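You need to first define counter = 0, and then loop over the words so the counter goes up once per match (a single if statement can only ever add 1). Just like this:
splt = file_contents.split()  # list of all the words in the chat
print(splt)
counter = 0
for word in splt:
    if word == 'file':
        counter = counter + 1
print(counter)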
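To get both counts the website mentions, here is a rough sketch that assumes each message line in the export looks like `12/31/20, 9:15 PM - Paridhi: message text`. That format is an assumption (exports differ between phones, locales and app versions), and `analyze_chat` is a hypothetical helper name. Lines without ` - ` or `: ` (continuations of multi-line messages, system notices) are skipped, so words on continuation lines are not counted.

```python
from collections import Counter


def analyze_chat(lines):
    """Count messages per sender and total words.

    Assumes each message starts with 'date, time - Sender: text'; adjust the
    separators if your export looks different.
    """
    messages = Counter()
    total_words = 0
    for line in lines:
        _, dash, rest = line.partition(" - ")  # split off the 'date, time' prefix
        if not dash:
            continue  # no ' - ': probably a continuation of a multi-line message
        sender, colon, text = rest.partition(": ")
        if not colon:
            continue  # no 'Sender: ': probably a system notice
        messages[sender] += 1
        total_words += len(text.split())
    return messages, total_words


# Example (file name from the question):
# with open("chat.txt", encoding="utf-8") as f:
#     messages, total_words = analyze_chat(f)
# print(sum(messages.values()), "messages,", total_words, "words")
```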

Appending a text file to a text file

I've run into an error. I've been trying to append a text file to itself, like so:
file_obj = open("text.txt", "a+")
number = 6
def appender(obj, num):
    count = 0
    while count<=num:
        read = file_obj.read()
        file_obj.seek(0,2)
        file_obj.write(read)
        count+=1
appender(file_obj, number)
However, the text.txt file then fills up with strange symbols. At first the file contains only a simple "hello", but after running the code, it contains this:
hellohello䀀 猀· d娀 Ť搀Ŭ娀ͤ攀ɪ昀Ѥ萀 夀ɚ搀ť樀Ŧ搀茀 婙ݤ攀Ѫ昀ࡤ萀 夀њ搀
ɥ攀ժ昀൤
茀 婙୤攀ť樀ɦ搀茀 婙൤萀 ݚ搀࡚攀४攀ƃ娀਍搀⡓ 癳  祐桴湯䌠慨慲瑣牥䴠灡楰杮
䌠摯捥挠ㅰ㔲‰敧敮慲整⁤牦浯✠䅍偐义升嘯久佄卒䴯䍉䙓⽔䥗䑎坏⽓偃㈱〵吮员‧楷桴朠湥潣敤⹣祰
മഊ椊 and so on.
Any help will be appreciated

I think I can fix your problem, even though I can't reproduce it. There's a logic error: after you write, you fail to return to the start of the file for reading. In terms of analysis, you failed to do anything to diagnose the problem. At the very least, use a print statement to see what you're reading: that highlights the problem quite well. Here's the loop I used:
count = 0
while count<=num:
    file_obj.seek(0) # Read from the beginning of the file.
    read = file_obj.read()
    print(count, read) # Trace what we're reading.
    file_obj.seek(0, 2)
    file_obj.write(read)
    count+=1
This gives the expected output of 128 (2^(6+1)) repetitions of "hello".
EXTENSIONS
I recommend that you learn to use both the for loop and the with open ... as idiom. These will greatly shorten your program and improve the readability.
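A minimal Python 3 sketch of that suggestion: the same doubling loop written with `with open(...)` and a `for` loop, keeping the `seek(0)` fix from above. `append_to_itself` is a made-up name; calling it with `times=7` matches the original `while count <= 6` loop, which runs seven times.

```python
def append_to_itself(path, times):
    """Append the file's current contents to itself `times` times, doubling it each time."""
    with open(path, "a+") as f:  # 'a+': reading allowed, every write goes to the end
        for _ in range(times):
            f.seek(0)            # go back to the start before reading
            data = f.read()      # the whole current contents
            f.write(data)        # in 'a+' mode this always appends at the end


# Example: append_to_itself("text.txt", 7) turns "hello" into 128 copies of "hello".
```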

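I am using this code and everything is working as expected:
with open("file.txt", "r+") as f:
    lines = f.readlines()  # read everything first; the position is now at the end
    f.writelines(lines)    # then write it all back once, after the original text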

You just have the wrong mode - use 'r+' rather than 'a+'. See this link for a list of modes and an explanation of reading files.

Python - Use Delimiter to Cut Off Output

The solution to the problem below may seem pretty "basic" to some of you. I've tried tons of source code and done tons of reading to accomplish this task, and I constantly get output that's barely readable to me, code that simply doesn't execute, or loops that won't let me out.
I have tried using split(), splitlines(), re.sub() (after import re), replace(), etc.
But I have only been able to make them work on basic strings, not on text files that contain delimiters and newlines. I'm not entirely sure how to use for loops to iterate through text files, although I have used them in Python to create batch files that rely on increments. I am very confused about the current task.
=========================================================================
Problem:
I've created a text file (file.txt) that features the following info:
2847:784 3637354:
347263:9379 4648292:
63:38940 3547729:
I would like to use the first colon (:) as my delimiter and have my output print only the numbers that appear before it on each individual line. I want it to look like the following:
2847
347263
63
I've read several topics and have tried to play around with the coded solutions, but have not got the output I want, nor do I think I fully understand what many of these solutions are saying. I've read several books and websites on the topic to no avail, so I am resorting to asking for code that may help me; I will then play around with it to form my own understanding. I hope that does not make anyone feel as though they are working too hard on my behalf. What I have tried so far is:
tt = open('file.txt', 'r').read()
[i for i in tt if ':' not in i]

vv = open('file.txt', 'r').read()
bb = vv.split(':')
print(bb)

vv = open('file.txt', 'r').read()
bb = vv.split(':')
for e in bb:
    print(e)

vv = open('file.txt', 'r').read()
bb = vv.split(':')
lines = [line.rstrip('\n') for line in bb]
print(lines)

io = open('file.txt', 'r').read()
for line in io.splitlines():
    print(line.split(" ",1)[0])

with open('file.txt') as f:
    lines = f.readlines()
print(lines)
The output from each of these isn't what I want, and I'm not sure at all what I'm doing wrong. Is there a source I can consult for guidance? I have been reading the forum along with "Fluent Python," "Data Wrangling with Python," "Automate the Boring Stuff," and "Learn Python the Hard Way," and I have not been able to figure this problem out. Thanks in advance for the assistance.

Try this:
with open('file.txt') as myfile:
    for line in myfile:
        print(line.split(':')[0])
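That answer works because it handles the file one line at a time; most of the attempts in the question split the whole file at once, so colons and newlines from different lines got mixed together. If you would rather collect the results in a list and skip blank lines (such as an empty last line), a sketch could use `str.partition(':')`, which splits at the first colon only, just as `split(':')[0]` keeps only the part before it. The function name is hypothetical.

```python
def numbers_before_colon(lines):
    """Return the text before the first ':' on each non-blank line."""
    result = []
    for line in lines:
        line = line.strip()               # drop the trailing newline and surrounding spaces
        if not line:
            continue                      # skip blank lines
        head, _, _ = line.partition(":")  # split at the FIRST colon only
        result.append(head)
    return result


if __name__ == "__main__":
    # The sample lines from the question; with a real file pass the open file object instead
    sample = ["2847:784 3637354:\n", "347263:9379 4648292:\n", "63:38940 3547729:\n", "\n"]
    for number in numbers_before_colon(sample):
        print(number)
```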

Reading from, and then replacing all the text in a .txt file

I'm very new to Python (and coding in general, if I'm honest) and decided to learn by dipping into the Twitter API to make a weird Twitterbot that scrambles the words in a tweet and reposts them, _ebooks style.
Anyway, the way I have it currently set up, it pulls the latest tweet and then compares it to a .txt file with the previous tweet. If the tweet and the .txt file match (i.e., not a new tweet), it does nothing. If they don't, it replaces the .txt file with the current tweet, then scrambles and posts it. I feel like there's got to be a better way to do this than what I'm doing. Here's the relevant code:
words = hank[0]['text']
target = open("hank.txt", "r")
if words == "STOP":
    print("Sam says stop :'(")
    return
else:
    if words == target.read():
        print("Nothing New.")
    else:
        target.close()
        target = open("hank.txt", "w")
        target.write(words)
        target.close()
Obviously, opening as 'r' just to check it against the tweet, closing, and re-opening as 'w' is not very efficient. However, if I open as 'w+' it deletes all the contents of the file when I read it, and if I open it as 'r+', it adds the new tweet either to the beginning or the end of the file (dependent on where I set the pointer, obviously). I am 100% sure I am missing something TOTALLY obvious, but after hours of googling and dredging through Python documentation, I haven't found anything simpler. Any help would be more than welcome haha. :)

with open(filename, "r+") as f:
    data = f.read()  # read the current contents
    # ... any comparison of tweets etc.
    f.seek(0)  # move the pointer back to the start
    f.truncate()  # truncate at the current position (0), which clears the file
    f.write("most recent tweet")  # write the new contents
There's no need to close the file yourself; the with block closes it automatically.
Read the Python docs on these methods for a clearer picture.
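Putting that together with the check from the question, a rough Python 3 sketch could open the file only once. `update_if_new` is a hypothetical helper that returns True when the stored tweet was replaced; creating the file on the first run is an assumption about how you would want a missing file handled.

```python
def update_if_new(path, tweet):
    """Store `tweet` in the file unless it already holds exactly that text."""
    try:
        f = open(path, "r+", encoding="utf-8")
    except FileNotFoundError:
        f = open(path, "w+", encoding="utf-8")  # first run: create an empty file
    with f:
        if f.read() == tweet:
            return False   # nothing new
        f.seek(0)          # back to the start...
        f.truncate()       # ...clear the old tweet...
        f.write(tweet)     # ...and store the new one
        return True
```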

If you are that focused on efficiency, I suggest using a generator (yield) to compare hank.txt and words line by line, so that less memory is used.
As for the file operations, I don't think there is a better way to overwrite a file. If you are on Linux, maybe 'cat > hank.txt' could be faster. Just a guess.

Write strings to another file

The Problem - Update:
I could get the script to print its results, but had a hard time figuring out how to send that output to a file instead of the screen. The script below prints the results to the screen. I posted the solution after this code; scroll down to the [ solution ] at the bottom.
First post:
I'm using Python 2.7.3. I am trying to extract the last words of a text file after the colon (:) and write them into another .txt file. So far I am able to print the results on the screen and it works perfectly, but when I try to write the results to a new file I get an error saying str has no attribute write/writeline. Here is the code snippet:
# the txt file I'm trying to extract last words from and write strings into a file
#Hello:there:buddy
#How:areyou:doing
#I:amFine:thanks
#thats:good:I:guess
x = raw_input("Enter the full path + file name + file extension you wish to use: ")
def ripple(x):
    with open(x) as file:
        for line in file:
            for word in line.split():
                if ':' in word:
                    try:
                        print word.split(':')[-1]
                    except (IndexError):
                        pass
ripple(x)
The code above works perfectly when printing to the screen. However I have spent hours reading Python's documentation and can't seem to find a way to have the results written to a file. I know how to open a file and write to it with writeline, readline, etc, but it doesn't seem to work with strings.
Any suggestions on how to achieve this?
PS: I didn't add the code that caused the write error, because I figured this would be easier to look at.
End of First Post
The Solution - Update:
Managed to get python to extract and save it into another file with the code below.
The Code:
inputFile = open ('c:/folder/Thefile.txt', 'r')
outputFile = open ('c:/folder/ExtractedFile.txt', 'w')
tempStore = outputFile
for line in inputFile:
    for word in line.split():
        if ':' in word:
            splitting = word.split(':')[-1]
            tempStore.writelines(splitting +'\n')
            print splitting
inputFile.close()
outputFile.close()
Update:
Check out droogans' code rather than mine; it was more efficient.

Try this:
with open('workfile', 'w') as f:
    f.write(word.split(':')[-1] + '\n')
If you really want to use the print method, you can:
from __future__ import print_function  # __future__ imports must come before any other statement in the module
with open('workfile', 'w') as f:
    print("hi there", file=f)
according to Correct way to write line to file in Python. You need the __future__ import on Python 2; on Python 3, print is already a function, so the import isn't needed.

I think your question is good, and when you're done, you should head over to Code Review and get your code looked at for other things I've noticed:
# the txt file I'm trying to extract last words from and write strings into a file
#Hello:there:buddy
#How:areyou:doing
#I:amFine:thanks
#thats:good:I:guess
First off, thanks for putting example file contents at the top of your question.
x = raw_input("Enter the full path + file name + file extension you wish to use: ")
I don't think this part is necessary. You can just give ripple a better parameter name than x. I think file_loc is a pretty standard one.
def ripple(x):
    with open(x) as file:
With open, you can pass the mode explicitly, which marks what operation is happening to the file. I also like to name my file object according to its job. In other words, with open(file_loc, 'r') as r: reminds me that r.foo is going to be my file that is being read from.
for line in file:
    for word in line.split():
        if ':' in word:
Next, your for word in line.split() statement does nothing but put the "Hello:there:buddy" string into a list: ["Hello:there:buddy"]. A better idea would be to pass split an argument, which does more or less what you're trying to do here. For example, "Hello:there:buddy".split(":") would output ['Hello', 'there', 'buddy'], making your search for colons an accomplished task.
try:
    print word.split(':')[-1]
except (IndexError):
    pass
Another advantage is that you won't need to check for an IndexError: even an empty string, when split on ':', comes back as a list containing one empty string (['']), so [-1] always works. In other words, it'll write nothing for that line.
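The split behaviour described above is easy to check in an interpreter; this small sketch uses the sample lines from the question.

```python
# split() with no argument splits on whitespace, so a line without spaces stays whole
assert "Hello:there:buddy".split() == ["Hello:there:buddy"]

# split(':') breaks the line at every colon
assert "Hello:there:buddy".split(":") == ["Hello", "there", "buddy"]

# splitting an empty string on ':' still returns a one-element list,
# so [-1] is safe and no IndexError can occur
assert "".split(":") == [""]
assert "".split(":")[-1] == ""

# the last field is what the question wants to keep
print("thats:good:I:guess".split(":")[-1])  # prints: guess
```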
ripple(x)
For ripple(x), you would instead call ripple('/home/user/sometext.txt').
So, try looking over this, and explore Code Review. There's a guy named Winston who does really awesome work with Python and self-described newbies. I always pick up new tricks from that guy.
Here is my take on it, re-written out:
import os  # for building the output file name
def ripple(file_loc='/typical/location/while/developing.txt'):
    # 'developing.txt' -> 'developing.output.txt', written to the current directory
    name, ext = os.path.splitext(os.path.basename(file_loc))
    outfile = name + '.output' + ext
    with open(file_loc, 'r') as r:
        lines = r.read().splitlines()  # one string per line, without the '\n'
    with open(outfile, 'w') as w:
        w.write('\n'.join([line.split(':')[-1] for line in lines]))
ripple()
Try breaking this down, line by line, and changing things around. It's pretty condensed, but once you pick up comprehensions and using lists, it'll be more natural to read code this way.

You are trying to call .write() on a string object.
You either got your arguments mixed up (you'll need to call fileobject.write(yourdata), not yourdata.write(fileobject)) or you accidentally re-used the same variable for both your open destination file object and storing a string.
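To make that concrete, here is a Python 3 sketch of the whole task with the arguments the right way round: write() is called on the output file object and given the string. The function name and paths are hypothetical.

```python
def write_last_words(in_path, out_path):
    """Write the text after the last ':' of every line that contains a colon."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            line = line.rstrip("\n")
            if ":" in line:
                # fileobject.write(yourdata), not yourdata.write(fileobject)
                dst.write(line.split(":")[-1] + "\n")


# Example (hypothetical paths):
# write_last_words("c:/folder/Thefile.txt", "c:/folder/ExtractedFile.txt")
```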
