The solution to the problem below may seem pretty "basic" to some of you; I've tried tons of source code and tons of reading to accomplish this task, and I constantly get output that's barely readable to me, code that simply doesn't execute, or loops that never let me out.
I have tried using: split(), splitlines(), import re with re.sub(), replace(), etc.
But I have only been able to make them succeed on basic strings, not on text files that have delimiters and involve newlines. I'm not entirely sure how to use for loops to iterate through text files, although I have used them in Python to create batch files that rely on increments. I am very confused about the current task.
=========================================================================
Problem:
I've created a text file (file.txt) that features the following info:
2847:784 3637354:
347263:9379 4648292:
63:38940 3547729:
I would like to use the first colon (:) as my delimiter and have my output print only the numbers that appear before it on each individual line. I want it to look like the following:
2847
347263
63
I've read several topics and have tried to play around with the coded solutions, but I have not received the output I desire, nor do I think I fully understand what many of these solutions are saying. I've read several books and websites on the topic to no avail, so what I am resorting to now is asking for code that may help me; then I will play around with it to form my own understanding. I hope that does not make anyone feel as though they are working too hard on my behalf. What I have tried so far is:
tt = open('file.txt', 'r').read()
[i for i in tt if ':' not in i]
vv = open('file.txt', 'r').read()
bb = vv.split(':')
print(bb)
vv = open('file.txt', 'r').read()
bb = vv.split(':')
for e in bb:
    print(e)
vv = open('file.txt', 'r').read()
bb = vv.split(':')
lines = [line.rstrip('\n') for line in bb]
print(lines)
io = open('file.txt', 'r').read()
for line in io.splitlines():
    print(line.split(" ", 1)[0])
with open('file.txt') as f:
    lines = f.readlines()
    print(lines)
The output from each of these doesn't give me what I desire, but I'm not sure what I'm doing wrong at all. Is there a source I can consult for guidance? I have been reading the forum along with "Fluent Python," "Data Wrangling with Python," "Automate the Boring Stuff," and "Learn Python the Hard Way," and I have not been able to figure this problem out. Thanks in advance for the assistance.
Try this:
with open('file.txt') as myfile:
    for line in myfile:
        print(line.split(':')[0])
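To check the idea against the sample data from the question, the same split can be run on in-memory lines (no file needed):

```python
# Sample lines copied from the question's file.txt.
lines = ["2847:784 3637354:", "347263:9379 4648292:", "63:38940 3547729:"]

# split(':') breaks each line at every colon; [0] keeps only
# the text before the first one.
firsts = [line.split(':')[0] for line in lines]
print(firsts)  # ['2847', '347263', '63']
```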
Related
I have a txt file of hundreds of thousands of words. I need to get it into some format (I think a dictionary is the right thing?) where I can put into my script something along the lines of:
for i in word_list:
    word_length = len(i)
    print("Length of " + i + " " + str(word_length), file=open("LengthOutput.txt", "a"))
Currently, the txt file has each word on its own line, if that helps. I've tried importing it into my Python script with
from x import y
.... and similar, but it seems like it needs to be in some format to actually get imported? I've been looking around Stack Overflow for a while now and nothing seems to really cover this specifically, but apologies if this is super-beginner stuff that I'm just really not understanding.
A list would be the correct way to store the words. A dictionary requires a key-value pair and you don't need it in this case.
with open('filename.txt', 'r') as file:
    x = [word.strip('\n') for word in file.readlines()]
What you are trying to do is to read a file. An import statement is used when you want to, loosely speaking, use python code from another file.
The docs have a good introduction on reading and writing files -
To read a file, you first open the file, load the contents to memory and finally close the file.
f = open('my_wordfile.txt', 'r')
for line in f:
    print(len(line))
f.close()
A better way is to use the with statement and you can find more about that in the docs as well.
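A sketch of the same loop using the with statement (the file name is kept from the answer above; a small sample file is written first so the snippet is self-contained). Note that each line read from the file still ends with '\n', so strip it if you don't want it counted:

```python
# Create a small sample word file so the example runs on its own.
with open('my_wordfile.txt', 'w') as f:
    f.write('apple\nbanana\n')

# The with statement closes the file automatically, even on errors.
with open('my_wordfile.txt') as f:
    lengths = [len(line.rstrip('\n')) for line in f]  # strip '\n' so it isn't counted

print(lengths)  # [5, 6]
```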
So I was bored and my friend suggested I should code an anti-cheat because I couldn't come up with anything to code myself. It's supposed to dump the web browser's history and then search through it using keywords. So when a user is under investigation, instead of them having to screen-share and click through every file and manually check the browser history, this would automate that task. Anyway, here's the code I've made so far.
from browser_history import get_history
output = get_history()
his = output.histories
outputfile = open("demo.txt", "w")
print(his, file=outputfile)
outputfile.close()
with open('demo.txt') as f:
    if "hack" in f.read():
        print("True")
It works but I also want it to read keywords out of a file and then print those keywords if they have been found. So for example if the user has searched for example "minecraft cheat" or something like that then it would print that it has found a search for "minecraft cheat".
I'm sorry if it's a dumb question, but I have spent quite a while looking and I can't really find any good tutorial on it. Also, I was just doing some testing now and for some reason it doesn't print any of the history from today, only yesterday. So if anyone knows of a good way to get the history, I'd love to hear suggestions on how to improve the code.
You just need to make a small change in how you read from the file:
from browser_history import get_history
output = get_history()
his = output.histories
with open("demo.txt", "w") as f:
    f.write(str(his))  # his is a list, so convert it to a string before writing

with open("demo.txt", "r") as f:
    for line in f:
        if "hack" in line:
            print("True")
Since 'his' is already a list, you could search it directly instead of storing it in a file first, but it's up to you!
I never heard of the browser-history library before, pretty neat idea and you could have a lot of fun with this project making gradual improvements to your program. Enjoy!
A small addition to the above answer as I think you were suggesting you wanted to search for multiple keywords and print the keyword found rather than just "True". You could iterate over a list of keywords for each line as follows:
from browser_history import get_history
output = get_history()
his = output.histories
outputfile = open("demo.txt", "w")
print(his, file=outputfile)
outputfile.close()
keywords = ['hack', 'minecraft cheat', 'i love cheating']
with open("demo.txt", "r") as f:
    for line in f:
        for keyword in keywords:
            if keyword in line:
                print(keyword, "found")
It is important that the keyword loop is inside the line loop here: the file is read as a stream, so nesting this way lets you check every keyword against each line in a single pass, instead of re-reading the whole file once per keyword.
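Since the question also asked to read the keywords out of a file, here is a minimal sketch of that variant (the file name keywords.txt and all sample contents are assumptions made up so the snippet runs standalone):

```python
# Made-up sample data so the sketch is self-contained.
with open('keywords.txt', 'w') as f:
    f.write('hack\nminecraft cheat\ni love cheating\n')
with open('demo.txt', 'w') as f:
    f.write('searched: minecraft cheat download\nvisited: example.com\n')

# One keyword per line; strip() drops the trailing newline and blank lines.
with open('keywords.txt') as f:
    keywords = [line.strip() for line in f if line.strip()]

found = []
with open('demo.txt') as f:
    for line in f:                # single pass over the history
        for keyword in keywords:  # keyword loop nested inside, as above
            if keyword in line:
                found.append(keyword)

print(found)  # ['minecraft cheat']
```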
By the way, I had a look at the browser-history documentation but couldn't work out why it doesn't return all history. For me it returns history back as far as 27th December, but my actual browser history goes back much further. Good luck getting to the bottom of that.
I've run into an error. I've been trying to append a text file to itself like so:
file_obj = open("text.txt", "a+")
number = 6

def appender(obj, num):
    count = 0
    while count <= num:
        read = file_obj.read()
        file_obj.seek(0, 2)
        file_obj.write(read)
        count += 1

appender(file_obj, number)
However, the text.txt file is then filled with strange ASCII symbols. At first, the file contains only a simple "hello", but after the code, it contains this:
hellohello䀀 猀· d娀 Ť搀Ŭ娀ͤ攀ɪ昀Ѥ萀 夀ɚ搀ť樀Ŧ搀茀 婙ݤ攀Ѫ昀ࡤ萀 夀њ搀
ɥ攀ժ昀
茀 婙攀ť樀ɦ搀茀 婙萀 ݚ搀࡚攀४攀ƃ娀搀⡓ 癳 祐桴湯䌠慨慲瑣牥䴠灡楰杮
䌠摯捥挠ㅰ㔲‰敧敮慲整牦浯✠䅍偐义升嘯久佄卒䴯䍉䙓⽔䥗䑎坏⽓偃㈱〵吮员‧楷桴朠湥潣敤祰
മഊ椊 and so on.
Any help will be appreciated
I think I can fix your problem, even though I can't reproduce it. There's a logic error: after you write, you never return to the start of the file before reading again. As for diagnosis, you did nothing to investigate the problem; at the very least, use a print statement to see what you're reading, which highlights the problem quite well. Here's the loop I used:
count = 0
while count <= num:
    file_obj.seek(0)  # Read from the beginning of the file.
    read = file_obj.read()
    print(count, read)  # Trace what we're reading.
    file_obj.seek(0, 2)
    file_obj.write(read)
    count += 1
This gives the expected output of 128 (2^(6+1)) repetitions of "hello".
EXTENSIONS
I recommend that you learn to use both the for loop and the with open(...) as idiom. These will greatly shorten your program and improve its readability.
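For example, the same doubling loop could be sketched with a for loop and with (the function signature and demo file contents here are hypothetical, not the asker's exact code):

```python
def appender(path, num):
    # num + 1 passes, matching the "while count <= num" loop above.
    for _ in range(num + 1):
        with open(path, 'r+') as f:
            content = f.read()  # 'r+' opens at the start, so this reads everything
            f.write(content)    # after read() the position is at the end: appends a copy

# Demo: a file containing one "hello" doubles on every pass.
with open('text.txt', 'w') as f:
    f.write('hello')
appender('text.txt', 2)
with open('text.txt') as f:
    print(f.read().count('hello'))  # 8, i.e. 2**(2+1)
```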
I am using this code and everything is working as expected:
with open("file.txt", "r+") as f:
    for line in f:
        f.write(line)
You just have the wrong mode - use 'r+' rather than 'a+'. See this link for a list of modes and an explanation of reading files.
Please do not behead me for my noob question. I have looked at many other questions on Stack Overflow concerning this topic, but haven't found a solution that works as intended.
The Problem:
I have a fairly large txt-file (about 5 MB) that I want to copy via readlines() or any other built-in string-handling function into a new file. For smaller files the following code sure works (only sketched schematically here):
f = open('C:/.../old.txt', 'r');
n = open('C:/.../new.txt', 'w');
for line in f:
    print(line, file=n);
However, as I found out here (UnicodeDecodeError: 'charmap' codec can't encode character X at position Y: character maps to undefined), internal restrictions of Windows prohibit this from working on larger files. So far, the only solution I came up with is the following:
f = open('C:/.../old.txt', 'r', encoding='utf8', errors='ignore');
n = open('C:/.../new.txt', 'a');
for line in f:
    print(line, file=sys.stderr) and append(line, file='C:/.../new.txt');
f.close();
n.close();
But this doesn't work: I do get a new.txt file, but it is empty. So how do I iterate through a long txt-file and write every line into a new txt-file? Is there a way to use sys.stderr as the source for the new file? (I actually don't have any idea what this sys.stderr is.)
I know this is a noob question, but I don't know where to look for an answer anymore.
Thanks in advance!
There is no need to use print(); just write() to the file:
with open('C:/.../old.txt', 'r') as f, open('C:/.../new.txt', 'w') as n:
    n.writelines(f)
However, it sounds like you may have an encoding issue, so make sure that both files are opened with the correct encoding. If you provide the error output perhaps more help can be provided.
BTW: Python doesn't use ; as a line terminator, it can be used to separate 2 statements if you want to put them on the same line but this is generally considered bad form.
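A sketch of the copy with explicit encodings, as suggested above (utf-8 and errors='replace' are assumptions, not taken from the question; a sample input file is created first so the snippet runs standalone):

```python
# Made-up sample input with non-ASCII characters.
with open('old.txt', 'w', encoding='utf-8') as f:
    f.write('héllo\nwörld\n')

# Open both files with an explicit encoding; errors='replace'
# substitutes undecodable bytes instead of raising an exception.
with open('old.txt', 'r', encoding='utf-8', errors='replace') as f, \
     open('new.txt', 'w', encoding='utf-8') as n:
    n.writelines(f)

with open('new.txt', encoding='utf-8') as f:
    print(f.read() == 'héllo\nwörld\n')  # True
```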
You can redirect standard output to a file, as in my code below. I successfully copied a 6 MB text file with this.
import sys

bigoutput = open("bigcopy.txt", "w")
sys.stdout = bigoutput
with open("big.txt", "r") as biginput:
    for bigline in biginput.readlines():
        print(bigline.replace("\n", ""))
sys.stdout = sys.__stdout__  # restore stdout so later prints go to the console again
bigoutput.close()
Why don't you just use the shutil module and copy the file?
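A minimal sketch of that shutil suggestion (file names hypothetical; a sample file is created first so the snippet runs standalone):

```python
import shutil

# Made-up sample input file.
with open('old.txt', 'w') as f:
    f.write('some text\nmore text\n')

# copyfile copies the raw bytes, so no encoding handling is needed at all.
shutil.copyfile('old.txt', 'new.txt')

with open('new.txt') as f:
    print(f.read() == 'some text\nmore text\n')  # True
```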
You can try this code; it works for me.
with open("file_path/../large_file.txt") as f:
    with open("file_path/../new_file", "w") as new_f:
        new_f.writelines(f.readlines())
So I'm trying to create a little script to deal with some logs. I'm just learning Python, but I know about loops and such in other languages. It seems that I don't quite understand how loops work in Python.
I have a raw log from which I'm trying to isolate just the external IP addresses. An example line:
05/09/2011 17:00:18 192.168.111.26 192.168.111.255 Broadcast packet dropped udp/netbios-ns 0 0 X0 0 0 N/A
And here's the code I have so far:
import os,glob,fileinput,re

def parseips():
    f = open("126logs.txt",'rb')
    r = open("rawips.txt",'r+',os.O_NONBLOCK)
    for line in f:
        rf = open("rawips.txt",'r+',os.O_NONBLOCK)
        ip = line.split()[3]
        res = re.search('192.168.',ip)
        if not res:
            rf.flush()
            for line2 in rf:
                if ip not in line2:
                    r.write(ip+'\n')
                    print 'else write'
                else:
                    print "no"
    f.close()
    r.close()
    rf.close()

parseips()
I have it parsing out the external IPs just fine. But, thinking like a ninja, I thought: how cool would it be to handle dupes? The idea was that I could check the file the IPs are being written to against the current line for a match, and if there is a match, not write. But this produces many more dupes than before :) I could probably use something else, but I'm liking Python and it makes me look busy.
Thanks for any insider info.
DISCLAIMER: Since you are new to python, I am going to try to show off a little, so you can lookup some interesting "python things".
I'm going to print all the IPs to console:
def parseips():
    with open("126logs.txt",'r') as f:
        for line in f:
            ip = line.split()[3]
            if ip.startswith('192.168.'):
                print "%s\n" %ip,
You might also want to look into:
f = open("126logs.txt",'r')
IPs = [line.split()[3] for line in f if line.split()[3].startswith('192.168.')]
Hope this helps,
Enjoy Python!
Something along the lines of this might do the trick:
import os,glob,fileinput,re

def parseips():
    prefix = '192.168.'
    # Preload partial IPs from the existing file (strip the trailing newline
    # from each stored line so lookups match later).
    if os.path.exists('rawips.txt'):
        with open('rawips.txt', 'rt') as f:
            partial_ips = set([line.strip()[len(prefix):] for line in f.readlines()])
    else:
        partial_ips = set()
    # One with statement can open both files (a second nested "with" here
    # would be a syntax error).
    with open('126logs.txt', 'rt') as infile, open('rawips.txt', 'at') as outfile:
        for line in infile:
            ip = line.split()[3]
            if ip.startswith(prefix) and ip[len(prefix):] not in partial_ips:
                partial_ips.add(ip[len(prefix):])
                outfile.write(ip + '\n')

parseips()
Rather than looping through the file you're writing, you might try just using a set. It might consume more memory, but your code will be much nicer, so it's probably worth it unless you run into an actual memory constraint.
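A minimal sketch of that set-based approach (the sample IPs are made up):

```python
seen = set()
unique = []
for ip in ['10.1.2.3', '8.8.8.8', '10.1.2.3', '1.1.1.1']:
    if ip not in seen:       # membership test on a set is O(1)
        seen.add(ip)
        unique.append(ip)    # in the real script: write to rawips.txt instead
print(unique)  # ['10.1.2.3', '8.8.8.8', '1.1.1.1']
```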
Assuming you're just trying to avoid duplicate external IPs, consider creating an additional data structure in order to keep track of which IPs have already been written. Since they're in string format, a dictionary would be good for this.
externalIPDict = {}
# code to detect external IPs goes here; when you get one:
if externalIPString in externalIPDict:
    pass  # do nothing, you found a dupe
else:
    externalIPDict[externalIPString] = 1
    # your code to add the external IP to your file goes here