Python 3: search for a specific string in a file

I have a txt file with strings paired like "sun - moon". Given a string from user input, I want to get the value paired with it (no matter which side of the pair it is on). If the string is not in the file, I want to create a new pair and write it to the file:
user_input = input()
if user_input in open('base.txt').read():
    print(True) # it's just to be sure that everything really works
else:
    base_file = open('base.txt', 'a+')
    base_file.write(user_input)
    base_file.write('\n')
    base_file.close()

import pickle
myDictA = {'sun': 'moon'}
with open('myFile.pkl', 'wb') as file:  # pickle needs binary mode in Python 3
    pickle.dump(myDictA, file)
with open('myFile.pkl', 'rb') as file:
    myDictB = pickle.load(file)
print("myDictA:", myDictA)
print("myDictB:", myDictB)
You can also integrate gzip into the save/load process to save disk space if you want. In Python 2, cPickle was a faster alternative with the same interface. In Python 3 it no longer exists as a separate module, because pickle uses its C implementation automatically when it is available.
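A minimal sketch of the gzip idea mentioned above, assuming the pairs fit in a single dictionary. The file name pairs.pkl.gz and the helper names save_pairs and load_pairs are made up for illustration. Only unpickle files you created yourself, since loading a pickle from an untrusted source can run arbitrary code.

```python
import gzip
import os
import pickle
import tempfile


def save_pairs(pairs, path):
    """Pickle the dictionary into a gzip-compressed file."""
    with gzip.open(path, "wb") as fh:
        pickle.dump(pairs, fh)


def load_pairs(path):
    """Read the dictionary back from a gzip-compressed pickle file."""
    with gzip.open(path, "rb") as fh:
        return pickle.load(fh)


# Demo in a temporary directory so no real file is overwritten.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pairs.pkl.gz")  # hypothetical file name
    save_pairs({"sun": "moon"}, path)
    restored = load_pairs(path)

print(restored)
```

For a dictionary this small the compressed file is unlikely to be smaller than the plain pickle; compression only starts to pay off with larger data.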

A small addition to the current code:
user_input = input()
flag = 1
with open('base.txt') as f:
    data = f.read()
    if user_input in data:
        print(data)
        flag = 0
if flag:
    base_file = open('base.txt', 'a+')
    base_file.write(user_input)
    base_file.write('\n')
    base_file.close()
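Note that `user_input in data` is a substring test, so "on" would match "moon", and it does not return the other half of the pair. Here is a rough sketch of a pair lookup, assuming every line has the form "left - right" as in the question. The function names find_partner and add_pair and the sample words are hypothetical.

```python
import os
import tempfile


def find_partner(path, word):
    """Return the word paired with `word`, or None if it is not in the file."""
    with open(path) as fh:
        for line in fh:
            left, sep, right = line.strip().partition(" - ")
            if not sep:
                continue  # skip lines that are not "left - right" pairs
            if word == left:
                return right
            if word == right:
                return left
    return None


def add_pair(path, left, right):
    """Append a new "left - right" pair to the file."""
    with open(path, "a") as fh:
        fh.write(f"{left} - {right}\n")


# Demo in a temporary directory instead of the real base.txt.
with tempfile.TemporaryDirectory() as tmp:
    base = os.path.join(tmp, "base.txt")
    add_pair(base, "sun", "moon")
    partner_of_moon = find_partner(base, "moon")
    partner_of_sea = find_partner(base, "sea")
    if partner_of_sea is None:
        add_pair(base, "sea", "sky")  # create the missing pair
    partner_after_add = find_partner(base, "sea")

print(partner_of_moon, partner_of_sea, partner_after_add)
```

In the real program the word would come from input(), and the second half of a new pair would have to be asked for as well.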

Related

file.write can't save int

I am building a save system for a game I'm making. I'm trying to save all of the resources you get in the game so you can load them the next time you play. I was going to use file.write, as I saw it being used in other games, but it can't save the variables as ints. Is there a workaround, or a different way of saving that I could use to do this?
from Resources import *
def start_new():
    Q = int(input('which save file do you want to save to? 1, 2, or 3.'))
    if Q == 1:
        file = open("save1.txt", "w")
        file.write(Manpower)
        file.write(Food)
        file.write(Food_Use)
        file.write(Wood)
        file.write(Farmers)
        file.write(Food_Income)
        file.write(FarmNum)
        file.write(MaxFarmer)
        file.write(Deforestation)
        file.write(Trees)
        file.write(Tree_Spread)
        file = open("save1.txt", "r")

Convert the integer values to strings. You can do this in several ways. As an example, let's write Manpower (assuming this variable is an int) to the file.
A small piece of advice: there is no need to call file.close() when using a with statement. The with statement itself ensures that the file is properly closed.
with open("save1.txt", 'w') as f:
    f.write(str(Manpower))
or even better:
with open("save1.txt", 'w') as f:
    f.write(f"{Manpower}\n")
\n is the EOL (end-of-line) character. Anything written after it starts on the next line. You can use it to separate the values so that you can tell them apart when reading them back.
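To illustrate the round trip, here is a small sketch that writes several integers one per line and converts them back with int() when loading. The helper names save_values and load_values and the numbers are placeholders, not part of the original game code.

```python
import os
import tempfile


def save_values(path, values):
    """Write each integer on its own line."""
    with open(path, "w") as f:
        for value in values:
            f.write(f"{value}\n")


def load_values(path):
    """Read the lines back and convert each one to int."""
    with open(path) as f:
        return [int(line) for line in f]


# Demo in a temporary directory so no real save file is touched.
with tempfile.TemporaryDirectory() as tmp:
    save_path = os.path.join(tmp, "save1.txt")
    save_values(save_path, [100, 50, 5])  # placeholder resource amounts
    loaded = load_values(save_path)

print(loaded)
```

The values come back in the order they were written, so the loading code has to assign them to the variables in that same order.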

In your code, you also need to close the file after making changes:
file.close()
Your code, with the close calls added (and each value converted with str(), since file.write only accepts strings):
from Resources import *
def start_new():
    Q = int(input('which save file do you want to save to? 1, 2, or 3.'))
    if Q == 1:
        file = open("save1.txt", "w")
        file.write(str(Manpower))
        file.write(str(Food))
        file.write(str(Food_Use))
        file.write(str(Wood))
        file.write(str(Farmers))
        file.write(str(Food_Income))
        file.write(str(FarmNum))
        file.write(str(MaxFarmer))
        file.write(str(Deforestation))
        file.write(str(Trees))
        file.write(str(Tree_Spread))
        file.close()
        file = open("save1.txt", "r")
        # stuff
        file.close()

file.read() output string position?

I didn't know how to word the title, but here's the code:
file = open('blah.txt', 'r')
line = file.read()
file.close()
stuff = input('enter stuff here')
if stuff in line:
    print('its in the file')
else:
    print('its not in the file')
and here's blah.txt:
text,textagain,textbleh
If the user input is found in the file, is there a way to also output the position of the string entered by the user? For example, if the user entered 'textagain', is there a way to output '2' because the string 'textagain' is in the second position in the file?
Thank you in advance.

What @Amiga500 said would likely work with some wrangling, but I suggest splitting your string up.
"text,textagain,textbleh".split(",") returns ['text', 'textagain', 'textbleh'], at which point you can call .index(your_word) and get back the index (1 for textagain, since Python uses zero-based indexing and it is the second entry).
So the final code might be:
file = open('blah.txt', 'r')
line = file.read()
file.close()
stuff = input('enter stuff here')
if stuff in line.split(","):
    print('its in the file')
else:
    print('its not in the file')
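The code above checks membership but does not yet print the position. A short sketch of that last step follows, assuming the file content is a single comma-separated line. The function name position_of is made up for illustration. It adds 1 to the zero-based index so that 'textagain' gives 2, as asked in the question.

```python
def position_of(word, line):
    """Return the 1-based position of `word` in a comma-separated line, or None."""
    items = line.strip().split(",")  # strip() drops the trailing newline from read()
    if word in items:
        return items.index(word) + 1
    return None


position = position_of("textagain", "text,textagain,textbleh\n")
print(position)
```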

Try this:
line.index(stuff)
i.e.
if stuff in line:
    posit = line.index(stuff)
If you jump straight to index() and the string is not there, it raises a ValueError. You could use try/except as a workaround, but it's ugly.
Ah, sorry, misread. You've a csv file.
Use the csv reader in Python to read it in (see the question "Python import csv to list").
Then loop through (assuming line is the sub-list for each line):
if stuff in line:
    posit = line.index(stuff)
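A possible version of that loop using the standard csv module is sketched below. The function name find_in_csv and the sample data are hypothetical, and io.StringIO stands in for the opened file so the example is self-contained.

```python
import csv
import io


def find_in_csv(word, fh):
    """Return (row_number, column_number), both 1-based, or None if absent."""
    for row_number, row in enumerate(csv.reader(fh), start=1):
        if word in row:
            return row_number, row.index(word) + 1
    return None


# io.StringIO stands in for open('blah.txt', newline='')
sample = io.StringIO("text,textagain,textbleh\nmore,words,here\n")
result = find_in_csv("words", sample)
print(result)
```

Unlike a plain split(","), csv.reader also copes with quoted fields that contain commas.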

AttributeError: 'str' object has no attribute 'readlines'. Where did I go wrong in my code?

I am trying to generate the reverse complement for DNA sequences of multiple file types with a python script. Here is what I have written so far
import gzip
import re
############## Reverse Complement Function #################################
def rev_comp(dna):
    dna_upper = dna.upper() #Ensures all input is capitalized
    dna_rev = dna_upper[::-1] #Reverses the string
    conversion = {'A':'T','C':'G','G':'C','T':'A','Y':'R','R':'Y',\
                  'S':'S','W':'W','K':'M','M':'K','B':'V','V':'B',\
                  'D':'H','H':'D','N':'N','-':'-'}
    rev_comp = ''
    rc = open("Rev_Comp.fasta", 'w')
    for i in dna_rev:
        rev_comp += conversion[i]
    rc.write(str(rev_comp))
    print("Reverse complement file Rev_Comp.fasta written to directory")
x = input("Enter filename (with extension) of the DNA sequence: ")
if x.endswith(".gz"): #Condition for gzip files
    with gzip.open(x, 'rb') as f:
        file_content = f.read()
        new_file = open("unzipped.fasta", 'w')
        new_file.write(str(file_content))
        print("unzipped.fasta written to directory")
xread = x.readlines()
fast = ''
if x.endswith(".fasta"): #condition for fasta files
    for i in xread:
        if not i.startswith('>'):
            fast = fast + i.strip('\n')
if x.endswith(".fastq"): #condition for fastq files
    for i in range(1,len(xread),4):
        fast = fast + xread[i].strip('\n')
rev_comp(x)
And what I wind up with is
AttributeError: 'str' object has no attribute 'readlines'
when I try to run the script using a .fastq file. What exactly is going wrong here? I expect the script to write Rev_comp.fasta, but it doesn't.

x is not a filehandle, just a file name. You need to do
with open(x) as xhandle:
    xread = xhandle.readlines()
The overall logic might be better if you don't read all lines into memory. Also, the .gz case ends up in vaguely undefined territory; do you need to set x to the name of the unzipped data at the end of the gz handling, or perhaps put the code after it into an else: branch?
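One way to follow both suggestions is sketched here, under the assumption that a .gz file contains FASTA or FASTQ text. The helper names open_sequence_file and read_sequence are invented for this example. The file is iterated line by line instead of calling readlines(), and gzip.open in text mode ('rt') removes the need for a separate unzipped copy.

```python
import gzip
import os
import tempfile


def open_sequence_file(name):
    """Open plain or gzip-compressed files in text mode."""
    if name.endswith(".gz"):
        return gzip.open(name, "rt")
    return open(name)


def read_sequence(lines, fmt):
    """Join the sequence lines of a FASTA or FASTQ stream into one string."""
    parts = []
    for number, line in enumerate(lines):
        line = line.strip()
        if fmt == "fasta" and not line.startswith(">"):
            parts.append(line)  # every non-header line is sequence
        elif fmt == "fastq" and number % 4 == 1:
            parts.append(line)  # second line of each 4-line record
    return "".join(parts)


# Demo with small temporary files (made-up sequences).
with tempfile.TemporaryDirectory() as tmp:
    gz_name = os.path.join(tmp, "demo.fasta.gz")
    with gzip.open(gz_name, "wt") as out:
        out.write(">seq1\nACGT\nTTAA\n")
    with open_sequence_file(gz_name) as handle:
        fasta_demo = read_sequence(handle, "fasta")

    fq_name = os.path.join(tmp, "demo.fastq")
    with open(fq_name, "w") as out:
        out.write("@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n")
    with open_sequence_file(fq_name) as handle:
        fastq_demo = read_sequence(handle, "fastq")

print(fasta_demo, fastq_demo)
```

The resulting string, rather than the file name, is what would then be passed to rev_comp.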

x is the input from the user, which is a string. You need to open a file to be able to call readlines on it.
Based on your existing code:
x = input("Enter filename (with extension) of the DNA sequence: ") # x stores a string
file_x = open(x, 'r') # You must open a file ...
xread = file_x.readlines() # and call readlines on the file instance.
# Although it is not strictly necessary, closing the file when you're done is good practice.
file_x.close()
or use the file as a context manager:
with open(x) as file_x:
    xread = file_x.readlines()

splitting a file into multiple files with a key word using python

I have a large text file that I want to split in two at a keyword using Python. The part of the file above the keyword must be copied to one file and the rest of the file to another. I want to save these files with different extensions in the same directory. Please help me with this.
Also, how do I convert a file from one format to another?
For example, .txt to .xml or .cite to .xml?

To answer the first part of your question, you can simply use the split function after reading the text and write the pieces to your new files:
with open('oldfile.txt', 'r') as fh:
    text_split = fh.read().split(keyword)
with open('newfile' + extension1, 'w') as fh:
    fh.write(text_split[0])
with open('newfile' + extension2, 'w') as fh:
    # If you know that the keyword only appears once
    # you can change this to fh.write(text_split[1])
    fh.write(keyword.join(text_split[1:]))
The second part of your question is much harder to answer. I don't know what file formats you are working with, but .txt files are just plain text with no specific structure, so there is no general way to convert an arbitrary format to XML. If your .txt files already contain XML, you can simply change the file extension to .xml. If you need to convert a structured format such as CSV, I suggest building the XML with a library such as lxml.
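As a rough illustration of what such a conversion involves, the sketch below wraps each line of plain text in an XML element using the standard library's xml.etree.ElementTree rather than lxml. The element names document and line, the number attribute and the function name lines_to_xml are arbitrary choices made for this example. A real conversion depends on the structure the target XML is supposed to have.

```python
import xml.etree.ElementTree as ET


def lines_to_xml(lines):
    """Wrap each non-empty text line in a <line> element under <document>."""
    root = ET.Element("document")  # hypothetical root element name
    for number, text in enumerate(lines, start=1):
        text = text.rstrip("\n")
        if text:
            child = ET.SubElement(root, "line", number=str(number))
            child.text = text  # special characters are escaped on output
    return ET.tostring(root, encoding="unicode")


xml_text = lines_to_xml(["first line\n", "a < b & c\n"])
print(xml_text)
```

Building the tree through the library, instead of concatenating strings, means characters such as < and & in the text are escaped automatically.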
Edit: If the file does not fit into memory, you can iterate through the lines instead:
with open('oldfile.txt', 'r') as fh:
    fh_new = open('newfile' + extension1, 'w')
    keyword_found = False
    line = fh.readline()
    while line:
        if not keyword_found:
            text_split = line.split(keyword)
            fh_new.write(text_split[0])
            if len(text_split) > 1:
                fh_new.close()
                keyword_found = True
                fh_new = open('newfile' + extension2, 'w')
                # join the remaining pieces back into a string; write() does not accept a list
                fh_new.write(keyword.join(text_split[1:]))
        else:
            fh_new.write(line)
        line = fh.readline()
    fh_new.close()

To split the file, this should do it (taking the size of the file into account):
import mmap
keyword = b'your keyword'
buf_size = 0xffff
with open('your_path_to_the_main_file', 'rb') as f:
    # map the file so the search does not load it all into memory
    s = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first_occurrence_position = s.find(keyword)
    s.close()
    if first_occurrence_position == -1:  # find() returns -1 when the keyword is absent
        raise SystemExit('keyword not found')
    # copy everything before the keyword into the first file, in chunks
    with open('your_path_to_the_first_part' + '.its_extension', 'wb') as first_part_file:
        remaining = first_occurrence_position
        while remaining > 0:
            b = f.read(min(buf_size, remaining))
            if not b:
                break
            first_part_file.write(b)
            remaining -= len(b)
    # copy the rest, starting at the keyword, into the second file
    with open('your_path_to_the_second_part' + '.its_extension', 'wb') as second_part_file:
        b = f.read(buf_size)
        while b:
            second_part_file.write(b)
            b = f.read(buf_size)

Read and process a text file and save to csv

The files I have seem to be in a "dict" format...
The file header is as follows: time,open,high,low,close,volume
The next line is as follows:
{"t":[1494257340],"o":[206.7],"h":[209.3],"l":[204.50002],"c":[204.90001],"v":[49700650]}
import csv
with open ('test_data.txt', 'rb') as f:
    for line in f:
        dict_file = eval(f.read())
        time = (dict_file['t']) # print (time) result [1494257340]
        open_price = (dict_file['o']) # print (open_price) result [206.7]
        high = (dict_file['h']) # print (high) result [209.3]
        low = (dict_file['l']) # print (low) result [204.50002]
        close = (dict_file['c']) # print (close) result [204.90001]
        volume = (dict_file['v']) # print (volume) result [49700650]
        print (time, open_price, high, low, close, volume)
        # print result [1494257340] [206.7] [209.3] [204.50002] [204.90001] [49700650]
        # I need to remove the [] from the output.
        # expected result
        # 1494257340, 206.7, 209.3, 204.50002, 204.90001, 49700650
The result I need is the following (with the time changed from epoch format to dd,mm,yy):
5/8/17, 206.7, 209.3, 204.50002, 204.90001, 49700650
So I know I need the csv.writer function.

I see a number of problems in the code you submitted. I recommend breaking your task into small pieces and checking that each one works on its own. So what you are trying to do is:
1. open a file
2. read the file line by line
3. eval each line to get a dict object
4. get values from that object
5. write those values to a (separate?) csv file
Right?
Now do each one, one small step at a time.
1. open a file
You're pretty much on point there:
with open('test_data.txt', 'rb') as f:
    print(f.read())
# b'{"t":[1494257340],"o":[207.75],"h":[209.8],"l":[205.75],"c":[206.35],"v":[61035956]}\n'
You can open the file in r mode instead, which gives you strings instead of bytes objects:
with open('test_data.txt', 'r') as f:
    print(f.read())
# {"t":[1494257340],"o":[207.75],"h":[209.8],"l":[205.75],"c":[206.35],"v":[61035956]}
Reading bytes might cause some problems, but it should work, since eval can handle them just fine (at least in Python 3).
2. read the file line by line
with open('test_data.txt', 'rb') as f:
    for line in f:
        print(line)
# b'{"t":[1494257340],"o":[207.75],"h":[209.8],"l":[205.75],"c":[206.35],"v":[61035956]}\n'
Here is another problem in your code: you're not using the line variable and are calling f.read() instead. That just reads the entire file (starting from the second line, since the first one has already been read). Try swapping one for the other and see what happens.
3. eval each line to get a dict object
Again, this works fine, but I would add some protection here. What if the file contains an empty line or a malformed one? Also, if this file comes from an untrusted source, you may become a victim of code injection, for example if a line in your file is changed to:
print("You've been hacked") or {"t":[1494257340],"o":[207.75],"h":[209.8],"l":[205.75],"c":[206.35],"v":[61035956]}
with open('test_data.txt', 'rb') as f:
    for line in f:
        dict_file = eval(line)
        print(dict_file)
# You've been hacked
# {'t': [1494257340], 'o': [207.75], 'h': [209.8], 'l': [205.75], 'c': [206.35], 'v': [61035956]}
I don't know your exact specifications, but you should be safer with json.loads instead.
...
Can you continue on your own from there?
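For reference, a minimal sketch of the json.loads approach with the protection described in this step. The function name parse_line is made up, and returning None for unusable lines is just one possible policy (raising an error or logging would work too).

```python
import json


def parse_line(line):
    """Parse one JSON line into a dict; return None for blank or malformed lines."""
    line = line.strip()
    if not line:
        return None
    try:
        data = json.loads(line)
    except json.JSONDecodeError:
        return None  # not valid JSON, so nothing gets executed
    return data if isinstance(data, dict) else None


good = parse_line('{"t":[1494257340],"o":[207.75]}\n')
blank = parse_line("\n")
attack = parse_line('print("You\'ve been hacked") or {"t":[1494257340]}')
print(good, blank, attack)
```

Because json.loads only parses data and never evaluates code, the injected print call is rejected as malformed input instead of being run.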
4. get values from the object
I think dict_file['t'] doesn't give you the value you expect.
What does it give you?
Why?
How can you fix it?
5. write those values to a csv file
Can you write some random string to a file?
What does the csv format look like? Can you format your values to match it?
Check the docs for the csv module. Can it be of help to you?
And so on and so forth...
EDIT: Solution
# you can save the print output in a file by running:
# $ python convert_to_csv.py > output.csv
import datetime, decimal, json, os
CSV_HEADER = 'time,open,high,low,close,volume'
with open('test_data.txt', 'rb') as f:
    print(CSV_HEADER)
    for line in f:
        data = json.loads(line, parse_float=decimal.Decimal)
        data['t'][0] = datetime.datetime.fromtimestamp(data['t'][0]) \
            .strftime('%#d/%#m/%y' if os.name == 'nt' else '%-d/%-m/%y')
        print(','.join(str(data[k][0]) for k in 'tohlcv'))
Running:
$ cat test_data.txt
{"t":[1494257340],"o":[207.75],"h":[209.8],"l":[205.75],"c":[206.35],"v":[61035956]}
{"t":[1490123123],"o":[107.75],"h":[109.8],"l":[105.75],"c":[106.35],"v":[11035956]}
{"t":[1491234234],"o":[307.75],"h":[309.8],"l":[305.75],"c":[306.35],"v":[31035956]}
$ python convert_to_csv.py
time,open,high,low,close,volume
8/5/17,207.75,209.8,205.75,206.35,61035956
21/3/17,107.75,109.8,105.75,106.35,11035956
3/4/17,307.75,309.8,305.75,306.35,31035956
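Since the question mentions csv.writer, here is a sketch of the same conversion written with it instead of print. It is an alternative under stated assumptions, not the answer's own code: the function name convert is hypothetical, timestamps are interpreted as UTC (the solution above uses local time), and the day and month are formatted by hand to avoid the platform-specific strftime flags.

```python
import csv
import datetime
import io
import json

FIELDS = "tohlcv"  # keys in the order time, open, high, low, close, volume


def convert(lines, out):
    """Write one CSV row per JSON line to the file-like object `out`."""
    writer = csv.writer(out)
    writer.writerow(["time", "open", "high", "low", "close", "volume"])
    for line in lines:
        data = json.loads(line)
        stamp = datetime.datetime.fromtimestamp(
            data["t"][0], tz=datetime.timezone.utc  # assumption: timestamps are UTC
        )
        date = f"{stamp.day}/{stamp.month}/{stamp:%y}"  # day/month without zero padding
        writer.writerow([date] + [data[k][0] for k in FIELDS[1:]])


# One line of the sample data; io.StringIO stands in for the output file.
source = ['{"t":[1494257340],"o":[207.75],"h":[209.8],"l":[205.75],"c":[206.35],"v":[61035956]}\n']
buffer = io.StringIO()
convert(source, buffer)
print(buffer.getvalue())
```

When writing to a real file, open it with newline='' as the csv module documentation recommends, for example open('output.csv', 'w', newline='').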
