Data Deletion in File Using Python - python

As a new Python programmer, I decided to build a student database management system. I got stuck on deleting a record from the data file. I planned to follow the steps below, but I don't know how to implement them correctly; the code I wrote does not work.
The algorithm:
STEP 1: Open the current file in read mode, and create a new file opened in write mode.
STEP 2: Read the data and copy it to the new file, skipping the line we want to delete.
STEP 3: Close both files, remove the old file, and rename the new file to the old file's name.
When I implement this, the layout of the file is not preserved.
Here is the code I wrote:
def delete():
    rollno = int(input('\n Enter The Roll number : '))
    f = open('BCAstudents3.txt','r')
    f1 = open('temp.txt','a+')
    for line in f:
        fo = line.split()
        if fo:
            if fo[3] != rollno:
                f1.write(str(str(fo).replace('[','').replace(']','').replace("'","").replace(",","")))
    f.close()
    f1.close()
    os.remove('BCAstudents3.txt')
    os.rename('temp.txt','BCAstudents3.txt')
The data in the original file looks like this (each record is on its own line):
Roll Number = 1 Name : Alex Section = C Optimisation Technique = 99 Maths III = 99 Operating System = 99 Software Engneering = 99 Computer Graphics = 99
Roll Number = 2 Name : Shay Section = C Optimisation Technique = 99 Maths III = 99 Operating System = 99 Software Engneering = 99 Computer Graphics = 99
and the result after the deletion is this:
Roll Number = 1 Name : Alex Section = C Optimisation Technique = 99 Maths III = 99 Operating System = 99 Software Engneering = 99 Computer Graphics = 99Roll Number = 2 Name : Shay Section = C Optimisation Technique = 99 Maths III = 99 Operating System = 99 Software Engneering = 99 Computer Graphics = 99
I also want to add a comma after the end of the data, but I have no idea how to do that.

I modified your code and it should now work the way you wanted. A couple of things to consider:
Your original text file seems to have a line break after each Roll Number record. My answer assumes that.
Because you are reading a text file, every field is a string, so fo[3] will never equal rollno if you convert the input to an int.
I wasn't sure exactly where you wanted the comma: after each line, or just at the very end?
I wasn't sure if you wanted a new line for each Roll Number.
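import os

def delete():
    # keep the input as a string so it can be compared with text read from the file
    rollno = input('\n Enter The Roll number : ')
    f = open('BCAstudents3.txt','r')
    f1 = open('temp.txt','a+')
    for line in f:
        fo = line.split()
        if fo:
            # fo[3] is the roll number in "Roll Number = 1 ..."
            if fo[3] != rollno:
                # append + "\n" here if each record should stay on its own line
                newline = " ".join(fo) + ","
                #print(newline)
                f1.write(newline)
    f.close()
    f1.close()
    os.remove('BCAstudents3.txt')
    os.rename('temp.txt','BCAstudents3.txt')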
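For reference, the three steps from the question can also be written with context managers, so both files are closed even if something fails partway. This is only a sketch: the name delete_record and its parameters are made up for illustration, and it assumes one record per line with the roll number as the fourth whitespace-separated token, as in the sample data.

```python
import os
import tempfile


def delete_record(path, rollno):
    """Copy every record except the one whose roll number is `rollno`.

    Assumes one record per line, shaped like "Roll Number = 1 Name : ...",
    so the roll number is the fourth whitespace-separated token.
    Returns True if a record was removed.
    """
    removed = False
    folder = os.path.dirname(os.path.abspath(path))
    # Step 1: open the original for reading and a temporary file for writing.
    with open(path, "r") as src, tempfile.NamedTemporaryFile(
        "w", dir=folder, delete=False
    ) as dst:
        # Step 2: copy everything except the matching record.
        for line in src:
            fields = line.split()
            if len(fields) > 3 and fields[3] == str(rollno):
                removed = True
                continue
            dst.write(line)  # the line keeps its own trailing newline
    # Step 3: swap the new file in place of the old one.
    os.replace(dst.name, path)
    return removed
```

Writing each line back unchanged is what keeps the records on separate lines; rebuilding the line from split() drops the newline unless you add it back yourself.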

I made your program a little simpler.
Hopefully you can use it:
def delete():
    line = input("Line you want to delete: ")
    line = int(line)
    line -= 1  # convert the 1-based line number to a 0-based index
    file = open("file.txt","r")
    data = file.readlines()
    file.close()  # close the read handle before reopening for writing
    del data[line]
    file = open("file.txt","w")
    for line in data:
        file.write(line)
    file.close()

Related

Pickle persistence

I recently started using Python and I'm trying to write a program that stores data with pickle. I would like my file to look something like this:
CODE | PIECE | PRICE
line one 1 1 1,00
line two 2 2 2,00
That is, 1 directly under CODE, 1 directly under PIECE, and 1,00 directly under PRICE, and so on until it reaches 50.
Here's the question: is there any way to do this using pickle? For example:
import pickle

columns = int(input('Number of columns : ')) # Which would be 3 (code, piece and price)
data = [ ]
for i in range(columns):
    raw = input('Enter data '+str(i)+' : ')
    data.append(raw)
file = open('file.dat', 'wb')
pickle.dump(data, file)
file.close()
Obviously, it cannot be done using input alone, so is there some way to do this?
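One possible approach, offered only as a sketch: pickle writes a binary representation of Python objects, so the file itself will not look like a table. You can pickle a list of rows and build the table layout when you read the data back. The function names, the HEADER constant, and the column widths below are assumptions made for illustration.

```python
import pickle

# Column titles taken from the layout in the question.
HEADER = ("CODE", "PIECE", "PRICE")


def save_rows(path, rows):
    """Pickle a list of (code, piece, price) rows to a binary file."""
    with open(path, "wb") as fh:
        pickle.dump(rows, fh)


def load_rows(path):
    """Read the pickled list of rows back."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def format_table(rows):
    """Return the rows as text, one line per row under the header."""
    lines = [" | ".join(HEADER)]
    for code, piece, price in rows:
        lines.append("{:<4} | {:<5} | {}".format(code, piece, price))
    return "\n".join(lines)
```

If the goal is a file you can open and read as a table, a plain text or CSV file is probably a better fit than pickle.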

Python File Reading & Writing

So I need to write a program that reads a text file, and copies its contents to another file. I then have to add a column at the end of the text file, and populate that column with an int that is calculated using the function calc_bill. I can get it to copy the contents of the original file to the new one, but I cannot seem to get my program to read in the ints necessary for calc_bill to run.
Any help would be greatly appreciated.
Here are the first 3 lines of the text file I am reading from:
CustomerID Title FirstName MiddleName LastName Customer Type
1 Mr. Orlando N. Gee Residential 297780 302555
2 Mr. Keith NULL Harris Residential 274964 278126
The file is copied to the new file exactly as intended. What does not work is writing bill_amount (in calc_bill) / billVal (in main) to a new column in the new file. Here is the expected output in the new file:
CustomerID Title FirstName MiddleName LastName Customer Type Company Name Start Reading End Reading BillVal
1 Mr. Orlando N. Gee Residential 297780 302555 some number
2 Mr. Keith NULL Harris Residential 274964 278126 some number
And here is my code:
def main():
    file_in = open("water_supplies.txt", "r")
    file_in.readline()
    file_out = input("Please enter a file name for the output:")
    output_file = open(file_out, 'w')
    lines = file_in.readlines()
    for line in lines:
        lines = [line.split('\t')]
        #output_file.write(str(lines)+ "\n")
        billVal = 0
        c_type = line[5]
        start = int(line[7])
        end = int(line[8])
        billVal = calc_bill(c_type, start, end)
        output_file.write(str(lines)+ "\t" + str(billVal) + "\n")

def calc_bill(customer_type, start_reading, end_reading):
    price_per_gallon = 0
    if customer_type == "Residential":
        price_per_gallon = .012
    elif customer_type == "Commercial":
        price_per_gallon = .011
    elif customer_type == "Industrial":
        price_per_gallon = .01
    if start_reading >= end_reading:
        print("Error: please try again")
    else:
        reading = end_reading - start_reading
        bill_amount = reading * price_per_gallon
    return bill_amount

main()
There are the issues mentioned in the other answers, but here is a small change to your main() function that works correctly.
def main():
    file_in = open("water_supplies.txt", "r")
    # skip the headers in the input file, and save for output
    headers = file_in.readline()
    # changed to raw_input to not require quotes (Python 2 only; in Python 3 use input())
    file_out = raw_input("Please enter a file name for the output: ")
    output_file = open(file_out, 'w')
    # write the headers back into output file
    output_file.write(headers)
    lines = file_in.readlines()
    for line in lines:
        # renamed variable here to split
        split = line.split('\t')
        bill_val = 0
        c_type = split[5]
        start = int(split[6])
        end = int(split[7])
        bill_val = calc_bill(c_type, start, end)
        # line is already a string, don't need to cast it
        # added rstrip() to remove trailing newline
        output_file.write(line.rstrip() + "\t" + str(bill_val) + "\n")
    # close both files so the output is flushed to disk
    file_in.close()
    output_file.close()
Note that the line variable in your loop includes the trailing newline, so you will need to strip that off as well if you're going to write it to the output file as-is. Your start and end indices were off by 1 as well, so I changed to split[6] and split[7].
It is a good idea not to require the user to type quotes around the filename, so keep that in mind as well. In Python 2, an easy way is to use raw_input instead of input; in Python 3, input already returns the typed text as a string.
Sample input file (from OP):
CustomerID Title FirstName MiddleName LastName Customer Type
1 Mr. Orlando N. Gee Residential 297780 302555
2 Mr. Keith NULL Harris Residential 274964 278126
$ python test.py
Please enter a file name for the output:test.out
Output (test.out):
1 Mr. Orlando N. Gee Residential 297780 302555 57.3
2 Mr. Keith NULL Harris Residential 274964 278126 37.944
There are a couple of things. The inconsistent spacing in your column names makes counting the actual columns a bit confusing, but I believe there are 9 column names there. However, each of your rows of data has only 8 elements, so it looks like you've got an extra column name (maybe "CompanyName"). So get rid of that, or fix the data.
Then your "start" and "end" variables are pointing to indexes 7 and 8, respectively. However, since there are only 8 elements in the row, I think the indexes should be 6 and 7.
Another problem could be that inside your for-loop through "lines", you set "lines" to the elements in that line. I would suggest renaming the second "lines" variable inside the for-loop to something else, like "elements".
Aside from that, I'd just caution you about naming consistency. Some of your column names are camel-case and others have spaces. Some of your variables are separated by underscores and others are camel-case.
Hopefully that helps. Let me know if you have any other questions.
You have two errors in handling your variables, both in the same line:
lines = [line.split()]
First, you assign the result to your lines variable, which held the entire file contents. You just lost the rest of your input data.
Second, you wrapped the return value of split in another list, which makes a list of lists.
Try this line:
line = line.split()
I got reasonable output with that change, once I make a couple of assumptions about your placement of tabs.
Also, consider not overwriting a variable with a different data semantic; it confuses the usage. For instance:
for record in lines:
    line = record.split()
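Putting the suggestions from the answers together, a Python 3 version might look like the sketch below. It assumes the input is tab-separated with the customer type at index 5 and the two readings at indices 6 and 7, as discussed above; add_bill_column, RATES, and the choice to return None for bad readings are illustrative assumptions, not part of the original code.

```python
# Price per gallon for each customer type, taken from the question's calc_bill.
RATES = {"Residential": 0.012, "Commercial": 0.011, "Industrial": 0.01}


def calc_bill(customer_type, start_reading, end_reading):
    """Return the bill, or None when the readings are not increasing."""
    if start_reading >= end_reading:
        return None
    return (end_reading - start_reading) * RATES.get(customer_type, 0)


def add_bill_column(in_path, out_path):
    """Copy a tab-separated file and append a BillVal column to each row."""
    with open(in_path) as src, open(out_path, "w") as dst:
        header = src.readline().rstrip("\n")
        dst.write(header + "\tBillVal\n")
        for record in src:
            fields = record.rstrip("\n").split("\t")
            if len(fields) < 8:
                continue  # skip blank or short lines
            bill = calc_bill(fields[5], int(fields[6]), int(fields[7]))
            dst.write("\t".join(fields) + "\t" + str(bill) + "\n")
```

Returning None instead of printing an error avoids the case in the original calc_bill where bill_amount is never assigned before the return statement.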

Python reading rows from csv, operating and organizing rows of numbers

I am a geographer, not a programmer; I have heard of some programming concepts but I am very much a newbie :-)
I need to read six rows of environmental data, 1000 lines at most each time.
Each row holds two-digit numbers (0 to 99). The data are from summer, so there are only positive numbers.
Once I have read them, I need to display the numbers 0 to 99 vertically, with the number of occurrences of each reading, for each of the six rows:
0 = 230.....0 = 3........0 = 230......0 = 123......0 = 223......0 = 334
1 = 67......1 = 657......1 = 627......1 = 767......1 = 467......1 = 337
2 = 762.....2 = 328......2 = 987......2 = 326......2 = 32.......2 = 123
.
.
99 = 3.....99 = 34.......99 = 1.......99 = 89......99 = 78......99 = 123
If I can get this far I will feel great. Once I learn how to do this and I can look at the data I can decide what makes sense to run next; excel, graphs, statistics, statistics in R, get the numbers into a matrix to manipulate from there, etc. First time so I am figuring this out as I go.
Any help will be much appreciated,
Adolfo
I am working in the research for the restoration of Quebrada Verde watershed in Valparaiso, Chile.
from array import array
import sys

if len(sys.argv) > 1:
    # one counter per possible value 0..99 ('H' = unsigned short)
    count = array('H', [0]*100)
    try:
        file = open(sys.argv[1], 'r')
    except OSError:
        # open() raises on failure rather than returning a false value
        print('unable to open the file')
    else:
        for line in file:
            count[int(line)]+=1
        file.close()
        for a in range (100):
            print(a, count[a], sep='\t')
else:
    print('usage: python', sys.argv[0], ' file')
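The script above counts a single number per line. If the six series sit side by side as comma-separated columns in one CSV file (an assumption, since the question does not show the file), a sketch along these lines would count each column separately. The function names and the n_columns parameter are made up for illustration.

```python
import csv
from collections import Counter


def count_columns(path, n_columns=6):
    """Return one Counter per column, mapping each reading to its count."""
    counts = [Counter() for _ in range(n_columns)]
    with open(path, newline="") as fh:
        for row in csv.reader(fh):
            # zip stops at the shorter of the two, so extra cells are ignored
            for counter, cell in zip(counts, row):
                cell = cell.strip()
                if cell:  # skip empty cells
                    counter[int(cell)] += 1
    return counts


def print_table(counts, max_value=99):
    """Print 'value = occurrences' for every column, one value per line."""
    for value in range(max_value + 1):
        print("\t".join("{} = {}".format(value, c[value]) for c in counts))
```

A Counter returns 0 for readings that never occur, so every value from 0 to 99 gets a line even when it is absent from the data.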

Creating column data from multiple sources with varying formats in python

So as part of my code, I'm reading file paths that have varying names, but tend to stick to the following format
p(number)_(temperature)C
What I've done with those paths is separate it into 2 columns (along with 2 more columns with actual data) so I end up with a row that looks like this:
p2 18 some number some number
However, I've found a few folders that use the following format:
p(number number)_(temperature)C
As it stands, for the first case, I use the following code to separate the file path into the proper columns:
def finale():
    for root, dirs, files in os.walk('/Users/Bashe/Desktop/12/'):
        file_name = os.path.join(root,"Graph_Info.txt")
        file_name_out = os.path.join(root,"Graph.txt")
        file = os.path.join(root, "StDev.txt")
        if os.path.exists(os.path.join(root,"Graph_Info.txt")):
            with open(file_name) as fh, open(file) as th, open(file_name_out,"w") as fh_out:
                first_line = fh.readline()
                values = eval(first_line)
                for value, line in zip(values, fh):
                    first_column = value[0:2]
                    second_column = value[3:5]
                    third_column = line.strip()
                    fourth_column = th.readline().strip()
                    fh_out.write("%s\t%s\t%s\t%s\n" % (first_column, second_column, third_column, fourth_column))
        else:
            pass
I've played around with things and found that if I make the following changes, the program works properly.
first_column = value[0:3]
second_column = value[4:6]
Is there a way I can get the program to look and see what the file path is and act accordingly?
Welcome to the fabulous world of regex.
import re
#..........
#case 0
if re.match(r"p\(\d+\).*", path) :
    pass  #stuff
#case 1
elif re.match(r"p\(\d+\s\d+\).*", path):
    pass  #other stuff
>>> for line in s.splitlines():
...     first,second = re.search("p([0-9 ]+)_(\d+)C",line).groups()
...     print first, " +",second
...
22 + 66
33 44 + 44
23 33 + 22
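The same pattern can be wrapped in a small Python 3 helper so the slicing no longer depends on how long the number part is. This is a sketch: parse_folder_name and FOLDER_PATTERN are hypothetical names, and the folder names used to try it out would be guesses at what the real paths look like.

```python
import re

# "p", then digits (optionally with spaces), "_", the temperature, then "C".
FOLDER_PATTERN = re.compile(r"p([0-9 ]+)_(\d+)C")


def parse_folder_name(name):
    """Return (sample, temperature) as strings, or None if nothing matches."""
    match = FOLDER_PATTERN.search(name)
    if match is None:
        return None
    sample, temperature = match.groups()
    return "p" + sample, temperature
```

Inside the loop, first_column and second_column could then come from parse_folder_name(value) instead of fixed slices such as value[0:2] and value[3:5].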

How do you make tables with previously stored strings?

The question gives me 19 DNA sequences and asks me to make a basic text table. The first column has to be the sequence ID, the second column the length of the sequence, the third the number of "A"s, the 4th the number of "G"s, the 5th "C"s, the 6th "T"s, the 7th the %GC, and the 8th whether or not the sequence contains "TGA". Then I take all these values and write a table to "dna_stats.txt".
Here is my code:
fh = open("dna.fasta","r")
Acount = 0
Ccount = 0
Gcount = 0
Tcount = 0
seq=0
alllines = fh.readlines()
for line in alllines:
    if line.startswith(">"):
        seq+=1
        continue
    Acount+=line.count("A")
    Ccount+=line.count("C")
    Gcount+=line.count("G")
    Tcount+=line.count("T")
    genomeSize=Acount+Gcount+Ccount+Tcount
    percentGC=(Gcount+Ccount)*100.00/genomeSize
    print "sequence", seq
    print "Length of Sequence",len(line)
    print Acount,Ccount,Gcount,Tcount
    print "Percent of GC","%.2f"%(percentGC)
    if "TGA" in line:
        print "Yes"
    else:
        print "No"
    fh2 = open("dna_stats.txt","w")
    for line in alllines:
        splitlines = line.split()
        lenstr=str(len(line))
        seqstr = str(seq)
        fh2.write(seqstr+"\t"+lenstr+"\n")
I found that you have to convert the variables into strings. I have all of the values calculated correctly when I print them out in the terminal. However, I keep getting only 19 for the first column, when it should go 1,2,3,4,5,etc. to represent all of the sequences. I tried it with the other variables and it just got the total amounts of the whole file. I started trying to make the table but have not finished it.
So my biggest issue is that I don't know how to get the values for the variables for each specific line.
I am new to python and programming in general so any tips or tricks or anything at all will really help.
I am using python version 2.7
Well, your biggest issue:
for line in alllines: #1
    ...
    fh2 = open("dna_stats.txt","w")
    for line in alllines: #2
        ....
Indentation matters. This says "for every line (#1), open a file and then loop over every line again(#2)..."
De-indent those things.
This puts the info in a dictionary as you go and allows DNA sequences to span multiple lines.
from __future__ import division # ensure things like 1/2 is 0.5 rather than 0
from collections import defaultdict

fh = open("dna.fasta","r")
alllines = fh.readlines()
fh2 = open("dna_stats.txt","w")
seq=0
data = dict()
for line in alllines:
    if line.startswith(">"):
        seq+=1
        # default value is zero if the key is not present, so += works without initializing
        data[seq]=defaultdict(int)
        data[seq]['seq']=seq
        previous_line_end = "" # "TGA" might be split across lines
        continue
    line = line.strip() # drop the newline so it does not get in the way below
    data[seq]['Acount']+=line.count("A")
    data[seq]['Ccount']+=line.count("C")
    data[seq]['Gcount']+=line.count("G")
    data[seq]['Tcount']+=line.count("T")
    # add only this line's bases; the counts above are already running totals
    data[seq]['genomeSize']+=sum(line.count(base) for base in "ACGT")
    line_over = previous_line_end + line[:2]
    data[seq]['hasTGA']= data[seq]['hasTGA'] or ("TGA" in line) or ("TGA" in line_over)
    previous_line_end = line[-2:] # save the end of this line for the next one
for seq in sorted(data):
    data[seq]['percentGC']=(data[seq]['Gcount']+data[seq]['Ccount'])*100.00/data[seq]['genomeSize']
    # columns: id, length, A, G, C, T, %GC, has TGA; one sequence per output line
    s = '%(seq)d, %(genomeSize)d, %(Acount)d, %(Gcount)d, %(Ccount)d, %(Tcount)d, %(percentGC).2f, %(hasTGA)s\n'
    fh2.write(s % data[seq])
fh.close()
fh2.close()
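If you can move to Python 3, another way to get per-sequence values is to join each record's lines into one string first and then count on that string. The sketch below assumes a FASTA-style input where each record starts with a ">" header line; fasta_stats and write_table are illustrative names, and numbering the sequences 1, 2, 3, ... follows the question rather than the header text.

```python
def fasta_stats(lines):
    """Yield (seq_number, length, A, G, C, T, percent_gc, has_tga) per record."""
    records = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            records.append([])  # start a new record at each header
        elif line and records:
            records[-1].append(line)
    for number, parts in enumerate(records, start=1):
        # Joining first means a "TGA" split across two lines is still found.
        sequence = "".join(parts)
        counts = {base: sequence.count(base) for base in "AGCT"}
        total = sum(counts.values())
        percent_gc = 100.0 * (counts["G"] + counts["C"]) / total if total else 0.0
        yield (number, len(sequence), counts["A"], counts["G"], counts["C"],
               counts["T"], percent_gc, "TGA" in sequence)


def write_table(lines, out_path):
    """Write one tab-separated row per sequence to out_path."""
    with open(out_path, "w") as out:
        for row in fasta_stats(lines):
            out.write("\t".join(str(value) for value in row[:6]))
            out.write("\t{:.2f}\t{}\n".format(row[6], "Yes" if row[7] else "No"))
```

Because every value is computed from one record's own string, nothing carries over from the previous sequence, which was the cause of the running totals in the original code.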
