So, basically, I need a program that opens a .dat file, checks each line to see if it meets certain prerequisites, and, if it does, copies it into a new CSV file.
The prerequisites are that the line must 1) contain "$W" or "$S" and 2) end with a value from a long list of acceptable terms. (I can simply make up the terms and hardcode them into a list.)
For example, if the DAT was a list of purchase information and the last item was what was purchased, I would only want to include fruit. In this case, the last item is an ID tag, and I only want to accept a handful of them: there is a list of about 5 acceptable tags. The tags vary widely in length, but they are always the last item on the line (and always the 4th item).
Let me give a better example, again with the fruit.
My original .DAT might be:
DGH$G$H $2.53 London_Port Gyro
DGH.$WFFT$Q5632 $33.54 55n39 Barkdust
UYKJ$S.52UE $23.57 22#3 Apple
WSIAJSM_33$4.FJ4 $223.4 Ha25%ek Banana
Only the line "UYKJ$S.52UE $23.57 22#3 Apple" would be copied, because only it has both 1) $W or $S (in this case $S) and 2) a fruit as the last item. Once the .csv file is made, I am going to need to go back through it and replace all the spaces with commas, but that's not nearly as problematic for me as figuring out how to scan each line for requirements and only copy the ones that are wanted.
I am making a few programs all very similar to this one: they open .dat files, check each line to see if it meets the requirements, and then decide whether to copy it to the new file. But sadly, I have no idea what I am doing. They are all similar enough that once I figure out how to make one, the rest will be easy, though.
EDIT: The .DAT files are a few thousand lines long, if that matters at all.
EDIT2: Some of my current code snippets
Right now, my current version is this:
def main():
    #NewFile_Loc = C:\Users\J18509\Documents
    OldFile_Loc=raw_input("Input File for MCLG:")
    OldFile = open(OldFile_Loc,"r")
    OldText = OldFile.read()
    # for i in range(0, len(OldText)):
    #     if (OldText[i] != " "):
    #         print OldText[i]
    i = split_line(OldText)
    if u'$S' in i:
        # $S is in the line
        print i
main()
But it's very choppy still. I'm just learning python.
Brief update: the server I am working on is down, and might be for the next few hours, but I have my new code, which has syntax errors in it, but here it is anyways. I'll update again once I get it working. Thanks a bunch everyone!
import os
NewFilePath = "A:\test.txt"
Acceptable_Values = ('Apple','Banana')
#Main
def main():
    if os.path.isfile(NewFilePath):
        os.remove(NewFilePath)
    NewFile = open (NewFilePath, 'w')
    NewFile.write('Header 1,','Name Header,','Header 3,','Header 4)
    OldFile_Loc=raw_input("Input File for Program:")
    OldFile = open(OldFile_Loc,"r")
    for line in OldFile:
        LineParts = line.split()
        if (LineParts[0].find($W)) or (LineParts[0].find($S)):
            if LineParts[3] in Acceptable_Values:
                print(LineParts[1], ' is accepted')
                #This Line is acceptable!
                NewFile.write(LineParts[1],',',LineParts[0],',',LineParts[2],',',LineParts[3])
    OldFile.close()
    NewFile.close()
main()
There are two parts you need to implement. First, read the file line by line and write out the lines that meet specific criteria. This is done by
with open('file.dat') as f:
    for line in f:
        stripped = line.strip() # remove '\n' from the end of the line
        if test_line(stripped):
            print stripped # Write to stdout
The criteria you want to check for are implemented in the function test_line. To check for the occurrence of "$W" or "$S", you can simply use the in operator, like
if '$W' not in line and '$S' not in line:
    return False
else:
    return True
To check whether the last item in the line is contained in a fixed list, first split the line using split(), then take the last item using the index notation [-1] (negative indices count from the end of a sequence), and then use the in operator again against your fixed list. This looks like
items = line.split() # items is a list of strings
last_item = items[-1] # take the last element of the list
if last_item in ['Apple', 'Banana']:
    return True
else:
    return False
Now, you combine these two parts into the test_line function like
def test_line(line):
    if not '$W' in line and not '$S' in line:
        return False
    items = line.split() # items is an array of strings
    last_item = items[-1] # take the last element of the array
    if last_item in ['Apple', 'Banana']:
        return True
    else:
        return False
Note that the program writes the result to stdout, which you can easily redirect. If you want to write the output to a file, have a look at Correct way to write line to file in Python
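Putting the two checks together with the file-writing step, a minimal Python 3 sketch might look like the following. The names line_is_wanted, filter_lines, MARKERS and ACCEPTED_TAGS are made up for illustration, and the tag set is only a placeholder for the real list of accepted ID tags. It uses the csv module so the space-to-comma conversion happens while writing.

```python
import csv
import io

MARKERS = ("$W", "$S")               # substrings that qualify a line
ACCEPTED_TAGS = {"Apple", "Banana"}  # placeholder for the real accepted tags


def line_is_wanted(line):
    """Return True if the line contains a marker and ends with an accepted tag."""
    items = line.split()
    if not items:                    # blank lines never qualify
        return False
    has_marker = any(marker in line for marker in MARKERS)
    return has_marker and items[-1] in ACCEPTED_TAGS


def filter_lines(lines, out_file):
    """Write the wanted lines to out_file as CSV rows; return how many were written."""
    writer = csv.writer(out_file)
    written = 0
    for line in lines:
        if line_is_wanted(line):
            writer.writerow(line.split())  # whitespace-separated items become columns
            written += 1
    return written


# The sample data from the question, written to an in-memory buffer.
sample = [
    "DGH$G$H $2.53 London_Port Gyro\n",
    "DGH.$WFFT$Q5632 $33.54 55n39 Barkdust\n",
    "UYKJ$S.52UE $23.57 22#3 Apple\n",
    "WSIAJSM_33$4.FJ4 $223.4 Ha25%ek Banana\n",
]
buffer = io.StringIO()
count = filter_lines(sample, buffer)
print(count, repr(buffer.getvalue()))
```

For real files, an open input file can be passed as lines, and the output file should be opened with newline='' as the csv documentation recommends.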
inlineRequirements = ['$W','$S']
endlineRequirements = ['Apple','Banana']
# input_filename and output_filename are assumed to be set beforehand
inputFile = open(input_filename,'rb')
outputFile = open(output_filename,'wb')
for line in inputFile.readlines():
    line = line.strip()
    #trailing and leading whitespace has been removed
    if any(req in line for req in inlineRequirements):
        #passed inline requirement
        lastWord = line.split(' ')[-1]
        if lastWord in endlineRequirements:
            #passed endline requirement
            outputFile.write(line.replace(' ',',') + '\n')
            #replaced spaces with commas, restored the newline, and wrote to file
inputFile.close()
outputFile.close()
tags = ['apple', 'banana']
match = ['$W', '$S']
OldFile_Loc=raw_input("Input File for MCLG:")
OldFile = open(OldFile_Loc,"r")
for line in OldFile.readlines(): # Loop through the file
    line = line.strip() # Remove the newline and whitespace
    if line and not line.isspace(): # If the line isn't empty
        lparts = line.split() # Split the line
        if any(tag.lower() == lparts[-1].lower() for tag in tags) and any(c in line for c in match):
            # $S or $W is in the line AND the last section is in tags(case insensitive)
            print line
import re
list_of_fruits = ["Apple","Banana",...]
with open('some.dat') as f:
    for line in f:
        # raw string so that \$ reaches the regex engine unchanged
        if re.findall(r"\$[SW]",line) and line.split()[-1] in list_of_fruits:
            print "Found:%s" % line
import os
NewFilePath = r"A:\test.txt"
Acceptable_Values = ('Apple','Banana')
#Main
def main():
    if os.path.isfile(NewFilePath):
        os.remove(NewFilePath)
    NewFile = open (NewFilePath, 'w')
    NewFile.write('Header 1,Name Header,Header 3,Header 4\n')
    OldFile_Loc=raw_input("Input File for Program:")
    OldFile = open(OldFile_Loc,"r")
    for line in OldFile:
        LineParts = line.split()
        #Skip lines that do not have all four items
        if len(LineParts) < 4:
            continue
        if ('$W' in LineParts[0]) or ('$S' in LineParts[0]):
            if LineParts[3] in Acceptable_Values:
                print(LineParts[1] + ' is accepted')
                #This Line is acceptable!
                NewFile.write(','.join([LineParts[1],LineParts[0],LineParts[2],LineParts[3]]) + '\n')
    OldFile.close()
    NewFile.close()
main()
This worked great, and has all the capabilities I needed. The other answers are good, but none of them do 100% of what I needed like this one does.
I'm testing the code below, but it doesn't do what I would like it to do.
delete_if = ['#', ' ']
with open('C:\\my_path\\AllDataFinal.txt') as oldfile, open('C:\\my_path\\AllDataFinalFinal.txt', 'w') as newfile:
    for line in oldfile:
        if not any(del_it in line for del_it in delete_if):
            newfile.write(line)
print('DONE!!')
Basically, I want to delete any line that contains a '#' character (the lines I want to delete start with a '#' character). Also, I want to delete any/all lines that are completely blank. Can I do this in one go, by reading through items in a list, or will it require several passes through the text file to clean up everything? TIA.
It's easy. Check my code below:
filePath = "your old file path"
newFilePath = "your new file path"
# we are going to list which lines start with "#" or are just blank
marker = []
with open(filePath, "r") as file:
    content = file.readlines() # read all lines and store them into list
for i in range(len(content)): # loop into the list
    if content[i][0] == "#" or content[i] == "\n": # check if the line starts with "#" or just blank
        marker.append(i) # store the index into marker list
with open(newFilePath, "a") as file:
    for i in range(len(content)): # loop into the list
        if not i in marker: # if the index is not in marker list, then continue writing into file
            file.writelines(content[i]) # writing lines into file
The point is, we need to read all the lines first, and then check line by line whether each one starts with # or is just blank. If it does, we store its index in a list variable. After that, we write the new file, skipping every line whose index is in marker.
Let me know if you have a problem.
How about using the ternary operator?
#First option: within your for loop
line = "" if "#" in line or not line else line
#Second option: with list comprehension
newFile = ["" if not line or "#" in line else line for line in oldfile]
I wasn't sure at first whether the ternary would work on an empty string, but "#" in "" simply evaluates to False, so no exception is raised. If you prefer, you can also stage the conditions:
#Third option: "Staging your conditions" within your for loop
#First, make sure the string is not empty
if line:
    #If it has the "#" char in it, delete it
    if "#" in line:
        line = ""
#If it is empty, leave it empty
else:
    line = ""
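For the original goal (dropping comment lines and blank lines in one pass), here is a small Python 3 sketch. Note that a blank line read from a file is "\n" rather than "", so this version strips whitespace before testing. The helper names keep_line and clean_lines are invented for this example, and it assumes that a line counts as a comment only when its first non-blank character is '#'.

```python
def keep_line(line):
    """Return True unless the line is blank or starts with '#' (ignoring leading whitespace)."""
    stripped = line.strip()
    return bool(stripped) and not stripped.startswith("#")


def clean_lines(lines):
    """Yield only the lines worth keeping, in a single pass over the input."""
    for line in lines:
        if keep_line(line):
            yield line


sample = ["# a comment\n", "\n", "value = 1  # trailing note\n", "   \n", "last line\n"]
cleaned = list(clean_lines(sample))
print(cleaned)
```

Because clean_lines accepts any iterable of lines, an open file can be passed directly and the result handed to newfile.writelines(...), so only one pass over the file is needed.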
I am new to Python.
Scenario:
Search a file for the pattern apple=gravity.
Search for the key apple; if it exists, fetch the corresponding value.
If the line is apple=gravity, the case passes.
File structure (test.txt):
car=stop
green=go
apple=gravity
Please provide some suggestions as to how I can search value for key in file using Python
Sample:
f = open('test.txt', 'r')
wordCheck="apple=gravity";
for line in f:
    if 'wordCheck' == line:
        print ('found')
    else:
        print ('notfound')
        break
Split your line on =.
Check whether apple is the first element. If it is, print the second element.
Note:
When reading lines from a file, the '\n' character is still present. To get the lines without '\n', read the content of the file and use splitlines().
To keep things clean, also strip the whitespace from the beginning and end of each line to avoid glitches.
That is,
f = open('test.txt', 'r')
for line in map(str.strip,f.read().splitlines()):
    line = line.split('=')
    if 'apple' == line[0]:
        print line[1]
    else:
        print ('notfound')
Output:
notfound
notfound
gravity
Hope it helps!
Iterating through the file directly as you are doing, is just fine, and considered more 'Pythonic' than readlines() (or indeed read().splitlines()).
Here, I strip the newline from each line and then split by the = to get the two halves.
Then, I test for the check word, and if present print out the other half of the line.
Note also that I have used the with context manager to open the file. This makes sure that the file is closed, even if an exception occurs.
with open('test.txt', 'r') as f:
    wordcheck="apple"
    for line in f:
        key, val = line.strip().split('=')
        if wordcheck == key:
            print (val)
        else:
            print ('notfound')
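The loops above print 'notfound' once for every line that does not match. If a single answer per lookup is preferred, one possible Python 3 sketch is a small function that returns the value or None. The name lookup is made up here; str.partition is used instead of split so that a line without '=' does not raise an error.

```python
def lookup(lines, wanted_key):
    """Return the value for wanted_key in key=value lines, or None if it is absent."""
    for line in lines:
        key, sep, value = line.strip().partition("=")
        if sep and key == wanted_key:  # sep is empty when the line has no '='
            return value
    return None


sample = ["car=stop\n", "green=go\n", "apple=gravity\n"]
result = lookup(sample, "apple")
print(result)
```

With a real file, lookup(f, "apple") works the same way inside a with open('test.txt') as f: block, and the test case passes when the result equals "gravity".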
Using python I need to read a file and determine if all lines are the same length or not. If they are I move the file into a "good" folder and if they aren't all the same length I move them into a "bad" folder and write a word doc that says which line was not the same as the rest. Any help or ways to start?
You should use all():
with open(filename) as read_file:
    length = len(read_file.readline())
    if all(len(line) == length for line in read_file):
        pass  # Move to good folder
    else:
        pass  # Move to bad folder
Since all() is short-circuiting, it will stop reading the file at the first non-match.
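Since the question also asks which line differs, here is a hedged Python 3 sketch that reports the first offending line number instead of only True or False. The function name first_mismatch is invented for illustration. Unlike the snippet above, it strips the trailing newline before measuring, on the assumption that a final line without a newline should not count as shorter.

```python
def first_mismatch(lines):
    """Return the 1-based number of the first line whose length differs
    from the first line's length, or None if all lengths match."""
    expected = None
    for number, line in enumerate(lines, 1):
        length = len(line.rstrip("\n"))  # ignore the newline when measuring
        if expected is None:
            expected = length            # the first line sets the reference length
        elif length != expected:
            return number                # stop at the first mismatch
    return None


print(first_mismatch(["abc\n", "def\n", "gh\n"]))
print(first_mismatch(["abc\n", "def"]))
```

A None result would mean the file goes to the "good" folder; any other value is the line number to report.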
First off, you can read the file (here example.txt) and put all of its lines in a list, content:
with open(filename) as f:
    content = f.readlines()
Next, trim the newline characters from the end of each line and put the results in another list, result:
result = []
for line in content:
    line = line.strip()
    result.append(line)
Now it's not that hard to get the length of every line, and since you want the lines that are bad, you loop through the list:
lengths = []
for line in result:
    lengths.append(len(line))
So the i-th element of result has length lengths[i]. We can find the line length that occurs most often in the list with a single line:
most_occuring = max(set(lengths), key=lengths.count)
Now we can write another for loop to check which lengths don't match the most common one and add those lines to bad_lines:
bad_lines = []
for i in range(len(lengths)):
    if (lengths[i] != most_occuring):
        bad_lines.append([i, result[i]])
The next step is to check where the file needs to go, the good folder or the bad folder:
if len(bad_lines) == 0:
    #Good file, move it to the good folder, use the os or shutil module
    os.rename("path/to/current/file.foo", "path/to/new/destination/for/file.foo")
else:
    #Bad file, one or more lines are bad, thus move it to the bad folder
    os.rename("path/to/current/file.foo", "path/to/new/destination/for/file.foo")
The last step is writing the bad lines to another file, which is doable since we already have them in the list bad_lines:
with open("bad_lines.txt", "w") as f:
    for bad_line in bad_lines:
        f.write("[%3i] %s\n" % (bad_line[0], bad_line[1]))
It's not a doc file, but I think this is a nice start. You can take a look at the docx module if you really want to write to a doc file.
EDIT: Here is an example python script.
with open("example.txt") as f:
    content = f.readlines()
result = []
lengths = []
#Strip the file of \n
for line in content:
    line = line.strip()
    result.append(line)
    lengths.append(len(line))
most_occuring = max(set(lengths), key=lengths.count)
bad_lines = []
for i in range(len(lengths)):
    if (lengths[i] != most_occuring):
        #Append the bad_line to bad_lines
        bad_lines.append([i, result[i]])
#Check if it's a good, or a bad file
#if len(bad_lines) == 0:
    #Good File
    #Move file to the good folder...
#else:
    #Bad File
with open("bad_lines.txt", "w") as f:
    for bad_line in bad_lines:
        f.write("[%3i] %s\n" % (bad_line[0], bad_line[1]))
highest_score = 0
g = open("grades_single.txt","r")
arrayList = []
for line in highest_score:
    if float(highest_score) > highest_score:
        arrayList.extend(line.split())
g.close()
print(highest_score)
Hello, I wondered if anyone could help me; I'm having problems here. I have to read in a file which contains 3 lines. The first line is of no use, and neither is the third. The second contains a list of letters which I have to pull out (for instance all the As, all the Bs, all the Cs, all the way up to G); there are multiple occurrences of each letter. I have to be able to count how many of each there are with this program. I'm very new to this, so please bear with me if the code I wrote is wrong. I just wondered if anyone could point me in the right direction on how to pull out the letters on the second line and count them. I then have to do a mathematical function with these letters, but I hope to work that out for myself.
Sample of the data:
GTSDF60000
ADCBCBBCADEBCCBADGAACDCCBEDCBACCFEABBCBBBCCEAABCBB
*
You do not read the contents of the file. To do so, use the .read() or .readlines() method on your opened file. .readlines() reads each line in a file separately, like so:
g = open("grades_single.txt","r")
filecontent = g.readlines()
Since it is good practice to close your file right after reading its contents, follow directly with:
g.close()
Another option would be:
with open("grades_single.txt","r") as g:
    content = g.readlines()
The with statement closes the file for you (so you don't need to call .close() this way).
Since you need the contents of the second line only, you can choose that one directly:
content = g.readlines()[1]
.readlines() doesn't strip a line of its newline (which usually is \n), so you still have to do so:
content = g.readlines()[1].strip('\n')
The .count() method lets you count items in a list or in a string. So you could do:
dct = {}
for item in content:
    dct[item] = content.count(item)
This can be written more concisely with a dictionary comprehension:
dct = {item:content.count(item) for item in content}
Finally, you can get the highest count and print it:
highest_score = max(dct.values())
print(highest_score)
.values() returns the values of a dictionary, and max, well, returns the largest of them.
Thus the code that does what you're looking for could be:
with open("grades_single.txt","r") as g:
    content = g.readlines()[1].strip('\n')
    dct = {item:content.count(item) for item in content}
    highest_score = max(dct.values())
    print(highest_score)
highest_score = 0
arrayList = []
with open("grades_single.txt") as f:
    arrayList.extend(f.readlines()[1].strip())
print (arrayList)
This reads the second line of the file and extends arrayList with its characters; you can then do whatever you want with that list.
import re
# opens the file in read mode (and closes it automatically when done)
with open('my_file.txt', 'r') as opened_file:
    # Temporarily stores all lines of the file here.
    all_lines_list = []
    for line in opened_file.readlines():
        all_lines_list.append(line)
# This is the selected pattern.
# It basically means "match a single character from a to g"
# and ignores upper or lower case
pattern = re.compile(r'[a-g]', re.IGNORECASE)
# Which line i want to choose (assuming you only need one line chosen)
line_num_i_need = 2
# (1 is deducted since the first element in python has index 0)
matches = re.findall(pattern, all_lines_list[line_num_i_need-1])
print('\nMatches found:')
print(matches)
print('\nTotal matches:')
print(len(matches))
You might want to check regular expressions in case you need some more complex pattern.
To count the occurrences of each letter I used a dictionary instead of a list. With a dictionary, you can access each letter count later on.
d = {}
g = open("grades_single.txt", "r")
for i,line in enumerate(g):
    if i == 1:
        holder = list(line.strip())
g.close()
for letter in holder:
    d[letter] = holder.count(letter)
for key,value in d.iteritems():
    print("{},{}".format(key,value))
Outputs
A,9
C,15
B,15
E,4
D,5
G,1
F,1
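As an alternative to calling count() once per letter, the standard library's collections.Counter tallies everything in one pass. The following Python 3 sketch uses the sample data from the question held in a list (a stand-in for the lines of grades_single.txt), and prints the letters in sorted order rather than in dictionary order.

```python
from collections import Counter

# Stand-in for the three lines of grades_single.txt from the question.
sample = [
    "GTSDF60000\n",
    "ADCBCBBCADEBCCBADGAACDCCBEDCBACCFEABBCBBBCCEAABCBB\n",
    "*\n",
]

# Only the second line (index 1) holds the letters; strip its newline first.
counts = Counter(sample[1].strip())

for letter in sorted(counts):
    print("{},{}".format(letter, counts[letter]))
```

counts behaves like a dictionary, so counts["C"] gives the tally for a single letter and max(counts.values()) gives the highest one.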
One can treat the first line specially (and in this case ignore it) with next inside try: except StopIteration:. In this case, where you only want the second line, follow with another next instead of a for loop.
with open("grades_single.txt") as f:
    try:
        next(f) # discard 1st line
        line = next(f)
    except StopIteration:
        raise ValueError('file does not even have two lines')
# now use line
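The same idea generalises to any line number with itertools.islice, which skips ahead without reading the whole file into memory. This is only a sketch in Python 3: nth_line is a hypothetical helper name, and it returns None for a file that is too short instead of raising.

```python
from itertools import islice


def nth_line(lines, n):
    """Return line number n (1-based) without its newline, or None if there are fewer lines."""
    line = next(islice(lines, n - 1, n), None)  # skip n-1 lines, take one
    return None if line is None else line.rstrip("\n")


sample = ["GTSDF60000\n", "ADCBCBBCADEB\n", "*\n"]
second = nth_line(iter(sample), 2)
print(second)
```

With a real file, nth_line(f, 2) inside a with open(...) as f: block does the same job as the two next() calls.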
Here is an example text file:
the bird flew
the dog barked
the cat meowed
Here is my code to find the line number of the phrase I want to delete:
phrase = 'the dog barked'
with open(filename) as myFile:
    for num, line in enumerate(myFile, 1):
        if phrase in line:
            print 'found at line:', num
What can I add to this to be able to delete that line (num)?
I have tried
lines = myFile.readlines()
del line[num]
but this doesn't work. How should I approach this?
You could use the fileinput module to update the file - note this will remove all lines containing the phrase:
import fileinput
for line in fileinput.input(filename, inplace=True):
    if phrase in line:
        continue
    print(line, end='')
A user by the name of gnibbler posted something similar to this on another thread.
Modify the file in place: the offending line is replaced with spaces, so the remainder of the file does not need to be shuffled around on disk. You can also "fix" the line in place if the fix is not longer than the line you are replacing.
If the other program can be changed to output the file offset instead of the line number, you can assign the offset to p directly and do without the for loop.
import os
from mmap import mmap
phrase = 'the dog barked'
filename = r'C:\Path\text.txt'
def removeLine(filename, num):
    f=os.open(filename, os.O_RDWR)
    m=mmap(f,0)
    p=0
    for i in range(num-1):
        p=m.find('\n',p)+1
    q=m.find('\n',p)
    m[p:q] = ' '*(q-p)
    os.close(f)
with open(filename) as myFile:
    for num, line in enumerate(myFile, 1):
        if phrase in line:
            removeLine(filename, num)
            print 'Removed at line:', num
I found another solution that works efficiently and gets by without doing all the icky and not so elegant counting of lines within the file object:
del_line = 3 #line to be deleted: no. 3 (first line is no. 1)
with open("textfile.txt","r") as textobj:
    list = list(textobj) #puts all lines in a list
del list[del_line - 1] #delete regarding element
#rewrite the textfile from list contents/elements:
with open("textfile.txt","w") as textobj:
    for n in list:
        textobj.write(n)
Detailed explanation for those who want it:
(1) Create a variable containing an integer value of the line-number you want to have deleted. Let's say I want to delete line #3:
del_line = 3
(2) Open the text file and put it into a file-object. Only reading-mode is necessary for now. Then, put its contents into a list:
with open("textfile.txt","r") as textobj:
    list = list(textobj)
(3) Now every line should be an indexed element in "list". You can proceed by deleting the element representing the line you want to have deleted:
del list[del_line - 1]
At this point, if you got the line number to delete from user input, make sure to convert it to an integer first, since it will most likely be a string (if you used "input()").
It's del_line - 1 because the list's element index starts at 0. However, I assume you (or the user) start counting at "1" for line no. 1, in which case you need to deduct 1 to catch the correct element in the list.
(4) Open the text file again, this time in write mode, rewriting the complete file. After that, iterate over the updated list, writing every element of "list" into the file. You don't need to worry about newlines, because when you put the contents of the original file into a list (step 2), the \n characters were copied into the list elements as well:
with open("textfile.txt","w") as textobj:
    for n in list:
        textobj.write(n)
This has done the job for me when I wanted the user to decide which line to delete in a certain text file.
I think Martijn Pieters's answer does something similar; however, his explanation is too short for me to be able to tell.
Assuming num is the line number to remove:
import numpy as np
a=np.genfromtxt("yourfile.txt",dtype=None, delimiter="\n")
with open('yourfile.txt','w') as f:
    for el in np.delete(a,(num-1),axis=0):
        f.write(str(el)+'\n')
You start counting at one, but python indices are always zero-based.
Start your line count at zero:
for num, line in enumerate(myFile): # default is to start at 0
or subtract one from num, deleting from lines (not line):
del lines[num - 1]
Note that in order for your .readlines() call to return any lines, you need to either re-open the file first, or seek to the start:
myFile.seek(0)
Try
lines = myFile.readlines()
mylines = [x for x in lines if x.find(phrase) < 0]
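The list comprehension above only builds the filtered list; the file still has to be rewritten. A minimal Python 3 sketch of the whole round trip follows. The function name remove_lines_containing is invented for this example, and the demonstration runs on a temporary file so that no real data is touched. It reads the whole file into memory, which should be fine for files of a few thousand lines.

```python
import os
import tempfile


def remove_lines_containing(path, phrase):
    """Rewrite the file at path without the lines that contain phrase.
    Returns the number of lines removed."""
    with open(path) as src:
        lines = src.readlines()
    kept = [line for line in lines if phrase not in line]
    with open(path, "w") as dst:
        dst.writelines(kept)  # kept lines still carry their own newlines
    return len(lines) - len(kept)


# Demonstration on a throwaway file holding the example text.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("the bird flew\nthe dog barked\nthe cat meowed\n")
removed = remove_lines_containing(tmp.name, "the dog barked")
with open(tmp.name) as check:
    remaining = check.read()
os.remove(tmp.name)
print(removed, repr(remaining))
```

Like the fileinput approach, this removes every line containing the phrase, not just the first one.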
Implementing @atomh33ls's numpy approach
So you want to delete any line in the file that contains the phrase string, right? That is, the whole line rather than just the phrase itself.
import numpy as np
phrase = 'the dog barked'
nums = []
with open("yourfile.txt") as myFile:
    for num1, line in enumerate(myFile, 0):
        # Changing from enumerate(myFile, 1) to enumerate(myFile, 0)
        if phrase in line:
            nums.append(num1)
a=np.genfromtxt("yourfile.txt",dtype=None, delimiter="\n", encoding=None )
with open('yourfile.txt','w') as f:
    for el in np.delete(a,nums,axis=0):
        f.write(str(el)+'\n')
where text file is,
the bird flew
the dog barked
the cat meowed
produces
the bird flew
the cat meowed