I have a raw data set in a text file where each line is a new piece of data. I need to iterate through the file line by line and change the lines that are dates to a specific date format. These dates occur on lines 2, 9, 16, 23, etc.
In order to do this I need to iterate over only those specific lines, so as not to corrupt the data that is on the other lines.
Would there be any way to iterate this way in Python?
Here is a screencap of the data.
You can see that the lines I want manipulated are at lines 2, 9, 16, 23, etc.
The date ranges are in the format Month/Day - Month/Day, in case ye have any difficulty finding them.
I will also include the raw text, which can be found at this link:
Link to raw data
# my rough idea
infile = open("polling_Data.txt", "r")  # read mode; "W+" is not a valid mode, and "w+" would truncate the file
for line in infile:  # specified range
    pass  # code to edit date etc.
Let me know if ye have any relevant solutions. I know that some form of regex may be suitable, but I am open to all sorts of ideas. Thanks!
I strongly suggest writing to a new file and deleting the old one only afterwards, in case anything goes wrong along the way. You can do that using the following code:
import re
# {1,2} requires one or two digits per field; {,2} would also accept zero digits
month_day_regex = r"(\d{1,2}/\d{1,2} - \d{1,2}/\d{1,2})"
new_data = []
# reading the "polling_Data.txt" text file
with open("polling_Data.txt", "r") as infile:
    for line in infile.readlines():
        line = line.strip()
        if re.match(month_day_regex, line):
            new_data.append("##########")  # do whatever you want
        else:
            new_data.append(line)
Now the variable new_data holds the same data as the text file with one change: the Month/Day entries are replaced with ########## so that they stand out. Let's write this variable into a new file:
with open("new_polling_data.txt", "w") as outfile:
    for line in new_data:
        outfile.write(line + '\n')
And here is a screenshot of the new file
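If the dates really do sit at fixed positions, they could also be selected by line number instead of by pattern. The sketch below assumes the positions 2, 9, 16, 23 from the question mean "every 7th line starting at line 2", which is only a guess read off that sequence. The names `is_date_line` and `mark_date_lines` and the sample lines are made up for illustration.

```python
def is_date_line(line_number, first=2, step=7):
    """Return True for 1-based line numbers first, first+step, first+2*step, ..."""
    return line_number >= first and (line_number - first) % step == 0


def mark_date_lines(lines, marker="##########"):
    """Replace only the lines at date positions; leave every other line untouched."""
    result = []
    for number, line in enumerate(lines, start=1):
        result.append(marker if is_date_line(number) else line)
    return result


# Hypothetical sample: 9 lines, with date ranges on lines 2 and 9.
sample = ["a", "10/1 - 10/5", "b", "c", "d", "e", "f", "g", "10/6 - 10/9"]
marked = mark_date_lines(sample)
print(marked)
```

Matching by pattern is usually more robust than counting lines, since a single missing or extra line in the file shifts every later position.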
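Try this pattern, which matches one- or two-digit Month/Day pairs with an optional space on either side of the hyphen: \b\d\d?/\d\d?[ ]?-[ ]?\d\d?/\d\d?\b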
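As a rough sketch of how that pattern could be used to rewrite the dates rather than just find them: the version below adds capture groups around each number (not present in the pattern as posted) and passes a function to `re.sub`. The zero-padded "MM/DD - MM/DD" target format and the name `normalise_range` are assumptions, since the question does not say which format is wanted.

```python
import re

# Same shape as the suggested pattern, with capture groups added for each number.
DATE_RANGE = re.compile(r"\b(\d\d?)/(\d\d?)[ ]?-[ ]?(\d\d?)/(\d\d?)\b")


def normalise_range(line):
    """Rewrite 'M/D - M/D' as zero-padded 'MM/DD - MM/DD' (a made-up target format)."""
    def pad(match):
        m1, d1, m2, d2 = (int(group) for group in match.groups())
        return f"{m1:02d}/{d1:02d} - {m2:02d}/{d2:02d}"
    return DATE_RANGE.sub(pad, line)


print(normalise_range("9/3 - 9/12"))     # a date range is rewritten
print(normalise_range("Some Pollster"))  # any other line passes through unchanged
```

Because lines without a match are returned as they are, the function can be applied to every line of the file without touching the non-date data.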
Related
I am new to Python and using it for my internship. My goal is to pull specific data from about 100 .ls documents (all in the same folder), write it to another .txt file and from there import it into Excel. My problem is that I can read all the files, but cannot figure out how to pull the specifics from each file into a list. From the list I want to write them into a .txt file and then import that into Excel.
Is there any way to set readlines() to only capture certain lines?
It's hard to know exactly what you want without an example or sample code/content. What you might do is create a list and append the desired line to it.
result_list = []  # Create an empty list
with open("myfile.txt", "r") as f:
    Lines = f.readlines()  # read the lines of the file
    for line in Lines:  # loop through the lines
        if "desired_string" in line:
            result_list.append(line)  # if the line contains the string, the line is added
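The same idea might be extended to all the files in one folder, as the question describes. This is only a sketch: the `*.ls` glob pattern, the function name `collect_matching_lines`, the output name `results.txt` and the throwaway demo files are assumptions, not details from the question.

```python
import tempfile
from pathlib import Path


def collect_matching_lines(folder, needle, pattern="*.ls"):
    """Gather every line containing `needle` from all files matching `pattern`."""
    result_list = []
    for path in sorted(Path(folder).glob(pattern)):
        with open(path, "r") as f:
            for line in f:
                if needle in line:
                    result_list.append(line)
    return result_list


# Demo on throwaway files so the sketch can run anywhere.
with tempfile.TemporaryDirectory() as folder:
    Path(folder, "a.ls").write_text("keep desired_string 1\nskip me\n")
    Path(folder, "b.ls").write_text("skip too\nkeep desired_string 2\n")
    found = collect_matching_lines(folder, "desired_string")
    out_path = Path(folder, "results.txt")
    with open(out_path, "w") as out:
        out.writelines(found)  # lines still carry their own newline
    written = out_path.read_text()

print(found)
```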
I have a .txt file which contains the voltage, current and radiation readings of an experiment, and it is displayed as follows.
I am really new to Python, so I would like to know if there is any function or library that could help me select only some lines, specifically those where the value in the second column is positive and higher than, let's say, 10, and then rewrite them to a new file.
Also, if you can recommend any literature to get more into it, I would appreciate it. Thank you in advance!
You can use readlines()
with open('file_name', 'r') as f:
    lines = f.readlines()  # this will create a list with each element as a line
with open('file2', 'w') as f2:
    f2.write(lines[0])  # this will write the 1st line to the new file
    f2.writelines(lines)  # this will write all the lines to the new file
Nothing is better than the official documentation for learning.
If that isn't sufficient, check it out
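To actually select the lines the question asks about, a filter could be applied before writing. The sketch below assumes whitespace-separated columns with the value of interest in the second one; the sample rows, the header line and the names `keep_line` and `filter_lines` are invented for illustration, since the real file layout is not shown.

```python
def keep_line(line, column=1, threshold=10.0):
    """True if the value in the given 0-based column is a number above threshold."""
    fields = line.split()
    try:
        return float(fields[column]) > threshold
    except (IndexError, ValueError):
        return False  # header, blank, or malformed line


def filter_lines(lines, column=1, threshold=10.0):
    """Keep only the lines whose chosen column exceeds the threshold."""
    return [line for line in lines if keep_line(line, column, threshold)]


# Hypothetical rows; the real column order is not known.
sample = ["V I R\n", "0.5 12.3 800\n", "0.6 -4.0 810\n", "0.7 10.0 820\n", "0.8 25 830\n"]
selected = filter_lines(sample)
print(selected)
```

Since an open file can be iterated line by line, the result of `filter_lines(f)` could be passed straight to `f2.writelines(...)` to produce the new file.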
I am trying to write a Python script to read in a large text file from some modeling results, grab the useful data and save it as a new array. The text file is output so that every line that is not useful starts with ##. I need a way to search through it and grab all the lines that do not include the ##. I am used to using grep -v in this situation and piping to a file. I want to do it in Python!
Thanks a lot.
-Tyler
I would use something like this:
fh = open(r"C:\Path\To\File.txt", "r")
raw_text = fh.readlines()
fh.close()  # release the file handle once the lines are in memory
clean_text = []
for line in raw_text:
    if not line.startswith("##"):
        clean_text.append(line)
Or you could also clean the newline and carriage return non-printing characters at the same time with a small modification:
for line in raw_text:
    if not line.startswith("##"):
        clean_text.append(line.rstrip("\r\n"))
You would be left with a list that contains one line of required text per element. You could split each line into individual words using str.split(), which would give you a nested list per original list element that you could easily index (assuming your text contains whitespace, of course).
clean_text[4][7]
would return the 5th line, 8th word.
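A small sketch of that filter-then-split step, using three invented lines in place of the real file contents:

```python
# Stand-in for the result of fh.readlines(); the contents are made up.
raw_text = ["## header to skip\n", "alpha beta gamma\n", "one two three four\n"]

# Drop the "##" lines and split each remaining line on whitespace.
clean_text = [line.split() for line in raw_text if not line.startswith("##")]

# Second kept line, third word (both indices are 0-based).
print(clean_text[1][2])
```

Calling split() with no arguments also discards the trailing newline, so no separate rstrip() is needed in this variant.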
Hope this helps.
[Edit: corrected indentation in loop]
My suggestion would be to do the following:
listoflines = []
with open("file.txt", "r") as f:  # "file.txt" is a placeholder for your file, "r" = read
    for line in f:
        if line[:2] != "##":  # compare the first two characters
            listoflines.append(line)
print(listoflines)
If you're feeling brave, you can also do the following, CREDITS GO TO ALEX THORNTON:
listoflines = [l for l in f if not l.startswith('##')]
The other answer is great as well, especially teaching the .startswith function, but I think this is the more pythonic way and also has the advantage of automatically closing the file as soon as you're done with it.
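For the "grep -v piped to a file" workflow mentioned in the question, a rough equivalent might look like the sketch below. The function name `grep_v`, the file names and the demo contents are assumptions. Note that it drops only lines that start with the marker, whereas grep -v '##' would drop lines containing it anywhere.

```python
import os
import tempfile


def grep_v(src_path, dst_path, marker="##"):
    """Copy src to dst, dropping lines that start with marker; return the kept count."""
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        kept = [line for line in src if not line.startswith(marker)]
        dst.writelines(kept)
    return len(kept)  # both files are already closed here


# Demo on a throwaway file so the sketch can run anywhere.
with tempfile.TemporaryDirectory() as folder:
    src_path = os.path.join(folder, "model_output.txt")
    dst_path = os.path.join(folder, "clean.txt")
    with open(src_path, "w") as f:
        f.write("## comment\n1.0 2.0\n## another\n3.0 4.0\n")
    kept_count = grep_v(src_path, dst_path)
    with open(dst_path) as f:
        cleaned = f.read()

print(kept_count)
```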
I have an issue with importing data from a text file using Python.
I have data like this in my file:
{1:F05ABCDRPRAXXX0000000000}{2:I1230AGRIXXPRXXXXN}{4:
:20:1234567980
:25:AB123465789013246578900000000000
:28c:110/1123156
-}
From the above data I want to fetch everything after {4:, line by line, so that the first line is :20:1234567980 and so on.
I want to split the data using a regular expression, so if any Python expert has an idea of how to write a regular expression for this, please provide it in an answer.
Thank you
If you want to get the lines in a file, use
lines = list()
with open("yourfile.txt") as f:
    for line in f:
        lines.append(line)
lines.pop(0)  # remove the first line (which ends with "{4:")
# do what you want with the list of lines
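Since the question asks for a regular expression, here is one possible sketch using the sample data from the question. It assumes every field has the shape `:tag:value` on its own line and that the block is closed by `-}`; files that deviate from the sample would need a different pattern.

```python
import re

message = """{1:F05ABCDRPRAXXX0000000000}{2:I1230AGRIXXPRXXXXN}{4:
:20:1234567980
:25:AB123465789013246578900000000000
:28c:110/1123156
-}"""

# Grab everything between "{4:" and the closing "-}".
block = re.search(r"\{4:\s*(.*?)\s*-\}", message, re.DOTALL).group(1)

# Each field looks like ":tag:value" on its own line.
fields = re.findall(r"^:(\w+):(.*)$", block, re.MULTILINE)
print(fields)
```

Each element of `fields` is a (tag, value) tuple, so `dict(fields)` would give a lookup by tag if the tags are unique.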
I have an issue which has to do with file input and output in Python (it's a continuation from this question: how to extract specific lines from a data file, which has been solved now).
So I have one big file, danish.train, and eleven small files (called danish.test.part-01 and so on), each of them containing a different selection of the data from the danish.train file. Now, for each of the eleven files, I want to create an accompanying file that complements them. This means that for each small file, I want to create a file that contains the contents of danish.train minus the part that is already in the small file.
What I've come up with so far is this:
trainFile = open("danish.train")
for file_number in range(1,12):
    input = open('danish.test.part-%02d' % file_number, 'r')
    for line in trainFile:
        if line not in input:
            with open('danish.train.part-%02d' % file_number, 'a+') as myfile:
                myfile.write(line)
The problem is that this code only gives output for file_number 1, although I have a loop from 1-11. If I change the range, for example to in range(2,3), I get an output danish.train.part-02, but this output contains a copy of the whole danish.train without leaving out the contents of the file danish.test.part-02, as I wanted.
I suspect that these issues may have something to do with me not completely understanding the with... as operator, but I'm not sure. Any help would be greatly appreciated.
When you open a file, you get back an iterator over the lines of the file. This is nice, in that it lets you go through the file one line at a time without keeping the whole file in memory at once. However, such an iterator can only be consumed once: after the first pass it is exhausted, and looping over it again yields nothing. In your case that is the problem, because you need to iterate through the training file multiple times.
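A minimal demonstration of that one-pass behaviour, using `io.StringIO` as a stand-in for an open text file (real file objects behave the same way in this respect):

```python
import io

# io.StringIO stands in for an open text file here.
f = io.StringIO("first\nsecond\n")

first_pass = [line for line in f]
second_pass = [line for line in f]  # the iterator is already exhausted

f.seek(0)  # rewind to the start
third_pass = [line for line in f]

print(first_pass, second_pass, third_pass)
```

A membership test such as `line not in input` on a file object also works by iterating, so it advances the file position as well, which likely explains why nothing was filtered out in the question's output.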
Instead, you can read the full training file into memory, and go through it multiple times:
with open("danish.train", 'r') as f:
    train_lines = f.readlines()
for file_number in range(1, 12):
    with open("danish.test.part-%02d" % file_number, 'r') as f:
        test_lines = set(f)
    with open("danish.train.part-%02d" % file_number, 'w') as g:
        g.writelines(line for line in train_lines if line not in test_lines)
I've simplified the logic a little bit, as well. If you don't care about the order of the lines, you could also consider reading the training lines into a set, and then just use set operations instead of the generator expression I used in the final line.
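The set-based variant might look like the sketch below, with short made-up lists standing in for the contents of danish.train and one danish.test.part file:

```python
train_lines = ["a\n", "b\n", "c\n", "d\n"]  # stand-in for danish.train
test_lines = ["b\n", "d\n"]                  # stand-in for one danish.test.part file

# Set difference: everything in the training data that is not in the test part.
complement = set(train_lines) - set(test_lines)

# Sets are unordered, so sort if a stable output is wanted.
ordered = sorted(complement)
print(ordered)
```

Besides losing the original order, a set also collapses duplicate lines, so this only fits when repeated lines in the training file do not matter.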