Parse text file which groups data

I'm trying to figure out how to extract strings from a text file and write each one on its own line in a new file.
I can't get my head around RegEx, and all the examples I find online have the data on a single line, whereas mine is already split across lines.
I'm parsing the output of another program. It outputs three lines (Date, Address, Name), then a blank line, then another set of three, and I only need the Address.
fo = open(r"C:\Sampledata.txt", "r")
item = fo.readlines()
I haven't got anything working yet!

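outList = []
# Assumes each record sits on one comma-separated line: Date,Address,Name
with open(r"C:\Sampledata.txt", "r") as inFile:
    for line in inFile:
        line = line.strip()
        if not line:
            continue  # skip the blank lines between records
        Date, Address, Name = line.split(",")
        outList.append(Address)
with open("outFile.txt", "w") as outFile:
    outFile.write("\n".join(outList))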

I'm not quite sure if this addresses your problem, but maybe something like:
addresses = list()
with open("file1", "r") as infile:
    for line in infile:
        if line.startswith("Address"):
            addresses.append(line.strip("\n"))
Edit: Or, if "Address" appears only once per file, you can break out of the loop after detecting the line starting with "Address":
addresses = list()
with open("file1", "r") as infile:
    for line in infile:
        if line.startswith("Address"):
            addresses.append(line.strip("\n"))
            break
Then you can write all addresses into a new file:
with open("newFile", "w") as outfile:
    for address in addresses:
        outfile.write(address + "\n")
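If the file really is laid out as the question describes (Date, Address and Name each on their own line, with a blank line between records), then neither splitting on commas nor matching a literal "Address" prefix is needed: the address is simply the second line of each group. A minimal sketch under that assumption; the function name extract_addresses and the sample data are made up for illustration:

```python
import io


def extract_addresses(lines):
    """Return the second line of every blank-line-separated record."""
    addresses = []
    record = []
    for raw in lines:
        line = raw.strip()
        if line:
            record.append(line)
            continue
        # A blank line closes the current record (if there is one).
        if len(record) >= 2:
            addresses.append(record[1])
        record = []
    # The last record may not be followed by a blank line.
    if len(record) >= 2:
        addresses.append(record[1])
    return addresses


# Hypothetical sample standing in for the other program's output.
sample = io.StringIO(
    "2013-02-05\n1 Main Street\nFred\n"
    "\n"
    "2013-02-06\n2 High Street\nWilma\n"
)
print("\n".join(extract_addresses(sample)))
```

With a real file, pass the open file object in place of the StringIO and write the joined result to the output file.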

Related

Using Regex to search a plaintext file line by line and cherry pick lines based on matches

I'm trying to read a plaintext file line by line and cherry-pick the lines that begin with any six digits, collect those lines in a list, and then write that list row by row to a .csv file.
Here's an example of a line I'm trying to match in the file:
**000003** ANW2248_08_DESOLATE-WASTELAND-3. A9 C 00:55:25:17 00:55:47:12 10:00:00:00 10:00:21:20
And here is a link to two images, one showing the above line in context of the rest of the file and the expected result: https://imgur.com/a/XHjt9e1
import csv
import re
identifier = re.compile(r'^(\d\d\d\d\d\d)')
matched_line = []
with open('file.edl', 'r') as file:
    reader = csv.reader(file)
    for line in reader:
        line = str(line)
        if identifier.search(line) == True:
            matched_line.append(line)
        else: continue
with open('file.csv', 'w') as outputEDL:
    print('Copying EDL contents into .csv file for reformatting...')
    outputEDL.write(str(matched_line))
Expected result would be the reader gets to a line, searches using the regex, then if the result of the search finds the series of 6 numbers at the beginning, it appends that entire line to the matched_line list.
What I'm actually getting is, once I write what reader has read to a .csv file, it has only picked out [], so the regex search obviously isn't functioning properly in the way I've written this code. Any tips on how to better form it to achieve what I'm trying to do would be greatly appreciated.
Thank you.

Some more examples of expected input and output would help, but from what I can see you are trying to write each line of a text file that contains a timecode to a CSV. In that case, here is a rough sketch that might help, along with a separate regex match function to make your code more readable:
import re
def match_time(line):
    # Find every colon-separated timecode such as 00:55:25:17
    pattern = re.compile(r'(?:\d+[:]\d+[:]\d+[:]\d+)+')
    result = pattern.findall(line)
    return " ".join(result)
This returns a space-separated string of every timecode found in the line, or an empty string if there is none.
lines = []
with open('yourfile.txt', 'r') as txtfile:
    for line in txtfile:
        res = match_time(line)
        # an empty string means the line contains no timecode
        if res != "":
            lines.append(line)
with open('yourfile.csv', 'w') as csvfile:
    for item in lines:
        csvfile.write(item)
This opens the text file for reading, appends each line that contains a timecode to a list, then iterates over that list and writes each line to the CSV file. The input and output must be different files, because opening a file in 'w' mode empties it.
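Going back to the code in the question, two things likely explain the empty list. First, identifier.search(line) returns a match object or None, never True, so the "== True" comparison always fails. Second, csv.reader yields each row as a list, and str() of a list starts with "['", so a pattern anchored with ^\d cannot match. A minimal sketch that reads the file as plain text instead. The function name and the non-matching sample lines are illustrative, splitting each row on whitespace is an assumption about the desired CSV layout, and the six digits are assumed to sit at the very start of the line (the ** in the example above looks like forum bold markup rather than file content):

```python
import csv
import io
import re

# Six digits at the very start of the line.
IDENTIFIER = re.compile(r"^\d{6}")


def pick_event_lines(lines):
    """Keep only the lines that begin with six digits."""
    return [line.rstrip("\n") for line in lines if IDENTIFIER.match(line)]


# Illustrative stand-in for the .edl file contents.
sample = io.StringIO(
    "TITLE: EXAMPLE\n"
    "000003 ANW2248_08_DESOLATE-WASTELAND-3. A9 C 00:55:25:17 00:55:47:12 10:00:00:00 10:00:21:20\n"
    "* FROM CLIP NAME: EXAMPLE\n"
)
matched = pick_event_lines(sample)

# Write one CSV row per matched line, splitting the fields on whitespace.
out = io.StringIO()
writer = csv.writer(out)
for line in matched:
    writer.writerow(line.split())
print(out.getvalue())
```

Testing the match object directly (if IDENTIFIER.match(line)) works because a match object is truthy and None is falsy.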

to read from a text file 4 lines or several lines at a time

I'm trying to write code for a cellphone register in Python. I'm supposed to read different contacts from a text file. Every contact on the list takes about 4 lines. I tried reading one line at a time (it works), but I wonder if there is an easier way, for example reading 4 lines at once and creating a list of objects or a plain list. Is that possible, and if so, how?

I'm not sure what you mean by 'about 4 lines', but here's a start:
with open('thefile.txt') as infile:
    while True:
        parts = [infile.readline() for _ in range(4)]
        if not any(parts):
            break
        part1, part2, part3, part4 = parts

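Assuming the file you want to read is contacts.txt and it is in the current directory:
with open('contacts.txt', 'r') as f:
    lines = f.readlines()
for i in range(0, len(lines), 4):
    # the four lines that belong to one contact
    contact_source = lines[i:i + 4]
    # BuildObject is a placeholder for your own constructor
    BuildObject(contact_source)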

You can possibly use a generator function like this (you don't need to read the entire file up front):
def multilinefile(fn, no_lns):
    with open(fn) as f:
        while True:
            # readline() returns '' once the end of the file is reached
            lines = [f.readline() for _ in range(no_lns)]
            if lines[0] == '':
                break
            yield ''.join(lines)
for chunk in multilinefile(your_file, 4):
    print(chunk)
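Another way to express the same idea is itertools.islice, which takes the next few items from any iterator, including an open file. A minimal sketch, assuming each contact occupies exactly four lines; the function name read_in_chunks and the field layout (name, surname, phone, email) are invented for illustration, since the question does not say what the four lines contain:

```python
import io
from itertools import islice


def read_in_chunks(lines, size=4):
    """Yield lists of up to `size` stripped lines until the input runs out."""
    iterator = iter(lines)
    while True:
        chunk = [line.rstrip("\n") for line in islice(iterator, size)]
        if not chunk:
            return
        yield chunk


# Hypothetical contact file: name, surname, phone and email on four lines each.
sample = io.StringIO(
    "Fred\nFlintstone\n555-0100\nfred@example.com\n"
    "Wilma\nFlintstone\n555-0101\nwilma@example.com\n"
)
contacts = list(read_in_chunks(sample, 4))
for name, surname, phone, email in contacts:
    print(name, surname, phone, email)
```

If the file length is not a multiple of four, the last chunk is shorter, so the four-way unpacking in the loop would fail; check len(chunk) first if that can happen.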

How to create a list from a text file in Python

I have a text file called "test", and I would like to create a list in Python and print it. I have the following code, but it does not print a list of words; it prints the whole document in one line.
file = open("test", 'r')
lines = file.readlines()
my_list = [line.split(' , ')for line in open ("test")]
print (my_list)

You could do
my_list = open("filename.txt").readlines()

When you do this:
file = open("test", 'r')
lines = file.readlines()
Here, lines is a list of the lines in the file. If you want a list of words for each line, you can do:
list_word = []
for l in lines:
    list_word.append(l.split(" "))

I believe you are trying to achieve something like this:
data = [word.split(',') for word in open("test", 'r').readlines()]
It would also help if you specified what type of text file you are trying to read, as there are several modules (e.g. csv) that would produce the result in a much simpler way.
As pointed out, you may also want to strip the newline (depending on which line ending you are using), which gives something like this:
data = [word.strip('\n').split(',') for word in open("test", 'r').readlines()]
This produces a list with one entry per line, each of which is a list of words.
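If the goal is one flat list of words rather than a list per line, the per-line results can be merged. A minimal sketch, assuming words are separated by whitespace unless a separator is given; the function name words_from_lines and the sample text are illustrative:

```python
import io


def words_from_lines(lines, separator=None):
    """Flatten an iterable of lines into a single list of words."""
    words = []
    for line in lines:
        # split() with separator=None splits on any run of whitespace;
        # pass "," (or another string) for delimited files.
        for word in line.strip().split(separator):
            word = word.strip()
            if word:
                words.append(word)
    return words


sample = io.StringIO("apple banana\ncherry\n")  # stand-in for the "test" file
print(words_from_lines(sample))
```

Note that the separator has to match the file exactly: the question's split(' , ') only splits where a comma has a space on both sides, which may be why each line came back as a single item.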

appending a single string to each element of a list in python

I'm trying to read a text file containing a list of user IDs and convert those IDs into email addresses by appending the #wherever.com ending to the IDs. Then I want to write those email addresses to a new file separated by commas.
textFile = open("userID.txt", "r")
identList = textFile.read().split(", ")
print identList
textFile.close()
emailString = "#wherever.com, "
newList = [x + emailString for x in identList]
writeFile = open("userEmail.txt", "w")
writeFile.writelines(newList)
writeFile.close()
I'm using Python 3.x on a Mac. This isn't working at all. I'm not sure whether it is even reading the initial file, and it is certainly not writing to the new file. Can someone suggest where the program is failing?

Something like the following should work:
with open('userID.txt', 'r') as f_input, open('emails.txt', 'w') as f_output:
    emails = ["{}#wherever.com".format(line.strip()) for line in f_input]
    f_output.write(", ".join(emails))
So if you had a userID.txt file containing the following names, with one name per line:
fred
wilma
You would get a one line output file as follows:
fred#wherever.com, wilma#wherever.com

You could do it like this, also using context managers for reading and writing, because then you don't need to worry about closing the files:
identList = []
with open('userID.txt', 'r') as f:
    # read() returns one string; readlines() returns a list, which has no rstrip()
    identList = [x.strip() for x in f.read().rstrip().split(',')]
outString = ','.join(['{0}#wherever.com'.format(x) for x in identList])
with open('userEmail.txt', 'w') as f:
    f.write(outString)
The conversion to a string is done with join, which here joins the list elements built in the comprehension with commas between them.
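The two answers assume different input layouts (one ID per line versus comma-separated IDs). A minimal sketch that accepts either, assuming the IDs themselves contain no commas or whitespace; the function name ids_to_emails and the sample IDs are illustrative:

```python
import re


def ids_to_emails(text, suffix="#wherever.com"):
    """Split text on commas and/or whitespace and append the suffix to each ID."""
    ids = [part for part in re.split(r"[,\s]+", text) if part]
    return [user_id + suffix for user_id in ids]


print(", ".join(ids_to_emails("fred, wilma\nbarney")))
```

To use it on files, pass the result of f.read() for the input file and write the joined string to the output file inside a with block.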

Python using re module to parse an imported text file

def regexread():
    import re
    result = ''
    savefileagain = open('sliceeverfile3.txt','w')
    #text=open('emeverslicefile4.txt','r')
    text='09,11,14,34,44,10,11, 27886637, 0\n561, Tue, 5,Feb,2013, 06,25,31,40,45,06,07, 19070109, 0\n560, Fri, 1,Feb,2013, 05,21,34,37,38,01,06, 13063500, 0\n559, Tue,29,Jan,2013,'
    pattern='\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d'
    #with open('emeverslicefile4.txt') as text:
    f = re.findall(pattern,text)
    for item in f:
        print(item)
    savefileagain.write(item)
    #savefileagain.close()
The function as written above parses the text and prints sets of seven numbers. I have three problems.
First, when I read the text from the 'read' file, which contains exactly the same text as text='09,...etc', I get "TypeError: expected string or buffer", which I cannot solve even after reading some of the posts.
Second, when I try to write the results to the 'write' file, nothing is written.
Third, I am not sure how to get the same output in the file that I get with the print statement, which is three lines of seven numbers each. That is the output I want.

This should do the trick:
import re
filename = 'sliceeverfile3.txt'
pattern = r'\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d'
new_file = []
# Make sure file gets closed after being iterated
with open(filename, 'r') as f:
    # Read the file contents and generate a list with each line
    lines = f.readlines()
# Iterate each line
for line in lines:
    # Regex applied to each line
    match = re.search(pattern, line)
    if match:
        # Make sure to add \n to display correctly when we write it back
        new_line = match.group() + '\n'
        print(new_line, end='')
        new_file.append(new_line)
# 'w' mode empties the file, so writing starts at the beginning
with open(filename, 'w') as f:
    # actually write the lines
    f.writelines(new_file)

You're sort of on the right track...
You'll iterate over the file (see "How to iterate over the file in python") and apply the regex to each line. The linked question should answer all three of your questions once you realize you're trying to write 'item', which doesn't exist outside of that loop.
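The three problems in the question can be addressed together. The TypeError most likely comes from passing the file object itself to re.findall rather than its contents, the empty output file from never closing (and so never flushing) it, and the missing line breaks from writing each match without a newline. A minimal sketch under those assumptions; the function names are illustrative, and the sample text is the one from the question:

```python
import re

# Seven comma-separated two-digit numbers, as in the question's pattern.
PATTERN = re.compile(r"\d\d,\d\d,\d\d,\d\d,\d\d,\d\d,\d\d")


def find_number_sets(text):
    """Return every run of seven two-digit numbers found in the text."""
    return PATTERN.findall(text)


def save_number_sets(text, out_path):
    """Write each match on its own line; the with-block closes the file."""
    matches = find_number_sets(text)
    with open(out_path, "w") as out_file:
        for item in matches:
            out_file.write(item + "\n")
    return matches


# Sample text copied from the question. For a real file, pass the result of
# open(path).read(), not the file object itself.
text = (
    "09,11,14,34,44,10,11, 27886637, 0\n"
    "561, Tue, 5,Feb,2013, 06,25,31,40,45,06,07, 19070109, 0\n"
    "560, Fri, 1,Feb,2013, 05,21,34,37,38,01,06, 13063500, 0\n"
    "559, Tue,29,Jan,2013,"
)
for item in find_number_sets(text):
    print(item)
```

Calling save_number_sets(text, 'sliceeverfile3.txt') would then write the same three lines to the output file.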
