This question is a follow-up to this question.
I've posted the solution by @tobiask here as well:
match_region = [map(str, blob.sentences[i-1:i+2])  # from prev to after next
                for i, s in enumerate(blob.sentences)  # i is index, s is element
                if search_words & set(s.words)]  # same as your condition
I am having trouble exporting match_region. I would like to turn it into a CSV with the sentences as columns and every result as a row.
This will write the contents of match_region to a file. I didn't test it on your code, though; note that each s is itself a list of sentences, so they are joined into separate columns here.
with open('output.csv', 'w') as f:
    for i, s in enumerate(match_region):
        f.write('"' + str(i) + '","' + '","'.join(s) + '"\n')
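A sketch with the csv module instead, which handles quoting for you. The sample match_region here is a made-up stand-in for the real one, which is a list of [previous, match, next] sentence lists as produced by the comprehension above:

```python
import csv

# Hypothetical stand-in for the real match_region
match_region = [
    ["First sentence.", "Matched sentence.", "Next sentence."],
    ["Another lead-in.", "Another match.", "Another follow-up."],
]

with open('output.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['previous', 'match', 'next'])  # optional header
    writer.writerows(match_region)  # sentences as columns, one result per row
```

csv.writer quotes fields containing commas automatically, so sentences with punctuation survive intact.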
Following on from an earlier post, I have written some Python code to calculate the frequency of occurrences of certain phrases (contained in the "word_list" variable, with three examples listed, but there will be many more) in a large number of text files. The code below requires me to take each element of the list and insert it into a string for comparison to each text file. However, the current code only writes the frequencies for the last phrase in the list, rather than all of them, to the relevant columns in the spreadsheet. Is this just an indentation issue (the writerow is not in the correct position), or is there a logic flaw in my code? Also, is there any way to avoid the list-to-string assignment when comparing the phrases to those in the text files?
word_list = ['in the event of', 'frankly speaking', 'on the other hand']
S = {}
p = 0
k = 0
with open(file_path, 'w+', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["Fohone-K"] + word_list)
    for filename in glob.glob(os.path.join(path, '*.txt')):
        if filename.endswith('.txt'):
            f = open(filename)
            Fohone-K = filename[8:]
            data = f.read()
            # new code section from scratch file
            l = len(word_list)
            for s in range(l):
                phrase = word_list[s]
                S = data.count((phrase))
                if S:
                    #k = k + 1
                    print("'{}' match".format(Fohone-K), S)
                else:
                    print("'{} no match".format(Fohone-K))
                    print("\n")
            # for m in word_list:
            if S >= 0:
                print([Fohone-K] + [S])
                writer.writerow([Fohone-K] + [S])
The output currently looks like this (screenshot of current output omitted), when it needs to look like this (screenshot of desired output omitted).
You probably were going for something like this:
import csv, glob, os

word_list = ['in the event of', 'frankly speaking', 'on the other hand']
file_path = 'out.csv'
path = '.'

with open(file_path, 'w+', newline='') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["Fohone-K"] + word_list)
    for filename in glob.glob(os.path.join(path, '*.txt')):
        if filename.endswith('.txt'):
            with open(filename) as f:
                postfix = filename[8:]
                content = f.read()
            matches = [content.count(phrase) for phrase in word_list]
            print(f"'{postfix}' {'no ' if all(n == 0 for n in matches) else ''}match")
            writer.writerow([postfix] + matches)
The key problem was that you were writing S, which only contained a single count, on each row. That's fixed here by writing the full list of matches.
I tried doing what I can to solve this, but the movie titles just won't move up. The problem is in the second block inside the for loop. This is the function I wrote:
def writeFile(filename, movie_titles):
    with open(filename, 'w') as f:
        headers = "No., Title\n"
        f.write(headers)
        i = 0
        for title in movie_titles:
            while i < len(movie_titles[0:]): i = i + 1; f.write(str(i) + '\n')
            f.write(', ' + "%s\n" % title.replace(',', '') + '\n')
    f.close()
Another answer has a more straightforward and Pythonic method, but for your specific code, this would solve it:
def writeFile(filename, movie_titles):
    with open(filename, 'w') as f:
        headers = "No., Title\n"
        f.write(headers)
        i = 0
        for title in movie_titles:
            i = i + 1
            f.write(str(i) + ', ' + "%s\n" % title.replace(',', '') + '\n')
Note that the final f.close() is not needed; the with statement takes care of that.
You can use enumerate() in the for loop to get the index. For example:
def writeFile(filename, movie_titles):
    with open(filename, 'w') as f:
        f.write("No., Title\n")
        for i, title in enumerate(movie_titles, 1):
            f.write('{},{}\n'.format(i, title.replace(',', '')))
Note: to create a CSV file properly, look at the csv module.
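Expanding on that note: with csv.writer the quoting is handled for you, so a title containing a comma no longer needs the replace() workaround. A sketch (the output filename and sample titles here are made up):

```python
import csv

def write_file(filename, movie_titles):
    # csv.writer quotes fields containing commas, so titles survive intact
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['No.', 'Title'])
        for i, title in enumerate(movie_titles, 1):
            writer.writerow([i, title])

write_file('movies.csv', ['The Matrix', 'Lock, Stock and Two Smoking Barrels'])
```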
You got your loops mixed up a bit. Your code goes into the for loop and iterates over all movies; during the first iteration it executes the while loop, and only after that is finished does the for loop continue.
I would suggest something like this:
def writeFile(filename, movie_titles):
    with open(filename, 'w') as f:
        headers = "No., Title\n"
        f.write(headers)
        for i in range(len(movie_titles)):
            f.write(str(i+1) + ',')
            f.write("%s\n" % movie_titles[i].replace(',', ''))
    f.close()
The for loop iterates over all numbers from 0 to the length of the movie list minus 1. Then the number is written; here you add 1 so that your numbering starts at 1. After that you write the movie title. I assumed your movie_titles variable is a list, so you can index it as movie_titles[i]; the index in our case is i, and its highest value corresponds to the last element of the list. You also had too many newlines, because you only need one newline per line. One could probably also write the numbers and movie names separately, but then you would need to specify which row of the file you are writing to.
I am trying to analyze a text file of data with columns and records.
My file:
Name Surname Age Sex Grade
Chris M. 14 M 4
Adam A. 17 M
Jack O. M 8
The text file has some empty fields, as shown above.
The user wants to show Name and Grade:
import csv

with open('launchlog.txt', 'r') as in_file:
    stripped = (line.strip() for line in in_file)
    lines = (line.split() for line in stripped if line)
    with open('log.txt', 'w') as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('Name', 'Surname', 'Age', 'Sex', 'Grade'))
        writer.writerows(lines)
log.txt :
Chris,M.,14,M,4
Adam,A.,17,M
Jack,O.,M,8
How can I insert the string "None" for the empty fields?
For example:
Chris,M.,14,M,4
Adam,A.,17,M,None
Jack,O.,None,M,8
What would be the best way to do this in Python?
Use pandas:
import pandas

data = pandas.read_fwf("file.txt")
To get your dictionary:
data.set_index("Name")["Grade"].to_dict()
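read_fwf reads the empty cells as NaN; if you also want them written back out as the literal string None, the na_rep parameter of to_csv does that. A sketch on the sample data, assuming pandas is installed (io.StringIO stands in for the real file):

```python
import io
import pandas

raw = """Name   Surname  Age  Sex  Grade
Chris  M.       14   M    4
Adam   A.       17   M
Jack   O.            M    8
"""

# dtype=str keeps 14 from becoming 14.0 once NaNs force a float column
data = pandas.read_fwf(io.StringIO(raw), dtype=str)
# na_rep writes every missing cell as the literal string None
print(data.to_csv(index=False, na_rep="None"))
```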
Here's something in Pure Python™ that seems to do what you want, at least on the sample data file in your question.
In a nutshell, it first determines where each field name in the column header line starts and ends. Then, for each remaining line of the file, it does the same thing to get a second list, which is used to determine which column each data item on the line falls under (the item is then put in its proper position in the row that will be written to the output file).
import csv

def find_words(line):
    """ Return a list of (start, stop) tuples with the indices of the
        first and last characters of each "word" in the given string.
        Any sequence of consecutive non-space characters is considered
        as comprising a word.
    """
    line_len = len(line)
    indices = []
    i = 0
    while i < line_len:
        start, count = i, 0
        while line[i] != ' ':
            count += 1
            i += 1
            if i >= line_len:
                break
        indices.append((start, start+count-1))
        while i < line_len and line[i] == ' ':  # advance to start of next word
            i += 1
    return indices

# convert text file with missing fields to csv
with open('name_grades.txt', 'rt') as in_file, open('log.csv', 'wt', newline='') as out_file:
    writer = csv.writer(out_file)
    header = next(in_file)  # read first line
    fields = header.split()
    writer.writerow(fields)
    # determine the indices of where each field starts and stops based on header line
    field_positions = find_words(header)
    for line in in_file:
        line = line.rstrip('\r\n')  # remove trailing newline
        row = ['None' for _ in range(len(fields))]
        value_positions = find_words(line)
        for (vstart, vstop) in value_positions:
            # determine what field the value is underneath
            for i, (hstart, hstop) in enumerate(field_positions):
                if vstart <= hstop and hstart <= vstop:  # overlap?
                    row[i] = line[vstart:vstop+1]
                    break  # stop looking
        writer.writerow(row)
Here's the contents of the log.csv file it created:
Name,Surname,Age,Sex,Grade
Chris,M.,14,M,4
Adam,A.,17,M,None
Jack,O.,None,M,8
I would use baloo's answer over mine -- but if you just want to get a feel for where your code went wrong, the solution below mostly works (there is a formatting issue with the Grade field, but I'm sure you can get through that). Add some print statements to your code and to mine and you should be able to pick up the differences.
import csv
<Old Code removed in favor of new code below>
EDIT: I see your difficulty now. Please try the below code; I'm out of time today so you will have to fill in the writer parts where the print statement is, but this will fulfill your request to replace empty fields with None.
import csv

with open('Test.txt', 'r') as in_file:
    with open('log.csv', 'w') as out_file:
        writer = csv.writer(out_file)
        lines = [line for line in in_file]
        name_and_grade = dict()
        for line in lines[1:]:
            parts = line[0:10], line[11:19], line[20:24], line[25:31], line[32:]
            new_line = list()
            for part in parts:
                val = part.replace('\n', '')
                val = val.strip()
                val = val if val != '' else 'None'
                new_line.append(val)
            print(new_line)
Without using pandas:
Edited based on your comment: I hard-coded this solution for your data, so it will not work for rows that don't have a Surname column.
I'm writing out only Name and Grade, since you only need those two columns.
o = open("out.txt", 'w')
with open("inFIle.txt") as f:
    for lines in f:
        lines = lines.strip("\n").split(",")
        try:
            grade = int(lines[-1])
            if (lines[-2][-1]) != '.':
                o.write(lines[0] + "," + str(grade) + "\n")
        except ValueError:
            print(lines)
o.close()
I have an Excel file that contains values which I would like to write to text, as shown on the right side of the image below. I have been doing this by hand, but it is very tedious. I have tried using Python, but I am getting frustrated with my accrued knowledge of it so far. Thanks for the help.
For those that can't see the image, I would like it output as this:
[wind#]
Height=
Direction=
Velocity=
You could export your Excel file to a .csv file (I hope you can figure out how to do this on your own) and get back something like this:
height,direction,speed
1,2,3
3,2,1
With the following .py script you can take the input file (which is in CSV format) and transform it into your output, where input.csv is your CSV file residing in the same folder as the script and output.txt is the file which will hold the result.
f = open('input.csv', 'r')
g = open('output.txt', 'w')

# Header line must be kept separately since we will be using it for every record
first_line = f.readline()
headers = first_line.split(',')
headers[-1] = headers[-1].strip()
length = len(headers)

# Capitalize each header word.
for i in range(length):
    headers[i] = headers[i].capitalize()

counter = 1
for line in f:
    values = line.split(',')
    values[-1] = values[-1].strip()  # remove EOL character
    g.write('[Wind' + str(counter) + ']' + "\n")
    for i in range(length):
        g.write(headers[i] + "=" + values[i] + "\n")
    counter += 1

g.close()
f.close()
input:
height,direction,speed
1,2,3
3,2,1
output:
[Wind1]
Height=1
Direction=2
Speed=3
[Wind2]
Height=3
Direction=2
Speed=1
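The same transformation can be sketched with csv.DictReader, which removes the manual header bookkeeping; this variant also writes a small hypothetical input.csv first so it runs standalone:

```python
import csv

# Hypothetical sample input, mirroring the exported CSV above
with open('input.csv', 'w') as f:
    f.write('height,direction,speed\n1,2,3\n3,2,1\n')

# DictReader pairs each value with its header, in column order
with open('input.csv', newline='') as f, open('output.txt', 'w') as g:
    for counter, row in enumerate(csv.DictReader(f), 1):
        g.write('[Wind{}]\n'.format(counter))
        for key, value in row.items():
            g.write('{}={}\n'.format(key.capitalize(), value))
```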
I have been working on a Python script to parse a single delimited column in a CSV file. However, the column has multiple different delimiters, and I can't figure out how to handle this.
I have another script that works on similar data, but I can't get this one to work. The data below is in a single column on the row. I want the script to parse these values out and add tabs between each of them. Then I want to append the data to a list containing only the unique items. Typically I am dealing with several hundred rows of this data, and I would like to parse the entire file and then return only the unique items in two columns (one for IPs and the other for URLs).
Data to parse: 123.123.123.123::url.com,url2.com,234.234.234.234::url3.com (note ":" and "," are used as delimiters on the same line)
Script I am working with:
import sys
import csv

csv_file = csv.DictReader(open(sys.argv[1], 'rb'), delimiter=':')

uniq_rows = []
for column in csv_file:
    X = column[' IP'].split(':')[-1]
    row = X + '\t'
    if row not in uniq_rows:
        uniq_rows.append(row)

for row in uniq_rows:
    print row
Does anyone know how to accomplish what I am trying to do?
Change the list (uniq_rows = []) to a set (uniq_rows = set()):
csv_file = csv.DictReader(open(sys.argv[1], 'rU'), delimiter=':')

uniq_rows = set()
for column in csv_file:
    X = column[' IP'].split(':')[-1]
    row = X + '\t'
    uniq_rows.add(row)

for row in list(uniq_rows):
    print row
If you need further help, leave a comment
You can also just use replace to change your input lines (not overly Pythonic, I guess, but a standard builtin):
>>> a = "123.123.123.123::url.com,url2.com,234.234.234.234::url3.com"
>>> a = a.replace(',','\t')
>>> a = a.replace(':','\t')
>>> print (a)
123.123.123.123 url.com url2.com 234.234.234.234 url3.com
>>>
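In the same spirit, re.split can handle both delimiters in one pass, which avoids chained replace calls (shown on the sample line from the question):

```python
import re

line = "123.123.123.123::url.com,url2.com,234.234.234.234::url3.com"
# split on either '::' or ',' in a single pass
parts = re.split(r'::|,', line)
print('\t'.join(parts))
# parts is ['123.123.123.123', 'url.com', 'url2.com', '234.234.234.234', 'url3.com']
```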
As mentioned in a comment, here is a simple text manipulation to get you (hopefully) the right output prior to removing duplicates:
import sys

read_raw_file = open('D:filename.csv')  # open current file
read_raw_text = read_raw_file.read()
new_text = read_raw_text.strip()
new_text = new_text.replace(',', '\t')
# new_text = new_text.replace('::', '\t')  # optional if you want a double : to only add one column
new_text = new_text.replace(':', '\t')

text_list = new_text.split('\n')
unique_items = []
for row in text_list:
    if row not in unique_items:
        unique_items.append(row)

new_file = 'D:newfile.csv'
with open(new_file, 'w') as write_output_file:  # generate new file
    for i in range(0, len(unique_items)):
        write_output_file.write(unique_items[i] + '\n')
write_output_file.close()