How to join two text files in Python

I have two text files. The first contains a list of words:
['i', 'hate', 'sausages', 'noa', 'hate', 'i']
and the second contains numbers:
1, 2, 3, 3, 2, 1
Now I need to join the two files together. This is what I have so far:
positionlist = open('positions.txt', 'r')  # open the file with the positions
wordlist = open('Words.txt', 'r')  # open the file with the words
positions = positionlist.read()  # read the whole file into a string variable
words = wordlist.read()  # read the whole file into a string variable
print(positions)  # prints the positions
print(words)  # prints the words
positionfinal = list(positions)  # makes the positions into a list (of single characters)
# this is where I need to try and connect the two together
remover = words.replace("[", "")  # remove the brackets so that it reads like a plain string
remover = remover.replace("]", "")  # remove the brackets so that it reads like a plain string

The input files look like Python literals (a list of strings and a tuple of integers), so you can use ast.literal_eval to parse them. Then you can use the built-in zip function to loop through both together and print them out. Something like this:
import ast
with open('words.txt') as f:
    words = ast.literal_eval(f.read())  # read as list of strings
with open('positions.txt') as f:
    positions = ast.literal_eval(f.read())  # read as tuple of ints
for position, word in zip(positions, words):
    print(position, word)
Here's the output:
1 i
2 hate
3 sausages
3 noa
2 hate
1 i
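If you want to save the joined result instead of only printing it, here is a minimal sketch. It assumes both files hold Python literals exactly as shown in the question; the function name join_files and the output path are hypothetical.

```python
import ast


def join_files(words_path, positions_path, out_path):
    """Pair each position with its word and write one 'position word' line per pair.

    Hypothetical helper: assumes the files contain Python literals such as
    ['i', 'hate', ...] and 1, 2, 3, ...
    """
    with open(words_path) as f:
        words = ast.literal_eval(f.read())  # list of strings
    with open(positions_path) as f:
        positions = ast.literal_eval(f.read())  # tuple of ints
    if len(words) != len(positions):
        # zip() would silently drop the extra items, so fail loudly instead
        raise ValueError("files have different lengths")
    pairs = list(zip(positions, words))
    with open(out_path, "w") as out:
        for position, word in pairs:
            out.write(f"{position} {word}\n")
    return pairs
```

The length check is a design choice: zip() stops at the shorter input, which could otherwise hide a mismatch between the two files.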

Related

I'm trying to find words from a text file in another text file

I built a simple graphical user interface (GUI) with basketball info to make finding information about players easier. The GUI uses data scraped from various sources with the 'requests' library. It works well, but there is a problem: my code contains a hard-coded list of players that is compared against the scraped data, and everything depends on it. This means that if I want to add or remove any names from this list, I have to go into my IDE and edit the code directly. I need to change this: storing the player names in an external text file would make them much easier to manage.
#This is how the players list looks in the code.
basketball = ['Adebayo, Bam', 'Allen, Jarrett', 'Antetokounmpo, Giannis' ... #and many others]
# This is what the info in the scraped file looks like:
Charlotte Hornets,"Ball, LaMelo",Out,"Injury/Illness - Bilateral Ankle, Wrist; Soreness (L Ankle, R Wrist)"
"Hayward, Gordon",Available,Injury/Illness - Left Hamstring; Soreness
"Martin, Cody",Out,Injury/Illness - Left Knee; Soreness
"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
"Okogie, Josh",Questionable,Injury/Illness - Nasal; Fracture,
# The rest of the code works well; this is the final part, where it uses the list to write the players that were found in both files.
with open("freeze.csv",'r') as freeze:
    for word in basketball:
        if word in freeze:
            freeze.write(word)
# Up to this point I get the correct output, but now I need the list 'basketball' in a text file so I can iterate over it the same way
# I tried different solutions but none of them work for me
with open('final_G_league.csv') as text, open('freeze1.csv') as filter_words:
    st = set(map(str.rstrip, filter_words))
    txt = next(text).split()
    out = [word for word in txt if word not in st]
# This one gives me only the first line of the scraped text
import csv
file1 = open("final_G_league.csv",'r')
file2 = open("freeze1.csv",'r')
data_read1 = csv.reader(file1)
data_read2 = csv.reader(file2)
# convert the data to a list
data1 = [data for data in data_read1]
data2 = [data for data in data_read2]
for i in range(len(data1)):
    if data1[i] != data2[i]:
        print("Line " + str(i) + " is a mismatch.")
        print(f"{data1[i]} doesn't match {data2[i]}")
file1.close()
file2.close()
#This one returns a list with a bunch of names and a list index error.
file1 = open('final_G_league.csv','r')
file2 = open('freeze_list.txt','r')
list1 = file1.readlines()
list2 = file2.readlines()
for i in list1:
    for j in list2:
        if j in i:
# I also tried the answers in this post:
#https://stackoverflow.com/questions/31343457/filter-words-from-one-text-file-in-another-text-file
Let's assume we have the following input files:
freeze_list.txt - a comma-separated list of filter words (players) enclosed in quotes:
'Adebayo, Bam', 'Allen, Jarrett', 'Antetokounmpo, Giannis', 'Anthony, Cole', 'Anunoby, O.G.', 'Ayton, Deandre',
'Banchero, Paolo', 'Bane, Desmond', 'Barnes, Scottie', 'Barrett, RJ', 'Beal, Bradley', 'Booker, Devin', 'Bridges, Mikal',
'Brown, Jaylen', 'Brunson, Jalen', 'Butler, Jimmy', 'Forbes, Bryn'
final_G_league.csv - scraped lines that we want to filter using the words from freeze_list.txt:
Charlotte Hornets,"Ball, LaMelo",Out,"Injury/Illness - Bilateral Ankle, Wrist; Soreness (L Ankle, R Wrist)"
"Hayward, Gordon",Available,Injury/Illness - Left Hamstring; Soreness
"Martin, Cody",Out,Injury/Illness - Left Knee; Soreness
"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
"Okogie, Josh",Questionable,Injury/Illness - Nasal; Fracture,
I would split the script's responsibilities into code segments to make it more readable and manageable:
Define constants (later you could make them parameters)
Read filter words from a file
Filter scrapped lines
Dump output to a file
The constants:
FILTER_WORDS_FILE_NAME = "freeze_list.txt"
SCRAPPED_FILE_NAME = "final_G_league.csv"
FILTERED_FILE_NAME = "freeze.csv"
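Read filter words from a file (ast.literal_eval parses the quoted, comma-separated names as a tuple and, unlike eval, cannot execute arbitrary code):
import ast
with open(FILTER_WORDS_FILE_NAME) as filter_words_file:
    filter_words = ast.literal_eval('(' + filter_words_file.read() + ')')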
Filter lines from the scraped file:
matched_lines = []
with open(SCRAPPED_FILE_NAME) as scrapped_file:
    for line in scrapped_file:
        # Check if any of the keywords is found in the line
        for filter_word in filter_words:
            if filter_word in line:
                matched_lines.append(line)
                # stop checking other words for performance and
                # to avoid sending the same line multiple times to the output
                break
Dump filtered lines into a file:
with open(FILTERED_FILE_NAME, "w") as filtered_file:
    for line in matched_lines:
        filtered_file.write(line)
After running the above segments in sequence, the output freeze.csv is:
"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
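The nested loop with break can also be written with any(), which likewise stops at the first filter word that matches. A minimal sketch; filter_lines is a hypothetical helper that accepts any iterable of lines (an open file works too), so it can be tried without files:

```python
def filter_lines(lines, filter_words):
    """Keep each line that contains at least one filter word (substring match)."""
    # any() short-circuits like the break in the loop version,
    # so each matching line is kept exactly once
    return [line for line in lines if any(word in line for word in filter_words)]


scraped = [
    '"Hayward, Gordon",Available,Injury/Illness - Left Hamstring; Soreness\n',
    '"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,\n',
]
print(filter_lines(scraped, ["Adebayo, Bam", "Forbes, Bryn"]))
```

This is still a substring test, so a short filter word would also match any longer name that contains it.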
Suggestion
I'm not sure why you chose to store the filter words as a comma-separated list. I would prefer a plain list of words, one per line.
freeze_list.txt:
Adebayo, Bam
Allen, Jarrett
Antetokounmpo, Giannis
Butler, Jimmy
Forbes, Bryn
The reading becomes straightforward:
with open(FILTER_WORDS_FILE_NAME) as filter_words_file:
    # skip blank lines: an empty string is "in" every line and would match everything
    filter_words = [word.strip() for word in filter_words_file if word.strip()]
The output freeze.csv is the same:
"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,
If file2 is just a list of names and you want to extract the rows in the first file whose name column matches a name in the list, I suggest you make the "freeze" file a text file with one name per line and remove the single quotes from the names; then it is easier to parse.
You can then do something like this to match the names in one file against the other.
import csv
# read the names into a list, one name per line
with open("freeze1.txt",'r') as file2:
    names = [s.strip() for s in file2 if s.strip()]
print("names:", names)
# next, open the league data and extract rows with matching names
with open("final_G_league.csv",'r') as file1:
    reader = csv.reader(file1)
    next(reader)  # skip header (remove this line if the file has no header row)
    for row in reader:
        if row and row[0] in names:
            # print the name that matched
            print(row[0])
If the names don't match exactly as they appear in the final_G_league file, you may need to adjust accordingly, e.g. by doing a case-insensitive match or by normalizing names ("last, first" vs. "first last").
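One way to do that normalization is to map every name to a canonical key before comparing. The sketch below assumes names come either as "Last, First" or "First Last"; normalize_name and matching_rows are hypothetical helpers, and suffixes such as "Jr." get no special handling. It reads CSV text through io.StringIO so it runs without files; with a real file, pass the open file object to csv.reader instead.

```python
import csv
import io


def normalize_name(name):
    """Return a lower-case 'first last' key for 'Last, First' or 'First Last'."""
    name = name.strip().strip("'\"")  # drop stray quotes copied from the list
    if "," in name:
        last, first = (part.strip() for part in name.split(",", 1))
        name = f"{first} {last}"
    return " ".join(name.lower().split())  # collapse runs of whitespace


def matching_rows(csv_text, names):
    """Return the CSV rows whose first column matches one of names."""
    wanted = {normalize_name(n) for n in names}  # set for fast lookups
    reader = csv.reader(io.StringIO(csv_text))
    return [row for row in reader if row and normalize_name(row[0]) in wanted]


data = '"Forbes, Bryn",Questionable,Injury/Illness - N/A; Illness,\n'
print(matching_rows(data, ["bryn forbes"]))
```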

"Replace" from central file?

I am trying to extend the replace function. Instead of doing the replacements on individual lines or individual commands, I would like to use the replacements from a central text file.
Here's the source:
import os
import feedparser
import pandas as pd
pd.set_option('display.max_colwidth', None)  # None means no limit (passing -1 is deprecated)
RSS_URL = "https://techcrunch.com/startups/feed/"
feed = feedparser.parse(RSS_URL)
entries = pd.DataFrame(feed.entries)
entries = entries[['title']]
entries = entries.to_string(index=False, header=False)
entries = entries.replace(' ', '\n')
entries = os.linesep.join([s for s in entries.splitlines() if s])
print(entries)
I want to be able to replace words from an RSS feed using a central "Replacement" file. This file should have two columns, old word and new word, and work like replace('old', 'new').
Output/Print Example:
truck
rental
marketplace
D’Amelio
family
launches
to
invest
up
to
$25M
...
In most cases I want to delete words that I don't need, e.g. replace('to', ''). But I also want to be able to change special names, e.g. replace('D’Amelio', 'DAmelio'). The goal is to reduce the number of words and build up a kind of keyword radar.
Is this possible? I can't find any help by Googling, but it could well be that I don't know the right terms or can't phrase the question properly.
with open('<filepath>','r') as r:
    # if you remove the ' marks from around your words, you can remove the [1:-1] part of the below code
    words_to_replace = [word.strip()[1:-1] for word in r.read().split(',')]
def replace_words(original_text, words_to_replace):
    for word in words_to_replace:
        original_text = original_text.replace(word, '')
    return original_text
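The snippet above only deletes words, and str.replace also matches inside longer words (removing 'to' turns 'investor' into 'invesr'). Since the feed output is one word per line, a sketch closer to the request is to read a two-column old,new file into a dict and look up whole words. The file layout (one old,new pair per line, an empty new column meaning "delete") and the helper names load_replacements and apply_replacements are assumptions.

```python
import csv


def load_replacements(path):
    """Read 'old,new' rows into a dict; an empty 'new' column means delete."""
    replacements = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip blank lines
            old = row[0].strip()
            new = row[1].strip() if len(row) > 1 else ""
            replacements[old] = new
    return replacements


def apply_replacements(words, replacements):
    """Replace whole words only and drop words mapped to an empty string."""
    result = []
    for word in words:
        new = replacements.get(word, word)  # unchanged if not in the file
        if new:
            result.append(new)
    return result
```

With the question's code, this could be used as apply_replacements(entries.splitlines(), load_replacements('replacements.csv')), where replacements.csv is a hypothetical file name. Save the file as UTF-8 so that the curly apostrophe in D’Amelio matches the feed text.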
I wasn't sure I understood your question, but as far as I can tell you have strings like cat, dog, etc. and a file containing the data you want to replace them with. If that is what you need, try running the solution below to see whether it meets your requirement.
If that's not what you meant, please comment below.
Text file (don't put quotes around the strings in the text file):
papa, papi
dog, dogo
cat, kitten
Python File:
your_string = input("Type a string here: ")  # the string you want to replace
with open('textfile.txt', "r") as file1:  # open your file
    for line in file1:  # go through the lines of the file one by one
        if not line.strip():
            continue  # skip blank lines
        old, new = [part.strip() for part in line.split(',', 1)]  # 'cat, kitten' -> ['cat', 'kitten']
        if your_string == old:  # compare your string with the first column
            your_string = new  # e.g. if the input is cat, it is replaced with kitten
            print(your_string)
            break  # stop after the first match

How do I read a file and convert each line into strings and see if any of the strings are in a user input?

What I am trying to do in my program is to have the program open a file with many different words inside it.
It should then receive a user input and check whether any word in the file appears in the user input.
Inside the file redflags.txt:
happy
angry
ball
jump
Each word is on a different line.
For example if the user input is "Hey I am a ball" then it will print redflag.
If the user input is "Hey this is a sphere" then it will print noflag.
Redflags = open("redflags.txt")
data = Redflags.read()
text_post = raw_input("Enter text you wish to analyse")
words = text_post.split() and text_post.lower()
if data in words:
    print("redflag")
else:
    print("noflag")
This should do the trick! Set lookups are generally much faster than list lookups, and sets can tell you the intersection (here, the overlapping words), differences, etc. We read the file that has the list of words, remove the newline characters, and lowercase the words, which gives us our first list. The second list is created from the user input, split on whitespace. Now we can take the set intersection to see whether any common words exist.
# read in the words and create list of words with each word on newline
# replace newline chars and lowercase
words = [word.replace('\n', '').lower() for word in open('filepath_to_word_list.txt', 'r').readlines()]
# collect user input and lowercase and split into list
user_input = raw_input('Please enter your words').lower().split()
# use set intersection to find if common words exist and print redflag
if set(words).intersection(set(user_input)):
    print('redflag')
else:
    print('noflag')
with open('redflags.txt', 'r') as f:
    # See this post: https://stackoverflow.com/a/20756176/1141389
    file_words = f.read().splitlines()
# get the user's words, lower case them all, then split
user_words = raw_input('Please enter your words').lower().split()
# use sets to find if any user_words are in file_words
if set(file_words).intersection(set(user_words)):
    print('redflag')
else:
    print('noredflag')
I would suggest using list comprehensions.
Take a look at this code, which does what you want (explained below):
Redflags = open("redflags.txt")
data = Redflags.readlines()
data = [d.strip() for d in data]
text_post = input("Enter text you wish to analyse:")
text_post = text_post.split()
fin_list = [i for i in data if i in text_post]
if (fin_list):
    print("RedFlag")
else:
    print("NoFlag")
Output 1:
Enter text you wish to analyse:i am sad
NoFlag
Output 2:
Enter text you wish to analyse:i am angry
RedFlag
First, open the file and read it with readlines(); this gives a list of the lines in the file:
>>> data = Redflags.readlines()
>>> data
['happy\n', 'angry \n', 'ball \n', 'jump\n']
Notice the unwanted spaces and newlines (\n). You can use strip() to remove them, but you can't call strip() on a list, so you have to take each item from the list and apply strip() to it. A list comprehension does this concisely:
data = [d.strip() for d in data]
Also, why are you using raw_input()? In Python 3, use input() instead.
After getting and splitting the input text:
fin_list = [i for i in data if i in text_post]
This builds a list of every item i in data that is also in text_post, i.e. the items common to both lists.
>>> fin_list
['angry'] #if my input is "i am angry"
Note: in Python, empty lists are considered false,
>>> bool([])
False
while lists with values are considered true:
>>> bool(['some','values'])
True
So the if branch only runs when the list is non-empty, meaning "RedFlag" is printed only when at least one item is common to both lists.
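One caveat with splitting on whitespace: an input such as "Hey I am a ball." produces the token "ball.", which is not equal to "ball". The sketch below extracts words with a regular expression before the set check; the pattern [a-z']+ is an assumption about what counts as a word (it ignores digits and hyphens), and has_redflag is a hypothetical helper name.

```python
import re


def has_redflag(text, flag_words):
    """Return True if any word in text is one of flag_words (case-insensitive)."""
    words = set(re.findall(r"[a-z']+", text.lower()))  # drops punctuation like '.' or '!'
    flags = {w.strip().lower() for w in flag_words if w.strip()}  # works with raw file lines
    return not words.isdisjoint(flags)  # True when the two sets share a word


flag_words = ["happy\n", "angry \n", "ball \n", "jump\n"]  # as returned by readlines()
print("redflag" if has_redflag("Hey I am a ball.", flag_words) else "noflag")
```

isdisjoint() answers the same question as a non-empty intersection() but can stop at the first shared word instead of building the whole intersection.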

Converting plural to singular in a text file with Python

I have txt files that look like this:
word, 23
Words, 2
test, 1
tests, 4
And I want them to look like this:
word, 23
word, 2
test, 1
test, 4
I want to be able to take a txt file in Python and convert plural words to singular. Here's my code:
import nltk
f = raw_input("Please enter a filename: ")
def openfile(f):
    with open(f,'r') as a:
        a = a.read()
        a = a.lower()
        return a
def stem(a):
    p = nltk.PorterStemmer()
    [p.stem(word) for word in a]
    return a
def returnfile(f, a):
    with open(f,'w') as d:
        d = d.write(a)
    #d.close()
print openfile(f)
print stem(openfile(f))
print returnfile(f, stem(openfile(f)))
I have also tried these 2 definitions instead of the stem definition:
def singular(a):
    for line in a:
        line = line[0]
        line = str(line)
        stemmer = nltk.PorterStemmer()
        line = stemmer.stem(line)
    return line
def stem(a):
    for word in a:
        for suffix in ['s']:
            if word.endswith(suffix):
                return word[:-len(suffix)]
    return word
Afterwards I'd like to take duplicate words (e.g. test and test) and merge them by adding up the numbers next to them. For example:
word, 25
test, 5
I'm not sure how to do that. A solution would be nice but not necessary.
If you have complex words to singularize, I don't advise using stemming but rather a proper Python package like pattern:
from pattern.text.en import singularize
plurals = ['caresses', 'flies', 'dies', 'mules', 'geese', 'mice', 'bars', 'foos',
'families', 'dogs', 'child', 'wolves']
singles = [singularize(plural) for plural in plurals]
print(singles)
returns:
>>> ['caress', 'fly', 'dy', 'mule', 'goose', 'mouse', 'bar', 'foo', 'foo', 'family', 'family', 'dog', 'dog', 'child', 'wolf']
It's not perfect, but it's the best I've found (96% accurate, based on the docs: http://www.clips.ua.ac.be/pages/pattern-en#pluralization).
It seems like you're pretty familiar with Python, but I'll still try to explain some of the steps. Let's start with the first question of depluralizing words. When you read in a multiline file (the word, number csv in your case) with a.read(), you're going to be reading the entire body of the file into one big string.
def openfile(f):
    with open(f,'r') as a:
        a = a.read() # a will equal 'soc, 32\nsoc, 1\n...' in your example
        a = a.lower()
        return a
This is fine and all, but when you want to pass the result into stem(), it will be as one big string, and not as a list of words. This means that when you iterate through the input with for word in a, you will be iterating through each individual character of the input string and applying the stemmer to those individual characters.
def stem(a):
    p = nltk.PorterStemmer()
    a = [p.stem(word) for word in a] # ['s', 'o', 'c', ',', ' ', '3', '2', '\n', ...]
    return a
This definitely doesn't work for your purposes, and there are a few different things we can do.
1. We can change it so that we read the input file as a list of lines.
2. We can use the big string and break it down into a list ourselves.
3. We can go through and stem each line in the list of lines one at a time.
Just for expedience's sake, let's roll with #1. This will require changing openfile(f) to the following:
def openfile(f):
    with open(f,'r') as a:
        a = a.readlines() # a will equal ['soc, 32\n', 'soc, 1\n', ...] in your example
        b = [x.lower() for x in a]
        return b
This should give us b as a list of lines, i.e. ['soc, 32\n', 'soc, 1\n', ...]. So the next problem is what to do with the list of strings when we pass it to stem(). One way is the following:
def stem(a):
    p = nltk.PorterStemmer()
    b = []
    for line in a:
        split_line = line.split(',') #break it up so we can get access to the word
        new_line = str(p.stem(split_line[0])) + ',' + split_line[1] #put it back together
        b.append(new_line) #add it to the new list of lines
    return b
This is definitely a pretty rough solution, but should adequately iterate through all of the lines in your input, and depluralize them. It's rough because splitting strings and reassembling them isn't particularly fast when you scale it up. However, if you're satisfied with that, then all that's left is to iterate through the list of new lines, and write them to your file. In my experience it's usually safer to write to a new file, but this should work fine.
def returnfile(f, a):
    with open(f,'w') as d:
        for line in a:
            d.write(line)
print openfile(f)
print stem(openfile(f))
print returnfile(f, stem(openfile(f)))
When I have the following input.txt
soc, 32
socs, 1
dogs, 8
I get the following stdout:
Please enter a filename: input.txt
['soc, 32\n', 'socs, 1\n', 'dogs, 8\n']
['soc, 32\n', 'soc, 1\n', 'dog, 8\n']
None
And input.txt looks like this:
soc, 32
soc, 1
dog, 8
The second question, about merging the numbers for identical words, changes our solution from above. As suggested in the comments, you should look into using dictionaries to solve this. Instead of doing this all as one big list, the better (and probably more Pythonic) way is to iterate through each line of your input and stem the words as you process them. I'll write up code for this in a bit, if you're still working to figure it out.
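Here is a minimal sketch of the dictionary approach, using collections.Counter (a dict subclass that defaults missing keys to 0). To stay self-contained it uses a deliberately naive singularizing rule (strip one trailing 's', which would mangle words like "glass"); naive_singular and merge_counts are hypothetical names, and in practice you would pass a stemmer or one of the libraries from the other answers as the singularize argument.

```python
from collections import Counter


def naive_singular(word):
    """Hypothetical, very naive rule: drop one trailing 's'."""
    return word[:-1] if word.endswith("s") and len(word) > 1 else word


def merge_counts(lines, singularize=naive_singular):
    """Sum the counts of 'word, count' lines after lower-casing and singularizing."""
    totals = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        word, count = line.rsplit(",", 1)  # split on the last comma only
        totals[singularize(word.strip().lower())] += int(count)  # int() ignores surrounding whitespace
    return totals


data = ["word, 23", "Words, 2", "test, 1", "tests, 4"]
for word, total in merge_counts(data).items():
    print(f"{word}, {total}")
```

Passing the lines of an open file works the same way, and the result can be written back with one f"{word}, {total}\n" line per item.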
The Nodebox English Linguistics library contains scripts for converting plural forms to singular and vice versa. Check out the tutorial: https://www.nodebox.net/code/index.php/Linguistics#pluralization
To convert plural to singular, just import singular and call singular() on the word. It handles words with different endings, irregular forms, etc.
from en import singular
print(singular('analyses'))
print(singular('planetoids'))
print(singular('children'))
>>> analysis
>>> planetoid
>>> child

Using a text file to lay out a words into a grid form

I have a list of 6 words in a text file. I would like to read the words from the file and display them as a 3x2 grid, in a random order every time the program is run.
The words are:
cat, dog, hamster, frog, snail, snake
I want them to display like this (but in a random order every time the program is run):
cat dog hamster
frog snail snake
So far, all I've managed to do is get a single random word from the list of 6 words to appear, using the code below. Help would be much appreciated.
import random
words_file = random.choice(open('words.txt', 'r').readlines())
print words_file
Here's another one:
>>> import random
>>> with open("words.txt") as f:
...     words = random.sample([x.strip() for x in f], 6)
...
>>> grouped = [words[i:i+3] for i in range(0, len(words), 3)]
>>> for l in grouped:
...     print "".join("{:<10}".format(x) for x in l)
...
snake     cat       dog
snail     frog      hamster
First we read the contents of the file and pick six random lines (make sure each line contains only a single word). Then we group the words into lists of three and print them using string formatting. The <10 in the format brackets left-aligns the text and pads each item to a width of 10 characters.
For selecting the 6 words, you should try random.sample:
words = random.sample(open('words.txt').readlines(), 6)
You'll want to look into string formatting!
import random
with open('words.txt','r') as infile:
    words_file = [line.strip() for line in infile]  # strip the trailing newlines
random.shuffle(words_file)  # mix up the words
maxlen = max(len(word) for word in words_file) + 1  # column width: longest word plus one
print_format = "{}{}{}".format("{:", maxlen, "}")  # builds e.g. "{:8}"
print(*(print_format.format(word) for word in words_file[:3]))
print(*(print_format.format(word) for word in words_file[3:]))
There are better ways to run through your list grouping by threes, but this works for your limited test case. Here's a link to some more information on chunking lists
My favorite recipe is chunking with zip and iter:
def get_chunks(iterable, chunksize):
    return zip(*[iter(iterable)]*chunksize)
for chunk in get_chunks(words_file, 3):
    print(*(print_format.format(word) for word in chunk))
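Putting the pieces together, here is a sketch that shuffles the words and lays them out in rows using the zip/iter chunking recipe. The function name make_grid, the default of 3 columns, and the rng parameter (so a seeded random.Random can be passed for repeatable output) are assumptions. Note that zip() silently drops a final partial row when the number of words is not a multiple of the row size.

```python
import random


def make_grid(words, columns=3, rng=random):
    """Shuffle a copy of words and return the grid as a list of text rows."""
    words = [w.strip() for w in words if w.strip()]  # drop newlines and blank lines
    rng.shuffle(words)  # shuffles the copy, not the caller's list
    width = max(len(w) for w in words) + 1  # longest word plus one space
    rows = zip(*[iter(words)] * columns)  # chunk into tuples of `columns` words
    return ["".join(w.ljust(width) for w in row).rstrip() for row in rows]


for line in make_grid(["cat", "dog", "hamster", "frog", "snail", "snake"]):
    print(line)
```

With a file, pass its lines directly, e.g. make_grid(open('words.txt').readlines()).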
