How to replace part of text - python

I am really new to coding, so don't be harsh on me; my question is probably basic, but I couldn't find a way to do it.
I would like to learn how to create an automated process for generating custom links (preferably in Python).
Let me give you an example.
https://website.com/questions/ineedtoreplacethis.pdf
I have a database (text file) of names, one name per line
(Oliver
David
Donald
etc.)
I am looking for a way to automatically insert each name into the "ineedtoreplacethis" part of the link and create many custom links like that at once.
Thank you in advance for any help.

An f-string is probably the way to go.
Here is an example:
names = ['Olivier', 'David', 'Donald']
for name in names:
    print(f"{name}.txt")
Output:
Olivier.txt
David.txt
Donald.txt
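The example above prints file names only. A minimal sketch of the full task, assuming the names come from lines read from a file (so each one may still end with a newline) and that the prefix is the https://website.com/questions/ link from the question; make_links is a hypothetical helper name.

```python
def make_links(lines, prefix="https://website.com/questions/"):
    """Build one PDF link per non-empty line using an f-string."""
    links = []
    for line in lines:
        name = line.strip()  # drop the trailing newline and stray spaces
        if name:  # skip blank lines, e.g. an empty last line in the file
            links.append(f"{prefix}{name}.pdf")
    return links


# Lines as readlines() would return them (hypothetical sample data).
names_from_file = ["Oliver\n", "David\n", "Donald\n", "\n"]
for link in make_links(names_from_file):
    print(link)
```

With a real file you would pass the open file object directly, e.g. make_links(open("names.txt")), since iterating a file yields its lines.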

You can do this with string concatenation, as explained below. This assumes you already have the data from the text file; how to read it is explained in the later part of the answer.
a = "Foo"
b = "bar"
a + b will return
"Foobar"
In your case,
original_link = "https://website.com/questions/"
sub_link = "ineedtoreplacethis.pdf"
out = original_link + sub_link
out then holds the full link you need.
To get the sub_link values from your text file, read the file as:
with open("database.txt", "r") as file:
    data = file.readlines()  # each item keeps its line ending; text mode turns Windows "\r\n" into "\n"
Once you have data, which is a list of all the lines in your text file, you can iterate over it with a loop:
for sub_link in data:
    # strip the newline, then add the extension the link needs
    search_link = original_link + sub_link.strip() + ".pdf"
    # use search_link for your further operations
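To see why the newline has to be stripped, here is a self-contained sketch. It writes a sample database.txt (file name taken from the answer) into a temporary directory, which exists only to make the demo runnable, and compares concatenation with and without rstrip("\n").

```python
import os
import tempfile

original_link = "https://website.com/questions/"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "database.txt")
    with open(path, "w") as file:
        file.write("Oliver\nDavid\nDonald\n")

    with open(path, "r") as file:
        data = file.readlines()  # ['Oliver\n', 'David\n', 'Donald\n']

# Without stripping, every link ends in a line break and lacks ".pdf".
raw_links = [original_link + sub_link for sub_link in data]
# Stripping the newline and appending the extension gives usable links.
clean_links = [original_link + sub_link.rstrip("\n") + ".pdf" for sub_link in data]

print(repr(raw_links[0]))
print(clean_links)
```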

Use a formatted string:
filename = "test.txt"
lines = []
with open(filename) as my_file:
    lines = my_file.readlines()
for i in lines:
    # strip() removes the trailing newline that readlines() keeps
    print(f"https://website.com/questions/{i.strip()}.pdf")
Explanation:
Read the txt file as a list of lines
Iterate over the list using a for loop
Print each link using a formatted string

Consider file textFile.txt as
Oliver
David
Donald
You can simply loop over the names in the file:
with open("textFile.txt", "r") as f:
    name_list = f.read().splitlines()  # unlike split('\n'), no empty item for a trailing newline
link_prefix = 'https://website.com/questions/'
link_list = []
for word in name_list:
    link_list.append(link_prefix + word + '.pdf')
print(link_list)
This will print the following (i.e. the contents of link_list):
['https://website.com/questions/Oliver.pdf', 'https://website.com/questions/David.pdf', 'https://website.com/questions/Donald.pdf']
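The same loop can be written as a list comprehension. A sketch in which a string stands in for the result of f.read() so it runs without a file; the "if word" filter (an addition) skips any blank lines inside the file.

```python
link_prefix = "https://website.com/questions/"
file_text = "Oliver\nDavid\nDonald\n"  # stands in for f.read()

# splitlines() yields no empty last item for the trailing newline,
# whereas file_text.split("\n") would end with "".
name_list = file_text.splitlines()
link_list = [link_prefix + word + ".pdf" for word in name_list if word]
print(link_list)
```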

from pprint import pprint as pp
import re
url = "https://website.com/questions/ineedtoreplacethis.pdf"
pattern = re.compile(r"/(\w+)\.pdf")  # search the end of the link with a regex
sub_link = pattern.search(url).group(1)  # find the part to replace
print(f"{sub_link = }")  # the "=" specifier needs Python 3.8+
names = ["Oliver", "David", "Donald"]  # text file content loaded into a list
new_urls = []
for name in names:
    new_url = url.replace(sub_link, str(name))
    new_urls.append(new_url)
pp(new_urls)  # print the formatted links to the console
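Note that url.replace(sub_link, name) replaces every occurrence of sub_link in the URL, not just the file name. A sketch of a variant (my substitution, not the answer's code) that uses re.sub on only the last path segment before ".pdf"; passing a function as the replacement keeps any backslash in a name from being read as a regex escape.

```python
import re

url = "https://website.com/questions/ineedtoreplacethis.pdf"
names = ["Oliver", "David", "Donald"]

# Match only the last path segment that is directly followed by ".pdf"
# at the end of the string, so nothing else in the URL can change.
last_segment = re.compile(r"[^/]+(?=\.pdf$)")

new_urls = [last_segment.sub(lambda _match: name, url, count=1) for name in names]
print(new_urls)
```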

Related

how to write a line to read a certain sentence in a text file?

I am currently learning how to code in Python and I need help with something.
Is there a way to make the script read only the lines that start with Text = ...?
The text file has a lot of other sentences, but I only want the program to focus on the sentences that start with Text = ... and print them, ignoring the other lines in the file.
For example,
in the text file:
min = 32.421
text = " Hello I am Robin and I am hungry"
max = 233341.42
How I want my output to look:
Hello I am Robin and I am hungry
I want the output to be just the sentence, without the " " and text =.
This is my code so far after reading through the comments:
import os
import sys
import glob
from english_words import english_words_set
try:
    print('Finding file...')
    file = glob.glob(sys.argv[1])
    print("Found " + str(len(file)) + " file!")
    print('LOADING NOW...')
    with open(file) as f:
        lines = f.read()
        for line in lines:
            if line.startswith('Text = '):
                res = line.split('"')[1]
                print(res)
You can read the text file and its lines like so:
# open file
with open('text_file.txt') as f:
    # store the list of lines contained in the file
    lines = f.readlines()
for line in lines:
    # find match
    if line.startswith('text ='):
        # store the string inside double quotes, without surrounding spaces
        res = line.split('"')[1].strip()
        print(res)
This should print your expected output.
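The question spells the key both as "Text =" and "text =", so a case-insensitive regex may be more forgiving. A sketch under that assumption; extract_text_values is a hypothetical helper, and the sample lines stand in for the open file.

```python
import re

# Matches lines such as:  text = " Hello I am Robin and I am hungry"
# re.IGNORECASE accepts both "text =" and "Text =".
TEXT_LINE = re.compile(r'^\s*text\s*=\s*"(.*)"\s*$', re.IGNORECASE)


def extract_text_values(lines):
    """Return the quoted value of every line of the form text = "..."."""
    values = []
    for line in lines:
        match = TEXT_LINE.match(line)
        if match:
            values.append(match.group(1).strip())  # drop spaces inside the quotes
    return values


sample = [
    "min = 32.421\n",
    'text = " Hello I am Robin and I am hungry"\n',
    "max = 233341.42\n",
]
print(extract_text_values(sample))
```

With a real file: with open("text_file.txt") as f: print(extract_text_values(f)).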
You can open the file, check whether a line begins with the word "text", and then extract the value:
with open("file.txt", "r") as file:  # change file.txt to the file's path
    for line in file:  # for each line in the file
        if line.startswith("text"):  # only lines that begin with "text"
            text = line.strip()  # remove surrounding whitespace, including the newline
            text = text.replace("text = \"", "")  # remove the part before the string
            text = text.replace("\"", "")  # remove the closing quote
            print(text)
Or you could convert the file from plain text to a format such as TOML or YAML, which is much simpler to parse than free-form text while keeping your file layout about the same. Python 3.11+ can read TOML with the standard-library tomllib module; YAML needs a third-party package such as PyYAML. Parsing gives you a dictionary instead of a string.
List comprehensions in python:
https://www.youtube.com/watch?v=3dt4OGnU5sM
Using list comprehension with files:
https://www.youtube.com/watch?v=QHFWb_6fHOw
First learn list comprehensions, then the idea is this:
listOutput = ['''min = 32.421
text = "Hello I am Robin and I am hungry"
max = 233341.42''']
myText = ''.join(listOutput)
indexFirst = myText.find("text") + 8  # skip the 8 characters of {text = "}
indexLast = myText.find('"', indexFirst)  # locate the next quote after indexFirst
print(myText[indexFirst:indexLast])
Output:
Hello I am Robin and I am hungry
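The hard-coded + 8 is easy to get wrong if the spacing changes. A sketch using str.partition instead (my swap, not the answer's method), so the prefix length never has to be counted; value_after is a hypothetical helper. Like find, it would also match the key inside a longer name such as mytext.

```python
def value_after(text, key="text"):
    """Return the double-quoted value that follows 'key = "', or None."""
    prefix = key + ' = "'
    _, found, rest = text.partition(prefix)
    if not found:
        return None
    value, _, _ = rest.partition('"')  # everything up to the closing quote
    return value


my_text = 'min = 32.421\ntext = "Hello I am Robin and I am hungry"\nmax = 233341.42'
print(value_after(my_text))
```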
with open(file) as f:  # file is the path to your text file
    lines = f.read().split("\n")
prefix = "text = "
for line in lines:
    if line.startswith(prefix):
        # replace the first occurrence of prefix; strip('"') also drops the quotes
        result = line.replace(prefix, '', 1).strip('"')
        print(result)
Alternatively, you could use result = line.removeprefix(prefix), but str.removeprefix is only available in Python 3.9 and later.
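If the script may run on both older and newer interpreters, a small wrapper can pick removeprefix when it exists and fall back to slicing otherwise. A sketch; strip_prefix is a hypothetical name.

```python
import sys


def strip_prefix(line, prefix):
    """Remove prefix from the start of line if present."""
    if sys.version_info >= (3, 9):
        return line.removeprefix(prefix)
    # Fallback for Python < 3.9, where str.removeprefix does not exist.
    return line[len(prefix):] if line.startswith(prefix) else line


prefix = "text = "
line = 'text = "Hello I am Robin and I am hungry"'
result = strip_prefix(line, prefix).strip('"')  # also drop the surrounding quotes
print(result)
```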

Searching keywords in a text file with a dictionary with python

I am trying to search a lot of keywords in a textfile and return integers/floats that come after the keyword.
I think it's possible using a dictionary where the keys are the keywords in the text file and the values are functions that return the value following each keyword.
import re
def store_text():
    with open("path_to_file.txt", 'r') as f:
        text = f.readlines()
    return text
abc = store_text()
def search():
    for index, line in enumerate(abc):
        if "His age is:" in line:
            return int(re.search(r"\d+", line).group())
dictionary = {
    "His age is:": print(search())
}
The code returns the value I search for in the text file, but in search() I want to avoid typing the keyword again, because it's already in the dictionary.
Later on I want to store the values found in an excel file.
If you have the keywords in a list, the following approach can help.
import re
from multiprocessing import Pool
search_kwrds = ["His age is:", "His name is:"]  # add more keywords if you need
# re.escape makes characters such as "?" or "." in a keyword match literally
search_regex = "|".join(map(re.escape, search_kwrds))
def read_search_text():
    with open("path_to_file.txt", 'r') as f:
        text = f.readlines()
    return text
def search(search_line):
    search_res = re.search(search_regex, search_line)
    if search_res:
        kwrd_found = search_res.group(0)
        # look for a number only after the keyword; lines without one are skipped
        num_res = re.search(r"\d+", search_line[search_res.end():])
        if num_res:
            return {kwrd_found: int(num_res.group())}
    return {}
if __name__ == '__main__':
    search_lines = read_search_text()
    with Pool(processes=1) as p:  # increase processes if you want a faster search
        s_res = p.map(search, search_lines)
    search_results = {kwrd: suffix for d in s_res for kwrd, suffix in d.items()}
    print(search_results)
You can add more keywords to the list and search for them. This works for files where each line contains at most one keyword and keywords do not repeat on later lines.
You can put the keywords you need to search for in a list. This way you specify your input keywords only once in your program. I've also modified your program to make it a bit more efficient; explanations are in the comments.
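Since the question asks for integers or floats, here is a sketch without multiprocessing that builds one regex from the keyword list (so each keyword is typed only once) and captures the number right after whichever keyword matched. The keywords and sample lines are examples, and search is a hypothetical replacement for the question's function.

```python
import re

keywords = ["His age is:", "His number is:"]  # example keywords

# One alternation of the escaped keywords, then optional spaces and a number
# (optional minus sign, integer or decimal). Named groups keep it readable.
pattern = re.compile(
    r"(?P<key>" + "|".join(map(re.escape, keywords)) + r")\s*(?P<value>-?\d+(?:\.\d+)?)"
)


def search(lines):
    """Map each keyword to the number that follows it (last occurrence wins)."""
    results = {}
    for line in lines:
        for match in pattern.finditer(line):
            value = match.group("value")
            results[match.group("key")] = float(value) if "." in value else int(value)
    return results


sample_lines = ["His age is: 42\n", "Some other line\n", "His number is: 3.5\n"]
print(search(sample_lines))
```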
import re
import csv
list_of_keywords = ["His age is:", "His number is:", "Your_Keyword3"]  # add more keywords to find and match to this list
def store_text():
    with open("/Users/karthick/Downloads/sample.txt", 'r') as f:
        text = f.readlines()
    return text
abc = store_text()
def search(input_file):
    # Initialize an empty dictionary to store the extracted values
    dictionary = dict()
    # Iterate through the lines of the text file
    for line in input_file:
        # For every line, check whether any of the keywords is present in it
        for keyword in list_of_keywords:
            if keyword in line:
                number = re.search(r"\d+", line)
                # If a keyword and a number are present, add them to the dictionary
                if number:
                    dictionary[keyword] = number.group()
    return dictionary
# Call the above function with the input
output_dict = search(abc)
To store the output values in a CSV file that Excel can open:
# Write the extracted dictionary to a CSV file
with open('mycsvfile.csv', 'w', newline='') as f:  # specify the path of your output CSV file here
    w = csv.writer(f)
    w.writerows(output_dict.items())
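A header row makes the sheet easier to read once it is opened in Excel. A sketch that writes one, with io.StringIO standing in for open('mycsvfile.csv', 'w', newline='') so it runs without touching the disk; the column names and example values are assumptions.

```python
import csv
import io

output_dict = {"His age is:": 42, "His number is:": 7}  # example values


def write_results(rows, file_obj):
    """Write keyword/value pairs as CSV, preceded by a header row."""
    writer = csv.writer(file_obj)
    writer.writerow(["keyword", "value"])
    writer.writerows(rows.items())


buffer = io.StringIO()
write_results(output_dict, buffer)
print(buffer.getvalue())
```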

Parse text file which groups data

I'm trying to figure out how to extract strings and write them to a new file, one string per line.
I can't get my head around regex, and all the examples I find online show the data on one line, but mine is already separated.
I'm trying to parse the output of another program. It outputs three lines (Date, Address, Name), then a blank line, then another set of three, and I only need the Address.
fo = open(r"C:\Sampledata.txt", "r")
item = fo.readlines()
I haven't got anything working yet!
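out_list = []
# assumes each line holds Date,Address,Name separated by commas
with open(r"C:\Sampledata.txt", "r") as in_file:
    for line in in_file:
        line = line.strip()
        if not line:  # skip blank separator lines
            continue
        date, address, name = line.split(",")
        out_list.append(address)
with open("outFile.txt", "w") as out_file:
    out_file.write("\n".join(out_list))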
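The question actually describes three-line records (Date, Address, Name) separated by blank lines rather than comma-separated lines. A sketch for that layout, assuming every complete record has exactly three non-empty lines; the sample data and the helper name extract_addresses are made up.

```python
import re


def extract_addresses(text):
    """Return the second line (Address) of each Date/Address/Name record."""
    addresses = []
    # Records are separated by one or more blank (or whitespace-only) lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if len(lines) == 3:  # Date, Address, Name; skip incomplete records
            addresses.append(lines[1])
    return addresses


sample = "2021-03-01\n1 Main Street\nAlice\n\n2021-03-02\n2 High Street\nBob\n"
print("\n".join(extract_addresses(sample)))
```

To produce the output file, read the input with open(...).read(), then write "\n".join(extract_addresses(text)) to outFile.txt.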
I'm not quite sure if this addresses your problem, but maybe something like:
addresses = list()
with open("file1", "r") as infile:
    for line in infile:
        if line.startswith("Address"):
            addresses.append(line.strip("\n"))
Edit: Or, if "Address" appears only once per file, you can break out of the loop after finding the line that starts with "Address":
addresses = list()
with open("file1", "r") as infile:
    for line in infile:
        if line.startswith("Address"):
            addresses.append(line.strip("\n"))
            break
Then you can write all addresses to a new file.
with open("newFile", "w") as outfile:
    for address in addresses:
        outfile.write(address + "\n")

How can I extract portions of words from a file using Python 3.6?

I want to extract specific words from the text file.
Here is the example text file:
https://drive.google.com/file/d/0BzQ6rtO2VN95d3NrTjktMExfNkU/view?usp=sharing
Kindly review it.
I am trying to extract the strings as:
"Name": "the name in front of it"
"Link": "the link in front of it"
For the input file, I expect output like this:
"Name":"JTLnet"
"Link":"http://jtlnet.com"
"Name":"Apache 1.3"
"Link":"http://httpd.apache.org/docs/1.3"
"Name":"Apache"
"Link":"http://httpd.apache.org/"
.
.
.
"Name":"directNIC"
"Link":"http://directnic.com"
Wherever these words appear in the file, they should be extracted to another file.
Kindly let me know how I can achieve this sort of extraction. Please treat the file as a small part of a big file.
Also, it is a text file, not JSON.
Kindly help me.
Since the text file is not properly formatted, the only option for you is regex. The snippet below works for the given sample file.
Keep in mind that this requires you to load the entire file into memory.
import re, json
with open(r'filepath') as f:
    textCorpus = f.read()
# replace empty strings with non-empty ones so the regex matches easily
textCorpus = textCorpus.replace('""', '" "')
lstMatches = re.findall(r'"Name".+?"Link":".+?"', textCorpus)
with open(r'new_file.txt', 'a') as wf:  # text mode, because the values written are str
    for eachMatch in lstMatches:
        convJson = "{" + eachMatch + "}"
        json_data = json.loads(convJson)
        wf.write(json_data["Name"] + "\n")
        wf.write(json_data["Link"] + "\n")
Short solution using the re.findall() and str.join() functions:
import re
with open('test.txt', 'r') as fh:
    p = re.compile(r'(?:"Categories":[^,]+,)("Name":"[^"]+"),(?:[^,]+,)("Link":"[^"]+")')
    result = [pair for l in re.findall(p, fh.read()) for pair in l]
print('\n'.join(result))
The output(fragment):
"Name":"JTLnet"
"Link":"http://jtlnet.com"
"Name":"Apache 1.3"
"Link":"http://httpd.apache.org/docs/1.3"
"Name":"Apache"
"Link":"http://httpd.apache.org/"
"Name":"PHP"
....
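If you would rather get each Name and Link as a pair than as two flat strings, capturing groups can do that. A sketch on a made-up fragment shaped like the records in the question (the real file was not available). The lazy .*? pairs each "Name" with the nearest following "Link", so a record that lacks a Link would be paired with the next record's Link.

```python
import re

# Made-up fragment in the shape the question describes (not the real file).
sample = (
    '{"Categories":"x","Name":"JTLnet","Url":"","Link":"http://jtlnet.com"},'
    '{"Categories":"y","Name":"Apache","Url":"","Link":"http://httpd.apache.org/"}'
)

# DOTALL lets .*? also cross line breaks between the two fields.
pair_rx = re.compile(r'"Name":"([^"]*)".*?"Link":"([^"]*)"', re.DOTALL)

for name, link in pair_rx.findall(sample):
    print(f'"Name":"{name}"')
    print(f'"Link":"{link}"')
```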
Your file is malformed JSON with extraneous double quotes, which is enough to stop the json module from loading it. You are left with lower-level regex parsing.
Assumptions: the interesting part after "Name" or "Link" is separated from the identifier by a colon (:) and enclosed in double quotes (") with no embedded double quote; the file is structured in lines; and the Name and Link fields are always on a single line (no newline inside a field).
You can process your file line by line with a simple re.finditer on each line:
import re
rx = re.compile(r'(("Name":".*?")|("Link":".*?"))')
with open(inputfile) as fd:  # inputfile is the path to your input file
    for line in fd:
        l = rx.finditer(line)
        for elt in l:
            print(elt.group(0))
If you want to write the data to another file, open it before the snippet above with with open(outputfile, "w") as fdout: (indenting the snippet inside that block) and replace the print line with:
fdout.write(elt.group(0) + "\n")

Find and write certain words in lines to a file in python

I have a .txt file in Cyrillic. Its structure is like this, but in Cyrillic:
city text text text.#1#N
river, text text.#3#Name (Name1, Name2, Name3)
lake text text text.#5#N (Name1)
mountain text text.#23#Na
What I need:
1) look at the first word in a line
2) if it is "river", write all words after "#3#", i.e. Name (Name1, Name2, Name3), to a file 'river'.
I have to do the same with the other first words in lines, i.e. city, lake, mountain.
What I have done so far only checks whether the first word is "city" and saves the whole line to a file:
lines = f.readlines()
for line in lines:
    if line.startswith('city'):
        f2.write(line)
f.close()
f2.close()
I know I can use the regex #[0-9]+#(\W+) to find the names, but I don't know how to implement it in code.
I really need your help! And I'm glad for any help.
If all of your "river"s have a "," after them, like in the sample you posted, I would do something like:
for line in f.readlines():
    items = line.split(",")
    if items[0] == "river":
        # the names follow the second "#": "Name (Name1, Name2, Name3)"
        names = line.split("#")[2].strip().split("(")[1].split(")")[0].split(",")
        names = [name.strip() for name in names]  # ["Name1", "Name2", "Name3"]
        # ... now write each one
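Since the question mentions a regex, here is a sketch that uses one. In Python 3, \w matches Unicode letters, including Cyrillic, so the question's #[0-9]+#(\W+) cannot match a name that starts with a letter; the sketch captures everything after the #number# marker instead. names_after_marker is a hypothetical helper name.

```python
import re

# Captures whatever follows the "#<number>#" marker up to the end of the line.
marker = re.compile(r"#\d+#(.*)$")


def names_after_marker(line):
    """Turn 'Name (Name1, Name2)' after '#<n>#' into a list of names."""
    match = marker.search(line.rstrip("\n"))
    if not match:
        return []
    # "Name (Name1, Name2, Name3)" -> "Name ,Name1, Name2, Name3" -> list
    cleaned = match.group(1).replace("(", ",").replace(")", "")
    return [part.strip() for part in cleaned.split(",") if part.strip()]


print(names_after_marker("river, text text.#3#Name (Name1, Name2, Name3)\n"))
```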
What you want to do here is avoid hard-coding the names of the files you need. Instead, glean them from the input file. Create a dictionary of the files you need to write to, opening each one as it's needed. Something like this (untested and probably in need of some adaptation):
outfiles = {}
try:
    # adjust encoding if your Cyrillic file is not UTF-8
    with open("infile.txt", encoding="utf-8") as infile:
        for line in infile:
            tag = line.split(" ", 1)[0].strip("*, ")  # e.g. "river"
            if tag not in outfiles:  # if it's the first time we've seen a tag
                outfiles[tag] = open(tag + ".txt", "w", encoding="utf-8")  # open tag.txt to write
            content = line.rsplit("#", 1)[-1].strip("* \n")
            outfiles[tag].write(content + "\n")
finally:
    for outfile in outfiles.values():  # Python 3: values() replaces itervalues()
        outfile.close()
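An alternative sketch that avoids keeping many files open at once: collect the content for each first word in memory, then write each file in one go. It assumes the whole input fits in memory and that the output should be UTF-8 (an assumption for the Cyrillic text). split_by_tag and write_groups are hypothetical helper names, and the temporary directory only keeps the demo self-contained.

```python
import os
import tempfile
from collections import defaultdict


def split_by_tag(lines):
    """Group the text after the last '#' by the line's first word."""
    groups = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        tag = line.split(" ", 1)[0].strip(",")  # e.g. "river," -> "river"
        content = line.rsplit("#", 1)[-1].strip()
        groups[tag].append(content)
    return groups


def write_groups(groups, out_dir):
    """Write each group to <out_dir>/<tag>.txt, one entry per line, as UTF-8."""
    for tag, contents in groups.items():
        path = os.path.join(out_dir, tag + ".txt")
        with open(path, "w", encoding="utf-8") as out_file:
            out_file.write("\n".join(contents) + "\n")


sample = [
    "city text text text.#1#N\n",
    "river, text text.#3#Name (Name1, Name2, Name3)\n",
    "lake text text text.#5#N (Name1)\n",
]
groups = split_by_tag(sample)
with tempfile.TemporaryDirectory() as out_dir:
    write_groups(groups, out_dir)
    print(sorted(os.listdir(out_dir)))
```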
