I am new to Python. I have been searching for a way to replace a series of patterns using regex, but none of the methods I found have worked for me. Here are some of my patterns and the code I am using:
import re
regexes = {
    r'\s(\(|\[)(.*?)Mix(.*?)(\)|\])/i' : r"",
    r'\s(\(|\[)(.*?)Version(.*?)(\)|\])/i' : r"",
    r'\s(\(|\[)(.*?)Remix(.*?)(\)|\])/i' : r"",
    r'\s(\(|\[)(.*?)Extended(.*?)(\)|\])/i' : r"",
    r'\s\(remix\)/i' : r"",
    r'\s\(original\)/i' : r"",
    r'\s\(intro\)/i' : r"",
}
def multi_replace(dict, text):
    for key, value in dict.items():
        text = re.sub(key, value, text)
    return text
filename = "Testing (Intro)"
name = multi_replace(regexes, filename)
print(name)
I'm a DJ and I pull filenames from directories of music I own. I belong to many record pools, and they sometimes label their songs as follows:
SomeGuy - Song Name Here (Intro)
SomeGirl - Song Name Here (Remix)
SomeGirl - Song Name Here (Extended Version)
SomeGuy - Song Name Here (12" Mix Vocal)
and so on...
My regexes above work in PHP, where they remove all the tags like (Intro), (Remix), (Extended Version), etc., so the output is:
SomeGuy - Song Name Here
SomeGirl - Song Name Here
SomeGirl - Song Name Here
SomeGuy - Song Name Here
and so on...
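Python's re module does not understand PHP-style /i modifiers. Inside a Python pattern, /i is just a literal slash followed by the letter i, so none of your patterns can match. For case-insensitive matching, pass the re.I (or re.IGNORECASE) flag instead.
Try this code: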
import re
regexes = {
    r'\s(\(|\[)(.*?)Mix(.*?)(\)|\])' : r"",
    r'\s(\(|\[)(.*?)Version(.*?)(\)|\])' : r"",
    r'\s(\(|\[)(.*?)Remix(.*?)(\)|\])' : r"",
    r'\s(\(|\[)(.*?)Extended(.*?)(\)|\])' : r"",
    r'\s\(remix\)' : r"",
    r'\s\(original\)' : r"",
    r'\s\(intro\)' : r"",
}
def multi_replace(patterns, text):
    # Apply each pattern in turn, ignoring case (the Python equivalent of /i)
    for key, value in patterns.items():
        p = re.compile(key, re.I)
        text = p.sub(value, text)
    return text
filename = "Testing (Intro)"
name = multi_replace(regexes, filename)
print(name)
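Since the patterns differ only in the keyword, they could also be merged into one compiled, case-insensitive pattern. This is a sketch of that alternative, not the answer's code: the keyword list is taken from the question, and the name clean_title is hypothetical. Using [^)\]]* instead of .*? also keeps a match from running across two separate bracket groups; with the original (.*?), a title like "Song (feat. Bob) (Remix)" loses both groups.

```python
import re

# Keywords taken from the question's patterns; extend as needed.
KEYWORDS = ["mix", "version", "remix", "extended", "original", "intro"]

# A space, an opening ( or [, anything except a closing bracket, one of the
# keywords, anything except a closing bracket, then the closing ) or ].
TAG_RE = re.compile(
    r"\s[(\[][^)\]]*?(?:" + "|".join(KEYWORDS) + r")[^)\]]*[)\]]",
    re.IGNORECASE,
)

def clean_title(filename: str) -> str:
    """Remove bracketed tags such as (Intro) or [12" Mix Vocal] from a title."""
    return TAG_RE.sub("", filename).strip()

titles = [
    "SomeGuy - Song Name Here (Intro)",
    "SomeGirl - Song Name Here (Remix)",
    "SomeGirl - Song Name Here (Extended Version)",
    'SomeGuy - Song Name Here (12" Mix Vocal)',
]
for title in titles:
    print(clean_title(title))  # SomeGuy/SomeGirl - Song Name Here
```

Note that any bracketed group containing one of the keywords is removed, so a word like "Mixtape" inside brackets would also trigger it.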
I'm trying to create a data frame from the values of the XML tags in multiple XML files. The code works where the tags are not repeated, but when they do repeat, I see collections of values in the data frame columns, such as {'Michael','James',' '} for the FName column, {'', ' '} for MName and {'Smith', ' '} for LName, among others. What approach should I take to fix this in the code?
import pandas as pd
def parse_XML(list_of_trees, df_cols):
    def get_el(el_list):
        if len(el_list) > 1:
            return [el_text.text for el_text in el_list]
        else:
            return el_list[0].text
    rows = []
    for tree in list_of_trees:
        xroot = tree.getroot()
        res = {}
        for node in xroot:
            for el in df_cols[0:]:
                if node is not None and node.find(f".//{el}") is not None:
                    el_res = get_el(node.findall(f".//{el}"))
                    if el not in res:
                        res[el] = el_res
                    elif type(res[el]) == list:
                        res[el].extend(el_res)
                    else:
                        res[el] = [res[el], el_res]
        rows.append(res)
    out_df = pd.DataFrame(rows, columns=df_cols)
    return out_df
XML example:
<PD>
  <Clt>
    <PType>xxxx</PType>
    <PNumber>xxxxx</PNumber>
    <UID>xxxx</UID>
    <TEfd>xxxxx</TEfd>
    <TExd>xxxxxx</TExd>
    <DID>xxxxx</DID>
    <CType>xxxxx</CType>
    <FName>Michael</FName>
    <MName></MName>
    <LName>Smith</LName>
    <FName>James</FName>
    <MName> </MName>
    <LName> </LName>
    <MAL>Home</MAL>
    <AddressLine1>xxxx</AddressLine1>
    <AddressLine2>xxxx</AddressLine2>
    <AddressLine3></AddressLine3>
    <City>xxxx</City>
    <State>xx</State>
    <ZipCode>xxxx</ZipCode>
    <Country>xxxx</Country>
    <Pr>
      <PrType>xxxxx</PrType>
      <PrName>xxxxxx</PrName>
      <PrID>xxxxxx</PrID>
    </Pr>
  </Clt>
  <CData>
    <InceptionYear>2021</InceptionYear>
    <FName> </FName>
  </CData>
</PD>
Current output
PNumber FName MName LName
xxxx {Michael,James,''} {'',' '} {Smith,''}
Expected Output
PNumber FName MName LName
xxxx Michael Smith
xxxx James
xxxx NULL NULL
Given your XML is relatively shallow, consider the new IO method pandas.read_xml (introduced in v1.3). It even includes a names argument to rename same-named elements.
import pandas as pd
out_df = pd.read_xml(
    "input.xml",  # OR io.StringIO(ET.tostring(xroot, encoding="unicode"))
    xpath=".//Clt",
    names=[
        "PType",
        "PNumber",
        "UID",
        "TEfd",
        "TExd",
        "DID",
        "CType",
        "FName1",
        "MName1",
        "LName1",
        "FName2",
        "MName2",
        "LName2",
        "MAL",
        "AddressLine1",
        "AddressLine2",
        "AddressLine3",
        "City",
        "State",
        "ZipCode",
        "Country",
        "PR",
    ],
)
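If the goal is the "Expected Output" layout, with one row per person rather than one row per file, the repeated name elements can be paired up by position. This is a minimal sketch with the standard library's ElementTree rather than read_xml. It assumes the n-th FName, MName and LName inside a <Clt> belong together, treats empty or whitespace-only text as missing (NULL), and uses a trimmed sample with a made-up PNumber. The helper names clean and person_rows are hypothetical, and only <Clt> is searched, so the <FName> under <CData> is not included.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the question's XML; the PNumber value is made up.
XML = """<PD>
  <Clt>
    <PNumber>12345</PNumber>
    <FName>Michael</FName>
    <MName></MName>
    <LName>Smith</LName>
    <FName>James</FName>
    <MName> </MName>
    <LName> </LName>
  </Clt>
</PD>"""

def clean(text):
    """Return stripped text, or None for missing or blank values."""
    if text is None or not text.strip():
        return None
    return text.strip()

def person_rows(root, id_tag="PNumber", name_tags=("FName", "MName", "LName")):
    """Return a list with one dict per person found under each <Clt>.

    Assumes the repeated name elements appear in matching order.
    """
    rows = []
    for clt in root.iter("Clt"):
        id_value = clean(clt.findtext(id_tag))
        # One list per tag, e.g. [["Michael", "James"], [None, None], ["Smith", None]]
        columns = [[clean(el.text) for el in clt.findall(tag)] for tag in name_tags]
        # zip(*columns) yields one (FName, MName, LName) tuple per person
        for values in zip(*columns):
            row = {id_tag: id_value}
            row.update(zip(name_tags, values))
            rows.append(row)
    return rows

rows = person_rows(ET.fromstring(XML))
for row in rows:
    print(row)
```

The list of dicts can be passed straight to pd.DataFrame(rows). Because zip stops at the shortest list, a <Clt> with two FName elements but only one LName would silently drop the second person, so check the counts first if that can happen.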
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I'm a beginner trying to build a simple library management system using Python. Users can search for a book in a list of many books stored in a text file. Here is an example of what is in the text file:
Author: J.K Rowling
Title: Harry Potter and the Deathly Hollow
Keywords: xxxx
Published by: xxxx
Published year: xxxx
Author: Stephen King
Title: xxxx
Keywords: xxxx
Published by: xxxx
Published year: xxxx
Author: J.K Rowling
Title: Harry Potter and the Half Blood Prince
Keywords: xxxx
Published by: xxxx
Published year: xxxx
This is where it gets difficult for me. There is a Search by Author option for users to search for books. When a user searches for an author (e.g. J.K Rowling), I want the program to output all the related details (Author, Title, Keywords, Published by, Published year) of every matching book; in this case, there are two J.K Rowling books. This is the last piece of the program, and I'm having a lot of difficulty with it. Please help me, and thank you all in advance.
Could you store the data in a JSON file instead of a plain text file? It could be a better alternative, since you could easily look up values by key and search through them as well.
{
    "Harry Potter and the Deathly Hollow": {
        "Author": "J.K Rowling",
        "Keywords": "xxxx",
        "Published by": "xxxx",
        "Published year": "xxxx"
    },
    "Example 2": {
        "Author": "Stephen King",
        "Keywords": "xxxx",
        "Published by": "xxxx",
        "Published year": "xxxx"
    }
}
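With the data in that shape, searching by author becomes a filter over the dictionary's values. A minimal sketch, assuming the JSON layout above (title as the key); the function name books_by_author is hypothetical, and the inline string stands in for reading books.json with json.load.

```python
import json

def books_by_author(books, author):
    """Return {title: details} for every book whose Author matches, ignoring case."""
    wanted = author.strip().lower()
    return {
        title: details
        for title, details in books.items()
        if details.get("Author", "").strip().lower() == wanted
    }

# In the real program: with open("books.json") as f: books = json.load(f)
books = json.loads("""
{
  "Harry Potter and the Deathly Hollow": {"Author": "J.K Rowling", "Published year": "xxxx"},
  "Example 2": {"Author": "Stephen King", "Published year": "xxxx"},
  "Harry Potter and the Half Blood Prince": {"Author": "J.K Rowling", "Published year": "xxxx"}
}
""")

for title, details in books_by_author(books, "j.k rowling").items():
    print("Title:", title)
    for key, value in details.items():
        print(f"{key}: {value}")
```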
You can iterate through the lines of the text file like this:
with open(r"path\to\text_file.txt", "r") as books:
    lines = books.readlines()
for index in range(len(lines)):
    line = lines[index]
Now get the author of each book by splitting the line on the ":" character and testing whether the first part equals "Author". Then take the second part of the split string and strip the "\n" (newline) and " " characters from both ends, so that stray whitespace doesn't break the comparison. I would also recommend lowercasing both the author name and the search query so that capitalisation doesn't matter. Then test whether they are equal:
    if line.split(":")[0] == "Author" and \
            line.split(":")[1].strip("\n ").lower() == search_query.lower():
Then, inside this if block, print out all the required information about the book.
Completed code:
search_query = "J.K Rowling"
with open("books.txt", "r") as books:
    lines = books.readlines()
for index, line in enumerate(lines):
    parts = line.split(":")
    if parts[0] == "Author" and parts[1].strip("\n ").lower() == search_query.lower():
        # Print the Author line and the four lines after it (Title ... Published year)
        print("".join(lines[index:index + 5]))
Generally, a lot of problems to be programmed can be resolved into a three-step process:
Read the input into an internal data structure
Do processing as required
Write the output
This problem seems like quite a good fit for that pattern:
In the first part, read the text file into an in-memory list of either dictionaries or objects (depending on what's expected by your course)
In the second part, search the in-memory list according to the search criteria; this will result in a shorter list containing the results
In the third part, print out the results neatly
It would be reasonable to put these into three separate functions and to tackle each one separately.
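A minimal sketch of those three steps, one function each. It assumes every record begins with an "Author:" line, as in the example file, and that blank lines may separate records. The function names are hypothetical, and SAMPLE is a shortened copy of the question's file so the example runs on its own.

```python
def read_books(text):
    """Step 1: parse 'Key: value' lines into a list of dicts, one per book."""
    books = []
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if not sep:  # skip blank or malformed lines
            continue
        key, value = key.strip(), value.strip()
        if key == "Author":
            books.append({})  # an Author line starts a new record
        if books:
            books[-1][key] = value
    return books

def search_by_author(books, author):
    """Step 2: return the (shorter) list of books whose author matches."""
    wanted = author.strip().lower()
    return [book for book in books if book.get("Author", "").lower() == wanted]

def print_books(books):
    """Step 3: print each book neatly, with a blank line between books."""
    for book in books:
        for key, value in book.items():
            print(f"{key}: {value}")
        print()

SAMPLE = """Author: J.K Rowling
Title: Harry Potter and the Deathly Hollow
Published year: xxxx

Author: Stephen King
Title: xxxx

Author: J.K Rowling
Title: Harry Potter and the Half Blood Prince
"""

# In the real program: with open("books.txt") as f: text = f.read()
matches = search_by_author(read_books(SAMPLE), "J.K Rowling")
print_books(matches)
```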
# Read the whole file (e.g. books.txt) into one string
with open("books.txt", "r") as fd:
    lines = fd.read()
# Split the text on the word "Author". split() drops the separator, so add "Author" back to each piece; bookdetails holds one string per book.
bookdetails = ["Author" + line for line in lines.split("Author")[1:]]
# Author name to search for
authorName = "J.K Rowling"
# Keep the books by the given author, splitting each one on newlines into a list of its detail lines
result = [book.splitlines() for book in bookdetails if "Author: " + authorName in book]
print(result)
If the file always has this format and you want to turn it into a dictionary:
def read_author(file):
    data = dict()
    with open(file, "r") as f:
        li = f.read().split("\n")
    for e in li:
        if ":" in e:
            # Split at the first colon only and trim the surrounding spaces
            key, value = e.split(":", 1)
            data[key.strip()] = value.strip()
    return data['Author']
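Note: The text file sometimes has empty lines, so I check that each line contains a colon (:) before adding it to the dict. Because each key is overwritten as the loop goes on, the dict only keeps the values of the last book in the file, so this approach assumes one book per file.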
Then, if you want a more generic method, you can pass the key of the element you want:
def read_info(file, key):
    data = dict()
    with open(file, "r") as f:
        li = f.read().split("\n")
    for e in li:
        if ":" in e:
            k, value = e.split(":", 1)
            data[k.strip()] = value.strip()
    return data[key]
Separating the reading step, as follows, makes the code more modular:
class BookInfo:
    def __init__(self, file) -> None:
        self.file = file
        self.data = None
    def __read_file(self):
        # Read and parse the file only once, on first use
        if self.data is None:
            with open(self.file, "r") as f:
                li = f.read().split("\n")
            self.data = dict()
            for e in li:
                if ":" in e:
                    key, value = e.split(":", 1)
                    self.data[key.strip()] = value.strip()
    def read_author(self):
        self.__read_file()
        return self.data['Author']
Then create an object for each book file:
info = BookInfo("book.txt")
print(info.read_author())
I want to replace a word in text taken by a CSS selector, like this:
company_name = browser.find_element_by_css_selector("body > company.b1").text
Let's say it returns the text "GBH Global".
description = browser.find_element_by_css_selector("body > p.b1").text
Let's say it returns the text "This is companyname we are based in london".
I want to replace companyname with the company name, giving "This is GBH Global we are based in london":
company_description = browser.find_element_by_css_selector("body > p.b1 > input")
company_description.send_keys(description)
I want to send "This is GBH Global we are based in london" using Selenium and Python.
I have this text, "This is companyname we are based in london", and I can change its format if that helps the code work properly.
Presumably the text extracted by the line of code:
browser.find_element_by_css_selector("body > company.b1").text
i.e. GBH Global would be a variable. In that case you can replace the text companyname as follows:
company_name = browser.find_element_by_css_selector("body > company.b1").text
description = browser.find_element_by_css_selector("body > p.b1").text
# The placeholder is the third word of the description: "This is companyname ..."
text_to_replace = description.split()[2]
print(description.replace(text_to_replace, company_name))
The following should work:
company_name = "GBH Global"
description = "This is companyname we are based in london"
company_description = description.replace("companyname", company_name)
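Since the question says the stored text's format can be changed, another option is to store it with an explicit placeholder and substitute into it. str.replace replaces every occurrence of the word wherever it appears, whereas a placeholder marks exactly what should change. This sketch uses string.Template from the standard library; the $company placeholder is an assumed rewrite of the stored text, and the browser lookup is replaced by a literal so it runs without Selenium.

```python
from string import Template

# Assumed new format of the stored text: $company marks the slot to fill.
description_template = Template("This is $company we are based in london")

company_name = "GBH Global"  # in the Selenium script: the element's .text
filled_description = description_template.substitute(company=company_name)
print(filled_description)  # This is GBH Global we are based in london
```

In the Selenium script, the filled-in string would then go to the input field with send_keys, as in the question. Template.substitute raises KeyError if a placeholder is left without a value, which makes a missing company name easy to spot.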
I want to extract the name, email and phone number from all the conversations and save them into separate variables, like this: a=max, b=email, and so on.
This is my text file:
[11:23] max : Name : max
Email : max#gmail.com
Phone : 01716345678
[11:24] harvey : hello there how can i help you
[11:24] max : can you tell me about the latest feature
And this is my code. What am I missing?
in_file = open("chat.txt", "rt")
contents = in_file.read()
#line: str
for line in in_file:
    if line.split('Name :'):
        a=line
        print(line)
    elif line.split('Email :'):
        b = line
    elif line.split('Phone :'):
        c = line
    else:
        d = line
That's not what split does at all. You might be confusing it with the in operator.
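A quick demonstration of why the if/elif chain never gets past its first branch: str.split returns a list, and a non-empty list is truthy. Even an empty string splits to [''], so every line takes the first branch. The sample line is taken from the chat file.

```python
line = "Phone : 01716345678"

# split() never tests for a substring: it always returns a list of pieces.
pieces = line.split("Name :")
print(pieces)        # ['Phone : 01716345678']
print(bool(pieces))  # True, so "if line.split('Name :')" is always taken

# The "in" operator is the membership test the original code needs.
print("Name :" in line)  # False
```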
In any case, a regular expression will do:
import re
string = '''[11:23] max : Name : max
Email : max#gmail.com
Phone : 01716345678

[11:24] harvey : hello there how can i help you
[11:24] max : can you tell me about the latest feature'''
keys = ['Name', 'Email', 'Phone', 'Text']
# re.DOTALL lets .+ span newlines, so one search covers the whole chat
result = re.search(r'.+Name : (\w+).+Email : ([\w#\.]+).+Phone : (\d+)(.+)', string, flags=re.DOTALL).groups()
print(dict(zip(keys, result)))
Output:
{'Name': 'max',
'Email': 'max#gmail.com',
'Phone': '01716345678',
'Text': '\n\n[11:24] harvey : hello there how can i help you\n[11:24] max : can you tell me about the latest feature'}
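re.search stops at the first match, so it only finds one contact. To collect every Name/Email/Phone block in a longer chat log, re.finditer with named groups is one option. This sketch assumes each block is laid out as in the question (Name, then Email, then Phone, separated by whitespace); the second contact, "anna", is made-up sample data.

```python
import re

chat = """[11:23] max : Name : max
Email : max#gmail.com
Phone : 01716345678
[11:24] harvey : hello there how can i help you
[11:24] max : can you tell me about the latest feature
[11:30] anna : Name : anna
Email : anna#gmail.com
Phone : 01234567890
"""

# \s+ between the fields matches the newline (or blank lines) separating them.
CONTACT_RE = re.compile(
    r"Name : (?P<name>\w+)\s+Email : (?P<email>\S+)\s+Phone : (?P<phone>\d+)"
)

# groupdict() turns each match into {'name': ..., 'email': ..., 'phone': ...}
contacts = [m.groupdict() for m in CONTACT_RE.finditer(chat)]
for contact in contacts:
    print(contact)
```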
Remove this line from your code:
"contents = in_file.read()"
It reads the whole file, so the for loop that follows has nothing left to iterate over.
Also, use "in" instead of "split":
# "with" closes the file automatically when the block ends
with open("chat.txt", "rt") as in_file:
    for line in in_file:
        if 'Name' in line:
            a = line
            print(a)
        elif 'Email' in line:
            b = line
            print(b)
        elif 'Phone' in line:
            c = line
            print(c)
        else:
            d = line
            print(d)
I need to split a substring off a string. This is the exact source text:
Article published on: Tutorial
I want to delete "Article published on:" and keep only "Tutorial", so I can save it.
I tried:
category = items[1]
category.split('Article published on:','')
and with
for p in articles:
    bodytext = p.xpath('.//text()').extract()
    joined_text = ''
    # loop in categories
    for each_text in text:
        stripped_text = each_text.strip()
        if stripped_text:
            # all the categories together
            joined_text += ' ' + stripped_text
    joined_text = joined_text.split('Article published on:','')
    items.append(joined_text)
    if not is_phrase:
        title = items[0]
        category = items[1]
        print('title = ', title)
        print('category = ', category)
This doesn't work. What am I missing?
This code raises:
TypeError: 'str' object cannot be interpreted as an integer
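The TypeError comes from split: its second argument is maxsplit, which must be an integer, so passing '' fails. It seems you meant to use replace instead of split. You also need to assign the result, because strings are immutable and replace returns a new string:
category = category.replace('Article published on:', '').strip()
split also works if you keep the part after the colon:
category = category.split(':')[1].strip()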
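Two other ways to get just "Tutorial", both sketched on the literal string from the question: str.removeprefix (Python 3.9 and later) removes the prefix only when the string actually starts with it, and str.partition splits once at the first colon, so a colon later in the category would be kept.

```python
source = "Article published on: Tutorial"
PREFIX = "Article published on:"

# Option 1 (Python 3.9+): drop the prefix only if the string starts with it.
category = source.removeprefix(PREFIX).strip()

# Option 2: split once at the first colon and keep what follows it.
_, _, after_colon = source.partition(":")
category_alt = after_colon.strip()

print(category)      # Tutorial
print(category_alt)  # Tutorial
```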