How To Just See the Song Lyrics? - python

I am trying to do a project involving song lyrics. I am using the lyricsgenius module in Python and making requests to find the lyrics like this:
LyricsGenius.search_song(song_title, artist_name)
However, when I assign the result to a variable and print its .lyrics attribute, I get a lot of extra text that isn't part of the song lyrics, for example the lines before each verse saying [Verse 1: artist name] and so on. What would be the best way to remove this so that I am left with only the lyrics? An example of what I see is below:
She Knows Lyrics[Intro: J. Cole & Amber Coffman (Sampled)]
*lyrics start on following line*
[Pre-Chorus]
Last night I dreamt of San Pedro
Thanks for your help
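One possible approach is to post-process the .lyrics string with a regular expression. The sketch below is not part of lyricsgenius; the function name strip_section_headers is hypothetical, and it assumes that every header sits in square brackets on a single line and that the text starts with "<title> Lyrics" immediately followed by "[", as in the sample above.

```python
import re

def strip_section_headers(lyrics: str) -> str:
    """Remove a leading '<title> Lyrics' prefix and bracketed section tags."""
    # Assumption: the prefix ends with 'Lyrics' directly before the first '['.
    lyrics = re.sub(r"^.*?Lyrics(?=\[)", "", lyrics, count=1)
    # Remove tags such as [Intro: J. Cole] or [Pre-Chorus].
    lyrics = re.sub(r"\[[^\]\n]*\]", "", lyrics)
    # Drop the lines that are now empty.
    lines = [line.strip() for line in lyrics.splitlines()]
    return "\n".join(line for line in lines if line)

sample = (
    "She Knows Lyrics[Intro: J. Cole & Amber Coffman (Sampled)]\n"
    "placeholder lyric line\n"
    "[Pre-Chorus]\n"
    "Last night I dreamt of San Pedro"
)
cleaned = strip_section_headers(sample)
print(cleaned)
```

Lyrics that legitimately contain square brackets would also lose that text, so this is only a starting point. It may also be worth checking whether the Genius class in lyricsgenius offers a remove_section_headers option in your installed version before writing your own filter.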

Related

Simplify song and artist names

Introduction
Hello. I am currently building a web application that takes a random song and puts it into a Spotify playlist. (The user can't choose which songs he wants.)
So I search for the input with the Spotify API and get a list of results.
Problem
Since Spotify does not always return the best result first, I wanted to loop through the results and find the best-matching one. How would you achieve the best result?
My attempt
The first thing I tried was matching the strings with the fuzzywuzzy library.
It looked something like this:
song_ratio = ratio(real_song_name,result_song_name)
This was good and helped a lot, but what about songs that just have different punctuation?
So I removed the punctuation with:
song_name = song_name.translate(str.maketrans('', '', punctuation))
I also want to avoid Karaoke, Remastered or Live versions, etc., e.g.:
Stay with Me Till Dawn - Live in the UK, 1982 / 2010 Remaster from Judie Tzuke
Just filtering by these names would not work, because they do not always appear in the same form.
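One way to combine both ideas might be to normalize each title before comparing. The sketch below is only an illustration: it uses difflib from the standard library instead of fuzzywuzzy so that it runs without extra packages, the names normalize_title, similarity and VERSION_WORDS are hypothetical, and it assumes that version information follows a " - " separator as in the example above.

```python
from difflib import SequenceMatcher
from string import punctuation

# Hypothetical keyword list; extend it to match the suffixes you actually see.
VERSION_WORDS = ("live", "remaster", "karaoke")

def normalize_title(title: str) -> str:
    """Drop a ' - <version info>' suffix, punctuation, case and extra spaces."""
    base, sep, suffix = title.partition(" - ")
    # Only cut the suffix when it mentions one of the version keywords.
    if sep and any(word in suffix.lower() for word in VERSION_WORDS):
        title = base
    title = title.translate(str.maketrans("", "", punctuation))
    return " ".join(title.lower().split())

def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()

print(normalize_title("Stay with Me Till Dawn - Live in the UK, 1982 / 2010 Remaster"))
print(similarity("It's Been A Long, Long Time", "Its Been A Long Long Time"))
```

The keyword test is a plain substring check, so a suffix containing a word like "Deliver" would also match "live"; a stricter word-boundary check may be needed on real data.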
Another problem:
Searching for the song "Fascination" from "Jane Morgan And The Troubadors"
What I get is:
Best found song: Its Been A Long Long Time to 22 % match
Best found artist: Jane Morgan 54 %
Had I just queried for the song "Fascination" from "Jane Morgan", I would get:
Best found song: Fascination 100 %
Best found artist: Jane Morgan 100 %
Question
What is a good way to solve this issue? Is it possible to train a neural network to process my strings into the right format and then find the best match?
Something you could try is to use the advanced query syntax offered by Spotify search, and only search for part of the song title/artist name. For example your query for "Fascination" from "Jane Morgan And The Troubadors" could become:
artist:"Jane Mo" track:"Fascin"
and still return the correct result.
This query looks for the exact string 'Jane Mo' appearing in the artist name and 'Fascin' in the track title.

For loop appends each new element to a list, but '\n' appears in place of commas

EDIT: Thank you all for the very helpful answers. Indeed, as suggested in the responses to this post, school_list did not in fact contain hundreds of list items, it contained only four. This didn't stop school.text from grabbing all the hundreds of places within those four elements that included the text of a school name.
Original post:
I'm trying to iterate over each school name on a web page containing hundreds of school names, and append each school name to a list called list_of_names. I am using the Python library Selenium to access the web page and locate the HTML element which contains the list of school names.
driver.get('https://www.illinoisreportcard.com/SearchResult.aspx?SearchText=$high%20school$&type=NAME#High-schools')
school_list = driver.find_elements_by_class_name('container.col-sm-12.col-md-12')
list_of_names = []
for school in school_list:
    try:
        name = school.text
        print(name)
        list_of_names.append(name)
    except selenium.common.exceptions.NoSuchElementException:
        pass
As you can see below, where I've included the first three out of hundreds of results, the loop successfully prints the names of the schools plus grade information (which it has grabbed from each specified element of the HTML code).
ALLEN JUNIOR HIGH SCHOOL
(4 - 8)
LA MOILLE CUSD 303
(BUREAU)
LA MOILLE
CENTRALIA JR HIGH SCHOOL
(4 - 8)
The problem is that this line of code -- list_of_names.append(name) -- is not appending each of the school names as a list item surrounded by commas as separators, as I would have expected. Instead, it is appending each school name to one single list item that merely grows longer and longer. And in place of where commas should be, it is putting an '\n'.
Below is the first line of output of the command print(list_of_names):
['ALLEN JUNIOR HIGH SCHOOL\n(4 - 8)\nLA MOILLE CUSD 303\n(BUREAU)\nLA MOILLE\nCENTRALIA JR HIGH SCHOOL\n(4 - 8)\nCENTRALIA SD 135\n(MARION)\
(I have tried versions of this on smaller lists of elements outside of HTML and thus without the need for the Selenium try/except code at the very bottom here, and it worked. But that still doesn't get me any closer to being able to deploy this code on the web page with the school names.)
What is going on? Why isn't this code appending each school name to list_of_names as individual items in a list?
Appreciate any help!
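The variable "school_list" is a list, but it holds only a few container elements rather than one element per school, and the .text of each container is a single long string.
So essentially the for loop runs only a handful of times. "\n" is an escape sequence for "new line", which is why print(name) shows one entry per line while the list shows '\n' inside one long item.
If you want the variable "list_of_names" to have one element per line, as shown in your print output, you can split each element's text inside the for loop:
for school in school_list:
    list_of_names.extend(school.text.split('\n'))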
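Splitting on '\n' yields every line, including grades, district and county. If only the school names are wanted, one option is to keep the lines that are directly followed by a grade range. This is a sketch on a plain string rather than on live Selenium elements; the function name school_names is hypothetical, and it assumes every school name is immediately followed by a line such as "(4 - 8)", which is what the sample output suggests.

```python
import re

# A grade-range line such as "(4 - 8)"; assumed format, based on the sample output.
GRADE_RE = re.compile(r"^\(\s*\w+\s*-\s*\w+\s*\)$")

def school_names(text: str) -> list:
    """Return the lines that are immediately followed by a grade-range line."""
    lines = text.split("\n")
    return [
        lines[i]
        for i in range(len(lines) - 1)
        if GRADE_RE.match(lines[i + 1])
    ]

sample_text = (
    "ALLEN JUNIOR HIGH SCHOOL\n(4 - 8)\nLA MOILLE CUSD 303\n(BUREAU)\n"
    "LA MOILLE\nCENTRALIA JR HIGH SCHOOL\n(4 - 8)\nCENTRALIA SD 135\n(MARION)"
)
names = school_names(sample_text)
print(names)
```

With Selenium, the same function could be called on school.text for each element in school_list and the results collected with list.extend.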

Regex and pandas: extract partial string on name match

I have a pandas data frame containing instances of web chat between two people, the customer and the service desk operator.
The customer's name is always announced in the first line of the web chat as the customer enters the conversation.
Example 1:
In: df['log'][0]
Out: [14:40:48] You are joining a chat with James[14:40:48] James: Hello, I\'m looking to find out more about the services and products you offer.[14:41:05] Greg: Thank you for contacting us. [17:41:14] Greg: Could I start by asking what services lines or products you are interested in knowing more about, please?[14:41:23] James: I would like to know more about your gardening and guttering service.[14:43:20] James: hello?[14:43:32] Greg: thank you, for more information on those please visit www.example.com/more_examples.[14:44:12] James: Thanks[14:44:38] James has exited the session.
Example 2:
In: df['log'][1]
Out: [09:01:25] You are joining a chat with Roy Andrews[09:01:25] Roy Andrews: I\'m asking on behalf of partner whether she is able to still claim warranty on a coffee machine she purchased a year and a half ago? [09:02:00] Jamie: Thank you for contacting us. Could I start by asking for the type of coffee machine purchased please, and whether she still has a receipt?[09:02:23] Roy Andrews: BRX0403, she no longer has a receipt.[09:05:30] Jamie: Thank you, my interpretation is that she would not be able to claim as she is no longer under warranty. [09:08:46] Jamie: for more information on our product warranty policy please see www.brandx.com/warranty-policy/information[09:09:13] Roy Andrews: Thanks for the links, I will let her know.[09:09:15] Roy Andrews has exited the session.
The names in the chat always vary as different customers use the web chat service.
A customer can enter chat having one or more names. Example:
James
Ravi
Roy Andrews.
Requirements:
I would like to separate all instances of customer chat (e.g. chat by James and Roy Andrews) from the df['log'] column into a new column df['text_analysis'].
From example 1 above this would look like:
In: df['text_analysis'][0]
Out: [14:40:48] You are joining a chat with James[14:40:48] James: Hello, I\'m looking to find out more about the services and products you offer.[14:41:23] James: I would like to know more about your gardening and guttering service.[14:43:20] James: hello?[14:44:12] James: Thanks
EDIT:
The optimal solution would extract the substrings as provided in the example above and omit the final time stamp [14:44:38] James has exited the session..
What I have tried so far:
I have extracted the customer names from the df['log'] column into a new column called df['names'] using:
df['names'] = df['log'].apply(lambda x: x.split(' ')[7].split('[')[0])
I wanted to use the names in the df['names'] column in a pandas str.split() call, something along the lines of:
df['log'].str.split(df['names']). However, this does not work, and even if the split did occur I think it would not properly separate the customer and service operator chats.
Also I have tried incorporating the names into a regex type solution:
df['log'].str.extract('([^.]*{}[^.]*)').format(df['log']))
But this does not work either (I'm guessing because .extract() does not support format).
Any help would be appreciated.
Use a regex. Here longs is the string from your first example:
import re
re.match(r'^.*(?=\[)', longs).group()
Result:
"[14:40:48] You are joining a chat with James[14:40:48] James: Hello, I'm looking to find out more about the services and products you offer.[14:41:05] Greg: Thank you for contacting us. [17:41:14] Greg: Could I start by asking what services lines or products you are interested in knowing more about, please?[14:41:23] James: I would like to know more about your gardening and guttering service.[14:43:20] James: hello?[14:43:32] Greg: thank you, for more information on those please visit www.example.com/more_examples.[14:44:12] James: Thanks"
You can package this regex function into your dataframe:
df['text_analysis'] = df['log'].apply(lambda x: re.match(r'^.*(?=\[)', x).group())
Explanation: the regex '^.*(?=\[)' means: from the beginning ^, match any number of any character .*, up to a [ that is not itself included (?=\[). Since .* is greedy, the match runs from the beginning to the last [ and excludes that [, which drops the final "has exited the session" entry.
Individual lines can be extracted this way:
import re
customerspeak = re.findall(r'(?<=\[(?:\d{2}:){2}\d{2}\]) James:[^\[]*', s)
output:
[" James: Hello, I'm looking to find out more about the services and products you offer.",
' James: I would like to know more about your gardening and guttering service.',
' James: hello?',
' James: Thanks']
If you want these in the same line, you can ''.join(customerspeak)
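The findall pattern above hard-codes the name James. A more general version might read the name from the "You are joining a chat with" line first, which also handles names with more than one word. This is a sketch in plain Python; customer_chat, NAME_RE and ENTRY_RE are hypothetical names, and it assumes that no chat message itself contains a "[" character.

```python
import re

# The customer name is whatever follows the joining phrase, up to the next "[".
NAME_RE = re.compile(r"You are joining a chat with (.+?)\[")
# One chat entry: a [hh:mm:ss] timestamp and everything up to the next "[".
ENTRY_RE = re.compile(r"\[\d{2}:\d{2}:\d{2}\][^\[]*")

def customer_chat(log: str) -> str:
    """Keep the joining line and the customer's own messages from one log."""
    match = NAME_RE.search(log)
    if match is None:
        return ""
    name = match.group(1)
    kept = []
    for entry in ENTRY_RE.findall(log):
        # Text after the timestamp, e.g. "Roy Andrews: Thanks".
        body = entry.split("] ", 1)[1] if "] " in entry else ""
        if body.startswith(name + ":") or body.startswith("You are joining a chat with"):
            kept.append(entry)
    return "".join(kept)

log = (
    "[09:01:25] You are joining a chat with Roy Andrews"
    "[09:01:25] Roy Andrews: Hi"
    "[09:02:00] Jamie: Hello"
    "[09:02:23] Roy Andrews: Thanks"
    "[09:09:15] Roy Andrews has exited the session."
)
print(customer_chat(log))
```

With the data frame from the question this could be applied as df['text_analysis'] = df['log'].apply(customer_chat). The "has exited the session" entry is dropped because it does not start with the name followed by a colon.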

Python Exact Match- Absolute exact match

Based on the code I have I am trying to find an exact match to any of the job positions listed in the input.
INPUT
This is the str.match call with the specific titles to match:
dfp1[dfp1.index.str.match('Teacher|Dentist|General Manager|District Manager|Bus Driver|Team Lead|Dancer')]
Output is:
Teacher
Teacher, Middle
Teacher, High
Dentist, Sanford
Dentist
General Manager
General Manager, Dollar Tree
Team Lead
Dancer, 10th
Dancer
Dancer, Previous
I do not want anything extra other than the exact job position I put in the input. I want to specifically see only Teacher or Dentist or General Manager or District Manager or Bus Driver or Team Lead or Dancer.
I am not sure what my code is missing for it to display the job titles and no others.
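Fixed your regex. You need to add a ^ at the beginning and a $ at the end, and wrap the alternatives in parentheses so that the anchors apply to every job title rather than only the first and last one.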
dfp1[dfp1.index.str.match('^(Teacher|Dentist|General Manager|District Manager|Bus Driver|Team Lead|Dancer)$')]
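The effect of the anchors can be checked outside pandas with the standard re module, since re.match also only requires the pattern to match at the start of the string. The list of titles below is a small made-up sample based on the output in the question.

```python
import re

titles = ["Teacher", "Teacher, Middle", "Dentist", "Dentist, Sanford", "Team Lead", "Dancer, 10th"]
pattern = "Teacher|Dentist|General Manager|District Manager|Bus Driver|Team Lead|Dancer"

# Unanchored: anything that merely starts with one of the job titles matches.
prefix_hits = [t for t in titles if re.match(pattern, t)]

# Anchored and grouped: the whole string must equal one of the job titles.
exact_hits = [t for t in titles if re.match(f"^({pattern})$", t)]

print(prefix_hits)
print(exact_hits)
```

Without the parentheses, the ^ would bind only to Teacher and the $ only to Dancer, so titles such as "Dentist, Sanford" would still slip through.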

Parsing txt file in python where it is hard to split by delimiter

I am new to Python, and am wondering if anyone can help me with some file loading.
The situation is that I have some text files and I'm trying to do sentiment analysis. Each line of the text file is split into three fields: <department>, <user>, <review>
Here are some sample data:
men peter123 the pants are too tight for my liking!
kids georgel i really like this toy, it keeps my kid entertained for days! It is affordable and comes on time, i strongly recommend it
health kksd1 the health pills is drowsy by nature, please take care and do not drive after you eat the pills
office ty7d1 the printer came on time, the only problem with it is with the duplex function which i suspect its not really working
I want to turn it into this:
<category> <user> <review>
I have 50k lines of this data.
I have tried to load it directly into numpy, but it fails with an empty separator error. I searched Stack Overflow, but I couldn't find a case that applies to a varying number of delimiters. For instance, I can never know how many spaces there are in a given line of my data set.
My biggest problem is: how do you count the delimiters and assign them to columns? Is there a way to split each line into three fields, <department>, <user>, <review>? Bear in mind that the review text can contain arbitrary commas and spaces which I can't control, so the parser must be smart enough to handle that.
Any ideas? Is there a way to tell Python that once the user field has been read, everything after it belongs to the review?
With data like this I'd just use split() with the maxsplit argument:
If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements).
Example:
from io import StringIO
s = StringIO("""men peter123 the pants are too tight for my liking!
kids georgel i really like this toy, it keeps my kid entertained for days! It is affordable and comes on time, i strongly recommend it
health kksd1 the health pills is drowsy by nature, please take care and do not drive after you eat the pills
office ty7d1 the printer came on time, the only problem with it is with the duplex function which i suspect its not really working""")
for line in s:
    category, user, review = line.split(None, 2)
    print("category: {} - user: {} - review: '{}'".format(category,
                                                          user,
                                                          review.strip()))
The output is:
category: men - user: peter123 - review: 'the pants are too tight for my liking!'
category: kids - user: georgel - review: 'i really like this toy, it keeps my kid entertained for days! It is affordable and comes on time, i strongly recommend it'
category: health - user: kksd1 - review: 'the health pills is drowsy by nature, please take care and do not drive after you eat the pills'
category: office - user: ty7d1 - review: 'the printer came on time, the only problem with it is with the duplex function which i suspect its not really working'
For reference:
https://docs.python.org/2/library/stdtypes.html#str.split
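For a full file, the same maxsplit idea can be wrapped in a small function that also skips blank or incomplete lines. This is a sketch written for Python 3; parse_reviews is a hypothetical name, and the StringIO object stands in for an open file.

```python
from io import StringIO

def parse_reviews(lines):
    """Split each line into (department, user, review) on the first two whitespace runs."""
    rows = []
    for line in lines:
        # sep=None splits on any run of whitespace; maxsplit=2 leaves the review intact.
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue  # blank line or missing review text
        department, user, review = parts
        rows.append((department, user, review.strip()))
    return rows

sample = StringIO(
    "men   peter123 the pants are too tight for my liking!\n"
    "\n"
    "kids georgel i really like this toy, it keeps my kid entertained for days!\n"
)
rows = parse_reviews(sample)
for row in rows:
    print(row)
```

With a real file, the call would look like `with open(path) as f: rows = parse_reviews(f)`, where path is whatever file name you are using.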
What about doing it sorta manually:
data = []
for line in input_data:
    tmp_split = line.split(" ")
    # Get the first part (dept)
    dept = tmp_split[0]
    # Get the 2nd part
    user = tmp_split[1]
    # Everything after is the review - put spaces in between each piece
    review = " ".join(tmp_split[2:])
    data.append([dept, user, review])
