Extracting (human) names from a description - python

I'm writing a piece of software that processes TV guide listings and converts them to XMLTV.
I've noticed that a lot of my descriptions contain who hosts the show, and I would like to be able to extract that information.
One of the methods I first looked at was regex, however my regex skills aren't great, and there doesn't seem to be a great way to achieve it anyway.
Another option is NLP, however that seems to be a bit over the top for what I need, especially since my descriptions share a common prefix (Hosted by). However, this may be the method I will go with, as it could be the most reliable and easy to use.
For reference, here is an example dataset - some are real, some are made up.
['Hosted by Jim Bolger, John James and Jim Bob, The Project is a show that exists',
'Hosted by Lisa Owen, Newshub Nation is an in-depth weekly current affairs show focusing on the major players and forces that shape New Zealand.',
'A fast paced wrap of all things entertainment, celebrity and Bravo hosted by Cassidy Morris',
'Hosted by chef Guy Fieri, Minute To Win It sees competitors take on a series of seemingly simple tasks while under a one-minute time limit.',
'Hosted by Jim van de Allen, Tom Scott and Petra Grazing, this is fair go',
'Hosted by Zyon Zickle, Johnny Boi and Zippy De Phrasee, The News looks at the important things that affect all Martians',
'Lorem is a magical substence wondered about by generations of things. This series hosted by Jim Tokien, explores this thingie']
I would much rather have quality over quantity: I'd rather it find fewer matches, with most of them accurate, than a lot of inaccurate matches.
Am I overthinking this? Is there a simpler way to do it? Any help would be greatly appreciated.
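Since the descriptions share the "hosted by" prefix, a plain regex may be enough. The following is only a sketch built around the sample data above, not a general solution. The particle list (van, de, ...), the optional lower-case title word (such as "chef"), and the names HOSTS_RE and extract_hosts are all assumptions made for illustration. It favours precision: a comma-separated list of names is only accepted when it ends in "and <name>", so a show title that follows the first comma is not mistaken for a host.

```python
import re

# Lower-case particles allowed inside a name (assumed list; extend as needed).
PARTICLES = r"(?:van|von|de|der|den|da|di|la|le)"
# A capitalised word of at least two characters.
WORD = r"[A-Z][\w'-]+"
# A name is two or more capitalised words, optionally joined by particles.
NAME = rf"{WORD}(?: (?:{PARTICLES} )*{WORD})+"

HOSTS_RE = re.compile(
    # Case-insensitive prefix, then an optional lower-case title such as "chef".
    rf"(?i:hosted by) (?:[a-z]+ )?"
    # One name, or "A, B and C": extra names only count if the list ends in "and <name>".
    rf"(?P<hosts>{NAME}(?:(?:, {NAME})* and {NAME})?)"
)

def extract_hosts(description):
    """Return the list of host names found after 'hosted by', or [] if none."""
    match = HOSTS_RE.search(description)
    if match is None:
        return []
    return re.split(r", | and ", match.group("hosts"))

if __name__ == "__main__":
    print(extract_hosts("Hosted by Jim van de Allen, Tom Scott and Petra Grazing, this is fair go"))
```

Descriptions that do not follow the pattern simply return an empty list, which fits the "fewer but accurate matches" goal. A single-word host name or a name written entirely in lower case would be missed by this sketch.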

Related

(Beginner to) NLP: I am trying to understand how I can categorise words in text to identify all the words related to a topic

I have scraped a website using BeautifulSoup and now I want to analyse all the text that I have scraped and create a long-list of food items that occur in that piece of text.
Example text
If you’re a vegetarian and forever lamenting the fact that you can’t have wontons, these guys are for you! The filling is made with a simple mix of firm tofu crumbles, seasoned with salt, ginger, white pepper, and green onions. It’s super simple but so satisfying.
Make sure you drain your tofu well and dry it out as much as possible so that the filling isn’t too wet. You can even go a step further and give it a press: line a plate with paper towels, the put some paper towels on top and weigh the tofu down with another plate.
The best thing about these wontons is that the filling is completely cooked so you can adjust the seasoning just by tasting. Just make sure that the filling is slightly more saltier than you would have it if you were just eating it on it’s own. Wonton wrappers don’t have much in the way of seasoning.
These guys cook up in a flash because all you’re doing is cooking the wonton wrappers. Once you pop them in the boiling water and they float to the top, you’re good to go. Give them a toss in a spicy-soy-vinegar dressing and you’re in heaven!
I would like to create a long list from this which identifies:
wontons, tofu, vinegar, white pepper, onions, salt
I am not sure how I can do this without having a pre-existing list of food items, so any suggestions would be great. I'm looking for something that can do this automatically without too much manual intervention! (I am quite new to NLP and deep learning, so any articles or methods you recommend would be super useful!)
Thanks!
If you are new to this field, you can use Gensim, a free Python library for topic modeling. You can extract the food items using Latent Semantic Analysis or similarity queries.
https://radimrehurek.com/gensim/index.html

How can I remove a newline character in a string in python

I use BeautifulSoup to scrape a website and I got the text I need. The problem is that the text contains "\n" characters, which I need to remove.
sample output text:
\nI went to an advance screening of this movie thinking I was about
to\nembark on 120 minutes of cheezy lines, mindless plot, and the kind
of\nnauseous acting that made "The Postman" one of the most
malignant\ndisplays of cinematic blundering of our time. But I was
shocked.\nShocked to find a film starring Costner that appealed to the
soul of\nthe audience. Shocked that Ashton Kutcher could act in such a
serious\nrole. Shocked that a film starring both actually engaged and
captured\nmy own emotions. Not since 'Robin Hood' have I seen this
Costner: full\nof depth and complex emotion. Kutcher seems to have
tweaked the serious\nacting he played with in "Butterfly Effect".
These two actors came into\nthis film with a serious, focused attitude
that shone through in what I\nthought was one of the best films I've
seen this year. No, its not an\nOscar worthy movie. It's not an epic,
or a profound social commentary\nfilm. Rather, its a story about a
simple topic, illuminated in a way\nthat brings that audience to a
higher level of empathy than thought\npossible. That's what I think
good film-making is and I for one am\nthroughly impressed by this
work. Bravo!\n
I tried the below methods to remove the new line.
method 1 - regex
x = review_text.get_text()
y = re.sub(r'(\n)','',x)
method 2 - rstrip
x = review_text.get_text()
x.rstrip()
Neither of these methods works for me.
When I use split
x = review_text.get_text()
print(x.split("\n"),"\n\n")
The output is as follows
['\nI went to an advance screening of this movie thinking I was about
to\nembark on 120 minutes of cheezy lines, mindless plot, and the
kind of\nnauseous acting that made "The Postman" one of the most
malignant\ndisplays of cinematic blundering of our time. But I was
shocked.\nShocked to find a film starring Costner that appealed to
the soul of\nthe audience. Shocked that Ashton Kutcher could act in
such a serious\nrole. Shocked that a film starring both actually
engaged and captured\nmy own emotions. Not since \'Robin Hood\' have
I seen this Costner: full\nof depth and complex emotion. Kutcher
seems to have tweaked the serious\nacting he played with in
"Butterfly Effect". These two actors came into\nthis film with a
serious, focused attitude that shone through in what I\nthought was
one of the best films I\'ve seen this year. No, its not an\nOscar
worthy movie. It\'s not an epic, or a profound social
commentary\nfilm. Rather, its a story about a simple topic,
illuminated in a way\nthat brings that audience to a higher level of
empathy than thought\npossible. That\'s what I think good film-making
is and I for one am\nthroughly impressed by this work. Bravo!\n']
What should I do to remove the newlines from the text?
Thank you.
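Are you sure it's the '\n' character and not the two-character sequence '\\n' (a backslash followed by n)? If it's '\n', note that x.rstrip() only removes trailing whitespace and returns a new string instead of changing x, so use x = x.replace('\n', '') to remove every newline. Otherwise, try x = x.replace('\\n', '').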
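A quick way to tell the two cases apart is to look at repr() of the text. The sketch below uses made-up sample strings and a hypothetical helper, strip_newlines, that handles both real newlines and literal backslash-n sequences; it is an illustration rather than the asker's actual data.

```python
# Real newline characters (one character each).
real = "line one\nline two\n"
# Literal backslash followed by the letter n (two characters each).
literal = r"line one\nline two\n"

def strip_newlines(text):
    """Replace real newlines and literal backslash-n sequences with spaces."""
    return text.replace("\\n", " ").replace("\n", " ").strip()

if __name__ == "__main__":
    # repr() shows \n for a real newline and \\n for the two-character sequence.
    print(repr(real))
    print(repr(literal))
    print(strip_newlines(real))
    print(strip_newlines(literal))
```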
If s is a string such as:
\nNo, its not an\nOscar worthy movie. It's not an epic, or a profound social commentary\nfilm. Rather, its a story about a simple topic, illuminated in a way\nthat brings that audience to a higher level of empathy than thought\npossible. That's what I think good film-making is and I for one am\nthroughly impressed by this work. Bravo!\n
then s.strip() will remove trailing and leading whitespace, which includes newlines:
No, its not an\nOscar worthy movie. It's not an epic, or a profound social commentary\nfilm. Rather, its a story about a simple topic, illuminated in a way\nthat brings that audience to a higher level of empathy than thought\npossible. That's what I think good film-making is and I for one am\nthroughly impressed by this work. Bravo!
To remove all the other \n characters, replace them with " " (a space) or with "" to drop them completely:
s.replace("\n", " ").strip()
No, its not an Oscar worthy movie. It's not an epic, or a profound social commentary film. Rather, its a story about a simple topic, illuminated in a way that brings that audience to a higher level of empathy than thought possible. That's what I think good film-making is and I for one am throughly impressed by this work. Bravo!
You should be able to use x=x.replace("\n", "") to take out the newline.
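The assignment matters because Python strings are immutable: methods such as rstrip() and replace() return a new string and leave the original untouched, which is one likely reason "method 2" in the question appeared to do nothing. A small sketch with a made-up sample string (the variable names are illustrative only):

```python
x = "\nShocked to find a film\nstarring Costner.\n"
original = x

# The result is discarded, so x is exactly what it was before.
x.rstrip()
unchanged = (x == original)

# rstrip() only removes trailing whitespace; inner and leading newlines stay.
stripped = x.rstrip()

# Split on any run of whitespace and re-join with single spaces.
cleaned = " ".join(x.split())

if __name__ == "__main__":
    print(unchanged)
    print(repr(stripped))
    print(repr(cleaned))
```

The " ".join(x.split()) form also collapses tabs and repeated spaces, which may or may not be wanted.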

Writing tuple output to a text file

Is it possible to write my output tuple to a text file? I am using the following code to get the text between two strings and write it to a text file:
def format_file(file, start, end):
    f = open('C:\TEMP\Test.txt', 'r').read()
    return tuple(x for x in ''.join(f.split(start)).replace('\n', '').split(end) if x != '')
print(format_file('XYZ', 'Q2 2016 Apple Inc Earnings Call - Final', 'Event Brief of Q1 2016 Apple Inc Earnings Call - Final'))
file = open('C:\TEMP\out.txt', 'w')
file.write(format_file('XYZ', 'Q2 2016 Apple Inc Earnings Call - Final', 'Event Brief of Q1 2016 Apple Inc Earnings Call - Final'))
But I keep getting the following error: TypeError: write() argument must be str, not tuple.
When I try to return output as a string instead of a tuple I get a blank file. I would really appreciate any help on this one.
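For what it's worth, write() accepts a single string, so one option is to join the tuple's items first. The sketch below is an assumption-laden illustration: write_chunks is a hypothetical helper, the tuple is made-up stand-in data rather than real format_file output, and a temporary directory replaces the C:\TEMP path. Opening the file in a with block also guarantees it is closed and flushed; an unclosed file handle is one possible cause of an empty output file, although the question does not confirm that.

```python
import os
import tempfile

def write_chunks(chunks, path):
    """Write each string in `chunks` to `path`, one item per line."""
    with open(path, "w", encoding="utf-8") as out:
        out.write("\n".join(chunks))

# Made-up stand-in for the tuple returned by format_file(...).
chunks = ("OPERATOR: first block", "TIM COOK: second block")

# Temporary location used only so the sketch runs anywhere.
out_path = os.path.join(tempfile.mkdtemp(), "out.txt")
write_chunks(chunks, out_path)

with open(out_path, encoding="utf-8") as f:
    written = f.read()

if __name__ == "__main__":
    print(written)
```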
Here is my input file text:
Q2 2016 Apple Inc Earnings Call - Final
OPERATOR: From Piper Jaffray, we'll hear from Gene Munster.
GENE MUNSTER, ANALYST, PIPER JAFFRAY & CO.: Good afternoon. Tim, can you talk a little bit about the iPhone ASP trends, and specifically you mentioned that the SE is going to impact, but how are you thinking about the aspirational market share that's out there, and your actual market share, and using price to close that gap? Is it just the SE or could there be other iPhone models that will be discounted, to try to be more aggressive in emerging markets?
And one for Luca. Can you talk a little bit about the services segment, in terms of what piece of the services is driving growth, and maybe a little bit about the profitability on a net basis versus the growth basis that you have referred to in the past. Thanks.
TIM COOK: I think the SE is attracting two types of customers. One is customers that wanted the latest technology, but wanted it in a more compact package. And we clearly see even more people than we thought in that category.
Secondly, it's attracting people aspire to own an iPhone, but couldn't quite stretch to the entry price of the iPhone, and we've established a new entry. I think both of these markets are very, very important to us, and we are really excited about where it can take us. I do think that we will be really happy with the new to iPhone customers that we see from here, because of the early returns we've had. We are currently supply constrained, but we'll be able to work our way out of this at some point. But it's great to see the overwhelming demand for it. I will let Luca comment on the ASPs.
LUCA MAESTRI: On the ASPs, Gene we mentioned that we were going to be down sequentially, and this is really the combination of two factors. So when we go from the March quarter to the June quarter, is the fact that we are having the SE entering the mix, and that obviously is going to have a downward pressure on ASP, and also this channel inventory reduction that we have talked about, obviously the channel inventory reduction will come from higher-end models, and that is also affecting the sequential trend on ASPs.
The question on services, when we look at our services business, obviously growing very well across the board. The biggest element, and the part of the services business that is growing very well, we mentioned 35%, is the App Store. It's interesting for us that our music business, which had been declining for a number of quarters, now that we have both a download model and a streaming model, we have now hit an inflection point, and we believe that this would be the bottom, and we can start growing from there over time.
We have many other services businesses that are doing very well, we have an iCloud business that is growing very quickly. Faster than the App Store, from a much lower base but I think it's important for us as we continue to develop these businesses. Tim have talked about Apple Pay. It doesn't provide a meaningful financial contribution at this point, but as we look at the amount of transactions that go into Apple Pay right now, and we think ahead for the long-term, that could be an interesting business for us, as well.
From a profitability standpoint, we have mentioned last time that when you look at it on a gross basis, so in terms of purchase value of these services, the profitability of the business is similar to Company average. Of course, when you met out the amount that is paid to developers, and you look at it, in terms of what is reported in our P&L, obviously that business has a profitability that is higher than Company average. We don't get into the specifics of specific products or services, but it is very clear it is significantly higher than Company average.
GENE MUNSTER: Thank you.
NANCY PAXTON: Thanks, Gene. Could we have the next question please?
OPERATOR: Katy Huberty with Morgan Stanley.
KATY HUBERTY, ANALYST, MORGAN STANLEY: Yes, thank you. First for Luca. This is the worst gross margin guide in a year and a half or so, and over the last couple of quarters, you have talked about number of tailwinds including component cost, the lower accounting deferrals that went into effect in September. You just mentioned the services margins are above corporate average. So the question is, are some of those tailwinds winding down? Or is a significant guide down in gross margin for the June quarter entirely related to volume and the 5 SE? And then I have a follow-up for Tim.
LUCA MAESTRI: Katy, clearly the commodity environment remains quite favorable, and we continue to expect cost improvements. The other dynamics that you have mentioned are still there, obviously what is different, and particularly as we look at it on a sequential basis coming out of the March quarter, we would have loss of leverage, and that obviously is going to have a negative impact on margins. The other factor that's important to keep in mind is this different mix of products.
Particularly when you look at iPhone, what I was mentioning to Gene earlier, I think we've got a couple of things that are affecting not only ASPs, but obviously, they also affects margins. And it's the fact that we have a channel inventory reduction at the top end of the range, and we've got the introduction of the iPhone SE at the entry level of the range. And so when you take into account those factors, those are the big elements that drive our guidance range right now.
KATY HUBERTY: Okay. Thank you. And that a question for Tim, appreciate the optimism around longer-term iPhone unit growth, but with developed market penetration in anywhere from 60% to 80%, the growth is going to have to come from new markets. You talked about India. Could you just spend a little bit more time on that market? What are some of the hurdles you have to overcome, for that to be a larger part of the business? When we expect Apple to have more distribution, and specifically your own stores in that country? Thanks.
TIM COOK: Katy, in the short term, let me just make a couple of comments on the developed markets, just to make sure this is clear. If you look at our installed base of iPhone today versus two years ago, it's increased by 80%. When you think about upgrade cycles, upgrade cycles would have varying rates on it. As I talked about on the comments, the iPhone 6s rate, upgrade rate is slightly higher than the iPhone 5s, but lower than the iPhone 6.
But the other multiplier in that equation is obviously the size of the installed base. The net of the idea is that I think there's still really, really good business in the developed markets, so I wouldn't want to write those off. It's our job to come up with great products that people desire, and also to continue to attract over Android switchers. With our worldwide share there's still quite a bit of room in the developed markets, as well.
From an India point of view, if you look at India, and each country has a different story a bit, but the things that have held not only us back, perhaps, but some others as well, is that the LTE rollout with India just really begins this year. So we will begin to see some really good networks coming on in India. That will unleash the power and capability of the iPhone, in a way that an older network, 2.5G or even some 3G networks, would not do. The infrastructure is one key one, and the second one is building the channel out.
Unlike the US as an example, where the carriers in the US sell the vast majority of phones that are sold in the United States, in India, the carriers in general sell virtually no phones, and it is out in retail, and retail is many, many different small shops. We've been in the process. It's not something we just started in the last few weeks.
We've been working in India now for a couple of years or more, but we've been working with great energy over the last 18 months or so, and I am encouraged by the results that we're beginning to see there, and believe there's a lot, lot more there. It is already the third largest smart phone market in the world, but because the smart phones that are working there are low-end, primarily because of the network and the economics, the market potential has not been as great there. I view India as where China was maybe 7 to 10 years ago from that point of view. I think there's a really great opportunity there.
NANCY PAXTON: Thank you, Katy. Could we have the next question please?
OPERATOR: We will go to Toni Sacconaghi with Bernstein.
TONI SACCONAGHI, ANALYST, BERNSTEIN: I have one, and then a follow-up, as well. My sense is that you talked about adjusting for the changes in channel inventory, that you are guiding for relatively normal sequential growth. And I think if you do the math it's probably the same or perhaps a touch worse in terms of iPhone unit growth sequentially, relative to normal seasonality between fiscal Q2 and Q3. I guess the question is, given that you should be entering new markets and you should see pronounced elasticity from the SE device, why wouldn't we be seeing something that was dramatically above normal seasonal, in terms of iPhone revenues and units for this quarter?
Maybe you could push back on me, but I can't help thinking that when Apple introduced the iPad Mini in a similar move, to move down market, there was great growth for one quarter, and the iPad never grew again and margins and ASPs went down. It looks like you are introducing the SE, and at least on a sequential basis, you not calling for any uplift, even adjusting for channel inventory, and ASPs I presume will go down and certainly it's impacting gross margins as you've guided to. Could you respond to, A, why you're not seeing the elasticity, and B, is the analogy with the iPad mini completely misplaced?
TIM COOK: Toni, it's Tim. Let me see if I can address your question. The channel inventory reduction that Luca referred to, the vast, vast majority of that is in iPhone. That would affect the unit compare that you maybe thinking about. The iPhone SE, we are thrilled with the response that we've seen on it.
It is clear that there is a demand there, even much beyond what we thought, and so that is really why we have the constraint that we have. Do I think it will be like the iPad Mini? No, I don't think so. I don't see that.
I think the tablet market in general, one of the challenges with the tablet market is that the replacement cycle is materially different than in the smart phone market. As you probably know, we haven't had an issue in customer satisfaction on the iPad. It is incredibly high, and we haven't had an issue with usage of the iPad. The usage is incredibly high.
But the consumer behavior there is you tend to hold on for very long period of time, before an upgrade. We continue to be very optimistic on the iPad business, and as I have said in my remarks, we believe we are going to have the best compare for iPad revenue this quarter that we have quite some time. We will report back in July on that one, but I think iPhone has a particularly different kind of cycle to it than the tablet market.
TONI SACCONAGHI: Okay, and if I could follow-up, Tim. You alluded to replacement cycles and differences between the iPad and the iPhone. My sense was, when you were going through the iPhone 6 cycle, was that you had commented that the upgrade cycle was not materially different. I think your characterization was that it accelerated a bit in the US, but international had grown to be a bigger part of your business, and replacement cycles there were typically a little bit longer. I'm wondering if it was only a modest difference between the 5s and the 6, how big a difference are we really seeing in terms of replacement cycles across the last three generations, and maybe you could help us, if the replacement cycle was flat this year relative to what you saw last year, how different would your results have been this quarter in the first half?
TIM COOK: There's a lot there. Let me just say I don't recall saying the things that you said I said about the upgrade cycle, so let me get that out of the way. Now let me describe without the specific numbers, the iPhone 6s upgrade cycle that we have measured for the first half of this year, so the first six months of our fiscal year to be precise, is slightly better than the rate that we saw with the iPhone 5s two years ago, but it's lower than the iPhone 6. I don't mean just a hair lower, it's a lot lower.
Without giving you exact numbers, if we would have the same rate on 6s that we did 6, there would -- it will be time for a huge party. It would be a huge difference. The great news from my point of view is, I think we are strategically positioned very well, because we have announced the SE, we are attracting customers that we previously didn't attract. That's really great, and this tough compare eventually isn't the benchmark. The install base is up 80% over the last two years, and so all of those I think bode well, and the switcher comments I made earlier, I wouldn't underestimate that, because that's very important for us in every geography. Thanks for the question.
NANCY PAXTON: Thanks, Toni. Can we have the next question please?
OPERATOR: From Cross Research Group, we'll hear from Shannon Cross.
SHANNON CROSS, ANALYST, CROSS RESEARCH: I have a couple of questions. One, Tim, can you talk a bit about what's going on in China? The greater China revenue I think was down 26%. You did talk about mainland China, but if you could talk about some of the trends you're seeing there, and how you think it's playing out, and maybe your thoughts on SE adoption within China as well.
TIM COOK: Shannon, thanks for the question. If you take greater China, we include Taiwan, Hong Kong, and mainland China in the greater China segment that you see reported on your data sheet. The vast majority of the weakness in the greater China region sits in Hong Kong, and our perspective on that is, it's a combination of the Hong Kong dollar being pegged to the US dollar, and therefore it carries the burden of the strength of the US dollar, and that has driven tourism, international shopping and trading down significantly compared to what it was in the year ago.
If you look at mainland China, which is one that I am personally very focused on, we are down 11% in mainland China, on a reported basis. On a constant currency basis, we are only down 7%, and the way that we really look at the health or underlying demand is look at sell-through, and if you look at there, we were down 5%. Keep in mind that is down 5% on comp a year ago that was up 81%.
As I back up from this and look at the larger picture, I think China is not weak, as has been talked about. I see China as -- may not have the wind at our backs that we once did, but it's a lot more stable than what I think is the common view of it. We remain really optimistic on China. We opened seven stores there during the quarter.
We are now at 35. We will open 5 more this quarter to achieve 40, which we had talked about before. And the LTE adoption continues to rise there, but it's got a long way ahead of it. And so we continue to be really optimistic about it, and just would ask folks to look underneath the numbers at the details in them before concluding anything. Thanks for the question.
SHANNON CROSS: Thanks. My second question is with regard to OpEx leverage, or thinking about when I look at the revenue, your revenue is below our expectations but OpEx is pretty much in line. So how are you thinking about potential for leverage, cost containment, maybe when macro is bad and revenue is under pressure, and how are you juggling that versus the required investment you need to go forward?
LUCA MAESTRI: It is Luca. Of course, we think about it. We think about it a lot, and so when you look at our results, for example, our OpEx for the quarter, for the March quarter was up 10%, which is the lowest rate that you have seen in years. And when you look within OpEx, you actually see two different dynamics. You see continued significant investments in research and development, because we really believe that's the future of the Company.
We continue to invest in initiatives and projects ahead of revenue. We had a much broader portfolio that we used to have. We do much more in-house technology development than we used to do a few years ago, which we think is a great investment for us to make. And so that parts we didn't need to protect, and we want to continue to invest in the business, right?
And then when you look at our SG&A portion of OpEx for the March quarter, it was actually down slightly. So obviously we think about it, and of course we look at our revenue trends, and we take measures accordingly. When you look at the guidance that we provided for the June quarter, that 10% year-over-year increase that I mentioned to you for the March quarter goes down to a range of 7% to 9% up, and again, the focus is on making investments in Road and continuing to run SG&A extremely tightly, and in a very disciplined way.
As you know, our E2R, expense to revenue ratio, is around 10%. It's something that we are very proud of, it's a number that is incredibly competitive in our industry, and we want to continue to keep it that way. At the same time, we don't want to under-invest in the business.
SHANNON CROSS: Thank you.
NANCY PAXTON: Thank you, Shannon. Could we have the next question please?
OPERATOR: From UBS we hear from Steve Milunovich.
STEVE MILUNOVICH, ANALYST, UBS: Tim, I first wanted to ask you about services and how do you view services? You've obviously highlighted it the last two quarters. Do you view it going forward as a primary driver of earnings, or do you view it, and you mentioned platforms in terms of your operating systems, which I would agree with. In that scenario I would argue it's more a supporter of the ecosystem, and a supporter of the hardware margins over time, and therefore somewhat subservient to hardware. It's great that it's growing, but longer-term, I would view its role as more creating an ecosystem that supports the high margins on the hardware, as opposed to independently driving earnings. How do you think about it?
TIM COOK: The most important thing for us, Steve, is that we want to have a great customer experience, so overwhelmingly, the thing that drives us are to embark on services that help that, and become a part of the ecosystem. The reality is that in doing so, we have developed a very large and profitable business in the services area, and so we felt last quarter and working up to that, that we should pull back the curtain so that people could -- our investors could see the services business, both in terms of the scale of it, and the growth of it. As we said earlier, the purchase value of the installed base services grew by 27% during the quarter, which was an acceleration over the previous quarter, and the value of it hit -- was just shy of $10 billion. It's huge, and we felt it was important to spell that out.
STEVE MILUNOVICH: Okay, and then going back to the upgrades of the installed base, you have clearly mentioned that you've pulled forward some demand, which makes sense, but there does seem to be a lengthening of the upgrade cycle, particularly in the US. AT&T and Verizon have talked about that. Investors I think perceive that maybe the marginal improvements on the phone might be less currently, and could be less going forward. At the same time, I think you just announced that you can get the upgrade program online, which I guess potentially could shorten it. Do you believe that upgrade cycles are currently lengthening, and can continue to do so?
TIM COOK: What we've seen is that it depends on what you compare it to. If you compare to the 5s, what we are seeing is the upgrade rate today is slightly higher, or that there are more people upgrading, if you will, in a similar time period, in terms of a rate, than the 5s. But if you compare to 6, you would clearly arrive at the opposite conclusion. I think it depends on people's reference points, and we thought it very important in this call to be very clear and transparent about what we're seeing. I think in retrospect, you could look at it and say, maybe the appropriate measure is more to the 5s, and I think everybody intuitively thought that the upgrades were accelerated with the 6, and in retrospect, when you look at the periods, they clearly were.
STEVE MILUNOVICH: Thank you.
NANCY PAXTON: Thanks, Steve. Could we have our next question, please?
OPERATOR: We will go to Rod Hall with JPMorgan.
ROD HALL, ANALYST, JPMORGAN: Yes, thanks for fitting me in. I wanted to start with a general, more general question. I guess, Tim, this one is aimed at you. As you think about where you thought things were going to head last quarter, when you reported to us, and how it's changed this quarter, obviously it's kind of a disappointing demand environment. Can you just help us understand what maybe the top two or three things are that have changed? And so as we walk away from this, we understand what the differences are, and what the direction of change is? Then I have a follow-up.
TIM COOK: I think you're probably indirectly asking about our trough comment, if you will, from last quarter. And when we made that, we did not contemplate or comprehend that we were going to make a $2 billion-plus reduction in channel inventory during this quarter. So if you factor that in and look at true customer demand, which is the way that we look at internally, I think you'll find a much more reasonable comparison.
ROD HALL: Okay, great. Thank you. And then for my follow-up, I wanted to ask you about the tax situation a little bit. Treasury obviously has made some rule changes, and I wonder, maybe if Luca, you could comment on what the impact to Apple from those is, if anything? and Tim, maybe more broadly how you see the tax situation for Apple looking forward? Thanks.
LUCA MAESTRI: Yes, Rod, these are new regulations, and we are in the processing of assessing them. Frankly from first read, we don't anticipate that they are going to have any material impact on our tax situation. Some of them relate to inversion transactions, obviously that's not an issue for us. Some of them are around internal debt financing, which is not something that we use, so we don't expect any issue there.
As you know, we are the largest US taxpayer by a wide margin, and we already pay full US tax on all the profits from the sales that we make in the United States, so we don't expect them to have any impact on us on tax reform. I will let Tim continue to provide more color, but we've been strong advocates for comprehensive corporate tax reform in this country. We continue to do that. We think a reform of the tax code would have significant benefits for the entire US economy, and we remain optimistic that we are going to get to a point where we can see that tax reform enacted. At that point in time, of course, we would have much more flexibility around optimizing our capital structure, and around providing more return of capital to our investors.
TIM COOK: The only thing I would add, Rod, is I think there are a growing number of people in both parties that would like to see comprehensive reform, and so I'm optimistic that it will occur. It's just a matter of when and that's difficult to say. But I think most people do recognize that it is in the US's interest to do this.
ROD HALL: Great, thanks.
NANCY PAXTON: Thank you, Rod. A replay of today's call will be available for two weeks as a podcast on the iTunes Store, as webcast on Apple.com/investor and via telephone. And the numbers for the telephone replay are 888-203-1112, or 719-457-0820, and please enter confirmation code 7495552. These replays will be available by approximately 5:00 PM Pacific time today.
Members of the press with additional questions can contact Kristin Huguet at 408-974-2414, and financial analysts can contact Joan Hoover or me with additional questions. Joan is at 408-974-4570, and I am at 408-974-5420. Thanks again for joining us.
OPERATOR: Ladies and gentlemen, that does conclude today's presentation. We do thank everyone for your participation.
[Thomson Financial reserves the right to make changes to documents, content, or other information on this web site without obligation to notify any person of such changes.
In the conference calls upon which Event Transcripts are based, companies may make projections or other forward-looking statements regarding a variety of items. Such forward-looking statements are based upon current expectations and involve risks and uncertainties. Actual results may differ materially from those stated in any forward-looking statement based on a number of important factors and risks, which are more specifically identified in the companies' most recent SEC filings. Although the companies may indicate and believe that the assumptions underlying the forward-looking statements are reasonable, any of the assumptions could prove inaccurate or incorrect and, therefore, there can be no assurance that the results contemplated in the forward-looking statements will be realized.
THE INFORMATION CONTAINED IN EVENT TRANSCRIPTS IS A TEXTUAL REPRESENTATION OF THE APPLICABLE COMPANY'S CONFERENCE CALL AND WHILE EFFORTS ARE MADE TO PROVIDE AN ACCURATE TRANSCRIPTION, THERE MAY BE MATERIAL ERRORS, OMISSIONS, OR INACCURACIES IN THE REPORTING OF THE SUBSTANCE OF THE CONFERENCE CALLS. IN NO WAY DOES THOMSON FINANCIAL OR THE APPLICABLE COMPANY OR THE APPLICABLE COMPANY ASSUME ANY RESPONSIBILITY FOR ANY INVESTMENT OR OTHER DECISIONS MADE BASED UPON THE INFORMATION PROVIDED ON THIS WEB SITE OR IN ANY EVENT TRANSCRIPT. USERS ARE ADVISED TO REVIEW THE APPLICABLE COMPANY'S CONFERENCE CALL ITSELF AND THE APPLICABLE COMPANY'S SEC FILINGS BEFORE MAKING ANY INVESTMENT OR OTHER DECISIONS.]
LOAD-DATE: April 29, 2016
LANGUAGE: ENGLISH
TRANSCRIPT: 042616a5987433.733
PUBLICATION-TYPE: Transcript
Copyright 2016 CQ-Roll Call, Inc.
All Rights Reserved
Copyright 2016 CCBN, Inc.
4 of 9 DOCUMENTS
FD (Fair Disclosure) Wire
January 26, 2016 Tuesday
Event Brief of Q1 2016 Apple Inc Earnings Call - Final
and the output I am expecting is everything between 'Q2 2016 Apple Inc Earnings Call - Final' and 'Event Brief of Q1 2016 Apple Inc Earnings Call - Final', written to a text file.
Did you try converting your tuple into a string and writing it to the file?
# str() takes a single object, so wrap the values in a tuple first
s = str(('XYZ', 'Q2 2016 Apple Inc Earnings Call - Final', 'Event Brief of Q1 2016 Apple Inc Earnings Call - Final'))
file.write(s)
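If the goal is the text between the two headings rather than the headings themselves, plain string searching may be enough. Below is a minimal sketch, assuming the whole transcript has already been read into one string. The function name text_between and the short sample string are hypothetical and only stand in for the real file contents.

```python
def text_between(text, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker.

    Returns None if either marker is missing.
    """
    start = text.find(start_marker)
    if start == -1:
        return None
    start += len(start_marker)  # skip past the opening heading itself
    end = text.find(end_marker, start)
    if end == -1:
        return None
    return text[start:end].strip()


# Hypothetical stand-in for the real transcript
sample = (
    "Q2 2016 Apple Inc Earnings Call - Final\n"
    "OPERATOR: Good day.\n"
    "Event Brief of Q1 2016 Apple Inc Earnings Call - Final\n"
    "Overview text."
)
section = text_between(
    sample,
    "Q2 2016 Apple Inc Earnings Call - Final",
    "Event Brief of Q1 2016 Apple Inc Earnings Call - Final",
)
```

The returned string can then be passed to file.write() as in the answer above.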

Extracting text between two strings

I am looking to extract the text between two patterns in my text file. Here is my text:
Q2 2016 Apple Inc Earnings Call - Final
TONI SACCONAGHI, ANALYST, BERNSTEIN: I have one, and then a follow-up, as well. My sense is that you talked about adjusting for the changes in channel inventory, that you are guiding for relatively normal sequential growth. And I think if you do the math it's probably the same or perhaps a touch worse in terms of iPhone unit growth sequentially, relative to normal seasonality between fiscal Q2 and Q3. I guess the question is, given that you should be entering new markets and you should see pronounced elasticity from the SE device, why wouldn't we be seeing something that was dramatically above normal seasonal, in terms of iPhone revenues and units for this quarter?
Maybe you could push back on me, but I can't help thinking that when Apple introduced the iPad Mini in a similar move, to move down market, there was great growth for one quarter, and the iPad never grew again and margins and ASPs went down. It looks like you are introducing the SE, and at least on a sequential basis, you not calling for any uplift, even adjusting for channel inventory, and ASPs I presume will go down and certainly it's impacting gross margins as you've guided to. Could you respond to, A, why you're not seeing the elasticity, and B, is the analogy with the iPad mini completely misplaced?
TIM COOK: Toni, it's Tim. Let me see if I can address your question. The channel inventory reduction that Luca referred to, the vast, vast majority of that is in iPhone. That would affect the unit compare that you maybe thinking about. The iPhone SE, we are thrilled with the response that we've seen on it.
It is clear that there is a demand there, even much beyond what we thought, and so that is really why we have the constraint that we have. Do I think it will be like the iPad Mini? No, I don't think so. I don't see that.
I think the tablet market in general, one of the challenges with the tablet market is that the replacement cycle is materially different than in the smart phone market. As you probably know, we haven't had an issue in customer satisfaction on the iPad. It is incredibly high, and we haven't had an issue with usage of the iPad. The usage is incredibly high.
But the consumer behavior there is you tend to hold on for very long period of time, before an upgrade. We continue to be very optimistic on the iPad business, and as I have said in my remarks, we believe we are going to have the best compare for iPad revenue this quarter that we have quite some time. We will report back in July on that one, but I think iPhone has a particularly different kind of cycle to it than the tablet market.
TONI SACCONAGHI: Okay, and if I could follow-up, Tim. You alluded to replacement cycles and differences between the iPad and the iPhone. My sense was, when you were going through the iPhone 6 cycle, was that you had commented that the upgrade cycle was not materially different. I think your characterization was that it accelerated a bit in the US, but international had grown to be a bigger part of your business, and replacement cycles there were typically a little bit longer. I'm wondering if it was only a modest difference between the 5s and the 6, how big a difference are we really seeing in terms of replacement cycles across the last three generations, and maybe you could help us, if the replacement cycle was flat this year relative to what you saw last year, how different would your results have been this quarter in the first half?
TIM COOK: There's a lot there. Let me just say I don't recall saying the things that you said I said about the upgrade cycle, so let me get that out of the way. Now let me describe without the specific numbers, the iPhone 6s upgrade cycle that we have measured for the first half of this year, so the first six months of our fiscal year to be precise, is slightly better than the rate that we saw with the iPhone 5s two years ago, but it's lower than the iPhone 6. I don't mean just a hair lower, it's a lot lower.
Without giving you exact numbers, if we would have the same rate on 6s that we did 6, there would -- it will be time for a huge party. It would be a huge difference. The great news from my point of view is, I think we are strategically positioned very well, because we have announced the SE, we are attracting customers that we previously didn't attract. That's really great, and this tough compare eventually isn't the benchmark. The install base is up 80% over the last two years, and so all of those I think bode well, and the switcher comments I made earlier, I wouldn't underestimate that, because that's very important for us in every geography. Thanks for the question.
Q3 2016 Apple Inc Earnings Call - Final
I think the tablet market in general, one of the challenges with the tablet market is that the replacement cycle is materially different than in the smart phone market. As you probably know, we haven't had an issue in customer satisfaction on the iPad. It is incredibly high, and we haven't had an issue with usage of the iPad. The usage is incredibly high.
But the consumer behavior there is you tend to hold on for very long period of time, before an upgrade. We continue to be very optimistic on the iPad business, and as I have said in my remarks, we believe we are going to have the best compare for iPad revenue this quarter that we have quite some time. We will report back in July on that one, but I think iPhone has a particularly different kind of cycle to it than the tablet market.
TONI SACCONAGHI: Okay, and if I could follow-up, Tim. You alluded to replacement cycles and differences between the iPad and the iPhone. My sense was, when you were going through the iPhone 6 cycle, was that you had commented that the upgrade cycle was not materially different. I think your characterization was that it accelerated a bit in the US, but international had grown to be a bigger part of your business, and replacement cycles there were typically a little bit longer. I'm wondering if it was only a modest difference between the 5s and the 6, how big a difference are we really seeing in terms of replacement cycles across the last three generations, and maybe you could help us, if the replacement cycle was flat this year relative to what you saw last year, how different would your results have been this quarter in the first half?
TIM COOK: There's a lot there. Let me just say I don't recall saying the things that you said I said about the upgrade cycle, so let me get that out of the way. Now let me describe without the specific numbers, the iPhone 6s upgrade cycle that we have measured for the first half of this year, so the first six months of our fiscal year to be precise, is slightly better than the rate that we saw with the iPhone 5s two years ago, but it's lower than the iPhone 6. I don't mean just a hair lower, it's a lot lower.
Without giving you exact numbers, if we would have the same rate on 6s that we did 6, there would -- it will be time for a huge party. It would be a huge difference. The great news from my point of view is, I think we are strategically positioned very well, because we have announced the SE, we are attracting customers that we previously didn't attract. That's really great, and this tough compare eventually isn't the benchmark. The install base is up 80% over the last two years, and so all of those I think bode well, and the switcher comments I made earlier, I wouldn't underestimate that, because that's very important for us in every geography. Thanks for the question.
Q4 2016 Apple Inc Earnings Call - Final
I am looking to extract the text between 'Q2 2016 Apple Inc Earnings Call - Final' and 'Q3 2016 Apple Inc Earnings Call - Final', and the text between 'Q3 2016 Apple Inc Earnings Call - Final' and 'Q4 2016 Apple Inc Earnings Call - Final', and then print them or write them to a text file. I would really appreciate any help with this.
Before asking next time, make sure to look at the docs, go through a tutorial, and take a look at other Stack Overflow questions. If you still can't find an answer, make sure to post what you've already tried.
Now, to give you an answer. Assuming you have the full string in the variable text:
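import re
target1 = 'Q2 2016 Apple Inc Earnings Call - Final'
target2 = 'Q3 2016 Apple Inc Earnings Call - Final'
# Escape the markers so any regex metacharacters in them match literally
regxp = '{}(.+?){}'.format(re.escape(target1), re.escape(target2))
# DOTALL lets '.' match newlines, so a match can span several lines
pattern = re.compile(regxp, flags=re.DOTALL)
results = pattern.findall(text)
results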
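The question asks for both the Q2 to Q3 and the Q3 to Q4 sections, so it may be simpler to split on every heading in one pass. The following is a sketch under the assumption that each heading sits on its own line and looks like 'Q<digit> <year> Apple Inc Earnings Call - Final'. The helper name split_sections and the sample text are hypothetical.

```python
import re

# Assumed heading shape: "Q<digit> <year> Apple Inc Earnings Call - Final" on its own line
HEADING = re.compile(r"^(Q\d \d{4} Apple Inc Earnings Call - Final)$", re.MULTILINE)


def split_sections(text):
    """Map each heading to the text that follows it, up to the next heading.

    The text after the last heading is dropped, because no heading closes it.
    """
    # With one capturing group, split() returns
    # [before, heading1, body1, heading2, body2, ...]
    parts = HEADING.split(text)
    headings = parts[1::2]
    bodies = parts[2::2]
    pairs = list(zip(headings, bodies))[:-1]
    return {heading: body.strip() for heading, body in pairs}


# Hypothetical stand-in for the real transcript
sample = "\n".join([
    "Q2 2016 Apple Inc Earnings Call - Final",
    "TIM COOK: first answer",
    "Q3 2016 Apple Inc Earnings Call - Final",
    "TIM COOK: second answer",
    "Q4 2016 Apple Inc Earnings Call - Final",
])
sections = split_sections(sample)
```

Each value in sections can then be printed or written to its own text file. Headings with trailing spaces or a different wording would not match this pattern and would need it adjusted.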

How to add punctuation to text using python?

I am playing with the IBM Watson Speech To Text Service API. For those who do not know it, this service is used to transcribe audio: you upload an audio file to the service and it returns the text. The service has been good so far, but the problem is that the returned text does not contain punctuation marks. I tried solving this problem with nltk, but got no results.
Some nltk code I have tried.
#string is the text
string = """Hey guys today I'm gonna show you how to make bulletproof coffee so free guys and never heard a blooper coffee it's been around since like two thousand two been around for awhile but i think it's been a lot more popular now probably within the last maybe year two years so for me I recently just started doing bulletproof coffee say about a couple so basically regular coffee you know you get your regular coffee and then you put cream and you put sugar mix it up to get this nice little creamy sweet taste so bulletproof coffee the differences as they're putting cream and sugar your you play coffee you put butter grass fed butter and coconut oil so the difference is instead of using sugar you're using and healthy fats and the point of that is it's really good for if you're asked about the cato jenneke dieter basically low carb diet last you won't do is load up your coffee with how much sugar and roads are low carb diet and also a lot of people actually use bulletproof coffee as a way to replace their practice now i'm not saying to do that because albion is as of right now I still you might break I drink my coffee and at all times a wreck this but again though I do eat a lot so home a lot more than a lot of normal people so keep this in mind okay so this is how the rescue workers the book per copy you want to go ahead and put two cups of coffee so it's a here is we have one cup put it into our blender the trick here is the blending park you don't blend it doesn't come out right I use it not blended and then that coconut oil and the butter cut just floats on top when you got this oily coffee taste not the not the best now two cups of coffee you put two teaspoons of butter you wanna go and grass fed butter so let's say this is one teaspoon here yeah I just gonna estimated this will be two teaspoons every go two teaspoons of butter and how it I guess and crazy I remember doing this as a man sir put butter in there but it actually doesn't and it's really really 
good then you put about one to two teaspoons of cooking oil so for me i'm going to go with so here's about probably about one tablespoon and here's to get there you go throw on the cat and so not only is a good to be honest i think on a local wow it works out really well sugar is my car but when it comes to drinking the i can't really put any pre sugar sugar so what i've basically been doing a lot and you know black take the paper personally i can enjoy the cream and sugar so i thought this to be a really good replacement it makes is reading the taste and i think it's leann when i used even though i'm not sugar at all so it tastes good good for you and it's a low carb no take a look got a little foam on the top uh man fiasco we smell this now the butter coconut so check this out you get this nice little cream uh this is a taste test so smooth so creamy and i never put any sugar and a lot of people come in and play like go you know so we can love or something the v. f. or something the person for me albeit like this is perfect you know it's creamy and it's not bitter like black coffee plus it fits nicely with a low carb diet so a lot people drink this morning they replace directors because this does have a lot of calories so here's something to look out for them he noticed two teaspoons of butter want to tease bozo coconut oil a lot of calories just give an example one tablespoon of coconut oil has but a hundred and thirty calories which means two of them it's like two hundred sixty calories the butter is likely to borrow from that so you're looking at sort of ballpark a four hundred calories plus so that's what a lot of people drink this as replacement for their breakfast and with the good fats they give them the energy that they need to go throughout the day and also because that you watched another video talk about the difference we fat and carbs are sugar that exhort slower and so the energy lasts longer it doesn't get any sort of super fast and gets burnt off 
versus when you let's say drink sugary coffee you know we put sugar and cream in your coffee that sugar gets exhort super fast so you get this spike in energy up in the morning and then it starts crashing down but i noticed drinking this you don't gain topic crashes because it's back they said sugar so exhort slowly and kinda sustains so this is how i make my coffee i love it i suggest give it a try guys especially if you're one of coffee drinkers replacing your cream in your sugar with some bulletproof coffee you try some coconut oil and grass fed butter so i tried our guys see i like more workouts mortician how to work out the ads the most is that you want sixpack shortcuts dot com piece"""
import nltk
n = nltk.tokenize.punkt.PunktSentenceTokenizer()
n.sentences_from_text(string)
Is there any way to solve my problem? What would you do if you were in my case?
Unless you've got some serious machine learning skills, you'll have to write your own "rules" to handle this. Rules are much easier to get started with for this use case.
A sentence must adhere to specific "grammar rules". These rules are pretty basic, which is why they can be taught in primary school. The problem is that most people don't follow grammar rules when writing or speaking. You'll first need to focus on text that follows the rules. Based on this premise, here is what I would do:
Run NLTK POS Tagger on string (once we know the Parts of Speech we can code rules)
Identify tokens whose part of speech breaks the grammar rules of a sentence (sentences shouldn't end in a preposition)
Identify tokens that a sentence could end with based on grammar rules (Nouns are a great start)
Find a large, well-punctuated, grammatically correct corpus of text and remove the punctuation.
Try your "new rules" out on adding punctuation back to the corpus. Adjust your rules. Rinse and repeat.
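The last two steps amount to a small evaluation loop, which could be sketched as below. This is only an illustration: the function names are hypothetical, the three-sentence corpus is made up, and toy_rule is a stand-in keyed on a few assumed sentence-opening words. A real rule would use the POS tags from the first steps instead.

```python
import re


def strip_punctuation(text):
    """Turn punctuated text into lowercase tokens plus the gold sentence-end positions."""
    tokens, boundaries = [], set()
    for raw in text.split():
        word = re.sub(r"[^\w']", "", raw).lower()
        if not word:
            continue
        tokens.append(word)
        if raw[-1] in ".!?":
            boundaries.add(len(tokens) - 1)  # index of the token that ends a sentence
    return tokens, boundaries


def score(predicted, gold):
    """Precision and recall of predicted sentence ends against the gold ones."""
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


# Toy stand-in for a rule: end a sentence just before these assumed opener words.
OPENERS = {"so", "now", "then"}


def toy_rule(tokens):
    return {i - 1 for i, tok in enumerate(tokens) if i > 0 and tok in OPENERS}


corpus = "You brew the coffee. Then you add butter. Now blend it."
tokens, gold = strip_punctuation(corpus)
predicted = toy_rule(tokens)
precision, recall = score(predicted, gold)
```

Here the toy rule finds two of the three sentence ends and makes no false predictions. Tracking both numbers while adjusting the rules shows whether a change adds correct boundaries or just more guesses.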
It's 2022 and Hugging Face has some easy-to-install machine learning solutions for adding punctuation back into transcripts:
https://huggingface.co/felflare/bert-restore-punctuation
Example:
from rpunct import RestorePuncts
# The default language is 'english'
rpunct = RestorePuncts(use_cuda=False)
rpunct.punctuate("""in 2018 cornell researchers built a high-powered detector that in combination with an algorithm-driven process called ptychography set a world record
by tripling the resolution of a state-of-the-art electron microscope as successful as it was that approach had a weakness it only worked with ultrathin samples that were
a few atoms thick anything thicker would cause the electrons to scatter in ways that could not be disentangled now a team again led by david muller the samuel b eckert
professor of engineering has bested its own record by a factor of two with an electron microscope pixel array detector empad that incorporates even more sophisticated
3d reconstruction algorithms the resolution is so fine-tuned the only blurring that remains is the thermal jiggling of the atoms themselves""")
This model was trained on 560,000 Yelp sentences and has 90% accuracy.
install note
As of Sep 2022 the rpunct package has a bug where it won't run without GPUs, but there's a patched fork that supports the use_cuda=False kwarg in my example (slower but works on all processors). To install this fork instead, do this:
pip3 install git+https://github.com/ernie-mlg/rpunct.git
option 2 (better)
This is another Hugging Face model with similar accuracy, and it installed correctly the first time:
https://huggingface.co/oliverguhr/fullstop-punctuation-multilang-large
pip install deepmultilingualpunctuation
usage:
>>> from deepmultilingualpunctuation import PunctuationModel
>>> model = PunctuationModel()
Downloading config.json: 100%|█████████████████████████████████████████████████████████████████| 892/892 [00:00<00:00, 335kB/s]
Downloading pytorch_model.bin: 100%|██████████████████████████████████████████████████████| 2.08G/2.08G [04:54<00:00, 7.60MB/s]
Downloading tokenizer_config.json: 100%|███████████████████████████████████████████████████████| 406/406 [00:00<00:00, 216kB/s]
Downloading sentencepiece.bpe.model: 100%|████████████████████████████████████████████████| 4.83M/4.83M [00:00<00:00, 8.08MB/s]
Downloading special_tokens_map.json: 100%|█████████████████████████████████████████████████████| 239/239 [00:00<00:00, 158kB/s]
/opt/anaconda3/envs/punct/lib/python3.9/site-packages/transformers/pipelines/token_classification.py:135: UserWarning: `grouped_entities` is deprecated and will be removed in version v5.0.0, defaulted to `aggregation_strategy="none"` instead.
warnings.warn(
>>> text = "My name is Clara and I live in Berkeley California Ist das eine Frage Frau Müller"
>>> result = model.restore_punctuation(text)
>>> print(result)
My name is Clara and I live in Berkeley, California. Ist das eine Frage, Frau Müller?
I verified that this worked out of the box in the fresh conda env shown above. This second model also auto-detects the language and supports 4 languages.
You could use the Punctuator2 Python library either to train your own punctuation model or to use a pretrained one.
