I am looking for a way in Python to input a phone number and get the caller's carrier. I am looking for something free and simple. I have used Telnyx, but it returns CELLCO PARTNERSHIP DBA VERIZON instead of just 'verizon', which does not work for me. I have tried Twilio as well, and it has not worked for me either. Has anyone found success doing this? Thanks in advance. Code for the Telnyx lookup:
import json
import requests

def getcarrier(number):
    url = 'https://api.telnyx.com/v1/phone_number/1' + number
    html = requests.get(url).text
    data = json.loads(html)
    carrier = data["carrier"]
    # Return the name instead of stashing it in a global
    print(carrier["name"])
    return carrier["name"]
What I have done in the past is to isolate the number prefix and match it against the prefix database available HERE. I did this only for my own country (Bangladesh), so it was relatively easy code (just a series of if/else). To make it work for any number, I believe you'll need to consider the country code as well.
You can do it in two ways.
One: store the data locally as a CSV built from the Wikipedia page (scraping the page should be easy to do), then use pandas or a similar CSV-handling package as the database of your program; a minimal sketch of this is below.
Or, two: write a small program that scrapes the page on demand and finds the operator then.
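A minimal sketch of the first option, assuming the Wikipedia list has been scraped into a prefixes.csv with "prefix" and "operator" columns (the filename, the columns, and the sample number are all assumptions):

import pandas as pd

# Assumed file scraped from the Wikipedia prefix list, with one row
# per prefix, e.g. "017,Grameenphone"
prefixes = pd.read_csv("prefixes.csv", dtype={"prefix": str})
# Longest prefixes first, so a more specific match wins (key= needs pandas >= 1.1)
prefixes = prefixes.sort_values("prefix", key=lambda s: s.str.len(),
                                ascending=False)

def lookup_operator(number):
    # Return the operator whose prefix starts the number, if any
    for _, row in prefixes.iterrows():
        if number.startswith(row["prefix"]):
            return row["operator"]
    return "unknown"

print(lookup_operator("01712345678"))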
Good Luck.
Currently I am working on a project that will scrape content from various similarly designed websites containing dynamic content. My end goal is to aggregate all this data into one application or report. I have made some progress pulling the needed data from one page, but my lack of experience and knowledge in this realm has left me thinking I went down the wrong path.
https://dutchie.com/embedded-menu/revolutionary-clinics-somerville/menu
The above link is the perfect example of the type of page I will be pulling from.
In my initial attempt I was able to have the page scroll to the bottom, all the while collecting data from the various elements using the selector below, plus the manual scroll.
cards = driver.find_elements_by_css_selector("div[class^='product-card__Content']")
This allowed me to pull all the data points I needed on the fly, minus the overarching category, which happens to be a parent element. That is something I can map manually in Excel, but I would prefer to have it pulled alongside everything else.
This got me thinking that maybe I should have taken a top-down approach, rather than what I am seeing as a bottom-up approach. But no matter how hard I tried, based on the advice of others, I could not get it working as intended, where I pull the category from the parent div, due to my lack of understanding.
Based on the input of others I was able to make a pivot of sorts, and using the code below I was able to get the category as well as the product name, without any need to scroll the page at all. This went against every experience I have had with this project so far; I am unclear how/why this is possible.
for product_group_name in driver.find_elements_by_css_selector("div[class^='products-grid__ProductGroupTitle']"):
    # Note: @class, not #class, inside starts-with()
    for product in driver.find_elements_by_xpath("//div[starts-with(@class,'products-grid__ProductGroup')][./div[starts-with(@class,'products-grid__ProductGroupTitle')][text()='" + product_group_name.text + "']]//div[starts-with(@class,'consumer-product-card__InViewContainer')]"):
        print(product_group_name.text, product.text)
The problem with this code, which is much quicker since it does not rely on scrolling, is that no matter how I approach it I am unable to pull the additional data points of brand and price. Obviously it is something in my approach, but it is outside of my knowledge level currently.
Any specific or general advice would be appreciated, as I would like to scale this into something a bit more robust as my knowledge builds. I would like to have this scanning multiple different URLs at set points in the day. I am a long way away from that, but I want to make sure I start on the right path if possible. Based on what I have provided, is the top-down approach better in this case? Bottom-up? Is this subjective?
I have noticed comments about pulling the entire source code of the page and working with that. Would that be a valid approach, and possibly better suited to my needs? Would it even be possible given the dynamic nature of the page?
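For reference, here is a sketch of the top-down variant I have been experimenting with: it scopes each lookup to the group element via relative XPath rather than querying the whole page. Splitting card.text is a guess on my part, since I do not know the child class names for brand and price:

groups = driver.find_elements_by_xpath(
    "//div[starts-with(@class,'products-grid__ProductGroup')]"
    "[./div[starts-with(@class,'products-grid__ProductGroupTitle')]]")
for group in groups:
    # Relative paths ("./", ".//") keep each query inside this group only
    category = group.find_element_by_xpath(
        "./div[starts-with(@class,'products-grid__ProductGroupTitle')]").text
    for card in group.find_elements_by_xpath(
            ".//div[starts-with(@class,'consumer-product-card__InViewContainer')]"):
        # card.text is the card's whole text block; its lines should include
        # name, brand, and price, though the exact order is a guess
        print([category] + card.text.splitlines())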
Thank you.
I want to access a website with URLs of the form example.com/<num>-<num>.html, but I don't know the exact numbers in the URL. The numbers could go from 0 to 10000 or more. I wrote a small Python script to do this, but I feel it is very slow. Are there existing tools that could do the job?
import requests

url = 'http://example.com/'              # base URL (placeholder)
headers = {'User-Agent': 'Mozilla/5.0'}  # placeholder headers

for n in range(0, 10000):
    print("n", n)
    for m in range(0, 10000):
        r = url + str(n) + '-' + str(m) + '.html'
        html = requests.get(r, headers=headers)
        try:
            html.raise_for_status()
        except requests.exceptions.HTTPError:
            # print(r + " doesn't exist")
            continue
        print(r)
Also, this code neglects the possibility of zero-padded numbers like 0012, which is kind of bad.
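For what it's worth, here is a concurrent variant I have been considering, reusing the url and headers from above; HEAD requests plus a thread pool should cut the wall-clock time, though the total number of requests is unchanged:

from concurrent.futures import ThreadPoolExecutor
import requests

def exists(path):
    # A HEAD request is usually enough to learn whether the page is there
    try:
        resp = requests.head(path, headers=headers, timeout=5)
        resp.raise_for_status()
    except requests.exceptions.RequestException:
        return None
    return path

# Small demo range: the full 10000 x 10000 space is 100 million URLs, and
# pool.map queues every task up front, so chunk the work at that scale
candidates = (url + str(n) + '-' + str(m) + '.html'
              for n in range(100) for m in range(100))
with ThreadPoolExecutor(max_workers=50) as pool:
    for hit in pool.map(exists, candidates):
        if hit is not None:
            print(hit)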
I think you should make a folder containing the code of the thing you want to make. To get the numbers, you have to name the folder with that number. I commonly do this with websites that have certain sub-pages. But if this is something like a social media site with number-labeled posts, I don't know what to do.
I'm working on a Python web scraper that pulls data from a car advertising site. I got the scraping part all done with BeautifulSoup, but I've run into many difficulties trying to store and modify the data. I would really appreciate some advice on this part, since I'm lacking knowledge in this area.
So here is what I want to do:
Scrape the data each hour (done).
Store scraped data as a dictionary in a .JSON file (done).
Every time the ad_link is not found in scraped_data.json, set dict['Status'] = 'Inactive' (done).
If a car's price changes, print a notification and add the old price to the dictionary. On this part I came across many challenges with the JSON approach.
I've kept using two .json files and comparing them to each other (scraped_data_temp, permanent_data.json), but I think this is far from the best method.
What would you guys suggest? How should I do this?
What would be the best way to approach manipulating this kind of data? (Databases maybe? I've got no experience with them, but I'm eager to learn.) And what would be a good way to represent this kind of data, pygal?
Thank you very much.
If you have larger data, I would definitely recommend using some kind of DB. If you don't need a DB server, you can use sqlite; I have used it in the past to store bigger data locally. You can use sqlalchemy in Python to interact with DBs. A minimal sketch of the sqlite route is below.
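This sketch uses only the standard library sqlite3 module (sqlalchemy would wrap the same idea); the table and column names are just illustrative:

import sqlite3

conn = sqlite3.connect("cars.db")
conn.execute("""CREATE TABLE IF NOT EXISTS ads (
                    ad_link TEXT PRIMARY KEY,
                    price   REAL,
                    status  TEXT DEFAULT 'Active')""")

def record(ad_link, price):
    # Compare against the stored price and notify on a change
    row = conn.execute("SELECT price FROM ads WHERE ad_link = ?",
                       (ad_link,)).fetchone()
    if row is None:
        conn.execute("INSERT INTO ads (ad_link, price) VALUES (?, ?)",
                     (ad_link, price))
    else:
        if row[0] != price:
            print("Price change for %s: %s -> %s" % (ad_link, row[0], price))
        conn.execute("UPDATE ads SET price = ? WHERE ad_link = ?",
                     (price, ad_link))
    conn.commit()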
As for displaying data, I tend to use matplotlib. It's extremely flexible and has extensive documentation and examples, so you can adjust the graphs and charts to your liking.
I'm assuming that you are using python3.
There is a website that claims to predict the approximate salary of an individual on the basis of the following criteria, each presented as an individual drop-down:
Age: 5 options
Education: 3 options
Sex: 3 options
Work Experience: 4 options
Nationality: 12 options
On clicking the Submit button, the website gives a bunch of text as output on a new page with an estimate of the salary in numerals.
So there are technically 5*3*3*4*12 = 2160 data points. I want to collect all of them and arrange them in an Excel sheet. Then I would run a regression algorithm to guess the function this website has used. That is what I am hoping to achieve through this exercise; it is entirely for learning purposes, since I'm keen on learning these tools.
But I don't know how to go about it. Any relevant tutorial, documentation, or guide would help! I am programming in Python and I'd love to use it to achieve this task!
Thanks!
If you are uncomfortable asking them for the database as roganjosh suggested :) use Selenium. Write a Python script that controls the Web Driver and repeatedly submits all possible combinations. The script is pretty simple: just a nested loop over each type of parameter/drop-down, as in the sketch below.
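A hedged sketch of that nested loop, using itertools.product in place of five literal loops. Every element ID here (the five drop-downs, "submit", "result") and the URL are placeholders to be replaced with the real ones from the page:

import csv
from itertools import product
from selenium import webdriver
from selenium.webdriver.support.ui import Select

driver = webdriver.Chrome()
driver.get("https://example.com/salary-form")  # placeholder URL

fields = ["age", "education", "sex", "experience", "nationality"]
# Read the visible option texts from each drop-down once, up front
options = {f: [o.text for o in Select(driver.find_element_by_id(f)).options]
           for f in fields}

with open("salaries.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(fields + ["salary"])
    for combo in product(*(options[f] for f in fields)):
        for field, value in zip(fields, combo):
            Select(driver.find_element_by_id(field)).select_by_visible_text(value)
        driver.find_element_by_id("submit").click()
        salary = driver.find_element_by_id("result").text  # placeholder ID
        writer.writerow(list(combo) + [salary])
        driver.back()  # back to the form for the next combination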
If you are sure the values of each type do not depend on each other, check what request is sent to the server. If it is simply URL-encoded, like age=...&sex=...&..., then Selenium is not needed: just generate such URLs for all possible combinations and call the server.
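In that case requests plus itertools.product is enough. The endpoint and the parameter names and values below are assumptions for illustration:

import requests
from itertools import product

BASE = "https://example.com/estimate"  # placeholder endpoint
space = {
    "age": ["18-25", "26-40", "41-55", "56-65", "65+"],
    "education": ["school", "bachelor", "master"],
    "sex": ["male", "female", "other"],
    # ...the remaining drop-downs follow the same pattern
}
for combo in product(*space.values()):
    params = dict(zip(space.keys(), combo))
    resp = requests.get(BASE, params=params)
    print(params, resp.text[:80])  # first bytes of the result page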
How can I generate a random yet valid website link, regardless of language? Actually, the more diverse the languages of the websites it generates, the better.
I've been doing it by using other people's scripts on their webpages; how can I stop relying on these random-site forwarding scripts and make my own? I've been doing it as such:
import webbrowser
from random import choice
random_page_generator = ['http://www.randomwebsite.com/cgi-bin/random.pl',
'http://www.uroulette.com/visit']
webbrowser.open(choice(random_page_generator), new=2)
I've been doing it by using other people's scripts on their webpages; how can I stop relying on these random-site forwarding scripts and make my own?
There are two ways to do this:
Create your own spider that amasses a huge collection of websites, and pick from that collection.
Access some pre-existing collection of websites, and pick from that collection. For example, DMOZ/ODP lets you download their entire database;* Google used to have a customized random site URL;** etc.
There is no other way around it (short of randomly generating and testing valid strings of arbitrary characters, which would be a ridiculously bad idea).
Building a web spider for yourself can be a fun project. Link-driven scraping libraries like Scrapy can do a lot of the grunt work for you, leaving you to write the part you care about; a sketch of such a spider follows the footnotes below.
* Note that ODP is a pretty small database compared to something like Google's or Yahoo's, because it's primarily a human-edited collection of significant websites rather than an auto-generated collection of everything anyone has put on the web.
** Google's random site feature was driven by both popularity and your own search history. However, by feeding it an empty search history, you could remove that part of the equation. Anyway, I don't think it exists anymore.
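A minimal sketch of the spider route with Scrapy; the seed URL and the depth limit are arbitrary choices, not recommendations:

import scrapy

class RandomSiteSpider(scrapy.Spider):
    name = "randomsites"
    start_urls = ["https://en.wikipedia.org/"]  # arbitrary seed
    custom_settings = {"DEPTH_LIMIT": 3}        # keep the crawl bounded

    def parse(self, response):
        for href in response.css("a::attr(href)").getall():
            yield {"url": response.urljoin(href)}    # record the link
            yield response.follow(href, self.parse)  # and keep crawling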
A conceptual explanation, not a code one.
Their scripts are likely very large and comprehensive. If it's a random website selector, they have a huge list of websites, line by line, and the script just picks one. If it's a random URL generator, it probably generates a string of letters (e.g. "asljasldjkns"), plugs it between http:// and .com, checks whether it is a valid URL, and, if it is, sends you that URL.
The easiest way to design your own might be to ask to have a look at theirs, though I'm not certain how much success you'd have there.
The best way, as a programmer, is simply to decipher the nature of the URL language: practice building strings and testing them, or compile a huge database of them yourself.
As a hybridization, you might try building two things: one script that, while you're away, searches for and tests URLs and adds them to a database, and another script that randomly selects a line out of this database to send you on your way. The longer you run the first, the better the second becomes.
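The second script is only a few lines. A sketch, assuming the first script appends one URL per line to urls.txt (an assumed filename):

import random
import webbrowser

# Pick a random non-empty line from the database file and open it
with open("urls.txt") as fh:
    urls = [line.strip() for line in fh if line.strip()]

webbrowser.open(random.choice(urls), new=2)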
EDIT: Do Abarnert's thing about spiders, that's much better than my answer.
The other answers suggest building large databases of URLs; there is another method, which I've used in the past and documented here:
http://41j.com/blog/2011/10/find-a-random-webserver-using-libcurl/
The idea is to create a random IP address and then try to grab a site from port 80 of that address. This method is not perfect with modern virtual-hosted sites, and of course it only fetches the top page, but it can be an easy and effective way of getting random sites. The code linked above is C, but it should be easily callable from Python, or the method can easily be adapted to Python.
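A rough Python sketch of the same idea (an adaptation, not the linked C code); most random addresses won't answer, so expect mostly timeouts:

import random
import requests

def random_ip():
    # Avoid .0 and .255 octets; this does not filter private/reserved ranges
    return ".".join(str(random.randint(1, 254)) for _ in range(4))

for _ in range(100):
    ip = random_ip()
    try:
        resp = requests.get("http://" + ip, timeout=2)
    except requests.exceptions.RequestException:
        continue  # no web server answered at this address
    print(ip, resp.status_code)
    break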