I have a Python script which searches a web page for information. Currently I add the search term as a parameter when I run my program: 'myscript.py searchterm'.
What I would like to do is have a file with my search terms in it, and get my script to loop through each one in turn on its own.
So I would populate my list from a file like this:
with open('mylist.txt', 'r') as f:
    searchterms = f.readlines()  # note: each entry keeps its trailing newline
I already have my code, which looks something like this, just to give you an idea of the layout:
counter = 0
try:
    while counter < 10:
        # do some other stuff here
        counter = counter + 10
except IOError:
    print "No result found!"
I need to wrap this in another loop to do this for every item in my list, and I'm failing.
I know I need to reset the counter when it gets to 10, move on to my next list item, and loop through the above, but I don't know how.
I find the python docs difficult to understand and I would appreciate a little help.
TIA
If you have a list of search terms, you can easily loop through them and pass each one to your existing loop:
for searchterm in searchterms:
    counter = 0
    try:
        while counter < 10:
            # do some other stuff here with searchterm
            counter = counter + 10
    except IOError:
        print "No result found!"
Be sure that your counter correctly resets at the beginning of each search term's loop.
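Putting it all together, a minimal sketch might look like the following (do_search is a hypothetical placeholder for whatever your existing page-scraping code does with each term):

with open('mylist.txt', 'r') as f:
    # strip the trailing newlines so the search terms are clean
    searchterms = [line.strip() for line in f if line.strip()]

for searchterm in searchterms:
    counter = 0
    try:
        while counter < 10:
            do_search(searchterm)  # hypothetical: your existing scraping code
            counter = counter + 1  # increment however your paging logic requires (your example used +10)
    except IOError:
        print "No result found for %s!" % searchterm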
Hi all. First time having to look for assistance, but I am sort of at a brick wall for now. I have been learning Python since August and I have been given a challenge to complete by the end of November, and I hope there could be some help in making my code work. My task is to find the IP address which occurs most frequently and count the number of times it appears; this information must also be displayed to the user. I have been given 4 .txt files that contain the IPs. I am also required to make use of non-trivial data structures, built-in Python sorting and/or searching functionality, and functions with parameter passing and return values in the program. Below is a sample structure they have recommended that I use:
def analyse_logs(parameter):
    # Your Code Here
    return something

def extract_ip(parameter):
    # Your Code Here
    return something

def find_most_frequent(parameter):
    # Your Code Here
    return something

# Test Program
def main():
    # Your Code Here

# Call Test Program
main()
And below is what I have come up with. The code is completely different from the sample that was provided, but what I have done doesn't give me the output straight back; instead it creates a new text file which has been sorted, but that is not what I am looking for:
def sorting(filename):
    infile = open(filename)
    ip_addr = []
    for line in infile:
        temp = line.split()
        for i in temp:
            ip_addr.append(i)
    infile.close()
    ip_addr.sort()
    outfile = open("result.txt", "w")
    for i in ip_addr:
        outfile.writelines(i)
        outfile.writelines(" ")
    outfile.close()

sorting("sample_log_1.txt")
The code I have created sorts everything that's in the .txt file and writes it out from the most frequently used all the way to the least frequent. All I am looking for is an algorithm that can go through the .txt file, find the IP address that occurs most frequently, then print that IP out together with how many times it appears. I hope I have provided everything; I am sure this is probably something very basic, but I just can't get my head round it.
You should keep the number of times each IP address is repeated in a variable. You can use a dictionary:
ip_count_dict = {"IP1": repeat_count, "IP2": repeat_count}
The first time you find an IP in your list, set its repeat_count to 1; after that, whenever you find the same IP again, just increase the counter.
For example,
ip_count_dict = {}
ip_list = ['1.1.1.1', '1.1.1.2', '1.1.1.3', '1.1.1.1']
# Loop and count IPs
for ip in ip_list:
    ip_count_dict[ip] = ip_count_dict.get(ip, 0) + 1
# Final version of ip_count_dict: {'1.1.1.1': 2, '1.1.1.2': 1, '1.1.1.3': 1}
With this dictionary you can store all the IPs and sort them by their count.
P.S.: A dictionary keeps key/value pairs; you can search for "sort dictionary by value" once all the counting is done.
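For example, a minimal sketch of the find-most-frequent step, using the built-in max with a key function (collections.Counter would work just as well):

def find_most_frequent(ip_count_dict):
    # max() with a key function returns the (ip, count) pair with the highest count
    return max(ip_count_dict.items(), key=lambda pair: pair[1])

ip, count = find_most_frequent({'1.1.1.1': 2, '1.1.1.2': 1, '1.1.1.3': 1})
print("%s appears %d time(s)" % (ip, count))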
I have a list, and while I constantly append elements to it, I want to check that it isn't empty and get elements from it at the same time. Normally we wait for all the elements to be appended to the list and then get elements from the list and do something with them. In this case, we lose time waiting for all the elements to be added. What knowledge do I need to acquire to make this happen (multiprocessing, multiprocessing.dummy, asynchronous)? Sorry, I am still new to this. I think it's better for me to explain why I want to achieve this kind of effect: this problem came from a web crawler.
import requests

from model import Document

def add_concrete_content(input_list):
    """input_list data structure: [{'url': 'xxx', 'title': 'xxx'}, ...]"""
    for e in input_list:
        r = requests.get(e['url'])
        html = r.content
        e['html'] = html
    return input_list

def save(input_list):
    for e in input_list:
        Document.create(**e)

if __name__ == '__main__':
    # input_list comes from an earlier crawling step
    res = add_concrete_content(input_list)
    save(res)

"""This is what I normally do: I save the data to MySQL or whatever database. But
I think the drawback is that I have to wait for all the html to be added to the
dicts before saving to the database. What if I have to deal with tons of data?
Can I save each dict with its html first? Can I save some time? A friend of mine
said this is a typical producer-consumer problem, probably needing at least two
threads and a lock, because without a lock the data would probably fall into disorder."""
You're being a bit vague, and I think there's a misconception in the way you want things to happen.
You don't need any extra Python rocket science to do what you want:
you can check whether the list is empty by simply doing: if list_: (where list_ is your list)
you can access any element by using list_[idx] (where idx is the index of the element). For example, list_[0] will get you the first element of the list, while list_[-1] gets the last one.
That said, you don't have to wait for all the elements to be added to the list if you need to process them on the go. You might be looking for something like this:
def push(list_):
    count = 0
    while True:
        list_.append(count)
        f(list_)
        count += 1
        if count == 1000:
            break

def f(list_):
    # process elements as they arrive, without waiting for push() to finish
    print('First element: {}'.format(list_[0]))
    print('Last element: {}'.format(list_[-1]))

if __name__ == '__main__':
    list_ = []
    push(list_)
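If you do want the producer-consumer setup your friend described, with one thread fetching pages while another saves them, the standard library's Queue handles the locking for you. Here is a minimal sketch under that assumption; fetch_html and save_one are hypothetical stand-ins for your requests.get and Document.create calls:

import threading
from queue import Queue  # named Queue on Python 2

q = Queue()

def producer(input_list):
    # fetch each page and hand it to the consumer as soon as it is ready
    for e in input_list:
        e['html'] = fetch_html(e['url'])  # hypothetical fetch helper
        q.put(e)
    q.put(None)  # sentinel telling the consumer to stop

def consumer():
    while True:
        e = q.get()
        if e is None:
            break
        save_one(e)  # hypothetical save helper

t = threading.Thread(target=consumer)
t.start()
producer(input_list)
t.join()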
I am a Python 2.7 (and programming) beginner. Also, this is my first question on SO.
I am currently building a little shopping cart application. I am storing the shopping cart values in a txt file. If the user running the script already has a cart saved in the txt file from a previous session, I want the program to offer to continue with those items in the cart.
Here is what I have so far:
import re

my_file = open("cart.txt", "r")
# use a regular expression to find the person's cart by name in cart.txt
match = re.findall(r'%s' % enter_name, my_file.read())
my_file.close()

if match:
    print "Hello %s, welcome back." % enter_name
    print "Do you want to use your previous Shopping Cart?"
    continue_cart = raw_input("Type 'yes' or 'no' ")
    if continue_cart == "yes":
        with open("cart.txt", "r") as my_file:
            for line in my_file:
                line_list = line.split()
                print line_list
                # to be continued
The final print statement is basically only a placeholder and a way for me to see what happens. I am kinda stuck there. The code with the for loop at the end prints one list per line, like this:
['Martin']
['Milk', '2']
['Apple', '4']
What I want to do is access the values representing products and quantities in order to then add them to the shopping cart class. But how do I process these lists in order to be able to access the values?
Any help is appreciated. Thanks.
As e4c5 said, by not utilizing more advanced concepts (a database, or just a simple json or pickle file) you are making it much harder for yourself.
If you want to use only plain text files directly to store and load values, not only is the code much harder to write (honestly, I've been programming for years and I still would not be confident I got it right!) but also harder for everyone (including your future self) to read. And maintainability is really, from experience, the holy grail of programming. That's why we use Python after all!
That said, I understand you are not trying to build a perfect solution but to learn programming!
Your cart.txt probably looks something like this:
Martin
Milk 2
Apple 4
Eva
Milk 1
Then your for loop could look like the following (for example; but as I said, this is not the right approach, and it is very fragile and buggy!):
enter_name_found = False
cart_items = {}
for line in my_file:
    line_list = line.split()
    if len(line_list) == 1:
        # a single word is a customer name
        if line_list[0] == enter_name:
            enter_name_found = True
        elif enter_name_found:
            # a new name ends the cart we were reading
            break
    elif enter_name_found:
        # a product line: name followed by quantity
        cart_items[line_list[0]] = line_list[1]
print(cart_items)
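From there, assuming your shopping cart class has some way to add items (add_item here is a hypothetical method name), you could convert the quantities to integers and load them:

for product, quantity in cart_items.items():
    # quantities come out of the text file as strings
    cart.add_item(product, int(quantity))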
EDIT:
To demonstrate how much simpler this all is using a database, here is a full example to save a cart:
import shelve
cart = shelve.open("cart.dat")
cart["Martin"] = {"Apple": 5, "Milk": 2}
cart["Eva"] = {"Milk": 1, "Cat": 4}
cart.close()
And a full example to load the cart from the DB:
import shelve
cart = shelve.open("cart.dat")
print(cart["Martin"])
print(cart["Eva"])
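One caveat if you go this route: mutating a stored dict in place does not persist by default, because shelve only writes on assignment. Either reassign the whole value, or open the shelf with writeback=True:

import shelve

cart = shelve.open("cart.dat", writeback=True)
cart["Martin"]["Apple"] += 1  # persisted, because writeback=True caches entries and writes them back on close
cart.close()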
What is the easiest way to loop through a series URLs until there are no more results returned?
If the number of URLs is fixed, e.g. 9, something like the following code would work:
for i in range(1, 10):
    print('http://www.trademe.co.nz/browse/categorylistings.aspx?v=list&rptpath=4-380-50-7145-&mcatpath=sports%2fcycling%2fmountain-bikes%2ffull-suspension&page=' + str(i) + '&sort_order=default')
However, the number of URLs is dynamic, and I get a page saying "Sorry, there are currently no listings in this category." when I overshoot. Example below.
http://www.trademe.co.nz/browse/categorylistings.aspx?v=list&rptpath=4-380-50-7145-&mcatpath=sports%2fcycling%2fmountain-bikes%2ffull-suspension&page=10&sort_order=default
What is the easiest way to only return pages with results?
Cheers
Steve
# count is an iterator that just keeps going
# from itertools import count
# but I'm not going to use it, because you want to set a reasonable limit
# otherwise you'll loop endlessly if your end condition fails

# requests is third party but generally better than the standard libs
import requests

base_url = 'http://www.trademe.co.nz/browse/categorylistings.aspx?v=list&rptpath=4-380-50-7145-&mcatpath=sports%2fcycling%2fmountain-bikes%2ffull-suspension&page={}&sort_order=default'

for i in range(1, 30):
    result = requests.get(base_url.format(i))
    if result.status_code != 200:
        break
    content = result.content.decode('utf-8')
    # Note, this is actually quite fragile
    # For example, they have 2 spaces between 'no' and 'listings'
    # so looking for 'no listings' would break
    # for a more robust solution be more clever.
    if 'Sorry, there are currently no' in content:
        break
    # do stuff with your content here
    print(i)
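If you would rather not guess an upper bound, the itertools.count variant mentioned in the comments above looks like this; it relies entirely on the stop condition, so the endless-loop caveat applies:

from itertools import count

import requests

for i in count(1):  # 1, 2, 3, ... with no upper limit
    result = requests.get(base_url.format(i))
    content = result.content.decode('utf-8')
    if result.status_code != 200 or 'Sorry, there are currently no' in content:
        break
    # do stuff with your content here
    print(i)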
I have a text file with some content. I need to search this content frequently. I have the following two options; which one is better (in terms of faster execution)?
METHOD 1:
def search_list(search_string):
    if search_string in li:
        print "found at line ", li.index(search_string) + 1

if __name__ == "__main__":
    f = open("input.txt", "r")
    li = []
    for i in f.readlines():
        li.append(i.rstrip("\n"))
    search_list("appendix")
METHOD 2:
def search_dict(search_string):
    if d.has_key(search_string):
        print "found at line ", d[search_string]

if __name__ == "__main__":
    f = open("input.txt", "r")
    d = {}
    for i, j in zip(range(1, len(f.readlines())), f.readlines()):
        d[j.rstrip("\n")] = i
    search_dict("appendix")
For frequent searching, a dictionary is definitely better (provided you have enough memory to store the line numbers as well), since the keys are hashed and looked up in O(1) operations. However, your implementation won't work: the first f.readlines() exhausts the file object, so the second f.readlines() won't read anything.
What you're looking for is enumerate:
with open('data') as f:
    d = dict((j[:-1], i) for i, j in enumerate(f, 1))
It should also be pointed out that in both cases, the function which does the searching will be faster if you use try/except, provided the index you're looking for is typically found. (In the first case, it might be faster anyway, since in is an order-N operation and so is .index for a list.)
e.g.:
def search_dict(d, search_string):
    try:
        print "found at line {0}".format(d[search_string])
    except KeyError:
        print "string not found"
or for the list:
def search_list(search_string):
    try:
        print "found at line {0}".format(li.index(search_string) + 1)
    except ValueError:
        print "string not found"
If you really do search frequently, then the second method will be faster (you've built something like an index).
Just adapt it a little bit:
def search_dict(d, search_string):
    line = d.get(search_string)
    if line:
        print "found at line {}".format(line)
    else:
        print "string not found"

d = {}
with open("input.txt", "r") as f:
    for i, word in enumerate(f.readlines(), 1):
        d[word.rstrip()] = i

search_dict(d, "appendix")
I'm posting this after reading the answers of eumiro and mgilson.
If you compare your two methods on the command line, I think you'll find that the first one is faster. The other answers say the second method is faster, but they are based on the premise that you'll do several searches on the file after you've built your index; if you run them as-is from the command line, you will not.
Building the index is slower than just searching for the string directly, but once the index is built, searches can be done very quickly, making up for the time spent building it. This extra time is wasted if you use the index just once, because when the program completes, the index is discarded and has to be rebuilt on the next run. You need to keep the created index in memory between queries for this to pay off.
There are several ways of doing this. One is making a daemon to hold the index and using a front-end script to query it. Searching for something like python daemon client communication on Google will give you pointers on implementing it.
The first one is O(n); the second one is O(1), but it requires searching by key. I'd pick the second one.
Neither one will work if you're doing ad hoc searches within the document. For that you'll need to parse and index the text using something like Lucene.
Another option to throw in is using the full-text search (FTS) provided by SQLite3 (untested, and assuming you're looking for whole words, not substrings of words or other such things):
import sqlite3

# create db and table
db = sqlite3.connect(':memory:')  # replace with a file on disk?
db.execute('create virtual table somedata using fts4(line)')

# insert the data
with open('yourfile.txt') as fin:
    for lineno, line in enumerate(fin):
        # You could put in a check here I guess...
        if somestring in line:
            print lineno  # or whatever....
        # put row into FTS table
        db.execute('insert into somedata (line) values (?)', (line,))
    # or possibly more efficient (instead of the loop above):
    # db.executemany('insert into somedata (line) values (?)', ((line,) for line in fin))
db.commit()

look_for = 'somestring'
matches = db.execute('select rowid from somedata where line match ?', (look_for,))
print '{} is on lines: {}'.format(look_for, ', '.join(str(match[0]) for match in matches))
If you only wanted the first line, then add limit 1 to the end of the query.
You could also look at using mmap to map the file, then use the .find method to get the earliest offset of the string. Assuming it's not -1 (i.e. not found), say the offset is 123456, then mapped_file[:123456].count('\n') + 1 gives you the line number.
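A minimal sketch of that mmap approach (the search string needs to be bytes on Python 3; the plain string shown here matches the Python 2 style of this thread):

import mmap

with open('input.txt', 'r') as f:
    mapped_file = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    offset = mapped_file.find('appendix')
    if offset != -1:
        # count the newlines before the match to turn the byte offset into a line number
        print("found at line {0}".format(mapped_file[:offset].count('\n') + 1))
    else:
        print("string not found")
    mapped_file.close()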