python requests.get() InvalidSchema error - python

I'm incredibly new to Python, and I'm trying to write something to get the first result returned from Google's "I'm Feeling Lucky" button. I have a list of 100 items I need it to get URLs for. Here's what I have:
import requests
with open('2012.txt') as f:
    lines = f.readlines()
for i in range(0, 100):
    temp1 = "r'http://www.google.com/search?q=\""
    temp2 = "\"&btnI'"
    temp3 = lines[i]
    temp3 = temp3[:-1]
    temp4 = temp1+temp3+temp2
    print temp4
    var = requests.get(temp4)
    print var.url
Now if I print the value in temp4 and paste it into requests.get(), it works as I want it to. However, I get errors every time I try to pass temp4 in instead of a hard-coded string.

Specifically, I guess you're getting:
requests.exceptions.InvalidSchema: No connection adapters were found for 'r'http://www.google.com/search?q="foo"&btnI''
(except with something else in lieu of foo:-) -- please post exceptions as part of your Q, why make us guess or need to reproduce?!
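The problem is that leading r' inside the string: it is part of the string's contents rather than a raw-string prefix, so the URL no longer starts with http:// and requests cannot find a connection adapter for it (the trailing ' doesn't help either).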
So, try instead something like:
temp1 = 'http://www.google.com/search?q="'
temp2 = '"&btnI'
and things should go better... specifically, when I do that (still with 'foo' in lieu of a real temp3), I get
http://en.wikipedia.org/wiki/Foobar
which seems to make sense as the top search result for "foo"!-)
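As a side note, the query string can also be assembled by the standard library instead of by hand, which takes care of quoting the search term. This is only a minimal sketch in Python 3, assuming the same Google URL as above; lucky_url is a hypothetical helper name, and the snippet does not check whether Google still redirects for btnI:

```python
from urllib.parse import urlencode


def lucky_url(term):
    """Build the "I'm Feeling Lucky" URL for one search term (hypothetical helper)."""
    # Wrap the term in double quotes, as the question does, and let
    # urlencode percent-escape the quotes and any spaces.
    query = urlencode({"q": '"{}"'.format(term.strip()), "btnI": ""})
    return "http://www.google.com/search?" + query


print(lucky_url("foo\n"))
```

The returned string can be passed to requests.get() unchanged, and term.strip() replaces the temp3[:-1] trick for dropping the trailing newline.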

Related

Is there a way to ignore empty lines in the ndiff lib and to print only the + and the number of the line? PYTHON

So guys, here is my code:
import io
import difflib
import re
with io.open('textest.txt', mode="r", encoding="utf_8_sig") as file:
    lines1 = file.readlines()
with io.open('minitext.txt', mode="r", encoding="utf_8_sig") as file:
    lines2 = file.readlines()
def prefilter(line):
    # raw string so that \s is read as a regex escape, not a string escape
    return re.sub(r"\s+"," ",line.strip())
for d in difflib.ndiff([prefilter(x) for x in lines1],[prefilter(x) for x in lines2]):
    print(d)
the textest.txt is the full song and the minitext.txt is just a part of it. The output is this (I know, it's a justin bieber song, it's just an example)
+ somethin' I don't wanna hold back
- For all the times that you rained on my parade
- And all the clubs you get in using my name
- You think you broke my heart, oh, girl, for goodness' sake
- You think I'm crying on my own, well, I ain't
- And I didn't wanna write a song
- 'Cause I didn't want anyone thinkin' I still care, I don't, but
- You still hit my phone up
- And baby, I'll be movin' on
- And I think you should be somethin' I don't wanna hold back
Maybe you should know that
My mama don't like you and she likes everyone
And I never like to admit that I was wrong
And I've been so caught up in my job
Didn't see what's going on, but now I know
+
+
+
I'm better sleeping on my own
+ 'Cause if you like the wa
- 'Cause if you like the way you look that much
- Oh, baby, you should go and love yourself
- And if you think that I'm still holdin' on to somethin'
The thing is: I wanted to print only the + (The different lines on the lines2, that is the minitext.txt), and the number of the line which is different. I also wanted to ignore the completely empty lines so the output is just like:
somethin' I don't wanna hold back (Number of line in minitext.txt)
'Cause if you like the wa (Number of line in minitext.txt)
or anything similar. Is there a way I could do that?
If you open difflib.py on your system (difflib.__file__ gives you the path to the module), you'll find that ndiff simply returns Differ(linejunk, charjunk).compare(a, b).
Inside the Differ class, the compare method relies on the _dump method to yield tag + string for each line (that is what you see when you print).
We can override that method for our own purposes:
keep only the lines with the + tag
keep only the strings that are not empty
You can read the source code and pull out anything else you need in the same way.
import difflib
# override the Differ class (difflib.ndiff works through the compare method of this class)
class Differ(difflib.Differ):
    # this method is called for each same-tagged range; we only keep the + tag
    def _dump(self, tag, x, lo, hi):
        """Generate comparison results for a same-tagged range."""
        for i in range(lo, hi):
            # only yield if tag == "+"
            if tag == '+':
                # skip empty strings from the second list
                if x[i] != "":
                    # change format (string from the second list, i == line index, starting from 0)
                    yield '%s(%s)' % (x[i], i)
# example with two lists of strings
a = ["first", "second", "three", "four"]
b = ["number_one", "second", "number_three", "four"]
differ_object = Differ()
result = differ_object.compare(a, b)
for _ in result:
    print(_)
# result
"""
number_one(0)
number_three(2)
"""
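A different route to the same goal, offered only as a sketch, is to leave difflib untouched and filter the public ndiff output. As far as I can tell from the CPython source, lines that ndiff treats as close matches are emitted together with "? " hint lines and do not all pass through _dump, so filtering the output may be more robust than overriding a private method. The function name added_lines is hypothetical, and line numbers here are 1-based (the override above counts from 0):

```python
import difflib


def added_lines(a, b):
    """Yield (line_number, text) for non-empty lines that ndiff marks as added in b."""
    b_lineno = 0
    for entry in difflib.ndiff(a, b):
        tag, text = entry[:2], entry[2:]
        # Lines tagged "  " (unchanged) or "+ " (added) exist in b, so they
        # advance the line counter; "- " and "? " lines do not.
        if tag in ("  ", "+ "):
            b_lineno += 1
        if tag == "+ " and text.strip():
            yield b_lineno, text


a = ["first", "second", "three", "four"]
b = ["number_one", "second", "", "number_three", "four"]
for lineno, text in added_lines(a, b):
    print("%s (%d)" % (text, lineno))
```

With the question's files, a and b would be the prefiltered lines1 and lines2 lists.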

Looking for vulnerabilities in my code (split method)

Here I've tried to recreate the str.split() method in Python. I've tried and tested this code and it works fine, but I'm looking for vulnerabilities to correct. Do check it out and give feedback, if any.
Edit: Apologies for not being clear. I meant to ask you guys for exceptions where the code won't work. I'm also trying to think of a more refined way without looking at the source code.
def splitt(string,split_by = ' '):
    output = []
    x = 0
    for i in range(string.count(split_by)):
        output.append((string[x:string.index(split_by,x+1)]).strip())
        x = string.index(split_by,x+1)
    output.append((((string[::-1])[:len(string)-x])[::-1]).strip())
    return output
There are in fact a few problems with your code:
by searching from x+1, you may miss an occurrence of split_by at the very start of the string, which makes index fail with a ValueError in the last iteration
you are calling index more often than necessary
instead of searching from x+1, add len(split_by) to the offset for the next call to index
strip only makes sense if the separator is whitespace, and even then it might remove more than intended, e.g. trailing spaces when splitting lines
there is no need to reverse the string twice in the last step
This should fix those problems:
def splitt(string,split_by=' '):
    output = []
    x = 0
    for i in range(string.count(split_by)):
        x2 = string.index(split_by, x)
        output.append(string[x:x2])
        x = x2 + len(split_by)
    output.append(string[x:])
    return output
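One remaining edge case is an empty separator: str.split raises ValueError for it, while the version above returns a list of empty strings. The sketch below is one possible variant, not the answerer's code; it uses str.find so the string is scanned only once and rejects an empty separator. The name split_on is hypothetical:

```python
def split_on(string, sep=" "):
    """Split string on every occurrence of sep, mirroring str.split(sep)."""
    if not sep:
        # str.split raises ValueError for an empty separator as well
        raise ValueError("empty separator")
    parts = []
    start = 0
    while True:
        end = string.find(sep, start)
        if end == -1:
            # no more separators: the rest of the string is the last piece
            parts.append(string[start:])
            return parts
        parts.append(string[start:end])
        start = end + len(sep)


for text, sep in [(" a b ", " "), ("a,,b", ","), ("aaa", "aa"), ("", "x")]:
    print(repr(text), repr(sep), split_on(text, sep) == text.split(sep))
```

Comparing against the built-in on a handful of awkward inputs, as the loop does, is a quick way to look for the kind of exceptions the question asks about.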

Concatenating strings more efficiently in Python

I've been learning Python for a couple of months, and wanted to understand a cleaner and more efficient way of writing this function. It's just a basic thing I use to look up bus times near me, then display the contents of mtodisplay on an LCD, but I'm not sure about the mtodisplay=mtodisplay+... line. There must be a better, smarter, more Pythonic way of concatenating a string, without resorting to lists (I want to output this string direct to LCD. Saves me time. Maybe that's my problem ... I'm taking shortcuts).
Similarly, my method of using countit and thebuslen seems a bit ridiculous! I'd really welcome some advice or pointers in making this better. Just wanna learn!
Thanks
json_string = requests.get(busurl)
the_data = json_string.json()
mtodisplay='220 buses:\n'
countit=0
for entry in the_data['departures']:
    for thebuses in the_data['departures'][entry]:
        if thebuses['line'] == '220':
            thebuslen=len(the_data['departures'][entry])
            print 'buslen',thebuslen
            countit += 1
            mtodisplay=mtodisplay+thebuses['expected_departure_time']
            if countit != thebuslen:
                mtodisplay=mtodisplay+','
return mtodisplay
Concatenating strings like this
mtodisplay = mtodisplay + thebuses['expected_departure_time']
used to be very inefficient, but for a long time now CPython has been able to reuse the string being concatenated to (as long as there are no other references to it), so performance is linear instead of the older quadratic behaviour, which should definitely be avoided.
In this case it looks like you already have a list of items that you want to put commas between, so
','.join(some_list)
is probably more appropriate (and automatically means you don't get an extra comma at the end).
So the next problem is constructing the list (it could also be a generator, etc.). @bgporter shows how to make the list, so I'll show the generator version:
def mtodisplay(busurl):
    json_string = requests.get(busurl)
    the_data = json_string.json()
    for entry in the_data['departures']:
        for thebuses in the_data['departures'][entry]:
            if thebuses['line'] == '220':
                thebuslen=len(the_data['departures'][entry])
                print 'buslen',thebuslen
                yield thebuses['expected_departure_time']
# This is where you would normally just call the function
result = '220 buses:\n' + ','.join(mtodisplay(busurl))
I'm not sure what you mean by 'resorting to lists', but something like this:
json_string = requests.get(busurl)
the_data = json_string.json()
mtodisplay= []
for entry in the_data['departures']:
    for thebuses in the_data['departures'][entry]:
        if thebuses['line'] == '220':
            thebuslen=len(the_data['departures'][entry])
            print 'buslen',thebuslen
            mtodisplay.append(thebuses['expected_departure_time'])
return '220 buses:\n' + ", ".join(mtodisplay)
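To try the join approach without calling the bus API, here is a small self-contained sketch in Python 3. The shape of the sample dictionary is an assumption based on how the question indexes the_data, and format_departures is a hypothetical name:

```python
def format_departures(the_data, line="220"):
    """Return a display string listing the departure times for one bus line."""
    times = [
        bus["expected_departure_time"]
        for buses in the_data["departures"].values()
        for bus in buses
        if bus["line"] == line
    ]
    # join puts commas only between items, so no counter is needed
    return "{} buses:\n".format(line) + ",".join(times)


# Made-up sample data, only mimicking the structure used in the question
sample = {
    "departures": {
        "220": [
            {"line": "220", "expected_departure_time": "10:05"},
            {"line": "220", "expected_departure_time": "10:20"},
        ],
        "9": [{"line": "9", "expected_departure_time": "10:07"}],
    }
}
print(format_departures(sample))
```

Because join only inserts the separator between elements, the countit and thebuslen bookkeeping from the question is no longer needed.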

Why re is not compiling 'if' when there is 'else'?

Hello, I'm facing a problem and I don't know how to fix it. All I know is that when I add an else statement to my if statement, execution always goes to the else branch, even when the if condition is true and the if branch could be entered.
Here is the script without the else statement:
import re
f = open('C:\Users\Ziad\Desktop\Combination\MikrofullCombMaj.txt', 'r')
d = open('C:\Users\Ziad\Desktop\Combination\WhatsappResult.txt', 'r')
w = open('C:\Users\Ziad\Desktop\Combination\combination.txt','w')
s=""
av =0
b=""
filtred=[]
Mlines=f.readlines()
Wlines=d.readlines()
for line in Wlines:
    Wspl=line.split()
    for line2 in Mlines:
        Mspl=line2.replace('\n','').split("\t")
        if ((Mspl[0]).lower()==(Wspl[0])):
            Wspl.append(Mspl[1])
            if(len(Mspl)>=3):
                Wspl.append(Mspl[2])
            s="\t".join(Wspl)+"\n"
            if s not in filtred:
                filtred.append(s)
            break
for x in filtred:
    w.write(x)
f.close()
d.close()
w.close()
And here it is with the else statement; I want the else to belong to if ((Mspl[0]).lower()==(Wspl[0])):
import re
f = open('C:\Users\Ziad\Desktop\Combination\MikrofullCombMaj.txt', 'r')
d = open('C:\Users\Ziad\Desktop\Combination\WhatsappResult.txt', 'r')
w = open('C:\Users\Ziad\Desktop\Combination\combination.txt','w')
s=""
av =0
b=""
filtred=[]
Mlines=f.readlines()
Wlines=d.readlines()
for line in Wlines:
    Wspl=line.split()
    for line2 in Mlines:
        Mspl=line2.replace('\n','').split("\t")
        if ((Mspl[0]).lower()==(Wspl[0])):
            Wspl.append(Mspl[1])
            if(len(Mspl)>=3):
                Wspl.append(Mspl[2])
            s="\t".join(Wspl)+"\n"
            if s not in filtred:
                filtred.append(s)
            break
        else:
            b="\t".join(Wspl)+"\n"
            if b not in filtred:
                filtred.append(b)
            break
for x in filtred:
    w.write(x)
f.close()
d.close()
w.close()
First of all, you're not using "re" at all in your code besides importing it (maybe in some later part?), so the title is a bit misleading.
Secondly, you are doing a lot of work for what is basically a filtering operation on two files. Remember, simple is better than complex, so for starters, you want to clean up your code a bit:
You should use more descriptive names than 'd' or 'w'. This goes for 'Wspl', 's' and 'av' as well. Those names don't mean anything and are hard to understand (why is the result of d.readlines named Wlines when there's another file named 'w'? It's really confusing).
If you choose to use single letters, they should still make sense (if you iterate over a list named 'results', it makes sense to use 'r'. 'line' and 'line2', however, are not recommended for anything).
You don't need parentheses around conditions.
You want to use as few variables as you can so as not to get confused. There are too many different variables in your code, and it's easy to get lost. You don't even use some of them.
You want to use strip rather than replace, and you want the whole 'cleaning' process to come first, followed by code that deals only with the filtering logic on the two lists. If you split each line according to some logic and you don't use the original line anywhere in the iteration, then you can do the whole thing at the beginning.
Now, I'm really confused about what you're trying to achieve here, and while I don't understand why you're doing it that way, I can say that, looking at your logic, you are repeating yourself a lot. The check against the filtered list should only happen once, and since it happens regardless of whether the 'if' condition holds or not, I see absolutely no reason to use an 'else' clause at all.
Cleaning up like I mentioned, and re-building the logic, the script looks something like this:
# PART I - read and analyze the lines
Wappresults = open('C:\Users\Ziad\Desktop\Combination\WhatsappResult.txt', 'r')
Mikrofull = open('C:\Users\Ziad\Desktop\Combination\MikrofullCombMaj.txt', 'r')
Wapp = map(lambda x: x.strip().split(), Wappresults.readlines())
Mikro = map(lambda x: x.strip().split('\t'), Mikrofull.readlines())
Wappresults.close()
Mikrofull.close()
# PART II - filter using some logic
filtred = []
for w in Wapp:
    res = w[:] # So as to copy the list instead of point to it
    for m in Mikro:
        if m[0].lower() == w[0]:
            res.append(m[1])
            if len(m) >= 3 :
                res.append(m[2])
    string = '\t'.join(res)+'\n' # this happens regardless of whether the 'if' statement changed 'res' or not
    if string not in filtred:
        filtred.append(string)
# PART III - write the filtered results into a file
combination = open('C:\Users\Ziad\Desktop\Combination\combination.txt','w')
for comb in filtred:
    combination.write(comb)
combination.close()
I can't promise it will work (because again, like I said, I don't know what you're trying to achieve), but this should be a lot easier to work with.
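To make the filtering logic testable without the files, here is a small in-memory sketch in Python 3. It assumes the intent is "append the extra columns from the first matching row, otherwise keep the row unchanged", which is only one possible reading of the question; combine, wapp_rows and mikro_rows are hypothetical names:

```python
def combine(wapp_rows, mikro_rows):
    """Merge columns from the first matching mikro row into each wapp row."""
    combined = []
    for w in wapp_rows:
        row = list(w)  # copy, so the input rows are left untouched
        for m in mikro_rows:
            if m[0].lower() == w[0]:
                # take the second column and, if present, the third
                row.extend(m[1:3])
                break  # stop at the first match only
        # This runs once per row, whether or not a match was found,
        # so no else branch is needed.
        line = "\t".join(row) + "\n"
        if line not in combined:
            combined.append(line)
    return combined


wapp_rows = [["alice", "1"], ["bob", "2"]]
mikro_rows = [["Alice", "x", "y"], ["Carol", "z"]]
print(combine(wapp_rows, mikro_rows))
```

Note that the break sits inside the if. A break in an else branch attached to that if would leave the inner loop as soon as the first row fails to match, which would look like the else branch always being taken.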

Replacement value not working?

So I've got a try/except block set up, which will go through a database dependent on certain conditions:
try:
    for searchnumber in itertools.count(0):
        print searchnumber
        c.execute("""SELECT words from searchterms where onstate = 1 AND progid = %d;""") % searchnumber
        searchterms = (c.fetchall())
        searchterms = [",".join(x) for x in searchterms]
        print searchterms
except:
    pass
For some reason, it isn't iterating on progid, in fact, it isn't even getting the first value assigned to it (0). Why would this be? As far as I know, %d should be replaced by the integer value of searchnumber
You're probably hiding a TypeError: the % operator sits outside the closing parenthesis, so it is applied to whatever c.execute("string") returns rather than to the query string. You might've caught it if you hadn't hidden all errors with the bare except. You'll note this is a specific antipattern in the official Python Dos and Don'ts page.
Never use except: pass, it hides information.
The information it's currently hiding is probably a failure from this code:
c.execute("""SELECT words from searchterms where onstate = 1 AND progid = %d;""") % searchnumber
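A way to sidestep the string formatting altogether is to let the database driver substitute the value. The sketch below uses the standard library's sqlite3 module with an in-memory table purely for illustration; the question does not say which driver is in use, and placeholder syntax differs between drivers (sqlite3 uses ?, some others use %s). The table contents and the terms_for name are made up:

```python
import sqlite3

# In-memory stand-in for the real database (assumption for illustration)
conn = sqlite3.connect(":memory:")
c = conn.cursor()
c.execute("CREATE TABLE searchterms (words TEXT, onstate INTEGER, progid INTEGER)")
c.executemany(
    "INSERT INTO searchterms VALUES (?, ?, ?)",
    [("spam", 1, 0), ("eggs", 1, 0), ("ham", 0, 0), ("toast", 1, 1)],
)


def terms_for(cursor, progid):
    """Return the active search terms for one progid."""
    # The value is passed as a separate parameter tuple, not formatted in
    cursor.execute(
        "SELECT words FROM searchterms WHERE onstate = 1 AND progid = ? ORDER BY rowid",
        (progid,),
    )
    return [row[0] for row in cursor.fetchall()]


for searchnumber in range(3):
    print(searchnumber, terms_for(c, searchnumber))
```

The loop here is bounded on purpose: itertools.count(0) never stops by itself, so the original loop only ends when an exception is raised.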
