Python dictionary: do not print if there is no value

I have written a short script to print some info based on information in a CSV file.
What I need to be able to do is to make the print function not print if there is no value for the key, or if there is a default value such as 'n/a'.
Alternatively, an empty or default cell in the CSV could simply not be added to the dictionary. I am not sure which option is best.
import csv
with open('lhcdes.csv', 'rb') as testcsv:
    myfile = csv.DictReader(testcsv)
    for row in myfile:
        print 'Key1 %s' % row.get('Key1') + '\n' + 'and ' + 'Key2:%s ' % row.get('Key2') + 'Key3:%s ' % row.get('Key3')
The CSV format is as follows:
Key1,Key2,Key3,Key4,Key5,Key6
Gi0/3/0/1.1838,CustA,EU1,AN-12345,TAL12345,Host1_London
Gi0/3/0/1.2072,CustB,EU2,AN-12346,TAL12346,Host2_Manchester
Gi0/3/0/2.3761,CustB,EU3,AN-12347,TAL12347,Not Found
Gi0/3/0/3.3573,CustC,EU7,AN-12348,TAL12348,Host5_Swansea
Gi0/3/0/3.3702,CustD,EU5,AN-12349,N/A,Host4_Glasgow
Gi0/3/0/3.3917,CustB,EU6,AN-12350,TAL12350,Not Found
Gi0/3/0/3.3918,CustA,EU2,AN-12351,TAL12351,N/A
Gi0/3/0/3.3919,CustE,EU9,AN-12352,Not Found,Not Found
Gi0/3/0/3.3923,CustE,EU9,AN-12353,TAL12353,N/A
Gi0/3/0/4.512,CustC,EU8,AN-12354,TAL12354,Not Found
The output should look like
interface Gi0/3/0/1.1838
Client:CustA EU:EU1 IR:AN-12345 CR:TAL12345 R:Host1_London
interface Gi0/3/0/1.2072
Client:CustB EU:EU2 IR:AN-12346 CR:TAL12346 R:Host2_Manchester
Where info is absent or N/A, those fields should be left out:
interface Gi0/3/0/3.3919
Client:CustE EU:EU9 IR:AN-12352

You need to test the contents of Key5 and Key6 for a missing value and then format your output accordingly:
import csv
missing = ['Not Found', None, 'N/A', '']
with open('lhcdes.csv', 'rb') as testcsv:
    myfile = csv.DictReader(testcsv)
    for row in myfile:
        if row['Key5'] in missing or row['Key6'] in missing:
            print 'interface {}\nClient: {} {} IR:{}'.format(
                row['Key1'], row['Key2'], row['Key3'], row['Key4'])
        else:
            print 'interface {}\nClient: {} {} IR:{} CR:{} R:{}'.format(
                row['Key1'], row['Key2'], row['Key3'], row['Key4'], row['Key5'], row['Key6'])
The script would display the following output:
interface Gi0/3/0/1.1838
Client: CustA EU1 IR:AN-12345 CR:TAL12345 R:Host1_London
interface Gi0/3/0/1.2072
Client: CustB EU2 IR:AN-12346 CR:TAL12346 R:Host2_Manchester
interface Gi0/3/0/2.3761
Client: CustB EU3 IR:AN-12347
interface Gi0/3/0/3.3573
Client: CustC EU7 IR:AN-12348 CR:TAL12348 R:Host5_Swansea
interface Gi0/3/0/3.3702
Client: CustD EU5 IR:AN-12349
interface Gi0/3/0/3.3917
Client: CustB EU6 IR:AN-12350
interface Gi0/3/0/3.3918
Client: CustA EU2 IR:AN-12351
interface Gi0/3/0/3.3919
Client: CustE EU9 IR:AN-12352
interface Gi0/3/0/3.3923
Client: CustE EU9 IR:AN-12353
interface Gi0/3/0/4.512
Client: CustC EU8 IR:AN-12354
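The check above treats Key5 and Key6 together, so a row with only one of them missing loses both. A per-field variant might look like the following minimal sketch (Python 3). The labels Client, EU, IR, CR and R are taken from the desired output in the question; the names MISSING, LABELS and describe are illustrative assumptions, and the sample row is read from an in-memory string rather than lhcdes.csv.

```python
import csv
import io

# Values treated as "no value" (compared case-insensitively); adjust as needed.
MISSING = {"", "n/a", "not found"}

# CSV column -> label used in the output, in print order.
LABELS = [("Key2", "Client"), ("Key3", "EU"), ("Key4", "IR"),
          ("Key5", "CR"), ("Key6", "R")]


def describe(row):
    """Format one CSV row, leaving out any field whose value is missing."""
    parts = []
    for key, label in LABELS:
        value = (row.get(key) or "").strip()
        if value.lower() not in MISSING:
            parts.append("{}:{}".format(label, value))
    return "interface {}\n{}".format(row.get("Key1", ""), " ".join(parts))


# In-memory stand-in for the CSV file, using one row from the question.
sample = io.StringIO(
    "Key1,Key2,Key3,Key4,Key5,Key6\n"
    "Gi0/3/0/3.3702,CustD,EU5,AN-12349,N/A,Host4_Glasgow\n"
)
for row in csv.DictReader(sample):
    print(describe(row))
```

With this approach the row above keeps its R field even though CR is N/A.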

One way would be to print the keys in a loop, given a condition:
import csv
with open('lhcdes.csv', 'rb') as testcsv:
    myfile = csv.DictReader(testcsv)
    keys = ['Key1', 'Key2', 'Key3']  # the keys you want to print; they must match the CSV header exactly
    default_value = 'N/A'  # the default value you want to skip
    for row in myfile:
        for key in keys:
            # get the value at 'key', or the default value if 'key' was not found
            value = row.get(key, default_value)
            if value == default_value:
                continue  # skip to the next item in the 'keys' list
            print("{} {}".format(key, value))  # use .format over '%' syntax
This way, you skip any entry where the key is missing or the value equals your default.

Related

Joining similar sections from configparser

Good morning,
I have a configuration file with data like this:
[hostset 1]
ip = 192.168.122.136
user = test
password =
pkey = ~/.ssh/id_rsa
[hostset 2]
ip = 192.168.122.138
user = test
password =
pkey = ~/.ssh/id_rsa
I want to be able to join the ips of any given number of host sets in this configuration file if the other values are the same, so the ingested and formatted data would be stored in a dict, something like this:
{
ip: ['192.168.122.136', '192.168.122.138'],
user: 'test',
password: '',
pkey: '~/.ssh/id_rsa',
}
by doing something like:
from configparser import ConfigParser
def unpack(d):
    return [value for key, value in d.items()]
def parse(configuration_file):
    parser = ConfigParser()
    parser.read(configuration_file)
    hosts = [unpack(connection) for connection in [section for section in dict(parser).values()]][1:]
    return [i for i in hosts]
if __name__ == '__main__':
    parse('config.ini')
I can get a list of lists containing the elements of the configuration file, like this:
[['192.168.122.136', 'test', '', '~/.ssh/id_rsa'], ['192.168.122.138', 'test', '', '~/.ssh/id_rsa']]
Then I just need a way of comparing the two lists and if all elements are similar except for the ip, then join them into a list like:
[['192.168.122.136','192.168.122.138'], 'test', '', '~/.ssh/id_rsa']
So I would just need a smart way of doing this with a list of lists of no specific length and join all similar lists.
Got some help from a friend and solved the question. The key was to turn the values I wanted to compare into a hashable dictionary key (here, the string form of the list of shared values) and to use the list of ips as the value. If the key already exists, I append the ip to its value; otherwise I create a new entry.
from configparser import ConfigParser
from ast import literal_eval as literal
def unpack(d):
    return [value for key, value in d.items()]
def parse(configuration_file):
    parser = ConfigParser()
    parser.read(configuration_file)
    hosts = [unpack(connection) for connection in [section for section in dict(parser).values()]][1:]
    d = dict()
    for item in hosts:
        try:
            d[str((item[1:]))].append(item[0])
        except KeyError:
            d[str((item[1:]))] = [item[0]]
    return d
if __name__ == '__main__':
    for k, v in parse('config.ini').items():
        print([v, *literal(k)])
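The same grouping idea can be written without the str/literal_eval round trip by using a real tuple as the dictionary key. The following is a minimal sketch, assuming every section has an ip option; group_hosts and CONFIG_TEXT are illustrative names, and the configuration is read from a string instead of config.ini.

```python
from collections import defaultdict
from configparser import ConfigParser

# In-memory copy of the configuration shown in the question.
CONFIG_TEXT = """\
[hostset 1]
ip = 192.168.122.136
user = test
password =
pkey = ~/.ssh/id_rsa
[hostset 2]
ip = 192.168.122.138
user = test
password =
pkey = ~/.ssh/id_rsa
"""


def group_hosts(config_text):
    """Merge sections whose options are identical apart from 'ip'."""
    parser = ConfigParser()
    parser.read_string(config_text)
    grouped = defaultdict(list)
    for name in parser.sections():
        options = dict(parser[name])
        ip = options.pop("ip")
        # A sorted tuple of (option, value) pairs is hashable and order-independent.
        grouped[tuple(sorted(options.items()))].append(ip)
    # Rebuild one dict per group, with the collected ips as a list.
    return [dict(shared, ip=ips) for shared, ips in grouped.items()]


for host_set in group_hosts(CONFIG_TEXT):
    print(host_set)
```

Keying on option names as well as values means two sections only merge when the same options carry the same values, regardless of the order in which they appear in the file.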
In this solution, I assume that the file format is exactly as described in the question.
First we split the host sets (we suppose that your data is in the rowdata variable):
HostSets = rowdata.split("[hostset ")  # the first element is whatever precedes the first section (empty here)
Dict = {}
for i in range(1, len(HostSets)):
    l = HostSets[i].split("ip = ")  # two elements; the first is the rest of the section header
    ip = l[1].split()[0]
    # everything after the ip line; strip() so trailing blank lines do not affect the comparison
    conf = l[1].split("\n", 1)[1].strip()
    try:
        Dict[conf].append(ip)
    except KeyError:
        Dict[conf] = [ip]
print('{')
for element in Dict:
    print("ip: ", Dict[element], ",", element)
print('}')

Python winreg - How to write a linefeed to a REG_SZ value

I need some assistance in writing a line feed to a registry value. The Value is of type REG_SZ.
I'm able to do this manually, by adding a "0A" to each break position when modifying the hex value in the Registry, but I'm not sure how to do this programmatically.
This is my current code to write the String to Registry, which WORKS, but does not allow for the line feeds:
(identifying information redacted, and text shortened. The "#" is the position for a line feed)
import os
import _winreg
from _winreg import *
Key = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System"
Field = [_winreg.REG_SZ, _winreg.REG_BINARY]
Sub_Key = ["legalnoticetext",
"legalnoticecaption"]
value = ["Terms of Use",
"By logging in to this PC, users agree to the Terms of Use"
"\nThe User agrees to the following:#- The User may not alter, in any way, "
"with Settings on this PC without the approval from any Authorised Persons."]
parmLen = len(Sub_Key)
z = 0 # Loop Counter for list iteration
try:
    Key = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System"
    if not os.path.exists(Key):
        key = _winreg.CreateKey(HKEY_LOCAL_MACHINE, Key)
    Registrykey = OpenKey(HKEY_LOCAL_MACHINE, Key, 0, KEY_WRITE)
    while z < parmLen:
        _winreg.SetValueEx(Registrykey, Sub_Key[z], 0, Field[z], value[z])
        print ("Setting <" + Sub_Key[z] + "> with value: " + value[z])
        z += 1
    CloseKey(Registrykey)
    print ("SUCCESSFUL! Procedure: Show Logon Terms of Use")
except WindowsError:
    print ("Error")
I've tested the following code to see if I can write directly in Hex, as I have the modified hex value, but it results in the value being interpreted as a string, and then formatting it incorrectly (again) to Hex.
import os
import _winreg
from _winreg import *
Key = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System"
Field = [_winreg.REG_SZ, _winreg.REG_BINARY]
Sub_Key = ["legalnoticetext",
"legalnoticecaption"]
value = ["Terms of Use",
"42,00,79,00,20,00,6C,00,6F,00,67,00,67,00,69,00,6E,00,67,00,20,00,69,00,"
"6E,00,20,00,74,00,6F,00,20,00,74,00,68,00,69,00,73,00,20,00,50,00,43,00,"
"2C,00,20,00,75,00,73,00,65,00,72,00,73,00,20,00,61,00,67,00,72,00,65,00,"
"65,00,20,00,74,00,6F,00,20,00,74,00,68,00,65,00,20,00,54,00,65,00,72,00,"
"6D,00,73,00,20,00,6F,00,66,00,20,00,55,00,73,00,65,00,0A,00,54,00,68,00,"
"65,00,20,00,55,00,73,00,65,00,72,00,20,00,61,00,67,00,72,00,65,00,65,00,"
"73,00,20,00,74,00,6F,00,20,00,74,00,68,00,65,00,20,00,66,00,6F,00,6C,00,"
"6C,00,6F,00,77,00,69,00,6E,00,67,00,3A,00,0A,00,2D,00,20,00,54,00,68,00,"
"65,00,20,00,55,00,73,00,65,00,72,00,20,00,6D,00,61,00,79,00,20,00,6E,00,"
"6F,00,74,00,20,00,61,00,6C,00,74,00,65,00,72,00,2C,00,20,00,69,00,6E,00,"
"20,00,61,00,6E,00,79,00,20,00,77,00,61,00,79,00,2C,00,20,00,77,00,69,00,"
"74,00,68,00,20,00,53,00,65,00,74,00,74,00,69,00,6E,00,67,00,73,00,20,00,"
"6F,00,6E,00,20,00,74,00,68,00,69,00,73,00,20,00,50,00,43,00,20,00,77,00,"
"69,00,74,00,68,00,6F,00,75,00,74,00,20,00,74,00,68,00,65,00,20,00,61,00,"
"70,00,70,00,72,00,6F,00,76,00,61,00,6C,00,20,00,66,00,72,00,6F,00,6D,00,"
"20,00,61,00,6E,00,79,00,20,00,41,00,75,00,74,00,68,00,6F,00,72,00,69,00,"
"73,00,65,00,64,00,20,00,50,00,65,00,72,00,73,00,6F,00,6E,00,73,00,2E,00,"
"00,00"]
parmLen = len(Sub_Key)
z = 0 # Loop Counter for list iteration
try:
    Key = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System"
    if not os.path.exists(Key):
        key = _winreg.CreateKey(HKEY_LOCAL_MACHINE, Key)
    Registrykey = OpenKey(HKEY_LOCAL_MACHINE, Key, 0, KEY_WRITE)
    while z < parmLen:
        _winreg.SetValueEx(Registrykey, Sub_Key[z], 0, Field[z], value[z])
        print ("Setting <" + Sub_Key[z] + "> with value: " + value[z])
        z += 1
    CloseKey(Registrykey)
    print ("SUCCESSFUL! Procedure: Show Logon Terms of Use")
except WindowsError:
    print ("Error")
RTFM'ing doesn't prove to be very helpful with this specific issue, so any guidance would be appreciated!
You have a couple of issues going on. os.path.exists checks for the existence of a file system path; feeding it the key string won't check the registry.
It looks like you are using Python 2.7; please consider upgrading to 3. The issue you are running into is that writing a REG_BINARY value expects binary data; in Python 3, this means you just need to encode your string. Here is the code in Python 3:
import winreg
Key_str = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System"
Field = [winreg.REG_SZ, winreg.REG_BINARY]
Sub_Key = ["legalnoticetext", "legalnoticecaption"]
value = ["Terms of Use",
"By logging in to this PC, users agree to the Terms of Use"
"\nThe User agrees to the following:#- The User may not alter, in any way, "
"with Settings on this PC without the approval from any Authorised Persons."]
try:
    # OpenKey defaults to read-only access, so request write access explicitly
    key = winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, Key_str, 0, winreg.KEY_WRITE)
except FileNotFoundError:
    key = winreg.CreateKey(winreg.HKEY_LOCAL_MACHINE, Key_str)
for sk, f, v in zip(Sub_Key, Field, value):
    if f == winreg.REG_BINARY:
        winreg.SetValueEx(key, sk, 0, f, v.encode('latin-1'))
    else:
        winreg.SetValueEx(key, sk, 0, f, v)
key.Close()
In Python 2, your standard strings are byte strings, so there is no need to encode the string to a bytes encoding. Here is the Python 2 code:
import _winreg
Key_str = r"SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System"
Field = [_winreg.REG_SZ, _winreg.REG_BINARY]
Sub_Key = ["legalnoticetext", "legalnoticecaption"]
value = ["Terms of Use",
"By logging in to this PC, users agree to the Terms of Use"
"\nThe User agrees to the following:#- The User may not alter, in any way, "
"with Settings on this PC without the approval from any Authorised Persons."]
try:
    # OpenKey defaults to read-only access, so request write access explicitly
    key = _winreg.OpenKey(_winreg.HKEY_LOCAL_MACHINE, Key_str, 0, _winreg.KEY_WRITE)
except WindowsError:
    key = _winreg.CreateKey(_winreg.HKEY_LOCAL_MACHINE, Key_str)
for sk, f, v in zip(Sub_Key, Field, value):
    _winreg.SetValueEx(key, sk, 0, f, v)
key.Close()
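On the line feed itself: the hex listing in the question is the UTF-16-LE encoding of the text (42,00,79,00 is "By"), and the 0A,00 pairs are line feeds. A "\n" inside the Python string therefore corresponds to the byte pair that was being added by hand. The sketch below only demonstrates that encoding and does not touch the registry, so it runs on any platform; to_reg_hex is an illustrative helper, not part of winreg.

```python
def to_reg_hex(text):
    """Return text as a comma-separated hex listing (UTF-16-LE, NUL-terminated)."""
    raw = (text + "\0").encode("utf-16-le")
    # bytearray() yields integers on both Python 2 and Python 3
    return ",".join("{:02X}".format(b) for b in bytearray(raw))


notice = "By logging in\nThe User agrees"
print(to_reg_hex("By"))                 # start of the listing in the question
print("0A,00" in to_reg_hex(notice))    # the line feed shows up as 0A,00
```

Under that reading, replacing each "#" placeholder in the value with "\n" and writing the string as REG_SZ should store the same bytes as the manual hex edit.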

How to add tag and region name while printing the result

This is basically an effort to learn dictionary mapping. I have a function which prints the changes in ports; the code is as follows:
def comp_ports(self,filename,mapping):
    try:
        #print "HEYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY"
        f = open(filename)
        self.prev_report = pickle.load(f) # NmapReport
        for s in self.prev_report.hosts:
            self.old_port_dict[s.address] = set()
            for x in s.get_open_ports():
                self.old_port_dict[s.address].add(x)
        for s in self.report.hosts:
            self.new_port_dict[s.address] = set()
            for x in s.get_open_ports():
                self.new_port_dict[s.address].add(x)
        print "The following Host/ports were available in old scan : !!"
        print `self.old_port_dict`
        print "--------------------------------------------------------"
        print "The following Host/ports have been added in new scan: !!"
        print `self.new_port_dict`
        ##
        for h in self.old_port_dict.keys():
            self.results_ports_dict[h] = self.new_port_dict[h]- self.old_port_dict[h]
            print "Result Change: for",h ,"->",self.results_ports_dict[h]
        ################### The following code is intensive ###################
        print "++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++"
        diff_key=[key for key in self.old_port_dict if self.old_port_dict[key]!=self.new_port_dict[key]]
        for key in diff_key:
            print "For %s, Port changed from %s to %s" %(key,self.old_port_dict[key],self.new_port_dict[key])
I call this from the main function:
if __name__ == "__main__":
    if len(sys.argv) < 2:
        print "Usage:\n\tportwatch.py <configfile> [clean]"
        sys.exit(-1)
    else:
        # Read
        config = ConfigParser.ConfigParser()
        config.read(sys.argv[1])
        if len(sys.argv) > 2:
            if sys.argv[2] == "clean":
                for f in ['nmap-report-old.pkl','nmap-report.pkl']:
                    try:
                        os.remove( config.get('system','scan_directory') + "/" + f )
                    except Exception as e:
                        print e
        # Configure Scanner
        s = Scanner(config)
        # Execute Scan and Generate latest report
        net_range = gather_public_ip() #config.get('sources','networks') # gather_public_ip()
        ### r = s.run(','.join([[i[0] for i in v] for v in net_range][0]))
        r = s.run(net_range)
        data = list(itertools.chain(*net_range))
        mapping = {i[0]:[i[1],i[2]] for i in data}
        s.save()
        report = Report(r)
        report.dump_raw(mapping) ## change made for dump to dump_raw
        print "Hosts in scan report",report.total_hosts()
        # Read in last scan
        report.compare(config.get('system','scan_directory') + '/nmap-report-old.pkl' )
        print "New Hosts"
        report.new_hosts()
        # slack.api_token = config.get('notification','slack_key')
        notify_slack_new_host(report.new_hosts()) #Notifty Slack for any new added host
        # for h in report.result_port_dict.keys():
        # notify_slack(report.new_hosts(h))
        print "Lost Hosts"
        report.lost_hosts()
        report.comp_ports(config.get('system','scan_directory') + '/nmap-report-old.pkl',mapping)
The whole code is at http://pastebin.com/iDYBBrEq. Can someone please help me with comp_ports, where I also want to add the tag and region name, similar to dump_raw?
Please help.
Since the IP is the key in the dictionaries old_port_dict, new_port_dict and mapping, and in mapping each IP maps to a list with the tag at index 0 and the region at index 1, you can access them like this:
for key in diff_key:
    print "For %s with tag %s and region %s, Port changed from %s to %s" %(key,mapping[key][0],mapping[key][1],self.old_port_dict[key],self.new_port_dict[key])
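The lookup above raises a KeyError if an IP is absent from mapping or from the new scan. A standalone sketch of the same idea with fallbacks is shown below (Python 3). describe_port_changes and the sample data are illustrative assumptions, not part of the original script; mapping follows the shape described above, with each IP mapped to [tag, region].

```python
def describe_port_changes(old_ports, new_ports, mapping):
    """Return one message per host whose set of open ports changed."""
    messages = []
    for ip in sorted(old_ports):
        old = old_ports[ip]
        new = new_ports.get(ip, set())  # the host may be missing from the new scan
        if old == new:
            continue
        # Fall back to placeholders when the IP has no tag/region entry.
        tag, region = mapping.get(ip, ("unknown", "unknown"))
        messages.append(
            "For {} with tag {} and region {}, ports changed from {} to {}".format(
                ip, tag, region, sorted(old), sorted(new)))
    return messages


# Made-up sample data for illustration only.
old = {"10.0.0.1": {22, 80}, "10.0.0.2": {443}}
new = {"10.0.0.1": {22, 80, 8080}, "10.0.0.2": {443}}
mapping = {"10.0.0.1": ["web", "eu-west-1"]}
for message in describe_port_changes(old, new, mapping):
    print(message)
```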

Python parsing complex text

I'm struggling to develop an algorithm that can edit the snippet of an XML file shown below. Can anyone help with ideas? The requirements are to parse the file as input, remove the "cipher" that uses "RC4", and output a new XML file with just the "RC4" cipher removed. The problem is that there are multiple "Connector" sections within the XML file. I need to read all of them, but only edit the one that uses port 443 and a specific IP address. So the script would need to parse each Connector section one at a time, but leave alone the ones that don't have the correct IP address and port. Have tried:
1. Using the ElementTree XML parser. The problem is that it doesn't output the new XML file well - it's a mess. I need it prettified with Python 2.6.
<Connector
protocol="org.apache.coyote.http11.Http11NioProtocol"
port="443"
redirectPort="443"
executor="tomcatThreadPool"
disableUploadTimeout="true"
SSLEnabled="true"
scheme="https"
secure="true"
clientAuth="false"
sslEnabledProtocols="TLSv1,TLSv1.1,TLSv1.2"
keystoreType="JKS"
keystoreFile="tomcat.keystore"
keystorePass="XXXXX"
server="XXXX"
ciphers="TLS_DHE_RSA_WITH_AES_128_CBC_SHA,
TLS_DH_RSA_WITH_AES_128_CBC_SHA,
TLS_DHE_DSS_WITH_AES_128_CBC_SHA,
TLS_DH_DSS_WITH_AES_128_CBC_SHA,
TLS_RSA_WITH_AES_128_CBC_SHA,
TLS_DHE_RSA_WITH_3DES_EDE_CBC_SHA,
TLS_DH_RSA_WITH_3DES_EDE_CBC_SHA,
TLS_RSA_WITH_3DES_EDE_CBC_SHA,
TLS_RSA_WITH_RC4_128_SHA"
address="192.168.10.6">
Here was my code:
from xml.etree import ElementTree
print "[+] Checking for removal of RC4 ciphers"
file = "template.xml"
with open(file, 'rt') as f:
    tree = ElementTree.parse(f)
    f.close()
for node in tree.getiterator('Connector'):
    if node.tag == 'Connector':
        address = node.attrib.get('address')
        port = node.attrib.get('port')
        if "EMSNodeMgmtIp" in address and port == "443":
            ciphers = node.attrib.get('ciphers')
            if "RC4" in ciphers:
                # If true, RC4 is enabled somewhere in the cipher suite
                print "[+] Found RC4 enabled ciphers"
                # Find RC4 specific cipher suite string, for replacement
                elements = ciphers.split()
                search_str = ""
                for element in elements:
                    if "RC4" in element:
                        search_str = element
                print "[+] Search removal RC4 string: %s" % search_str
                # Replace string by removing RC4 cipher
                print "[+] Removing RC4 cipher"
                replace_str = ciphers.replace(search_str,"")
                rstrip_str = replace_str.rstrip()
                if rstrip_str.endswith(','):
                    new_cipher_str = rstrip_str[:-1]
                    #print new_cipher_str
                    node.set('ciphers', new_cipher_str)
tree.write('new.xml')
I included comments to explain what is going on.
inb4downvote
from lxml import etree
import re
xml = '''<?xml version="1.0"?>
<data>
<Connector
protocol="org.apache.coyote.http11.Http11NioProtocol"
port="443"
redirectPort="443"
executor="tomcatThreadPool"
disableUploadTimeout="true"
SSLEnabled="true"
scheme="https"
secure="true"
clientAuth="false"
sslEnabledProtocols="TLSv1,TLSv1.1,TLSv1.2"
keystoreType="JKS"
keystoreFile="tomcat.keystore"
keystorePass="XXXXX"
server="XXXX"
ciphers="TLS_DHE_RSA_WITH_AES_128_CBC_SHA,
TLS_DH_RSA_WITH_AES_128_CBC_SHA,
TLS_DHE_DSS_WITH_AES_128_CBC_SHA,
TLS_DH_DSS_WITH_AES_128_CBC_SHA,
TLS_RSA_WITH_AES_128_CBC_SHA,
TLS_DHE_RSA_WITH_3DES_EDE_CBC_SHA,
TLS_DH_RSA_WITH_3DES_EDE_CBC_SHA,
TLS_RSA_WITH_3DES_EDE_CBC_SHA,
TLS_RSA_WITH_RC4_128_SHA"
address="192.168.10.6"></Connector></data>'''
tree = etree.fromstring(xml)
root = tree.getroottree().getroot()
for connector in root.findall('Connector'):
    port = connector.get('port')
    ip = connector.get('address')
    #change this to the port/ip of the connector you want to edit
    if port != '443' or ip != '192.168.10.6':
        #removes any connector that does not match; drop the next line to keep it unchanged in the output
        connector.getparent().remove(connector)
        continue
    #here we use list comprehension to remove any cipher with "RC4"
    ciphers = ','.join([x for x in re.split(r',\s*', connector.get('ciphers')) if 'RC4' not in x])
    #set the modified cipher back
    connector.set('ciphers', ciphers)
print etree.tostring(root, pretty_print=True)
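If lxml is not available, a similar edit can be sketched with the standard library alone. This is a minimal Python 3 sketch under stated assumptions: strip_rc4 and SAMPLE are illustrative names, non-matching connectors are left untouched rather than removed, and the serialized output will not preserve the original one-attribute-per-line layout.

```python
import re
import xml.etree.ElementTree as ET


def strip_rc4(xml_text, port="443", address="192.168.10.6"):
    """Remove RC4 suites from the ciphers attribute of the matching Connector."""
    root = ET.fromstring(xml_text)
    for connector in root.iter("Connector"):
        if connector.get("port") != port or connector.get("address") != address:
            continue  # leave other connectors exactly as they are
        ciphers = connector.get("ciphers")
        if ciphers is None:
            continue
        # Split on commas with optional surrounding whitespace, then filter.
        kept = [c for c in re.split(r"\s*,\s*", ciphers.strip())
                if c and "RC4" not in c]
        connector.set("ciphers", ",".join(kept))
    return ET.tostring(root, encoding="unicode")


# Reduced stand-in for the real file: two connectors, only the first matches.
SAMPLE = (
    '<Server>'
    '<Connector port="443" address="192.168.10.6" '
    'ciphers="TLS_RSA_WITH_AES_128_CBC_SHA, TLS_RSA_WITH_RC4_128_SHA"/>'
    '<Connector port="8080" address="192.168.10.6" '
    'ciphers="TLS_RSA_WITH_RC4_128_SHA"/>'
    '</Server>'
)
print(strip_rc4(SAMPLE))
```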
If the XML tools don't preserve the original structure and formatting, dump them. This is a straightforward text-processing problem, and you can write a Python program to handle it.
Spin through the lines of the file; simply echo to the output anything other than a "cipher" statement. When you hit one of those:
1. Stuff the string into a variable.
2. Split the string into a list.
3. Drop any list element containing "RC4".
4. Print the resulting "cipher" statement in your desired format.
5. Return to normal "read-and-echo" processing.
Does this algorithm get you going?
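The split, drop and re-print steps for a single "cipher" statement might be sketched as follows (Python 3). drop_rc4 is an illustrative name, and the sketch assumes the whole ciphers="..." statement has already been collected into one string with exactly one pair of double quotes.

```python
def drop_rc4(cipher_statement):
    """Rebuild a ciphers="..." statement without any RC4 entries."""
    # Everything before the opening quote, e.g. ' ciphers='
    prefix, _, rest = cipher_statement.partition('"')
    # The quoted body and whatever follows the closing quote
    body, _, suffix = rest.rpartition('"')
    kept = [c.strip() for c in body.split(",")
            if c.strip() and "RC4" not in c]
    # Align continuation lines under the first cipher name
    indent = " " * len(prefix + '"')
    return prefix + '"' + (",\n" + indent).join(kept) + '"' + suffix


statement = 'ciphers="TLS_RSA_WITH_AES_128_CBC_SHA,\n TLS_RSA_WITH_RC4_128_SHA"\n'
print(drop_rc4(statement))
```

Because the closing quote is re-added after filtering, the case where the RC4 entry is the last one in the list needs no special handling.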
Answer below. Basically had to read each of the Connector sections (there were 4) into a temporary list, to check if port and address are correct. If they are, then make a change to the Cipher by removing cipher string but only if RC4 cipher is enabled. So the code had to read in all of the 4 Connectors, one at a time, into a temporary list.
f = open('template.xml', 'r')
lines = f.readlines()
f.close()
new_file = open('new.xml', 'w')
tmp_list = []
connector = False
for line in lines:
    if '<Connector' in line:
        connector = True
        new_file.write(line)
    elif '</Connector>' in line:
        connector = False
        port = False
        address = False
        for a in tmp_list:
            if 'port="443"' in a:
                port = True
            elif 'address="%(EMSNodeMgmtIp)s"' in a:
                address = True
        if port and address:
            new_list = []
            count = 0
            for b in tmp_list:
                if "RC4" in b:
                    print "[+] Found RC4 cipher suite string at line index %d: %s" % (count,b)
                    print "[+] Removing RC4 cipher string from available cipher suites"
                    # check if RC4 cipher string ends with "
                    check = b[:-1]
                    if check.endswith('"'):
                        tmp_str = tmp_list[count-1]
                        tmp_str2 = tmp_str[:-2]
                        tmp_str2+='"\n'
                        new_list[count-1] = tmp_str2
                        replace_line = b.replace(b,"")
                        new_list.append(replace_line)
                    else:
                        replace_line = b.replace(b,"")
                        new_list.append(replace_line)
                else:
                    new_list.append(b)
                count+=1
            for c in new_list:
                new_file.write(c)
            new_file.write(' </Connector>\n')
        else:
            # Not port and address
            for d in tmp_list:
                new_file.write(d)
            new_file.write(' </Connector>\n')
        tmp_list = []
    elif connector:
        tmp_list.append(line)
    else:
        new_file.write(line)
new_file.close()

Fetching language detection from Google api

I have a CSV with keywords in one column and the number of impressions in a second column.
I'd like to provide the keywords in a url (while looping) and for the Google language api to return what type of language was the keyword in.
I have it working manually. If I enter (with the correct api key):
http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&key=myapikey&q=merde
I get:
{"responseData": {"language":"fr","isReliable":false,"confidence":6.213709E-4}, "responseDetails": null, "responseStatus": 200}
which is correct, 'merde' is French.
So far I have this code, but I keep getting server unreachable errors:
import time
import csv
from operator import itemgetter
import sys
import fileinput
import urllib2
import json
E_OPERATION_ERROR = 1
E_INVALID_PARAMS = 2
#not working
def parse_result(result):
    """Parse a JSONP result string and return a list of terms"""
    # Deserialize JSON to Python objects
    result_object = json.loads(result)
    #Get the rows in the table, then get the second column's value
    # for each row
    return row in result_object
#not working
def retrieve_terms(seedterm):
    print(seedterm)
    """Retrieves and parses data and returns a list of terms"""
    url_template = 'http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&key=myapikey&q=%(seed)s'
    url = url_template % {"seed": seedterm}
    try:
        with urllib2.urlopen(url) as data:
            data = perform_request(seedterm)
            result = data.read()
    except:
        sys.stderr.write('%s\n' % 'Could not request data from server')
        exit(E_OPERATION_ERROR)
    #terms = parse_result(result)
    #print terms
    print result
def main(argv):
    filename = argv[1]
    csvfile = open(filename, 'r')
    csvreader = csv.DictReader(csvfile)
    rows = []
    for row in csvreader:
        rows.append(row)
    sortedrows = sorted(rows, key=itemgetter('impressions'), reverse = True)
    keys = sortedrows[0].keys()
    for item in sortedrows:
        retrieve_terms(item['keywords'])
    try:
        outputfile = open('Output_%s.csv' % (filename),'w')
    except IOError:
        print("The file is active in another program - close it first!")
        sys.exit()
    dict_writer = csv.DictWriter(outputfile, keys, lineterminator='\n')
    dict_writer.writer.writerow(keys)
    dict_writer.writerows(sortedrows)
    outputfile.close()
    print("File is Done!! Check your folder")
if __name__ == '__main__':
    start_time = time.clock()
    main(sys.argv)
    print("\n")
    print time.clock() - start_time, "seconds for script time"
Any idea how to finish the code so that it will work? Thank you!
Try to add referrer, userip as described in the docs:
An area to pay special attention to relates to correctly identifying yourself in your requests. Applications MUST always include a valid and accurate http referer header in their requests. In addition, we ask, but do not require, that each request contains a valid API Key. By providing a key, your application provides us with a secondary identification mechanism that is useful should we need to contact you in order to correct any problems. Read more about the usefulness of having an API key
Developers are also encouraged to make use of the userip parameter (see below) to supply the IP address of the end-user on whose behalf you are making the API request. Doing so will help distinguish this legitimate server-side traffic from traffic which doesn't come from an end-user.
Here's an example based on the answer to the question "access to google with python":
#!/usr/bin/python
# -*- coding: utf-8 -*-
import json
import urllib, urllib2
from pprint import pprint
api_key, userip = None, None
query = {'q' : 'матрёшка'}
referrer = "https://stackoverflow.com/q/4309599/4279"
if userip:
    query.update(userip=userip)
if api_key:
    query.update(key=api_key)
url = 'http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&%s' %(
    urllib.urlencode(query))
request = urllib2.Request(url, headers=dict(Referer=referrer))
json_data = json.load(urllib2.urlopen(request))
pprint(json_data['responseData'])
Output
{u'confidence': 0.070496580000000003, u'isReliable': False, u'language': u'ru'}
Another issue might be that seedterm is not properly quoted:
if isinstance(seedterm, unicode):
    value = seedterm
else: # bytes
    value = seedterm.decode(put_encoding_here)
url = 'http://...q=%s' % urllib.quote_plus(value.encode('utf-8'))
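For readers on Python 3, the two points above (a Referer header and a properly quoted query) can be combined in one place. This is a minimal sketch that only builds the request and sends nothing; build_detect_request and the example.com referrer are illustrative assumptions, while the endpoint and parameter names come from the question and the quoted documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://ajax.googleapis.com/ajax/services/language/detect"


def build_detect_request(term, referrer, api_key=None, userip=None):
    """Build (but do not send) a language-detect request for one keyword."""
    query = {"v": "1.0", "q": term}
    if api_key:
        query["key"] = api_key
    if userip:
        query["userip"] = userip
    # urlencode percent-encodes the term as UTF-8, so non-ASCII keywords are safe
    url = ENDPOINT + "?" + urlencode(query)
    return Request(url, headers={"Referer": referrer})


request = build_detect_request("матрёшка", "https://example.com/")  # hypothetical referrer
print(request.full_url)
print(request.get_header("Referer"))
```

Passing the resulting object to urllib.request.urlopen would perform the actual call; each keyword from the CSV would get its own request.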
