Does anyone have a nifty way to get all the three letter alphabetic currency codes (an example of the ones I mean is at http://www.iso.org/iso/support/faqs/faqs_widely_used_standards/widely_used_standards_other/currency_codes/currency_codes_list-1.htm) into a list in Python 2.5? Note I don't want to do a screen scraping version as the code has to work offline - the website is just an example of the codes.
It looks like there should be a way using the locale library, but it isn't clear to me from the documentation, and there must be a better way than copy-pasting the codes into a file!
To clarify the question: in C#, the following code solved the same problem very neatly using the built-in locale libraries:
CultureInfo.GetCultures(CultureTypes.SpecificCultures)
.Select(c => new RegionInfo(c.LCID).CurrencySymbol)
.Distinct()
I was hoping there might be an equivalent in Python. Thanks to everyone who has provided an answer so far.
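For reference, the closest standard-library equivalent I know of is to iterate over the locale aliases and collect each locale's ISO 4217 code from localeconv(). This is only a sketch: it depends on which locales are actually installed on the machine, so the resulting list is usually incomplete.

import locale

codes = set()
for alias in locale.locale_alias.values():
    try:
        locale.setlocale(locale.LC_MONETARY, alias)
    except locale.Error:
        continue  # this locale isn't installed on the system
    code = locale.localeconv()['int_curr_symbol'].strip()
    if len(code) == 3:
        codes.add(code)

locale.setlocale(locale.LC_MONETARY, '')  # restore the default locale
print sorted(codes)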
Not very elegant or nifty, but you can generate the list once and save it to use later:
import re
import urllib

url = ("http://www.iso.org/iso/support/faqs/faqs_widely_used_standards/"
       "widely_used_standards_other/currency_codes/currency_codes_list-1.htm")
# capture every three-letter code in a table cell, skipping codes that start with X
codes = re.findall(r'<td valign="top">\s+([A-WYZ][A-Z]{2})\s+</td>',
                   urllib.urlopen(url).read())
print codes
output:
['AFN', 'EUR', 'ALL', 'DZD', 'USD', 'EUR', 'AOA', 'ARS', 'AMD', 'AWG', 'AUD',
...
'UZS', 'VUV', 'EUR', 'VEF', 'VND', 'USD', 'USD', 'MAD', 'YER', 'ZMK', 'ZWL', 'SDR']
Note that codes beginning with X are reserved, which is why the regex skips them; even so, one rogue entry slips through (SDR, the last element), which you'll have to delete yourself.
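Since the point is to generate the list once and reuse it offline, here is a minimal sketch of the save/load step (assuming the findall result above is bound to codes, and 'currency_codes.pkl' is an arbitrary filename):

import pickle

codes = [c for c in codes if c != 'SDR']  # drop the rogue entry
pickle.dump(codes, open('currency_codes.pkl', 'wb'))

# later, in the offline code:
codes = pickle.load(open('currency_codes.pkl', 'rb'))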
You can get currency codes (and other) data from geonames. Here's some code that downloads the data (save the file locally to achieve the same result offline) and populates a list:
import urllib2

data = urllib2.urlopen('http://download.geonames.org/export/dump/countryInfo.txt')

ccodes = []
for line in data.read().split('\n'):
    if not line.startswith('#'):  # skip the comment header
        line = line.split('\t')
        try:
            if line[10]:  # column 10 is the CurrencyCode field
                ccodes.append(line[10])
        except IndexError:
            pass

ccodes = list(set(ccodes))  # de-duplicate
ccodes.sort()
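Since the file is tab-separated, the csv module can do the splitting for you; here is an equivalent sketch that reads a locally saved copy for offline use (it assumes you downloaded countryInfo.txt next to the script):

import csv

ccodes = set()
for row in csv.reader(open('countryInfo.txt'), delimiter='\t'):
    # skip comment lines and rows too short to hold the CurrencyCode column
    if row and not row[0].startswith('#') and len(row) > 10 and row[10]:
        ccodes.add(row[10])
ccodes = sorted(ccodes)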
I fetched all the details from the desired website but am unable to get some specific information; please guide me.
Targeted domain: https://shop.adidas.ae/en/messi-16-3-indoor-boots/BA9855.html
My code is response.xpath('//ul[@class="product-size"]//li/text()').extract(), but I still can't fetch the size data.
Thanks!
E-commerce websites often have the data as JSON in the page source and then have JavaScript unpack it on the user's end.
In that case you can open the page source with JavaScript disabled and search for keywords (like a specific size).
I found that here it can be extracted with regular expressions:
import re
import json
data = re.findall('window.assets.sizesMap = (\{.+?\});', response.body_as_unicode())
json.loads(data[0])
Out:
{'16': {'uk': '0k', 'us': '0.5'},
'17': {'uk': '1k', 'us': '1'},
'18': {'uk': '2k', 'us': '2.5'},
...}
Edit: more likely you want a different part of the JSON, but the approach is the same:
data = re.findall('window.assets.sizes = (\{(?:.|\n)+?\});', response.body_as_unicode())
json.loads(data[0].replace("'", '"'))  # replace single quotes with double quotes
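Note that replacing quotes blindly breaks if any value contains an apostrophe. If the blob is a plain dict literal, ast.literal_eval is a safer fallback (a sketch, assuming no JavaScript-only syntax inside the blob):

import ast

sizes = ast.literal_eval(data[0])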
The data you want to fetch is loaded by JavaScript; the class="js-size-value " on the tag says so explicitly.
If you want to get it, you will need a rendering service. I suggest Splash: it is simple to install and simple to use, though you will need Docker to install it.
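A minimal sketch of a spider using the scrapy-splash plugin (this assumes Splash is running on localhost:8050 and SPLASH_URL is configured in settings.py; the spider name and selector here are illustrative):

import scrapy
from scrapy_splash import SplashRequest  # pip install scrapy-splash

class SizesSpider(scrapy.Spider):
    name = 'sizes'

    def start_requests(self):
        yield SplashRequest(
            'https://shop.adidas.ae/en/messi-16-3-indoor-boots/BA9855.html',
            self.parse, args={'wait': 2.0})  # give the JavaScript time to run

    def parse(self, response):
        # after rendering, the js-size-value elements hold the sizes
        yield {'sizes': response.css('.js-size-value::text').extract()}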
I'm trying to pull very specific elements from a dictionary of RSS data that was fetched using the feedparser library, then place that data into a new dictionary so it can be called on later using Flask. The reason I'm doing this is because the original dictionary contains tons of metadata I don't need.
I have broken down the process into simple steps but keep getting hung up on creating the new dictionary! As it is below, it does create a dictionary object, but it's not comprehensive: it only contains a single article's title, URL and description; the rest are absent.
I've tried switching to other RSS feeds and had the same result, so it would appear the problem is either the way I'm trying to do it, or there's something wrong with the structure of the list generated by feedparser.
Here's my code:
from html.parser import HTMLParser
import feedparser

def get_feed():
    url = "http://thefreethoughtproject.com/feed/"
    front_page = feedparser.parse(url)
    return front_page

feed = get_feed()

# make a dictionary to update with the vital information
posts = {}

for i in range(0, len(feed['entries'])):
    posts.update({
        'title': feed['entries'][i].title,
        'description': feed['entries'][i].summary,
        'url': feed['entries'][i].link,
    })

print(posts)
Ultimately, I'd like to have a dictionary like the following, except that it keeps going with more articles:
[{'Title': 'Trump Does Another Ridiculous Thing',
'Description': 'Witnesses looked on in awe as the Donald did this thing',
'Link': 'SomeNewsWebsite.com/Story12345'},
{...},
{...}]
Something tells me it's a simple mistake: perhaps the syntax is off, or I'm forgetting a small yet important detail.
The code example you provided updates the same dict over and over, so you only have one dict at the end of the loop. What your example data shows is that you actually want a list of dictionaries:
# make a list to update with the vital information
posts = []

for entry in feed['entries']:
    posts.append({
        'title': entry.title,
        'description': entry.summary,
        'url': entry.link,
    })
print(posts)
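If you prefer it compact, the same loop can be written as a list comprehension:

posts = [{'title': e.title, 'description': e.summary, 'url': e.link}
         for e in feed['entries']]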
It seems the problem is that you are using a dict instead of a list: you keep updating the same keys of the dict, so each iteration overwrites the content added before.
I think that the following code will solve your problem:
from html.parser import HTMLParser
import feedparser

def get_feed():
    url = "http://thefreethoughtproject.com/feed/"
    front_page = feedparser.parse(url)
    return front_page

feed = get_feed()

posts = []  # it should be a list, not a dictionary

for i in range(0, len(feed['entries'])):
    posts.append({
        'title': feed['entries'][i].title,
        'description': feed['entries'][i].summary,
        'url': feed['entries'][i].link,
    })

print(posts)
As you can see, the code above defines the posts variable as a list; the loop then appends one dict per entry, which gives you the data structure you want.
I hope this solution helps.
I have a LaTeX document with various fields whose values are to be generated dynamically. My plan is to have a Python script generate the values for each field and insert them into the LaTeX document. The document looks as follows:
Project = ABC
Version = 1.0.0
Date = xyz
Now the values of project, version and date are to be filled in by the Python script, so please help me get these values into the LaTeX document. I found ways to generate the whole LaTeX document from Python, but I want the two processes to stay separate. I already have the LaTeX source and, as LaTeX is completely new to me, I really don't want to play around with its code; I just want to feed values into the various fields using Python.
If I understand your intention, I would just replace the values within the LaTeX source by named variables, such as $project instead of ABC, $version instead of 1.0.0 etc. Then you can run the following Python script to substitute these named variables by their actual values. This assumes that the LaTeX source doesn't contain other occurrences of text conflicting with the variable syntax $xy. If it is not the case, other syntax can be chosen.
You didn't specify how you get the values into the Python program. Here I assume you can define them statically within the Python code.
The program will fail when an undefined variable name (not present in the dictionary) is found within the source file. It could also be changed to leave such text unchanged or replace it by empty string depending on your needs.
#!/usr/bin/env python

import sys
import re

variables = {
    'project': 'ABC',
    'version': '1.0.0',
    'date': 'xyz',
}

def run(args):
    if len(args) == 1:
        filename = args[0]
    else:
        sys.stderr.write("Filename must be passed as argument.\n")
        sys.exit(1)

    # match $name, where name starts with a letter
    regex = re.compile(r"\$([a-zA-Z][a-zA-Z_]*)")
    with open(filename) as f:
        for line in f:
            sys.stdout.write(regex.sub(lambda m: variables[m.group(1)], line))

if __name__ == '__main__':
    run(sys.argv[1:])
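As a side note, the standard library already implements this $name substitution scheme in string.Template, so a shorter sketch of the same script could look like this (substitute() raises KeyError for unknown names, while safe_substitute() would leave them untouched):

#!/usr/bin/env python

import sys
from string import Template

variables = {
    'project': 'ABC',
    'version': '1.0.0',
    'date': 'xyz',
}

if __name__ == '__main__':
    # read the LaTeX source given as the first argument and fill it in
    sys.stdout.write(Template(open(sys.argv[1]).read()).substitute(variables))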
I'm kind of new to Python, and I'm trying to find out how to do parsing in Python.
I've got a task: parse a file full of symbols unknown to me and put the data into a DB. I guess I can create the DB and tables with the help of SQLAlchemy, but I have no idea how to do the parsing or what all the symbols below mean:
http://joxi.ru/YmEVXg6Iq3Q426
http://joxi.ru/E2pvG3NFxYgKrY
$$HDRPUBID 112701130020011127162536
H11127011300UNIQUEPONUMBER120011127
D11127011300UNIQUEPONUMBER100001112345678900000001
D21127011300UNIQUEPONUMBER1000011123456789AR000000001
D11127011300UNIQUEPONUMBER200002123456987X000000001
D21127011300UNIQUEPONUMBER200002123456987XIR000000000This item is inactive. 9781605600000
$$EOFPUBID 1127011300200111271625360000005
Thanks in advance to anyone who can give me some advice on where to start and how the parsing works.
The best approach is to first figure out where each token begins and ends, and write a regular expression to capture these. The site RegexPal might help you design the regex.
As others suggest, take a look at some regex tutorials and at the re module documentation.
You're probably looking for something like this:
import re

# (start, end) character positions of each field, 1-indexed and inclusive
headerMapping = {'type': (1, 5), 'pubid': (6, 11), 'batchID': (12, 21),
                 'batchDate': (22, 29), 'batchTime': (30, 35)}

poaBatchHeaders = re.findall(r'\$\$HDR\d{30}', text)

parsedBatchHeaders = []
for poaHeader in poaBatchHeaders:
    # build a fresh dict per header; reusing one dict would make every
    # list entry point at the same object
    batchHeaderDict = {}
    for key in headerMapping:
        start = headerMapping[key][0] - 1
        end = headerMapping[key][1]
        batchHeaderDict[key] = poaHeader[start:end]
    parsedBatchHeaders.append(batchHeaderDict)
Then you have a list of dicts, each holding the data for one found structure (a POA Batch Header in this example). I assume that you have your data file in text, which is a string.
If you want to parse it further, you have to write a function for each field, for example the date:
def batchDate(batch):
    # insert separators and a century into the raw date digits
    return batch[0:2] + '-' + batch[2:4] + '-20' + batch[4:]

for header in parsedBatchHeaders:
    header.update({'batchDate': batchDate(header['batchDate'])})
Remember, this is just an example and I don't have the documentation for your data! I guess it works like this, but the rest is up to you.
I have a dictionary containing (among others) this key value pair:
'Title': '\xc3\x96lfarben'
In German this translates to Ölfarben.
I'm having trouble printing this string to stdout properly.
It is always printed as Ãlfarben
I already tried to use string.decode("utf-8"), string.encode("utf-8"), and many more combinations such as unicode(string.decode("utf-8")) etc.
The problem is that I still have trouble understanding unicode, UTF-8, etc.
Can anyone help?
Update
Here is some more information.
I am receiving a CSV report from the Google AdWords API (using the official Python library to access the API). This data is presumably UTF-8 encoded and stored to disk.
Then I use csv.DictReader to read the CSV from disk and convert each row to a dict. Then I iterate over the data and print it. This is where the problem above occurs.
This is an entire row from the imported data:
{'Destination URL': 'http://domain.com/file.html?adword={keyword}', 'Ad': 'Staffeleien', 'Campaign': '\xc3\x96 Farben', 'Ad group state': 'enabled', 'Ad state': 'enabled', 'Ad group': 'Farben', 'Campaign state': 'active'}
If you've added a u prefix to this byte string, don't do that; you should decode it first. As unicode, the string looks like u'\xd6lfarben':
>>> print u'\xc3\x96lfarben'
Ãlfarben
>>> print '\xc3\x96lfarben'.decode('utf-8')
Ölfarben
>>> '\xc3\x96lfarben'.decode('utf-8')
u'\xd6lfarben'
Or with the unicode function:
>>> unicode('\xc3\x96lfarben', encoding='utf-8')
u'\xd6lfarben'
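Putting it together with the CSV workflow from the question, here is a sketch that decodes every byte-string value once, right after reading, so the rest of the code handles only unicode ('report.csv' stands in for your AdWords report file):

import csv

for row in csv.DictReader(open('report.csv')):
    # decode each UTF-8 byte string into a unicode object up front
    row = dict((k, v.decode('utf-8')) for k, v in row.items())
    print row['Campaign']  # now prints the umlaut correctly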