I think this is simple, but I am not finding an answer that works. The data import seems to work, but separating the values on each line doesn't. My code is below; thanks for the help.
import urllib.request
opener = urllib.request.FancyURLopener({})
url = "http://jse.amstat.org/v22n1/kopcso/BeefDemand.txt"
f = opener.open(url)
content = f.read()
# below are the 3 different ways I tried to separate the data
content.encode('string-escape').split("\\x")
content.split('\r')
content.split('\\')
I highly recommend pandas for reading and analysing this kind of file. It supports reading directly from a URL and also gives you meaningful analysis tools.
import pandas
url = "http://jse.amstat.org/v22n1/kopcso/BeefDemand.txt"
df = pandas.read_table(url, sep="\t+", engine='python', index_col="Year")
Note that the file uses runs of repeated tabs as separators, which is handled by sep="\t+" (a regular expression). A regex separator also means you have to use the python engine.
Now that the file is read into a dataframe, we can do easy plotting for instance:
df[['ChickPrice', 'BeefPrice']].plot()
Simply use a csv.reader or csv.DictReader to parse the contents. Make sure to set the delimiter to tabs in this case:
import requests
import csv
import re
url = "http://jse.amstat.org/v22n1/kopcso/BeefDemand.txt"
response = requests.get(url)
response.raise_for_status()
text = re.sub(r"\t+", "\t", response.text)
reader = csv.DictReader(text.splitlines(), delimiter="\t")
for row in reader:
    print(row)
I like csv.DictReader better in this case, because it consumes the header line for you and each "row" is a dictionary. Your specific text file sometimes separates fields with repeated tabs to make it look prettier, so you'll have to take that into account in some way. In my snippet, I used a regular expression to replace all tab-clusters with a single tab.
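As a runnable illustration of the tab-collapsing idea (using an inline sample instead of the real file, so the column values here are just placeholders):

```python
import csv
import re

# Sample text with "pretty" runs of repeated tabs between fields
sample = "Year\t\tChickPrice\t\t\tBeefPrice\n1990\t\t1.50\t\t\t2.75\n"

# Collapse every run of tabs to a single tab, then parse
clean = re.sub(r"\t+", "\t", sample)
reader = csv.DictReader(clean.splitlines(), delimiter="\t")
rows = list(reader)
print(rows[0]["ChickPrice"])  # each row is a dict keyed by the header line
```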
I want to convert a stream of JSONs (nearly 10,000) pasted in a file to a CSV file with a particular format for headers and values.
I have the following stream of JSON data:
{"shortUrlClicks":"594","longUrlClicks":"594","countries":[{"count":"125","id":"IQ"},{"count":"94","id":"US"},{"count":"56","id":"TR"},{"count":"50","id":"SA"},{"count":"29","id":"DE"},{"count":"24","id":"TN"},{"count":"20","id":"DZ"},{"count":"14","id":"EG"},{"count":"13","id":"MA"},{"count":"12","id":"PS"}],"browsers":[{"count":"350","id":"Chrome"},{"count":"100","id":"Firefox"},{"count":"46","id":"Safari"},{"count":"35","id":"Mobile"},{"count":"20","id":"Mobile Safari"},{"count":"20","id":"SamsungBrowser"},{"count":"8","id":"MSIE"},{"count":"6","id":"Opera"},{"count":"3","id":"OS;FBSV"},{"count":"2","id":"Maxthon"}],"platforms":[{"count":"227","id":"Android"},{"count":"221","id":"Windows"},{"count":"67","id":"iPhone"},{"count":"30","id":"X11"},{"count":"25","id":"Macintosh"},{"count":"8","id":"iPad"},{"count":"2","id":"Android 4.2.2"},{"count":"1","id":"Android 4.1.2"},{"count":"1","id":"Android 4.3"},{"count":"1","id":"Android 5.0.1"}],"referrers":[{"count":"340","id":"unknown"},{"count":"193","id":"t.co"},{"count":"38","id":"m.facebook.com"},{"count":"12","id":"addpost.it"},{"count":"4","id":"plus.google.com"},{"count":"3","id":"www.facebook.com"},{"count":"1","id":"goo.gl"},{"count":"1","id":"l.facebook.com"},{"count":"1","id":"lm.facebook.com"},{"count":"1","id":"plus.url.google.com"}]}
{"shortUrlClicks":"594","longUrlClicks":"594","countries":[{"count":"125","id":"IQ"},{"count":"94","id":"US"},{"count":"56","id":"TR"},{"count":"50","id":"SA"},{"count":"29","id":"DE"},{"count":"24","id":"TN"},{"count":"20","id":"DZ"},{"count":"14","id":"EG"},{"count":"13","id":"MA"},{"count":"12","id":"PS"}],"browsers":[{"count":"350","id":"Chrome"},{"count":"100","id":"Firefox"},{"count":"46","id":"Safari"},{"count":"35","id":"Mobile"},{"count":"20","id":"Mobile Safari"},{"count":"20","id":"SamsungBrowser"},{"count":"8","id":"MSIE"},{"count":"6","id":"Opera"},{"count":"3","id":"OS;FBSV"},{"count":"2","id":"Maxthon"}],"platforms":[{"count":"227","id":"Android"},{"count":"221","id":"Windows"},{"count":"67","id":"iPhone"},{"count":"30","id":"X11"},{"count":"25","id":"Macintosh"},{"count":"8","id":"iPad"},{"count":"2","id":"Android 4.2.2"},{"count":"1","id":"Android 4.1.2"},{"count":"1","id":"Android 4.3"},{"count":"1","id":"Android 5.0.1"}],"referrers":[{"count":"340","id":"unknown"},{"count":"193","id":"t.co"},{"count":"38","id":"m.facebook.com"},{"count":"12","id":"addpost.it"},{"count":"4","id":"plus.google.com"},{"count":"3","id":"www.facebook.com"},{"count":"1","id":"goo.gl"},{"count":"1","id":"l.facebook.com"},{"count":"1","id":"lm.facebook.com"},{"count":"1","id":"plus.url.google.com"}]}
... and so on.
I want to convert it into this form in CSV, with whatever the headers are (shortUrlClicks, longUrlClicks, etc.):
I would be thankful if you could please help me with this. Code in Python or any other language would be useful.
You can use the json module from the Python standard library to parse the JSON, and read/write the files with the built-in open() and csv module (also in the standard library).
It would be something like this:
import json
import csv

# Read one JSON object per line and collect the top-level scalar fields
rows = []
with open('file.json') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        item = json.loads(line)
        rows.append({'shortUrlClicks': item['shortUrlClicks'],
                     'longUrlClicks': item['longUrlClicks']})

# Write the collected rows out as CSV
with open('out.csv', 'w', newline='') as out:
    writer = csv.DictWriter(out, fieldnames=['shortUrlClicks', 'longUrlClicks'])
    writer.writeheader()
    writer.writerows(rows)
PS: this only covers the flat top-level fields; you'll need to decide how to flatten the nested lists (countries, browsers, etc.) into columns.
You can use Pandas to do this for you.
Read your JSON file like this:
df = pandas.read_json('filename.json', lines=True)  # lines=True: one JSON object per line
Write to CSV:
df.to_csv('filename.csv', index=False)  # set index=False if you don't need the index column
Example:
http://hayd.github.io/2013/pandas-json
REF:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html
import urllib
import json
import re
import csv
from bs4 import BeautifulSoup
game_code = open("/Users//Desktop/PYTHON/gc.txt").read()
game_code = game_code.split("\r")
for gc in game_code:
    htmltext = urllib.urlopen("http://cluster.leaguestat.com/feed/index.php?feed=gc&key=f109cf290fcf50d4&client_code=ohl&game_id="+gc+"&lang_code=en&fmt=json&tab=pxpverbose")
    soup = BeautifulSoup(htmltext, "html.parser")
    j = json.loads(soup.text)
    summary = ['GC'], ['Pxpverbose']
    for event in summary:
        print gc, ["event"]
I cannot seem to access the data to print the proper headers and rows. I ultimately want to export specific rows to CSV. I downloaded Python two days ago, so I am very new. I need this one data set for a project. Any advice or direction would be greatly appreciated.
Here are a few game codes if anyone wanted to take a look. Thanks
21127,20788,20922,20752,21094,21196,21295,21159,21128,20854,21057
Here are a few thoughts:
I'd like to point out the excellent requests library as an alternative to urllib for all your HTTP needs in Python (you may need to pip install requests).
requests comes with a built-in json decoder (you don't need BeautifulSoup).
In fact, you have already imported a great module (csv) to print headers and rows of data. You can also use this module to write the data to a file.
Your data is returned as a dictionary (dict) in Python, a data structure indexed by keys. You can access the values (I think this is what you mean by "specific rows") in your data with these keys.
One of many possible ways to accomplish what you want:
import requests
import csv

game_code = open("/Users//Desktop/PYTHON/gc.txt").read()
game_code = game_code.split("\r")

for gc in game_code:
    r = requests.get("http://cluster.leaguestat.com/feed/index.php?feed=gc&key=f109cf290fcf50d4&client_code=ohl&game_id="+gc+"&lang_code=en&fmt=json&tab=pxpverbose")
    data = r.json()

    with open("my_data.csv", "a") as csvfile:
        wr = csv.writer(csvfile, delimiter=',')
        for summary in data["GC"]["Pxpverbose"]:
            wr.writerow([gc, summary["event"]])
            # add keys to write additional values;
            # e.g. summary["some-key"]. Example:
            # wr.writerow([gc, summary["event"], summary["id"]])
You don't need Beautiful Soup for this; the data can be read directly from the URL as JSON.
import urllib, json
response = urllib.urlopen("http://cluster.leaguestat.com/feed/index.php?feed=gc&key=f109cf290fcf50d4&client_code=ohl&game_id=" + gc +"&lang_code=en&fmt=json&tab=pxpverbose")
data = json.loads(response.read())
At this point, data is the parsed JSON of your web page.
Excel can read CSV files, so the easiest route would be exporting the data you want into a CSV file using the csv library.
This should be enough to get you started. Modify fieldnames to include specific event details in the columns of the csv file.
import csv

with open('my_games.csv', 'w') as csvfile:
    fieldnames = ['event', 'id']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames,
                            extrasaction='ignore')
    writer.writeheader()
    for event in data['GC']['Pxpverbose']:
        writer.writerow(event)
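As a small self-contained illustration of how extrasaction='ignore' drops dictionary keys that aren't in fieldnames (the event data here is made up, not from the real feed):

```python
import csv
import io

events = [
    {"event": "goal", "id": 1, "period": 2},     # "period" is not in fieldnames
    {"event": "penalty", "id": 2, "period": 3},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["event", "id"], extrasaction="ignore")
writer.writeheader()
for ev in events:
    writer.writerow(ev)  # the extra "period" key is silently ignored

print(buf.getvalue())
```

Without extrasaction='ignore', DictWriter raises a ValueError when a row contains a key that is not listed in fieldnames.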
So I am quite the beginner in Python, but what I'm trying to do is download each CSV file for the NYSE. In an Excel file I have every symbol. The Yahoo API allows you to download a CSV file by adding the symbol to the base URL.
My first instinct was to use pandas, but pandas doesn't store strings.
So what I have
import urllib
strs = ["" for x in range(3297)]
#strs makes the blank string array for each symbol
#somehow I need to be able to iterate each symbol into the blank array spots
while y < 3297:
    strs[y] = "symbol of each company from csv"
    y = y + 1

# loop for downloading each file from the link with strs[y].
while i < 3297:
    N = urllib.URLopener()
    N.retrieve('http://ichart.yahoo.com/table.csv?s='strs[y], 'File')
    i = i + 1
Perhaps the solution is simpler than what I am doing.
From what I can see in this question, you can't work out how to connect your list of stock symbols to how you read the data in pandas; e.g. 'http://ichart.yahoo.com/table.csv?s='strs[y] is not valid syntax.
Valid syntax for this is
pd.read_csv('http://ichart.yahoo.com/table.csv?s={}'.format(strs[y]))
It would be helpful if you could add a few sample lines from your CSV file to the question. Guessing at your structure, you would do something like:
import pandas as pd
symbol_df = pd.read_csv('path_to_csv_file')
for stock_symbol in symbol_df.symbol_column_name:
    df = pd.read_csv('http://ichart.yahoo.com/table.csv?s={}'.format(stock_symbol))
    # process your dataframe here
Assuming you take that Excel file with the symbols and save it as a CSV, you can use Python's built-in CSV reader to parse it:
import csv
import urllib

base_url = 'http://ichart.yahoo.com/table.csv?s={}'
reader = csv.reader(open('symbols.csv'))
for row in reader:
    symbol = row[0]
    data_csv = urllib.urlopen(base_url.format(symbol)).read()
    # save out to file, or parse with CSV library...
I have a two-column CSV which I have uploaded via an HTML page to be operated on by a Python CGI script. Looking at the file on the server side, it appears to be one long string, i.e. a file called test.csv with the contents
col1, col2
x,y
has become
('upfile', 'test.csv', 'col1,col2\t\r\nx,y')
Col1 contains the data I want to operate on (i.e. x), and col2 contains its identifier (y). Is there a better way of doing the upload, or do I need to manually extract the fields I want? That seems potentially very error-prone.
thanks
If you're using the cgi module in python, you should be able to do something like:
form = cgi.FieldStorage()
thefile = form['upfile']
reader = csv.reader(thefile.file)
header = reader.next() # list of column names
for row in reader:
    # row is a list of fields
    process_row(row)
See, for example, cgi programming or the python cgi module docs.
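To try the same parsing outside of CGI, the upload can be simulated with an in-memory file object (sample contents assumed here; this sketch uses Python 3's next() rather than reader.next()):

```python
import csv
import io

# Simulate the uploaded file as a file-like object
thefile = io.StringIO("col1,col2\nx,y\n")

reader = csv.reader(thefile)
header = next(reader)  # list of column names
rows = list(reader)    # each row is a list of fields
print(header, rows)
```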
Can't you use the csv module to parse this? It's certainly better than rolling your own.
Something along the lines of
import csv
import cgi
form = cgi.FieldStorage()
thefile = form['upfile']
reader = csv.reader(thefile.file, delimiter=',')
for row in reader:
    for field in row:
        doThing()
EDIT: Correcting my answer from the ars answer posted below.
Looks like your file is being modified by the HTML upload. Is there anything stopping you from just FTPing in and dropping the CSV file where you need it?
Once the CSV file is more proper, here is a quick function that will put it into a 2D array:

import csv

def genTableFrCsv(incsv):
    table = []
    fin = open(incsv, 'rb')
    reader = csv.reader(fin)
    for row in reader:
        table.append(row)
    fin.close()
    return table
From here you can then operate on the whole list in memory rather than pulling bit by bit from the file as in Vitor's solution.
The easy solution is rows = [row.split('\t') for row in csv_string.split('\r\n')]. It's only error-prone if you have users on different platforms submitting data: they might submit commas or tabs, and their line breaks could be \n, \r\n, \r, or ^M. The easiest robust solution is to use regular expressions. Bookmark this page if you don't know regular expressions:
http://regexlib.com/CheatSheet.aspx
And here's the solution:
import re
csv_string = 'col1,col2\t\r\nx,y'  # obviously your csv opening code goes here
rows = re.findall(r'([^\t,\r\n]+)[\t,]+([^\t,\r\n]+)', csv_string)
rows = rows[1:]  # remove header
rows is now a list of (col1, col2) tuples, one per data row.
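To see the cross-platform claim in action, here is a small self-contained sketch that parses the same two-column data under different delimiters and line endings (the sample strings and the exact pattern are made up for illustration):

```python
import re

# Same logical data, submitted with different delimiters and line breaks
samples = [
    "col1,col2\r\nx,y",    # commas, Windows line break
    "col1\tcol2\nx\ty",    # tabs, Unix line break
    "col1,col2\rx,y",      # commas, old Mac line break
]

# Two fields per line, separated by one or more tabs or commas
pattern = re.compile(r"([^\t,\r\n]+)[\t,]+([^\t,\r\n]+)")
for s in samples:
    rows = pattern.findall(s)[1:]  # drop the header pair
    print(rows)
```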