I have over 200 scraped files in JSON format and I want to analyse them. I can open each file individually, but I would like to loop through them to save time, as I will be doing this a lot. For example, something like:
with codecs.open('c:\\project\\input*.json','r','utf-8') as f:
where '*' is a number. My current code for a single file is:
import codecs, json, csv, re
# read a json file downloaded with twitterscraper
with codecs.open('c:\\project\\input1.json','r','utf-8') as f:
    tweets = json.load(f,encoding='utf-b')
Just put your files into a folder and then loop through the files in the folder like so.
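import codecs
import json
import csv
import re
import os

# collect the full path of every .json file in the folder
files = []
for file in os.listdir("/mydir"):
    if file.endswith(".json"):
        files.append(os.path.join("/mydir", file))

for file in files:
    with codecs.open(file, 'r', 'utf-8') as f:
        # the encoding is already set by codecs.open; json.load no longer
        # accepts an encoding argument (it was removed in Python 3.9)
        tweets = json.load(f)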
Import glob and use it to iterate over all files that match a file name pattern.
import glob
import codecs
import json
# ... more packages here
for file in glob.glob('c:\\project\\input*.json'):
    with codecs.open(file, 'r', 'utf-8') as f:
        # the encoding is set by codecs.open, so json.load does not need it
        tweets = json.load(f)
        #... whatever you do next with `tweets`
BTW: your json.load call says encoding='utf-b'. Did you mean utf-8?
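Putting the two answers above together, a minimal Python 3 sketch could look like this. It assumes the files are named input1.json, input2.json and so on, as in the question; the helper names numeric_suffix and load_all are made up for illustration. glob does not guarantee any particular order, so the sketch sorts by the number in the file name.

```python
import glob
import json
import os
import re


def numeric_suffix(path):
    """Return the number embedded in a name like 'input12.json' (0 if none)."""
    match = re.search(r"(\d+)\.json$", os.path.basename(path))
    return int(match.group(1)) if match else 0


def load_all(pattern):
    """Load every JSON file matching `pattern`, in numeric order.

    Returns a dict mapping each file path to its parsed content.
    """
    results = {}
    for path in sorted(glob.glob(pattern), key=numeric_suffix):
        # plain open() with an explicit encoding replaces codecs.open here
        with open(path, "r", encoding="utf-8") as f:
            results[path] = json.load(f)
    return results
```

It would be called as load_all('c:\\project\\input*.json'), after which each file's tweets can be processed in turn.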
I have the code below, which takes my standardized .txt file and converts it into a JSON file perfectly. The only problem is that I sometimes have over 300 files, and doing this manually (i.e. changing the number at the end of the file name and re-running the script) takes too long. I want to automate this. The input files all reside in one folder, and I write the JSON files to a different folder. The file names stay the same, except that the output ends with .json instead of .txt. Examples are CRAZY_CAT_FINAL1.TXT, CRAZY_CAT_FINAL2.TXT and so on, all the way to file 300. How can I automate this, keep the naming convention in place, and read from and write to different folders? I have tried, but can't get my code to iterate over the files. Any help would be greatly appreciated.
import glob
import time
from glob import glob
import pandas as pd
import numpy as np
import csv
import json
csvfile = open(r'C:\Users\...\...\...\Dog\CRAZY_CAT_FINAL1.txt', 'r')
jsonfile = open(r'C:\Users\...\...\...\Rat\CRAZY_CAT_FINAL1.json', 'w')
reader = csv.DictReader(csvfile)
out = json.dumps([row for row in reader])
jsonfile.write(out)
****************************************************************************
I also have this code, which uses the Python library "requests". How do I change it so that it uploads multiple JSON files that follow a standard naming convention? The file names end with a number...
import requests

# function to post to api
def postData(xactData):
    url = 'http link'
    headers = {
        'Content-Type': 'application/json',
        'Content-Length': str(len(xactData)),
        'Request-Timeout': '60000'
    }
    return requests.post(url, headers=headers, data=xactData)

# read data (the original line was missing the call to open)
with open(r'filepath/file/file.json', 'r') as f:
    data = f.read()
print(data)

# post data
result = postData(data)
print(result)
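Use f-strings? Combine them with the raw prefix (rf'...') so that the backslashes in the Windows paths are not treated as escape sequences:
for i in range(1, 301):
    csvfile = open(rf'C:\Users\...\...\...\Dog\CRAZY_CAT_FINAL{i}.txt', 'r')
    jsonfile = open(rf'C:\Users\...\...\...\Rat\CRAZY_CAT_FINAL{i}.json', 'w')
    # ... the rest of the conversion code from the question goes here, inside the loop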
from glob import glob
import csv
import json
import os

INPATH = r'C:\Users\...\...\...\Dog'
OUTPATH = r'C:\Users\...\...\...\Rat'

for csvname in glob(os.path.join(INPATH, '*.txt')):
    # same base name, but with a .json extension, in the output folder
    base = os.path.splitext(os.path.basename(csvname))[0]
    jsonname = os.path.join(OUTPATH, base + '.json')
    with open(csvname, 'r') as src, open(jsonname, 'w') as dst:
        json.dump(list(csv.DictReader(src)), dst)
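Neither answer above covers the second half of the question, uploading each JSON file. A rough sketch follows, assuming the postData function from the question is passed in as the post argument and that the files are UTF-8 text; the name post_all and its parameters are illustrative only.

```python
import glob


def post_all(pattern, post):
    """Read every file matching `pattern` and hand its text to `post`.

    `post` is any callable that takes the file content as a string, for
    example the postData function from the question. Returns a list of
    (path, result) pairs so that failed uploads can be inspected afterwards.
    """
    results = []
    for path in sorted(glob.glob(pattern)):
        # assumption: the JSON files are UTF-8 encoded
        with open(path, "r", encoding="utf-8") as f:
            data = f.read()
        results.append((path, post(data)))
    return results
```

With the question's code in scope, this would be called as post_all(r'filepath/file/*.json', postData). Taking the posting function as a parameter keeps the loop independent of the API details.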
I am new to Python and I am using the following code to produce sentiment analysis output:
import json
from watson_developer_cloud import ToneAnalyzerV3Beta
import urllib.request
import codecs
import csv
import os
import re
import sys
import collections
import glob
ipath = 'C:/TEMP/' # input folder
opath = 'C:/TEMP/matrix/' # output folder
reader = codecs.getreader("utf-8")
tone_analyzer = ToneAnalyzerV3Beta(
    url='https://gateway.watsonplatform.net/tone-analyzer/api',
    username='ABCID',
    password='ABCPASS',
    version='2016-02-11')
path = 'C:/TEMP/*.txt'
file = glob.glob(path)
text = file.read()
data=tone_analyzer.tone(text='text')
for cat in data['document_tone']['tone_categories']:
    print('Category:', cat['category_name'])
    for tone in cat['tones']:
        print('-', tone['tone_name'],tone['score'])
#create file
In the above code, all I am trying to do is read every text file stored in the C:/TEMP folder and run sentiment analysis on it, but I keep getting an error: 'list' object has no attribute 'read'
I am not sure where I am going wrong and would really appreciate any help with this one. Also, is there a way I can write the output to a CSV file, so that if I am reading the file ABC.txt, I create an output CSV file called ABC.csv with the output values?
Thank you
glob returns a list of file names, not an open file. You need to iterate over the list, open each file and then call .read on the file object:
files = glob.glob(path)
# iterate over the list getting each file
for fle in files:
    # open the file and then call .read() to get the text
    with open(fle) as f:
        text = f.read()
I'm not sure exactly what you want to write, but the csv lib will do it:
import glob
from csv import writer

files = glob.glob(path)
# iterate over the list getting each file
for fle in files:
    # open the file and a matching .csv (ABC.txt -> ABC.csv), then read the text
    with open(fle) as f, open("{}.csv".format(fle.rsplit(".", 1)[0]), "w", newline="") as out:
        text = f.read()
        wr = writer(out)
        # pass the text that was read, not the literal string 'text'
        data = tone_analyzer.tone(text=text)
        wr.writerow(["some", "column", "names"])  # write the col names
Then call writerow passing a list of whatever you want to write for each row.
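As a sketch of what those rows could look like, the following flattens a response into one row per tone and writes it to a CSV named after the input file. It assumes the response has the shape used in the question's print loop (document_tone, tone_categories, tones); the function names, the out_dir parameter and the column names are invented for the example.

```python
import csv
import os


def tone_rows(data):
    """Flatten a tone response into [category, tone, score] rows."""
    rows = []
    for cat in data["document_tone"]["tone_categories"]:
        for tone in cat["tones"]:
            rows.append([cat["category_name"], tone["tone_name"], tone["score"]])
    return rows


def write_tone_csv(txt_path, data, out_dir):
    """Write the tones for `txt_path` to <out_dir>/<same base name>.csv."""
    base = os.path.splitext(os.path.basename(txt_path))[0]
    csv_path = os.path.join(out_dir, base + ".csv")
    # newline="" stops the csv module's line endings from being doubled on Windows
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        wr = csv.writer(out)
        wr.writerow(["category", "tone", "score"])  # hypothetical column names
        wr.writerows(tone_rows(data))
    return csv_path
```

Inside the loop over files, write_tone_csv(fle, data, opath) would then produce ABC.csv for ABC.txt in the output folder from the question.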
I've been trying to get my Python code to fill in a form in Word with data that I scraped off the Internet. I wrote the data to a txt file and am now trying to fill the Word file with this code:
import zipfile
import os
import tempfile
import shutil
import codecs
def getXml(docxFilename,ReplaceText):
    zip = zipfile.ZipFile(open(docxFilename,"rb"))
    xmlString= zip.read("word/document.xml")
    for key in ReplaceText.keys():
        xmlString = xmlString.replace(str(key), str(ReplaceText.get(key)))
    return xmlString
def createNewDocx(originalDocx,xmlString,newFilename):
    tmpDir = tempfile.mkdtemp()
    zip = zipfile.ZipFile(open(originalDocx,"rb"))
    zip.extractall(tmpDir)
    #3tmpDir=tmpDir.decode("utf-8")
    with open(os.path.join(tmpDir,"word/document.xml"),"w") as f:
        f.write(xmlString)
    filenames = zip.namelist()
    zipCopyFilename = newFilename
    with zipfile.ZipFile(zipCopyFilename,"w") as docx:
        for filename in filenames:
            docx.write(os.path.join(tmpDir,filename),filename)
    shutil.rmtree(tmpDir)
f=open('test.txt', 'r',)
text=f.read().split("\n")
print text[1]
Pavarde = text[1]
Replace = {"PAVARDE1":Pavarde}
createNewDocx("test.docx",getXml("test.docx",Replace),"test2.docx")
The file is created but I can't open it.
I get the following error:
Illegal xml character
My guess is that there is something wrong with the encoding, but I can't find a solution.
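This thread has no answer, so what follows is only a guess at one possible cause: the replacement text is inserted into word/document.xml without XML escaping, and the bytes are not handled consistently as UTF-8. A minimal Python 3 sketch under those assumptions (the function name fill_docx is hypothetical, and document.xml is assumed to be UTF-8 encoded):

```python
import zipfile
from xml.sax.saxutils import escape


def fill_docx(template_path, replacements, output_path):
    """Copy a .docx, replacing placeholder strings in word/document.xml."""
    with zipfile.ZipFile(template_path) as src, \
            zipfile.ZipFile(output_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            payload = src.read(item.filename)
            if item.filename == "word/document.xml":
                # assumption: the document part is UTF-8 encoded
                xml = payload.decode("utf-8")
                for key, value in replacements.items():
                    # escape() turns &, < and > into XML entities
                    xml = xml.replace(key, escape(str(value)))
                payload = xml.encode("utf-8")
            # every other archive member is copied through unchanged
            dst.writestr(item.filename, payload)
```

It would be called as fill_docx("test.docx", {"PAVARDE1": Pavarde}, "test2.docx"). Copying the archive members directly also avoids the temporary directory used in the question.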
I would like to automate the download of CSV files from the World Bank's dataset.
My problem is that the URL corresponding to a specific dataset does not lead directly to the desired CSV file but is instead a query to the World Bank's API. As an example, this is the URL to get the GDP per capita data: http://api.worldbank.org/v2/en/indicator/ny.gdp.pcap.cd?downloadformat=csv.
If you paste this URL in your browser, it will automatically start the download of the corresponding file. As a consequence, the code I usually use to collect and save CSV files in Python is not working in the present situation:
baseUrl = "http://api.worldbank.org/v2/en/indicator/ny.gdp.pcap.cd?downloadformat=csv"
remoteCSV = urllib2.urlopen("%s" %(baseUrl))
myData = csv.reader(remoteCSV)
How should I modify my code in order to download the file coming from the query to the API?
This will download the zip, open it and give you a csv reader for whichever file in it you want.
import urllib2
import StringIO
from zipfile import ZipFile
import csv
baseUrl = "http://api.worldbank.org/v2/en/indicator/ny.gdp.pcap.cd?downloadformat=csv"
remoteCSV = urllib2.urlopen(baseUrl)
sio = StringIO.StringIO()
sio.write(remoteCSV.read())
# We create a StringIO object so that we can work on the results of the request (a string) as though it is a file.
z = ZipFile(sio, 'r')
# We now create a ZipFile object pointed to by 'z' and we can do a few things here:
print z.namelist()
# A list with the names of all the files in the zip you just downloaded
# We can use z.namelist()[1] to refer to 'ny.gdp.pcap.cd_Indicator_en_csv_v2.csv'
with z.open(z.namelist()[1]) as f:
    # Opens the 2nd file in the zip
    csvr = csv.reader(f)
    for row in csvr:
        print row
For more information see ZipFile Docs and StringIO Docs
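The code above is Python 2 (urllib2, StringIO, print statements). A rough Python 3 equivalent is sketched below. The function names are made up for illustration, and the utf-8-sig encoding is an assumption about the CSV files rather than something the thread confirms.

```python
import csv
import io
import urllib.request
import zipfile


def fetch(url):
    """Download `url` and return the raw response body as bytes."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def list_members(payload):
    """Return the names of all files inside the zip bytes `payload`."""
    with zipfile.ZipFile(io.BytesIO(payload)) as z:
        return z.namelist()


def csv_rows_from_zip(payload, member):
    """Return all rows of the CSV file `member` inside zip bytes `payload`."""
    with zipfile.ZipFile(io.BytesIO(payload)) as z:
        # z.open yields bytes, while csv.reader needs text, hence the wrapper;
        # utf-8-sig is an assumed encoding that also strips a leading BOM
        with io.TextIOWrapper(z.open(member), encoding="utf-8-sig", newline="") as text:
            return list(csv.reader(text))
```

Usage would be payload = fetch(baseUrl), then list_members(payload) to see which files the archive contains, then csv_rows_from_zip(payload, name) for the one you want. io.BytesIO takes the place of StringIO because the downloaded zip is binary data.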
import os
import urllib
import zipfile
from StringIO import StringIO
package = StringIO(urllib.urlopen("http://api.worldbank.org/v2/en/indicator/ny.gdp.pcap.cd?downloadformat=csv").read())
zip = zipfile.ZipFile(package, 'r')
pwd = os.path.abspath(os.curdir)
for filename in zip.namelist():
    csv = os.path.join(pwd, filename)
    with open(csv, 'w') as fp:
        fp.write(zip.read(filename))
    print filename, 'downloaded successfully'
From here you can use your approach to handle CSV files.
We have a script to automate access and data extraction for World Bank World Development Indicators like: https://data.worldbank.org/indicator/GC.DOD.TOTL.GD.ZS
The script does the following:
Downloads the metadata and data
Extracts the metadata and data
Converts them to a Data Package
The script is written in Python 3 and has no dependencies outside the standard library. Try it:
python scripts/get.py
python scripts/get.py https://data.worldbank.org/indicator/GC.DOD.TOTL.GD.ZS
You can also read our analysis of World Bank data:
https://datahub.io/awesome/world-bank
More of a suggestion than a solution: you can use pd.read_csv to read a CSV file directly from a URL.
import pandas as pd
data = pd.read_csv('http://url_to_the_csv_file')
This problem may be tricky.
I want to create a CSV file from a list in Python and save it to a local directory. The file does not exist beforehand, and there is no such file in the local directory either; I just want to create a new CSV file and put it there.
I found that StringIO.StringIO can generate the CSV content from a list in Python, but what are the next steps?
Thank you.
I then found that the following code does it:
import os
import os.path
import StringIO
import csv
dir = r"C:\Python27"
if not os.path.exists(dir):
    os.mkdir(dir)
my_list=[[1,2,3],[4,5,6]]
with open(os.path.join(dir, "filename"+'.csv'), "w") as f:
    csvfile=StringIO.StringIO()
    csvwriter=csv.writer(csvfile)
    for l in my_list:
        csvwriter.writerow(l)
    for a in csvfile.getvalue():
        f.writelines(a)
Did you read the docs?
https://docs.python.org/2/library/csv.html
Lots of examples on that page of how to read / write CSV files.
One of them:
import csv
with open('some.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(someiterable)
import csv
with open('/path/to/location', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(youriterable)
https://docs.python.org/2/library/csv.html#examples
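The examples above follow the Python 2 docs, where the file is opened in 'wb' mode. In Python 3 the csv module expects a text-mode file opened with newline=''. A minimal sketch for Python 3 follows; the function name write_csv and its parameters are made up for illustration, and the UTF-8 encoding is an assumption.

```python
import csv
import os


def write_csv(rows, directory, filename):
    """Write `rows` (a list of lists) to a new CSV file and return its path."""
    # create the target directory if it does not exist yet
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, filename)
    # newline="" lets the csv module control the line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return path
```

For the list in the question, this would be write_csv([[1, 2, 3], [4, 5, 6]], dir, "filename.csv"). The intermediate StringIO buffer is not needed, because csv.writer can write straight to the open file.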