Trying to whip this out in Python. Long story short: I have a CSV file that contains column data I need to inject into another file that is pipe-delimited. My understanding is that Python can't replace values in place, so I have to re-write the whole file with the new values.
data file (csv):
value1,value2,iwantthisvalue3
source file (txt, | delimited):
value1|value2|iwanttoreplacethisvalue3|value4|value5|etc
fixed file (txt, | delimited):
samevalue1|samevalue2|replacedvalue3|value4|value5|etc
I can't figure out how to accomplish this. Here is my latest attempt:
import csv

# collect the replacement values (the third column of the csv)
result = []
with open(r"C:\data\generatedfixed.csv", "r", newline="") as data_file:
    for fields in csv.reader(data_file):
        result.append(fields[2])

# re-write the source file line by line, swapping in the new third field
with open(r"C:\data\data.txt", "r") as source_file, \
        open(r"C:\data\data_fixed.txt", "w") as fixed_file:
    for n, line in enumerate(source_file):
        fields = line.rstrip("\n").split("|")
        fields[2] = result[n]
        fixed_file.write("|".join(fields) + "\n")
I would highly suggest you use the pandas package here; it makes handling tabular data very easy and would help you a lot in this case. Once you have installed pandas, import it with:
import pandas as pd
To read the files, simply use:
data_file = pd.read_csv(r"C:\data\generatedfixed.csv")
source_file = pd.read_csv(r"C:\data\data.txt", delimiter="|")
After that, manipulating these two files is easy. I'm not exactly sure how many values or which ones you want to replace, but if the "iwantthisvalue3" and "iwanttoreplacethisvalue3" columns have the same length, then this should do the trick:
source_file['iwanttoreplacethisvalue3'] = data_file['iwantthisvalue3']
Now all you need to do is save the dataframe (the table we just updated) to a file. Since you want a .txt file with "|" as the delimiter, this is the line to do that (though you can customize how it is saved in many ways):
source_file.to_csv(r"C:\data\data_fixed.txt", sep='|', index=False)
Let me know if this works and helps you. I would also encourage you to read up (or watch some videos) on pandas if you're planning to work with tabular data; it is an awesome library with great documentation and functionality.
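For reference, the whole flow can be sketched end to end with the sample rows from the question held in memory instead of on disk (the numeric column labels come from header=None, since the sample rows have no header line):

```python
import io
import pandas as pd

# stand-ins for the two files; header=None because the sample rows have no header line
csv_text = "value1,value2,iwantthisvalue3\n"
txt_text = "value1|value2|iwanttoreplacethisvalue3|value4|value5|etc\n"

data_file = pd.read_csv(io.StringIO(csv_text), header=None)
source_file = pd.read_csv(io.StringIO(txt_text), delimiter="|", header=None)

# column 2 is the third column in both files
source_file[2] = data_file[2]

out = source_file.to_csv(sep="|", index=False, header=False)
print(out.strip())
```

With real files, the io.StringIO stand-ins are replaced by the file paths as in the snippets above.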
I am reading csv files into python using:
df = pd.read_csv(r"C:\csvfile.csv")
But the file has some summary data, and the raw data start where a value "valx" is found. If "valx" is not found, the file is useless. I would like to create new dataframes that start where "valx" is found. I have been trying for a while with no success. Any help on how to achieve this is greatly appreciated.
Unfortunately, pandas' skiprows only lets you skip a known set of rows at the beginning. You might want to parse the file before creating the dataframe.
As an example:
import csv

with open(r"C:\csvfile.csv", "r", newline="") as f:
    rows = list(csv.reader(f))

# keep everything from the first row containing "valx" onward
data = None
for i, row in enumerate(rows):
    if "valx" in row:
        data = rows[i:]
        break
Using the standard library csv module, you can read the file and check whether "valx" appears; if it is found, everything from that row onward ends up in the data variable (otherwise data stays None, which is your "useless file" case).
From there you can use the data variable to create your dataframe.
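The same filtering can also be written with itertools.dropwhile, which discards rows until the condition first fails. A self-contained sketch, using an in-memory stand-in for the real file:

```python
import csv
import io
from itertools import dropwhile

# a stand-in for the real file: two summary lines, then the "valx" header row
sample = "summary line\nmore,summary\nvalx,col2\n1,2\n3,4\n"

rows = csv.reader(io.StringIO(sample))
# drop rows until one containing "valx" appears; keep that row and everything after it
data = list(dropwhile(lambda row: "valx" not in row, rows))
header, body = data[0], data[1:]
print(header)
print(body)
```

If "valx" never appears, data comes out empty. With pandas installed, pd.DataFrame(body, columns=header) then builds the dataframe.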
I want to convert a stream of JSON objects (nearly 10,000) pasted in a file to a CSV file with a particular format for headers and values.
I have the following stream of JSON data:
{"shortUrlClicks":"594","longUrlClicks":"594","countries":[{"count":"125","id":"IQ"},{"count":"94","id":"US"},{"count":"56","id":"TR"},{"count":"50","id":"SA"},{"count":"29","id":"DE"},{"count":"24","id":"TN"},{"count":"20","id":"DZ"},{"count":"14","id":"EG"},{"count":"13","id":"MA"},{"count":"12","id":"PS"}],"browsers":[{"count":"350","id":"Chrome"},{"count":"100","id":"Firefox"},{"count":"46","id":"Safari"},{"count":"35","id":"Mobile"},{"count":"20","id":"Mobile Safari"},{"count":"20","id":"SamsungBrowser"},{"count":"8","id":"MSIE"},{"count":"6","id":"Opera"},{"count":"3","id":"OS;FBSV"},{"count":"2","id":"Maxthon"}],"platforms":[{"count":"227","id":"Android"},{"count":"221","id":"Windows"},{"count":"67","id":"iPhone"},{"count":"30","id":"X11"},{"count":"25","id":"Macintosh"},{"count":"8","id":"iPad"},{"count":"2","id":"Android 4.2.2"},{"count":"1","id":"Android 4.1.2"},{"count":"1","id":"Android 4.3"},{"count":"1","id":"Android 5.0.1"}],"referrers":[{"count":"340","id":"unknown"},{"count":"193","id":"t.co"},{"count":"38","id":"m.facebook.com"},{"count":"12","id":"addpost.it"},{"count":"4","id":"plus.google.com"},{"count":"3","id":"www.facebook.com"},{"count":"1","id":"goo.gl"},{"count":"1","id":"l.facebook.com"},{"count":"1","id":"lm.facebook.com"},{"count":"1","id":"plus.url.google.com"}]}
{"shortUrlClicks":"594","longUrlClicks":"594","countries":[{"count":"125","id":"IQ"},{"count":"94","id":"US"},{"count":"56","id":"TR"},{"count":"50","id":"SA"},{"count":"29","id":"DE"},{"count":"24","id":"TN"},{"count":"20","id":"DZ"},{"count":"14","id":"EG"},{"count":"13","id":"MA"},{"count":"12","id":"PS"}],"browsers":[{"count":"350","id":"Chrome"},{"count":"100","id":"Firefox"},{"count":"46","id":"Safari"},{"count":"35","id":"Mobile"},{"count":"20","id":"Mobile Safari"},{"count":"20","id":"SamsungBrowser"},{"count":"8","id":"MSIE"},{"count":"6","id":"Opera"},{"count":"3","id":"OS;FBSV"},{"count":"2","id":"Maxthon"}],"platforms":[{"count":"227","id":"Android"},{"count":"221","id":"Windows"},{"count":"67","id":"iPhone"},{"count":"30","id":"X11"},{"count":"25","id":"Macintosh"},{"count":"8","id":"iPad"},{"count":"2","id":"Android 4.2.2"},{"count":"1","id":"Android 4.1.2"},{"count":"1","id":"Android 4.3"},{"count":"1","id":"Android 5.0.1"}],"referrers":[{"count":"340","id":"unknown"},{"count":"193","id":"t.co"},{"count":"38","id":"m.facebook.com"},{"count":"12","id":"addpost.it"},{"count":"4","id":"plus.google.com"},{"count":"3","id":"www.facebook.com"},{"count":"1","id":"goo.gl"},{"count":"1","id":"l.facebook.com"},{"count":"1","id":"lm.facebook.com"},{"count":"1","id":"plus.url.google.com"}]}
... and so on.
I want to convert it into CSV, with whatever the headers (shortUrlClicks, longUrlClicks, etc.) are:
I would be thankful if you could please help me with this. Any code in Python or any other language would be useful.
You can use the json module from the Python Standard Library to read the JSON, and plain built-in file handling to read and write the files.
It would be something like this:
import json

# one JSON object per line in the input file
with open("file.json", "r") as f:
    items = [json.loads(line) for line in f if line.strip()]

csv_lines = ["shortUrlClicks,longUrlClicks"]  # header row
for row in items:
    # pick the columns you need; the nested lists would need flattening first
    columns = [row["shortUrlClicks"], row["longUrlClicks"]]
    csv_lines.append(",".join(columns))

# write the csv file
with open("out.csv", "w") as out:
    out.write("\n".join(csv_lines) + "\n")
PS: I didn't run this code before posting. It's just to give you an idea.
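One caveat with joining fields by hand: if any value could itself contain a comma, the output breaks. The standard library's csv.DictWriter handles quoting automatically; a minimal sketch on a two-object stream, keeping only the two top-level counters (extrasaction="ignore" skips the nested keys):

```python
import csv
import io
import json

# two JSON objects, one per line, standing in for the pasted stream
stream = ('{"shortUrlClicks":"594","longUrlClicks":"594"}\n'
          '{"shortUrlClicks":"12","longUrlClicks":"15"}\n')

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["shortUrlClicks", "longUrlClicks"],
                        extrasaction="ignore")
writer.writeheader()
for line in stream.splitlines():
    writer.writerow(json.loads(line))

print(out.getvalue())
```

For the real file, io.StringIO is replaced by open("out.csv", "w", newline="").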
You can use Pandas to do this for you.
Read your JSON file like this (lines=True tells pandas that each line is a separate JSON object):
df = pandas.read_json('filename.json', lines=True)
Then write to csv:
df.to_csv('filename.csv', index=False)  # set index=False if you don't need the index
Example:
http://hayd.github.io/2013/pandas-json
REF:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html
I have CSV Files, Text Files and Word Files that have tons of bibliography that I want to export to .bib file, preferably with a custom order defined by me.
What I want to know is, what's the best way to convert this information using Python, to a .bib file?
Is it converting Text to XML and maybe then to .bib? Or loading a CSV file and iterate through columns and write column by column?
I've worked up a script for loading a CSV and taking each column as a list, but I can't figure out how to convert that to .bib.
Here's the code I've pulled together:
import pandas
import bibtexparser
from bibtexparser.bparser import BibTexParser
from bibtexparser.customization import *

colnames = ['AUTHORS', 'TITLE', 'EDITOR']
data = pandas.read_csv('file.csv', names=colnames, delimiter=';', encoding='latin1')
list1 = list(data.AUTHORS)

def customs(record):
    record = type(record)  # bibtexparser.customization.type, via the star import
    record = author(record)
    record = editor(record)
    return record

with open('123.bib', 'w') as bibtex_file:
    parser = BibTexParser()
    parser.customization = customs
    # stuck here: nothing is ever written to bibtex_file
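Since a .bib file is just text, one option that sidesteps the parser entirely is formatting each CSV row into a BibTeX entry yourself. A minimal sketch: the @book entry type and the generated keys (entry1, entry2, ...) are assumptions, and the sample line stands in for file.csv:

```python
import csv
import io

# stand-in for file.csv: AUTHORS;TITLE;EDITOR, ';'-delimited (an assumed layout)
sample = "Knuth, Donald;The TeXbook;Smith, Jane\n"

entries = []
for i, row in enumerate(csv.reader(io.StringIO(sample), delimiter=";")):
    authors, title, editor = row
    entries.append(
        "@book{entry%d,\n"        # hypothetical entry type and citation key
        "  author = {%s},\n"
        "  title = {%s},\n"
        "  editor = {%s}\n"
        "}" % (i + 1, authors, title, editor)
    )

bib_text = "\n\n".join(entries)
print(bib_text)
```

Writing bib_text to 123.bib gives a file that bibtexparser (or BibTeX itself) can read back, which is also a quick way to check the output.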
I have a two-column CSV which I have uploaded via an HTML page, to be operated on by a Python CGI script. Looking at the file on the server side, it appears as one long string, i.e. a file called test.csv with the contents
col1, col2
x,y
has become
('upfile', 'test.csv', 'col1,col2\t\r\nx,y')
Col1 contains the data I want to operate on (i.e. x) and col2 contains its identifier (y). Is there a better way of doing the uploading, or do I need to manually extract the fields I want? That seems potentially very error-prone.
thanks
If you're using the cgi module in Python, you should be able to do something like:
import cgi
import csv

form = cgi.FieldStorage()
thefile = form['upfile']

reader = csv.reader(thefile.file)
header = next(reader)  # list of column names
for row in reader:
    # row is a list of fields
    process_row(row)
See, for example, cgi programming or the python cgi module docs.
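To sanity-check that parsing logic without a live CGI server, you can point the same reader at an in-memory file object standing in for the upload:

```python
import csv
import io

# what the upload in the question contains, as the server sees it
uploaded = "col1,col2\r\nx,y\r\n"

reader = csv.reader(io.StringIO(uploaded))
header = next(reader)  # the column-name row
rows = list(reader)    # the data rows
print(header, rows)
```

In the real script, io.StringIO(uploaded) is simply replaced by thefile.file from the FieldStorage.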
Can't you use the csv module to parse this? It's certainly better than rolling your own.
Something along the lines of
import cgi
import csv

form = cgi.FieldStorage()
thefile = form['upfile']

reader = csv.reader(thefile.file, delimiter=',')
for row in reader:
    for field in row:
        do_thing(field)
EDIT: Corrected my answer based on ars's answer.
Looks like your file is being modified by the HTML upload. Is there anything stopping you from just ftp'ing in and dropping the CSV file where you need it?
Once the CSV file is more proper, here is a quick function that will put it into a 2D list:
import csv

def genTableFrCsv(incsv):
    table = []
    with open(incsv, 'r', newline='') as fin:
        for row in csv.reader(fin):
            table.append(row)
    return table
From here you can operate on the whole list in memory rather than pulling it bit by bit from the file as in Vitor's solution.
The easy solution is rows = [r.split(',') for r in csv_string.split('\r\n')]. It's only error-prone if you have users on different platforms submitting data: they might submit commas or tabs, and their line breaks could be \n, \r\n, \r, or ^M. The easiest robust solution is regular expressions. Bookmark this page if you don't know regular expressions:
http://regexlib.com/CheatSheet.aspx
And here's the solution:
import re

csv_string = 'col1,col2\t\r\nx,y'  # obviously your csv opening code goes here
lines = re.split(r'\r\n|\r|\n', csv_string)
rows = [tuple(re.split(r'[\t,]+', line.strip())) for line in lines]
rows = rows[1:]  # remove header
rows is now a list of tuples, one per data row.
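As an alternative to a hand-rolled regex, the standard library's csv.Sniffer can guess the delimiter from a sample, which covers the "commas or tabs" uncertainty mentioned above. A sketch, with a hypothetical semicolon-delimited upload as the sample:

```python
import csv
import io

submitted = "col1;col2\r\nx;y\r\n"  # a hypothetical semicolon-delimited upload

# sniff() inspects the sample and returns a Dialect describing it
dialect = csv.Sniffer().sniff(submitted)
reader = csv.reader(io.StringIO(submitted), dialect)
rows = list(reader)
print(dialect.delimiter, rows)
```

Sniffing only needs a small sample (the first few lines), so it works even when the rest of the upload is large.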