Read and process a text file and save to csv - python

The files I have seem to be in a "dict" format...
The file header is as follows: time,open,high,low,close,volume
The next line is as follows:
{"t":[1494257340],"o":[206.7],"h":[209.3],"l":[204.50002],"c":[204.90001],"v":[49700650]}
import csv
with open ('test_data.txt', 'rb') as f:
    for line in f:
        dict_file = eval(f.read())
        time = (dict_file['t']) # print (time) result [1494257340]
        open_price = (dict_file['o']) # print (open_price) result [206.7]
        high = (dict_file['h']) # print (high) result [209.3]
        low = (dict_file['l']) # print (low) result [204.50002]
        close = (dict_file['c']) # print (close) result [204.90001]
        volume = (dict_file['v']) # print (volume) result [49700650]
        print (time, open_price, high, low, close, volume)
        # print result [1494257340] [206.7] [209.3] [204.50002] [204.90001] [49700650]
        # I need to remove the [] from the output.
        # expected result
        # 1494257340, 206.7, 209.3, 204.50002, 204.90001, 49700650
The result I need, with the time changed from epoch format to a short date (dd,mm,yy), is:
5/8/17, 206.7, 209.3, 204.50002, 204.90001, 49700650
So I know I need the csv.writer function.

I see a number of problems in the code you submitted. I recommend breaking your task into small pieces and checking that each one works on its own. So what you are trying to do is:
open a file
read the file line by line
eval each line to get a dict object
get values from that object
write those values in a (separate?) csv file
Right?
Now do each one, one small step at a time.
opening a file.
You're pretty much on point there:
with open('test_data.txt', 'rb') as f:
    print(f.read())
# b'{"t":[1494257340],"o":[207.75],"h":[209.8],"l":[205.75],"c":[206.35],"v":[61035956]}\n'
You can open the file in r mode instead; it will give you strings instead of bytes objects:
with open('test_data.txt', 'r') as f:
    print(f.read())
# {"t":[1494257340],"o":[207.75],"h":[209.8],"l":[205.75],"c":[206.35],"v":[61035956]}
It might cause some problems but should work since eval can handle it just fine (at least in python 3)
read the file line by line
with open('test_data.txt', 'rb') as f:
    for line in f:
        print(line)
# b'{"t":[1494257340],"o":[207.75],"h":[209.8],"l":[205.75],"c":[206.35],"v":[61035956]}\n'
Here is another problem in your code: you're not using the line variable and are calling f.read() instead. This reads the rest of the file in one go (starting from the second line, since the first one has already been read). Try swapping one for the other and see what happens.
eval each line to get a dict object
Again, this works fine, but I would add some protection here. What if you get an empty line in the file, or a malformed one? Also, if this file comes from an untrusted source, you may become a victim of code injection, for example if a line in your file were changed to:
print("You've been hacked") or {"t":[1494257340],"o":[207.75],"h":[209.8],"l":[205.75],"c":[206.35],"v":[61035956]}
with open('test_data.txt', 'rb') as f:
    for line in f:
        dict_file = eval(line)
        print(dict_file)
# You've been hacked
# {'t': [1494257340], 'o': [207.75], 'h': [209.8], 'l': [205.75], 'c': [206.35], 'v': [61035956]}
I don't know your exact specifications, but you should be safer with json.loads instead.
...
Can you continue on your own from there?
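As one possible way to carry on from here, the sketch below parses a single line with json.loads and returns None for blank or malformed lines instead of raising. The helper name parse_line is made up for illustration; it is not part of the json module.

```python
import json


def parse_line(line):
    """Return the dict encoded in one line of the file, or None if it can't be parsed.

    parse_line is a hypothetical helper name, used only for this example.
    """
    line = line.strip()
    if not line:
        return None  # skip empty lines
    try:
        data = json.loads(line)
    except json.JSONDecodeError:
        return None  # malformed line, or something that isn't JSON at all
    return data if isinstance(data, dict) else None


good = '{"t":[1494257340],"o":[207.75],"h":[209.8],"l":[205.75],"c":[206.35],"v":[61035956]}\n'
bad = 'print("You\'ve been hacked") or {"t":[1494257340]}\n'

print(parse_line(good)["t"])  # the list stored under "t"
print(parse_line(bad))        # None: the injected line is rejected, not executed
print(parse_line("\n"))       # None: blank line
```

Unlike eval, json.loads never executes the text it is given, so the injected line above is simply treated as invalid input.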
get values from the object
I think dict_file['t'] doesn't give you the value you expect.
What does it give you?
Why?
How to fix it?
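For reference, a small sketch of what is going on here: each value in the parsed dict is a one-element list, so indexing it with [0] gives the bare number. The variable names record and row are illustrative only.

```python
import json

line = '{"t":[1494257340],"o":[206.7],"h":[209.3],"l":[204.50002],"c":[204.90001],"v":[49700650]}'
record = json.loads(line)

# Each value is a list holding a single item, which is why print() shows brackets.
print(record["t"])     # a list
print(record["t"][0])  # the number inside it

# Unwrap every field in the order of the CSV header (time, open, high, low, close, volume).
row = [record[key][0] for key in "tohlcv"]
print(", ".join(str(value) for value in row))
```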
write those values in a csv file
Can you write some random string to a file?
What does the csv format look like? Can you format your values to match it?
Check the docs for the csv module; can it be of help to you?
And so on and so forth...
EDIT: Solution
# you can save the print output in a file by running:
# $ python convert_to_csv.py > output.csv
import datetime, decimal, json, os
CSV_HEADER = 'time,open,high,low,close,volume'
with open('test_data.txt', 'rb') as f:
    print(CSV_HEADER)
    for line in f:
        data = json.loads(line, parse_float=decimal.Decimal)
        data['t'][0] = datetime.datetime.fromtimestamp(data['t'][0]) \
            .strftime('%#d/%#m/%y' if os.name == 'nt' else '%-d/%-m/%y')
        print(','.join(str(data[k][0]) for k in 'tohlcv'))
Running:
$ cat test_data.txt
{"t":[1494257340],"o":[207.75],"h":[209.8],"l":[205.75],"c":[206.35],"v":[61035956]}
{"t":[1490123123],"o":[107.75],"h":[109.8],"l":[105.75],"c":[106.35],"v":[11035956]}
{"t":[1491234234],"o":[307.75],"h":[309.8],"l":[305.75],"c":[306.35],"v":[31035956]}
$ python convert_to_csv.py
time,open,high,low,close,volume
8/5/17,207.75,209.8,205.75,206.35,61035956
21/3/17,107.75,109.8,105.75,106.35,11035956
3/4/17,307.75,309.8,305.75,306.35,31035956
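Since the question mentions csv.writer, here is a hedged variant of the same idea that writes the rows with the csv module instead of print. It assumes the timestamps should be interpreted as UTC and formatted as month/day/year, as in the question's example; the function name convert and the in-memory streams are illustrative only.

```python
import csv
import io
import json
from datetime import datetime, timezone

CSV_HEADER = ["time", "open", "high", "low", "close", "volume"]


def convert(lines, out):
    """Write one CSV row per JSON line; `convert` is a made-up name for this sketch."""
    writer = csv.writer(out)
    writer.writerow(CSV_HEADER)
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        data = json.loads(line)
        # Assumption: epoch seconds are read as UTC; adjust the timezone if needed.
        when = datetime.fromtimestamp(data["t"][0], tz=timezone.utc)
        # Built by hand to avoid the platform-specific %-d / %#d strftime flags.
        date = f"{when.month}/{when.day}/{when.year % 100:02d}"
        writer.writerow([date] + [data[key][0] for key in "ohlcv"])


source = io.StringIO(
    '{"t":[1494257340],"o":[207.75],"h":[209.8],"l":[205.75],"c":[206.35],"v":[61035956]}\n'
)
target = io.StringIO()
convert(source, target)
print(target.getvalue())
```

When writing to a real file rather than a StringIO, the csv documentation recommends opening it with newline='' so that line endings are not translated twice.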

Related

file.write can't save int

I am building a save system for a game I'm making. I'm trying to save all of the resources you get in the game so you can load them the next time you play. I was going to use file.write, as I saw it being used in other games, but it can't save the variables as ints. Is there a workaround, or a different way of saving, that I could use to do this?
from Resources import *
def start_new():
    Q = int(input('which save file do you want to save to? 1, 2, or 3.'))
    if Q == 1:
        file = open("save1.txt", "w")
        file.write(Manpower)
        file.write(Food)
        file.write(Food_Use)
        file.write(Wood)
        file.write(Farmers)
        file.write(Food_Income)
        file.write(FarmNum)
        file.write(MaxFarmer)
        file.write(Deforestation)
        file.write(Trees)
        file.write(Tree_Spread)
        file = open("save1.txt", "r")
Convert the integer values to strings. There are several ways to do this.
A small piece of advice: there is no need to call file.close() when using a with statement. The with statement itself ensures proper acquisition and release of resources.
As an example, let's write Manpower (assuming this variable is an int) to the file:
with open("save1.txt", 'w') as f:
    f.write(str(Manpower))
or even better:
with open("save1.txt", 'w') as f:
    f.write(f"{Manpower}\n")
\n is the EOL (end of line) character. After an EOL character, new writes start on the next line. You can use it to separate and identify the values when reading them back.
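To show how newline-separated values can be read back, here is a minimal sketch. The resource values and the helper functions save_values and load_values are invented for the example; the real game would use its own variables.

```python
import os
import tempfile


def save_values(path, values):
    """Write each integer on its own line."""
    with open(path, "w") as f:
        for value in values:
            f.write(f"{value}\n")


def load_values(path):
    """Read the integers back in the order they were written."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]


# Hypothetical resource values standing in for Manpower, Food, Wood, ...
resources = [120, 45, 7]

# A throwaway location is used here so no real save file is overwritten.
path = os.path.join(tempfile.mkdtemp(), "save1.txt")
save_values(path, resources)
print(load_values(path))
```

The values come back as a list in the same order, so the loader has to assign them to the matching variables in that order.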
In your code you need to close the file after making changes...
file.close()
Your code:
from Resources import *
def start_new():
    Q = int(input('which save file do you want to save to? 1, 2, or 3.'))
    if Q == 1:
        file = open("save1.txt", "w")
        file.write(Manpower)
        file.write(Food)
        file.write(Food_Use)
        file.write(Wood)
        file.write(Farmers)
        file.write(Food_Income)
        file.write(FarmNum)
        file.write(MaxFarmer)
        file.write(Deforestation)
        file.write(Trees)
        file.write(Tree_Spread)
        file.close()
        file = open("save1.txt", "r")
        # stuff
        file.close()

Subtracting Numbers From A .txt File In Python

I want to open the file I have, read the number in it, subtract 2 from that number, and print the answer to the console.
E.g. if the number in the file was 156, subtracting 2 gives 154, which should then be displayed on the console!
This is all I have so far:
a = open("xp.txt", "r")
a.read()
a.close()
How would I update it so that if I wanted to subtract an integer from the number, the result would be displayed on the console?
Thanks in advance!
Use readline instead of read so that you won't get an error when the file for example contains another empty line. Then, call strip on the result to eliminate possible whitespace. Finally, use int to convert the string to a number. Now you can do all the math you want with it:
with open("xp.txt", "r") as infile:
    value = infile.readline()
    stripped = value.strip()
    number = int(stripped)
    newNumber = number - 2
    print(newNumber)
Or shorter:
with open("xp.txt", "r") as infile:
    print(int(infile.readline().strip()) - 2)
To write the number to the same file, convert the number back to a string:
with open("xp.txt", "r") as infile:
    result = int(infile.readline().strip()) - 2
    print(result)
with open("xp.txt" , "w") as outfile:
    outfile.write(str(result))
Assuming the file just contained that single value and nothing else, you could accomplish this using the following
with open('xp.txt', 'r') as f_in:
    value = int(f_in.read())
value -= 2
print(f'New value is {value}')
with open('xp.txt', 'w') as f_out:
    f_out.write(str(value))
Basically you open the file for reading, read the value into an integer, modify the value and display it, then re-open the file for writing to write the value back out.
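The read, modify, write cycle described above can also be wrapped in a small function so that any amount can be subtracted. This is only a sketch: subtract_from_file is a made-up name, and it assumes the file holds a single integer and nothing else.

```python
import os
import tempfile


def subtract_from_file(path, amount):
    """Subtract `amount` from the integer stored in `path`, save it, and return it."""
    with open(path, "r") as f_in:
        value = int(f_in.read().strip())
    value -= amount
    with open(path, "w") as f_out:
        f_out.write(str(value))
    return value


# Demo on a throwaway file so the real xp.txt is not touched.
path = os.path.join(tempfile.mkdtemp(), "xp.txt")
with open(path, "w") as f:
    f.write("156\n")

print(subtract_from_file(path, 2))  # 154
print(subtract_from_file(path, 2))  # 152, because the first result was written back
```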

Reading CSV file with python

import os
import time
filename = 'NTS.csv'
mycsv = open(filename, 'r')
mycsv.seek(0, os.SEEK_END)
while 1:
    time.sleep(1)
    where = mycsv.tell()
    line = mycsv.readline()
    if not line:
        mycsv.seek(where)
    else:
        arr_line = line.split(',')
        var3 = arr_line[3]
        print (var3)
I have this Python code, which reads values from a csv file every time an external program appends a new line to it. My problem is that the csv file is periodically rewritten completely, and then Python stops reading the new lines. My guess is that Python is stuck at some line number, and the rewritten file may have around 50 lines more or fewer. So, for example, Python is waiting for a new line at line 70, and the new line arrives at line 95. I think the solution is to update mycsv.seek(0, os.SEEK_END), but I am not sure how to do that.
What you want to do is difficult to accomplish without rewinding the file every time to make sure that you are truly on the last line. If you know approximately how many characters there are on each line, then there is a shortcut you could take using mycsv.seek(-end_buf, os.SEEK_END), as outlined in this answer. So your code could work something like this:
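import os
import time
avg_len = 50 # use an appropriate number here
end_buf = 3 * avg_len // 2 # integer division: seek needs an int offset
filename = 'NTS.csv'
mycsv = open(filename, 'rb') # binary mode: Python 3 text files can't seek backwards from the end
mycsv.seek(-end_buf, os.SEEK_END)
last = mycsv.readlines()[-1]
while 1:
    time.sleep(1)
    mycsv.seek(-end_buf, os.SEEK_END)
    line = mycsv.readlines()[-1]
    if not line == last:
        last = line # remember it so the same line is not printed again
        arr_line = line.decode().split(',')
        var3 = arr_line[3]
        print (var3)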
Here, in each iteration of the while loop, you seek to a position close to the end of the file, just far back enough that you know for sure the last line will be contained in what remains. Then you read in all the remaining lines (this will probably include a partial amount of the second or third to last lines) and check if the last line of these is different to what you had before.
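A different technique, not the one described above, is to remember the byte position that was last read and to start over whenever the file has become shorter than that position. The sketch below illustrates it under stated assumptions: read_new_lines is a hypothetical helper, the file is assumed to be UTF-8, and a rewrite that leaves the file at least as long as before would not be detected by this size check.

```python
import os
import tempfile


def read_new_lines(path, position):
    """Return (new_lines, new_position) for text appended after `position`.

    If the file is now shorter than `position`, assume it was rewritten
    and start again from the beginning.
    """
    if os.path.getsize(path) < position:
        position = 0  # file was truncated or rewritten
    with open(path, "rb") as f:  # binary mode so positions are plain byte offsets
        f.seek(position)
        chunk = f.read()
    # Only consume complete lines; a partial last line is left for the next call.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode().splitlines()
    return lines, position + end


path = os.path.join(tempfile.mkdtemp(), "NTS.csv")
with open(path, "w") as f:
    f.write("a,b,c,1\nd,e,f,2\n")

lines, pos = read_new_lines(path, 0)
print([line.split(",")[3] for line in lines])

with open(path, "w") as f:  # simulate the external program rewriting the file
    f.write("x,y,z,9\n")

lines, pos = read_new_lines(path, pos)
print([line.split(",")[3] for line in lines])
```

In the original polling loop, this function would be called once per iteration after time.sleep(1), passing in the position returned by the previous call.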
You can read the lines in a simpler way. Instead of trying to use seek to get what you need, try calling readlines on the file object mycsv.
You can do the following:
mycsv = open('NTS.csv', 'r')
csv_lines = mycsv.readlines()
for line in csv_lines:
    arr_line = line.split(',')
    var3 = arr_line[3]
    print(var3)

Python: Concise / elegant way to reformat a set of text files?

I have written a Python script to process a set of ASCII files within a given dir. I wonder if there is a more concise and/or "pythonesque" way to do it, without losing readability?
Python Code
import os
import fileinput
import glob
import string
indir='./'
outdir='./processed/'
for filename in glob.glob(indir+'*.asc'): # get a list of input ASCII files to be processed
    fin=open(indir+filename,'r') # input file
    fout=open(outdir+filename,'w') # out: processed file
    lines = iter(fileinput.input([indir+filename])) # iterator over all lines in the input file
    fout.write(next(lines)) # just copy the first line (the header) to output
    for line in lines:
        val=iter(string.split(line,' '))
        fout.write('{0:6.2f}'.format(float(val.next()))), # first value in the line has its own format
        for x in val: # iterate over the rest of the numbers in the line
            fout.write('{0:10.6f}'.format(float(val.next()))), # the rest of the values in the line have a different format
        fout.write('\n')
    fin.close()
    fout.close()
An example:
Input:
;;; This line is the header line
-5.0 1.090074154029272 1.0034662411357929 0.87336062116561186 0.78649408279093869 0.65599958665017222 0.4379879132749317 0.26310799350679176 0.087808018565486673
-4.9900000000000002 1.0890770415316042 1.0025480136545413 0.87256100700428996 0.78577373527626004 0.65539842673645277 0.43758616966566649 0.26286647978335914 0.087727357602906453
-4.9800000000000004 1.0880820021223023 1.0016316956763136 0.87176305623792771 0.78505488659611744 0.65479851808106115 0.43718526271594083 0.26262546925502467 0.087646864773454014
-4.9700000000000006 1.0870890372077564 1.0007172884938402 0.87096676998908273 0.78433753775986659 0.65419986152386733 0.4367851929843618 0.26238496225635727 0.087566540188423345
-4.9600000000000009 1.086098148170821 0.99980479337809591 0.87017214936140763 0.78362168975984026 0.65360245789061966 0.4363859610200459 0.26214495911617541 0.087486383957276398
Processed:
;;; This line is the header line
-5.00 1.003466 0.786494 0.437988 0.087808
-4.99 1.002548 0.785774 0.437586 0.087727
-4.98 1.001632 0.785055 0.437185 0.087647
-4.97 1.000717 0.784338 0.436785 0.087567
-4.96 0.999805 0.783622 0.436386 0.087486
Other than a few minor changes, due to how Python has changed over time, this looks fine.
You're mixing two different styles of next(); the old way was it.next() and the new is next(it). You should use the string method split() instead of going through the string module (that module is there mostly for backwards compatibility with Python 1.x). There's no need to go through the almost useless "fileinput" module, since open file handles are also iterators (that module comes from a time before Python's file handles were iterators).
Edit: As @codeape pointed out, glob() returns the full path. Your code would not have worked if indir was something other than "./". I've changed the following to use the correct listdir/os.path.join solution. I'm also more familiar with "%" string interpolation than with string formatting.
Here's how I would write this in more idiomatic modern Python
import os
def reformat(fin, fout):
    fout.write(next(fin)) # just copy the first line (the header) to output
    for line in fin:
        fields = line.split(' ')
        # Make a format string specific to the number of fields
        fmt = '%6.2f' + ('%10.6f' * (len(fields)-1)) + '\n'
        fout.write(fmt % tuple(map(float, fields)))
basenames = os.listdir(indir) # get a list of input ASCII files to be processed
for basename in basenames:
    input_filename = os.path.join(indir, basename)
    output_filename = os.path.join(outdir, basename)
    with open(input_filename, 'r') as fin, open(output_filename, 'w') as fout:
        reformat(fin, fout)
The Zen of Python says "There should be one-- and preferably only one --obvious way to do it". It's interesting how you used functions which, during the last 10+ years, were "obviously" the right solution, but no longer are. :)
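To illustrate the point about glob() returning paths that already include the directory, here is a small sketch that keeps the *.asc filter and derives the output name with os.path.basename. The temporary directory and the file names a.asc and b.asc are made up for the demonstration.

```python
import glob
import os
import tempfile

indir = tempfile.mkdtemp()
outdir = os.path.join(indir, "processed")
os.makedirs(outdir)

# Create two empty sample input files (names are made up).
for name in ("a.asc", "b.asc"):
    open(os.path.join(indir, name), "w").close()

pairs = []
for input_filename in sorted(glob.glob(os.path.join(indir, "*.asc"))):
    # glob already returned indir + name, so only the base name is joined to outdir.
    basename = os.path.basename(input_filename)
    pairs.append((input_filename, os.path.join(outdir, basename)))

for src, dst in pairs:
    print(src, "->", dst)
```

One design consideration: os.listdir returns every entry in the directory, including subdirectories such as processed, whereas the glob pattern only matches the .asc files.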
fin=open(indir+filename,'r') # input file
fout=open(outdir+filename,'w') # out: processed file
#code
fin.close()
fout.close()
can be written as:
with open(indir+filename,'r') as fin, open(outdir+filename,'w') as fout:
    #code
In Python 2.6, which does not support several context managers in one with statement, you can use:
with open(indir+filename,'r') as fin:
    with open(outdir+filename,'w') as fout:
        #code
And the line
lines = iter(fileinput.input([indir+filename]))
is unnecessary. You can just iterate over the open file (fin in your case).
You can also do line.split(' ') instead of string.split(line, ' ')
If you change those things, there is no need to import string and fileinput.
Edit: I didn't know you can use inline code. That's cool
In my build script, I have this code:
inFile = open(sourceFile,'r')
outFile = open(targetFile,'w')
for line in inFile:
    line = doKeywordSubstitution(line)
    outFile.write(line)
inFile.close()
outFile.close()
I don't know of a way to make this any more concise. Putting the line-changing logic in a different function looks neater to me though.
I may be missing the point of your code, but I don't understand why you have lines = iter(fileinput.input([indir+filename])).
I also don't understand why you use string.split(line, ' ') instead of just line.split(' ').
Well maybe I would write the string-processing part like this:
values = line.split(' ')
values[0] = '{0:6.2f}'.format(float(values[0]))
values[1:] = ['{0:10.6f}'.format(float(v)) for v in values[1:]]
fout.write(' '.join(values) + '\n') # float() dropped the trailing newline, so add it back
At least for me this looks better but this might be subjective :)
Instead of indir I would use os.curdir. Instead of "./processed" I would do: os.path.join(os.curdir, 'processed').

Read a multielement list, look for an element and print it out in python

I am writing a Python script that writes a tex file, but I have to use some information from another file. Each line of that file contains the names of menus that I need to use. I use split to get a list for each line of my "menu".
For example, I have to write a section for the second element of each list, but after running the script I got nothing. What could I do?
This is roughly what I am doing:
texfile = open(outputtex.tex', 'w')
infile = open(txtfile.txt, 'r')
for line in infile.readlines():
linesplit = line.split('^')
for i in range(1,len(infile.readlines())):
texfile.write('\section{}\n'.format(linesplit[1]))
texfile.write('\\begin{figure*}[h!]\n')
texfile.write('\centering\n')
texfile.write('\includegraphics[scale=0.95]{pg_000%i.pdf}\n' %i)
texfile.write('\end{figure*}\n')
texfile.write('\\newpage\n')
texfile.write('\end{document}')
texfile.close()
By the way, in the includegraphics line, I have to increase the number after pg_ from "0001" to "25050". Any clues??
I really appreciate your help.
I don't quite follow your question. But I see several errors in your code. Most importantly:
for line in infile.readlines():
...
...
for i in range(1,len(infile.readlines())):
Once you read a file, it's gone. (You can get it back, but in this case there's no point.) That means that the second call to readlines is yielding nothing, so len(infile.readlines()) == 0. Assuming what you've written here really is what you want to do (i.e. write file_len * (file_len - 1) + 1 lines?) then perhaps you should save the file to a list. Also, you didn't put quotes around your filenames, and your indentation is strange. Try this:
with open('txtfile.txt', 'r') as infile: # (with automatically closes infile)
    in_lines = infile.readlines()
in_len = len(in_lines)
texfile = open('outputtex.tex', 'w')
for line in in_lines:
    linesplit = line.split('^')
    for i in range(1, in_len):
        texfile.write('\\section{{{0}}}\n'.format(linesplit[1]))
        texfile.write('\\begin{figure*}[h!]\n')
        texfile.write('\\centering\n')
        texfile.write('\\includegraphics[scale=0.95]{pg_000%i.pdf}\n' %i)
        texfile.write('\\end{figure*}\n')
        texfile.write('\\newpage\n')
texfile.write('\\end{document}')
texfile.close()
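The point about a file being used up after one read can be checked directly. In this sketch the sample contents and the temporary path are invented; the second readlines() call returns an empty list until seek(0) rewinds the file.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "txtfile.txt")
with open(path, "w") as f:
    f.write("1^Starters\n2^Mains\n3^Desserts\n")

with open(path) as infile:
    first = infile.readlines()
    second = infile.readlines()   # nothing left: the file position is at the end
    infile.seek(0)                # rewind to the beginning
    third = infile.readlines()

print(len(first), len(second), len(third))
```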
Perhaps you don't actually want nested loops?
infile = open('txtfile.txt', 'r')
texfile = open('outputtex.tex', 'w')
for line_number, line in enumerate(infile):
    linesplit = line.split('^')
    texfile.write('\\section{{{0}}}\n'.format(linesplit[1]))
    texfile.write('\\begin{figure*}[h!]\n')
    texfile.write('\\centering\n')
    texfile.write('\\includegraphics[scale=0.95]{pg_000%i.pdf}\n' % line_number)
    texfile.write('\\end{figure*}\n')
    texfile.write('\\newpage\n')
texfile.write('\\end{document}')
texfile.close()
infile.close()
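On the page-number part of the question: a hard-coded pg_000 prefix only works for single-digit numbers. A hedged sketch of one way to handle it is to zero-pad the number in the format specification. It assumes the PDF files are named with at least four digits (pg_0001.pdf and so on); page_filename is a hypothetical helper.

```python
def page_filename(number, width=4):
    """Build a name like pg_0001.pdf; page_filename is a hypothetical helper."""
    # 0{width}d pads with leading zeros up to `width` digits and never truncates.
    return f"pg_{number:0{width}d}.pdf"


for n in (1, 25, 250, 25050):
    print(page_filename(n))
```

If the pages should start at 1 rather than 0, enumerate(infile, start=1) supplies the numbers in that form.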
