I need to load data from a .txt file, but I cannot figure out how to refer to the rows and columns that I want.
I have normally used code such as follows:
a = []
b = []
for line in file:
if line[0] != 'x':
False
else:
fields = (line.strip()).split('\t')
a.append(fields[0])
b.append(fields[1])
My issue is that the lines with the data I want do not all start with the same character like other files I have opened. The first line of data I want begins with a float (0.0) and goes up to 5300.0. This is column a. It is separated by a tab from the second column I need, b.
I'm unable to comment, so I apologize, can you post the contents of the file and explain what you need reached further?
In order to load the data from a .txt file, you can use the file handling
f = open('file.txt','r')
data1 = f.read()
data2 = f.readlines()
data3 = f.readline()
f.close()
Explanation
data1 would have all the data as it is from the txt file and is a str type
data2 would have all the lines in a list type ['line1','line2','line3'...]
data3 would read just the first line and the output it str type. You can use read(2) to read first 2 lines as well.
If you're looking for a more complex output, please post an expected output with the contents of the file - and I'll assist you with the writing the code
Related
I am looking for a simple way to split lines in python from a .txt file and then just read out the names and compare them to another file.
I've had a code that split the lines successfully, but I couldn't find a way to read out just the names, unfortunately the code that split it successfully was lost.
this is what the .txt file looks like.
Id;Name;Job;
1;James;IT;
2;Adam;Director;
3;Clare;Assisiant;
example if the code I currently have (doesn't output anything)
my_file = open("HP_liki.txt","r")
flag = index = 0
x1=""
for line in my_file:
line.strip().split('\n')
index+=1
content = my_file.read()
list=[]
lines_to_read = [index-1]
for position, line1 in enumerate(x1):
if position in lines_to_read:
list=line1
x1=list.split(";")
print(x1[1])
I need a solution that doesn't import pandas or csv.
The first part of your code confuses me as to your purpose.
for line in my_file:
line.strip().split('\n')
index+=1
content = my_file.read()
Your for loop iterates through the file and strips each line. Then it splits on a newline, which cannot exist at this point. The for already iterates by lines, so there is no newline in any line in this loop.
In addition, once you've stripped the line, you ignore the result, increment index, and leave the loop. As a result, all this loop accomplishes is to count the lines in the file.
The line after the loop reads from a file that has no more data, so it will simply handle the EOF exception and return nothing.
If you want the names from the file, then use the built-in file read to iterate through the file, split each line, and extract the second field:
name_list = [line.split(';')[1]
for line in open("HP_liki.txt","r") ]
name_list also includes the header "Name", which you can easily delete.
Does that handle your problem?
So without using any external library you can use simple file io and then generalize according to your need.
readfile.py
file = open('datafile.txt','r')
for line in file:
line_split = line.split(';')
if (line_split[0].isdigit()):
print(line_split[1])
file.close()
datafile.txt
Id;Name;Job;
1;James;IT;
2;Adam;Director;
3;Clare;Assisiant;
If you run this you'll have output
James
Adam
Clare
You can change the if condition according to your need
I have my dataf.txt file:
Id;Name;Job;
1;James;IT;
2;Adam;Director;
3;Clare;Assisiant;
I have written this to extract information:
with open('dataf.txt','r') as fl:
data = fl.readlines()
a = [i.replace('\n','').split(';')[:-1] for i in data]
print(a[1:])
Outputs:
[['1', 'James', 'IT'], ['2', 'Adam', 'Director'], ['3', 'Clare', 'Assisiant']]
I'm trying to read a plaintext file line by line, cherry pick lines that begin with a pattern of any six digits. Pass those to a list and then write that list row by row to a .csv file.
Here's an example of a line I'm trying to match in the file:
**000003** ANW2248_08_DESOLATE-WASTELAND-3. A9 C 00:55:25:17 00:55:47:12 10:00:00:00 10:00:21:20
And here is a link to two images, one showing the above line in context of the rest of the file and the expected result: https://imgur.com/a/XHjt9e1
import csv
identifier = re.compile(r'^(\d\d\d\d\d\d)')
matched_line = []
with open('file.edl', 'r') as file:
reader = csv.reader(file)
for line in reader:
line = str(line)
if identifier.search(line) == True:
matched_line.append(line)
else: continue
with open('file.csv', 'w') as outputEDL:
print('Copying EDL contents into .csv file for reformatting...')
outputEDL.write(str(matched_line))
Expected result would be the reader gets to a line, searches using the regex, then if the result of the search finds the series of 6 numbers at the beginning, it appends that entire line to the matched_line list.
What I'm actually getting is, once I write what reader has read to a .csv file, it has only picked out [], so the regex search obviously isn't functioning properly in the way I've written this code. Any tips on how to better form it to achieve what I'm trying to do would be greatly appreciated.
Thank you.
Some more examples of expected input/output would better help with solving this problem but from what I can see you are trying to write each line within a text file that contains a timestamp to a csv. In that case here is some psuedo code that might help you solve your problem as well as a separate regex match function to make your code more readable
import re
def match_time(line):
pattern = re.compile(r'(?:\d+[:]\d+[:]\d+[:]\d+)+')
result = pattern.findall(line)
return " ".join(result)
This will return a string of the entire timecode if a match is found
lines = []
with open('yourfile', 'r') as txtfile:
with open('yourfile', 'w') as csvfile:
for line in txtfile:
res = match_line(line)
#alternatively you can test if res in line which might be better
if res != "":
lines.append(line)
for item in lines:
csvfile.write(line)
Opens a text file for reading, if the line contains a timecode, appends the line to a list, then iterates that list and writes the line to the csv.
I am new to json data processing and stuck with this issue. Data in my input file looks like this -
[{"key1":"value1"},{"key2":"value2"}] [{"key3":"value3"},{"key4":"value4"}]
I tried to read using
json.load(file)
or by
with open(file) as f:
json.loads(f)
tried with pandas.read_json(file, orient="records") as well
each of these attempts failed with stating Extra data: line 1 column n (char n) issue
Can someone guide how best to parse this file? I am not in favor writing a manual parser which may fail to scale later
P.S. There is no , between two arrays
TIA
Your Json file content has issue.
1. If , between arrays:
Code:
import json
with open("my.json") as fp:
data = json.load(fp) # data = json.loads(fp.read())
print data
your file content can be eithor of these.
Option1:
Use outer most square bracket for your json content.
[[ {"key1":"value1"}, {"key2":"value2"}], [{"key3":"value3"},
{"key4":"value4"}]]
Option2:
use only one square bracket.
[ {"key1":"value1"}, {"key2":"value2"}, {"key3":"value3"},
{"key4":"value4"}]
2. If no , between arrays:
code:
Just writing as per the given JSON format.
def valid_json_creator(given):
replaced = given.replace("}] [{", "}],[{")
return "[" + replaced + "]"
def read_json():
with open("data.txt") as fp:
data = fp.read()
valid_json = valid_json_creator(data)
jobj = json.loads(valid_json)
print(jobj)
if __name__ == '__main__':
read_json()
This code works for JSON if it is in the following format.
Note no , between arrays, but space is there.
[{"key0":"value0"},{"key1":"value41"}]
[{"key1":"value1"},{"key2":"value42"}]
[{"key2":"value2"},{"key3":"value43"}]
[{"key3":"value3"},{"key4":"value44"}]
[{"key4":"value4"},{"key5":"value45"}]
[{"key5":"value5"},{"key6":"value46"}]
[{"key6":"value6"},{"key7":"value47"}]
[{"key7":"value7"},{"key8":"value48"}]
[{"key8":"value8"},{"key9":"value49"}]
[{"key9":"value9"},{"key10":"value410"}]
[{"key10":"value10"},{"key11":"value411"}]
[{"key11":"value11"},{"key12":"value412"}]
[{"key12":"value12"},{"key13":"value413"}]
[{"key13":"value13"},{"key14":"value414"}]
[{"key14":"value14"},{"key15":"value415"}]
[{"key15":"value15"},{"key16":"value416"}]
[{"key16":"value16"},{"key17":"value417"}]
[{"key17":"value17"},{"key18":"value418"}]
[{"key18":"value18"},{"key19":"value419"}]
[{"key19":"value19"},{"key20":"value420"}]
What you have tested is reading from a structure that corresponds to the JSON file (which by definition is text, not Python data structure).
Test:
file = '[{"key1":"value1"},{"key2":"value2"}],[{"key3":"value3"},{"key4":"value4"}]'
This should work better. But wait... you do not seem to provide a list or dict at the top level of your to-be JSON! Hence the error:
ValueError: Extra data: line 1 column 38 - line 1 column 76 (char 37 -
75)
Change it then to (note the additional list opening and closing brackets at the beginning and end):
file = '[[{"key1":"value1"},{"key2":"value2"}],[{"key3":"value3"},{"key4":"value4"}]]'
This will work with:
json.load(file)
but not with:
with open(file) as f:
json.loads(f)
as your text variable is not a file! You would want to store the contents of the variable named file to a file and pass the path to that file:
with open(r'C:\temp\myfile.json') as f:
json.loads(f)
For the code to work properly.
I am beginner in the programming world and a would like some tips on how to solve a challenge.
Right now I have ~10 000 .dat files each with a single line following this structure:
Attribute1=Value&Attribute2=Value&Attribute3=Value...AttibuteN=Value
I have been trying to use python and the CSV library to convert these .dat files into a single .csv file.
So far I was able to write something that would read all files, store the contents of each file in a new line and substitute the "&" to "," but since the Attribute1,Attribute2...AttributeN are exactly the same for every file, I would like to make them into column headers and remove them from every other line.
Any tips on how to go about that?
Thank you!
Since you are a beginner, I prepared some code that works, and is at the same time very easy to understand.
I assume that you have all the files in the folder called 'input'. The code beneath should be in a script file next to the folder.
Keep in mind that this code should be used to understand how a problem like this can be solved. Optimisations and sanity checks have been left out intentionally.
You might want to check additionally what happens when a value is missing in some line, what happens when an attribute is missing, what happens with a corrupted input etc.. :)
Good luck!
import os
# this function splits the attribute=value into two lists
# the first list are all the attributes
# the second list are all the values
def getAttributesAndValues(line):
attributes = []
values = []
# first we split the input over the &
AtributeValues = line.split('&')
for attrVal in AtributeValues:
# we split the attribute=value over the '=' sign
# the left part goes to split[0], the value goes to split[1]
split = attrVal.split('=')
attributes.append(split[0])
values.append(split[1])
# return the attributes list and values list
return attributes,values
# test the function using the line beneath so you understand how it works
# line = "Attribute1=Value&Attribute2=Value&Attribute3=Vale&AttibuteN=Value"
# print getAttributesAndValues(line)
# this function writes a single file to an output file
def writeToCsv(inFile='', wfile="outFile.csv", delim=","):
f_in = open(inFile, 'r') # only reading the file
f_out = open(wfile, 'ab+') # file is opened for reading and appending
# read the whole file line by line
lines = f_in.readlines()
# loop throug evert line in the file and write its values
for line in lines:
# let's check if the file is empty and write the headers then
first_char = f_out.read(1)
header, values = getAttributesAndValues(line)
# we write the header only if the file is empty
if not first_char:
for attribute in header:
f_out.write(attribute+delim)
f_out.write("\n")
# we write the values
for value in values:
f_out.write(value+delim)
f_out.write("\n")
# Read all the files in the path (without dir pointer)
allInputFiles = os.listdir('input/')
allInputFiles = allInputFiles[1:]
# loop through all the files and write values to the csv file
for singleFile in allInputFiles:
writeToCsv('input/'+singleFile)
but since the Attribute1,Attribute2...AttributeN are exactly the same
for every file, I would like to make them into column headers and
remove them from every other line.
input = 'Attribute1=Value1&Attribute2=Value2&Attribute3=Value3'
once for the the first file:
','.join(k for (k,v) in map(lambda s: s.split('='), input.split('&')))
for each file's content:
','.join(v for (k,v) in map(lambda s: s.split('='), input.split('&')))
Maybe you need to trim the strings additionally; don't know how clean your input is.
Put the dat files in a folder called myDats. Put this script next to the myDats folder along with a file called temp.txt. You will also need your output.csv. [That is, you will have output.csv, myDats, and mergeDats.py in the same folder]
mergeDats.py
import csv
import os
g = open("temp.txt","w")
for file in os.listdir('myDats'):
f = open("myDats/"+file,"r")
tempData = f.readlines()[0]
tempData = tempData.replace("&","\n")
g.write(tempData)
f.close()
g.close()
h = open("text.txt","r")
arr = h.read().split("\n")
dict = {}
for x in arr:
temp2 = x.split("=")
dict[temp2[0]] = temp2[1]
with open('output.csv','w' """use 'wb' in python 2.x""" ) as output:
w = csv.DictWriter(output,my_dict.keys())
w.writeheader()
w.writerow(my_dict)
I have a txt file which has some 'excel formulas', I have converted this to a csv file using Python csv reader/writer. Now I want to read the values of the csv file and do some calculation, but when i try to access the particular column of .csv file, it still returns me in the 'excel formula' instead of the actual value?? although When i open the csv file .. formulas are converted in to value??
Any ideas?
Here is the code
Code to convert txt to csv
def parseFile(filepath):
file = open(filepath,'r')
content = file.read()
file.close()
lines = content.split('\n')
csv_filepath = filepath[:(len(filepath)-4)]+'_Results.csv'
csv_out = csv.writer(open(csv_filepath, 'a'), delimiter=',' , lineterminator='\n')
for line in lines:
data = line.split('\t')
csv_out.writerow(data)
return csv_filepath
Code to do some calculation in csv file
def csv_cal (csv_filepath):
r = csv.reader(open(csv_filepath))
lines = [l for l in r]
counter =[0]*(len(lines[4])+6)
if lines[4][4] == 'Last Test Pass?' :
print ' i am here'
for i in range(0,3):
print lines[6] [4] ### RETURNS FORMULA ??
return 0
I am new to python, any help would be appreciated!
Thanks,
You can paste special in Excel with Values only option selected. You could select all and paste into a another sheet and save. This would save you from having to implement some kind of parser in python. Or, you could evaluate some simple arithmetic with eval.
edit:
I've heard of xlrd which can be downloaded from pypi. It loads .xls files.
It sounded like you just wanted the final data which past special can do.