Reading text files with Python - python

I have a text file with something like a million lines that are formatted like this:
{"_id":"0e1daf84-4e4d-11ea-9f43-ba9b7f2413e0","parameterId":"visib_mean_last10min","stationId":"06193","timeCreated":1581590344449633,"timeObserved":1577922600000000,"value":11100}
The file has no headers. I want to be able to observe it as an array.
I've tried this:
df = pd.read_csv("2020-01_2.txt", delimiter = ",", header = None, names = ["_id", "parameterId", "stationId", "timeCreated", "timeObserved", "value"])
and while that does sort the file into columns and rows like I want it to, it will store "_id":"0e1daf84-4e4d-11ea-9f43-ba9b7f2413e0" as the first entry, where I only want "0e1daf84-4e4d-11ea-9f43-ba9b7f2413e0".
How do I read only the value that comes after each ":" into the array?

As pointed out by @mousetail, this looks like a JSON Lines file (one JSON object per line). You may want to do as follows:
import json

mylist = []
with open("2020-01_2.txt") as f:
    for line in f:
        mydict = json.loads(line)
        mylist.append(list(mydict.values()))
It will output a list of lists, each one of them corresponding to a file line.
Good luck!
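Since the question already loads the file with pandas, it may be worth noting that pandas can parse this format directly with read_json(..., lines=True). A sketch with two made-up sample lines in the same shape as the question's file:

```python
import io

import pandas as pd

# Two sample lines shaped like the question's file (ids shortened for readability)
data = io.StringIO(
    '{"_id":"0e1daf84","parameterId":"visib_mean_last10min","stationId":"06193",'
    '"timeCreated":1581590344449633,"timeObserved":1577922600000000,"value":11100}\n'
    '{"_id":"1a2b3c4d","parameterId":"visib_mean_last10min","stationId":"06193",'
    '"timeCreated":1581590344449634,"timeObserved":1577922660000000,"value":11200}\n'
)

# lines=True tells pandas that each line is a separate JSON object
df = pd.read_json(data, lines=True)
print(df["_id"].tolist())
```

Each key becomes a column, so df["_id"] holds only the values, without the "_id": prefix.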

Related

Check for key in csv file if key is matched then add data into different rows of matched column using python

I have csv file like below
I need to search for a key; if the key is matched, some values should be added to that key's column. For example, when I search for folder, some values should be added to the folder column; in the same way, when I search for name, some values should be added to the name column.
so the final output looks like below
I have tried the following, but it doesn't work for me:
import csv

list1 = [['ab', 'cd', 'ed']]
with open('1.csv', 'a') as f_csv:
    data_to_write_list1 = zip(*list1)
    writer = csv.writer(f_csv, delimiter=',', dialect='excel')
    writer.writerows(data_to_write_list1)
If you want to only use built-in methods, you can get the first row of a file (in the case of a CSV file like yours, the headers) like this:
>>> with open('file_you_need.csv', 'r') as f:
...     file = f.readline()
In your case the variable file would then be (supposing the delimiter is ","):
folder,name,service
You can now do file.split(",") (replacing "," with your actual delimiter if needed) and you'll get back a list of headers. You can then build a list of lists where each inner list is a row of your file, or use a dictionary to link new entries to each header. Depending on your choice you would then write back to the file in different ways; supposing you go with the list of lists:
with open('file_you_need.csv', 'w') as f:
    for row_values in listoflists:
        row = ""
        for i, el in enumerate(row_values):
            if i != len(row_values) - 1:
                row += el + ","
            else:
                row += el
        f.write(row + "\n")
As others have mentioned, you could also use pandas and DataFrames to make this cleaner, but I don't think this approach is too hard to grasp.
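A possible pandas-free sketch of the "match a header, then add values under it" idea, using csv.DictReader and csv.DictWriter (the file contents and new values here are made up for illustration):

```python
import csv
import io

# Pretend file contents; the headers match the question's example
existing = "folder,name,service\nf1,n1,s1\n"
new_values = {"folder": "f2", "name": "n2"}  # no value for "service"

reader = csv.DictReader(io.StringIO(existing))
rows = list(reader)
# Columns without a new value get an empty cell
rows.append({h: new_values.get(h, "") for h in reader.fieldnames})

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Keying on the header names means you never have to track column positions by hand.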

How to sort the data by a keyword of a csv file in Python?

Is it possible to sort data of .csv file by a keyword in Python?
Suppose we write a .csv file and put some data in it. For example:
['www.google.com', 'www.kiet.edu', 'animals', 'www.yahoo.com', 'birds', 'lion', 'www.youtube.com']
Now I want to sort the data which have .com using Python. How could it be done?
l = ['www.google.com', 'www.kiet.edu', 'animals', 'www.yahoo.com', 'birds', 'lion', 'www.youtube.com']
nl = [x for x in l if '.com' in x]
nl.sort()
print(nl)
Output: ['www.google.com', 'www.yahoo.com', 'www.youtube.com']
First filter to element that contains .com, then sort.
You simply read the file and store it in a variable named data. While reading the file line by line, check for lines containing .com and add them to a list. \n is the line delimiter and we do not want to see it in our output. sorted(l) will sort the data in the list.
l = []
with open("file1.txt", "r") as data:
    for val in data:
        if ".com" in val:
            l.append(val.replace("\n", ""))
print(sorted(l))

Parsing unique values in a CSV where the primary key is not unique

This seems pretty trivial. Generally, I'd do something like the following:
results = []
reader = csv.reader(open('file.csv'))
for line in reader:  # iterate over the lines in the csv
    if line[1] in ['XXX', 'YYY', 'ZZZ']:  # check if the 2nd element is one you're looking for
        results.append(line)  # if so, add this line to the results list
However, my data set isn't so simply formatted. It looks like the following:
Symbol,Values Date
XXX,8/2/2010
XXX,8/3/2010
XXX,8/4/2010
YYY,8/2/2010
YYY,8/3/2010
YYY,8/4/2010
ZZZ,8/2/2010
ZZZ,8/3/2010
ZZZ,8/4/2010
Essentially what I am trying to do is parse the first date for each unique Symbol in the list such that I end up with the following:
XXX,8/2/2010
YYY,8/2/2010
ZZZ,8/2/2010
Pandas may help. ;-)
import pandas
pandas.read_csv('file.csv').groupby('Symbol').first()
Here is a simple solution using a set of the first elements already seen (note that in your data the symbol is the first column, line[0]):
results = []
already_done = set()
reader = csv.reader(open('file.csv'))
for line in reader:  # iterate over the lines in the csv
    if line[0] in ['XXX', 'YYY', 'ZZZ'] and line[0] not in already_done:
        results.append(line)  # first time we see this symbol: keep the line
        already_done.add(line[0])
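If you don't want to hard-code the symbol list, a plain dict keyed on the symbol also works: dict.setdefault keeps only the first row seen per symbol. A self-contained sketch using the sample data from the question:

```python
import csv
import io

data = """Symbol,Values Date
XXX,8/2/2010
XXX,8/3/2010
YYY,8/2/2010
ZZZ,8/2/2010
"""

first_seen = {}
reader = csv.reader(io.StringIO(data))
next(reader)  # skip the header row
for row in reader:
    # setdefault only stores the row the first time this symbol appears
    first_seen.setdefault(row[0], row)

print(list(first_seen.values()))
# [['XXX', '8/2/2010'], ['YYY', '8/2/2010'], ['ZZZ', '8/2/2010']]
```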

Reading text file into dic file results in incomplete dic file

The following file 2016_01_22_Reps.txt is a list of expanded contractions that I want to put into a Python dict:
“can't":"cannot","could've":"could have","could've":"could have","didn't":"did not","doesn't":"does not", “don't":"do not"," hadn't":"had not", "hasn't":"has not","haven't":"have not","I'll":"I will","I'm":"I am","I've":"I have","isn't":"is not","I'll":"I
Note that the contents are a single line, not multiple lines.
My code is as follows;
reps = open('2016_01_22_Reps.txt', 'r')
Reps1dic = {}
for line in reps:
    x = line.split(",")
    a = x[0]
    b = x[1]
    c = len(b) - 1
    b = b[0:c]
    Reps1dic[a] = b
print(Reps1dic)
The output to Reps1dic stops after first two pairs of contractions. Contents are as follows;
{‘2016_01_22Reps = {“can\’t”:”cannot”‘ : ‘”could\’ve”:”could have’}
Instructions and explanation of why the complete file contents are not written to the dic file will be most appreciated.
The problem is that your values are all on one line, so your for line in reps only goes through the one iteration. Do something like this:
with open('2016_01_22_Reps.txt', 'r') as reps:
    Reps1dic = {}
    contents = reps.read()
    pairs = contents.split(',')
    for pair in pairs:
        parts = pair.split(':')
        a = parts[0].replace('"', '').strip()
        b = parts[1].replace('"', '').strip()
        Reps1dic[a] = b
print(Reps1dic)
where you split the line and then iterate over that list instead of the lines in the file. I also used the with keyword to open your file - it's much better practice.
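Here is a self-contained sketch of that split-and-strip approach on a small sample string (assuming plain ASCII quotes; the curly quotes visible in the question's file would need replacing first):

```python
# Sample contents in the same shape as the file: "key":"value" pairs, comma-separated
contents = '"can\'t":"cannot","could\'ve":"could have","didn\'t":"did not"'

reps = {}
for pair in contents.split(','):
    key, value = pair.split(':')
    # Strip the surrounding double quotes and any stray whitespace
    reps[key.replace('"', '').strip()] = value.replace('"', '').strip()

print(reps)
# {"can't": 'cannot', "could've": 'could have', "didn't": 'did not'}
```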

adding string at the end of the file python

Clarification:
So if my file has 10 lines:
The first line is a heading, so I want to append some text at the end of the first line.
Then I have a list which contains 9 elements.
I want to read that list and append the end of each line with the corresponding element.
So basically list[0] goes to the second line, list[1] to the third line, and so on.
I have a file which is delimted by comma.
something like this:
A,B,C
0.123,222,942
......
Now I want to do something like this:
A,B,C,D #append "D" just once
0.123,222,942,99293
............
This "D" is actually saved in a list so yeah I have this "D"
How do I do this? I mean I know the naive way.
like going through each line and doing something like
string += str(list[i])
Basically how do i append something at the end of the file in pythonic way :)
Just create a new file:
data = ['header', 1, 2, 3, 4]
with open("infile", 'r') as inf, open("infile.2", 'w') as outf:
    outf.writelines('%s,%s\n' % (s.strip(), n) for s, n in zip(inf, data))
If you want to "update" the input file, just rename the new one afterwards:
import os
os.unlink("infile")
os.rename("infile.2", "infile")
Short answer: Use the csv module.
Long answer:
import csv

newvalues = [...]

with open("path/to/input.csv", newline="") as file:
    data = list(csv.reader(file))

with open("path/to/input.csv", "w", newline="") as file:
    writer = csv.writer(file)
    for row, newvalue in zip(data, newvalues):
        row.append(newvalue)
        writer.writerow(row)
Naturally, this depends on the lines in the file and newvalues being the same length. If this isn't the case, you could use something like zip_longest to fill in the excess lines with a given value.
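For reference, a sketch of the zip_longest variant, padding a too-short newvalues list with empty strings (the sample rows are made up; this assumes the rows outnumber the new values, since a padded row would be a string rather than a list):

```python
import csv
import io
from itertools import zip_longest

rows = [["A", "B"], ["1", "2"], ["3", "4"]]  # rows already in the file
newvalues = ["C", "5"]                       # one value short of the row count

out = io.StringIO()
writer = csv.writer(out)
for row, newvalue in zip_longest(rows, newvalues, fillvalue=""):
    row.append(newvalue)  # missing values become empty trailing cells
    writer.writerow(row)

print(out.getvalue())
```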
If you are writing to a different file, we can do it even more easily:
import csv

newvalues = [...]

with open("path/to/input.csv", newline="") as infile, open("path/to/output.csv", "w", newline="") as outfile:
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row, newvalue in zip(reader, newvalues):
        row.append(newvalue)
        writer.writerow(row)
This also has the advantage of not reading the entire file into memory, so for very large files, this is a better solution.
