Parsing two files with Python

I'm still new to Python and cannot manage to do what I'm after. I'm using Python 3.7.0.
I have one file, called log.csv, containing a log of CAN bus messages.
I want to check the content of the columns labelled Data2 and Data3 when the ID column is 348.
If they are both different from "00", I want to build a new string called fault_code as Data3 + Data2.
Then I want to find the row in another CSV file where this code appears and print column 6 of that row (labelled Description). I want to do this last part only once per fault_code.
Here is my code:
import csv
CAN_ID = "348"
with open('0.csv') as log:
    reader = csv.reader(log,delimiter=',')
    for log_row in reader:
        if log_row[1] == CAN_ID:
            if (log_row[5]+log_row[4]) != "0000":
                fault_code = log_row[5]+log_row[4]
                with open('Fault_codes.csv') as fault:
                    readerFC = csv.reader(fault,delimiter=';')
                    for fault_row in readerFC:
                        if "0x"+fault_code in readerFC:
                            print("{fault_row[6]}")
Here is a part of the log.csv file
Timestamp,ID,Data0,Data1,Data2,Data3,Data4,Data5,Data6,Data7,
396774,313,0F,00,28,0A,00,00,C2,FF
396774,314,00,00,06,02,10,00,D8,00
396775,**348**,2C,00,**00,00**,FF,7F,E6,02
and this is a part of faultcode.csv
Level;LED Flashes;UID;FID;Type;Display;Message;Description;RecommendedAction
1;2;1;**0x4481**;Warning;F12001;Handbrake Fault;Handbrake is active;Release handbrake
1;5;1;**0x4541**;Warning;F15001;Fan Fault;blablabla;blablalba
1;5;2;**0x4542**;Warning;F15002;blablabla
Also, can you think of a better way to do this task? I've read that pandas can be very good for large files. As log.csv can have 100,000+ rows, it may be a better idea to use it. What do you think?
Thank you for your help!

Be careful with your indentation: you get this error because you indent with spaces in some places and tabs in others.
As PM 2Ring said, re-reading 'Fault_codes.csv' every time you read one line of your log is really inefficient.
You should read the fault-code file once and keep its content in RAM (if it fits). You can use pandas to do it and store the content in a DataFrame. I would do that before reading your logs.
You do not need to hold all of log.csv in RAM. So I'd keep reading it line by line with the csv module: process a line, write to a new file, and read the next line. There is no need for pandas here, as it would fill your RAM for nothing.
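
A minimal sketch of that approach, assuming the file layouts shown in the question: a comma-delimited log with the ID in column 1 and Data2/Data3 in columns 4 and 5, and a semicolon-delimited fault table with FID in column 3 and Description in column 7. The function names are hypothetical, and a plain dict stands in for the DataFrame to keep the example dependency-free.

```python
import csv

def load_fault_codes(path):
    """Read the fault table once and map FID (e.g. '0x4481') to its row."""
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter=";")
        next(reader, None)  # skip the header line
        return {row[3]: row for row in reader if len(row) > 3}

def report_faults(log_path, faults, can_id="348"):
    """Stream the log line by line and return each fault description once."""
    seen = set()
    descriptions = []
    with open(log_path, newline="") as f:
        for row in csv.reader(f):
            if len(row) < 6 or row[1] != can_id:
                continue
            data2, data3 = row[4], row[5]
            # Assumption taken from the question's text: both bytes must differ from "00"
            if data2 == "00" or data3 == "00":
                continue
            fault_code = "0x" + data3 + data2
            if fault_code in seen:
                continue  # report each fault code only once
            seen.add(fault_code)
            fault_row = faults.get(fault_code)
            if fault_row is not None and len(fault_row) > 7:
                descriptions.append(fault_row[7])  # Description column
    return descriptions
```

It could be used as faults = load_fault_codes('Fault_codes.csv') followed by a loop printing each item of report_faults('log.csv', faults). The dictionary lookup replaces the inner file scan, so the fault table is read exactly once however long the log is.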

Related

Reading .csv file while it is being written

I have a similar problem to the one described here below in a different question:
Reading from a CSV file while it is being written to
Unfortunately the solution is not explained.
I'd like to create a script that dynamically plots some variables from a .csv file. The .csv is updated every time a sensor registers something.
My basic idea was to read the file at a fixed interval and, if the number of rows has increased, update the plot with the new values.
How can I proceed?

I am not that experienced with csv, but take this logic:
import csv
def write_and_read(one_row, path="file.csv"):
    # Append one row; for a fresh file each time, change mode "a" to "w"
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(one_row)
    # Re-open the same file and read back everything written so far
    with open(path, "r", newline="") as f:
        return list(csv.reader(f))
rows = [["12:00:01", 21.5], ["12:00:02", 21.7]]  # placeholder data; use your own rows
for row in rows:
    print(write_and_read(row))
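
For the reading side on its own, here is a hedged sketch of the polling idea from the question: remember how far into the file you have read, and on each poll parse only the complete lines added since then. The function name read_new_rows and the UTF-8 encoding are assumptions, and the approach presumes no field contains an embedded newline.

```python
import csv
import io

def read_new_rows(path, offset=0):
    """Return (rows, new_offset) for complete lines appended after offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Consume only up to the last newline so a half-written row is left for the next poll
    end = chunk.rfind(b"\n") + 1
    text = chunk[:end].decode("utf-8")
    rows = list(csv.reader(io.StringIO(text, newline="")))
    return rows, offset + end
```

Called from a loop with a fixed delay (for example time.sleep), it returns an empty list when nothing new has arrived, so the plot only needs redrawing when rows is non-empty.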

Get different strings from a file and write a .txt

I am trying to get lines from a text file (.log) into a .txt document.
I need the same data in my .txt file each time, but the line itself sometimes differs. From what I have seen on the internet, this is usually done with a pattern that anticipates how the line is built.
1525:22Player 11 spawned with userinfo: \team\b\forcepowers\0-5-030310001013001131\ip\46.98.134.211:24806\rate\25000\snaps\40\cg_predictItems\1\char_color_blue\34\char_color_green\34\char_color_red\34\color1\65507\color2\14942463\color3\2949375\color4\2949375\handicap\100\jp\0\model\desann/default\name\Faybell\pbindicator\1\saber1\saber_malgus_broken\saber2\none\sex\male\ja_guid\420D990471FC7EB6B3EEA94045F739B7\teamoverlay\1
The line I'm working with usually looks like this. The data I am trying to collect are:
\ip\0.0.0.0
\name\NickName_of_the_player
\ja_guid\420D990471FC7EB6B3EEA94045F739B7
I then want to print these data into a .txt file. Here is my current code.
As explained above, I am unsure which keywords to search for on Google, and what this technique is called (because the string isn't always the same?).
I have been looking around a lot, and most of the tests I have done let me do some things, but I am not yet able to do what I explained above. So I am hoping for guidance here :) (Sorry if I'm noobish. I understand a lot of how it works, I just didn't learn a language in school. I mostly write small scripts and usually they work fine; this time it's much harder.)
def readLog(filename):
    with open(filename,'r') as eventLog:
        data = eventLog.read()
        dataList = data.splitlines()
        return dataList
eventLog = readLog('games.log')

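You'll need to keep the backslashes literal rather than letting Python treat them as escape sequences. That only matters for string literals in source code, which is why the sample below uses a raw string (the r prefix); a line read from disk with open(filename, 'r') already contains plain backslashes. (With open(filename, 'rb') on Python 3 you get bytes and would have to split on b'\\' instead.) To use your example, I ran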
text_input = r"1525:22Player 11 spawned with userinfo: \team\b\forcepowers\0-5-030310001013001131\ip\46.98.134.211:24806\rate\25000\snaps\40\cg_predictItems\1\char_color_blue\34\char_color_green\34\char_color_red\34\color1\65507\color2\14942463\color3\2949375\color4\2949375\handicap\100\jp\0\model\desann/default\name\Faybell\pbindicator\1\saber1\saber_malgus_broken\saber2\none\sex\male\ja_guid\420D990471FC7EB6B3EEA94045F739B7\teamoverlay\1"
text_as_array = text_input.split('\\')
You'll need to know which columns contain the strings you care about. For example,
with open('output.dat','w') as fil:
    fil.write(text_as_array[6])
You can figure these array positions from the sample string
>>> text_as_array[6]
'46.98.134.211:24806'
>>> text_as_array[34]
'Faybell'
>>> text_as_array[44]
'420D990471FC7EB6B3EEA94045F739B7'
If the column positions are not consistent but the key-value pairs are always adjacent, we can leverage that
>>> text_as_array.index("ip")
5
>>> text_as_array[text_as_array.index("ip")+1]
'46.98.134.211:24806'
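
Building on the adjacent key-value idea, a small sketch that turns the whole line into a dictionary so that fields can be looked up by name. The function names, the output layout (one space-separated line per player) and the 'userinfo:' filter are assumptions for illustration, not something the log format guarantees.

```python
def parse_userinfo(line):
    """Split a 'spawned with userinfo' log line into a key -> value dict."""
    parts = line.rstrip("\n").split("\\")
    # parts[0] is the text before the first backslash; keys and values alternate after it
    return dict(zip(parts[1::2], parts[2::2]))

def extract_players(log_path, out_path, keys=("ip", "name", "ja_guid")):
    """Write the selected fields of every userinfo line to out_path."""
    with open(log_path, "r") as log, open(out_path, "w") as out:
        for line in log:
            if "userinfo:" not in line:
                continue  # skip log lines that carry no player info
            info = parse_userinfo(line)
            out.write(" ".join(info.get(k, "") for k in keys) + "\n")
```

For example, extract_players('games.log', 'players.txt') would go through the log once and keep only the three fields asked for, wherever they sit in the line.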

Constant first row of a .csv file?

I have some Python code that logs data into a .csv file.
logging_file = 'test.csv'
dt = datetime.datetime.now()
f = open(logging_file, 'a')
f.write('\n "{:%H:%M:%S}",{},{}'.format(dt,x,y,))
The above code is the core part, and it produces continuous data in the .csv file like this:
"00:34:09" ,23.05,23.05
"00:36:09" ,24.05,24.05
"00:38:09" ,26.05,26.05
... etc.,
Now I wish to add the following line as the first row of this data: time, data1, data2. I expect the output to be:
time, data1, data2
"00:34:09" ,23.05,23.05
"00:36:09" ,24.05,24.05
"00:38:09" ,26.05,26.05
... etc.,
I tried many ways, but none of them produced the format I want.
Please help me solve the problem.

I would recommend writing a class specifically for creating and managing logs. Have it initialize a file on creation with the expected first line (don't forget the \n character!) and keep track of any necessary information about that log (the name of the log it created, where it is, etc.). The class can then 'write' to the log (append to it, really), create new logs as necessary, and check for existing logs, deciding whether to update what exists or scrap it and start over.
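
A minimal sketch of such a class, under the assumption that an existing non-empty log should be appended to rather than replaced. The class name CsvLog and its method signature are made up for illustration; the row format follows the question's code.

```python
import datetime
import os

class CsvLog:
    """Append-only CSV log that writes its header line exactly once."""

    def __init__(self, path, header):
        self.path = path
        # Write the header only if the file is missing or empty
        if not os.path.exists(path) or os.path.getsize(path) == 0:
            with open(path, "w") as f:
                f.write(header + "\n")

    def write(self, x, y):
        """Append one timestamped row."""
        now = datetime.datetime.now()
        with open(self.path, "a") as f:
            f.write('"{:%H:%M:%S}",{},{}\n'.format(now, x, y))
```

With log = CsvLog('test.csv', 'time, data1, data2'), each call to log.write(x, y) adds one row, and restarting the script does not repeat the header. Ending each row with \n, instead of starting it with one as in the question, also avoids a blank line after the header.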

csv.writer returning long strings in each row

I've been having some more problems, after you kindly modified my code into this:
import csv
mesta=["Ljubljana","Kranj","Skofja Loka","Trzin"]
opis=["ti","mene","ti mene","ne ti mene"]
delodajalci=["GENI","MOJEDELO","MOJADELNICA","HSE"]
ime=["domen","maja","andraz","sanja"]
datum=["2.1.2014","5.10.2014","11.12.2014","5.5.2014"]
with open('sth.csv','w') as csvfile:
    zapis = csv.writer(csvfile)
    zapis.writerows(zip(ime,delodajalci,opis,datum,mesta))
I have one additional question. How do I get each piece of my output into its own cell, rather than having long rows of values separated by commas? Right now my output looks like this:
domen,GENI,ti,2.1.2014,Ljubljana
maja,MOJEDELO,mene,5.10.2014,Kranj
andraz,MOJADELNICA,ti mene,11.12.2014,Skofja Loka
sanja,HSE,ne ti mene,5.5.2014,Trzin
I hope you will be able to help me. Thank you in advance. Cheers.

A CSV file (comma-separated values file) is meant to contain rows of values separated by commas, just as you show. To open the file with each value in its own cell, say in Excel, giving the file a .csv extension will usually take care of it. Otherwise, you may need to import the file and indicate that the separators are commas. If you don't have Excel, you can try searching for a CSV viewer (there are many free ones available). In either case your output looks correct; I think you just need a bit of help opening the file in your program of choice.

how to read a comma sep value with .txt extension into python as an array?

I am a biologist and very, very new to Python; before that I learnt a bit of R.
I have a very big text file (3 GB, too big to handle in R). All values are comma-separated but the extension is .txt (I don't know whether that is relevant). What I want to do is:
read it into python as an object which is equivalent of dataframe in R,
get rid of columns in the middle
reduce the size of the object
write it as txt file
take the rest to R.
If you can help me I would be very happy.
thank you

There is no real need to go through Python first. Your question looks a lot like this question. The answer marked as correct iteratively reads the large file and creates a new, smaller file. Other good alternatives are using SQLite with the sqldf package, or the ff package. This last approach works particularly well if the number of columns is small compared to the number of rows.

This will take minimal memory as it does not load the whole file at once.
import csv
with open('in.txt', 'rb') as f_in, open('out.csv', 'wb') as f_out:
    reader = csv.reader(f_in)
    writer = csv.writer(f_out)
    for row in reader:
        # keep first two columns and last three columns
        writer.writerow(row[:2] + row[-3:])
Note: If using Python 3, change the file modes to 'r' and 'w', respectively, and pass newline='' to both open() calls.

I am not familiar with R data frames, but pandas provides helpers to read a CSV file into a pandas DataFrame:
from pandas import read_csv
df = read_csv('yourfile.txt')
print(df)
print(df['Line'])  # assuming the file has a column named 'Line'
If that is not what you need, you can use the csv module to iterate through each line of your CSV as a Python list and put it into whatever data structure you want.

If you insist on a preprocessing step, the Linux command-line tools are a really good and fast option. On Linux these tools are already installed; on Windows you'll first need to install MinGW or Cygwin. This SO question already provides some nice pointers. In essence, you use awk to process the text file line by line, creating output text files as you go. Copying from the accepted answer of the SO question I linked:
awk -F "," '{ split ($8,array," "); sub ("\"","",array[1]); sub (NR,"",$0); sub (",","",$0); print $0 > array[1] }' file.txt
This reads the file, takes the first word of the eighth column as a file name, and writes the (slightly cleaned) line to that file. See the answer for more details.

Per CRAN (new features and bug fixes re: development), the new development build 3.0.0 should allow R to use the pagefile/swap. On Windows you will need to set R_MAX_MEM_SIZE to a suitably large value.
