filevalues.fillna not filling for some files only - python

import pandas as pd

readfile = pd.read_csv('42.csv')
filevalues = readfile.loc[readfile['Customer'].str.contains('Lam Dep', na=False), 'Jun-18\nQty']
filevalues = filevalues.fillna(0)
print(filevalues)
I have sales forecast files that all share the same format. When I change which file is read (right now I'm reading 42.csv), sometimes the null values in the column are filled with 0 and sometimes they are not.
I am unsure why this happens, as all the files have the same format and seem practically identical. Any thoughts on why this may be happening? And please let me know if a screenshot of my files is needed.
In addition, for the files that do not fill, when I run this program without the
filevalues.fillna(0)
line, the files that do not fill with 0 still just show blank spaces, while the files that do fill with 0 show NaN. A better way to put the question: both kinds of files have blank cells, but python seems to detect some of them as NaN and treat others as plain blanks, writing nothing there. Why??

filevalues = filevalues.replace(r'^\s*$', np.nan, regex=True)
Figured it out! It seems that some of the files had empty strings where others had true NaNs (no idea why). The line above (with numpy imported as np) converts empty and whitespace-only strings to NaN, and after that fillna worked!
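For anyone hitting the same thing, a minimal sketch of the empty-string-versus-NaN difference (the column values here are invented for illustration):

```python
import pandas as pd
import numpy as np

# A column where some cells are real NaN and others are empty or
# whitespace-only strings (both look "blank" in Excel).
filevalues = pd.Series([5, "", "  ", np.nan, 12])

# fillna alone only catches the real NaN...
print(filevalues.fillna(0).tolist())  # [5, '', '  ', 0, 12]

# ...so first turn whitespace-only strings into NaN, then fill.
filevalues = filevalues.replace(r'^\s*$', np.nan, regex=True).fillna(0)
print(filevalues.tolist())  # [5, 0, 0, 0, 12]
```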

Related

Pandas reads almost every column in .txt as index - SOLVED

I have a file named "sample name_TIC.txt". The first three columns in this file are useful - Scan, Time, and TIC. It also has 456 unneeded columns after the first 3. For other data processing, I need these unneeded columns to go away. So I wrote a bit of code to start:
import os
import pandas as pd

os.chdir(main_folder)
mydir = os.getcwd()
nameslist = ['Scan', 'Time', 'TIC']
for path, subdirs, files in os.walk(mydir):
    for file in files:
        if file.endswith('TIC.txt'):
            myfile = os.path.join(path, file)
            TIC_df = pd.read_csv(myfile, sep="\t", skiprows=1, usecols=[0, 1, 2], names=nameslist)
Normally, the for loop is set into a function that is iterated over a very large set of folders with a lot of samples, hence the os.walk stuff, but we can ignore that right now. This code will be completed to save a new .txt file with only the 3 relevant columns.
The problem comes in the last line, the pd.read_csv call. It results in a dataframe whose index column comprises the data from the first 456 columns, while the last 3 columns of the .txt are given the names in nameslist and are callable as columns in pandas (i.e. using .iloc). This is not a multi-index; it is a single index holding all the data and whitespace of those first columns.
In this example sep="\t" is used because that's how Excel successfully imports the file. But I've also tried:
sep="\s"
delimiter=r"\s+" rather than a sep argument
including header=None
not including the usecols argument (edit: I made an error here and did not check the proper result from this change; this was actually the correct solution - see the edit below or the answer)
setting index_col=False
How can I get pd.read_csv to take the first 3 columns and ignore the rest?
Thanks.
EDIT: In my end-of-day foolishness, I made an error when changing the target df to the example TIC_df. In the original code this was named mz207_df, and my call was still referencing the old df name.
Changing the last line of code to:
TIC_df = pd.read_csv(myfile, sep=r"\s+", skiprows=1, usecols=[0, 1, 2], names=nameslist)
successfully resolved my problem. Using sep="\t" also worked. Sorry for wasting people's time. I will post this with an answer as well in case someone needs to learn about usecols like I did.
Answering here to make sure the problem gets flagged as answered, in case someone else searches for it.
I made an error when calling the result from the code which included the usecols=[0,1,2] argument; I was looking at an older dataframe. The following line of code successfully generated the desired dataframe.
TIC_df = pd.read_csv(myfile, sep=r"\s+", skiprows=1, usecols=[0, 1, 2], names=nameslist)
Using sep="\t" also generated the correct dataframe, but I default to \s+ to accommodate different and variable formatting from analytical machine outputs.
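For anyone else learning about usecols, here is a tiny self-contained reproduction (the junk column names are invented) showing that usecols=[0,1,2] plus a matching 3-name list keeps only the first three columns:

```python
import pandas as pd
from io import StringIO

# Miniature stand-in for the instrument output: 3 useful columns
# followed by junk columns, whitespace-separated, with a header row.
txt = ("Scan Time TIC junk1 junk2\n"
       "1 0.1 100 x y\n"
       "2 0.2 200 x y\n")

nameslist = ['Scan', 'Time', 'TIC']
# skiprows=1 drops the original header; usecols keeps only the first
# three fields, so the 3 names line up with exactly 3 columns.
TIC_df = pd.read_csv(StringIO(txt), sep=r"\s+", skiprows=1,
                     usecols=[0, 1, 2], names=nameslist)
print(TIC_df.shape)  # (2, 3) - the junk columns are gone
```

Without usecols, a names list that is shorter than the number of fields makes pandas push the leftover leading columns into the index, which is the single-giant-index symptom described above.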

Removing Excel sheets with openpyxl raises an error when I open the file

I tested the openpyxl .remove() function and it works on multiple empty files.
Problem: I have a more complex Excel file with multiple sheets that I need to remove. If I remove one or two it works; when I try to remove three or more, Excel raises an error when I open the file:
Sorry, we have troubles getting info in file bla bla.....
logs talking about pictures troubles
logs about error105960_01.xml ?
The strange thing is that it talks about picture trouble, but I don't get this error if I remove fewer than 3 sheets - and I am not trying to remove sheets with images!
Even stranger, it's always about the number: every sheet can be deleted without trouble on its own, but if I remove 3 or more, Excel yells at me.
The thing is, it's ok when Excel "repairs" the "error", but sometimes Excel reinitializes the format of the sheets (size of cells, bold, length of the characters, etc...) and everything fails :(
(bad visual that I want to avoid)
If someone has an idea, I'm running out of creativity!
For the code, I only use basic functions (simplified here, but it would be long to show more...).
import openpyxl

INPUT_EXCEL_PATH = "my_excel.xlsx"
OUTPUT_EXCEL_PATH = "new_excel.xlsx"
wb = openpyxl.load_workbook(INPUT_EXCEL_PATH)
ws = wb["sheet1"]
wb.remove(ws)
ws = wb["sheet2"]
wb.remove(ws)
ws = wb["sheet3"]
wb.remove(ws)
wb.save(OUTPUT_EXCEL_PATH)
In my case it was a leftover empty CalculationChainPart. I used DocxToSource to investigate the corrupted file. Excel will attempt to fix the file on load; save that repaired file and compare its structure to the original file. To delete descendant parts you can use the DeletePart() method (the example below uses the .NET Open XML SDK).
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(document, true)) {
    MainDocumentPart mainPart = doc.MainDocumentPart;
    if (mainPart.DocumentSettingsPart != null) {
        mainPart.DeletePart(mainPart.DocumentSettingsPart);
    }
}
CalculationChainPart can be also removed anytime.
While calculation chain information can be loaded by a spreadsheet application, it is not required. A calculation chain can be constructed in memory at load-time (source)

Parsing two files with Python

I'm still new to python and cannot manage to do what I'm looking for. I'm using Python 3.7.0.
I have one file, called log.csv, containing a log of CANbus messages.
I want to check the content of the columns labeled Data2 and Data3 whenever the value in the column labeled ID is 348.
If they are both different from "00", I want to build a new string called fault_code as "Data3+Data2".
Then I want to look up in another CSV file where this code string appears, and print column 6 of that row (labeled Description). But I want to do this last part only once per fault_code.
Here is my code:
import csv

CAN_ID = "348"
with open('log.csv') as log:
    reader = csv.reader(log, delimiter=',')
    for log_row in reader:
        if log_row[1] == CAN_ID:
            if (log_row[5] + log_row[4]) != "0000":
                fault_code = log_row[5] + log_row[4]
                with open('Fault_codes.csv') as fault:
                    readerFC = csv.reader(fault, delimiter=';')
                    for fault_row in readerFC:
                        if "0x" + fault_code in fault_row:
                            print(f"{fault_row[6]}")
Here is a part of the log.csv file
Timestamp,ID,Data0,Data1,Data2,Data3,Data4,Data5,Data6,Data7,
396774,313,0F,00,28,0A,00,00,C2,FF
396774,314,00,00,06,02,10,00,D8,00
396775,**348**,2C,00,**00,00**,FF,7F,E6,02
and this is a part of Fault_codes.csv
Level;LED Flashes;UID;FID;Type;Display;Message;Description;RecommendedAction
1;2;1;**0x4481**;Warning;F12001;Handbrake Fault;Handbrake is active;Release handbrake
1;5;1;**0x4541**;Warning;F15001;Fan Fault;blablabla;blablalba
1;5;2;**0x4542**;Warning;F15002;blablabla
Also, do you think there is a better way to do this task? I've read that Pandas can be very good for large files. Since log.csv can have 100,000+ rows, is it maybe a better idea to use it? What do you think?
Thank you for your help!
Be careful with your indentation; you get this error because you sometimes use spaces and sometimes tabs to indent.
As PM 2Ring said, reading 'Fault_codes.csv' every time you read one line of your log is really not efficient.
You should read the fault-code file once and store its content in RAM (if it fits). You can use pandas for that and store the content in a DataFrame, before reading your logs.
You do not need to store all of log.csv in RAM, so I'd keep reading it line by line with the csv module, do my stuff, write to a new file, and read the next line. No need for pandas here, as it would fill your RAM for nothing.
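A sketch of the read-once approach described above, using the sample rows from the question as stand-in files (column positions are assumed from those samples, and the Data2/Data3 bytes are altered so one lookup actually matches):

```python
import csv

# Tiny stand-ins for the two files, built from the sample rows in the question.
with open('Fault_codes.csv', 'w') as f:
    f.write("Level;LED Flashes;UID;FID;Type;Display;Message;Description;RecommendedAction\n"
            "1;2;1;0x4481;Warning;F12001;Handbrake Fault;Handbrake is active;Release handbrake\n")
with open('log.csv', 'w') as f:
    f.write("Timestamp,ID,Data0,Data1,Data2,Data3,Data4,Data5,Data6,Data7,\n"
            "396774,348,2C,00,81,44,FF,7F,E6,02\n")

CAN_ID = "348"

# Read the fault-code table once into a dict keyed by the FID column
# (index 3), mapping to the Message column (index 6).
fault_messages = {}
with open('Fault_codes.csv') as fault:
    for fault_row in csv.reader(fault, delimiter=';'):
        if len(fault_row) > 6:
            fault_messages[fault_row[3]] = fault_row[6]

# Then a single pass over the log is enough; each lookup is O(1),
# and `seen` makes sure each fault code is printed only once.
seen = set()
with open('log.csv') as log:
    for log_row in csv.reader(log, delimiter=','):
        if log_row[1] == CAN_ID and log_row[5] + log_row[4] != "0000":
            fault_code = "0x" + log_row[5] + log_row[4]  # "Data3+Data2"
            if fault_code in fault_messages and fault_code not in seen:
                seen.add(fault_code)
                print(fault_messages[fault_code])  # -> Handbrake Fault
```

The same one-time load could be done with pandas (pd.read_csv with sep=';', indexed by FID), but for a 100,000-row log the line-by-line csv loop keeps memory use minimal.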

loading a set of text files whose names are stored in an array

So I have a set of text files, each containing some columns of numbers, whose names are stored in an excel file. I need to load every file in the directory whose name matches one in the excel file. I also want to inform you all that I am a python beginner, and I'm honestly not very computer savvy (but I'm trying).
I start by loading the excel file into a dataframe and then converting it to an array. Then I was trying to loop through the array and load any files that match, with the name of the variable holding the data being the name of the text file (without the .txt):
import numpy as np
import pandas as pd

df = pd.read_excel('names.xlsx', sheet_name="Sheet 1")
array = df.values
for i in array:
    str(array[i][0]) = np.loadtxt(str(array[i][0]) + '.txt')
when I try to run this I get:
str(array[i][0]) = np.loadtxt(str(array[i][0])+'.txt')
^
SyntaxError: can't assign to function call
So my questions are: how can I assign that as the variable name, and - since the code stops before it gets there - is it valid to load the files the way I have?
I found a person to help and they led me to this:
df = pd.read_excel('names.xlsx', sheet_name="Sheet 1")
array = df.values
for i in array:
    x, y, z = np.loadtxt(i[0] + '.txt', dtype=float)
It's not exactly what I wanted to be able to do, but I can just put the other things I was going to do with the data inside the loop so that it overwrites and repeats, which will work.
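As an alternative to naming a variable after each file (which Python assignment can't do), a dict keyed by file name gives the same effect safely. A sketch, with made-up file names standing in for the ones listed in names.xlsx:

```python
import numpy as np

# Hypothetical stand-ins for two of the text files named in the spreadsheet.
np.savetxt('sample_a.txt', [[1.0, 2.0], [3.0, 4.0]])
np.savetxt('sample_b.txt', [[5.0, 6.0], [7.0, 8.0]])

# In the real code this list would come from the excel file, e.g.
# names = [row[0] for row in pd.read_excel('names.xlsx').values]
names = ['sample_a', 'sample_b']

# A dict keyed by file name replaces "a variable named after the file":
# data['sample_a'] plays the role the variable sample_a was meant to play.
data = {name: np.loadtxt(name + '.txt') for name in names}

print(data['sample_a'])
```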

csv.writer returning long strings in each row

I've been having some more problems. After you modified my code into this:
import csv
mesta=["Ljubljana","Kranj","Skofja Loka","Trzin"]
opis=["ti","mene","ti mene","ne ti mene"]
delodajalci=["GENI","MOJEDELO","MOJADELNICA","HSE"]
ime=["domen","maja","andraz","sanja"]
datum=["2.1.2014","5.10.2014","11.12.2014","5.5.2014"]
with open('sth.csv', 'w') as csvfile:
    zapis = csv.writer(csvfile)
    zapis.writerows(zip(ime, delodajalci, opis, datum, mesta))
I have one additional question. How do I get each piece of my output into its own cell instead of 5 really long rows divided by commas? Right now my output looks like:
domen,GENI,ti,2.1.2014,Ljubljana
maja,MOJEDELO,mene,5.10.2014,Kranj
andraz,MOJADELNICA,ti mene,11.12.2014,Skofja Loka
sanja,HSE,ne ti mene,5.5.2014,Trzin
I hope you will be able to help me. Thank you in advance. Cheers.
A csv file (comma-separated values file) is meant to have exactly those comma-separated long rows you describe. To open the file with each value in its own cell - say, in Excel - the .csv extension will usually take care of it; otherwise you may need to import the file and indicate that the separator is a comma. If you don't have Excel, try searching for a csv viewer (there are many free ones available). In either case, your output looks correct; I think you just need a bit of help opening the file in your program of choice.
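If you want to convince yourself that the commas are cell separators rather than part of the data, you can round-trip the file with the csv module (shown here with a two-item subset of the lists above):

```python
import csv

ime = ["domen", "maja"]
mesta = ["Ljubljana", "Kranj"]

# newline='' is the documented way to open files for the csv module.
with open('sth.csv', 'w', newline='') as csvfile:
    csv.writer(csvfile).writerows(zip(ime, mesta))

# Reading it back, every comma-separated value becomes its own list
# element - i.e. its own spreadsheet cell.
with open('sth.csv', newline='') as csvfile:
    rows = list(csv.reader(csvfile))
print(rows)  # [['domen', 'Ljubljana'], ['maja', 'Kranj']]
```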
