Unable to write a new DataFrame into csv file - python

I am trying to write a new DataFrame into a CSV file, but nothing gets written to the file (it stays empty) and no error is raised.
I suspect the issue is with this line, since I am trying to write a single value to the column:
order['trade_type'] = trade_type
Any idea what's wrong here?
import pandas as pd

def write_transaction_to_file(trade_type):
    order = pd.DataFrame()
    order['trade_type'] = trade_type
    order.to_csv("open_pos.csv", sep='\t', index=False)

write_transaction_to_file('SELL')

Your code creates an empty DataFrame, without even column names.
Now look at order['trade_type'] = trade_type.
If order already contained some rows, one of its columns were named 'trade_type' (a string), and trade_type (the variable) were a scalar, then every row of order would receive this value in that column.
But since order contains no rows, there is no place to write the value to.
Instead you can append a row, e.g.:
order = order.append({'trade_type': trade_type}, ignore_index=True)
(Note that DataFrame.append was deprecated and then removed in pandas 2.0; pd.concat is its replacement there.)
The rest of the code is OK; passing the output file name as an ordinary string is also fine.
Other solution: just create a DataFrame with a single row and a single named column, filled with your variable:
order = pd.DataFrame([[trade_type]], columns=['trade_type'])
Then write it to the CSV file as before.
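Putting the second solution together, here is a minimal sketch of the corrected function (same file name and separator as in the question; the pd.concat line in the comment is only an append-style alternative for pandas 2.x, not part of the original answer):

import pandas as pd

def write_transaction_to_file(trade_type):
    # One row, one named column, filled with the passed-in value.
    order = pd.DataFrame([[trade_type]], columns=['trade_type'])
    # Append-style alternative starting from an empty frame (DataFrame.append was removed in pandas 2.0):
    # order = pd.concat([pd.DataFrame(), pd.DataFrame([{'trade_type': trade_type}])], ignore_index=True)
    order.to_csv("open_pos.csv", sep='\t', index=False)

write_transaction_to_file('SELL')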

Related

Is there a way to split data from a cell in a .xlsm to individual arrays in python?

I have .xlsm files that contain 4 values that I want to be able to separate into columns of data. These values are all in a single cell in .xlsm files along with the column names.
I tried reading in the file with
pd.read_excel("1.xlsm", delimiter = ",")
but am not able to separate each value into a new column. I want to be able to call each column using pandas so I can access each column individually.
That's not working because cell A3, for example, contains that entire string. What you can do is split the string at the commas, which will give you Time(s), A, B, C, D.
df['A'].str.split(",", n=-1, expand=True)
n=-1 means to split on every comma it sees, and expand=True means it will create a new column for each piece.
I can't test this with recreating your Excel file, so please let me know if it works or not. :)
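A fuller sketch of that idea (the file name "1.xlsm" comes from the question; the column label 'A' and the final column names are guesses based on the description above):

import pandas as pd

# Read the workbook; the openpyxl engine handles .xlsm files.
df = pd.read_excel("1.xlsm", engine="openpyxl")

# 'A' is assumed to be the label of the single combined column.
parts = df['A'].str.split(",", n=-1, expand=True)

# If the split yields five pieces, name them (names guessed from the description).
parts.columns = ['Time(s)', 'A', 'B', 'C', 'D']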

How to compile list of rows from excel and make it as a list array in Python

def check_duplication(excelfile, col_Date, col_Name):
    list_rows = []
Above is a bit of the code.
How do I make lists in Python from an Excel file? I want to compile every row that contains the values of Date and Name in the sheet from Excel and make it into a list. The reason I want a list is that later I want to compare the rows within it to check whether there are duplicates.
Dataframe Method
To compare Excel content, you do not need to make a list. But if you want to make one, a starting point may be building a DataFrame, which you can inspect in Python. To make a DataFrame, use:
import pandas as pd
doc_path = r"the_path_of_excel_file"
sheets = pd.read_excel(doc_path, sheet_name=None, engine="openpyxl", header=None)
These lines read all of the Excel document's sheets, without headers. You may change the parameters.
(For more information: https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html)
Assume Sheet1 is the sheet our data is in. With sheet_name=None, read_excel returns a dict keyed by sheet name, so:
d_frame = sheets["Sheet1"]
list_rows = [d_frame.iloc[i, :] for i in range(d_frame.shape[0])]
I assume you want to use all columns; the list comprehension above gives you the list of rows.
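Since the eventual goal is to find duplicate rows by Date and Name, here is a short, hedged sketch of that check (the parameters excelfile, col_Date and col_Name are taken from the question's function signature; everything else is assumed):

import pandas as pd

def check_duplication(excelfile, col_Date, col_Name):
    # Read the sheet that holds the data (a header row is assumed here).
    df = pd.read_excel(excelfile, engine="openpyxl")

    # Rows that share the same Date and Name as an earlier row.
    duplicated_mask = df.duplicated(subset=[col_Date, col_Name], keep="first")
    return df[duplicated_mask]

# Example call with hypothetical column names:
# dupes = check_duplication(r"the_path_of_excel_file", "Date", "Name")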

Splitting column of dataframe based on text characters in cells

I imported a .csv file with a single column of data into a dataframe that I am trying to clean up by splitting the column based on various string occurrences within the cells. I've tried numerous means to split the column, but can't seem to get it to work. My latest attempt was using the following:
df.loc[:,'DataCol'] = df.DataCol.str.split(pat=':\n',expand=True)
df
The result is a dataframe that is still one column and completely unchanged. What am I doing wrong? This is my first time doing anything like this so please forgive the simple question.
df.loc creates a copy of the column you've selected; try replacing the code below with df['DataCol'], which references the actual column in the original DataFrame.
df.loc[:,'DataCol']
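Also worth noting: with expand=True, str.split returns a DataFrame with several columns, so assigning it back into a single column generally won't give what you expect. Here is a sketch of keeping the pieces as separate new columns instead (the pattern ':\n' and the column name 'DataCol' come from the question; the file name and the 'part_' prefix are made up):

import pandas as pd

df = pd.read_csv("input.csv")   # hypothetical file name

# Split on the pattern and get one new column per piece.
parts = df['DataCol'].str.split(pat=':\n', expand=True)

# Attach the pieces next to the original data.
df = df.join(parts.add_prefix('part_'))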

Import table to DataFrame and set group of column as list

I have a table (tab-delimited .txt file) in the following form:
each row is an entry;
the first row contains the headers;
the first 5 columns are simple numeric parameters;
all columns after the 7th are supposed to form a list of values.
My problem is: how can I import this and create a DataFrame where the last column contains a list of values?
---- Problem 1 ----
The header (first row) is "shorter", containing only the names of some columns. The values after the 7th column have no header (because they are supposed to be a list). If I import the file as is, this appears to confuse the import functions.
If, for example, I import it as follows:
df = pd.read_table(path, sep="\t")
the resulting DataFrame has only as many columns as there are elements in the first row. Moreover, the data values are mismatched across columns.
---- Problem 2 ----
What is really confusing to me is that if I open the .txt in Excel and save it as tab-delimited (without changing anything), I can then import it without problems, headers included: columns with no header are simply given an "Unnamed XYZ" tag.
Why would saving in Excel change anything? Using Notepad++ I can see only one difference: the original .txt has Unix (LF) line endings, while the one saved from Excel has Windows (CR LF). Both are UTF-8, so I do not understand how this would be an issue.
Nevertheless, from there I could manipulate the data and try to gather all the columns I want into a list. However, I hope there is a more elegant and faster way to do it.
Here is a screen-shot of the .txt file
Thank you,
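One possible way to end up with a list column from such a file, sketched under the assumption that the header row names exactly the first 7 columns (the file name and the "values" label are made up), is to parse the lines manually:

import pandas as pd

path = "data.txt"   # hypothetical file name

with open(path, encoding="utf-8") as fh:
    header = fh.readline().rstrip("\n").split("\t")   # the short header row
    rows = []
    for line in fh:
        fields = line.rstrip("\n").split("\t")
        # Keep the first 7 fields as scalars, gather the rest into one list.
        rows.append(fields[:7] + [fields[7:]])

df = pd.DataFrame(rows, columns=header[:7] + ["values"])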

how to write an empty column in a csv based on other columns in the same csv file

I don't know whether this is a very simple question, but I would like to fill a column using a condition based on two other columns.
I have two columns, age and SES, plus another, empty column that should be based on these two. For example, when a person is 65 years old and their socio-economic status is high, then a value of 1 is written in the third column (the empty column, vitality class). I have an idea of what I want to achieve, but no idea how to implement it in Python itself. I know I should use a for loop and I know how to write conditions, but because I need to take two columns into consideration to determine what gets written into the empty column, I have no idea how to write that as a function,
and furthermore how to write it back into the same CSV (into the respective empty column).
Use the pandas module to import the CSV as a DataFrame object. Then you can use logical conditions to fill the empty column:
import pandas as pd
df = pd.read_csv('path_to_file.csv')
df.loc[(df['age']==65) & (df['SES']=='high'), 'vitality_class'] = 1
df.to_csv('path_to_new_file.csv', index=False)
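If several age/SES combinations map to different vitality classes, one option is numpy.select; here is a sketch with invented thresholds and class values:

import numpy as np
import pandas as pd

df = pd.read_csv('path_to_file.csv')

# Hypothetical rules: each condition gets the corresponding class value.
conditions = [
    (df['age'] >= 65) & (df['SES'] == 'high'),
    (df['age'] >= 65) & (df['SES'] == 'low'),
]
choices = [1, 2]

df['vitality_class'] = np.select(conditions, choices, default=0)
df.to_csv('path_to_new_file.csv', index=False)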
