Pandas to_csv writing extra " on column containing raw json - python

I'm attempting to save my dataframe as csv after processing my data.
One caveat is I have a column containing the 'raw json' of the file as well.
When pandas saves the file using to_csv(header=False), I get the following
1,2,"{""col_1"":""1"",""col_2"":""1""}"
My dataframe looks like this:
col_1  col_2  raw_json
1      1      {"col_1":1,"col_2":1}
I've tried adding the JSON column with something like:
for i, row in df.iterrows():
    i_val = row.to_json()
    df.at[i, 'raw_json'] = i_val
Expected csv:
1,2,{"col_1":"1","col_2":"1"}

You could use something like this:
import csv
import pandas as pd
df.to_csv('output.csv', index=False, header=False, quoting=csv.QUOTE_NONE, sep=';')
As @pranav-hosangadi explained:
"CSV format uses quotes to escape fields that themselves contain the separator"
So when you set quoting=csv.QUOTE_NONE, you disable that behavior and nothing is quoted.
Important:
Note that the CSV separator will be ";" in this case, so you need to make sure that your fields do not contain ";" characters, which could break your CSV.
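To compare the two settings side by side, here is a minimal sketch. The one-row frame is a hypothetical stand-in built from the table in the question, and the output is captured as a string instead of being written to output.csv.

```python
import csv
import pandas as pd

# Hypothetical one-row frame mirroring the table in the question.
df = pd.DataFrame(
    {"col_1": [1], "col_2": [1], "raw_json": ['{"col_1":1,"col_2":1}']}
)

# Default quoting: the JSON field contains the "," separator and quote
# characters, so it is wrapped in quotes and its inner quotes are doubled.
default_line = df.to_csv(index=False, header=False).strip()

# QUOTE_NONE with ";" as the separator: nothing is quoted or escaped.
plain_line = df.to_csv(
    index=False, header=False, quoting=csv.QUOTE_NONE, sep=";"
).strip()

print(default_line)
print(plain_line)
```

If a field could itself contain ";", to_csv would also need an escapechar under QUOTE_NONE, and that changes the field content.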

Related

Split column in several columns by delimiter '\' in pandas

I have a txt file which I read into a pandas dataframe. The problem is that inside this file my text data is recorded with the delimiter '\'. I need to split the information in one column into several columns, but it does not work because of this delimiter.
I found this post on Stack Overflow that deals with a single string, but I don't understand how to apply it to a whole dataframe: Split string at delimiter '\' in python
After reading my txt file into df it looks something like this
df
column1\tcolumn2\tcolumn3
0.1\t0.2\t0.3
0.4\t0.5\t0.6
0.7\t0.8\t0.9
Basically what I am doing now is the following:
df = pd.read_fwf('my_file.txt', skiprows=8)  # skiprows is used because the file starts with irrelevant text
df['column1\tcolumn2\tcolumn3'] = "r'" + df['column1\tcolumn2\tcolumn3'] + "'"  # I tried to make it a raw string, as the post suggested, but it does not really work
df['column1\tcolumn2\tcolumn3'].str.split('\\',expand=True)
and what I get is just the following (just displayed like text inside a data frame)
r'0.1\t0.2\t0.3'
r'0.4\t0.5\t0.6'
r'0.7\t0.8\t0.9'
I am not very good with regular expressions and this seems a bit hard. How can I tackle this problem?

It looks like your file is tab-delimited, because of the "\t". This may work:
pd.read_csv('file.txt', sep='\t', skiprows=8)
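As a minimal sketch of that suggestion: the text below is a hypothetical stand-in for my_file.txt, with two irrelevant lines instead of eight, read from memory so the example is self-contained.

```python
import io
import pandas as pd

# Hypothetical file content: two irrelevant lines, then tab-delimited data.
text = (
    "some irrelevant note\n"
    "another irrelevant note\n"
    "column1\tcolumn2\tcolumn3\n"
    "0.1\t0.2\t0.3\n"
    "0.4\t0.5\t0.6\n"
)

# skiprows drops the leading lines; sep="\t" splits each row on tabs.
df = pd.read_csv(io.StringIO(text), sep="\t", skiprows=2)

print(df)
```

Each value lands in its own numeric column, so no string splitting is needed afterwards.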

How to use 'Shift In' from text in csv file to split columns

I'm trying to import csv style data from a software designed in Europe into a df for analysis.
The data uses two characters to delimit the data in the files, 'DC4' and 'SI' ("Shift In" I believe). I'm currently concatenating the files and delimiting them by the 'DC4' character using read_csv into a df. Then I use a regex line to replace all the 'SI' characters into ';' in the df. I skip every other line in the code to remove the identifiers I don't need next. If I open the data at this point everything is split by the 'DC4' and all 'SI' are converted to ;.
What would you suggest to further split the df by the ; character now? I've tried to split the df by series.string but got type errors. I've exported to csv and reimported it using ; as the delimiter, but it doesn't split the existing columns that were already split with the first import for some reason? I also get parser errors on some rows way down the df so I think there are dirty rows (this is just information I've found. If not helpful please ignore it). I can ignore these lines without affecting the data I need.
The size of the df is around 60-70 columns and usually less than 75K rows when I pull a full report. I'm using PyCharm and Python 3.8. Thank you all for any help on this, I very much appreciate it. Here is my code so far:
path = 'file directory location'  # placeholder
df = pd.concat([pd.read_csv(f, sep='\x14', comment=" ", na_values='Nothing', header=None, index_col=False)
                for f in glob.glob(path + ".file extension")], ignore_index=True)
df = df.replace('\x0f', ';', regex=True)
df = df.iloc[::2]
df.to_csv(r'new_file_location', index=False, encoding='utf-8-sig')

So you have a CSV (technically not a CSV, I guess) that's separated by two different characters (DC4 and SI) and you want to read it into a dataframe?
You can do so directly with pandas: the read_csv function accepts a regular expression as the delimiter, so you could use "\x0f|\x14" to treat either SI (ASCII 0x0F) or DC4 (ASCII 0x14) as the separator: pd.read_csv(path, sep="\x0f|\x14", engine="python")
An example with readable characters:
The csv contains:
col1,col2;col3
val1,val2,val3
val4;val5;val6
Which can be read as follows:
import pandas as pd
df = pd.read_csv(path, sep=",|;")
which results in df being:
   col1  col2  col3
0  val1  val2  val3
1  val4  val5  val6
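The same idea with the control characters themselves can be sketched as follows. This assumes the file uses SI (ASCII 0x0F) and DC4 (ASCII 0x14); the sample text is hypothetical, and a character class is used in place of the "|" alternation.

```python
import io
import pandas as pd

# Hypothetical sample: fields separated by either DC4 (0x14) or SI (0x0F).
raw = (
    "col1\x14col2\x0fcol3\n"
    "val1\x14val2\x0fval3\n"
    "val4\x0fval5\x14val6\n"
)

# A multi-character sep is treated as a regular expression, which only the
# Python parsing engine supports, so the engine is requested explicitly.
df = pd.read_csv(io.StringIO(raw), sep="[\x0f\x14]", engine="python")

print(df)
```

Reading with both delimiters at once avoids the replace-then-reimport round trip described in the question.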

Converting comma-delimited CSV to tab-delimited CSV in pandas

I am using Python. I have a CSV file whose values are separated by tabs.
I applied a rule to each of its rows and created a new CSV file, but the resulting dataframe is comma-separated. I want this new CSV to be tab-separated as well. How can I do it?
I understand that using sep='\t' can work, but where do I apply it?
I tried the following code, but it didn't work either:
import numpy as np
import pandas as pd
df = pd.read_csv('data.csv', header=None)
df_norm = df.apply(lambda x: np.where(x > 0, x / x.max(), np.where(x < 0, -x / x.min(), x)), axis=1)
df_norm.to_csv("file.csv", sep="\t")

Have you tried this?
pd.read_csv('file.csv', sep='\t')
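To make the "where do I apply it" part concrete: sep goes on both the read and the write. A minimal sketch, where the in-memory text is a hypothetical stand-in for data.csv and doubling the values is only a placeholder for the real rule:

```python
import io
import pandas as pd

# Hypothetical tab-separated input standing in for data.csv.
source = "1\t-2\t4\n3\t6\t-9\n"

# sep="\t" on read, so each value gets its own column.
df = pd.read_csv(io.StringIO(source), sep="\t", header=None)

# Placeholder for the per-row rule from the question.
processed = df * 2

# sep="\t" on write, so the output is tab-separated as well.
out = processed.to_csv(sep="\t", header=False, index=False)

print(out)
```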
I found the issue: the rule had changed the dtype to "object", because of which I was unable to perform any further operations. I followed "Remove dtype at the end of numpy array" and converted my data frame to a list, which solved the issue.
df = pd.read_csv('data.csv', header=None)
df_norm= df.apply(lambda x:np.where(x>0,x/x.max(),np.where(x<0,-x/x.min(),x)),axis=1)
df_norm=df_norm.tolist()
df_norm = np.squeeze(np.asarray(df_norm))
np.savetxt('result.csv', df_norm, delimiter=",")
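A possible alternative that stays in pandas, offered as a sketch and not as what the poster did: passing result_type="expand" to apply spreads each returned array back into columns, so the result can be written with sep="\t". The sample data and the helper name normalize are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric data standing in for data.csv.
df = pd.DataFrame([[1.0, -2.0, 4.0], [3.0, 6.0, -9.0]])

def normalize(row):
    # Same rule as in the question: scale positives by the row max,
    # negatives by the row min, and leave zeros unchanged.
    return np.where(row > 0, row / row.max(),
                    np.where(row < 0, -row / row.min(), row))

# result_type="expand" turns each returned array into columns, giving a
# DataFrame instead of a Series of arrays.
df_norm = df.apply(normalize, axis=1, result_type="expand")

tab_text = df_norm.to_csv(sep="\t", header=False, index=False)
print(tab_text)
```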

Tab-separated CSV file exported in only one column

I'm new to Python and want to import some CSV data with pandas. The data is separated with tabs ('\t').
The CSV file has 35339 rows and 23 columns.
Reading the data with pandas seems to work, but when I try to visualize the data it reports only 35339 rows and 1 column. Even though the data looks correct when printed in the console, it seems that all rows but only one column are exported.
I tried several different options for importing the data with pandas. I also tried the csv reader but did not get the expected result. Here is a snapshot of the data.
import pandas as pd
import glob
for filename in glob.glob('*.csv'):
    print(filename)
    sensor_df = pd.read_csv(filename, sep='\t', low_memory=False)
    print(sensor_df)
The output is
[35338 rows x 1 columns]
Expected is
[35338 rows x 23 columns]

Try skiprows; it looks like the first row is a comment about the separator.
sensor_df = pd.read_csv(filename, sep='\t', low_memory=False, skiprows=1)
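A minimal sketch of that fix, assuming the file starts with a single non-data line. The file content and column names below are hypothetical.

```python
import io
import pandas as pd

# Hypothetical file content: one leading comment line, then tab-delimited data.
text = (
    "# separator: tab\n"
    "time\ttemp\thumidity\n"
    "0\t21.5\t40\n"
    "1\t21.7\t41\n"
)

# skiprows=1 drops the comment line so the real header row supplies
# the column names.
sensor_df = pd.read_csv(io.StringIO(text), sep="\t", skiprows=1)

print(sensor_df.shape)
```

Without skiprows, the comment line would be taken as the header, which is one way to end up with a single reported column.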

pandas.read_csv not partitioning data at semicolon delimiter

I'm having a tough time correctly loading a CSV file into a pandas dataframe. The file is a CSV saved in MS Excel, where the rows look like this:
Montservis, s.r.o.;"2 012";"-14.98";"-34.68";"- 11.7";"0.02";"0.09";"0.16";"284.88";"10.32";"
I am using
filep="file_name.csv"
raw_data = pd.read_csv(filep,engine="python",index_col=False, header=None, delimiter=";")
(I have tried several combinations and alternatives of read_csv arguments, but without any success. I have also tried read_table.)
What I want to see in my dataframe is each semicolon-separated value in a separate column (I understand that read_csv works this way?).
Unfortunately, I always end up with the whole row placed in the first column of the dataframe. So basically, after loading I have many rows but only one column (two if I also count the index).
I have placed sample here:
datafile
Any ideas are welcome.

Add quoting=3; 3 is the value of csv.QUOTE_NONE.
raw_data = pd.read_csv(filep,engine="python",index_col=False, header=None, delimiter=";", quoting = 3)
This will give [7 rows x 23 columns] dataframe
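A small sketch of what QUOTE_NONE does here, using hypothetical rows shaped like the sample in the question. The quote characters are kept as part of each field, so they are stripped in a second step.

```python
import csv
import io
import pandas as pd

# Hypothetical rows shaped like the sample: semicolon-separated, quoted values.
raw = (
    'Montservis, s.r.o.;"2 012";"-14.98"\n'
    'Other company;"1 000";"0.02"\n'
)

# QUOTE_NONE treats '"' as an ordinary character, so rows are split on
# every ";" regardless of quotes.
raw_data = pd.read_csv(
    io.StringIO(raw), engine="python", index_col=False, header=None,
    delimiter=";", quoting=csv.QUOTE_NONE,
)

# The quotes are still inside the values; strip them column by column.
cleaned = raw_data.apply(lambda col: col.str.strip('"'))

print(cleaned)
```

The values remain strings after stripping; converting entries such as "2 012" to numbers is a separate step.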
The problem is the enclosing quote characters, which can be ignored by escaping the delimiter with a \ character.
raw_data = pd.read_csv(filep, engine="python", index_col=False, header=None, delimiter=r'\;')
