I am using Python. I have a CSV file whose values are separated by tabs.
I applied a rule to each of its rows and created a new CSV file, but the resulting file is comma-separated. I want this new CSV to be tab-separated as well. How can I do it?
I understand that sep='\t' can work, but where do I apply it?
I tried the following code, but it didn't work either:
df = pd.read_csv('data.csv', header=None)
df_norm= df.apply(lambda x:np.where(x>0,x/x.max(),np.where(x<0,-x/x.min(),x)),axis=1)
df_norm.to_csv("file.csv", sep="\t")
Have you tried this?
pd.read_csv('file.csv', sep='\t')
I found the issue: the rule had changed the dtype to "object", because of which I was unable to perform any further operations. I followed "Remove dtype at the end of numpy array" and converted my data frame to a list, which solved the issue.
df = pd.read_csv('data.csv', header=None)
df_norm= df.apply(lambda x:np.where(x>0,x/x.max(),np.where(x<0,-x/x.min(),x)),axis=1)
df_norm=df_norm.tolist()
df_norm = np.squeeze(np.asarray(df_norm))
np.savetxt('result.csv', df_norm, delimiter=",")
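For reference, a minimal sketch of keeping tabs at both ends: pass sep='\t' to read_csv so the input is split into columns, and again to to_csv when writing. The inline sample data and the scale_row helper are made up for illustration; scale_row applies the same rule as the lambda above but returns a Series, so that apply yields a numeric dataframe instead of object arrays.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical tab-separated input standing in for data.csv.
raw = "1\t-2\t4\n-3\t6\t0\n"
df = pd.read_csv(io.StringIO(raw), sep="\t", header=None)


def scale_row(row):
    """Scale positives by the row max and negatives by the row min."""
    values = row.to_numpy(dtype=float)
    scaled = np.where(
        values > 0,
        values / values.max(),
        np.where(values < 0, -values / values.min(), values),
    )
    # Returning a Series keeps the result a regular float dataframe.
    return pd.Series(scaled, index=row.index)


df_norm = df.apply(scale_row, axis=1)

# Write with tabs; a file path works the same way as this in-memory buffer.
buffer = io.StringIO()
df_norm.to_csv(buffer, sep="\t", header=False, index=False)
tsv_text = buffer.getvalue()
print(tsv_text)
```

If the input is read without sep='\t', each line lands in a single string column, which is one way the rule can end up producing object data.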
Related
I am trying to read a data file with a header. The data file is attached and I am using the following code:
import pandas as pd
data=pd.read_csv('TestData.out', sep=' ', skiprows=1, header=None)
The issue is that I have 20 columns in my data file, but I am getting 32 columns in the variable data. How can I resolve this issue? I am very new to Python and still learning.
Data_File
Your text file has two spaces together in front of any value that does not have a minus sign. With sep=' ', pandas sees this as two delimiters with nothing (NaN) in between.
This will fix it:
data = pd.read_csv('TestData.out', sep=r'\s+', skiprows=1, header=None)
In this case sep is interpreted as a regex that looks for "one or more whitespace characters" as the delimiter, and it returns columns 0 through 19.
Your data file has inconsistent space delimitation, so you just have to skip any extra spaces after the delimiter. This simple code works:
data = pd.read_csv('TestData.out', sep=' ', skiprows=1, header=None, skipinitialspace=True)
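To see the difference between the three variants, here is a small sketch on made-up data (the sample text below is an assumption, not the attached file). It has three values per line, with a doubled space in front of values that lack a minus sign.

```python
import io

import pandas as pd

# Hypothetical sample: one header line, then 3 values per line.
sample = "x y z\n-1.5  2.0 -3.0\n-4.0 -5.0  6.0\n"

# A single space as separator: the doubled space yields an empty (NaN) column.
naive = pd.read_csv(io.StringIO(sample), sep=" ", skiprows=1, header=None)

# Regex separator: any run of whitespace counts as one delimiter.
by_regex = pd.read_csv(io.StringIO(sample), sep=r"\s+", skiprows=1, header=None)

# Single space, but extra spaces after a delimiter are skipped.
by_skip = pd.read_csv(
    io.StringIO(sample), sep=" ", skiprows=1, header=None, skipinitialspace=True
)

print(naive.shape, by_regex.shape, by_skip.shape)
```

Only the first call reports an extra column; the other two recover the three real columns.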
def Text2Col(df_File):
    for i in range(0,len(df_File)):
        with open(df_File.iloc[i]['Input']) as inf:
            with open(df_File.iloc[i]['Output'], 'w') as outf:
                i=0
                for line in inf:
                    i=i+1
                    if i==2 or i==3:
                        continue
                    outf.write(','.join(line.split(';')))
The code above converts a CSV file from text to columns (semicolon-delimited to comma-separated).
It turns every value into a string (because of split()), which is a problem for me.
I tried using the map function but couldn't make it work.
Is there any other way to do this?
My input file has 5 columns: the first column is a string, the second is an int, and the rest are floats.
I think the last statement needs some modification:
outf.write(','.join(line.split(';')))
Please let me know if any other input is required.
OK, trying to help here. If this doesn't work, please specify in your question what is missing or what else needs to be done.
Use pandas to read in a csv file:
import pandas as pd
df = pd.read_csv('your_file.csv')
If you have a header on the first row, then use:
import pandas as pd
df = pd.read_csv('your_file.csv', header=0)
If you have a tab delimiter instead of a comma delimiter, then use:
import pandas as pd
df = pd.read_csv('your_file.csv', header=0, sep='\t')
Thank you!
The following code worked:
def Text2Col(df_File):
    for i in range(0,len(df_File)):
        df = pd.read_csv(df_File.iloc[i]['Input'],sep=';')
        df = df[df.index != 0]
        df= df[df.index != 1]
        df.to_csv(df_File.iloc[i]['Output'])
File_List="File_List.csv"
df_File=pd.read_csv(File_List)
Text2Col(df_File)
Input files are kept in the same folder, with the same names as listed in File_List.csv.
Output files are created in the same folder with the values separated into columns. I deleted rows 0 and 1 for my use; you can skip this or delete other rows depending on your requirements.
In the code above, df_File is a dataframe with two columns: the first column is the input file name and the second is the output file name.
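As a possible variation, the unwanted lines can be skipped while reading, so that pandas infers int and float columns from the remaining rows. This is only a sketch on invented data: the column names and the two filler lines are assumptions, and skiprows=[1, 2] drops the second and third lines of the file (the ones the original loop skipped with i==2 or i==3).

```python
import io

import pandas as pd

# Hypothetical semicolon-delimited input: header, two lines to drop, then data.
raw = (
    "name;count;a;b;c\n"
    "units;-;-;-;-\n"
    "notes;-;-;-;-\n"
    "foo;1;0.5;1.5;2.5\n"
    "bar;2;3.5;4.5;5.5\n"
)

# skiprows takes 0-based line numbers, so [1, 2] removes lines 2 and 3.
df = pd.read_csv(io.StringIO(raw), sep=";", skiprows=[1, 2])
print(df.dtypes)

# Write comma-separated output without the extra index column.
out = io.StringIO()
df.to_csv(out, index=False)
csv_text = out.getvalue()
print(csv_text)
```

Because the non-numeric lines never reach the parser, the second column stays an integer column and the last three stay floats.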
I have a problem with Excel. I downloaded a CSV whose data is not separated by commas or any other obvious delimiter and is not split into rows or individual cells. Here is a screenshot. How can I split this data in Python or R so that I can work with the values one by one?
Thanks in advance.
If the number of columns is the same on all lines, this should work:
import pandas as pd
df = pd.read_csv('my_full_filepath.csv', sep ='|')
print(df.info())
Otherwise, as a first approach, you can try skipping the malformed lines:
df = pd.read_csv('my_full_filepath.csv', sep ='|', on_bad_lines='skip')  # use error_bad_lines=False on pandas older than 1.3
print(df.info())
If data is dropped, you may have to read the file with an argument giving the maximum number of columns in your file.
This can be done, for example, by:
n = 10  # placeholder: replace with the maximum number of columns in your file
df = pd.read_csv('my_full_file_path.csv', sep ='|', names = range(n))
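A small sketch of the last approach, on invented pipe-separated text with a different number of fields per line. Counting the fields first to obtain n is my own addition, not part of the answer above, and it assumes the delimiter never appears inside a quoted value.

```python
import io

import pandas as pd

# Hypothetical ragged input: 3, 2 and 4 fields per line.
raw = "a|b|c\n1|2\n3|4|5|6\n"

# Widest line decides how many column names are needed.
n = max(len(line.split("|")) for line in raw.splitlines())

# With explicit names, shorter lines are padded with NaN instead of failing.
df = pd.read_csv(io.StringIO(raw), sep="|", names=list(range(n)))
print(df)
```

Nothing is dropped this way; short rows simply end in missing values.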
I'm trying to export this dataframe to CSV using pandas, with a space as the separator. I do this because I have two dataframes and I want to stack them side by side (horizontally) instead of vertically. I use concat to do this, but when I export to CSV, each row ends up in a single cell. So I tried to enforce a space separator (argument sep=" ") when writing the CSV. However, the output still ignores the separator and the result is still single-column data. My code is as follows:
import pandas as pd
file='RALS-03.csv'
df=pd.read_csv(file)
#get two items on two different columns
items1=df.iloc[:,0]
items2=df.iloc[:,4]
#get corresponding numbers for the two columns of items
mar1=df.iloc[:,2]
mar2=df.iloc[:,6]
#stack them vertically
df1=items1.append(items2)
df2=mar1.append(mar2)
#put them side by side
df4=pd.concat([df1, df2], axis=1)
#write to CSV file, with a space as a separator
df4.to_csv("new.csv", sep=" ")
Any advice? Thanks in advance.
Use sep= '\t'
df4.to_csv('new.csv', sep='\t')
Print df1 and df2 before the concat and check their column labels and row indexes: the column labels must not be the same, but the row indexes must be the same.
Then use the following to print the result to a file:
with open('df_out.txt', 'w') as f:
    print(df4, file=f)
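The stacking and side-by-side steps can be sketched as follows, assuming a small invented dataframe in place of RALS-03.csv (the column names are made up). It uses pd.concat for the vertical stacking as well, since Series.append was removed in pandas 2.0, and ignore_index=True so both stacked columns share the same row index.

```python
import io

import pandas as pd

# Hypothetical stand-in for the file: two item columns and their numbers.
df = pd.DataFrame(
    {
        "item_a": ["x", "y"],
        "n_a": [1, 2],
        "item_b": ["z", "w"],
        "n_b": [3, 4],
    }
)

# Stack vertically; a fresh 0..n-1 index keeps the two results aligned.
items = pd.concat([df["item_a"], df["item_b"]], ignore_index=True).rename("item")
numbers = pd.concat([df["n_a"], df["n_b"]], ignore_index=True).rename("number")

# Put the two stacked columns side by side.
side_by_side = pd.concat([items, numbers], axis=1)

# Tab-separated output; a file path can be used instead of the buffer.
buffer = io.StringIO()
side_by_side.to_csv(buffer, sep="\t", index=False)
tsv_text = buffer.getvalue()
print(tsv_text)
```

The two columns get different labels and identical row indexes, which is the condition described in the answer above.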
I'm having a tough time correctly loading a CSV file into a pandas dataframe. The file is a CSV saved from MS Excel, where the rows look like this:
Montservis, s.r.o.;"2 012";"-14.98";"-34.68";"- 11.7";"0.02";"0.09";"0.16";"284.88";"10.32";"
I am using
filep="file_name.csv"
raw_data = pd.read_csv(filep,engine="python",index_col=False, header=None, delimiter=";")
(I have tried several combinations and alternatives of read_csv arguments, but without any success. I have also tried read_table.)
What I want to see in my dataframe is each semicolon-separated value in a separate column (I understand that read_csv works this way?).
Unfortunately, I always end up with the whole row placed in the first column of the dataframe. So basically, after loading I have many rows but only one column (two if I also count the index).
I have placed sample here:
datafile
Any ideas welcome.
Add quoting=3; 3 is the value of csv.QUOTE_NONE.
raw_data = pd.read_csv(filep,engine="python",index_col=False, header=None, delimiter=";", quoting = 3)
This will give [7 rows x 23 columns] dataframe
The problem is the enclosing quote characters, which can be sidestepped by escaping the delimiter with a \ character so that it is treated as a regular expression:
raw_data = pd.read_csv(filep,engine="python",index_col=False, header=None, delimiter=r'\;')
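A rough sketch of why quoting matters here. It assumes each line of the file is wrapped in an outer pair of quotes with the inner quotes doubled, which is a guess about the file and is not visible in the sample row quoted in the question. Under that assumption the default parser treats the whole line as one quoted field, while csv.QUOTE_NONE splits on every semicolon and leaves the quote characters in the values to be stripped afterwards.

```python
import csv
import io

import pandas as pd

# Hypothetical line: the whole row is wrapped in quotes, inner quotes doubled.
raw = '"A, s.r.o.;""2 012"";""-14.98"""\n'

# Default quoting: the outer quotes turn the entire line into a single field.
default = pd.read_csv(io.StringIO(raw), sep=";", header=None)

# QUOTE_NONE (the constant equals 3): quotes are ordinary characters.
unquoted = pd.read_csv(
    io.StringIO(raw), sep=";", header=None, quoting=csv.QUOTE_NONE, dtype=str
)

# Remove the leftover quote characters from every value.
cleaned = unquoted.apply(lambda col: col.str.strip('"'))
print(default.shape, unquoted.shape)
print(cleaned)
```

Numeric columns would still need converting afterwards, for example values such as "2 012" that contain a space.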