pandas read csv then remove leading 0 then rewrite CSV - python

Hey guys, I have a csv named info.csv:
Number,Name
01,john
02,mike
010,kevin
012,joe
020,rob
I want to read in the csv using python pandas from my path, remove the leading 0, and then rewrite it to a new csv named newinfo.csv. I have not been able to find any answer on SOF covering this process.

When I import with Pandas it recognizes the first column as an integer and removes the leading zeros.
import pandas as pd
df = pd.read_csv("info.csv")
df.to_csv("newinfo.csv", index=False)
Or you could change the type to integer yourself.
df.Number = df.Number.astype(int)
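To make the behaviour above easier to check, here is a minimal sketch that uses an in-memory copy of info.csv instead of a file on disk (the io.StringIO buffers and the variable names are my own assumptions, not part of the original answer). It shows the default integer inference, and also what happens if you force the column to stay text and strip the zeros yourself:

```python
import io
import pandas as pd

# In-memory stand-in for info.csv (same rows as in the question).
csv_text = "Number,Name\n01,john\n02,mike\n010,kevin\n012,joe\n020,rob\n"

# Default parsing: the Number column is inferred as integer, so the zeros vanish.
inferred = pd.read_csv(io.StringIO(csv_text))

# Forcing the column to stay text keeps the zeros until you strip them yourself.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"Number": str})
stripped = as_text["Number"].str.lstrip("0")

# Writing the inferred frame back out gives the "new" csv without leading zeros.
buffer = io.StringIO()
inferred.to_csv(buffer, index=False)
new_csv_lines = buffer.getvalue().splitlines()
print(new_csv_lines)
```

With a real file you would pass "info.csv" and "newinfo.csv" instead of the buffers, as in the answer above.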

Do you have any code you can show for this? What have you tried so far?
You can do something like:
with open('your_file.csv', 'rb') as f:
    data = f.read()
And then slice the first element to remove the 0
Cheers!
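For what it's worth, a rough sketch of the "read it yourself and strip the 0" idea could look like the following. It swaps the raw f.read() for the standard csv module and uses in-memory buffers so it runs as-is; the buffer names are hypothetical, and with real files you would open info.csv and newinfo.csv instead:

```python
import csv
import io

# In-memory stand-ins for info.csv (input) and newinfo.csv (output).
source = io.StringIO("Number,Name\n01,john\n02,mike\n010,kevin\n012,joe\n020,rob\n")
target = io.StringIO()

reader = csv.reader(source)
writer = csv.writer(target, lineterminator="\n")

# Copy the header row unchanged.
writer.writerow(next(reader))

for number, name in reader:
    # lstrip("0") would turn a bare "0" into "", so fall back to "0" in that case.
    writer.writerow([number.lstrip("0") or "0", name])

result = target.getvalue()
print(result)
```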

Related

Python pandas extra 0 in numeric values

I have a simple code that read csv file. After that I change the names of the columns and print them. I found one weird issue that for some numeric columns its adding extra .0 Here is my code:
v_df = pd.read_csv('csvfile', delimiter=';')
v_df = v_df.rename(columns={'Order No.': 'Order_Id'})
for index, csv_row in v_df.iterrows():
    print(csv_row.Order_Id)
Output is:
149545961155429.0
nan
149632391661184.0
If I remove the empty row (2nd one in the above output) from the csv file, .0 does not appear in the Order_Id.
After some searching, I found that converting this column to string solves the problem. It does work if I change the first line of the above code to:
v_df = pd.read_csv('csvfile', delimiter=';', dtype={'Order No.': 'str'})
However, the column name 'Order No.' changes to Order_Id because of the rename, so I cannot use 'Order No.'. For this reason I tried the following:
v_df[['Order_Id']] = v_df[['Order_Id']].values.astype('str')
But unfortunately it seems that astype is not changing the datatype, and .0 still appears. My questions are:
1. Why does .0 appear in the first place if there is an empty row in the csv file?
2. Why does the datatype change not happen after the rename?
My aim is just to get rid of .0. I don't want to change the datatype if .0 can be removed some other way.
I am trying to emulate your df here; although it has some differences, I think it will work for you:
import pandas as pd
import numpy as np
v_df = pd.DataFrame([['13-Oct-22', '149545961155429.0', '149545961255429', 'Delivered'],
                     ['12-Oct-22', None, None, 'delivered'],
                     ['15-Oct-22', '149632391661184.0', '149632391761184', 'Delivered']],
                    columns=['Transaction Date', 'Order_Id', 'Order Item No.', 'Order Item Status'])
v_df[['Order_Id']] = v_df[['Order_Id']].fillna(np.nan).values.astype('float').astype('int').astype('str')
Try it and let me know
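On the "why" part of the question: an empty cell is read as NaN, and NaN is a float, so pandas stores the whole column as float64 and every value prints with .0. Converting to str afterwards keeps the .0 because the values are already floats by then. Note also that a plain NumPy integer cannot represent a missing value, so the empty row needs some care. A possible alternative is pandas' nullable Int64 dtype. This is only a sketch with made-up sample data (the csv_text below is hypothetical, modelled on the columns in the question):

```python
import io
import pandas as pd

# Hypothetical sample: the middle row has an empty "Order No." cell.
csv_text = (
    "Order No.;Order Item Status\n"
    "149545961155429;Delivered\n"
    ";delivered\n"
    "149632391661184;Delivered\n"
)

# Default parsing: the empty cell becomes NaN, which forces the column to float64.
default_df = pd.read_csv(io.StringIO(csv_text), delimiter=";")
default_dtype = str(default_df["Order No."].dtype)

# Nullable integer dtype: whole numbers stay whole, the gap becomes <NA>.
nullable_df = pd.read_csv(
    io.StringIO(csv_text), delimiter=";", dtype={"Order No.": "Int64"}
)
nullable_df = nullable_df.rename(columns={"Order No.": "Order_Id"})

# Fixing an already-loaded float column: go through Int64 before converting to str.
fixed = default_df["Order No."].astype("Int64").astype(str).tolist()

print(default_dtype)
print(nullable_df["Order_Id"])
```

The dtype mapping is applied while the file is read, so it uses the original header name 'Order No.'; the rename afterwards does not undo it.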

Readline to Array or List output

I am new to Python and was recently given a problem to solve.
Briefly, I have a .csv file consisting of some data. I am supposed to read the .csv file and print out the first 5 column header names, followed by 5 rows of the data in the following format shown in the picture.
Results
Currently, I have written the code up to:
readfiles = file.readlines()
for i in readfiles:
    data = i.strip()
    print(data)
and have managed to print out all the data. However, I am not sure how to get only the 5 rows of data required by the problem. Should the .csv file be converted into an array/list? Hoping someone can help me with this. Thank you.
I can't use pandas or csv for this by the way. =/
import pandas as pd
df = pd.read_csv('#pathtocsv.csv')
df.head()
If you want it as a list:
needed_list = df.head().values.tolist()
First of all, if you want to read a csv file, you can use the pandas library to do it.
import pandas as pd
df = pd.read_csv("path/to/your/file")
print(df.columns[0:5]) # print first 5 column names
print(df.head(5)) # Print first 5 rows
Or, if you want to do it without pandas:
rows = []
with open("path/to/file.csv", "r") as fl:
    rows = [x.split(",") for x in fl.read().split("\n")]
print(rows[0][0:5])  # print first 5 column names
print(rows[1:6])  # print first 5 data rows (rows[0] is the header)
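Since the question rules out both pandas and csv, here is a small sketch that stays close to the readlines() loop from the question. The preview function and the sample lines are hypothetical; in practice the list would come from file.readlines(). It assumes a simple comma-separated file with no quoted commas inside fields:

```python
def preview(lines, n_cols=5, n_rows=5):
    """Return the first n_cols header names and the first n_rows data rows."""
    # Skip blank lines, strip the newline, and split each line on commas.
    rows = [line.strip().split(",") for line in lines if line.strip()]
    header = rows[0][:n_cols]
    data = [row[:n_cols] for row in rows[1:n_rows + 1]]
    return header, data


# Hypothetical stand-in for file.readlines(): one header line and seven data lines.
sample_lines = ["a,b,c,d,e,f\n"] + [
    f"{i},{i + 1},{i + 2},{i + 3},{i + 4},{i + 5}\n" for i in range(1, 8)
]

header, data = preview(sample_lines)
print(header)
for row in data:
    print(row)
```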

I have to extract all the rows in a .csv corresponding to the rows with 'watermelon' through pandas

I am using this code, but instead of a new file with just the required rows, I'm getting an empty .csv with just the header.
import pandas as pd
df = pd.read_csv("E:/Mac&cheese.csv")
newdf = df[df["fruit"]=="watermelon"+"*"]
newdf.to_csv("E:/Mac&cheese(2).csv",index=False)
I believe the problem is in how you select the rows containing the word "watermelon". Instead of:
newdf = df[df["fruit"]=="watermelon"+"*"]
Try:
newdf = df[df["fruit"].str.contains("watermelon")]
In your example, pandas is literally looking for cells whose value is exactly the string "watermelon*"; the == comparison does not treat * as a wildcard.
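A quick way to see the difference is to try both filters on a tiny frame. The data below is made up for illustration, and na=False is my own addition so that empty cells count as "no match" instead of raising an error in the boolean mask:

```python
import pandas as pd

# Hypothetical data: two rows mention watermelon, one cell is missing.
df = pd.DataFrame(
    {
        "fruit": ["watermelon", "apple", "watermelon juice", None],
        "qty": [1, 2, 3, 4],
    }
)

# Exact comparison against the literal text "watermelon*": nothing matches.
exact = df[df["fruit"] == "watermelon" + "*"]

# Substring match: keeps every row whose fruit contains "watermelon".
matches = df[df["fruit"].str.contains("watermelon", na=False)]

print(len(exact))
print(matches)
```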
You're missing the underscore in pd.read_csv on the first call. Also, it looks like the file location is incorrect: it is missing the // in the path.

How do I print a particular field from a CSV file in python?

import csv
with open('doc.csv', 'r') as f:
    file = csv.reader(f)
    for row in file:
        if row == ['NAME']:
            print(row)
I wanted to print all the names from a csv file in python. I tried this method but got blank output. Can anyone help me out?
row is just a list. If you want the first column from the row, try:
print(row[0])
If you want the whole row, just use:
print(row[:])
If you want cells number 2 and 4:
print(row[1], row[3])
If you want better control over the csv, try the pandas read_csv() method:
import pandas as pd
df = pd.read_csv('AAPL.csv')
print(df['your_field_name_here'])
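As for the blank output in the question: row == ['NAME'] is only true when the entire row is a one-element list containing 'NAME', so data rows never match. If the file has a header row with a NAME column (an assumption, since the file itself is not shown), csv.DictReader lets you pick the field by name. A minimal sketch with made-up in-memory data:

```python
import csv
import io

# Hypothetical stand-in for doc.csv, assuming a header row with a NAME column.
csv_text = "NAME,AGE\nAlice,30\nBob,25\n"

# DictReader uses the first row as keys, so each row is a dict keyed by column name.
reader = csv.DictReader(io.StringIO(csv_text))
names = [row["NAME"] for row in reader]

for name in names:
    print(name)
```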

CSV file, text to columns using pandas

def Text2Col(df_File):
    for i in range(0, len(df_File)):
        with open(df_File.iloc[i]['Input']) as inf:
            with open(df_File.iloc[i]['Output'], 'w') as outf:
                i = 0
                for line in inf:
                    i = i + 1
                    if i == 2 or i == 3:
                        continue
                    outf.write(','.join(line.split(';')))
The above code is used to convert a csv file from text to columns.
This code makes all values strings (because of split()), which is problematic for me.
I tried using the map function but couldn't make it work.
Is there any other way I can do this?
My input file has 5 columns: the first column is a string, the second is an int, and the rest are floats.
I think it requires some modification in the last statement:
outf.write(','.join(line.split(';')))
Please let me know if any other input is required.
Ok, trying to help here. If this doesn't work, please specify in your question, what you're missing or what else needs to be done:
Use pandas to read in a csv file:
import pandas as pd
df = pd.read_csv('your_file.csv')
If you have a header on the first row, then use:
import pandas as pd
df = pd.read_csv('your_file.csv', header=0)
If you have a tab delimiter instead of a comma delimiter, then use:
import pandas as pd
df = pd.read_csv('your_file.csv', header=0, sep='\t')
Thank you!
The following code worked:
def Text2Col(df_File):
    for i in range(0, len(df_File)):
        df = pd.read_csv(df_File.iloc[i]['Input'], sep=';')
        df = df[df.index != 0]
        df = df[df.index != 1]
        df.to_csv(df_File.iloc[i]['Output'])

File_List = "File_List.csv"
df_File = pd.read_csv(File_List)
Text2Col(df_File)
Input files are kept in the same folder, with the same names as listed in File_List.csv.
Output files will be created in the same folder, separated into columns. I deleted rows 0 and 1 for my use; one can skip or add rows depending on the requirement.
In the above code, df_File is a dataframe containing two columns: the first column is the input file name and the second column is the output file name.
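A possible variation, in case it helps someone: read_csv can skip the two unwanted lines while reading via skiprows, which also lets pandas infer int and float columns from the remaining rows. This is only a sketch on made-up in-memory data (the raw text and column names are hypothetical), and it writes without the index column, which differs from the code above:

```python
import io
import pandas as pd

# Hypothetical semicolon-separated input: header, two lines to drop, then data.
raw = (
    "name;count;x;y;z\n"
    "-;-;-;-;-\n"
    "-;-;-;-;-\n"
    "foo;1;1.5;2.5;3.5\n"
    "bar;2;4.5;5.5;6.5\n"
)

# skiprows uses 0-based line numbers, so [1, 2] drops the 2nd and 3rd lines of the file.
df = pd.read_csv(io.StringIO(raw), sep=";", skiprows=[1, 2])

# Write comma-separated output; index=False leaves out the row index column.
out = io.StringIO()
df.to_csv(out, index=False)
out_lines = out.getvalue().splitlines()

print(df.dtypes)
print(out_lines)
```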
