Python Select Specific Row and Column from CSV file - python

I want to print specific rows and columns from a CSV file.
The CSV file looks like this:
R,IMSI,DATE FIRST EVENT,TIME FIRST EVENT,DATE LAST EVENT,TIME LAST EVENT,DC(HHMMSS),NC,VOLUME,SDR
R
C,634012007277489,20221122,150025,20221122,150025,711,1,0,294
C,634012031576061,20221122,150859,20221122,151738,905,3,0,1597
C,634012045006518,20221122,144022,20221122,144022,902,1,0,368
R
R
R,END OF REPORT
T,18
The output should look like this:
C,634012007277489,20221122,150025,20221122,150025,711,1,0,294
C,634012031576061,20221122,150859,20221122,151738,905,3,0,1597
C,634012045006518,20221122,144022,20221122,144022,902,1,0,368

Use pandas (install it first by running pip install pandas in the terminal).
import pandas as pd
df = pd.read_csv('fullpath.csv')
# column_name is a column label; row_number is a 0-based row position
x = df[column_name].iloc[row_number]

Try reading it with pandas.read_csv():
import pandas as pd
# skipfooter is only supported by the Python parsing engine
df = pd.read_csv('filename.csv', skipfooter=1, header=1, engine='python')
df.iloc[row_number, column_number]

You can use .iat too.
import pandas as pd
df = pd.read_csv("example.csv", delimiter=",")
for row in range(len(df)):
    for column in range(len(df.columns)):
        print(df.iat[row, column])
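
If the goal is only to keep the data records, one possible alternative is to filter on the first field. The following is a minimal sketch using the standard csv module rather than pandas. It assumes that the data rows are exactly those whose first field is C, as in the sample from the question; the names SAMPLE and data_rows are made up for illustration.

```python
import csv
import io

# Sample copied from the question; in practice this would be an open file.
SAMPLE = """\
R,IMSI,DATE FIRST EVENT,TIME FIRST EVENT,DATE LAST EVENT,TIME LAST EVENT,DC(HHMMSS),NC,VOLUME,SDR
R
C,634012007277489,20221122,150025,20221122,150025,711,1,0,294
C,634012031576061,20221122,150859,20221122,151738,905,3,0,1597
C,634012045006518,20221122,144022,20221122,144022,902,1,0,368
R
R
R,END OF REPORT
T,18
"""


def data_rows(lines, record_type="C"):
    """Yield parsed rows whose first field equals record_type."""
    for row in csv.reader(lines):
        if row and row[0] == record_type:
            yield row


rows = list(data_rows(io.StringIO(SAMPLE)))

# Print the kept rows in their original comma-separated form.
for row in rows:
    print(",".join(row))

# A single cell: second data row, IMSI column (0-based index 1).
cell = rows[1][1]
print(cell)
```

With a real file, the same function could be called on the handle returned by open(path, newline="").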

Related

How to save each row to csv in dataframe AND name the file based on the first column in each row

I have the following df, with the row 0 being the header:
teacher,grade,subject
black,a,english
grayson,b,math
yodd,a,science
What is the best way to use export_csv in python to save each row to a csv so that the files are named:
black.csv
grayson.csv
yodd.csv
Contents of black.csv will be:
teacher,grade,subject
black,a,english
Thanks in advance!
Updated Code:
df8['CaseNumber'] = df8['CaseNumber'].map(str)
df8.set_index('CaseNumber', inplace=True)
for Casenumber, data in df8.iterrows():
    data.to_csv('c:\\users\\admin\\' + Casenumber + '.csv')
This can be done simply by using pandas:
import pandas as pd
# Preempt the issue of columns being numeric by marking dtype=str
# header=0 because the column names are on the first line of the file
df = pd.read_csv('your_data.csv', header=0, dtype=str)
df.set_index('teacher', inplace=True)
for teacher, data in df.iterrows():
    data.to_csv(teacher + '.csv')
Edits:
df8.set_index('CaseNumber', inplace=True)
for Casenumber, data in df8.iterrows():
    # Use r and f strings to make your life easier:
    data.to_csv(rf'c:\users\admin\{Casenumber}.csv')
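
Note that a row taken from iterrows() is a Series, so calling to_csv on it writes one field per line rather than the header-plus-row layout the question asks for. The following is a sketch of one way to get that layout with groupby. It assumes teacher names are unique and safe to use as file names; the temporary output directory and the variable names are only for illustration.

```python
import io
import os
import tempfile

import pandas as pd

# The sample data from the question, read from memory instead of a file.
csv_text = (
    "teacher,grade,subject\n"
    "black,a,english\n"
    "grayson,b,math\n"
    "yodd,a,science\n"
)
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Hypothetical output location; replace with a real directory as needed.
out_dir = tempfile.mkdtemp()

# Each group is a one-row DataFrame, so to_csv keeps the header line.
for teacher, group in df.groupby("teacher"):
    group.to_csv(os.path.join(out_dir, f"{teacher}.csv"), index=False)

with open(os.path.join(out_dir, "black.csv")) as fh:
    black_contents = fh.read()
print(black_contents)
```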

Python, how to add a new column in excel

I have the file below (file1.xlsx) as input. In total it has 32 columns and almost 2500 rows; as an example, I show only 5 columns in the screenshot.
I want to edit the same file with Python and get the following output (file1.xlsx).
Note that I am adding one column named short, whose data is the substring up to the first decimal point of the value in the Name (A) column of the same Excel file.
Please help.
Regards
Kawaljeet
Here is what you need:
import pandas as pd
file_name = "file1.xlsx"
df = pd.read_excel(file_name)  # Read the Excel file as a DataFrame
# .str[0] takes the first piece of each row's split result
df['short'] = df['Name'].str.split(".").str[0]
df.to_excel("file1.xlsx", index=False)
Hello, I solved the problem with the code below:
import pandas as pd
import os
def add_column():
    file_name = "cmdb_inuse.xlsx"
    os.chmod(file_name, 0o777)
    df = pd.read_excel(file_name)  # Read the Excel file as a DataFrame
    df['short'] = [x.split(".")[0] for x in df['Name']]
    df.to_excel("cmdb_inuse.xlsx", index=False)
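
The substring step can be tried without an Excel file. This is a small in-memory sketch, assuming the Name column holds strings; the sample values are invented for illustration and are not from the original spreadsheet.

```python
import pandas as pd

# Hypothetical Name values; the real ones would come from pd.read_excel.
df = pd.DataFrame({"Name": ["host1.example.com", "host2.example.org", "nodot"]})

# Split each value on "." and keep the first piece of each row.
df["short"] = df["Name"].str.split(".").str[0]
print(df)
```

A value without a dot is kept whole, since splitting it yields a single piece.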

How to read specific columns in an xlsb in Python

I'm trying to read spreadsheets in an xlsb file in Python and I've used the code below to do so. I found the code on Stack Overflow and I'm sure that it reads every single column in a row of a spreadsheet and appends it to a dataframe. How can I modify this code so that it only reads/appends specific columns of the spreadsheet, i.e. I only want to append the data in columns B through D to my dataframe?
Any help would be appreciated.
import pandas as pd
from pyxlsb import open_workbook as open_xlsb
df = []
with open_xlsb('some.xlsb') as wb:
    with wb.get_sheet(1) as sheet:
        for row in sheet.rows():
            df.append([item.v for item in row])
df = pd.DataFrame(df[1:], columns=df[0])
pyxlsb itself cannot do it, but it is doable with the help of xlwings.
import pandas as pd
import xlwings as xw
from pyxlsb import open_workbook as open_xlsb
with open_xlsb(r"W:\path\filename.xlsb") as wb:
    Data = xw.Range('B:D').value
# Creates a dataframe using the first list of elements as columns
Data_df = pd.DataFrame(Data[1:], columns=Data[0])
Just do:
import pandas as pd
from pyxlsb import open_workbook as open_xlsb
df = []
with open_xlsb('some.xlsb') as wb:
    with wb.get_sheet(1) as sheet:
        for row in sheet.rows():
            # Keep only cells in columns B, C and D (indices 1 to 3)
            df.append([item.v for item in row if 0 < item.c < 4])
df = pd.DataFrame(df[1:], columns=df[0])
item.c refers to the column number starting at 0, so columns B through D are 1 through 3.
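
If the whole sheet has already been loaded into a DataFrame, another option is to select columns by position afterwards. A minimal sketch, using an in-memory list of rows in place of the xlsb sheet; the header names and values are placeholders.

```python
import pandas as pd

# Stand-in for the rows read from the sheet: first row is the header.
rows = [
    ["A_hdr", "B_hdr", "C_hdr", "D_hdr", "E_hdr"],
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
]
full = pd.DataFrame(rows[1:], columns=rows[0])

# Positions 1, 2 and 3 correspond to spreadsheet columns B, C and D.
subset = full.iloc[:, 1:4]
print(subset)
```

Filtering while reading, as in the answer above, avoids holding the unwanted columns in memory; slicing afterwards is simply shorter when the sheet is small.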

how to subtract one column data from 2nd row to 1st row in csv files using python

I have a CSV file with data like:
data,data,10.00
data,data,11.00
data,data,12.00
I need to update this as
data,data,10.00
data,data,11.00,1.00(11.00-10.00)
data,data,12.30,1.30(12.30-11.00)
Could you help me update the CSV file using Python?
You can use pandas and numpy. pandas reads/writes the csv and numpy does the calculations:
import pandas as pd
import numpy as np
data = pd.read_csv('test.csv', header=None)
col_data = data[2].values
diff = np.diff(col_data)
diff = np.insert(diff, 0, 0)
data['diff'] = diff
# write data to file
data.to_csv('test1.csv', header=False, index=False)
When you open test1.csv you will find the results described above, with the addition of a zero next to the first data point.
For more info see the following docs:
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
http://pandas.pydata.org/pandas-docs/version/0.18.1/generated/pandas.DataFrame.to_csv.html
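
The same row-to-row difference can also be computed with pandas alone. This is a sketch under the assumption that the third column is numeric; it reads the sample from memory and uses the 12.30 value from the desired output in the question.

```python
import io

import pandas as pd

csv_text = "data,data,10.00\ndata,data,11.00\ndata,data,12.30\n"
data = pd.read_csv(io.StringIO(csv_text), header=None)

# diff() subtracts the previous row; the first row has no predecessor,
# so its NaN is replaced by 0. round(2) hides floating-point noise.
data["diff"] = data[2].diff().fillna(0).round(2)

# Without a path, to_csv returns the CSV text instead of writing a file.
out = data.to_csv(header=False, index=False)
print(out)
```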

Not reading all rows while importing csv into pandas dataframe

I am trying the kaggle challenge here, and unfortunately I am stuck at a very basic step.
I am trying to read the datasets into a pandas dataframe by executing following command:
test = pd.DataFrame.from_csv("C:/Name/DataMining/hillary/data/output/emails.csv")
The problem is that this file as you would find out has over 300,000 records, but I am reading only 7945.
print (test.shape)
(7945, 21)
Now I have double checked the file and I cannot find anything special about line number 7945. Any pointers why this could be happening?
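I think it is better to use the read_csv function with the parameters quoting=csv.QUOTE_NONE and error_bad_lines=False. In pandas 1.3 and later, error_bad_lines=False is replaced by on_bad_lines='skip', and pandas 2.0 removed the old parameter.
import pandas as pd
import csv
# On pandas older than 1.3, use error_bad_lines=False instead
test = pd.read_csv("output/Emails.csv", quoting=csv.QUOTE_NONE, on_bad_lines='skip')
print(test.shape)
#(381422, 22)
But some (problematic) data will be skipped.
If you want to skip the email body data, you can use: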
import pandas as pd
import csv
test = pd.read_csv(
    "output/Emails.csv",
    quoting=csv.QUOTE_NONE,
    sep=',',
    on_bad_lines='skip',  # error_bad_lines=False on pandas older than 1.3
    header=None,
    names=[
        "Id", "DocNumber", "MetadataSubject", "MetadataTo", "MetadataFrom",
        "SenderPersonId", "MetadataDateSent", "MetadataDateReleased",
        "MetadataPdfLink", "MetadataCaseNumber", "MetadataDocumentClass",
        "ExtractedSubject", "ExtractedTo", "ExtractedFrom", "ExtractedCc",
        "ExtractedDateSent", "ExtractedCaseNumber", "ExtractedDocNumber",
        "ExtractedDateReleased", "ExtractedReleaseInPartOrFull",
        "ExtractedBodyText", "RawText"])
print(test.shape)
# delete rows with NaN in column MetadataFrom
test = test.dropna(subset=['MetadataFrom'])
# delete header rows that ended up in the data
test = test[test.MetadataFrom != 'MetadataFrom']
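
To see what skipping bad lines means in practice, here is a tiny self-contained sketch. It assumes pandas 1.3 or later (where on_bad_lines exists) and uses made-up data in which one row has more fields than the header.

```python
import io

import pandas as pd

# The second data row has four fields but the header declares two.
csv_text = "id,subject\n1,hello\n2,too,many,fields\n3,bye\n"

# With on_bad_lines="skip" the malformed row is dropped instead of
# raising a parser error.
df = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")
print(df.shape)
print(df)
```

The dropped row is lost silently, which is why the row count after such a read should be compared against the expected number of records.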
