pandas.read_excel() on strangely formatted Excel tables - python

Would pandas.read_excel() read something like this?
As you can see, there are a few lines of text before and after the table.
I cannot manually delete those unwanted lines of text.

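You can try the code below:
import pandas as pd
# skiprows=5 skips the lines of text above the table; skipfooter=n would drop n lines below it
data = pd.read_excel("/Users/manoj/demo.xlsx", skiprows=5)
Refer to the pandas documentation for read_excel, in particular the skiprows and skipfooter parameters.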

If pandas' skiprows doesn't work, you could use openpyxl or another xlsx reader to load the cells into a list (xlsxwriter can only write files, not read them), then build a pandas DataFrame from that list.
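If the number of text lines above and below the table varies between files, fixed skiprows/skipfooter values won't fit. One workaround, not taken from the answers above, is to read the whole sheet with header=None and locate the table yourself. The sketch below assumes you know the text of the table's first header cell (the helper name extract_table and the sample "Name" header are hypothetical) and that a fully blank row separates the table from any trailing text. The sample DataFrame stands in for what pd.read_excel(path, header=None) would return.

```python
import pandas as pd


def extract_table(raw: pd.DataFrame, header_text: str) -> pd.DataFrame:
    """Slice a table out of a sheet that was read with header=None.

    Assumptions (hypothetical helper): the header row is the first row whose
    first cell equals header_text, and the table ends at the first row in
    which every cell is empty.
    """
    # Position of the header row, found by matching the first column
    matches = (raw.iloc[:, 0] == header_text).to_numpy().nonzero()[0]
    header_pos = matches[0]
    header = raw.iloc[header_pos].tolist()

    # Everything below the header row
    body = raw.iloc[header_pos + 1:]

    # Cut at the first completely blank row, dropping the trailing text
    blank = body.isna().all(axis=1).to_numpy()
    if blank.any():
        body = body.iloc[: blank.argmax()]

    table = body.copy()
    table.columns = header
    # Let pandas re-detect numeric columns that were stored as object
    return table.reset_index(drop=True).infer_objects()


# Hypothetical sheet contents: text lines above and below the real table
raw = pd.DataFrame([
    ["Quarterly report", None, None],
    [None, None, None],
    ["Name", "Qty", "Price"],
    ["apple", 3, 1.5],
    ["pear", 5, 2.0],
    [None, None, None],
    ["Source: internal", None, None],
])

table = extract_table(raw, "Name")
print(table)
```

With a real workbook you would pass raw = pd.read_excel("your_file.xlsx", header=None) instead of the hand-built DataFrame.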

Related

Convert JSON file to CSV file with properly formatted rows and columns in Excel

Currently I'm working on a script that converts a JSON file to CSV format. The script works, but I need to modify it so that the data is properly laid out in rows and columns when the JSON file is converted to a CSV file. What do I need to add or change in my script?
import pandas as pd
df = pd.read_json(r'/home/admin/myfile.json')
df.to_csv(r'/home/admin/xml/myfileSample.csv', index=None, sep=":")
Based on your code, you can try:
df.to_csv(r'/home/admin/xml/myfileSample.csv', encoding='utf-8', header=True, index=False, sep=":")
This could be useful.
import pandas as pd
df_json=pd.read_json("input_file.json")
df_json.head()
df_json.to_csv("output_file.csv",index=False)
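The main difference from the question's code is that this version keeps the default comma separator. Excel normally splits a CSV on commas, so a file written with sep=":" is likely to show up as a single column. The sketch below contrasts the two outputs; the JSON records are hypothetical and are read from an in-memory string instead of /home/admin/myfile.json.

```python
import io

import pandas as pd

# Hypothetical JSON records standing in for the real input file
json_text = '[{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]'
df = pd.read_json(io.StringIO(json_text))

# Default sep=",": each field lands in its own Excel column
comma_csv = df.to_csv(index=False)

# sep=":": Excel will usually keep each whole line in one column
colon_csv = df.to_csv(index=False, sep=":")

print(comma_csv)
print(colon_csv)
```

Calling to_csv without a path returns the CSV text as a string, which makes the difference easy to inspect before writing a file.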
Your code is mostly fine; just change to_csv to to_excel and give the output file an .xlsx extension, and it should work:
import pandas as pd
df = pd.read_json(r'/home/admin/myfile.json')
# to_excel has no sep argument; the .xlsx extension selects the Excel writer
df.to_excel(r'/home/admin/xml/myfileSample.xlsx', index=False)
Learn more about the to_excel function of pandas here:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_excel.html

What code should I use to extract a specific column (with specific data) from a CSV file in Python? It can be either pandas or NumPy.

Please see the attached image.
I only need to import a specific column with conditions (such as specific values found in that column). I also need to remove the unnecessary columns, but dropping them takes too much code. What code or syntax should I use?
How to get a column from a pandas DataFrame is answered in "Read specific columns from a csv file with csv module?"
To quote:
Pandas is spectacular for dealing with csv files, and the following code would be all you need to read a csv and save an entire column into a variable:
import pandas as pd
df = pd.read_csv(csv_file)
saved_column = df.column_name  # you can also use df['column_name']
So in your case, you just save the filtered DataFrame in a new variable.
That is, you do newdf = data.loc[...... and then use the code snippet above to extract the column you want, for example newdf.continent.
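To avoid dropping columns after loading, read_csv can load only the columns you name through its usecols parameter, and a boolean mask then keeps only the rows with the value you want. A minimal sketch, assuming hypothetical country/continent/code columns and an in-memory CSV in place of the real file:

```python
import io

import pandas as pd

# Hypothetical CSV contents; in practice pass the file path instead of a StringIO
csv_text = """country,continent,code
Japan,Asia,1
France,Europe,2
India,Asia,3
"""

# usecols loads only the listed columns, so nothing has to be dropped afterwards
df = pd.read_csv(io.StringIO(csv_text), usecols=["country", "continent"])

# Boolean mask keeps the rows whose continent is "Asia", then selects one column
asia = df.loc[df["continent"] == "Asia", "country"]
print(asia.tolist())  # ['Japan', 'India']
```

Using a single .loc[mask, column] call filters the rows and picks the column in one step, instead of saving a filtered DataFrame first.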

Importing CSV Data and formatting in Excel via Python

I am importing CSV-based data into an Excel spreadsheet via Python. I would like to know if it is possible to import the data and divide it into several columns (as we would do via the import menu under DATA in Excel).
So far, I have converted my CSV to a pandas DataFrame and exported it to Excel, but all my data ends up in one column:
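import pandas as pd
df = pd.read_csv(r'C:\Users\Contractuel\Desktop\Test\Candiac_TypeLum_UTF8.csv')
# the context manager saves and closes the file (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('TypeLum_TEST.xlsx') as writer:
    df.to_excel(writer, index=False)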
Thanks!
The read_csv function takes a sep= argument that tells pandas what separates the values. You probably need it to specify your file's separator: the default is a comma (,), but CSV files sometimes use ; or other characters as separators.
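A quick way to see the effect of sep is to read the same semicolon-separated text twice. The data below is hypothetical; with the wrong separator every line ends up in a single column, which is the symptom described in the question.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data
csv_text = "name;type;count\nA;LED;3\nB;HPS;5\n"

# With the default sep=",", each whole line lands in a single column
wrong = pd.read_csv(io.StringIO(csv_text))
print(wrong.shape)  # (2, 1)

# Telling pandas the real separator splits the values into separate columns
right = pd.read_csv(io.StringIO(csv_text), sep=";")
print(right.shape)  # (2, 3)
```

Once the DataFrame has the right number of columns, writing it with to_excel as in the question should place each field in its own Excel column.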

Python code for word search in Excel

I need code that can search for a specific word in an Excel file, in specific columns, and output the column letter, row number, and sheet name. I have started but don't know how to continue:
from xlrd import open_workbook

book = open_workbook("excel1.xlsx")  # note: xlrd 2.0+ only opens .xls files
for sheet in book.sheets():
    ...  # search the cells of each sheet here
It needs to print the row number, column letter, and sheet name. Also, if you can use pandas instead of xlrd, that would be great.
Alternatively, you can use the pandas library. pandas works on DataFrames, i.e. tabular data with rows and columns, so you express the search in terms of your dataset's columns.
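import pandas as pd
df = pd.read_excel('filename.xls')
# rows where col_name contains the substring 'ABC' (na=False skips empty cells)
df[df['col_name'].str.contains('ABC', na=False)].head()
# rows where col_name equals one of the listed words
df.query('col_name == ["words"]').head()
# rows where Column exactly equals the search word
df[df['Column'] == 'Your_search_word'].head()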
And so on. For more examples, see http://pbpython.com/excel-pandas-comp-2.html and the pandas documentation.
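Note: by default pd.read_excel reads only the first sheet. Pass sheet_name=None to get a dict of DataFrames keyed by sheet name; you can search each one, or combine them with pd.concat into a single DataFrame that may be easier to search.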
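To get the sheet name, column letter, and row number the question asks for, one approach is to loop over the dict returned by pd.read_excel(path, sheet_name=None) and translate DataFrame positions back to Excel coordinates. This is a sketch under stated assumptions: the helper names column_letter and find_word are hypothetical, and it assumes each sheet's header sits in Excel row 1 with data from row 2 (the read_excel defaults, no skiprows). The sample sheets are hand-built so the sketch runs without an Excel file.

```python
from typing import Dict, List, Optional, Tuple

import pandas as pd


def column_letter(pos: int) -> str:
    """Convert a 0-based column position to an Excel letter (0 -> A, 26 -> AA)."""
    n = pos + 1
    letters = ""
    while n:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def find_word(
    sheets: Dict[str, pd.DataFrame],
    word: str,
    columns: Optional[List[str]] = None,
) -> List[Tuple[str, str, int]]:
    """Return (sheet name, column letter, Excel row number) for cells containing word.

    Assumes the header is Excel row 1, so DataFrame row position p is Excel row p + 2.
    """
    hits = []
    for sheet_name, df in sheets.items():
        for col_pos, col in enumerate(df.columns):
            # Restrict the search to the requested columns, if any were given
            if columns is not None and col not in columns:
                continue
            # Empty cells become "" so they never match; regex=False means plain substring search
            mask = df[col].fillna("").astype(str).str.contains(word, regex=False)
            for row_pos in mask.to_numpy().nonzero()[0]:
                hits.append((sheet_name, column_letter(col_pos), int(row_pos) + 2))
    return hits


# Hypothetical sheets standing in for pd.read_excel("excel1.xlsx", sheet_name=None)
sheets = {
    "Sheet1": pd.DataFrame({"Name": ["apple pie", "banana"], "Note": ["x", "apple"]}),
    "Sheet2": pd.DataFrame({"Item": ["cherry", None, "apple"]}),
}

for hit in find_word(sheets, "apple"):
    print(hit)
```

With a real workbook, build sheets with pd.read_excel("excel1.xlsx", sheet_name=None) and pass a list of header names as columns to limit the search to specific columns.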

How to convert Pandas Dataframe to csv reader directly in python?

I have a CSV file with millions of rows. I used to create a dictionary out of the CSV file like this:
import csv

with open('us_db.csv', newline='') as f:  # text mode: csv.reader needs str in Python 3
    data = csv.reader(f)
    for row in data:
        pass  # Create Dictionary based on a column
Now, to filter the rows based on some conditions, I use a pandas DataFrame, as it is very fast for these operations. I load the CSV as a DataFrame and do some filtering, then I want to continue with the code above. I thought of using df.iterrows() or df.itertuples(), but they are really slow.
Is there a way to convert the pandas DataFrame to a csv.reader() directly so that I can keep using the code above? If I use csv_rows = df.to_csv(), it gives one long string. Of course, I can write out a CSV file and read it back in, but I want to know if there is a way to skip the extra write to and read from a file.
You could do something like this:
import csv
from io import StringIO

import numpy as np
import pandas as pd

# random dataframe
df = pd.DataFrame(np.random.randn(3, 4))
buffer = StringIO()  # create an empty in-memory text buffer
df.to_csv(buffer, index=False)  # fill it with CSV text (index=False: no extra index column)
buffer.seek(0)  # rewind to the start of the stream
for row in csv.reader(buffer):
    print(row)  # do stuff: each row is a list of strings, as when reading a file
Why don't you apply the Create Dictionary function to the target column?
Something like:
df['column_name'] = df['column_name'].apply(create_dictionary)  # create_dictionary: your own function
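The question doesn't show what "Create Dictionary" does. If, as an assumption, it maps values of one column to values of another, the dictionary can be built from the filtered DataFrame directly, with no row-by-row loop and no CSV round trip. The column names and data below are hypothetical.

```python
import pandas as pd

# Hypothetical filtered DataFrame standing in for the one loaded from us_db.csv
df = pd.DataFrame({"state": ["CA", "NY", "TX"], "city": ["LA", "NYC", "Austin"]})

# Pair the key column with the value column in one pass (later duplicate keys win)
lookup = dict(zip(df["state"], df["city"]))

# The same mapping using pandas: index by the key column, then convert to a dict
lookup_pd = df.set_index("state")["city"].to_dict()

print(lookup)  # {'CA': 'LA', 'NY': 'NYC', 'TX': 'Austin'}
```

Both forms iterate over whole columns rather than over DataFrame rows, which is typically much faster than iterrows() on large data.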
