Extracting specific data from multiple text files to excel file using python - python

I have mutiple text files that contains data like file1 ,file2,file3. Its just an example, I am wondering how to populate specific data in an Excel sheet like this excel sheet
I am new to learning python and the combination of text to excel through python that's why finding it hard to approch

Basically what you need is to Parse and write a new File in the csv File format for the use in excel
file1 -> PythonScript.py -> excel.csv
File Parser Python Tutorial Tutorial
The .csv File looks like this. You have a header and the data seperated with commas.
excel.csv:
Name,Data
hibiscus_3,54k
hibiscus_7,67k
Rose_3,87MB
Hope i could help you

Related

exporting to csv converts text to date

From Python i want to export to csv format a dataframe
The dataframe contains two columns like this
So when i write this :
df['NAME'] = df['NAME'].astype(str) # or .astype('string')
df.to_csv('output.csv',index=False,sep=';')
The excel output in csv format returns this :
and reads the value "MAY8218" as a date format "may-18" while i want it to be read as "MAY8218".
I've tried many ways but none of them is working. I don't want an alternative like putting quotation marks to the left and the right of the value.
Thanks.
If you want to export the dataframe to use it in excel just export it as xlsx. It works for me and maintains the value as string in the original format.
df.to_excel('output.xlsx',index=False)
The CSV format is a text format. The file contains no hint for the type of the field. The problem is that Excel has the worst possible support for CSV files: it assumes that CSV files always use its own conventions when you try to read one. In short, one Excel implementation can only read correctly what it has written...
That means that you cannot prevent Excel to interpret the csv data the way it wants, at least when you open a csv file. Fortunately you have other options:
import the csv file instead of opening it. This time you have options to configure the way the file should be processed.
use LibreOffice calc for processing CSV files. LibreOffice is a little behind Microsoft Office on most points except for csv file handling where it has an excellent support.

How to filter out useable data from csv files using python?

Please help me in extracting important data from a .csv file using python. I got .csv file from 'citrine'.
I want to extract the element name and atomic percentage in the form of "Al2.5B0.02C0.025Co14.7Cr16.0Mo3.0Ni57.48Ti5.0W1.25Zr0.03"
ORIGINAL
[{""element"":""Al"",""idealAtomicPercent"":{""value"":""5.4""}},{""element"":""B"",""idealAtomicPercent"":{""value"":""0.02""}},{""element"":""C"",""idealAtomicPercent"":{""value"":""0.13""}},{""element"":""Co"",""idealAtomicPercent"":{""value"":""7.5""}},{""element"":""Cr"",""idealAtomicPercent"":{""value"":""6.1""}},{""element"":""Mo"",""idealAtomicPercent"":{""value"":""2.0""}},{""element"":""Nb"",""idealAtomicPercent"":{""value"":""0.5""}},{""element"":""Ni"",""idealAtomicPercent"":{""value"":""61.0""}},{""element"":""Re"",""idealAtomicPercent"":{""value"":""0.5""}},{""element"":""Ta"",""idealAtomicPercent"":{""value"":""9.0""}},{""element"":""Ti"",""idealAtomicPercent"":{""value"":""1.0""}},{""element"":""W"",""idealAtomicPercent"":{""value"":""5.8""}},{""element"":""Zr"",""idealAtomicPercent"":{""value"":""0.13""}}]
Original CSV
Expected output
Without having the file structure it is hard to tell.
Try to load the file using:
import csv
with open(file_path) as file:
reader = csv.DictReader(...)
You will have to figure out the arguments for the function which depend on the file.

Is there anyway to read html file as excel or csv using pandas?

I have a Excel file in .xls extension, I cannot read as Excel using pd.read_excel, it will shows error as below.
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'<table '
but if if I use pd.read_html, it can be read, however, it cannot read their positions and create a new DataFrame like pd.read_excel.
Can i know is there anyway to read this kind of file? or do i need to convert the file to Excel/CSV first? if so, may i know how?
Looks like someone saved html file as xls, try opening that file in a text editor first and have a look.

Convert Excel zip file content to actual Excel file?

I am using cmis package available in python to download the document from FileNet repository. I am using getcontentstream method available in the package. However it returns content file that beings with 'Pk' and ends in 'PK'. when I googled I came to know it is excel zip package content. is there a way to save the content into an excel file. I should be able to open the downloaded excel. I am using below code. but getting byte-liked object is required not str. I noticed type of result is string.io.
# expport the result
result = testDoc.getContentStream()
outfile = open(sample.xlsx, 'wb')
outfile.write(result.read())
result.close()
outfile.close()
Hi there and welcome to stackoverflow. There are a few bits I noticed about your post.
To answer the error code you are getting directly. You called the outfile FileStream to be in terms of binary, however the result.read() must be in Unicode string format which is why you are getting this error. You can try to encode it before passing it to the outfile.write() function (ex: outfile.write(result.read().encode())).
You can also simply just write Unicode directly by:
result = testDoc.getContentStream()
result_text = result.read()
from zipfile import ZipFile
with ZipFile(filepath, 'w') as zf:
zf.writestr('filename_that_is_zipped', result_text)
Not I am not sure what you have in your ContentStream but note that a excel file is made up of xml files zipped up. The minimum file structure you need for an excel file is as follows:
_rels/.rels contains excel schemas
docProps/app.xml contains number of sheets and sheet names
docProps/core.xml boiler plate user info and date created
xl/workbook.xml contains sheet names rdId to workbook link
xl/worksheets/sheet1.xml (and more sheets in this folder) contains cell data for each sheet
xl/_rels/workbook.xml.rels contains sheet file locations within zipfile
xl/sharedStrings.xml if you have string only cell values
[Content_Types].xmlapplies schemas to file types
I recently went through piecing together an excel file from scratch, if you want to see the code check out https://github.com/PydPiper/pylightxl

Reformatting text file to be table in csv format

I used PyPDF2 to retrieve tabular data and I put it into a text file. It comes out with each word or number as a new line. I want to specify that the "states" be rows, the "years" to be column headers, and the following numbers to be put into the rows 10 at a time.
The PDF of the file has a good illustration of what I am trying to do. Does anyone have some good ideas as to how to reformat the text file to do so?
Provided here is the link to the pdf which I am taking data from.
file:///V:/Final%20Project/Data/DuckStampSales.pdf
My text file looks as such.
textfile

Categories

Resources