Retrieve the headers in Excel using Python

I want to retrieve the headers of this Excel file (only A, B, C) and store them in a list using Python. I opened my file but I am unable to retrieve the headers.
import xlrd
file_location= "C:/Users/Desktop/Book1.xlsx"
workbook= xlrd.open_workbook(file_location)
sheet=workbook.sheet_by_index(0)
Can anyone please help me with that? I am new to Python.
Thank you for your help.

You could also try using the NumPy function loadtxt:
https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
It gives you arrays for each column (you can also skip specified columns so that you only get A, B, and C as you want). You can write a for loop to get the first entry of each column and put them in a list.
from numpy import loadtxt

# loadtxt reads plain-text data, so export the sheet to a CSV/text file first
data = loadtxt("Book1.csv", dtype=str, delimiter=",")
headers = []
for c in range(data.shape[1]):
    if data[0, c] != "":
        headers.append(data[0, c])

header_row = sheet.row(1)  # row(1) is the second row; use sheet.row(0) if the headers are in the first row
headers = [str(cell.value) for cell in header_row[1:] if cell.value != '']
Hope that helps...

Related

Write an xls cell in an existing file without overwriting the rest of the file

I'm trying to update a single cell in an existing Excel file.
Here is my code:
import pandas as pd

file = (r'C:/Users/user/Desktop/test.xls')
df = pd.read_excel(file)
code=input('Patiste Kodiko:')
size=0
sizeint=int(input('Patiste Noumero:'))
given=int(input('Posa efigan?:'))
oldstock=(df[size].where(df['ΚΩΔΙΚΟΣ']==code))
oldstock=oldstock.dropna()
oldstock=oldstock.values[0]
oldstock = int(oldstock)
newstock = oldstock - given
x=(df['Α/Α'].where(df['ΚΩΔΙΚΟΣ']==code)+2)
x=x.dropna()
x = int(x)
dffin=df.at[x,size] = newstock
dffin.to_excel(file)
close()
After running this code, I receive an empty .xls file with only one cell written and everything else empty.
What am I missing here?
Thanks in advance.
You should be able to do a quick df.at call if you have your x and y coordinates or the row and column names.
import pandas as pd

fileLocation = (r'TestExcelsheet.xlsx')
excel = pd.read_excel(fileLocation, converters={'NimikeNro': str})
print(excel.dtypes)
print(excel.index)
print(excel.head())
excel.at[1, 'One'] = 444  # set the value of the cell at row label 1, column 'One'
print(excel)
excel.to_excel('TestExcelsheet.xlsx')
It's the .at accessor you need to use to set data in a single cell; use a for loop for more than one cell.
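As a rough sketch of how this could look for the stock update in the question above (the file path and the 'ΚΩΔΙΚΟΣ' column come from the question; the updates dict and the 'stock' column name are placeholders that would need adjusting):
import pandas as pd

file = r'C:/Users/user/Desktop/test.xls'
df = pd.read_excel(file)

# placeholder updates: {value in the 'ΚΩΔΙΚΟΣ' column: quantity sold}
updates = {'ABC123': 2, 'DEF456': 1}

for code, given in updates.items():
    # locate the row label(s) where the code matches
    for row in df.index[df['ΚΩΔΙΚΟΣ'] == code]:
        # 'stock' is an assumed column name for the quantity being reduced
        df.at[row, 'stock'] = int(df.at[row, 'stock']) - given

# write the whole frame back out so the rest of the sheet is kept
df.to_excel(file, index=False)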

Copy the first column of a .csv file into a new file

I know this is a very easy task, but I am stuck right now and can't get it solved. I need to copy the first column of a .csv file, including the header, into a newly created file. My code:
station = 'SD_01'
import csv
import pandas as pd

df = pd.read_csv(str(station) + "_ED.csv", delimiter=';')
list1 = []
matrix1 = df[df.columns[0]].as_matrix()
list1 = matrix1.tolist()
with open('{0}_RRS.csv'.format(station), "r+") as f:
    writer = csv.writer(f)
    writer.writerows(map(lambda x: [x], list1))
As a result, my file has an empty line between the values, has no header (I could continue without the header, though), and something at the bottom which I cannot identify:
>350
>
>351
>
>352
>
>...
>
>949
>
>950
>
>Ž‘’“”•–—˜™š›œžŸ ¡¢
Just a short impression of the 1200+ lines.
I am pretty sure that this is a very clunky way to do this; easier ways are always welcome.
How do I get rid of all the empty lines and this crazy stuff at the end?
When you get a column from a dataframe, it's returned as type Series - and the Series has a built-in to_csv method you can use. So you don't need to do any matrix casting or anything like that.
import pandas as pd

df = pd.read_csv('name.csv', delimiter=';')
first_column = df[df.columns[0]]
first_column.to_csv('new_file.csv', index=False, header=True)

How to edit a csv in Python and add one row after the 2nd row that will have the same values in all columns except one

I'm new to the Python language and I'm facing a small challenge which I haven't been able to figure out so far.
I receive a csv file with around 30-40 columns and 5-50 rows, with various details in each cell. The 1st row of the csv has the title for each column, and from the 2nd row onward I have item values.
What I want to do is create a Python script which will read the csv file and every time do the following:
Add a row after the actual 1st item row (literally after the 2nd row, because the 1st row is titles), and have that new 3rd row contain the same information as the one above it with one difference only: in the column "item_subtotal" I want to add the value from the column "discount_total".
All the rows below should remain as they are, and the modified csv should be saved as a new file with the word "edited" added to the file name.
I could really use some help because so far I've only managed to open the csv file with the Python script I'm developing; I am not yet able to copy the contents of the row above into the newly created row and replace that specific value.
Looking forward to any help.
Thank you.
Here I'm attaching the CSV with some values changed for privacy reasons:
order_id,order_number,date,status,shipping_total,shipping_tax_total,fee_total,fee_tax_total,tax_total,discount_total,order_total,refunded_total,order_currency,payment_method,shipping_method,customer_id,billing_first_name,billing_last_name,billing_company,billing_email,billing_phone,billing_address_1,billing_address_2,billing_postcode,billing_city,billing_state,billing_country,shipping_first_name,shipping_last_name,shipping_address_1,shipping_address_2,shipping_postcode,shipping_city,shipping_state,shipping_country,shipping_company,customer_note,item_id,item_product_id,item_name,item_sku,item_quantity,item_subtotal,item_subtotal_tax,item_total,item_total_tax,item_refunded,item_refunded_qty,item_meta,shipping_items,fee_items,tax_items,coupon_items,order_notes,download_permissions_granted,admin_custom_order_field:customer_type_5
15001_TEST_2,,"2017-10-09 18:53:12",processing,0,0.00,0.00,0.00,5.36,7.06,33.60,0.00,EUR,PayoneCw_PayPal,"0,00",0,name,surname,,name.surname#gmail.com,0123456789,"address 1",,41541_TEST,location,,DE,name,surname,address,01245212,14521,location,,DE,,,1328,302,"product title",103,1,35.29,6.71,28.24,5.36,0.00,0,,"id:1329|method_id:free_shipping:3|method_title:0,00|total:0.00",,id:1330|rate_id:1|code:DE-MWST-1|title:MwSt|total:5.36|compound:,"id:1331|code:#getgreengent|amount:7.06|description:Launchcoupon for friends","text string",1,
You can also use pandas to manipulate the data from the csv like this:
import pandas
import copy
Read the csv file into a pandas dataframe:
df = pandas.read_csv(filename)
Make a deepcopy of the first row of data and add the discount total to the item subtotal:
new_row = copy.deepcopy(df.loc[0])
new_row['item_subtotal'] += new_row['discount_total']
Concatenate the first data row with the new row and then everything after that:
df = pandas.concat([df.loc[:0], new_row.to_frame().T, df.loc[1:]], ignore_index=True)
Change the filename and write out the new csv file:
filename = filename[:-len('.csv')] + '_edited.csv'
df.to_csv(filename, index=False)
I hope this helps! Pandas is great for cleanly handling massive amounts of data, but may be overkill for what you are trying to do. Then again, maybe not. It would help to see an example data file.
The first step is to turn that .csv into something that is a little easier to work with. Fortunately, python has the 'csv' module which makes it easy to turn your .csv file into a much nicer list of lists. The below will give you a way to both turn your .csv into a list of lists and turn the modified data back into a .csv file.
import csv
import copy

def csv2list(ifile):
    """
    ifile = the path of the csv to be converted into a list of lists
    """
    olist = []
    with open(ifile, newline='') as f:
        c = csv.reader(f, dialect='excel')
        for line in c:
            olist.append(line)  # add each parsed row to the outer list
    return olist

#------------------------------------------------------------------------------

def list2csv(ilist, ofile):
    """
    ilist = the list of lists to be converted
    ofile = the output path for your csv file
    """
    with open(ofile, 'w', newline='') as csvfile:
        csvwriter = csv.writer(csvfile, delimiter=',',
                               quotechar='|', quoting=csv.QUOTE_MINIMAL)
        for x in ilist:
            csvwriter.writerow(x)
Now you can simply copy ilist[1] (the first data row, since ilist[0] is the header row) and change the appropriate element to reflect your summed value:
listTemp = copy.deepcopy(ilist[1])
# n is the column index of item_subtotal, m the column index of discount_total
listTemp[n] = float(listTemp[n]) + float(listTemp[m])
ilist.insert(2, listTemp)
As for how to change the file name, just use:
import os
newFileName = os.path.splitext(oldFileName)[0] + "edited" + os.path.splitext(oldFileName)[1]
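Putting those pieces together, a minimal end-to-end sketch (the input file name orders.csv is a placeholder; the column names come from the sample CSV in the question) might look like this:
import copy
import os

old_name = 'orders.csv'                    # placeholder input file name
rows = csv2list(old_name)

header = rows[0]
sub_idx = header.index('item_subtotal')    # column positions looked up from the header row
disc_idx = header.index('discount_total')

new_row = copy.deepcopy(rows[1])           # copy the first data row
new_row[sub_idx] = float(new_row[sub_idx]) + float(new_row[disc_idx])
rows.insert(2, new_row)                    # insert it right after the original

root, ext = os.path.splitext(old_name)
list2csv(rows, root + 'edited' + ext)      # e.g. ordersedited.csv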
Hopefully this will help you out!

Python: for loop and saving to a new CSV file with pandas

I've been searching everywhere but cannot seem to solve this problem.
I have a csv file which contains two headings, "Name" and "URL". I've saved it in a variable called df1, as per below:
import pandas as pd
df1 = pd.read_csv('yahoo finance.csv')
print(df1)
Name URL
0 Gainers https://au.finance.yahoo.com/gainers?e=ax
1 Losers https://au.finance.yahoo.com/losers
2 Active https://au.finance.yahoo.com/most-active
What I'm trying to do is go into each of the above URLs, parse the table within it, and save the data in a new CSV file.
for u in df1.URL:
    u2 = pd.read_html(u)
    for n in u2:
        row2 = pd.DataFrame(num)
        row2.to_csv(name+'.csv', index=False)
I am missing a big step here that I can't resolve: I want to save the table from each URL into a new CSV named after the "Name" column of the corresponding URL.
Can someone help me fix this simple part? Currently all this code does is save the last URL's data to a csv named "Active"; it's not saving the first two URLs at all.
Thank you in advance!
Thank you, this has helped solve the issue; the CSVs are now saving as they should be. The updated code is:
for row in df1.iterrows():
    name = row[1]['Name']
    url = row[1]['URL']
    url2 = str(url)
    url3 = pd.read_html(url2)
    for num in url3:
        row2 = pd.DataFrame(num)
        row2.to_csv(name+'.csv', index=False)
Do you mean you need to iterate over the dataframe row by row, with the URL value used for getting the data and the Name value used for saving it? If yes, you probably need this:
for row in df.iterrows():
    name = row[1]['Name']
    url = row[1]['URL']

Python to delete a row in an Excel spreadsheet

I have a really large Excel file and I need to delete about 20,000 rows based on a simple condition, and Excel won't let me delete such a complex range when using a filter. The condition is:
If the first column contains the value X, then I need to be able to delete the entire row.
I'm trying to automate this using Python and xlwt, but am not quite sure where to start. Seeking some code snippets to get me started...
Grateful for any help that's out there!
Don't delete. Just copy what you need.
read the original file
open a new file
iterate over rows of the original file (if the first column of the row does not contain the value X, add this row to the new file)
close both files
rename the new file into the original file
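A minimal sketch of that copy-and-filter approach with xlrd and xlwt (the file names and the marker value 'X' are placeholders, and a single sheet is assumed):
import os
import xlrd
import xlwt

src = 'data.xls'             # placeholder: the original file
tmp = 'data_filtered.xls'    # the new file written alongside it

book = xlrd.open_workbook(src)
sheet = book.sheet_by_index(0)

out_book = xlwt.Workbook()
out_sheet = out_book.add_sheet(sheet.name)

out_row = 0
for r in range(sheet.nrows):
    row = sheet.row_values(r)
    if row[0] == 'X':        # skip rows whose first column contains X
        continue
    for c, value in enumerate(row):
        out_sheet.write(out_row, c, value)
    out_row += 1

out_book.save(tmp)
os.replace(tmp, src)         # rename the new file over the original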
I like using COM objects for this kind of fun:
import win32com.client
from win32com.client import constants
f = r"h:\Python\Examples\test.xls"
DELETE_THIS = "X"
exc = win32com.client.gencache.EnsureDispatch("Excel.Application")
exc.Visible = 1
exc.Workbooks.Open(Filename=f)
row = 1
while True:
    exc.Range("B%d" % row).Select()
    data = exc.ActiveCell.FormulaR1C1
    exc.Range("A%d" % row).Select()
    condition = exc.ActiveCell.FormulaR1C1
    if data == '':
        break
    elif condition == DELETE_THIS:
        exc.Rows("%d:%d" % (row, row)).Select()
        exc.Selection.Delete(Shift=constants.xlUp)
    else:
        row += 1
# Before
#
# a
# b
# X c
# d
# e
# X d
# g
#
# After
#
# a
# b
# d
# e
# g
I usually record snippets of Excel macros and glue them together with Python as I dislike Visual Basic :-D.
You can try using the csv reader:
http://docs.python.org/library/csv.html
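For instance, if the sheet is first saved as a CSV, a rough sketch of the same filtering with the csv module (file names and the marker value are placeholders) could be:
import csv

with open('data.csv', newline='') as src, open('data_filtered.csv', 'w', newline='') as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        if row and row[0] == 'X':   # drop rows whose first column is X
            continue
        writer.writerow(row)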
You can use
sh.Range(sh.Cells(1,1), sh.Cells(20000,1)).EntireRow.Delete()
which will delete rows 1 to 20,000 in an open Excel spreadsheet (sh being a worksheet COM object), so:
if sh.Cells(1,1).Value == 'X':
    sh.Cells(1,1).EntireRow.Delete()
If you just need to delete the data (rather than 'getting rid of' the row, i.e. having the rows below shift up), you can try using my module, PyWorkbooks. You can get the most recent version here:
https://sourceforge.net/projects/pyworkbooks/
There is a pdf tutorial to guide you through how to use it. Happy coding!
I have achieved this using the pandas package:
import pandas as pd

# Read the Excel file
xl = pd.ExcelFile("test.xls")
# Parse the first sheet into a DataFrame
dfs = xl.parse(xl.sheet_names[0])
# Update the DataFrame as required
# (here, removing rows that have a blank value in the "Name" column)
dfs = dfs[dfs['Name'] != '']
# Write the updated DataFrame back to the Excel file
dfs.to_excel("test.xls", sheet_name='Sheet1', index=False)
