I need to extract the domain from each URL in a list of websites in an Excel sheet (for example, http://www.example.com/example-page and http://test.com/test-page) and reduce it to the bare domain (example.com, test.com). I have the code part figured out, but I still need to get it to run automatically on the cells of a column in the Excel sheet.
I think you should read the data into a pandas DataFrame (pd.read_excel), wrap your code in a function, then apply it to the URL column (Series.apply). After that it is easy to save the result back to Excel with DataFrame.to_excel().
Of course, you will need pandas installed.
Something like:
import pandas as pd
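dframe = pd.read_excel(io='', sheet_name='')  # fill in your file path and sheet name
dframe['domains'] = dframe['urls col name'].apply(your_function)  # your_function: the code you already wrote, wrapped in a function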
dframe.to_excel('your path')
Best
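To make the suggestion above concrete, here is a minimal sketch. The function name extract_domain, the column name "urls" and the sample rows are assumptions for illustration; the asker's own code was not shown, so urllib.parse is used here as a stand-in for it.

```python
from urllib.parse import urlparse

import pandas as pd


def extract_domain(url):
    """Return the host part of a URL without a leading 'www.'."""
    host = urlparse(url.strip()).netloc
    return host[4:] if host.startswith("www.") else host


# Hypothetical data standing in for pd.read_excel(...)
dframe = pd.DataFrame(
    {"urls": ["http://www.example.com/example-page", "http://test.com/test-page"]}
)

# Apply the function to every cell of the URL column
dframe["domains"] = dframe["urls"].apply(extract_domain)
print(dframe)

# To write the result back to Excel (needs an engine such as openpyxl):
# dframe.to_excel("domains.xlsx", index=False)
```

Note that urlparse only fills netloc when the URL includes a scheme such as http://, so bare entries like "example.com/page" would need extra handling.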
please see attached photo
I only need to import a specific column with conditions (such as specific data found in that column). I also need to remove the unnecessary columns, but dropping them takes too much code. What code or syntax is applicable?
How to get a column from a pandas DataFrame is answered in "Read specific columns from a csv file with csv module?"
To quote:
Pandas is spectacular for dealing with csv files, and the following
code would be all you need to read a csv and save an entire column
into a variable:
import pandas as pd
df = pd.read_csv(csv_file)
saved_column = df.column_name #you can also use df['column_name']
So in your case, you just save the filtered data frame in a new variable.
This means you do newdf = data.loc[...... and then use the code snippet from above to extract the column you want, for example newdf.continent
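As a rough illustration of filtering rows and keeping only some columns in one step, here is a small sketch. The DataFrame contents and the column names (country, continent, population, notes) are made up for the example; only the continent column is hinted at in the answer above.

```python
import pandas as pd

# Hypothetical data standing in for the asker's sheet
data = pd.DataFrame(
    {
        "country": ["France", "Japan", "Kenya", "Spain"],
        "continent": ["Europe", "Asia", "Africa", "Europe"],
        "population": [68, 125, 54, 47],
        "notes": ["a", "b", "c", "d"],
    }
)

# .loc takes a row condition and a list of columns to keep,
# so no separate drop() calls are needed
newdf = data.loc[data["continent"] == "Europe", ["country", "population"]]
print(newdf)
```

If the unwanted columns are never needed at all, the usecols parameter of pd.read_csv or pd.read_excel can skip them at load time.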
Hi, I am very new to Python and I plan to create a final exportable table from these reviews, scraped from a website, to see which words were used most. I have managed to get these two columns but have no idea how to proceed. Can I export this directly into a table in Excel, or must I convert it into a DataFrame and then export it to a CSV? And what code is required to do that? Thank you so much for your help!!
It's convenient to use the pandas library for working with dataframes:
import pandas as pd
series = pd.Series(wordcount)
series.to_csv("wordcount.csv")
However, if you use the code above, you may get a warning (depending on your pandas version). There are two ways to fix it:
1) Add header parameter:
series.to_csv("wordcount.csv", header=True)
2) Or convert series to dataframe and then save it (without new index):
df = series.reset_index()
df.to_csv("wordcount.csv", index=False)
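For completeness, here is a small end-to-end sketch of the same idea. The reviews list is invented sample data, and building wordcount with collections.Counter is an assumption, since the question does not show how the counts were produced.

```python
from collections import Counter

import pandas as pd

# Hypothetical scraped reviews
reviews = ["great food and great service", "food was cold"]

# Count every word across all reviews
wordcount = Counter(word for review in reviews for word in review.lower().split())

# most_common() yields (word, count) pairs, highest count first
df = pd.DataFrame(wordcount.most_common(), columns=["word", "count"])

# With no path, to_csv returns the CSV text; pass a path such as
# "wordcount.csv" to write a file instead
csv_text = df.to_csv(index=False)
print(csv_text)
```

The same DataFrame can be written to a real Excel file with df.to_excel(...), which additionally requires an Excel engine such as openpyxl.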
There are multiple ways to read excel data into python.
Pandas also provides an API for reading and writing Excel files:
import pandas as pd
from pandas import ExcelWriter
from pandas import ExcelFile
df = pd.read_excel('File.xlsx', sheet_name='Sheet1')  # the keyword is sheet_name in current pandas
That works fine.
BUT: how can I read the tables of every sheet directly into a pandas DataFrame?
The picture above shows a sheet containing a table that does not start at cell (1,1).
Moreover the sheet might include several tables (listobjects in VBA).
I can not find anywhere the way to read them into pandas.
Note1: It is not possible to modify the workbook to bring all the tables towards cell(1,1).
Note2: I would like to use just pandas (if possible) and minimize the need to import other libraries. But if there is no other way, I am ready to use another library. In any case, I could not manage it with xlwings, for instance.
Here it looks like it is possible to parse the Excel file, but no solution is provided for tables, only for complete sheets.
The documentation of pandas does not seem to offer that possibility.
Thanks.
You can use xlwings, a great package for working with Excel files in Python.
This is for a single table, but it is pretty trivial to use xlwings collections (App > books > sheets > tables) to iterate over all tables. Tables are, of course, ListObjects.
import xlwings
import pandas
with xlwings.App() as App:
    _ = App.books.open('my.xlsx')
    rng = App.books['my.xlsx'].sheets['mysheet'].tables['mytablename'].range
    df: pandas.DataFrame = rng.expand().options(pandas.DataFrame).value
I understand that this question has been marked solved already, but I found an article that provides a much more robust solution:
Full Post
I suppose a newer version of this library supports better visibility of the workbook structure. Here is a summary:
Load the workbook using the load_workbook function from openpyxl.
Then you can access the sheets within it, each of which contains a collection of ListObjects (tables) in Excel.
Once you have access to the tables, you can get the range addresses of those tables.
Finally, loop through the ranges and create a pandas DataFrame from each one.
This is a nicer solution as it gives us the ability to loop through all the sheets and tables in a workbook.
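The steps summarized above could look roughly like the sketch below. It is not the code from the linked post; the function name read_all_tables is made up, and it assumes a recent openpyxl release in which worksheets expose a dict-like tables attribute, and that every table has a header row.

```python
import pandas as pd
from openpyxl import load_workbook


def read_all_tables(path):
    """Return {(sheet title, table name): DataFrame} for every table in the workbook."""
    # data_only=True returns cached formula results instead of formula strings
    workbook = load_workbook(path, data_only=True)
    frames = {}
    for sheet in workbook.worksheets:
        for table in sheet.tables.values():
            # table.ref is the range address, e.g. "B3:F20"
            rows = [[cell.value for cell in row] for row in sheet[table.ref]]
            header, body = rows[0], rows[1:]  # assumes the first row holds column names
            frames[(sheet.title, table.displayName)] = pd.DataFrame(body, columns=header)
    return frames
```

Because each table carries its own range address, this works wherever the table sits on the sheet and when a sheet holds several tables.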
Here is a way to parse one table; however, it requires you to know some information about the sheet being parsed.
df = pd.read_excel("file.xlsx", usecols="B:I", index_col=3)
print(df)
Not elegant, and it works only if the sheet contains a single table, but it is a first step:
import pandas as pd
import string
# Column letters A-Z, so this only handles tables within the first 26 columns
letter = list(string.ascii_uppercase)
df1 = pd.read_excel("file.xlsx")
def get_start_column(df):
    # Letter of the first column that contains any data
    for i, column in enumerate(df.columns):
        if df[column].first_valid_index() is not None:
            return letter[i]
def get_last_column(df):
    # Letter of the last column that contains any data
    for i, column in reversed(list(enumerate(df.columns))):
        if df[column].first_valid_index() is not None:
            return letter[i]
def get_first_row(df):
    # Position of the first non-empty row, offset by one for the row df1 used as its header
    for index, row in df.iterrows():
        if not row.isnull().values.all():
            return index + 1
def usecols(df):
    start = get_start_column(df)
    end = get_last_column(df)
    return f"{start}:{end}"
df = pd.read_excel("file.xlsx", usecols=usecols(df1), header=get_first_row(df1))
print(df)
I have an Excel sheet with stock price data, and I want to write code that checks whether a guess is correct or not. So I need to compare my Python value to the value in the Excel sheet.
I have tried using a .csv file on repl.it, but it was not compatible and I was not able to check my values. I have also tried using a .xlsx file on repl.it, but I still could not access the values.
Is there any way I can compare the values?
Try this:
import pandas as pd
table = pd.read_excel('file_name.xlsx')
for row in table.values.tolist():
    first_value_in_row = row[0]
    second_value_in_row = row[1]
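Once the sheet is loaded, the comparison itself might look like the sketch below. The column names date and close, the sample prices, the helper name is_guess_correct and the tolerance are all assumptions, since the question does not describe the sheet layout.

```python
import math

import pandas as pd

# Hypothetical data standing in for pd.read_excel('file_name.xlsx')
table = pd.DataFrame({"date": ["2024-01-02", "2024-01-03"], "close": [101.5, 99.8]})


def is_guess_correct(table, date, guess, tolerance=0.01):
    """Return True if guess is within tolerance of the closing price on date."""
    match = table.loc[table["date"] == date, "close"]
    if match.empty:
        raise KeyError(f"no row for {date}")
    # Compare floats with a tolerance rather than ==
    return math.isclose(float(match.iloc[0]), guess, abs_tol=tolerance)


print(is_guess_correct(table, "2024-01-03", 99.8))   # True
print(is_guess_correct(table, "2024-01-02", 100.0))  # False
```

Comparing with a small tolerance avoids false mismatches caused by floating-point rounding in the stored prices.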
I currently have an Excel workbook with some graphs (charts?). The graphs are plotted from numerical values. I can access the values in LibreOffice if I right click on the graph and select "Data table". These values are nowhere else in the file.
I would like to access these values programmatically with Python. I tried things like xlrd, but it seems xlrd ignores graphical elements. When I run it on my workbook I only get empty cells back.
Have you ever encountered this issue?
Sadly I cannot provide the file as it is confidential.
import pandas as pd
df = pd.read_excel('path/name_of_your_file.xlsx')
print(df.head())
You should have a dataframe (df) to play with in python!
I have never worked with an Excel file containing charts, but I used to read normal Excel files with the following code. Have you tried this?
import xlrd
file = 'temp.xls'
book = xlrd.open_workbook(file)
for sheet in book.sheets():
    # skip sheets that have no columns
    if sheet.ncols:
        # print every value, row by row
        for row_index in range(sheet.nrows):
            for value in sheet.row_values(row_index):
                print(value)
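Neither snippet above reaches values that exist only inside a chart. One possible approach, assuming the workbook is (or can be saved as) .xlsx: such a file is a ZIP archive, and charts are typically stored as XML parts under xl/charts/, where the plotted numbers are cached in numCache elements. The sketch below reads those caches with the standard library only; the function name read_chart_caches is made up, and it assumes every cached point is numeric.

```python
import zipfile
import xml.etree.ElementTree as ET

# DrawingML chart namespace used inside xl/charts/chart*.xml
CHART_NS = {"c": "http://schemas.openxmlformats.org/drawingml/2006/chart"}


def read_chart_caches(xlsx_file):
    """Return {chart part name: list of cached numeric series} for an .xlsx file."""
    result = {}
    with zipfile.ZipFile(xlsx_file) as archive:
        for name in archive.namelist():
            if not (name.startswith("xl/charts/chart") and name.endswith(".xml")):
                continue
            root = ET.fromstring(archive.read(name))
            series = []
            # Each numCache holds the cached points of one plotted series
            for cache in root.iterfind(".//c:numCache", CHART_NS):
                series.append([float(v.text) for v in cache.iterfind("c:pt/c:v", CHART_NS)])
            result[name] = series
    return result
```

Calling read_chart_caches("workbook.xlsx") returns one list of floats per cached series; this does not apply to the older binary .xls format.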