I'm trying to concatenate two columns from an existing Excel file that contains multiple sheets, using Python.
I have already imported the file into Jupyter with the code below, and that worked, but I am stuck on the next step.
import xlrd
import pandas as pd
df = pd.read_excel(r'C:\Users\zahir\Desktop\Stage\BDD_Cells_2G+3G+4G_01072019.xlsx')
print(df)
Your question does not say what the column headers are, so I will use placeholder names. I would start here:
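column_header_1 = 'ch1'
column_header_2 = 'ch2'
column_header_3 = 'ch3'  # name of the new, concatenated column
# Convert both columns to strings, then join them row by row
df[column_header_3] = df[column_header_1].map(str) + df[column_header_2].map(str)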
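As a minimal sketch of the same idea on data that can be run without the workbook: the frame below is made up, and the helper name concat_columns and the separator are assumptions for illustration, not part of the question.

```python
import pandas as pd

# Hypothetical data standing in for one sheet of the workbook.
df = pd.DataFrame({"ch1": ["A", "B"], "ch2": [1, 2]})


def concat_columns(frame, left, right, sep=""):
    """Return the two columns joined as strings, row by row."""
    return frame[left].astype(str) + sep + frame[right].astype(str)


df["ch3"] = concat_columns(df, "ch1", "ch2", sep="_")
print(df)

# For a workbook with several sheets, pd.read_excel(path, sheet_name=None)
# returns a dict of {sheet name: DataFrame}, so the same helper could be
# applied to each frame in turn.
```

Converting with astype(str) first avoids a TypeError when one of the columns holds numbers.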
On Stack Overflow I see a lot of questions about removing the index from files written by to_csv.
However, what I want to do is add an index to an existing CSV file that has none.
Here is my file:
How do we add an index to this csv with pandas?
If you read the CSV file into a DataFrame, pandas automatically generates an index, and to_csv writes that index by default, so you don't need to do anything else.
So just read the file and write it out again, as below:
import pandas as pd
df = pd.read_csv("your_file.csv")
df.to_csv("your_file_to_save.csv")
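If the new index column should also have a header, to_csv accepts an index_label argument. The sketch below uses in-memory text instead of real files, and the column names and the label "id" are assumptions for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content with no index column.
source = io.StringIO("name,score\nalice,90\nbob,85\n")
df = pd.read_csv(source)  # pandas assigns a default 0..n-1 index

out = io.StringIO()
# index=True is the default; index_label names the new first column.
df.to_csv(out, index=True, index_label="id")
result = out.getvalue()
print(result)
```

Without index_label, the index column is still written, but its header cell is left empty.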
I'm new to pandas and I'm trying to use pandas read_excel to work with a file as a DataFrame. The spreadsheet looks like this:
Excel Matrix
The problem is that this file contains double headers in both the columns and the rows, and the first header in each includes merged cells. I tried this:
file = 'country_sector_py.xlsx'
matrix = pd.read_excel(file, sheet_name = 'matrix', header=[0, 1], index_col=[0, 1])
The error I get is "ValueError: Length of new names must be 1, got 2." I've read some related posts that say it happens because some of the headers are repeated, but I haven't been able to solve it. Any guidance would be much appreciated.
References:
Pandas read excel sheet with multiple header when first column is empty
Error when using pandas read_excel(header=[0,1])
Not an answer, but posted here because it needs more detail than comments allow...
Using your code, I cannot reproduce the error.
import pandas as pd
df = pd.read_excel('matrix.xlsx', sheet_name = 'matrix', header=[0,1], index_col=[0, 1])
df
The worst I get is that if I copy 'region 2' so that it appears twice, the second copy is not shown again and the sub-column numbering gets messed up. Example:
There must be something else in your file. Share it if you can; otherwise look around inside it, or open it and save it as a different Excel version (maybe XLSM, or if it already is XLSM, then something other than that).
It may be worth checking the version of pandas with pip show pandas:
# pip show pandas
Name: pandas
Version: 1.3.0
Summary: Powerful data structures for data analysis, time series, and statistics
Home-page: https://pandas.pydata.org
Author: The Pandas Development Team
Author-email: pandas-dev@python.org
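If the file still fails to load with header=[0, 1], one possible workaround is to read the sheet with header=None and build both MultiIndexes by hand. This is only a sketch and has not been checked against the asker's file: the grid below is invented, and it assumes merged cells come back as a value in the first cell followed by empty cells.

```python
import pandas as pd

# Hypothetical raw grid, as pd.read_excel(..., header=None) might return it
# for a sheet with two header rows and two index columns.
raw = pd.DataFrame([
    [None, None, "region 1", None, "region 2", None],
    [None, None, "a", "b", "a", "b"],
    ["country 1", "s1", 1, 2, 3, 4],
    [None, "s2", 5, 6, 7, 8],
])

n_header_rows, n_index_cols = 2, 2

# Column labels: forward-fill the merged top header row across the columns.
header = raw.iloc[:n_header_rows, n_index_cols:].ffill(axis=1)
columns = pd.MultiIndex.from_arrays(header.to_numpy().tolist())

# Row labels: forward-fill the merged first index column down the rows.
index_part = raw.iloc[n_header_rows:, :n_index_cols].ffill(axis=0)
index = pd.MultiIndex.from_arrays(index_part.to_numpy().T.tolist())

# The data body is everything below the headers and right of the index columns.
matrix = raw.iloc[n_header_rows:, n_index_cols:].copy()
matrix.index = index
matrix.columns = columns
matrix = matrix.astype(int)
print(matrix)
```

A cell can then be selected with a pair of tuples, for example matrix.loc[("country 1", "s2"), ("region 2", "a")].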
I need to extract the domain from each URL in a list of websites in an Excel sheet (for example, http://www.example.com/example-page and http://test.com/test-page) and reduce it to the bare domain (example.com, test.com). I have the code part figured out, but I still need to get it to run automatically on the cells of a column in the Excel sheet.
here's_the_code
I think you should read the data into a pandas DataFrame (pd.read_excel), wrap your code in a function, and then apply it to the URL column (Series.apply). After that it is easy to save back to Excel with DataFrame.to_excel().
Of course, you will need pandas installed.
Something like:
import pandas as pd
dframe = pd.read_excel(io='', sheet_name='')  # fill in the file path and sheet name
dframe['domains'] = dframe['urls col name'].apply(your_function)  # your_function: your code wrapped in a function
dframe.to_excel('your path')
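Since the asker's own function is not shown, here is one way such a function might look, using the standard library's urlparse. The function name extract_domain, the column name "urls", and the rule of dropping a leading "www." are assumptions for illustration.

```python
from urllib.parse import urlparse

import pandas as pd


def extract_domain(url):
    """Return the host part of a URL without a leading 'www.'."""
    host = urlparse(url.strip()).netloc
    return host[4:] if host.startswith("www.") else host


# Hypothetical data standing in for the Excel column of URLs.
dframe = pd.DataFrame({
    "urls": [
        "http://www.example.com/example-page",
        "http://test.com/test-page",
    ]
})
dframe["domains"] = dframe["urls"].apply(extract_domain)
print(dframe)
```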
Best
I have a large data frame in a CSV file, sample1, from which I have to generate a new CSV file containing only the first 100 rows. I have written code for it, but I am getting a KeyError: the label [100] is not in the index.
This is what I tried. Any help would be appreciated.
import pandas as pd
data_frame = pd.read_csv("C:/users/raju/sample1.csv")
data_frame1 = data_frame[:100]
data_frame.to_csv("C:/users/raju/sample.csv")
The correct syntax is with iloc:
data_frame.iloc[:100]
A more efficient way is to use the nrows argument, whose purpose is exactly to read only a portion of a file. This way you avoid wasting time and resources parsing rows you don't need:
import pandas as pd
data_frame = pd.read_csv("C:/users/raju/sample1.csv", nrows=100)  # nrows counts data rows; the header is not included
data_frame.to_csv("C:/users/raju/sample.csv")
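To check that both approaches give the same 100 rows, here is a small sketch on in-memory data. The column name and the 250 generated rows are made up for illustration. Note that nrows counts data rows only, so the header line does not need to be added to it.

```python
import io

import pandas as pd

# Hypothetical CSV text: one header line followed by 250 data rows.
csv_text = "value\n" + "\n".join(str(i) for i in range(250)) + "\n"

# Option 1: parse only the first 100 data rows.
first_100 = pd.read_csv(io.StringIO(csv_text), nrows=100)

# Option 2: parse everything, then slice by position with iloc.
sliced = pd.read_csv(io.StringIO(csv_text)).iloc[:100]

print(len(first_100), len(sliced))
```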
Total beginner to Python: I am trying to import values from a column in an Excel file, look them up in a Python dictionary (which I was able to create), then write the results into the Excel file and check whether they match another column in the file.
You can use a module called pandas.
pip install pandas
To read the file use the following:
import pandas as pd
file = pd.ExcelFile('path/to/excelsheet/').parse('sheet_you_want_to_use') # 'Sheet 1' for Sheet 1
You can now access the columns using the column names as keys: file['column_name'].
You can then append the looked-up values to a list and write them to an Excel file as follows:
values = ['....values....']  # named values to avoid shadowing the built-in list
pd.DataFrame(values).to_excel('where/to/save/file')
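As an alternative to building a list by hand, Series.map can do the dictionary lookup for a whole column at once. The sketch below runs on made-up data; the dictionary contents and the column names key, expected, looked_up, and match are assumptions for illustration.

```python
import pandas as pd

# Hypothetical dictionary and data standing in for the Excel sheet.
lookup = {"a": 1, "b": 2}
df = pd.DataFrame({"key": ["a", "b", "c"], "expected": [1, 3, None]})

# Look up every value of the column; keys missing from the dict become NaN.
df["looked_up"] = df["key"].map(lookup)

# Compare with the other column; rows with a missing value compare as False.
df["match"] = df["looked_up"] == df["expected"]
print(df)

# The result could then be saved with, for example:
# df.to_excel("result.xlsx", index=False)
```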
I would advise you to read the following documentation:
pandas DataFrame
pandas ExcelFile
pandas to_excel
pandas