Total beginner to Python: I'm trying to import Excel values from a column, look up the imported values in a Python dictionary (which I was able to create), then write the results into the Excel file and check whether they match another column in the file.
You can use a module called pandas.
pip install pandas
To read the file use the following:
import pandas as pd
file = pd.ExcelFile('path/to/excelsheet/').parse('sheet_you_want_to_use')  # e.g. 'Sheet1', the default name of the first sheet
You can now access the columns using the column names as keys: file['column_name'].
Append the looked-up values to a list, then write them to an Excel file as follows:
values = ['....values....']  # named "values" so the built-in list is not shadowed
pd.DataFrame(values).to_excel('where/to/save/file')
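The lookup and comparison step from the question could be sketched as follows. This is only a minimal sketch: the column names `code` and `expected`, the dictionary `lookup`, and the output file name in the last comment are all made up for illustration. The frame is built in memory so the example runs without a file; in practice it would come from the Excel sheet read above.

```python
import pandas as pd

# Hypothetical data standing in for the sheet read from Excel.
df = pd.DataFrame({
    "code": ["a", "b", "c", "d"],
    "expected": ["apple", "banana", "cherry", "date"],
})

# Hypothetical lookup dictionary; the key "d" is deliberately missing.
lookup = {"a": "apple", "b": "berry", "c": "cherry"}

# Series.map looks each value up in the dictionary; missing keys become NaN.
df["looked_up"] = df["code"].map(lookup)

# Compare the looked-up value with the other column, row by row.
df["matches"] = df["looked_up"] == df["expected"]

print(df)
# To save the result: df.to_excel("results.xlsx", index=False)  # hypothetical file name
```

Using `map` avoids building the list by hand, and the `matches` column shows directly which rows agree with the other column.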
I would advise you to read the following documentation:
pandas DataFrame
pandas ExcelFile
pandas to_excel
pandas
Related
On Stack Overflow I see a lot of questions about removing the index from files written by to_csv.
However, what I want to do is add an index to an already made csv file with no index.
Here is my file:
How do we add an index to this csv with pandas?
If you read the CSV file into a DataFrame, pandas automatically generates an index. You don't need to do anything else.
So just read it and write it again, as below:
import pandas as pd
df = pd.read_csv("your_file.csv")
df.to_csv("your_file_to_save.csv")
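To see the effect without touching a real file, here is a small sketch that uses in-memory text instead of files. The sample data and the header name `id` are assumptions for illustration; `index_label` is an optional `to_csv` parameter that names the index column.

```python
import io

import pandas as pd

# Hypothetical CSV content with no index column.
csv_text = "name,score\nann,90\nbob,85\n"
df = pd.read_csv(io.StringIO(csv_text))

# Writing the frame back out includes the automatically generated index;
# index_label gives that new first column a header.
out = io.StringIO()
df.to_csv(out, index_label="id")
result = out.getvalue()
print(result)
```

The first column of the written output is then the index, headed `id`.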
Please see the attached photo.
Here's the image.
I only need to import a specific column, filtered by a condition (such as a specific value found in that column). I also want to get rid of unnecessary columns, but dropping them one by one takes too much code. What code or syntax should I use?
How to get a column from a pandas DataFrame is answered in "Read specific columns from a csv file with csv module?"
To quote:
Pandas is spectacular for dealing with csv files, and the following
code would be all you need to read a csv and save an entire column
into a variable:
import pandas as pd
df = pd.read_csv(csv_file)
saved_column = df.column_name #you can also use df['column_name']
So in your case, you just save the filtered data frame in a new variable.
This means you do newdf = data.loc[...... and then use the code snippet from above to extract the column you want, for example newdf.continent
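A minimal sketch of both steps, assuming made-up data with the hypothetical columns `country`, `continent`, `year` and `pop`. The `usecols` parameter of `read_csv` loads only the listed columns, which avoids dropping the others afterwards.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for the real file.
csv_text = (
    "country,continent,year,pop\n"
    "France,Europe,2007,61\n"
    "Kenya,Africa,2007,35\n"
    "Spain,Europe,2007,40\n"
)

# usecols limits parsing to the listed columns, so nothing has to be dropped later.
data = pd.read_csv(io.StringIO(csv_text), usecols=["country", "continent"])

# A boolean mask inside .loc keeps only the rows that satisfy the condition.
newdf = data.loc[data["continent"] == "Europe"]

# Extract a single column from the filtered frame.
saved_column = newdf["country"]
print(saved_column.tolist())
```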
I am new to the Python pandas module. Using pandas, I want to filter the data in one column and add the filtered data to a new column named 'comment'. Suppose an Excel file has the headers A, B, C, D and E, and column B contains values such as 'python', 'flask', 'pandas', 'python', 'numpy', 'pandas', 'pandas', 'python', 'pandas'. I want to filter column B by selecting 'python' and add the filtered data to a new column named 'comment'.
Here is my Python code:
import pandas as pd
import numpy as np
source_file = 'C:\\Users\\user98\\Desktop\\excel_1.xlsx'
read_file = pd.read_excel(source_file)
data=read_file[read_file.B== 'python']
read_file['comment']=data
print(read_file)
The code above throws the error "ValueError: Wrong number of items passed 3, placement implies 1".
What I want is for a comment column to be added and, after filtering column B on 'python', for the data to be written to the comment column on the same row.
For example, this is the Excel data set before filtering column B by selecting 'python':
B comment
python python
pandas pandas
numpy numpy
python pandas
pandas pandas
python python
After filtering, the data should look as shown below, with values added to both columns on the same rows:
B comment
python B-python
python B-python
python B-python
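One possible way to get the output shown above, offered as a sketch rather than a definitive answer. The error most likely arises because `data` is a whole DataFrame with several columns, and it is being assigned to the single column `comment`. The sketch below filters first and then builds `comment` from column B alone; the sample frame is made up and stands in for the Excel file.

```python
import pandas as pd

# Hypothetical data standing in for the Excel sheet (only columns A and B shown).
df = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6],
    "B": ["python", "pandas", "numpy", "python", "pandas", "python"],
})

# Keep only the rows where B is 'python'; copy() avoids modifying a view of df.
filtered = df[df["B"] == "python"].copy()

# Build the new column from a single column, so the shapes match.
filtered["comment"] = "B-" + filtered["B"]

print(filtered[["B", "comment"]])
```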
I'm trying to concatenate two columns from an existing Excel file that contains multiple sheets, using Python.
I started by importing the file into Jupyter with the code below, and that worked, but I am stuck on the next step.
import xlrd
import pandas as pd
df = pd.read_excel(r'C:\Users\zahir\Desktop\Stage\BDD_Cells_2G+3G+4G_01072019.xlsx')
print(df)
The column headers are unknown from the information in your question. Without knowing them, I would start here:
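column_header_1 = 'ch1'  # replace with the first real column header
column_header_2 = 'ch2'  # replace with the second real column header
column_header_3 = 'ch3'  # name for the new, concatenated column
df[column_header_3] = df[column_header_1].map(str) + df[column_header_2].map(str)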
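As a small runnable illustration of the same idea, here is a sketch on made-up data. The column names `site` and `cell` and the `_` separator are assumptions, not taken from the question's file. Note that `pd.read_excel` reads only the first sheet unless `sheet_name` is given, which matters for a workbook with several sheets.

```python
import pandas as pd

# Hypothetical frame standing in for one sheet of the workbook,
# e.g. pd.read_excel(path, sheet_name="some_sheet").
df = pd.DataFrame({"site": ["A1", "B2"], "cell": [10, 20]})

# Convert both columns to strings before joining, so numbers concatenate
# as text instead of raising a TypeError.
df["site_cell"] = df["site"].astype(str) + "_" + df["cell"].astype(str)

print(df)
```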
I have a large data frame in a CSV file, sample1, and from it I have to generate a new CSV file containing only the first 100 rows. I wrote code for this, but I am getting a KeyError: the label [100] is not in the index.
I tried the following. Any help would be appreciated.
import pandas as pd
data_frame = pd.read_csv("C:/users/raju/sample1.csv")
data_frame1 = data_frame[:100]
data_frame.to_csv("C:/users/raju/sample.csv")
The correct syntax is with iloc:
data_frame.iloc[:100]
A more efficient way is to use the nrows argument, whose purpose is exactly to read only a portion of a file. This way you avoid wasting time and resources parsing rows you do not need:
import pandas as pd
data_frame = pd.read_csv("C:/users/raju/sample1.csv", nrows=100)  # nrows counts data rows; the header line is not included
data_frame.to_csv("C:/users/raju/sample.csv")
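Both approaches can be checked side by side on generated data. This is a sketch with a made-up single-column CSV of 250 rows, held in memory rather than read from the path above.

```python
import io

import pandas as pd

# Hypothetical CSV text: a header line followed by 250 data rows.
csv_text = "value\n" + "\n".join(str(i) for i in range(250)) + "\n"

# Approach 1: read everything, then slice the first 100 rows by position.
full = pd.read_csv(io.StringIO(csv_text))
first_100_a = full.iloc[:100]

# Approach 2: stop parsing after 100 data rows (the header is not counted).
first_100_b = pd.read_csv(io.StringIO(csv_text), nrows=100)

print(len(first_100_a), len(first_100_b))
```

Either frame can then be written out with `to_csv`. Passing `index=False` leaves out the extra index column if it is not wanted.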