Unable to append two dataframes in pandas with same columns length - python

This is what I am trying to accomplish
1. There is a master table. I extracted columns 1, 2, 3 and columns 4, 5, 6. I need to stack them on top of one another (not side by side).
2. For this, I extracted them into two data frames and tried to append them, but it does not seem to work. Here is my code:
import pandas as pd
import html5lib
link = "https://en.wikipedia.org/wiki/Mobile_telephone_numbering_in_India"
tables = pd.read_html(link)[3]  # select the fourth table on the page (index 3)
base=tables.iloc[:, 0:3]
top=tables.iloc[:,3:6]
print(base)
base=base.append(top)
print(base)
Here is my output:
I need the rows to be added to one another. How can I do it?

I think you need to use concat instead of append:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.concat.html

Alternatively, note that append does not modify the dataframe in place; when you append a second dataframe you need to reassign the result:
base = base.append(top)
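Putting the suggestions above together, here is a minimal sketch of stacking the two column slices with pd.concat, using a small made-up 6-column frame in place of the Wikipedia table. The second slice is renamed so its columns line up with the first, otherwise concat would place the halves side by side under different column names:

```python
import pandas as pd

# Hypothetical 6-column table standing in for the Wikipedia table
tables = pd.DataFrame({
    'A': [1, 2], 'B': [3, 4], 'C': [5, 6],
    'D': [7, 8], 'E': [9, 10], 'F': [11, 12],
})

base = tables.iloc[:, 0:3]
top = tables.iloc[:, 3:6].copy()

# concat aligns on column names, so give both halves the same names
# before stacking; ignore_index=True renumbers the rows 0..n
top.columns = base.columns
stacked = pd.concat([base, top], ignore_index=True)
print(stacked)
```

The renaming step is the key detail: with mismatched column names, concat would produce a 6-column frame full of NaNs instead of a 4-row stack.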

Related

How to transform values from dataframe(python) attributes from a row into columns?

I have the following dataframe (loaded from a csv) that I want to use for some sampling tests.
For that I wanted to use all of the current columns, but to transform Element_Count and Tag_Count so that each value inside them (e.g. link: 10) becomes a separate column.
I want to extract each value and turn it into a column. The final dataframe would look something like this (obviously depending on the values inside Element_Count/Tag_Count):
Index (the 0, 1, 2, etc. from the dataframe itself), PageID, Uri, A, AA, AAA, link (as a column, with its value from Element_Count in that URL's row, e.g. 44 for the first one in the picture), html, etc. (with all the values inside Tag_Count expanded in the same way as described for Element_Count).
The current code to generate the dataframe is the following:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore") #to ignore some warnings which have no effect in this particular case.
df = pd.read_csv('test.csv', sep=';')
df.head()
I have searched google, also in here for some answers to no avail.
Have tried changing the test csv to achieve my goal, with no success. Have also tried after seeing a question on here to use:
pd.DataFrame(df.ranges.str.split(',').tolist())
to achieve the desired result with no success.
Any ideas on how to achieve this via dataframes, or by any other method?
(If I have forgotten to mention anything that you feel is important to understand the problem, please say so and I will edit it in.)
Edit :
Although logic would suggest that the element and tag counts should be stored in dictionary form and be easily splittable, that is not the case, as shown in the printout.
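If the cells turn out to be plain "key: value" strings, one possible approach is to parse each cell into a dict and let pd.Series spread the keys out into columns. This is a sketch under that assumption (the sample data and the exact cell format here are made up, since the question notes the values are not real dictionaries):

```python
import pandas as pd

# Hypothetical data: Element_Count stored as "key: value" pairs in a string
df = pd.DataFrame({
    'PageID': [1, 2],
    'Element_Count': ['link: 44, html: 3', 'link: 10, html: 7'],
})

def parse_counts(cell):
    # "link: 44, html: 3" -> {'link': 44, 'html': 3}
    pairs = (item.split(':') for item in cell.split(','))
    return {k.strip(): int(v) for k, v in pairs}

# apply(pd.Series) turns each parsed dict into one column per key
counts = df['Element_Count'].apply(parse_counts).apply(pd.Series)
result = pd.concat([df.drop(columns='Element_Count'), counts], axis=1)
print(result)
```

The same parsing step could be reused for Tag_Count; only the split logic inside parse_counts would need adjusting to whatever format the cells actually contain.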

How to append dataframes in Pandas without staggered format

I was able to append dataframes, but as they are added, each one appears at the end of the previously appended one, and so on.
Each dataframe has a different header name.
Here’s what I’ve tried so far:
df1 = df1.append(dforiginal,sort=False, ignore_index=False)
What’s more, every time they are appended, their index is set back to 0. Is it possible to append each dataframe all starting at Index=0?
The screenshots below show what I'm getting(top image) and what I'm trying to accomplish (bottom image).
Thanks.
If I understood your point correctly, you want to add rows instead of columns to your dataframe, don't you?
Nevertheless, you could use, for example, this page for a general overview of how to combine dataframes: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
Moreover, you can reset the index by setting the keyword ignore_index to True.
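If the goal is instead the side-by-side layout from the bottom screenshot, with every frame keeping its own 0-based index, a minimal sketch with pd.concat along axis=1 (the sample frames here are made up):

```python
import pandas as pd

# Two hypothetical frames with different headers and lengths
df1 = pd.DataFrame({'a': [1, 2, 3]})
df2 = pd.DataFrame({'b': [4, 5]})

# axis=1 places the frames side by side, aligned on their 0-based indexes;
# the shorter frame is padded with NaN
side_by_side = pd.concat([df1, df2], axis=1)
print(side_by_side)
```

Because alignment is by index, each frame starts at row 0 rather than being stacked below the previous one.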

How to force pdfplumber to extract table according to the number of columns in the upper row?

I am trying to extract a table from a PDF document with the Python package pdfplumber. The table has four columns and multiple rows. The first row contains the headers, the second row has only one merged cell, and after that the values are stored normally (example).
pdfplumber was able to retrieve the table, but it made six columns out of four and did not assign the values to the correct columns.
Table as shown in PDF document
I tried to use various table settings, including "vertical strategy": "lines", but this yields me the same result.
# Python 2.7.16
import pandas as pd
import pdfplumber
path = 'file_path'
pdf = pdfplumber.open(path)
first_page = pdf.pages[7]
df5 = pd.DataFrame(first_page.extract_table())
I am getting six columns instead of four, with values in the wrong columns.
Output example:
Table as output in jupyter notebooks
I would be happy to hear any suggestions or solutions.
Did you get an answer? I want to replace the \n appearing in the text of a column.
This is not exactly what you're looking for, but you could load the output into a dataframe and iterate over it, using the non-null values in the first row as column names for another dataframe. After that it is easy: collate all the data between two named columns in the output dataframe and insert it into the new dataframe after merging those cells.

how to write an empty column in a csv based on other columns in the same csv file

I don't know whether this is a very simple question, but I would like to write a conditional statement based on two other columns.
I have two columns, age and SES, and another, empty column (vitality class) whose value should be based on those two. For example, when a person is 65 years old and their corresponding socio-economic status is high, a value of 1 is written in the third column. I have an idea of what I want to achieve, and I know how to write conditions, but because the result depends on two columns at once I have no idea how to express that in a function,
and, furthermore, how to write the result back into the same csv (in the respective empty column).
Use the pandas module to import the csv as a DataFrame object. Then you can use logical statements to fill the empty column:
import pandas as pd
df = pd.read_csv('path_to_file.csv')
df.loc[(df['age']==65) & (df['SES']=='high'), 'vitality_class'] = 1
df.to_csv('path_to_new_file.csv', index=False)
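If there are several age/SES combinations to map, np.select keeps the conditions readable; a small sketch with made-up data and made-up class codes (the real thresholds and codes would come from your own rules):

```python
import numpy as np
import pandas as pd

# Hypothetical data with the two input columns
df = pd.DataFrame({
    'age': [65, 30, 70],
    'SES': ['high', 'low', 'low'],
})

# Each condition pairs with the value at the same position in choices;
# rows matching no condition get the default
conditions = [
    (df['age'] >= 65) & (df['SES'] == 'high'),
    (df['age'] >= 65) & (df['SES'] == 'low'),
]
choices = [1, 2]
df['vitality_class'] = np.select(conditions, choices, default=0)
print(df)
```

Conditions are checked in order, so the first match wins for any given row.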

Dataframe is empty after merging two dataframes with Pandas

I have two dataframes that I am trying to merge using pandas. One table has 4 columns and the other has 3. I am attempting an inner join on an int64 column.
On the link you can see both columns named UPC are int64 types.
Just to make sure the Dataframes weren't empty, I have added a picture of the first 20 rows for each table.
When I try to merge, I use the following command:
result = merge(MPA_COMMODITY, MDM_LINK_VIEW, on='UPC')
When I try to check the return value, it returns the column names but it says that the Dataframe is empty.
This is using Python 3.6.4 and Pandas version 0.22.0.
If any other information is needed, please let me know. More than glad to update the post if I have to.
I think you want
MPA_COMMODITY.merge(MDM_LINK_VIEW, on='UPC')
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.merge.html
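An empty inner merge despite matching-looking values is often a key dtype mismatch (e.g. one UPC column is stored as strings), which is worth double-checking even when the columns are reported as int64. A small sketch with made-up frames showing the symptom and the fix:

```python
import pandas as pd

# Hypothetical frames: UPC stored as string in one and int in the other,
# which makes an inner merge come back empty
MPA_COMMODITY = pd.DataFrame({'UPC': ['100', '200'], 'desc': ['a', 'b']})
MDM_LINK_VIEW = pd.DataFrame({'UPC': [100, 300], 'link': ['x', 'y']})

# Align the key dtypes before merging
MPA_COMMODITY['UPC'] = MPA_COMMODITY['UPC'].astype('int64')
result = MPA_COMMODITY.merge(MDM_LINK_VIEW, on='UPC', how='inner')
print(result)
```

Without the astype call, the merge above would return only the column headers and zero rows, which matches the behaviour described in the question.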
