Multiple column header xlx in python

Please help me convert the data below in Python:
--Input data--
SI.no Taskno Colour
Person Party
Fruit
------------------------
1 123 Red
Siva Birthday
Orange
2 245 Pink
Ravi Marriage
Apple
--Output data--
SI.No TaskNo Person Colour Party Fruit
1 123 Siva Red Birthday Orange
2 245 Ravi Pink Marriage Apple
As I am new to Python, I am not sure how to handle this, so I would appreciate some guidance.
The data I receive has its column headers spread over several rows, and I want each extra header row to become new columns.
Thank you.
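
The reshaping described above can be sketched with pandas. This is only a rough sketch under stated assumptions: the sheet has already been loaded without a header (for example with pd.read_excel(path, header=None)), every record spans exactly three rows just like the header, the dashed separator line is not part of the sheet, and no data cell is empty. The `raw` frame, `ROWS_PER_RECORD` and `flatten` are hypothetical names used for illustration.

```python
import pandas as pd

# Hypothetical raw sheet, as it might look after loading without a header:
# three header rows followed by three rows per record.
raw = pd.DataFrame([
    ["SI.no", "Taskno", "Colour"],
    ["Person", "Party", None],
    ["Fruit", None, None],
    [1, 123, "Red"],
    ["Siva", "Birthday", None],
    ["Orange", None, None],
    [2, 245, "Pink"],
    ["Ravi", "Marriage", None],
    ["Apple", None, None],
])

ROWS_PER_RECORD = 3  # assumption: header and records each span three rows


def flatten(block):
    """Read the cells of one block row by row, skipping empty cells."""
    return [value for value in block.to_numpy().ravel() if pd.notna(value)]


# The first block holds the column names, every later block holds one record.
header = flatten(raw.iloc[:ROWS_PER_RECORD])
records = [
    flatten(raw.iloc[start:start + ROWS_PER_RECORD])
    for start in range(ROWS_PER_RECORD, len(raw), ROWS_PER_RECORD)
]

result = pd.DataFrame(records, columns=header)
# Reorder the columns to the layout requested in the question.
result = result[["SI.no", "Taskno", "Person", "Colour", "Party", "Fruit"]]
print(result)
```

If a data cell can be empty, skipping empty cells would shift the values, so the cells would have to be addressed by position instead.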

Related

Extraction of a common column pandas

I have two data frames, and I need to extract the data for the third column of the second dataframe, DF2.
Question 1: How should I create "Column_3" from "Column_1" and "Column_2" of the first dataframe?
DF1 =
Column_1 Column_2 Column_3
Red Apple small Red Apple small
Green fruit Large Green fruit Large
Yellow Banana Medium Yellow Banana Medium
Pink Mango Tiny Pink Mango Tiny
Question 2: I need to extract "n_col3" from n_col1 & n_col2 so that it resembles Column_3 of data frame 1 (the brackets show what should be extracted).
Note: If not all of the information in Column_3 is available in Column_1 & Column_2, as in Row 1 & Row 3, only the information that is available should be extracted.
DF2 =
n_col1 n_col2 n_col3
L854 fruit Charlie Green LTD Large fruit Large(Green missing Fruit Large extracted)
Red alpha 8 small Tango G250 Apple Red Apple small(all information extracted)
Mk43 Mango Beta Tiny J448 T Mango Tiny(Pink missing, Mango Tiny is extracted)
M40 Yellow Medium Romeo Banana Yellow Banana Medium(all information extracted)
I want to extract that column so that I can merge on it in further processing. Can anyone help me with this? Thank you in advance.
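
For Question 1 only, a minimal sketch of building Column_3 by joining the two existing columns with a space. How the flattened table above splits into Column_1 and Column_2 is an assumption here ("Red Apple" and "small", and so on). Question 2 is not covered by this sketch.

```python
import pandas as pd

# Assumed split of the DF1 rows into Column_1 and Column_2.
df1 = pd.DataFrame({
    "Column_1": ["Red Apple", "Green fruit", "Yellow Banana", "Pink Mango"],
    "Column_2": ["small", "Large", "Medium", "Tiny"],
})

# Element-wise string concatenation with a single space as separator.
df1["Column_3"] = df1["Column_1"].str.cat(df1["Column_2"], sep=" ")
print(df1)
```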

Splitting a column into two in dataframe

The solution is definitely out there, but I couldn't find it, so I am posting it here.
I have a dataframe which is like
object_Id object_detail
0 obj00 red mug
1 obj01 red bowl
2 obj02 green mug
3 obj03 white candle holder
I want to split the column object_detail into two columns, name and object_color, based on a list that contains the color names:
COLOR = ['red', 'green', 'blue', 'white']
print(df)
# want to perform some operation that produces this output
object_Id object_detail object_color name
0 obj00 red mug red mug
1 obj01 red bowl red bowl
2 obj02 green mug green mug
3 obj03 white candle holder white candle holder
This is my first time using a dataframe, so I am not sure how to achieve this with pandas. I can do it by converting the column to a list and then applying a filter, but I think there are easier ways that I might be missing. Thanks.

Use Series.str.extract with the values of the list joined by | (regex OR) to capture the color, and capture everything after the following whitespace as the name:
pat = "|".join(COLOR)
df[['object_color','name']] = df['object_detail'].str.extract(rf'({pat})\s+(.*)', expand=True)
print (df)
object_Id object_detail object_color name
0 obj00 red mug red mug
1 obj01 red bowl red bowl
2 obj02 green mug green mug
3 obj03 white candle holder white candle holder
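
A self-contained version of the approach above, as a sketch. Two details are additions that are not in the answer: the color names are passed through re.escape in case one of them ever contains a regex metacharacter, and the pattern is anchored so the color has to be the first word. Rows that do not start with a listed color end up as NaN in both new columns.

```python
import re

import pandas as pd

COLOR = ["red", "green", "blue", "white"]

df = pd.DataFrame({
    "object_Id": ["obj00", "obj01", "obj02", "obj03"],
    "object_detail": ["red mug", "red bowl", "green mug", "white candle holder"],
})

# Alternation of the (escaped) color names, e.g. "red|green|blue|white".
pat = "|".join(map(re.escape, COLOR))

# Group 1 captures the leading color, group 2 the rest of the description.
df[["object_color", "name"]] = df["object_detail"].str.extract(
    rf"^({pat})\s+(.*)$", expand=True
)
print(df)
```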

Counting rows that have same values in specific columns in csv

I have a CSV file and I want to count how many rows share the same values in specific columns. What would be the best way to do this? For example, if this were the CSV:
fruit days characteristic1 characteristic2
0 apple 1 red sweet
1 orange 2 round sweet
2 pineapple 5 prickly sweet
3 apple 4 yellow sweet
the output I would want would be
1 apple: red,sweet

A CSV is a file with values that are separated by commas. I would recommend turning this into a .txt file that keeps the same layout, then establishing consistent spacing throughout the file (using a tab, for example), so that when you loop through each line you know where the actual information is. Once you know which column holds which information, you can print those specific values.
# Use a tab in between each column
fruit days charac1 charac2
0 apple 1 red sweet
1 orange 2 round sweet
2 pineapple 5 prickly sweet
3 apple 4 yellow sweet
This is just to get you started.
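
As an alternative to converting the file by hand, here is a minimal sketch that uses the standard library csv module and collections.Counter instead. It assumes the file is comma-separated and has a header row. The sample text, `KEY_COLUMNS` and `count_rows` are hypothetical names for illustration, and io.StringIO stands in for an opened file.

```python
import csv
import io
from collections import Counter

# Hypothetical comma-separated version of the data from the question.
sample = """fruit,days,characteristic1,characteristic2
apple,1,red,sweet
orange,2,round,sweet
pineapple,5,prickly,sweet
apple,4,yellow,sweet
"""

# Columns whose combined values identify "the same" row.
KEY_COLUMNS = ["fruit", "characteristic1", "characteristic2"]


def count_rows(handle, key_columns):
    """Count rows per distinct combination of values in key_columns."""
    reader = csv.DictReader(handle)
    return Counter(tuple(row[col] for col in key_columns) for row in reader)


counts = count_rows(io.StringIO(sample), KEY_COLUMNS)
for (fruit, *traits), n in counts.items():
    print(f"{n} {fruit}: {','.join(traits)}")
```

With a real file, open("data.csv", newline="") would replace the io.StringIO object (the file name is a placeholder).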

Checking unique value for a variable in a different column

I currently have a dataframe which looks like this:
Owner Vehicle_Color
0 James Red
1 Peter Green
2 James Blue
3 Sally Blue
4 Steven Red
5 James Blue
6 James Red
7 Peter Blue
I am trying to verify whether each Owner has one or multiple vehicle colors assigned. Keeping in mind that my dataframe has more than a million entries for owners (which can be duplicated), what would be the best solution?

One way may be to use groupby and nunique:
df.groupby('Owner')['Vehicle_Color'].nunique()
Results:
Owner
James 2
Peter 2
Sally 1
Steven 1
Name: Vehicle_Color, dtype: int64
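
A runnable sketch of the same idea on the sample data, extended to list the owners that have more than one color and to flag each row. The names `color_counts`, `multi_color_owners` and `has_multiple_colors` are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Owner": ["James", "Peter", "James", "Sally", "Steven", "James", "James", "Peter"],
    "Vehicle_Color": ["Red", "Green", "Blue", "Blue", "Red", "Blue", "Red", "Blue"],
})

# Number of distinct colors per owner.
color_counts = df.groupby("Owner")["Vehicle_Color"].nunique()

# Owners with more than one distinct color.
multi_color_owners = color_counts[color_counts > 1].index.tolist()

# Per-row flag: transform broadcasts the group result back to every row.
df["has_multiple_colors"] = (
    df.groupby("Owner")["Vehicle_Color"].transform("nunique") > 1
)

print(color_counts)
print(multi_color_owners)
```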

groupby and join text column

I have a CSV file with this header: text|business_id
I want to group all texts related to one business.
I used review_data=review_data.groupby(['business_id'])['text'].apply("".join)
The review_data is like:
text \
0 mr hoagi institut walk doe seem like throwback...
1 excel food superb custom servic miss mario mac...
2 yes place littl date open weekend staff alway ...
business_id
0 5UmKMjUEUNdYWqANhGckJw
1 5UmKMjUEUNdYWqANhGckJw
2 5UmKMjUEUNdYWqANhGckJw
I get this error: TypeError: sequence item 131: expected string, float found
these are the lines 130 to 132:
130 use order fair often past 2 year food get progress wors everi time order doesnt help owner alway regist rude everi time final decid im done dont think feel let inconveni order food restaur let alon one food isnt even good also insid dirti heck deliv food bmw cant buy scrub brush found golden dragon collier squar 100 time better|SQ0j7bgSTazkVQlF5AnqyQ
131 popular denni|wqu7ILomIOPSduRwoWp4AQ
132 want smth quick late night would say denni|wqu7ILomIOPSduRwoWp4AQ

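The error means that at least one value in the text column is NaN, which is a float, while join only accepts strings. I think you need to filter out the null rows with boolean indexing before the groupby:
print(review_data)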
text business_id
0 mr hoagi 5UmKMjUEUNdYWqANhGckJw
1 excel food 5UmKMjUEUNdYWqANhGckJw
2 NaN 5UmKMjUEUNdYWqANhGckJw
3 yes place 5UmKMjUEUNdYWqANhGckJw
review_data = review_data[review_data['text'].notnull()]
print(review_data)
text business_id
0 mr hoagi 5UmKMjUEUNdYWqANhGckJw
1 excel food 5UmKMjUEUNdYWqANhGckJw
3 yes place 5UmKMjUEUNdYWqANhGckJw
# join with a space so that adjacent reviews do not run together
review_data = review_data.groupby(['business_id'])['text'].apply(" ".join)
print(review_data)
business_id
5UmKMjUEUNdYWqANhGckJw mr hoagi excel food yes place
Name: text, dtype: object
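
The same steps as a self-contained sketch for Python 3. The frame below is a shortened, hypothetical stand-in for the real review data. dropna(subset=["text"]) is used here as an equivalent of the notnull() filter, and the texts are joined with a space.

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the review data, including one missing text.
review_data = pd.DataFrame({
    "text": ["mr hoagi", "excel food", np.nan, "yes place", "popular denni"],
    "business_id": [
        "5UmKMjUEUNdYWqANhGckJw",
        "5UmKMjUEUNdYWqANhGckJw",
        "5UmKMjUEUNdYWqANhGckJw",
        "5UmKMjUEUNdYWqANhGckJw",
        "wqu7ILomIOPSduRwoWp4AQ",
    ],
})

# Drop rows whose text is NaN; str.join would fail on the float.
cleaned = review_data.dropna(subset=["text"])

# One string per business, texts separated by a single space.
joined = cleaned.groupby("business_id")["text"].agg(" ".join)
print(joined)
```

If rows with missing text should be kept rather than dropped, replacing them first with fillna("") is another option, at the cost of extra spaces in the joined string.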
