I have a CSV file which has two sets of data on the same sheet. I did my research and the closest I could find is what I have attached. The issue I am having is that neither of them is a table; they're separate data sets, separated by a number of rows. I want to save each of the data sets as a separate CSV. Is this possible in Python? Please provide your kind assistance.
Python CSV module: How can I account for multiple tables within the same file?
First Set:
Presented_By: Source: City:
Chris Realtor Knoxville
John Engineer Lantana
Wade Doctor Birmingham
Second Set:
DriveBy 15
BillBoard 45
Social Media 85
My source is an Excel file, which I convert into a CSV file.
import pandas as pd
data_xls = pd.read_excel(r'T:\DataDump\Matthews\REPORT 11.13.16.xlsm', 'InfoCenterTracker', index_col=None)
data_xls.to_csv('your_csv.csv', encoding='utf-8')
second_set = pd.read_csv('your_csv.csv', skiprows=range(10, 24))
Use skiprows in pandas' read_csv
$ cat d.dat
Presented_By: Source: City:
Chris Realtor Knoxville
John Engineer Lantana
Wade Doctor Birmingham
DriveBy 15
BillBoard 45
Social Media 85
In [1]: import pandas as pd
In [2]: pd.read_csv('d.dat',skiprows=[0,1,2,3])
Out[2]:
DriveBy 15
0 BillBoard 45
1 Social Media 85
In [3]: pd.read_csv('d.dat',skiprows=[4,5,6])
Out[3]:
Presented_By: Source: City:
0 Chris Realtor Knoxv...
1 John Engineer Lantana
2 Wade Doctor Birmi...
You can detect which rows to skip by looking for the point where a line has two entries instead of three:
In [25]: for n, line in enumerate(open('d.dat','r').readlines()):
    ...:     if len(line.split()) != 3:
    ...:         breakpoint = n
    ...:
In [26]: pd.read_csv('d.dat',skiprows=range(breakpoint-1))
Out[26]:
DriveBy 15
0 BillBoard 45
1 Social Media 85
In [27]: pd.read_csv('d.dat',skiprows=range(breakpoint-1, n+1))
Out[27]:
Presented_By: Source: City:
0 Chris Realtor Knoxv...
1 John Engineer Lantana
2 Wade Doctor Birmi...
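If the two sets are separated by blank rows, as the question describes, the split point does not have to be hard-coded. The following is a minimal sketch using only the standard library. It assumes that a separator row is either an empty line or a row whose cells are all empty (so the CSV should be written without an index column); the function names and the set_1.csv / set_2.csv output names are made up for illustration.

```python
import csv
from itertools import groupby


def is_blank(row):
    # An empty line parses to [], and a line like ",," parses to empty cells.
    return not any(cell.strip() for cell in row)


def split_blocks(path):
    """Return the row blocks of a CSV file, split wherever blank rows occur."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    # groupby collects consecutive rows that share the same blank/non-blank status.
    return [list(group) for blank, group in groupby(rows, key=is_blank) if not blank]


def write_blocks(blocks, prefix="set"):
    """Write each block to <prefix>_<n>.csv and return the paths that were written."""
    paths = []
    for i, block in enumerate(blocks, start=1):
        path = f"{prefix}_{i}.csv"
        with open(path, "w", newline="") as f:
            csv.writer(f).writerows(block)
        paths.append(path)
    return paths
```

Calling write_blocks(split_blocks('your_csv.csv')) would then produce one file per data set, each of which can be loaded with pd.read_csv.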
I need to compare two columns together: "EMAIL" and "LOCATION".
I'm using Email because it's more accurate than name for this issue.
My objective is to find the total number of locations each person worked
at, use that total to select which sheet the data will be written to,
and copy the original data over to the new sheet (tab).
I need the original data copied over with all the duplicate
locations, which is where this problem stumps me.
Full Excel Sheet
Had to use images because it flagged post as spam
The Excel sheet (SAMPLE) I'm reading in as a data frame:
Excel Sample Spreadsheet
Example:
TOMAPPLES#EXAMPLE.COM worked at WENDYS,FRANKS HUT, and WALMART - That
sums up to 3 different locations, which I would add to a new sheet
called SHEET: 3 Different Locations
SJONES22#GMAIL.COM worked at LONDONS TENT and YOUTUBE - That's 2 different locations, which I would add to a new sheet called SHEET:
2 Different Locations
MONTYJ#EXAMPLE.COM worked only at WALMART - This user would be added
to SHEET: 1 Location
Outcome:
data copied to new sheets
Sheet 2: different locations
Sheet 3: different locations
Sheet 4: different locations
Thanks for taking your time looking at my problem =)
Hi, check whether the lines below work for you.
import pandas as pd
df = pd.read_excel('sample.xlsx')
df1 = df.groupby(['Name','Location','Job']).count().reset_index()
# this is long line
df2 = df.groupby(['Name','Location','Job','Email']).agg({'Location':'count','Email':'count'}).rename(columns={'Location':'Location Count','Email':'Email Count'}).reset_index()
print(df1)
print('\n\n')
print(df2)
Below is the output; change the grouped columns to check more variations.
df1
Name Location Job Email
0 Monty Jakarta Manager 1
1 Monty Mumbai Manager 1
2 Sahara Jonesh Paris Cook 2
3 Tom App Jakarta Buser 1
4 Tom App Paris Buser 2
df2 all columns
Name Location ... Location Count Email Count
0 Monty Jakarta ... 1 1
1 Monty Mumbai ... 1 1
2 Sahara Jonesh Paris ... 2 2
3 Tom App Jakarta ... 1 1
4 Tom App Paris ... 2 2
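The question asks for the number of distinct locations per email, with the original rows (duplicates included) copied to a sheet chosen by that number. One possible way to get there is sketched below. It assumes the columns are named EMAIL and LOCATION as in the question; the function names and the sheet naming scheme are hypothetical, and writing the workbook requires an Excel engine such as openpyxl.

```python
import pandas as pd


def split_by_location_count(df, email_col="EMAIL", location_col="LOCATION"):
    """Map each distinct-location count to the original rows of the matching people."""
    # Count distinct locations per email and broadcast the count back to every row.
    n_locations = df.groupby(email_col)[location_col].transform("nunique")
    # Group the untouched rows by that count; duplicate locations are preserved.
    return {int(n): group for n, group in df.groupby(n_locations)}


def write_sheets(frames, path):
    """Write one sheet per count (sheet names here are an assumption)."""
    with pd.ExcelWriter(path) as writer:
        for n, frame in sorted(frames.items()):
            frame.to_excel(writer, sheet_name=f"{n} Locations", index=False)
```

For example, write_sheets(split_by_location_count(df), 'by_location_count.xlsx') would put everyone with three different locations on the sheet "3 Locations".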
I have a fairly large dataset that I would like to split into separate excel files based on the names in column A ("Agent" column in the example provided below). I've provided a rough example of what this data-set looks like in Ex1 below.
Using pandas, what is the most efficient way to create a new excel file for each of the names in column A, or the Agent column in this example, preferably with the name found in column A used in the file title?
For example, in the given example, I would like separate files for John Doe, Jane Doe, and Steve Smith containing the information that follows their names (Business Name, Business ID, etc.).
Ex1
Agent Business Name Business ID Revenue
John Doe Bobs Ice Cream 12234 $400
John Doe Car Repair 445848 $2331
John Doe Corner Store 243123 $213
John Doe Cool Taco Stand 2141244 $8912
Jane Doe Fresh Ice Cream 9271499 $2143
Jane Doe Breezy Air 0123801 $3412
Steve Smith Big Golf Range 12938192 $9912
Steve Smith Iron Gyms 1231233 $4133
Steve Smith Tims Tires 82489233 $781
I believe python / pandas would be an efficient tool for this, but I'm still fairly new to pandas, so I'm having trouble getting started.
I would loop over the groups of names, then save each group to its own excel file:
s = df.groupby('Agent')
for name, group in s:
    # .xlsx rather than .xls: recent pandas versions can no longer write the legacy .xls format
    group.to_excel(f"{name}.xlsx")
Use a list comprehension with groupby on the Agent column:
dfs = [d for _, d in df.groupby('Agent')]
for d in dfs:
    print(d, '\n')
Output
Agent Business Name Business ID Revenue
4 Jane Doe Fresh Ice Cream 9271499 $2143
5 Jane Doe Breezy Air 123801 $3412
Agent Business Name Business ID Revenue
0 John Doe Bobs Ice Cream 12234 $400
1 John Doe Car Repair 445848 $2331
2 John Doe Corner Store 243123 $213
3 John Doe Cool Taco Stand 2141244 $8912
Agent Business Name Business ID Revenue
6 Steve Smith Big Golf Range 12938192 $9912
7 Steve Smith Iron Gyms 1231233 $4133
8 Steve Smith Tims Tires 82489233 $781
Grouping is what you are looking for here. You can iterate over the groups, which gives you the grouping key and the data associated with each group: in your case, the Agent name and the associated business columns.
Code:
import pandas as pd
# make up some data
ex1 = pd.DataFrame([['A',1],['A',2],['B',3],['B',4]], columns = ['letter','number'])
# iterate over the grouped data and export the data frames to excel workbooks
for group_name, data in ex1.groupby('letter'):
    # you probably have more complicated naming logic
    # use index = False if you have not set an index on the dataframe to avoid an extra column of indices
    data.to_excel(group_name + '.xlsx', index = False)
Use the unique values in the column to subset the data and write it to csv using the name:
import pandas as pd
for unique_val in df['Agent'].unique():
    df[df['Agent'] == unique_val].to_csv(f"{unique_val}.csv")
If you need Excel:
import pandas as pd
for unique_val in df['Agent'].unique():
    df[df['Agent'] == unique_val].to_excel(f"{unique_val}.xlsx")
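The answers above use the agent name directly as the file name, which can fail if a name contains characters that are not allowed in file names. A minimal sketch of the same groupby approach with a sanitized file name follows. The helper names are made up, and it writes CSV so that no Excel engine is needed; swapping in to_excel with an .xlsx suffix should work the same way.

```python
import re
from pathlib import Path

import pandas as pd


def safe_name(value):
    # Replace every run of characters other than letters, digits, "_" and "-".
    return re.sub(r"[^\w\-]+", "_", str(value)).strip("_")


def split_by_column(df, column, out_dir="."):
    """Write one CSV per distinct value in `column` and return the written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for value, group in df.groupby(column):
        path = out_dir / f"{safe_name(value)}.csv"
        group.to_csv(path, index=False)
        paths.append(path)
    return paths
```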
I have a program that open and read a file in csv format that contains large data such as:
State Crime type Occurrences Year
CALIFORNIA ROBBERY 12 1999
CALIFORNIA ASSAULT 45 2003
NEW YORK ARSON 9 1999
CALIFORNIA ARSON 21 2000
TEXAS THEFT 30 2000
OREGON ASSAULT 10 2001
I need to create 3 filters by user input. For example:
Enter State:
Enter Crime Type:
Enter Year:
If I enter:
Enter State: CALIFORNIA
Enter Crime: ASSAULT
Enter Year: 2003
Crime Report
State Crime type Occurrences Year
CALIFORNIA ASSAULT 45 2003
This needs to happen.
I have no clue on how to tackle this problem.. I was only able to open and read the data file in csv format into a table in Python that will just print out every line. However, I need to incorporate search filter to narrow the result such as shown above. Anyone familiar with this? Thank you all for your help.
The pandas library lets you view and manipulate CSV data. The following solution imports pandas, reads the CSV into a DataFrame with read_csv(), and then asks for the input values: State and Crime Type are kept as strings, and Year is cast to int. It then applies a simple filter to the DataFrame. The filter requires all three conditions to be met and lowercases both sides of each string comparison, so the input can be typed in any case. Note that str.contains matches substrings, so a partial state name also matches.
In [125]: import pandas as pd
In [126]: df = pd.read_csv('test.csv')
In [127]: df
Out[127]:
State Crime type Occurrences Year
0 CALIFORNIA ROBBERY 12 1999
1 CALIFORNIA ASSAULT 45 2003
2 NEW YORK ARSON 9 1999
In [128]: state = str(input("Enter State: "))
Enter State: California
In [129]: crime_type = str(input("Enter Crime Type: "))
Enter Crime Type: robbery
In [130]: year = int(input("Enter Year: "))
Enter Year: 1999
In [131]: df.loc[lambda x: (x['State'].str.lower().str.contains(state.lower()))
     ...:        & (x['Crime type'].str.lower().str.contains(crime_type.lower()))
     ...:        & (x['Year'] == year)]
Out[131]:
State Crime type Occurrences Year
0 CALIFORNIA ROBBERY 12 1999
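If exact matches are preferred over substring matches, the same filter can be wrapped in a small function. This is only a sketch: filter_crimes is a made-up name, and it assumes the column names State, Crime type, and Year from the sample data.

```python
import pandas as pd


def filter_crimes(df, state, crime_type, year):
    """Return the rows matching all three values, ignoring case for the text columns."""
    mask = (
        df["State"].str.casefold().eq(state.strip().casefold())
        & df["Crime type"].str.casefold().eq(crime_type.strip().casefold())
        & df["Year"].eq(int(year))
    )
    return df[mask]
```

With the input() calls from the session above, this could be used as filter_crimes(df, state, crime_type, year).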
I have a text file like this:
PERSONAL INFORMATION
First Name: Michael
Last Name: Junior
Birth Date: May 17, 1999
Location: Whitehurst Hall 301. City: Stillwater. State: OK
Taken on July 8, 2000 10:50:30 AM MST
WORK EXPERIENCE
Work type select Part-time
ID number 10124
Company name ABCDFG Inc.
Positions Software Engineer/Research Scientist
Data Analyst/Scientist
As you can see, the first column holds feature names and the second column holds values. I read it using this code:
import pandas as pd
import numpy as np
import scipy as sp
df=pd.read_table('personal.txt',skiprows=1)
pd.set_option('display.max_colwidth',10000)
pd.set_option('display.max_rows',1000)
df
But it merges columns and outputs:
PERSONAL INFORMATION
0 First Name: Michael
1 Last Name: Junior
2 Birth Date: May 17, 1999
3 Location: Whitehurst Hall 301. City: Stillwater. State: OK
4 Taken on July 8, 2000 10:50:30 AM MST
5 WORK EXPERIENCE
6 Work type select Part-time
7 ID number 10124
8 Company name ABCDFG Inc.
9 Positions Software Engineer/Research Scientist
10 Data Analyst/Scientist
I also need to skip the section titles PERSONAL INFORMATION and WORK EXPERIENCE. How can I read the file so that the result is split correctly into two columns?
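One possible approach is to parse the lines by hand before building a DataFrame. The sketch below rests on assumptions that the post does not confirm: that the name and value on each line are separated by a tab or by two or more spaces, that section titles are lines written entirely in upper case, and that a line with a single field continues the previous value. The function name is made up.

```python
import re


def parse_two_columns(lines):
    """Return [name, value] pairs, skipping section titles and joining continuation lines."""
    rows = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        # Split once on a tab or on a run of two or more whitespace characters.
        parts = re.split(r"\t|\s{2,}", line, maxsplit=1)
        if len(parts) == 2:
            rows.append([parts[0], parts[1]])
        elif line.isupper():
            continue  # section title such as PERSONAL INFORMATION
        elif rows:
            rows[-1][1] += "; " + line  # continuation of the previous value
    return rows
```

The result can be passed to pd.DataFrame(rows, columns=['feature', 'value']), for example after reading the lines with open('personal.txt').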
I have a two column CSV:
Name, Sport
Abraham Soccer
Adam Basketball
Adam Soccer
John Soccer
Jacob Tennis
Jacob Soccer
What is the simplest way to convert this into something openable in Excel that is either in XLS or CSV such that when opening up in MS Excel, it looks something like:
Basketball, Soccer, Tennis
Abraham X
Adam X X X
John X
Jacob X X
I would consider pandas to be a suitable package for this kind of application. The centerpiece of pandas is the DataFrame object (df), which is in essence a table of your data. CSV files can be read into pandas using read_csv.
import pandas as pd
df = pd.read_csv('filename.csv')
In [3]:df
Out[3]:
Name Sport
0 Abraham Soccer
1 Adam Basketball
2 Adam Soccer
3 John Soccer
4 Jacob Tennis
5 Jacob Soccer
pandas has a crosstab function that does what you want:
table = pd.crosstab(df['Name'], df['Sport'])
In [4]:table
Out[4]:
Sport Basketball Soccer Tennis
Name
Abraham 0 1 0
Adam 1 1 0
Jacob 0 1 1
John 0 1 0
Then you can write the table back to a CSV file (use a different file name if you want to keep the original file):
table.to_csv('sports_by_name.csv')
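The question asks for X marks rather than counts. A small sketch of one way to convert the crosstab is shown here; mark_table and its parameters are made-up names, and the Name and Sport column names are taken from the example data.

```python
import numpy as np
import pandas as pd


def mark_table(df, row_col="Name", col_col="Sport", mark="X"):
    """Cross-tabulate two columns and show `mark` wherever the count is non-zero."""
    counts = pd.crosstab(df[row_col], df[col_col])
    # np.where picks the mark for positive counts and an empty string otherwise.
    marks = np.where(counts.to_numpy() > 0, mark, "")
    return pd.DataFrame(marks, index=counts.index, columns=counts.columns)
```

The result can be saved with to_csv in the same way as the count table.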