Grouping Together CSV Data Using Python


I have a question about how to perform a specific task in Python. I feel like this should be pretty straightforward, but I’m new to Python and can’t seem to find a post that addresses what I need to do, which is to group items together from a .csv file. Below I describe the CSV file I’m working with and the initial steps I’m hoping to perform on its data with Python.
The csv file
The file was generated by a program that I have been using to administer questionnaires. Each ‘column’ has a header; each ‘row’ contains a participant number and that participant’s responses to 100 questions, which are in the same order for every participant.
The steps
As mentioned, the data I need to work with are the answers to 100 questions. I need to group the questions into 10 sets of 10 questions each and perform some simple analysis on 5 of those sets. After the analysis is complete, I need to add each of the 5 analyzed sets to one of the 5 sets that have not been manipulated in any way.
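The pairing step can be sketched as follows, assuming each participant’s answers have already been split into a list of 10 numeric sets. The question does not say which five sets are analyzed, what the “simple analysis” is, or exactly what “add” means, so this sketch analyzes the first five sets, pairs set i with set i + 5, and treats “add” as an element-wise sum. The `reverse_score` function is a purely hypothetical stand-in for the real analysis.

```python
from typing import Callable, List

Scores = List[float]

def combine_sets(groups: List[Scores], analyze: Callable[[Scores], Scores]) -> List[Scores]:
    """Analyze the first five sets and add each one to an untouched set.

    Assumptions (not from the question): sets 0-4 are the analyzed ones,
    sets 5-9 stay untouched, and set i is paired with set i + 5.
    """
    if len(groups) != 10:
        raise ValueError("expected 10 sets of answers")
    analyzed = [analyze(g) for g in groups[:5]]
    untouched = groups[5:]
    # "Add" interpreted as an element-wise sum of two equally sized sets.
    return [[a + b for a, b in zip(x, y)] for x, y in zip(analyzed, untouched)]

def reverse_score(group: Scores) -> Scores:
    """Hypothetical analysis: reverse-score answers on a 1-5 scale."""
    return [6 - v for v in group]
```

If “add” instead means joining two sets into one 20-item list, replace the inner comprehension with `x + y`.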
The problem
The problem I’m having is that I can’t figure out how to group data from a CSV file (such as 10 specific columns from a row) into a specific set. I’ve done a little reading and searching on the csv module’s DictReader and DictWriter classes, but I haven’t come across an example of how to do this yet. Has anyone here performed this type of task before, and if so, how does one go about it? Unfortunately, I don’t have any code started yet, as this would be one of the very first things I would need to do (besides importing modules, of course) before I could manipulate the data in any way.
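A minimal sketch with the standard `csv` module, assuming column 0 holds the participant number and the next 100 columns hold the answers in questionnaire order. Plain list slicing does the grouping: `answers[0:10]`, `answers[10:20]`, and so on.

```python
import csv
from typing import Dict, Iterable, List, Tuple

GROUP_SIZE = 10  # 100 questions -> 10 sets of 10

def read_grouped_answers(lines: Iterable[str], group_size: int = GROUP_SIZE
                         ) -> Tuple[List[str], Dict[str, List[List[str]]]]:
    """Return the header row and, per participant, the answers split into sets.

    Assumes column 0 holds the participant number and the remaining
    columns hold the answers in questionnaire order.
    """
    reader = csv.reader(lines)
    header = next(reader)  # first row holds the column headers
    grouped = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        participant, answers = row[0], row[1:]
        # Slice the answers into consecutive chunks: [0:10], [10:20], ...
        grouped[participant] = [answers[i:i + group_size]
                                for i in range(0, len(answers), group_size)]
    return header, grouped
```

Usage, with a hypothetical file name: `with open("responses.csv", newline="") as f: header, grouped = read_grouped_answers(f)`. The csv module returns every value as a string, so convert with `int()` or `float()` before doing arithmetic. `csv.DictReader` works too if you would rather pick columns by header name than by position.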

Related

How to create pdfs from rows in a dataset and save them

Very general question here; I need ideas. I have a dataset with about 20 rows. I want to use Python or R to automatically take each of these rows and create one PDF per row. The PDF is formatted in a particular way that I need to be able to play around with.
Imagine each row is a student, and I need to make a PDF "Report Card" for each student. The report card will have a designated spot that says "Math Grade", and the value will come from the dataset.
I want to be able to hit run and have all 20 PDFs saved to a folder on my machine. Eventually, I may try to run this on a server so it is fully automatic. The PDFs ultimately get emailed to a distribution list.
I am pretty familiar with R and mildly familiar with Python. I have no experience with HTML; is that what I need here?
Any tutorials, ideas, or explanations of the process I should use would be appreciated.
I thought about using Plotly Dash, but I think that is mostly for viewing in a web browser. I want PDFs, so I don't know if that will work.
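One common route is to fill an HTML template once per row and then convert each file to PDF with a separate HTML-to-PDF tool such as WeasyPrint (on the R side, a parameterized R Markdown report is a comparable option). The sketch below covers only the per-row templating step with the standard library; the column names `name` and `math_grade` and the template markup are hypothetical.

```python
import csv
import html
from pathlib import Path
from string import Template
from typing import Dict, Iterable

# Hypothetical report-card layout; edit the markup to change the format.
REPORT_CARD = Template("""<!DOCTYPE html>
<html><body>
<h1>Report Card: $name</h1>
<p>Math Grade: $math_grade</p>
</body></html>
""")

def render_report_cards(lines: Iterable[str]) -> Dict[str, str]:
    """Map an output file name to the filled-in HTML for each CSV row.

    Assumes the CSV has (hypothetical) columns "name" and "math_grade".
    """
    cards = {}
    for row in csv.DictReader(lines):
        filename = row["name"].replace(" ", "_") + ".html"
        cards[filename] = REPORT_CARD.substitute(
            name=html.escape(row["name"]),          # escape <, >, & in data
            math_grade=html.escape(row["math_grade"]),
        )
    return cards

def save_report_cards(cards: Dict[str, str], out_dir: str) -> None:
    """Write one file per student into out_dir (created if missing)."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    for filename, content in cards.items():
        (folder / filename).write_text(content, encoding="utf-8")
```

Keeping rendering separate from saving makes it easy to swap the save step for a PDF conversion later without touching the template logic.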

I need a starting point to code an app to extract text from pdf to excel

To start, I just want to state that I'm an electrical engineer with basic programming knowledge.
My requirements are as follows:
I want to create an app where I can load and view PDF files that contain tables.
The tables in these PDF files are irregularly shaped and in a different position on every page (that's why tools like tabular couldn't help me).
Each table entry spans multiple lines and has irregular dimensions. I cannot select a whole row at a time; each element has to be selected on its own. Simply copying the lines into Excel won't work either, because that would need a lot of formatting.
So I want to be able to select each table entry individually (like a selection or cropping box over the required text), remove any line breaks in the text, and keep just the spaces.
The generated Excel file (or Access database; I don't really mind which) should be reviewable and saveable (if those are even words XD).
I have good knowledge of Python and a very elementary knowledge of Django, and I'm looking for an expert who can tell me what I really need to learn (and, if possible, where to learn it) to execute my project.
Is this too much for me to execute, and if I can dedicate 10 hours a week, how long would such a project take me?
Thanks all for your help in advance.
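The "remove line breaks, keep spaces" step is small enough to sketch independently of the PDF viewer and selection UI, which are out of scope here. This sketch assumes the selected cells have already been extracted as strings (for example by a PDF library such as Camelot) and writes them to a CSV file that Excel opens directly. If you want an .xlsx instead, pandas' `DataFrame.to_excel` is an option, though it needs an Excel writer package such as openpyxl installed.

```python
import csv
from typing import Iterable, List

def clean_cell(text: str) -> str:
    """Join a multi-line cell into one line.

    str.split() with no argument splits on any whitespace, including
    newlines, so this also collapses repeated spaces and trims the ends.
    """
    return " ".join(text.split())

def clean_rows(rows: Iterable[List[str]]) -> List[List[str]]:
    """Apply clean_cell to every cell of every row."""
    return [[clean_cell(cell) for cell in row] for row in rows]

def save_csv(rows: Iterable[List[str]], path: str) -> None:
    """Write the cleaned rows to a CSV file that Excel can open.

    "utf-8-sig" adds a byte-order mark so Excel detects the encoding.
    """
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        csv.writer(f).writerows(clean_rows(rows))
```

If the original spacing must be preserved exactly, use `text.replace("\r\n", " ").replace("\n", " ")` in `clean_cell` instead of `split()`.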

Don't use Python, use Word. Open the PDF, then step through the Tables collection to collect the data and put it into Excel. See this for an example

Here is the advice I can give you:
First of all, search the internet for your questions:
https://lmddgtfy.net/?q=python%20library%20tabular%20pdf
-> Camelot, which is mentioned multiple times, seems relevant.
For the Excel sheet, I recommend one of the most famous libraries for manipulating DataFrames: pandas.
You can take short online courses that will quickly make your project easier to manage.
For the application, you can easily find YouTube courses where someone explains how to build a basic application with a library for building apps. That could give you the entry point you are talking about. Then you can think about what else you need, or simply want, to make it better.
As for the time needed, it depends on how long it takes you to understand the basics and how much time you spend gaining a deeper understanding. I think that in one week, working in your free time with real interest, you could have something working (not perfect, but working, which is a good beginning).
PS: I am not sure your question fits the aims of Stack Overflow. I suggest you read this page (https://stackoverflow.com/help/how-to-ask).

table for organizing students registration of subjects

I have to create my own subjects timetable. I have an Excel file that contains the subject groups and the dates on which they are available, and I want to create a Python program that runs over all the combinations of subjects and gives me all the available dates for the subjects I want to register in.
Actually, I have no idea how to even start.
Right now I have four subjects; let's call them F, N, S and G.
Each one has four groups with different times throughout the week,
so I want to generate all the available combinations in which there is no overlap between subjects.
All I want is a hint; I don't want the whole solution, just some initial thoughts to get started.
I'm really a beginner Python programmer and I can't think of any way to launch this project.
How should I arrange them into matrices?
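A dictionary of dictionaries is often easier to work with here than a matrix. This sketch assumes each group's meeting times are known as hypothetical `(day, start_hour, end_hour)` tuples: `itertools.product` enumerates one group per subject, and a pairwise overlap check keeps only clash-free combinations. With four subjects of four groups each there are only 4^4 = 256 candidates, so brute force is fine.

```python
from itertools import combinations, product
from typing import Dict, Iterator, List, Tuple

Slot = Tuple[str, int, int]          # (day, start_hour, end_hour), hypothetical
Timetable = Dict[str, Dict[object, List[Slot]]]  # subject -> group -> slots

def slots_overlap(a: Slot, b: Slot) -> bool:
    """Two slots clash if they share a day and their hour ranges intersect."""
    day_a, start_a, end_a = a
    day_b, start_b, end_b = b
    return day_a == day_b and start_a < end_b and start_b < end_a

def groups_clash(slots_a: List[Slot], slots_b: List[Slot]) -> bool:
    return any(slots_overlap(a, b) for a in slots_a for b in slots_b)

def valid_schedules(timetable: Timetable) -> Iterator[Dict[str, object]]:
    """Yield {subject: group} for every combination with no overlaps."""
    subjects = list(timetable)
    # Each choice is a tuple of (group, slots) pairs, one per subject.
    for choice in product(*(timetable[s].items() for s in subjects)):
        if not any(groups_clash(x[1], y[1]) for x, y in combinations(choice, 2)):
            yield {s: group for s, (group, _) in zip(subjects, choice)}
```

For example, if F's group 1 meets Monday 8-10 and N's group 1 meets Monday 9-11, the pair (F1, N1) is skipped, while every other pairing of their groups is yielded.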

Save the Excel file as a CSV ("comma-separated values") file. This format is simple plain text and easy for programs to use.
In your program, open the file using open().
Use the csv module to read the opened file into a list of lists. Each element of the outer list should be another list: [subject, group, date] (or whatever columns are in your table).
Now that your information is read into the program, look into solutions for the actual algorithm. You can google various scheduling algorithms, but this StackOverflow question gets at what you're looking for, I think, and might serve as a good starting point
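The reading steps above can be sketched like this. The five-column layout `subject,group,day,start,end` is an assumption (the answer mentions `[subject, group, date]`); adapt the unpacking to whatever columns your sheet actually has. The second function regroups the flat list of lists into a subject -> group -> time-slots dictionary, which is a convenient shape for checking overlaps.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Slot = Tuple[str, int, int]  # (day, start_hour, end_hour)

def load_rows(lines: Iterable[str]) -> List[List[str]]:
    """Return the CSV contents as a list of lists, skipping the header row."""
    reader = csv.reader(lines)
    next(reader, None)  # drop the header (assumed present)
    return [row for row in reader if row]

def build_timetable(rows: List[List[str]]) -> Dict[str, Dict[str, List[Slot]]]:
    """Group rows of [subject, group, day, start, end] by subject and group.

    The five-column layout is a hypothetical example.
    """
    timetable: Dict[str, Dict[str, List[Slot]]] = defaultdict(lambda: defaultdict(list))
    for subject, group, day, start, end in rows:
        timetable[subject][group].append((day, int(start), int(end)))
    # Convert to plain dicts so a typo in a key raises KeyError.
    return {s: dict(groups) for s, groups in timetable.items()}
```

Usage, with a hypothetical file name: `with open("subjects.csv", newline="") as f: rows = load_rows(f)`.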

Parsing a CSV into a database for an API using Python?

I'm going to use data from a .csv file to train a model that predicts user activity on Google Ads (impressions, clicks) in relation to the weather on a given day. I have a .csv that contains 6000+ records of this info, and I want to parse it into a database using Python.
I tried making a DataFrame in pandas, but for some reason the whole table isn't shown. The middle columns (there are about 7 columns, I think) and rows (over 6000, as I mentioned) are replaced with '...' when I print the table, so I'm not sure whether all of the information is being stored and whether it will be usable.
My next attempt will possibly be SQLite, but since it's local, will it interfere with someone else making requests to my API endpoint if I don't have the database actively open at all times?
Thanks in advance.
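On the SQLite point: an SQLite database is an ordinary file, so an API process can open a connection whenever it handles a request; nothing has to be kept open in between. SQLite allows many concurrent readers but serializes writes. A minimal stdlib sketch of loading the CSV into a table follows; the table name `ad_weather` is hypothetical, and column names come straight from the CSV header (fine for a trusted local file, but a header containing a double quote would break the SQL).

```python
import csv
import sqlite3
from typing import Iterable

def load_csv_into_sqlite(lines: Iterable[str], con: sqlite3.Connection,
                         table: str = "ad_weather") -> int:
    """Create `table` from the CSV header and insert every data row.

    Columns are declared without types, so values are stored as text.
    Returns the number of rows inserted.
    """
    reader = csv.reader(lines)
    header = next(reader)
    rows = [row for row in reader if row]
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    with con:  # commits on success, rolls back on error
        con.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
        con.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
    return len(rows)
```

Usage, with hypothetical paths: open `con = sqlite3.connect("ads.db")`, call `load_csv_into_sqlite(f, con)` inside `with open("ads.csv", newline="") as f:`, then `con.close()`. Each API request can later open its own connection to the same file.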

If you used pd.read_csv(), I can assure you all of the data is there; it's just not all being displayed.
You can check by doing something like print(df['Column_name_you_are_interested_in'].tolist()) just to make sure, though. You can also use the various count-type methods in pandas to make sure all of your rows are there.
Pandas is pretty versatile, so it shouldn't have trouble with 6000 rows.
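A short sketch of those checks: `len(df)`, `df.shape` and `df.count()` report what pandas actually holds, regardless of how the printout is truncated, and `pd.option_context` lifts the display limits temporarily if you really want to see every row and column. The function names are hypothetical.

```python
import pandas as pd

def summarize_loaded(df: pd.DataFrame) -> dict:
    """Report how much data pandas actually holds, independent of display."""
    return {
        "rows": len(df),
        "columns": df.shape[1],
        "non_null_per_column": df.count().to_dict(),  # non-missing values per column
    }

def show_all(df: pd.DataFrame) -> None:
    """Print every row and column by lifting pandas' display limits temporarily."""
    with pd.option_context("display.max_rows", None, "display.max_columns", None):
        print(df)
```

Printing 6000 rows at once is rarely useful; comparing `len(df)` with the number of data lines in the file is usually the quicker check.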

Anyone familiar with the data format of Confirmit?

I recently asked about accessing data from SPSS and got some absolutely wonderful help here. I now have an almost identical need to read data from a Confirmit data file. I'm not finding much about the Confirmit data file format on the web. It appears that Confirmit can export to SPSS *.sav files, which might be one avenue for me. Here are the exact needs:
I need to be able to extract two different but related types of info from a market research study done using Confirmit:
I need to be able to discover the data "schema": what questions are being asked (the text of the questions), what type of answer each one takes (multiple choice, yes/no, text), and what text labels are associated with each answer.
I need to be able to read respondents' answers and populate my data model. So for each of the questions discovered in step 1 above, I need to build a table of respondent answers.
With SPSS this was easy thanks to a data access module freely available from IBM and a nice Python wrapper by Albert-Jan Roskam. Googling, I'm not finding much info. Any insight into this is helpful. Something like a Python or Java class to read the Confirmit data would be perfect!
Assuming my best option ends up being to export to SPSS *.sav file, does anyone know if it will meet both of my use cases above (contain the questions, answers schema and also contain each participant's results)?
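Whatever the source format, the two needs map onto a small data model: a list of question definitions (the schema) and a per-question table of decoded respondent answers. The sketch below is a hypothetical structure, not Confirmit's format; SPSS .sav files can carry variable labels and value labels alongside the data, so if the export fills them in, a reader such as pyreadstat's `read_sav` could supply the values for it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Question:
    """One entry of the study schema (hypothetical structure)."""
    name: str                      # variable name, e.g. "q1"
    text: str                      # question wording
    answer_type: str               # e.g. "single", "yes/no", "text"
    labels: Dict[int, str] = field(default_factory=dict)  # code -> answer label

def answers_table(questions: List[Question],
                  respondents: Dict[str, Dict[str, object]]
                  ) -> Dict[str, Dict[str, Optional[object]]]:
    """Build, per question, a mapping of respondent id -> decoded answer."""
    table = {}
    for q in questions:
        column = {}
        for rid, answers in respondents.items():
            raw = answers.get(q.name)  # None if the respondent skipped it
            # Coded answers are translated via the labels; free text passes through.
            column[rid] = q.labels.get(raw, raw) if q.labels else raw
        table[q.name] = column
    return table
```

Codes without a matching label are passed through unchanged, which makes gaps in the exported labels easy to spot.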

You can get the data schema from the Excel definition export in Confirmit.
You can also export a .txt file from Confirmit with the same template.

I was recently given a data set from Confirmit. There are almost 4000 columns in the Excel file, and I want to enter it into a MySQL database. There is no way that output comes from just one table. Do you know how the table schema works for Confirmit?
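Rather than creating a single table with thousands of columns, one common relational layout is a "long" answers table with one row per (respondent, question, answer), plus a separate table describing the questions. A minimal reshaping sketch, assuming the exported rows have already been read as lists of strings; the respondent-ID column name `respid` is hypothetical.

```python
from typing import Iterable, List, Tuple

def wide_to_long(header: List[str], rows: Iterable[List[str]],
                 id_column: str = "respid") -> List[Tuple[str, str, str]]:
    """Turn one-row-per-respondent data into (respondent, question, answer) triples.

    `id_column` is a hypothetical name for the respondent identifier column.
    Empty answers are skipped so unanswered questions take no space.
    """
    id_index = header.index(id_column)
    triples = []
    for row in rows:
        respondent = row[id_index]
        for column, value in zip(header, row):
            if column != id_column and value != "":
                triples.append((respondent, column, value))
    return triples
```

The resulting triples can be inserted into a three-column table with any MySQL driver's `executemany`, and adding a question later then needs no schema change.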
