Posts

Showing posts from November, 2023

Simulating 5k Runner's Data

Image
Introduction In this tutorial, we'll explore how to create a simulated dataset for a running event that spans multiple years using Python and the powerful pandas library. This step-by-step guide is designed for beginners and will help you understand the process of data simulation and manipulation using Python. Scenario Imagine you're tasked with simulating a 5k running event that takes place annually for five years, from 2023 to 2028. Each year, the event sees participants of various ages and running times, and your goal is to generate a dataset that represents this dynamic event. Nuances, if a person participated in a previous year, we don't want an random age, we want their previous age. The age minimum is 16 and the assumed fastes time is 15 minutes - flat! For anyone who ran a 5k, you know this is fast! Setting Up Before we dive into the code, make sure you have Python and the necessary libraries installed. You can use a Jupyte

5 Year Journal

Image
Creating a 5 Year Journal with Python and ReportLab Creating a 5 Year Journal with Python and ReportLab If you're new to Python or ReportLab and interested in creating PDF documents, this tutorial is for you. We'll walk through the process of creating a "5 Year Journal" PDF using Python and the ReportLab library. This journal will have a page for each day of the year, with sections to write entries for each of the next five years. Setting Up Your Environment First, ensure you have Python installed on your computer. Then, install ReportLab, which is a powerful and versatile PDF generation library. You can install it using pip: pip install reportlab Starting the Project Begin by importing the necessary modules from ReportLab, along with Python's datetime module for handling dates: from reportlab.lib.pagesizes import letter from reportlab.pdfgen import canvas from datetime import datetime, timedelta Creating the PDF File Set up t

Grants.gov data analysis

Image
 Grants.Gov Data Download and Analysis Project Description A friend approached me with a data task that required 1) downloading a dataset from Grants.gov, 2) cleaning the data for preparation, and 3) providing starter code so another group could query the data and come to general conclusions. To make the project more manageable, the project was first broken up into two general activities.  The first group listed everything that could be done without thought- accessing and cleaning the data.  The second group of tasks was more interactive - filtering and searching the data for insight.  As a result,  I like to give the user more power to detail where they store the raw data and where they would like to store data for archiving.  Some folks skip the project location space, but I don't like to leave it up to change.  I like to be explicit whenever possible.   Below is the "main" program.  I don't use Jupyter Notebooks, but "cells" in VS Code are defined by # %%

Course Data

Image
Generating and Manipulating Random Data in Pandas Generating and Manipulating Random Data in Pandas Creating a DataFrame with random data is a common task in data analysis for testing and simulation. This example demonstrates the generation and manipulation of a Pandas DataFrame that represents a dataset of college courses. The dataset includes course prefixes, locations, modes, and a mapping for course names and numbers. The first method in the provided code employs Python's random module to generate random data for the DataFrame. This module is crucial for simulating real-world variability in datasets. In our example, random.choice() is used to randomly select elements from predefined lists. Specifically, the course prefixes are chosen from a list of department abbreviations (pre_list), and the location for each course is randomly selected from a list of campus locations. This random selection process is repeated for each row of the DataFrame, crea

Pandas str

Image
Exploring String Operations in Pandas with Movie Genres Exploring String Operations in Pandas with Movie Genres In data analysis, string manipulation is a frequent requirement, and Pandas offers powerful tools to make this task more efficient and intuitive. In this post, we delve into the .str accessor in Pandas, utilizing a dataset of movie genres to showcase its capabilities. Dataset Overview The dataset used in this example contains information about movies, specifically their genres. We'll be using Pandas to manipulate and extract information from the 'genres' column. Here's a preview of the dataset: import pandas as pd import numpy as np loc = 'https://raw.githubusercontent.com/aew5044/Python---Public/main/movie.csv' m = pd.read_csv(loc) pd.set_option('display.max_columns', None) pd.options.display.min_rows = 10 m.head() String Operations Let's explore various string operations usi

Advanced List Comp in a DataFrame with Movie Data

Image
Understanding Pandas and Data Manipulation in Python Exploring Movie Data with Python and Pandas As a beginner in Python, diving into data analysis can be a bit intimidating. However, with libraries like Pandas, it becomes a lot more manageable and fun. In this blog post, we'll explore a movie dataset using Pandas, demonstrating how to perform various data manipulation tasks. Getting Started with the Dataset First, let's import our required libraries, Pandas and NumPy. We'll then load our movie dataset from a CSV file: import pandas as pd import numpy as np loc = 'https://raw.githubusercontent.com/aew5044/Python---Public/main/movie.csv' m = pd.read_csv(loc) pd.set_option('display.max_columns', None) pd.options.display.min_rows = 10 m.head() This code block does several things: Imports Pandas and NumPy: Essential libraries for data manipulation. Reads the CSV file: The pd.read_csv function loads the data from the given URL

Fun Friday Code Blog

Image
Understanding Python Code: Data Manipulation with Pandas Understanding Python Code: Data Manipulation with Pandas Let's dive into a detailed explanation of the provided Python code which involves manipulating movie data using the Pandas library. This is especially useful for a new Python programmer looking to understand the intricacies of data manipulation. Code Breakdown: import pandas as pd import numpy as np loc = 'https://raw.githubusercontent.com/aew5044/Python---Public/main/movie.csv' m = pd.read_csv(loc) pd.set_option('display.max_columns', None) pd.options.display.min_rows = 5 col = m.columns redefine = (m[col] .loc[:,'genres'] .str.split('|', expand=True) .stack() .value_counts() .reset_index() .rename(columns={'index':'Genre', 0:'Count'}) .style.bar(subset=['Count'], color='lightblue