North Carolina University List
I needed a list of public universities in North Carolina. What does anyone do? Start Googling. The first result was a Wikipedia page. I clicked on it and saw a beautiful table with all the information I wanted. Then I remembered that pandas, the Python library, has a great method for pulling tables from websites very easily.
All I need to do is use read_html, then filter and sort the data. Below is a short description of each step.
This code uses the Python programming language and the pandas library to read a table from a Wikipedia page about colleges and universities in North Carolina.
The code begins by importing the pandas library and assigning the URL of the Wikipedia page to a variable called url.
Next, the pd.read_html() function is used to read the HTML tables from the URL and return them as a list of DataFrames, assigned to tables. The first table in the list is selected using tables[0].
The .query() method is then used to filter the rows of tables[0] based on the value of the "Control" column. Specifically, the code selects only rows where the value of the "Control" column is "Public". The resulting filtered DataFrame is assigned to a new variable called filtered_df.
Finally, the last line displays filtered_df. A bare variable name like this shows output in a notebook or interactive session; in a script, use print(filtered_df) instead.
In summary, this code reads a table from a Wikipedia page, filters the rows based on a specific criterion, and outputs the filtered table to the console.
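import pandas as pd
url = 'https://en.wikipedia.org/wiki/List_of_colleges_and_universities_in_North_Carolina'
# read_html returns a list with one DataFrame per HTML table on the page
tables = pd.read_html(url)
# keep only the rows whose Control column is "Public"
filtered_df = tables[0].query('Control == "Public"')
# displays the result in a notebook or interactive session
filtered_df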
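One caveat with tables[0]: it depends on the table I want being the first one on the page, and a Wikipedia edit could change that. Here is a minimal sketch of picking the table by its column names instead. The helper first_table_with is my own hypothetical function, not part of pandas, and the tables list is a hand-built stand-in with made-up rows so the example runs without a network connection.

```python
import pandas as pd


def first_table_with(tables, required_columns):
    """Return the first DataFrame that contains every required column."""
    required = set(required_columns)
    for table in tables:
        # Column labels are cast to str so non-string headers compare safely.
        if required.issubset(str(col) for col in table.columns):
            return table
    raise ValueError(f"no table has columns {sorted(required)}")


# Stand-in for the list that pd.read_html() returns; the rows are made up.
tables = [
    pd.DataFrame({"Note": ["page banner"]}),
    pd.DataFrame({
        "School": ["Example State", "Sample College"],
        "Control": ["Public", "Private"],
        "Founded": [1900, 1850],
    }),
]

schools = first_table_with(tables, ["Control", "Founded"])
print(schools.query('Control == "Public"'))
```

With the real page, the same call would take the list returned by pd.read_html(url), assuming the table still has columns named Control and Founded.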
Chaining style
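The code below takes the same idea but chains the calls: it filters the rows, sorts them by the Founded column, and resets the index. reset_index() moves the original index into a column, renamed here to alph_sort, and builds a new index that follows the founding-date order. Now we have one sort key for alphabetical order and one for date. Very efficient!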
import pandas as pd
url = 'https://en.wikipedia.org/wiki/List_of_colleges_and_universities_in_North_Carolina'
tables = pd.read_html(url)
(tables[0]
    .query('Control == "Public"')
    .sort_values(by='Founded')
    .reset_index()
    .rename(columns={'index':'alph_sort'})
)
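To make the two sort keys easier to see, here is a small sketch of the same chain on a hand-built DataFrame. The school names and years are made up, and I am assuming the rows start out in alphabetical order, as they appear to on the Wikipedia page.

```python
import pandas as pd

# Made-up rows, listed alphabetically so the original index encodes that order.
df = pd.DataFrame({
    "School": ["Alpha State", "Beta College", "Delta Tech", "Gamma University"],
    "Control": ["Public", "Private", "Public", "Public"],
    "Founded": [1950, 1800, 1890, 1920],
})

result = (df
    .query('Control == "Public"')
    .sort_values(by="Founded")
    .reset_index()                          # old index becomes a column named "index"
    .rename(columns={"index": "alph_sort"})
)
print(result)

# Sorting on alph_sort brings back the alphabetical order.
alphabetical = result.sort_values(by="alph_sort")
print(alphabetical["School"].tolist())
```

Note that alph_sort keeps the positions from the unfiltered table, so the numbers have gaps wherever a private school was dropped. They still sort correctly.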
There we go. We have our list ready for review!