Austin Animal Center Exploratory Data Analysis (EDA)

Austin Animal Center

Outlines of my Notebook

  • Introduction

Introduction

As I was looking through hundreds of datasets on Kaggle.com, https://www.kaggle.com/datasets, I stumbled upon an animal shelter dataset from the Austin Animal Center in Texas.

It quickly caught my attention because I am a huge fan of animals, so I could not help but choose this dataset for the Exploratory Data Analysis (EDA) in this course project. The animal shelter data can be taken directly from https://data.austintexas.gov, but I downloaded the dataset from Kaggle instead, https://www.kaggle.com/jackdaoud/animal-shelter-analytics/tasks?taskId=3654.

In this EDA course project, I am going to analyse the data collected by the Austin Animal Center in Texas through Data Visualization, both to draw meaningful insights from the data and to create awareness about animal shelters.

Data Retrieval (Downloading Dataset)

Data Retrieval is the first and most important step in every analysis because, without a dataset, Data Analysis cannot be performed. Data is the main ingredient of any analysis.

For this Course Project, I downloaded my dataset from Kaggle, https://www.kaggle.com/jackdaoud/animal-shelter-analytics/tasks?taskId=3654. After downloading the dataset, I uploaded it to this project's files so that I can retrieve it from there in this notebook.

Let’s begin by downloading the dataset using the urlretrieve function from the urllib.request module.

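from urllib.request import urlretrieve
# Pass a local filename as the second argument so the CSV is saved where read_csv will look for it
urlretrieve('https://jovian.ai/elissammi/austin-animal-center-eda-course-project/v/27/files?filename=Austin_Animal_Center_Intakes.csv',
            'Austin_Animal_Center_Intakes.csv')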

Data Preparation and Cleaning

Data Preparation and Cleaning is an important step before any analysis. It makes sure that the data is clean and ready to be analysed.

In this section, we will do some processing to understand our data better.

To-do List:

Step 1: Installing Pandas and reading CSV files

  • install Pandas library

Step 2: Working with Dates

  • using pd.to_datetime method to change the datatype from object to datetime64[ns]

Step 3: Querying and Sorting Dataframe

  • query the dataframe to remove unnecessary and redundant data by selecting only the desired column names

Step 4: Renaming Column Names

  • using .rename with a dict to change multiple column names so that all column names follow the Python Naming Conventions

Step 1: Installing Pandas Library and Reading the CSV file

To read the file, we need to use the read_csv function from Pandas. Therefore, let's install the Pandas library first!

!pip install pandas --upgrade -q
import pandas as pd
animal_intakes_df = pd.read_csv('Austin_Animal_Center_Intakes.csv')

Step 2: Working with Dates

The DateTime column is loaded with the object datatype, so we will need to change it to the appropriate datatype, datetime64[ns], using pd.to_datetime.

animal_intakes_df['DateTime'] = pd.to_datetime(animal_intakes_df.DateTime)
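To see what this conversion does, here is a minimal sketch on a tiny made-up sample. The two rows, the `sample_df` name and the timestamp format string are assumptions for illustration only; the real CSV may store its dates differently.

```python
import pandas as pd

# Hypothetical two-row sample; the timestamp format is an assumption
# about how the raw CSV stores its DateTime values.
sample_df = pd.DataFrame({
    'Animal ID': ['A000001', 'A000002'],
    'DateTime': ['12/04/2013 10:15:00 AM', '03/03/2021 04:30:00 PM'],
})

# Straight after loading, the column holds plain text
dtype_before = sample_df['DateTime'].dtype

# Convert the text to real timestamps
sample_df['DateTime'] = pd.to_datetime(
    sample_df['DateTime'], format='%m/%d/%Y %I:%M:%S %p'
)
dtype_after = sample_df['DateTime'].dtype

print(dtype_before, '->', dtype_after)
```

Passing an explicit `format` is optional, but it avoids ambiguity between month-first and day-first dates when the layout is known.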

This new and correct data type allows us to extract different parts of the date into separate columns (year, month, day, and time) using the DatetimeIndex class.

animal_intakes_df['year'] = pd.DatetimeIndex(animal_intakes_df.DateTime).year
animal_intakes_df['month'] = pd.DatetimeIndex(animal_intakes_df.DateTime).month
animal_intakes_df['day'] = pd.DatetimeIndex(animal_intakes_df.DateTime).day
animal_intakes_df['time'] = pd.DatetimeIndex(animal_intakes_df.DateTime).time
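As a side note, the same parts can also be read through the `.dt` accessor of a datetime column, which avoids building a DatetimeIndex each time. This is only an alternative sketch on a made-up sample (`sample_df` and its two timestamps are assumptions), not the approach used in this notebook.

```python
import pandas as pd

# Hypothetical sample with an already converted DateTime column
sample_df = pd.DataFrame({
    'DateTime': pd.to_datetime(['2013-12-04 10:15:00', '2021-03-03 16:30:00']),
})

# The .dt accessor exposes the datetime parts of each row
sample_df['year'] = sample_df['DateTime'].dt.year
sample_df['month'] = sample_df['DateTime'].dt.month
sample_df['day'] = sample_df['DateTime'].dt.day
sample_df['time'] = sample_df['DateTime'].dt.time

print(sample_df)
```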

Step 3: Querying and Sorting Dataframe

Because we added four new columns (year, month, day and time), there are now redundant columns in the dataframe. Therefore, we will remove them by selecting only the columns that we want to keep.

query_animal_intakes_df = animal_intakes_df[['year','month','day','time','Animal ID','Name','Found Location','Intake Type','Intake Condition','Sex upon Intake','Age upon Intake', 'Animal Type', 'Breed','Color']]

Now, we only have the 14 important columns that we need to analyse. As a result, the redundancy is removed.

Next, we would like to identify the start date of this dataframe, that is, the date from which data is provided in this dataset. The dataset is not arranged by year, so sorting is required.

sorted_animal_intakes_df = query_animal_intakes_df.sort_values('year', ascending=True)

After sorting with the .sort_values method, the rows are arranged by the year column in ascending order.

With this, we are able to see from the DataFrame that:

  • The Data provided is from Dec 4, 2013 to March 3, 2021.
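Sorting on the year column alone does not order the rows within a year, so another way to check the date range is to take the minimum and maximum of the full timestamp. The sketch below assumes a converted DateTime column is still available; `sample_df` and its four timestamps are made up for illustration.

```python
import pandas as pd

# Hypothetical sample of intake timestamps in no particular order
sample_df = pd.DataFrame({
    'DateTime': pd.to_datetime([
        '2019-06-01 09:00', '2013-12-04 10:15',
        '2021-03-03 16:30', '2013-12-25 08:00',
    ]),
})

# Earliest and latest intake, without sorting the whole dataframe
first_intake = sample_df['DateTime'].min()
last_intake = sample_df['DateTime'].max()
print(first_intake.date(), 'to', last_intake.date())

# Sorting on the full timestamp also orders rows within the same year
sorted_sample_df = sample_df.sort_values('DateTime', ascending=True)
```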

Step 4: Renaming Column Names

Before performing Data Analysis, however, we should make sure that the column names follow the Python Naming Convention (lowercase words joined by underscores) so that we can easily retrieve data from the dataframe. Column names that contain spaces or start with capital letters do not follow this convention, and names with spaces cannot be accessed with dot notation.

Therefore, let’s rename some of the column names to appropriate ones using the .rename method.

rename_animal_intakes_df = sorted_animal_intakes_df.rename(columns={
    'Animal ID': 'ID',
    'Name': 'name',
    'Found Location': 'found_location',
    'Intake Type': 'intake_type',
    'Intake Condition': 'intake_condition',
    'Sex upon Intake': 'sex',
    'Age upon Intake': 'intake_age',
    'Animal Type': 'animal_type',
    'Breed': 'breed',
    'Color': 'color',
})

After renaming all the columns to follow the appropriate naming convention, we will finalise the result as the `final_animal_intakes_df` dataframe for further analysis.

final_animal_intakes_df = rename_animal_intakes_df
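When there are many columns, .rename also accepts a function instead of a dict. The sketch below uses a hypothetical helper, `to_snake_case`, on a made-up empty dataframe. It would not reproduce the hand-picked names above, such as 'ID', 'sex' or 'intake_age', so treat it as an optional shortcut rather than the method used here.

```python
import pandas as pd

def to_snake_case(column_name):
    """Hypothetical helper: lowercase the name and replace spaces with underscores."""
    return column_name.strip().lower().replace(' ', '_')

# Made-up empty dataframe that only carries a few of the original column names
sample_df = pd.DataFrame(columns=['Found Location', 'Intake Type', 'Breed'])

# rename applies the function to every column name
renamed_df = sample_df.rename(columns=to_snake_case)
print(list(renamed_df.columns))
```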

Exploratory Analysis and Visualization

Now that we have done Data Preparation and Cleaning, we will start the Exploratory Analysis and Visualization of the dataframe to explore the distribution of its columns and the relationships between them, as well as to draw interesting insights from this exploratory analysis.

Data visualization is very important here because it is the graphic representation of data, which lets viewers see the insights visually. In this section, we will be using the Matplotlib and Seaborn Python libraries.

Let’s begin by importing matplotlib.pyplot and seaborn.

import seaborn as sns
import matplotlib
import matplotlib.pyplot as plt
%matplotlib inline
sns.set_style('darkgrid')
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (6,9)
matplotlib.rcParams['figure.facecolor'] = '#00000000'

Q1. How did the animals get accepted into the shelter?

The Top animal intake methods in Austin Animal Center

Animals are normally accepted into an animal shelter for various reasons.

Through this analysis, we will be able to determine how most of the animals ended up in the Austin Animal Center from the Year 2013 to March 2021.

First of all, we will group the dataframe by intake_type and count the number of animals using .count on the ID column. We count the ID column because each animal is identified by a unique ID.
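The grouping step described above can be sketched as follows. It follows the same pattern as the other questions in this notebook, but the six-row `sample_df`, the 'Other' category and the names `intake_type_df` and `top4_intake_df` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample standing in for final_animal_intakes_df
sample_df = pd.DataFrame({
    'ID': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6'],
    'intake_type': ['Stray', 'Stray', 'Owner Surrender',
                    'Stray', 'Other', 'Owner Surrender'],
})

# Count the IDs in each intake type
intake_type_df = sample_df.groupby('intake_type')[['ID']].count()
intake_type_df.reset_index(inplace=True)

# Keep the four most common intake types
top4_intake_df = intake_type_df.sort_values('ID', ascending=False).head(4)
print(top4_intake_df)
```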

Creating Pie Chart for the Top 4 Animal Intake Methods
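A pie chart like this one can be drawn with Matplotlib roughly as shown below. This is a minimal sketch: the counts and the 'Type A' to 'Type D' labels are placeholders, not the real figures, and `top4_intake_df` is assumed to hold one row per intake type with its count in the ID column.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder counts; the real values come from the grouped dataframe
top4_intake_df = pd.DataFrame({
    'intake_type': ['Type A', 'Type B', 'Type C', 'Type D'],
    'ID': [70, 20, 7, 3],
})

fig, ax = plt.subplots(figsize=(6, 6))

# autopct prints each slice's share of the plotted total with two decimals
ax.pie(
    top4_intake_df['ID'],
    labels=top4_intake_df['intake_type'],
    autopct='%.2f%%',
    startangle=90,
)
ax.set_title('Top 4 Animal Intake Methods')
```

Note that the percentages on such a chart are shares of the plotted rows only, so they describe the Top 4 list rather than all intakes.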

Conclusion

From this analysis, we are able to see meaningful insights by identifying the Top 4 Animal Intake Methods in Austin Animal Center.

The Top 4 Animal Intake Methods according to the rankings are:

  1. Stray

Based on the Pie Chart above, the majority of the animals in the center were taken in as Stray, which accounts for 69.96% of the Top 4 list.

From this data, we can see that the Austin Animal Center mostly takes Strays into its shelter in an effort to give these animals a home and help them find their forever homes through adoption by the public, which is a great effort. This means that many Strays in Austin are given shelter instead of being left without one.

At the same time, it is very sad and heartbreaking to see that Owner Surrender has the second largest percentage among this shelter’s intake methods. Within the Top 4 list, 19.91% of the animals in the shelter were surrendered by their owners for various reasons.

In conclusion, through this analysis, I would like to create awareness about animal care in Austin as well as around the world. Owner Surrender should not be the second largest reason for animals being in the shelter!

Q2. Where do most of the animals come from?

This dataset is retrieved from the Austin Animal Center in Texas, USA. Therefore, most of the sheltered animals are from Texas, but where exactly? Which city and which town do they mostly come from?

Through this analysis, we will be able to determine where most of the animals in this Animal Center come from.

intake_location_df = final_animal_intakes_df.groupby('found_location')[['ID']].count()
intake_location_df.reset_index(inplace=True)
top5_location_df = intake_location_df.sort_values('ID', ascending=False).head(5)
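For a single column, the group, count, sort and head steps can also be written in one line with value_counts, which already returns the counts in descending order. The sketch below runs on a made-up sample; the location labels and `top_locations` are illustrative only. The two approaches agree as long as no ID is missing.

```python
import pandas as pd

# Hypothetical sample standing in for final_animal_intakes_df
sample_df = pd.DataFrame({
    'ID': ['A1', 'A2', 'A3', 'A4', 'A5', 'A6'],
    'found_location': ['Austin (TX)', 'Austin (TX)', 'Travis (TX)',
                       'Austin (TX)', 'Manor (TX)', 'Travis (TX)'],
})

# value_counts counts each location and sorts from most to least common
top_locations = sample_df['found_location'].value_counts().head(5)
print(top_locations)
```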

Creating Pie Chart for the number of animals found in the Top 5 Locations

Conclusion

From this analysis, we are able to see meaningful insights by identifying the Top 5 Locations where the animals were found before receiving shelter from the Austin Animal Center.

The Top 5 Locations where the animals were found, in ranked order, are:

  1. Austin (TX)

Based on the Pie Chart above, the majority of the animals in the center were found in Austin, Texas, which amounts to 22,859 animals. This is likely because the Animal Center is located in Austin.

Therefore, from this insight, we can see that the Animal Center focuses on sheltering the animals around its own location. This is notable because the first step starts with those that are closest.

It is great to see that the Animal Center tries its best to shelter the animals nearby first, before those from other areas such as Travis, 7201 Levander Loop and Manor.

In conclusion, through this analysis, the Animal Center can easily identify where most of its animals are from, and this can help in future procedures like adoption.

Q3. What are the Top 20 animal names in the Austin Animal Center?

As pet owners, most of us have had difficulty coming up with a name for our little furry pet. Giving your furry friend one of the most popular names at a Texas shelter is a bonus!

Through this analysis, we will determine the Top 20 animal names from the Year 2013 to March 2021.

intake_name_df = final_animal_intakes_df.groupby('name')[['ID']].count()
intake_name_df.reset_index(inplace=True)
top20_names_df = intake_name_df.sort_values('ID', ascending=False).head(20)

Creating Bar Chart on the Top 20 Animal Names
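A bar chart of the top names could be drawn with Seaborn along the following lines. This is a sketch only: the counts, 'Name B' and 'Name C' are placeholders, and `top_names_df` stands in for `top20_names_df`.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Placeholder counts; the real values come from top20_names_df
top_names_df = pd.DataFrame({
    'name': ['Max', 'Name B', 'Name C'],
    'ID': [30, 25, 20],
})

fig, ax = plt.subplots(figsize=(6, 9))

# Horizontal bars keep long lists of names readable
sns.barplot(x='ID', y='name', data=top_names_df, ax=ax)
ax.set_xlabel('Number of animals')
ax.set_ylabel('Animal name')
ax.set_title('Top Animal Names')
```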

Conclusion

From this analysis, we are able to see meaningful insights by identifying the Top 20 animal names in the Austin Animal Center.

The Top 20 animal names in the Austin Animal Center, in ranked order, are:

  1. Max

Based on the Bar Chart above, the most used animal name in the center is Max!

In conclusion, through this analysis, we can see that Max is a very popular name for animals.

Q4. What types of animals are accepted into the Austin Animal Center?

The animals accepted and given shelter in the Animal Center are not only cats and dogs; many more types of animals receive shelter in the center.

We often overlook other animals, such as birds and bats, which also require care and shelter under certain circumstances.

animal_type_df = final_animal_intakes_df.groupby('animal_type')[['ID']].count()
animal_type_df.reset_index(inplace=True)
top4_type_df = animal_type_df.sort_values('ID', ascending=False).head(4)

Creating Pie Chart on the Type of Animal in Austin Animal Center

Conclusion

From this analysis, we are able to see meaningful insights by identifying the number of animals of each type in the Austin Animal Center.

Based on the Pie Chart above, the majority of the animals in the center are dogs, with 70,447 dogs compared with 46,455 cats.

In conclusion, through this analysis, the Animal Center can easily see the statistics on the types of animals in its care.

Q5. What are the top animal breeds in the Austin Animal Center?

We have identified that the majority of the animals in the Animal Center are dogs, but it is important to have an insight into the breeds of the animals as well.

Therefore, in this analysis, we will determine the top animal breeds in the center.

animal_breed_df = final_animal_intakes_df.groupby('breed')[['ID']].count()
animal_breed_df.reset_index(inplace=True)
top10_breed_df = animal_breed_df.sort_values('ID', ascending=False).head(10)

Creating Pie Chart on the Percentage of Breed based on the Top 10 Breeds
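The percentages on this kind of chart are shares within the Top 10 list only, not shares of every animal in the dataset. The sketch below shows the arithmetic on placeholder counts; `top_df`, the 'Breed A' to 'Breed C' labels and the `share_pct` column are assumptions for illustration.

```python
import pandas as pd

# Placeholder counts; the real values come from top10_breed_df
top_df = pd.DataFrame({
    'breed': ['Breed A', 'Breed B', 'Breed C'],
    'ID': [44, 36, 20],
})

# Share of each breed relative to the total of the listed breeds only
top_df['share_pct'] = (top_df['ID'] / top_df['ID'].sum() * 100).round(2)
print(top_df)
```

To express a breed as a share of all intakes instead, the denominator would be the count over the whole dataframe rather than the sum of the Top 10.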

Conclusion

From this analysis, we are able to see meaningful insights by identifying the Top 10 Animal Breeds in the Austin Animal Center.

The Top 10 Animal Breeds, in ranked order, are:

  1. Domestic Shorthair Mix

Based on the Pie Chart above, the most common animal breed in the center is Domestic Shorthair Mix, which accounts for 44.00% of the Top 10 list.

From the previous analysis, we saw that the Austin Animal Center mostly takes Strays into its shelter, and now, through this analysis, we can see that the most common breed in the center is Domestic Shorthair Mix.

In conclusion, through this analysis, we know that Domestic Shorthair Mix is the most common breed in the Animal Center.

Summary

Exploratory Data Analysis (EDA)

In this EDA, we analysed the data by looking at five important questions:

  1. How did the animals get accepted into the shelter?

Thank you for taking the time to read my Medium blog!

I am Sammi Chia Li, 19 years old as of 2021, and a first-year Data Science degree student at a university college in Malaysia.