Published on

Data Analysis and Visualisation - Mapping the Kenyan Startup Ecosystem of 2022 using Python

9 min read

Authors
banner

Introduction

Hello, I Am David Marko. I came across a report on the start-up ecosystem in Kenya graciously provided by DisruptAfrica https://disrupt-africa.com. By the end of the article, I hope to get a few insights into the Kenyan Startup Ecosystem in 2022.

Why?

xkcd I don't have a good answer to that. I got curious to see if there would be any patterns or relevant insights that could be of help sometime in the future. I also could have analyzed in MS Excel, but as a Python developer, my response to anyone who asks would be the famous quote

To a man with a hammer, everything looks like a nail

Requirements

  • Basic Knowledge of Python
  • Basic Knowledge of Using Jupyter or Google Colab
  • Python Libraries (Matplotlib, pandas, NumPy, Jupyter Lab)

Note: You can also download the excel file for the report from here. https://disruptafrica.gumroad.com/l/bqyjt

Installing Libraries

The first step was to open my terminal and install the following libraries by typing the following commands

We will be using Python3.0 and above

pip3 install jupyter
pip3 install numpy matplotlib pandas plotly

Jupyter Labs will help set up an environment for me to run python commands for cleaning and to visualize the provided data. Optionally, you can use Google Colab (https://colab.research.google.com/)

Set up Environment

  1. I created a jupyter notebook instance by using the following command

    jupyter notebook
    

    This opens your default browser.

  2. Create a new notebook in the folder you selected as indicated below. This will open a new tab for your notebook where we can start running python code.

    jupyter

Note:

Copy and paste the Excel file onto the same folder that you put your notebook in.

Initiating libraries and importing Excel files

First, let's import the libraries onto the notebook

install

I loaded up the excel file we downloaded into pandas as a Dataframe. To ensure that the file was loaded properly, run df to view the structure of the dataframe.

df = pd.read_excel('Kenyan_Startup_Database_Kenyan_Startup_Ecosystem_Report_2022.xlsx')

report

The next step was to clean up the dataframe into something that would be easier to plot. This is by removing any empty rows or columns, excel headers or other irrelevant information. After a glance, the only modifications I had to make were as follows:

  • Drop any empty row or column with no data or NaN values
  • Maintain the first row as the index
  • Remove all Sector headers indicated
df.drop('Unnamed: 7', axis=1, inplace=True) #Remove Unnamed: 7 Column without reassigning the dataframe
df = df[df['Sector'].notna()] # Override the dataframe with a new dataframe where the Sector column is not NaN
df

Now I have a clean dataframe df that is ready for plotting.

xkcd

Data Visualization

Here is where I started to visualize the report using matplotlib while checking for specific insights. Below, I will showcase the insight I got, the code used and the output.

  1. Location

    I plotted a simple bar graph where I grouped all locations by size and plotted them onto a bar chart. As seen below, most Startups have headquarters in Nairobi.

    # Group by locations
    locations = df.groupby(['HQ']).size().to_frame('Companies')
    # Initiate the Pie Chart
    fig_locations = px.pie(locations, values='Companies', names=locations.index, title='Locations')
    # Include labels to show the location and percentage
    fig_locations.update_traces(textposition='inside', textinfo='percent+label')
    # Visualize the data
    fig_locations.show()
    

    xkcd

  2. Year of Launch

    To visualize the year created, I chose a scattered bubble chart, grouped the data by sector and plotted. As seen, there was a huge surge in Fintech startups in 2022 and 2021

    # Create a new dataframe grouping data by sector and year of launch
    year_created = df.groupby(['Year of launch', 'Sector']).size().to_frame('Number').reset_index()
    # Convert the Year column into an integer
    year_created['Year of launch'] = year_created['Year of launch'].astype('int')
    # Plot onto a scattered graph
    fig = px.scatter(year_created, x="Year of launch", y="Number",
    	         size="Number", color="Sector",
                     hover_name="Sector", log_x=True, size_max=40)
    fig.show()
    

    xkcd

  3. Female Founded Startups.

    Here, I extracted only the female-founded companies and below are some interesting insights:

    • 18% of all startups have Female Founders
      # Group the columns
      female_led = df.groupby(['Female founder?']).size().to_frame('Number')
      # Plot onto a graph
      fig_locations = px.pie(female_led, values='Number', names=female_led.index, title='Female Founded?')
      fig_locations.update_traces(textposition='inside', textinfo='percent+label')
      # Preview
      fig_locations.show()
      
      xkcd
    • The oldest female-founded company is Komaza (https://komaza.com) which works in the Conversation Tech and hires the most employees (242)
      oldest = df[df['Female founder?'] == 'Yes'].sort_values(['Year of launch'], ascending=[True])
      
    • The highest percentage of female founders are in the Fintech industry, followed by E-Health, Recruitment and HR. The least being Construction and AI/IoT
      # Group sector where Female founder is Yes
      female_sectors = df[df['Female founder?'] == 'Yes'].groupby(['Sector']).size().to_frame('Number')
      # Create pie chart object
      fig_sectors = px.pie(female_sectors, values='Number', names=female_sectors.index, title='Female Founders by Sector')
      # Add labels onto the pie chart
      fig_sectors.update_traces(textposition='inside', textinfo='percent+label')
      # Preview
      fig_sectors.show()
      
      xkcd
    • 2016 had the highest spike in the number of female founders compared to any other year. It has been a stagnant trend for the following 6yrs.
      # Group by year of launch only when Female founder? is Yes
      female_founded = df[df['Female founder?'] == 'Yes']
      female_led = female_founded.groupby(['Year of launch']).size().to_frame('Number')
      # Plot onto a line graph
      fig = px.line(female_led, x=female_led.index, y="Number", title='Trend in Female Founded Companies')
      # Preview
      fig.show()
      

    xkcd

  4. Gender Distribution

    Tech startups are majorly male-dominated, but on a sector-to-sector basis, there are variations as some have more dominance than others.

    # Create new dataframes for each gender group
    female_founded = df[df['Female founder?'] == 'Yes']
    male_founded = df[df['Female founder?'] == 'No']
    # Group each dataframe by sector
    female_founded = female_founded.groupby(['Sector']).size().to_frame('Number').reset_index()
    male_founded = male_founded.groupby(['Sector']).size().to_frame('Number').reset_index()
    # Left merge to create a new dataframe on shared sectors
    merged = female_founded.merge(male_founded, how='left', on='Sector')
    # rename columns
    merged.rename(columns = {'Number_x':'Female Founders', 'Number_y':'Male Founders'}, inplace = True)
    merged.sort_values(['Female Founders', 'Male Founders'], ascending=[False, False])
    # Plot on graph
    fig = px.bar(merged, x="Sector", y=["Female Founders","Male Founders"], title="Gender Distribution by Sector")
    fig.show()
    

    xkcd

    This graph however doesn't show sectors with no female founders. I merged and compared to get the results as indicated below.

    only_female = female_founded['Sector'][~female_founded['Sector'].isin(male_founded['Sector'])]
    # There were no sectors exclusively run by Female founders
    
    only_male = male_founded['Sector'][~male_founded['Sector'].isin(female_founded['Sector'])]
    # This gave me a list of sectors that dont have any female founders
    only_male
    

    xkcd

  5. Accelerator

    A surprising number of startups start from Accelerators and Incubators as seen from the graph and code below

    accelerated = df.groupby(['Accelerated?']).size().to_frame('Number')
    fig_locations = px.pie(accelerated, values='Number', names=accelerated.index, title='Accelerated?')
    fig_locations.update_traces(textposition='inside', textinfo='percent+label')
    fig_locations.show()
    

    xkcd

    There is also an increase in the number of startups joining accelerators over the years, with the most common Startup Sector being Fintech and the most active year being 2020

    accelerator = df[df['Accelerated?'] == 'Yes'].groupby(['Year of launch', 'Accelerated?', 'Sector']).size().to_frame('Number').reset_index()
    accelerator['Year of launch'] = accelerator['Year of launch'].astype('int')
    fig = px.scatter(accelerator, x="Year of launch", y="Number",
    	         size="Number", color="Number",
                     hover_name="Sector", log_x=True, size_max=40)
    fig.show()
    

    xkcd

  6. Employees

    Kenyan Startups have hired a total of 13,140 employees with Conservation, Agritech and Ecommerce being the dominant sectors in recruitment

    total_employees = df['Number of employees'].sum()
    total_employees
    
    average_employees = df.groupby(['Sector'])['Number of employees'].mean().to_frame('Average')
    average_employees['Average'] = average_employees['Average'].astype('int')
    # average_employees
    fig = px.bar(average_employees, x=average_employees.index, y='Average')
    fig.show()
    

xkcd

Conclusion

Thanks for reading along, I hope to update the article or create a second part once I can curate more information like Funding, Salaries, Grants, News, Mergers or Acquisitions and more.

You can download the notebook and relevant files from here

© 2024 David Marko