This is a list of resources that I keep in mind when building data platforms. I also include some day-to-day tools I use to help manage projects, time, and communication.


Development tools I use when building data solutions.

| Development Tools | Description |
| --- | --- |
| Visual Studio Code | A code editor that supports a wide variety of languages and use cases through extensions. |
| Google Colab | Hosted Jupyter notebooks for running Python. Provides access to GPUs/TPUs if you want to explore ML. |
| tmux | A terminal multiplexer that keeps command-line sessions running on a server, even after you disconnect. |
| Docker | Packages software code together with a complete runtime environment. |
| Serverless | Utility that streamlines deploying serverless applications to cloud platforms such as AWS. |
| AWS Python CDK | The Python Cloud Development Kit (CDK) allows for deploying AWS infrastructure using Python code. |
| data build tool (dbt) | Utility for creating ELT data pipelines that run within a database. Used for normalizing data into a data warehouse and curating data marts. Supports data testing. Auto-generated documentation makes it easy to understand how data flows through the data platform. |
| GitHub | Source control platform of choice. |
| GitHub Actions | Supports Continuous Integration/Continuous Deployment (CI/CD). |
| DBeaver | Tool for interacting with SQL databases. |
| pre-commit | A tool that runs a series of checks when committing files to Git-based source control. |

Programming Languages

Programming languages I consider when building data platforms, doing statistical analysis, machine learning, etc.

| Language | Purpose |
| --- | --- |
| Python | A general-purpose language that supports building data platforms, data analytics, statistical analysis, and machine learning. |
| R | A language built for statistical analysis, machine learning, and data analysis. |
| Julia | A high-performance language for scientific computing. |
| Scala | For writing performant Spark data applications. |
| Go | For building efficient data microservices. |
| SQL | Structured Query Language for pulling data from relational databases. |
| bash | A shell scripting language that can be helpful in automating tasks. |
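
As a small illustration of how SQL and Python pair up when pulling data from a relational database, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, the `total_by_customer` helper, and the sample rows are all hypothetical and exist only for this example.

```python
import sqlite3


def total_by_customer(rows):
    """Load (customer, amount) rows into an in-memory database and aggregate with SQL."""
    conn = sqlite3.connect(":memory:")  # throwaway database, nothing is written to disk
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        # Parameterized inserts keep the data separate from the SQL text.
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT customer, SUM(amount) FROM orders "
            "GROUP BY customer ORDER BY customer"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical sample data.
sample_rows = [("acme", 10.0), ("acme", 5.0), ("globex", 7.5)]
print(total_by_customer(sample_rows))  # [('acme', 15.0), ('globex', 7.5)]
```

The same query shape should carry over to a warehouse database; only the connection library would change.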

Python Development Tools

| Tool | Type | Purpose |
| --- | --- | --- |
| black | Clean Code | A Python code formatter that helps keep code consistently formatted across teams. |
| isort | Clean Code | Helps organize Python imports. |
| flake8 | Clean Code | A code linter that enforces coding style, for example by flagging unused imports. |
| mypy | Clean Code | A static type checker that catches type errors that Python's dynamic typing would otherwise only surface at runtime. |
| pytest | Testing | My unit testing tool of choice for its extensibility in building test harnesses. |
| great-expectations | Testing | Supports data quality testing and validation. |
| Test-Driven Data Analysis | Testing | A package that supports integrating data quality tests with unit testing tools such as pytest. |
| poetry | Environment Management | A Python package that supports managing virtual environments. |
| data version control (dvc) | ML Ops | Utility that provides a streamlined approach for managing machine learning models from development to production. |
| awswrangler | AWS SDK | AWS development. |
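
To show how a few of these tools fit together, here is a minimal sketch of a type-annotated function with pytest-style tests. The `parse_amount` function and the test names are hypothetical; the type hints are what mypy would check, and the `test_` prefix is what pytest uses to discover tests.

```python
from typing import Optional


def parse_amount(raw: Optional[str]) -> float:
    """Convert a raw string such as ' 12.50 ' to a float; treat missing values as 0.0."""
    if raw is None or raw.strip() == "":
        return 0.0
    return float(raw.strip())


# pytest collects functions whose names start with "test_".
def test_parse_amount_handles_missing_values() -> None:
    assert parse_amount(None) == 0.0
    assert parse_amount("   ") == 0.0


def test_parse_amount_strips_whitespace() -> None:
    assert parse_amount(" 12.50 ") == 12.5
```

Assuming the code is saved as `test_amounts.py` (a hypothetical file name), running `pytest test_amounts.py` executes the tests and `mypy test_amounts.py` checks the annotations. A call such as `parse_amount(3)` would be reported by mypy before the code ever runs.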

Favorite Python Analysis Tools

| Tool | Type | Purpose |
| --- | --- | --- |
| Jupyter notebooks | Analysis Tool | Jupyter notebooks run Python in cells, so you can see the results immediately. |
| PySpark | Data Processing | Provides a Python interface for creating Spark applications. |
| pandas | Dataframe | Imports data into a tabular data structure for cleaning and analysis in Python. |
| scipy | Calculations | Python package for scientific computing. |
| numpy | Calculations | Python package for mathematical computing. |
| streamlit | Interactive Data Visualization | Streamlined method for creating interactive dashboards in Python. |
| plotly | Interactive Data Visualization | Streamlined method for creating interactive visualizations in Python. |
| plotnine | Static Data Visualization | Allows the creation of ggplot-style visualizations in Python. |
| scikit-learn | Machine Learning | Machine learning in Python. |
| keras | Deep Learning | A high-level library for creating deep learning models. |

R Development Tools

| Tool | Type | Purpose |
| --- | --- | --- |
| renv | Environment Management | A tool for managing the R packages used for a project. |
| RStudio | IDE | An IDE tailored to running R code. It can also run Python, but I have not tested this. |

R Analysis Tools

| Tool | Type | Purpose |
| --- | --- | --- |
| tidyverse | Analysis | An ecosystem of R packages that support a wide variety of data science and analytics tasks. |
| r-shiny | Interactive Dashboard | A framework for creating interactive dashboard sites with R. |

IT Tools

This section includes a wide variety of tools and platforms used to help build and deploy data platforms.

| Item | Purpose |
| --- | --- |
| NoMachine | Tool for remotely connecting to a Linux desktop. |
| VMware Workstation | Virtualization software for running a Windows VM on a Linux computer. |
| Amazon Web Services | Cloud computing platform of choice for building and deploying data solutions. |
| Quarto | Used for the blog website. |
| GitHub Pages | For serving the personal blog site. |
| Google Domains | Used for custom domains. |
| Squarespace | Website builder for managing the business website. |
| | Free tool for creating ERD diagrams, process flows, and AWS architectures. |
| Juicebox Analytics | Data storytelling tool. |
| Markdown | Markup language for writing documentation. |
| oh-my-bash | Framework for managing bash configuration. There is also a Zsh equivalent, oh-my-zsh. |
| Cookiecutter | A project templating tool. |

Visual Studio Code Extensions

Visual Studio Code Extensions I find useful.

Visual Studio Code Extension
GitHub Pull Requests and Issues
Project Manager
Remote Development
Code Spell Checker
Edit csv

Productivity and General Use

Productivity tools and general use programs.

| Item | Purpose |
| --- | --- |
| Trello | Productivity tool used for planning out work. |
| Google Workspace | I frequently use Google Docs and Google Sheets. |
| Brave | A privacy-first browser built on Chromium. |
| Firefox | Firefox's Multi-Account Containers feature supports logging into multiple accounts in a single window. |
| Calendly | Website that streamlines scheduling meetings. |
| Feedly | RSS feed tool that has a good free tier. |
| Zoom | Video conferencing tool of choice. |