Resources
Overview
This is a list of resources that I keep in mind when building data platforms. I also include some day-to-day tools I use to help manage projects, time, and communication.
Development
Development tools I use when building data solutions.
| Development Tools | Description |
|---|---|
| Visual Studio Code | A code editor that supports a wide variety of languages and use cases through extensions |
| Google Colab | Hosted Jupyter notebooks for running Python. Provides access to GPUs/TPUs if you want to explore ML. |
| tmux | A terminal multiplexer that keeps command-line sessions running on a server, even after you disconnect. |
| Docker | Package software code with a complete environment. |
| Serverless | Utility that streamlines deploying serverless applications in cloud platforms such as AWS. |
| AWS Python CDK | Python Cloud Development Kit (CDK) allows for deploying AWS infrastructure using Python code. |
| data build tool (dbt) | Utility for creating ELT data pipelines that run within a database. Used for normalizing data into a data warehouse and curating data marts. Supports data testing. Auto-generated documentation makes it easy to understand how data flows through the data platform. |
| GitHub | Source control platform of choice. |
| GitHub Actions | Supports Continuous Integration/Continuous Deployment (CI/CD) |
| DBeaver | Tool for interacting with SQL databases |
| pre-commit | pre-commit is a tool that runs a series of checks when committing files to a Git repository |
Programming Languages
Programming languages I consider when building data platforms, doing statistical analysis, machine learning, etc.
| Language | Purpose |
|---|---|
| Python | A general-purpose language that supports building data platforms, data analytics, statistical analysis, and machine learning. |
| R | A language built for performing statistical analysis, machine learning, and data analysis. |
| Julia | A high-performance language for scientific computing. |
| Scala | For writing performant Spark data applications. |
| Go | Building efficient data microservices. |
| SQL | Structured query language for pulling data from relational databases. |
| bash | A shell scripting language that can be helpful in automating tasks. |
Python Development Tools
| Tool | Type | Purpose |
|---|---|---|
| black | Clean Code | black is a Python code formatter that helps keep code consistently formatted across teams |
| isort | Clean Code | isort sorts and groups Python imports |
| flake8 | Clean Code | flake8 is a linter that enforces coding style and flags issues such as unused imports |
| mypy | Clean Code | mypy is a static type checker that catches type errors which Python's dynamic typing would otherwise surface only at runtime |
| pytest | Testing | pytest is my unit testing tool of choice for its extensibility when building test harnesses |
| great-expectations | Testing | Supports data quality testing and validation |
| Test-Driven Data Analysis | Testing | A package that supports integrating data quality tests with unit testing frameworks such as pytest |
| poetry | Environment Management | A Python tool for managing dependencies and virtual environments |
| Data Version Control (DVC) | ML Ops | Utility that provides a streamlined approach for managing machine learning models from development to production. |
| awswrangler | AWS SDK | Integrates pandas with AWS data services such as S3, Athena, and Redshift |
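
As a rough illustration of how mypy and pytest from the table above fit together, here is a minimal sketch. The `null_rate` function and the suggested file name are hypothetical and exist only for this example: the type annotations give mypy something to check, and the `test_` functions follow the naming convention pytest uses to discover tests.

```python
from typing import Optional, Sequence


def null_rate(values: Sequence[Optional[int]]) -> float:
    """Return the fraction of entries in `values` that are None (hypothetical helper)."""
    if not values:
        # Avoid dividing by zero on empty input.
        return 0.0
    missing = sum(1 for value in values if value is None)
    return missing / len(values)


def test_null_rate_counts_missing_values() -> None:
    # Two of the four entries are None.
    assert null_rate([1, None, 3, None]) == 0.5


def test_null_rate_of_empty_input_is_zero() -> None:
    assert null_rate([]) == 0.0
```

Assuming the code is saved as `test_null_rate.py`, running `pytest test_null_rate.py` executes both tests, and `mypy test_null_rate.py` checks the annotations without running the code.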
Favorite Python Analysis Tools
| Tool | Type | Purpose |
|---|---|---|
| Jupyter notebooks | Analysis Tool | Jupyter notebooks run Python in cells and show the results immediately |
| PySpark | Data Processing | Provides a Python interface for creating Spark applications |
| pandas | Dataframe | pandas loads data into a tabular data structure and supports cleaning and analysis in Python |
| scipy | Calculations | Python package for scientific computing |
| numpy | Calculations | Python package for mathematical computing |
| streamlit | Interactive Data Visualization | Streamlined method for creating interactive dashboards in Python |
| plotly | Interactive Data Visualization | Streamlined method for creating interactive visualizations in Python |
| plotnine | Static Data Visualization | Allows the creation of ggplot-style visualizations in Python |
| scikit-learn | Machine Learning | Machine learning in Python |
| keras | Deep Learning | A high-level library for creating deep learning models |
R Development Tools
| Tool | Type | Purpose |
|---|---|---|
| renv | Environment Management | A tool for managing R packages used for a project. |
| RStudio | IDE | An IDE tailored to running R code. It can also run Python, but I have not tested this. |
R Analysis Tools
| Tool | Type | Purpose |
|---|---|---|
| tidyverse | Analysis | An ecosystem of R packages that supports a wide variety of data science and analytics tasks |
| r-shiny | Interactive Dashboard | A framework for creating interactive dashboard sites with R. |
IT Tools
This section includes a wide variety of tools and platforms used to help build and deploy data platforms.
| Item | Purpose |
|---|---|
| nomachine | Tool for remotely connecting to a Linux desktop. |
| VMware Workstation | Hypervisor for running a Windows virtual machine on a Linux computer. |
| Amazon Web Services | Cloud computing platform of choice for building and deploying data solutions. |
| Quarto | Used for blog website. |
| GitHub Pages | For serving up personal blog site. |
| Google Domains | Used for custom domains |
| Squarespace | Website builder for managing business website. |
| draw.io | Free tool for creating ER diagrams, process flows, and AWS architecture diagrams |
| Juicebox Analytics | Data Storytelling tool |
| Markdown | Markup language for writing documentation. |
| oh-my-bash | Framework for managing bash configuration. There is also a Zsh equivalent, oh-my-zsh. |
| Cookiecutter | A project templating tool. |
Visual Studio Code Extensions
Visual Studio Code Extensions I find useful.
| Visual Studio Code Extension |
|---|
| GitHub Pull Requests and Issues |
| Python |
| Quarto |
| markdownlint |
| Project Manager |
| Remote Development |
| Code Spell Checker |
| Docker |
| Edit csv |
Productivity and General Use
Productivity tools and general use programs.
| Item | Purpose |
|---|---|
| Trello | Productivity tool used for planning out work |
| Google Workspace | Frequently use Google Docs and Google Sheets |
| Brave | A privacy-first browser built on Chromium |
| Firefox | Firefox's Multi-Account Containers extension supports logging into multiple accounts in a single window. |
| Calendly | Website to streamline scheduling meetings. |
| Feedly | RSS Feed Tool that has a good free tier. |
| Zoom | Video conferencing tool of choice. |