This page has some for-fun projects that I've worked on.
In 2016 I started a side project aimed at estimating radiation doses for cancer therapy using machine learning algorithms instead of physics-based Monte Carlo simulations.
Over the span of a few months I gave some introductory talks on neural networks and deep learning for the Houston Data Science Meetup group. The slides are linked below.
slots is a Python library that lets you explore and use several strategies for the multi-armed bandit problem. slots is available for installation from PyPI via "pip install slots".
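To show what a bandit "strategy" looks like, here is a minimal, self-contained sketch of epsilon-greedy, one of the standard strategies for the multi-armed bandit problem. It is not the slots API: the function name `epsilon_greedy`, its parameters, and the payout probabilities in the example are hypothetical. The player mostly pulls the arm with the best observed win rate, but with probability `epsilon` pulls a random arm so the estimates for the other arms keep improving.

```python
import random


def epsilon_greedy(payout_probs, trials=10_000, epsilon=0.1, seed=0):
    """Play a simulated Bernoulli multi-armed bandit with an epsilon-greedy strategy.

    payout_probs: hypothetical true win probability of each arm (unknown to the player).
    Returns (pulls per arm, estimated win rate per arm).
    """
    rng = random.Random(seed)
    n_arms = len(payout_probs)
    pulls = [0] * n_arms
    wins = [0] * n_arms

    for _ in range(trials):
        if rng.random() < epsilon or 0 in pulls:
            # Explore: pick a random arm (also done until every arm has been tried).
            arm = rng.randrange(n_arms)
        else:
            # Exploit: pick the arm with the best observed win rate so far.
            arm = max(range(n_arms), key=lambda a: wins[a] / pulls[a])
        pulls[arm] += 1
        if rng.random() < payout_probs[arm]:
            wins[arm] += 1

    estimates = [w / p if p else 0.0 for w, p in zip(wins, pulls)]
    return pulls, estimates


pulls, estimates = epsilon_greedy([0.2, 0.5, 0.7])
print(pulls)       # most pulls should go to the best (third) arm
print(estimates)
```

The trade-off lives in `epsilon`: a larger value explores more and wastes more pulls on bad arms, while a smaller one risks settling on a suboptimal arm early.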
Klackers (a.k.a. Shut the Box) is a dice game, often played in bars. The Klackers box has nine "tiles" numbered 1-9. A player rolls two dice, then flips down any combination of open tiles that sums to the value of the roll. The player keeps rolling and flipping tiles until no combination of open tiles sums to the roll, or until all of the tiles have been flipped. The player's score is the sum of the un-flipped tiles.
To determine the best simple strategy for Klackers, I ran a series of Monte Carlo simulations written in Python. The code is found here on GitHub.
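A rough sketch of how such a simulation can be set up: enumerate every combination of open tiles that matches the roll, let a strategy pick one, and average the final score over many simulated games. This is not the code from the GitHub repository; it assumes two dice on every turn (as in the rules above, ignoring the common one-die variant), and the two strategies `highest_tiles` and `most_tiles` are hypothetical examples of "simple strategies".

```python
import random
from itertools import combinations


def tile_combos(open_tiles, total):
    """All combinations of open tiles whose values sum to the dice total."""
    return [
        combo
        for r in range(1, len(open_tiles) + 1)
        for combo in combinations(sorted(open_tiles), r)
        if sum(combo) == total
    ]


def play_game(choose, rng):
    """Play one game with two dice; return the score (sum of un-flipped tiles)."""
    open_tiles = set(range(1, 10))
    while open_tiles:
        roll = rng.randint(1, 6) + rng.randint(1, 6)
        options = tile_combos(open_tiles, roll)
        if not options:
            break  # no combination matches the roll, so the game is over
        open_tiles -= set(choose(options))
    return sum(open_tiles)


def highest_tiles(options):
    # Prefer the combination whose largest tiles are highest: (7,) over (1, 6) over (3, 4).
    return max(options, key=lambda c: sorted(c, reverse=True))


def most_tiles(options):
    # Flip as many tiles as possible (ties go to the first combination found).
    return max(options, key=len)


def mean_score(choose, games=2_000, seed=1):
    """Monte Carlo estimate of the average score for a strategy (lower is better)."""
    rng = random.Random(seed)
    return sum(play_game(choose, rng) for _ in range(games)) / games


for strategy in (highest_tiles, most_tiles):
    print(strategy.__name__, round(mean_score(strategy), 2))
```

Because each strategy is just a function from the list of legal moves to one move, adding another candidate strategy only means writing another small function and passing it to `mean_score`.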
Markov chains can be used to model probabilistic processes, such as financial markets or, in this case, children's games. This project is a visualization of the Markov chain model described by Nick Barry in a popular post contrasting the Monte Carlo and Markov chain methods.
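The core idea behind the Markov chain approach: the board square is the state, and the next square depends only on the current one, so one turn is a multiplication of the probability distribution over squares by a transition matrix. Repeating that gives the exact probability that the game is over after each turn, with no random sampling. The sketch below uses a made-up miniature board (10 squares, a 1-4 spinner, one hypothetical ladder and one hypothetical chute), not the game or numbers from Nick Barry's post, and all function names are illustrative.

```python
def transition_matrix(n_squares, spinner, jumps):
    """Transition matrix for a hypothetical one-player race game.

    States are squares 0..n_squares; square n_squares is the absorbing finish.
    spinner: possible moves, each equally likely.
    jumps: maps a landing square to where the player ends up (ladders and chutes).
    """
    size = n_squares + 1
    P = [[0.0] * size for _ in range(size)]
    for s in range(n_squares):
        for move in spinner:
            t = min(s + move, n_squares)   # overshooting the end counts as finishing
            t = jumps.get(t, t)            # follow a ladder or chute if there is one
            P[s][t] += 1 / len(spinner)
    P[n_squares][n_squares] = 1.0          # once finished, the game stays finished
    return P


def step(dist, P):
    """One turn: new_dist[j] = sum over i of dist[i] * P[i][j]."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def finish_curve(P, max_turns):
    """Probability that the game has ended by each turn, starting on square 0."""
    dist = [1.0] + [0.0] * (len(P) - 1)
    curve = []
    for _ in range(max_turns):
        dist = step(dist, P)
        curve.append(dist[-1])
    return curve


P = transition_matrix(10, spinner=range(1, 5), jumps={3: 7, 9: 2})
for turn, p in enumerate(finish_curve(P, 8), start=1):
    print(f"turn {turn}: P(finished) = {p:.3f}")
```

A Monte Carlo simulation would estimate the same curve by playing many random games; the Markov chain computes it directly.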
As someone who enjoys getting around town by bicycle, I thought it would be interesting to try to quantify how "bikeable" different parts of Albuquerque are. Using data made available by the City of Albuquerque, Samat Jain and I put together this "bikeability" map as part of the 2013 ABQ Hack Day.
Using Python and pandas, we converted XML files to JSON, extracted the values of interest, and calculated a score based on the presence and type of biking infrastructure. The map was created with Leaflet.js via Folium, OpenStreetMap data, and Jinja2. The code is found here on GitHub.
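The scoring step can be illustrated with a small sketch: read infrastructure records from XML, weight each type of facility, sum the weighted lengths per area, and write the result as JSON for a map layer. Everything specific here is an assumption for illustration: the XML schema, the area names, the facility types, and the `WEIGHTS` values are made up and are not the city's data format or the weights used for the actual map, and this uses only the standard library rather than pandas.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical weights per type of bike infrastructure (not the ones behind the map).
WEIGHTS = {"trail": 3, "lane": 2, "route": 1}

# Made-up XML standing in for the city's data files.
SAMPLE_XML = """
<segments>
  <segment area="Area A" facility="lane" miles="1.5"/>
  <segment area="Area A" facility="trail" miles="0.5"/>
  <segment area="Area B" facility="route" miles="2.0"/>
</segments>
"""


def bikeability(xml_text):
    """Score each area as the weighted sum of miles of each kind of bike facility."""
    scores = {}
    for seg in ET.fromstring(xml_text).iter("segment"):
        weight = WEIGHTS.get(seg.get("facility"), 0)  # unknown facility types add nothing
        area = seg.get("area")
        scores[area] = scores.get(area, 0.0) + weight * float(seg.get("miles"))
    return scores


print(json.dumps(bikeability(SAMPLE_XML), indent=2))
```

Keeping the weights in one dictionary makes it easy to try different judgments about how much a dedicated trail should count relative to a painted lane.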
In 2013 I gave a talk with Steve Koch at the ABQ Tech Fiesta titled "Data Science, Big Data, and other buzzwords". We decided to make a more generic version (PDF) of that talk and open source it. This talk was aimed at a general technical audience and covered big data and its history, as well as data science and its components: data munging, statistics, machine learning, and visualization. Hopefully others will find the slides and graphics useful for their purposes.
These slides were made using the Beamer package for LaTeX. Our original graphics were created as SVGs in Inkscape. The presentation is licensed primarily under the terms of CC BY-SA 4.0 (see the License.txt file for full details). The source code is found here on GitHub.
As a grad student I often wondered how salaries varied across and within departments. I was able to take UNM's public salary data and do a little data crunching to come up with some comparisons.
This project was 95+% data munging/cleaning. The data went from paper → PDF → OCR → semi-structured text. Tools used included: Python, pdftk, IPython, OpenRefine (a.k.a. Google Refine), pandas, matplotlib, numpy, and Inkscape.
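Most of the munging in a pipeline like this comes down to turning OCR'd lines into structured records and setting aside the lines that don't parse. The sketch below is a hypothetical illustration of that step, not the code used for this project: the line layout (name, department, title, salary, separated by two or more spaces), the sample names and salaries, and the choice of the median for comparison are all assumptions, and it uses the standard library instead of pandas or OpenRefine.

```python
import re
import statistics
from collections import defaultdict

# Made-up OCR output: one employee per line, inconsistent spacing, one unreadable line.
SAMPLE_TEXT = """\
DOE, JANE      PHYSICS    ASSOC PROF   $71,250.00
ROE, RICHARD   PHYSICS    ASST PROF    $58,000.00
POE, ANN   CHEMISTRY  PROFESSOR    $96,400.00
garbled line the OCR could not read
"""

# Hypothetical layout: name, department, title, salary, separated by 2+ spaces.
LINE_RE = re.compile(
    r"^(?P<name>[A-Z]+, [A-Z]+)\s{2,}"
    r"(?P<dept>[A-Z][A-Z ]*?)\s{2,}"
    r"(?P<title>[A-Z][A-Z ]*?)\s{2,}"
    r"\$(?P<salary>[\d,]+\.\d{2})$"
)


def parse_salaries(text):
    """Return (records, rejects): parsed rows, and non-empty lines needing manual cleanup."""
    records, rejects = [], []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            row = m.groupdict()
            row["salary"] = float(row["salary"].replace(",", ""))
            records.append(row)
        elif line.strip():
            rejects.append(line)
    return records, rejects


def median_by_dept(records):
    """Median salary per department, one simple way to compare across departments."""
    by_dept = defaultdict(list)
    for row in records:
        by_dept[row["dept"]].append(row["salary"])
    return {dept: statistics.median(vals) for dept, vals in by_dept.items()}


records, rejects = parse_salaries(SAMPLE_TEXT)
print(median_by_dept(records))
print(rejects)
```

Collecting rejected lines instead of silently dropping them is what makes an OCR pipeline like this auditable: the rejects list shows exactly which rows still need hand cleaning.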
Medical physics has been slow to join the open access movement, but it's catching on. In this project I took a look at the papers submitted to the medical physics category (physics.med-ph) on arXiv.org.
To get the data I used the arXiv's public API. Data munging, analysis, and visualization were done with Python, IPython, feedparser, fuzzywuzzy, numpy, scipy, matplotlib, NetworkX, and Inkscape.
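A rough sketch of the data-gathering step: the arXiv API is queried over HTTP at export.arxiv.org/api/query and returns an Atom feed. The parameter names (`search_query`, `start`, `max_results`, `sortBy`, `sortOrder`) are part of that API, but the particular values chosen here are assumptions. Unlike the original project, this sketch parses the feed with the standard library's ElementTree instead of feedparser, and the co-author pair counting is a hypothetical example of the kind of input a NetworkX graph could be built from.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations
from urllib.parse import urlencode

ATOM = "{http://www.w3.org/2005/Atom}"


def query_url(start=0, max_results=100):
    """Build an arXiv API query for papers in the physics.med-ph category."""
    params = {
        "search_query": "cat:physics.med-ph",
        "start": start,
        "max_results": max_results,
        "sortBy": "submittedDate",
        "sortOrder": "ascending",
    }
    return "http://export.arxiv.org/api/query?" + urlencode(params)


def parse_feed(atom_xml):
    """Extract title, publication date, and author names from each Atom entry."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.iter(ATOM + "entry"):
        papers.append({
            # arXiv titles can contain line breaks, so collapse whitespace.
            "title": " ".join(entry.findtext(ATOM + "title", "").split()),
            "published": entry.findtext(ATOM + "published", ""),
            "authors": [a.findtext(ATOM + "name", "") for a in entry.iter(ATOM + "author")],
        })
    return papers


def coauthor_counts(papers):
    """Count how often each pair of authors appears on the same paper."""
    pairs = Counter()
    for paper in papers:
        for a, b in combinations(sorted(set(paper["authors"])), 2):
            pairs[(a, b)] += 1
    return pairs


# Usage (requires network access):
#   from urllib.request import urlopen
#   papers = parse_feed(urlopen(query_url(max_results=10)).read())
#   print(coauthor_counts(papers).most_common(5))
```

The same author can appear under slightly different spellings across papers, which is the sort of problem fuzzy string matching (as with fuzzywuzzy) helps with before counting pairs.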
Not strictly a data project but rather an information design project: sparkmeters are small inline graphics that show the relative value of the item they follow. They are inspired by, and reminiscent of, sparklines.
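One way to generate a sparkmeter programmatically is as a tiny inline SVG: a light background bar with a dark fill whose width is proportional to the item's value relative to a maximum. This is a hypothetical sketch, not how the original sparkmeters were made; the function name `sparkmeter_svg`, the default size, the colors, and the example items are all assumptions.

```python
def sparkmeter_svg(value, max_value, width=40, height=8):
    """Return a tiny inline SVG bar whose filled part is value / max_value of the width."""
    fraction = max(0.0, min(1.0, value / max_value))  # clamp to the range [0, 1]
    filled = round(fraction * width, 1)
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}" '
        f'style="vertical-align: middle">'
        f'<rect width="{width}" height="{height}" fill="#ddd"/>'    # background track
        f'<rect width="{filled}" height="{height}" fill="#333"/>'   # filled portion
        f'</svg>'
    )


# Hypothetical example: each item gets a meter relative to the largest value.
items = {"apples": 12, "pears": 30, "plums": 6}
largest = max(items.values())
for name, count in items.items():
    print(f"{name} {sparkmeter_svg(count, largest)}")
```

Keeping the graphic about the height of a line of text lets it sit inline with the item it describes, much as a sparkline does.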