January 16, 2016

Migrating Vagrant setup from Puppet 3 to Puppet 4 (manifestdir)

Filed under: ResearchAndDevelopment — Ryan Wilcox @ 10:11 pm

I like Puppet with Vagrant. Puppet 4 removed an option I really liked: manifestdir.

You see, often when I’d start a greenfield project, I’d include a Vagrantfile so getting a new developer set up is one command. I’ve talked about this in the past.

Nowadays I want to keep my Puppet scripts in a folder, organized and slightly away from the main code.

Because I’m boring I call this puppet/.

manifestdir let me do that: shove everything into puppet/ and not see it. One simple flag passed into Puppet from Vagrant.

I knew Puppet 4 was going to remove manifestdir, but I could ignore the problem as long as Vagrant base boxes shipped with Puppet 3.7. They no longer do: it seems to be 4.x nowadays.

Bitrot is rough in the DevOps world.

It also means I had to revisit territory from the first half of an earlier blog post.

I figured out how to solve my problem by abusing Puppet’s Environment system

Skip ahead and see the diff

In my Vagrant setup I’ll have Puppet modules specific to my app: telling Puppet to install this version of Ruby, that database, this Node package, whatever. I’ll also have third party modules: actually doing the heavy lifting of downloading the right Linux package for whatever, etc.

So, I’m building on some abstraction.

I use puppet module install to pull in third party modules. Puppet 4 puts them in a new place; I specify the environment when installing, to keep the cognitive dissonance low. I don’t strictly have to do this, but I think it’s good.

Note that we don’t want our third party modules to be in the same place as our project-specific modules: if we installed them in the same place, we’d have to deal with these extra files in our source tree.

You see, when Vagrant starts up it creates a folder: /tmp/vagrant-puppet/ – it’s a shared folder so anything extra put in there shows up in our source directory.

So puppet module install installs third party modules in one place, and Vagrant installs our modules in another place.

Here’s where environments come in:

  1. We set our environment_path in the Vagrantfile to be ./. This is where Puppet will go looking for environments to load
  2. We set our environment – aka the environment Vagrant will tell Puppet to use – to a folder named puppet in the source directory. (remember that?)
  3. Puppet environments can contain three things: a config file, a modules folder and a manifests folder

Our config file sets the path for modules: we tell Puppet to look in our puppet/modules folder, then in the directory where puppet module install downloads its modules, then in the base module directory.

We need our config file because by default Puppet will look for modules in our environment’s module path, and the base module directory… and not where puppet module install puts things. (or so it seems…)
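Here is a rough sketch of those pieces, not the actual diff. The Vagrant Puppet provisioner has environment_path and environment options, and an environment's environment.conf can set modulepath. The third party module directory below is an assumption (Puppet 4's default location for the production environment); substitute whatever directory your puppet module install command targets. The helper name modulepath_line is made up for illustration.

```ruby
# Sketch only. THIRD_PARTY_MODULES is an assumed path, not taken from the post.
THIRD_PARTY_MODULES = "/etc/puppetlabs/code/environments/production/modules"

# Builds the modulepath line for puppet/environment.conf: our own modules
# first, then the third party modules, then Puppet's base module directory.
# "modules" is relative, so Puppet resolves it inside the environment folder.
def modulepath_line(third_party_dir)
  "modulepath = " + ["modules", third_party_dir, "$basemodulepath"].join(":")
end

puts modulepath_line(THIRD_PARTY_MODULES)

# Vagrantfile side. The guard means this part only runs when Vagrant loads it.
if defined?(Vagrant)
  Vagrant.configure("2") do |config|
    config.vm.provision "puppet" do |puppet|
      puppet.environment_path = "./"      # where Puppet looks for environments
      puppet.environment      = "puppet"  # the ./puppet folder is the environment
    end
  end
end
```

The printed line is what would go into puppet/environment.conf, next to the modules and manifests folders.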

So that’s how you mis-use environments to get manifestdir “working” again.

May 24, 2015

Rapid System Level Development in Groovy

Filed under: ResearchAndDevelopment — Ryan Wilcox @ 2:53 pm

Introduction: Setting the stage

Lately I’ve found myself turning to Groovy for, oddly enough, system level development tasks. This is an unexpected turn of events, and a seemingly mad choice of technologies to say the very least.

Why Groovy?:

  1. I can’t assume OS (so Unix command line tools are out). One of my recent tasks involved something pretty complex, so shell script was out anyway.
  2. I can’t assume Ruby or Python is installed on any of these machines, but I can assume the JVM is installed.
  3. Groovy is a not bad high level language that I’ve also been using for other larger (non system level) programs.
  4. Since I’m on the JVM I can make executable jars, bundling up Groovy and all other dependencies into an easy to run program.

That last point is the real kicker. I want these programs to be easy to run. Even with that, as recently as three days ago I wouldn’t have imagined doing programming like this in Groovy.

But this article isn’t about my poor choices in system languages: it’s about a workflow for small Groovy tools, from inception to ending up with an executable jar.

“But, but, what about?”

“But, but, what about Go?” I hear you, and I almost wrote my scripts in Go, especially with the new easy cross-compilation stuff coming in Go 1.5. I expect to write tools like this in Go in the latter half of 2015 (or: whenever Go 1.5 is released + probably a couple months). I don’t have the patience to learn how cross compilation works today (Go 1.4).

“But, but what… executable jars with Groovy?! Aren’t your jars huge?” Yeah, about 6MB a pop. I’ll admit this feels pretty outrageous for like a 50 line Groovy program. I’m also typing this on a machine with three quarters of a TB of storage… so 6MB is not a dealbreaker for me at current scale. But it does make me sad.

“But, but what about JVM startup costs?” Yup, this is somewhat of a problem, even in my situation, especially when in rapid development mode. This is another place where I almost wish I was writing in Go (cheap startup and compile times).

But this article is about rapid development in Groovy: going from an idea to releasing an executable jar – maybe for systems programming, maybe for other things.

Fast, initial development of small Groovy scripts

As a newcomer to the Groovy scene I’ve Googled for this information, and found a couple of disjointed (and in some cases bitrotted) pieces on how to do these things (primarily packaging executable jars created from Groovy source). I hope another person (newcomer or otherwise) finds it useful.

Create your maven structure (which we’ll promptly ignore)

$ mvn archetype:generate -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false -DgroupId=com.wilcoxd -DartifactId=RapidDev (with values appropriate for your groupId and artifactId)

Dig into the generated src/main/java/com/wilcoxd and:

  1. make a new Groovy file
  2. Open the new Groovy file in your editor
  3. Add package com.wilcoxd as the first line of the file. Substitute com.wilcoxd with the groupId you specified in the mvn archetype:generate command.

While semantically you should rename your java folder to groovy, that doesn’t seem to work with the packaging process that creates the executable jar. Just leave it be (I guess).

Rapidly develop your Groovy project (with two tricks)

The nice thing about Groovy is that you can write your Groovy program just like you would expect to write Ruby or Python or JavaScript: just type code into a file and Groovy will Figure It Out(TM).

Trick One: develop running your script directly

  1. cd into src/main/java/com/wilcoxd/ (the java folder, since we left it un-renamed)
  2. Write your script in the body of your .groovy file.
  3. Occasionally pop out to the command line and run groovy RapidDev.groovy (or whatever your script is called)

Groovy does a fair bit of work to execute your (even unstructured!) code. There’s some magic here that I don’t fully understand, but whatever.

$ vi RapidDev.groovy
.... type type type...

$ cat RapidDev.groovy
package com.wilcoxd

println "hey world"

$ groovy RapidDev.groovy
hey world

Crazy talk! No class needed, I don’t even need a public static void main style function!

Trick Two: dependency “management” with @Grab

If you find yourself needing a third party module, use @Grab to get it.

We’ll set things up with Maven, properly, later. Right now we’re concentrating on getting our program working, and turns out we need to make a RESTful API request (or whatever). We just need a third party module.

$ cat RapidDev.groovy
package com.wilcoxd

@Grab(group='com.github.groovy-wslite', module='groovy-wslite', version='1.1.0')
import wslite.rest.*

println("hello world!!!")

@Grab pulls in your dependencies even without Maven. I don’t want to introduce Maven here, because then I have to build and run via Maven (I guess?? Again, newbie in JVM land here…). Magic-ing them in with @Grab is probably good enough.

I’m sure @Grab is not sustainable for long term programs. This isn’t a long term proposition anyway: we’re going to comment out the @Grab the second we get this script done.

... iterate: type type, pop out and run, type type type...

$ groovy RapidDev.groovy

... IT WORKED! ...

We’re done coding! Now time to Set up pom.xml!

Yay, we’re done. Our rapid, iterative development cycle let us quickly explore a concept and get a couple dozen or a couple hundred lines of maybe unstructured code out. Whatever, we built a thing! Development in the small is nicer, sometimes, than development in the large: different rules apply.

But now we need to set up pom.xml, so it builds a jar for us.

Specify your main class property

Add this as a child of the <project> in your pom.xml:


Adjust the value of start-class as appropriate for your class / artifact ID from the mvn archetype:generate part of this.

Add Groovy and other third party modules you @Grabbed into your <dependencies> section

Something like this (with a translated dependency, @Grab syntax to Maven syntax for the wslite module we previously grabbed above), to the <dependencies> section:


Once you have this in your pom, comment out the @Grab declaration in your source file
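For reference, the @Grab-to-Maven translation is mechanical: group becomes groupId, module becomes artifactId, and version stays version. A small Ruby sketch of that mapping follows, using the wslite coordinates from the @Grab line earlier. The helper maven_dependency is hypothetical and is not part of Groovy or Maven; it only prints the XML you would paste into the pom.

```ruby
# Sketch: turn @Grab coordinates into a Maven <dependency> element.
# maven_dependency is a made-up helper for illustration only.
def maven_dependency(group:, artifact:, version:)
  <<~XML
    <dependency>
      <groupId>#{group}</groupId>
      <artifactId>#{artifact}</artifactId>
      <version>#{version}</version>
    </dependency>
  XML
end

# @Grab(group='com.github.groovy-wslite', module='groovy-wslite', version='1.1.0')
puts maven_dependency(
  group:    "com.github.groovy-wslite",
  artifact: "groovy-wslite",
  version:  "1.1.0"
)
```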

Add build plugin dependencies (another child of <project>):




            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">

Know your Groovy code will be groovy, unstructured and all

As mentioned before, Groovy does some magic to implicitly wrap a class around unstructured code. In fact, it will use the name of the file as the name of the class (so name your files like you would name Java classes!).

In our example, we’ve been editing RapidDev.groovy, which Groovy will wrap up in a class RapidDev declaration… or something. That package com.wilcoxd means Groovy will actually wrap our unstructured code into a class com.wilcoxd.RapidDev… which is a fine name and what we specified in our pom’s start-class property.


With a simple mvn package we can bundle our Groovy script up to an executable jar. A java -jar target/RapidDev-1.0-SNAPSHOT.jar runs it.

Which is awesome! I can take this and run it on any system with the JVM! I can write my “complex” systems level program once and run anywhere! I can reach deep into the Java ecosystem for spare parts to make my development easier, and still have a rapid development cycle one expects out of Python or Ruby.

Pretty neat!

March 2, 2015

A Deep Dive Into Vagrant, Puppet, and Hiera

Filed under: ResearchAndDevelopment — Ryan Wilcox @ 1:08 am

A Vagrant setup that supports my blog entry on “Vagrant / Puppet and Hiera Deep Dive”. Below is a reproduction of that article.


This weekend I spent far more time than I’d like diving deep into Puppet, Hiera and Vagrant.

Puppet is a configuration/automation tool for installing and setting up machines. I prefer Puppet to other competitors (such as Chef) for Reasons, even though I also use Chef.

Hiera is an interesting Puppet tool (with no equivalent in Chef, at least that I’ve found): instead of setting variables in your configuration source, you set them in YAML (or JSON or MySQL or…) files. This ideally keeps your Puppet manifests (your configuration sourcecode) more sharable and easier to manage. (Ever had a situation in general programming where you need to pass a variable into a function because it’s passed to another function three function calls down the stacktrace? Hiera also avoids that.)

However, documentation on Puppet and Hiera is pretty scarce – especially when used with Vagrant, which is how I like to use Puppet.

This article assumes you’re familiar with Vagrant.

My Vagrant use cases

I use (or have used) Vagrant for two things:

  1. To create local development VMs with exactly the tools I need for a project. (Sample)
  2. To create client serving infrastructure (mostly early stage stuff).

For use case #2, usually this is a client with a new presence just getting their online site ramped up. So I’m provisioning only a couple of boxes this way: I know this wouldn’t work for more than a couple dozen instances, but by then they’re serving serious traffic.

My goal is to use Vagrant and 99% the same Puppet code to do both tasks, even though these are two very different use cases.

Thanks to Vagrant’s Multi VM support I can actually have these two VMs controlled in the same Vagrantfile.

First, general Vagrant Puppet Setup Tricks

File Organization

I set my Vagrantfile’s puppet block to look like this:

config.vm.provision "puppet" do |puppet|
  puppet.manifests_path = "puppet/manifests"
  puppet.manifest_file  = "site.pp"

  puppet.module_path   = "puppet/modules"
end

Note how my manifests and modules folder are in a puppet folder. Our directory structure now looks like:


Why? Vagrant, for me, is a tool that ties a bunch of other tools together: uniting virtual machine running with various provisioning tools locally and remotely. Plus the fact that the Vagrantfile is just Ruby means that I’m often pulling values out into a vagrantfile_config pattern, or writing tools or something. Thus, the more organization I can have at the top level the better.

Modules vs Manifests

I tend to create one module per project I’m trying to deploy. By that I mean if I’m deploying a Rails bookstore app, I’ll create a bookstore module. This module will contain all the manifests I need to get the bookstore up and running: manifests to configure mysql, Rails, redis, what-have-you.

Sometimes these individual manifests are simple (and honestly probably could be replaced with clever hiera configs, once I dig into that more), and sometimes a step means configuring two or three things. (a “configure mysql” step yes, needs to use an open source module to install MySQL, but also may need to create a mysql user, create a folder with the correct permissions for the database files, set up a cron job to backup the database, etc)

I also assume I’ll be git subtree-ing a number of community modules directly into my codebase.

My puppet/manifests/ folder then ends up looking like a poor man’s Roles and Profiles setup. I take some liberties, but it’s likely the author is dealing with waaaaay more Puppet nodes than I’d ever imagine with this setup.

Pulling in third party Puppet modules

The third party Puppet community has already created infrastructure pieces I can use and customize, and has created a package manager to make installation easy. Except we need to run these package managers before we run Puppet on the instance!

Vagrant to the rescue! We can run multiple provisioning tasks (per instance!) in a Vagrantfile!

Before the config.vm.provision "puppet" line, we tell puppet to install modules we’ll need later:

    config.vm.provision :shell, :inline => "test -d /etc/puppet/modules/rvm || puppet module install maestrodev/rvm"

Because the shell provisioner will always run, we want to test that a Puppet module is not installed before we try to install it.

There are other ways to manage Puppet modules, but this simple shell inline command works for me. I’ll often install 4 or 5 third party modules this way, simply copy/pasting and changing the directory path and module name. As long as I’m before the puppet configuration block these modules will be installed before that happens.
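Since the Vagrantfile is just Ruby, the copy/paste can also be replaced with a loop. A minimal sketch, assuming the same /etc/puppet/modules directory as the line above; the helper name module_install_command and the second module name are only examples.

```ruby
# Sketch: build the "install unless already present" shell command for each
# third party module. module_install_command is a hypothetical helper.
MODULE_DIR = "/etc/puppet/modules"

def module_install_command(forge_name, module_dir = MODULE_DIR)
  short_name = forge_name.split("/").last   # "maestrodev/rvm" -> "rvm"
  "test -d #{module_dir}/#{short_name} || puppet module install #{forge_name}"
end

# Only maestrodev/rvm comes from the post; the other entry is illustrative.
THIRD_PARTY_MODULES = ["maestrodev/rvm", "puppetlabs/stdlib"]

THIRD_PARTY_MODULES.each { |name| puts module_install_command(name) }

# In a Vagrantfile, before the puppet provisioner block:
#   THIRD_PARTY_MODULES.each do |name|
#     config.vm.provision :shell, :inline => module_install_command(name)
#   end
```

The generated command for maestrodev/rvm is the same string as the hand-written provisioner line, so the test-before-install behaviour is unchanged.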

Uninstalling Old Puppet Versions (and installing the latest)

This weekend I discovered an Ubuntu 12 LTS box with a very old version of Puppet on it (2.7). I have a love/hate relationship with Ubuntu LTS: The LTS means Long Term Support, so nothing major changes over the course of maybe 5 years. Great for server stability. However, that also means that preinstalled software that I depend on may be super old… and I may want / need the new version.

I ended up writing the following bash script:

#!/usr/bin/env bash
# This removes ancient Puppet versions on the VM - if there IS any ancient
# version on it - so we can install the latest.
# It is meant to be run as part of a provisioning run by Vagrant
# so it must ONLY delete old versions (not current versions other stages have installed)
# It assumes that we're targeting Puppet 3.7 (modern as of Feb 2015...)

INSTALLED_PUPPET_VERSION=$(apt-cache policy puppet | grep "Installed: " | cut -d ":" -f 2 | xargs)
echo "Currently installed version: $INSTALLED_PUPPET_VERSION"

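# Skip the removal when Puppet is not installed at all (apt reports "(none)")
if [[ -n "$INSTALLED_PUPPET_VERSION" && "$INSTALLED_PUPPET_VERSION" != "(none)" && "$INSTALLED_PUPPET_VERSION" != 3.7* ]] ; then
  apt-get remove -y puppet="$INSTALLED_PUPPET_VERSION" puppet-common="$INSTALLED_PUPPET_VERSION"
  echo "Removed old Puppet version: $INSTALLED_PUPPET_VERSION"
fi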

It assumes your desired Puppet version is 3.7.x, which should be good until Puppet 4.

I also have a script that installs Puppet if it’s not there (maybe it’s not there on the box/instance, OR our script above removed it). I got it from the makers of Vagrant themselves: puppet-bootstrap.

Again, added before the config.vm.provision :puppet bits:

config.vm.provision :shell, path: "vagrant_tools/remove_puppet_unless_modern.sh"  # in case the VM has old crap installed...
config.vm.provision :shell, path: "vagrant_tools/install_puppet_on_ubuntu.sh"

Notice that both these shell scripts I store in a vagrant_tools directory, in the same folder as my Vagrantfile. My directory structure now looks like:


Puppet + Hiera

Using Hiera and Vagrant is slightly awkward, especially since many of the Hiera conventions are meant to support dozens or hundreds of nodes… but we’re using Vagrant, so we may have one – or maybe more, but in the grand scheme of things the limit is pretty low. Low enough where Hiera gets in the way.


The way I figured out how to do this is create a hiera folder in our puppet folder. My directory structure now looks like this:


A reminder at this point: the VM (and thus Puppet) have their own file systems disassociated with the file system on your host machine. Vagrant automates the creation of specified shared folders: opening a directory portal back to the host machine.

Implicitly, Vagrant creates a shared folder for the manifests_path and module_path folders. (In fact, these can be arrays of paths to share, not just single paths!!!)

Anyway, our hiera folder must be shared manually.

Note here that Vagrant throws a curveball our way and introduces a bit of arbitrariness to where it creates the manifest and module folders. You’re going to have to watch the vagrant up console spew to see where this is: with the vagrant_hiera_deep_dive VM the output was as follows:

==> default: Mounting shared folders...
    default: /vagrant => /Users/rwilcox/Development/GitBased/vagrant_hiera_deep_dive
    default: /tmp/vagrant-puppet-3/manifests => /Users/rwilcox/Development/GitBased/vagrant_hiera_deep_dive/puppet/manifests
    default: /tmp/vagrant-puppet-3/modules-0 => /Users/rwilcox/Development/GitBased/vagrant_hiera_deep_dive/puppet/modules 

Notice the /tmp/vagrant-puppet-3/? That’s your curveball: it may be different for different VM names (but is consistent: it’ll never change).

So, create the shared folder in the Vagrantfile:

config.vm.synced_folder("puppet/hiera", "/tmp/vagrant-puppet-3/hiera")

Likewise, we’ll want to add the following lines to the puppet block

puppet.hiera_config_path = "puppet/hiera/node_site_config.yaml"
puppet.working_directory = "/tmp/vagrant-puppet-3/"

Important notes about the hiera config

Note that Hiera only accepts the .yaml extension, not .yml.

Yes, having both the node_site_data.yaml and node_site_config.yaml files does feel a bit silly, especially at our current scale of one machine. Sadly this is not something we can fight and win, but a limitation of the system. Hiera’s documentation goes more into config vs data files.

But also note that the node_site_config file points to node_site_data, via Hiera’s config file format.
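To make the config/data split concrete, here is a sketch that generates plausible contents for both files. It assumes the Hiera 1.x config format (symbol-style keys such as :backends:) and the shared folder path used above; the bookstore::db_name data key is made up for illustration and is not from the real setup.

```ruby
require "yaml"

# Sketch of node_site_config.yaml: one YAML backend, a data directory,
# and a hierarchy with a single entry that names the data file.
HIERA_CONFIG = {
  backends: ["yaml"],
  yaml: { datadir: "/tmp/vagrant-puppet-3/hiera" },
  hierarchy: ["node_site_data"]   # Hiera looks for node_site_data.yaml in datadir
}

# Sketch of node_site_data.yaml. The key and value are hypothetical.
HIERA_DATA = { "bookstore::db_name" => "bookstore_development" }

CONFIG_YAML = HIERA_CONFIG.to_yaml   # Ruby symbols are dumped as :backends: etc.
DATA_YAML   = HIERA_DATA.to_yaml

puts CONFIG_YAML
puts DATA_YAML
```

The hierarchy entry is the "pointer" from the config file to the data file: Hiera appends .yaml to it and looks in datadir, which is why the shared folder has to exist at that path inside the VM.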


I’ve been using Vagrant and Puppet at a very basic level a very long time (something like 5 years, I think). From best practices I’ve been using for years, to new things I’ve just pieced together today, I hope this was helpful to someone.

Explore this article more by looking at the Vagrant setup on Github

July 21, 2014

Using the CSV NPM Module

Filed under: ResearchAndDevelopment — Ryan Wilcox @ 2:39 pm

Today I had to use the CSV Node Module. It looked like the best and most mature of the alternatives.

The disadvantage about it is that the examples – especially for taking Javascript objects and getting a CSV string out – really leave something to be desired.

To combat this I wrote a simple Node program to illustrate how to write CSV files with the module. Enjoy!

CSV 0.4.0 example

For an example that uses the old – CSV 0.2.0 syntax – see below. Yes, I only provide an example of the callback syntax. You should really upgrade to CSV 0.4.0.

CSV 0.2.0 example


February 10, 2014

My Base Rails Setup, 2013 Edition

Filed under: ResearchAndDevelopment — Ryan Wilcox @ 10:32 am

In 2010 I wrote My Base Rails Setup.

I looked at it again and it looks pretty dated. The Rails community changes a lot in just 3 years.

Here are the tools I run into with frequency on Rails projects, and some notes where that has changed from the 2010 tools of choice.

Over the last 3 years I’ve done full time, then part time, consulting, mostly in Rails. The advantage here is that I see a lot of Rails apps written by other people. Occasionally I also get to start a Rails app from scratch.

So, my choices tend to be relatively conservative: tools that are either so common that I pick them because “everyone” uses them, or tools where the cost of introducing a potentially odd tool to the Rails app is outweighed by the clear win it has over the “competition” (be that existing gems or roll-your-own).

I’m very thankful that I’ve gotten to see a lot of Rails apps, of all sizes, in my nearly 5 years doing Ruby on Rails.

Reviewing Old Picks

Looking back at my tools from 2010:

  • will_paginate: I mostly see and use Kaminari these days
  • formtastic: I’m still a big Formtastic fan, but I mostly am brought into already existing projects, and the court of public opinion has chosen simple_form.
  • DataMapper: in the dozens and dozens of Rails projects I’ve been on I’ve seen DataMapper once.
  • sentient_user There’s still no substitute, and I’m surprised this doesn’t get more love than it does.
  • show_for Not used enough in my opinion.
  • annotate_models: The community “solution” to “what columns does this table have in it?” appears to be, “open up schema.rb in another window”. So that’s what I do.
  • inherited_resources: Responders in Rails 3 cut down some of the cruft that I would use inherited resources for. I always assume Responders will be a good 80% solution and keep my eyes out for when to refactor the action into old style respond_to blocks.
  • Data Migrations: There’s still no substitute – db/seeds.rb sucks for serious data migration tasks. There is a gem that provides similar abilities to the Rails plugin I linked to a few years ago: active_data_migrations. Sadly another place where worse (seeds.rb) seems to be better.
  • shoulda: RSpec won the war.
  • Timecop Still the best.
  • Machinist Factory Girl won. Thankfully FactoryGirl has gotten better syntax over the years (or I’ve gotten accustomed to it).
  • Mailtrap still maybe the best. An interesting up and comer is Mailcatcher, but I’m not 100% sold.
  • rr: RSpec’s mocking framework seems to have won the war here too.
  • jQuery’s timeago plugin Still the best.
  • BlueprintCSS: Nowadays the winner is Twitter Bootstrap. I like the bootstrap-sass gem with the twitter-bootstrap-markup-rails gem abstracting away common constructs (alerts, tabs, etc).
  • utility_belt: All Hail Pry

So, we have better tools for a lot of those. The new picks are certainly on my “base rails tools” list.

New Base Gems/tools

Here are the new gems/tools I’ve either seen in common use, or always apply when I come into projects:

  • foreman every Rails app pulls in at least 1, if not 3 or 4, additional services. Redis, memcached, or solr are the usual suspects here, but it varies. Foreman lets me launch all those services with a simple command, stopping them all when I need to.
  • pry is the top debugger for Ruby. Especially when coupled with the pry-stack_explorer to explore the callstack and plymouth to open up pry when a test fails.
  • A Rails VM setup with Puppet. Every project I’m on I use this to create a VM and set up packages required for the system to operate.
  • cancan. While it needs some love, cancan is (sadly) still the authorization solution most used on Rails projects I see. I wrote a blog entry on organizing complex abilities. Cancan goes firmly in the “solutions in common use in apps I see”, not in the “tools I like using”.
  • rack-pjax. Useful if you’re using PJAX. If you need to optimize for speed you could write a method that sets or unsets the layout depending on if the request is PJAX or not… but rack-pjax is “drop in and you’re done”. I’m pretty sold on using PJAX for returning partial page content
  • rabl define your JSON/BSON/XML structure in a view file. jbuilder is the default in Rails 4, doing the same kind of thing, but I haven’t used it.
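The hand-rolled alternative mentioned in the rack-pjax item, a method that sets or unsets the layout per request, might look roughly like this. It is a sketch in plain Ruby: layout_for is a hypothetical helper, and it relies on the X-PJAX request header that jquery-pjax sends, which Rack exposes as HTTP_X_PJAX. The "application" layout name is an assumption.

```ruby
# Sketch: pick the layout from the Rack environment of the request.
# PJAX requests only want the partial page content, so they get no layout.
def layout_for(rack_env, default_layout = "application")
  rack_env.key?("HTTP_X_PJAX") ? false : default_layout
end

puts layout_for({ "HTTP_X_PJAX" => "true" }).inspect   # PJAX request
puts layout_for({}).inspect                            # normal request
```

In a controller the result could be passed along as render layout: layout_for(request.env), which is the extra wiring that rack-pjax saves you from.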

Right time, right place tools

Sometimes tools aren’t everyday tools. Sometimes the perfect tool, used in just the right place, is a godsend. Here is a list of tools like that, tools I’ll apply when the opportunity presents itself, although it only occasionally does:

  • pessimize writes gem versions to your Gemfile, so you can be very liberal with gem versions in development but then very conservative when time comes for production.
  • versioncake If I’m intentionally writing an API I want to version that API. Version Cake loads up the proper RABL or Jbuilder file for the specified version. (Yes, yes, see also Designing Hypermedia APIs).
  • SASS variables and the asset pipeline an approach when we need SASS global variables, working around limitations of the asset pipeline.
  • my ‘s’ helper. If I know I’m going to be on a project for a long while, and I join HTML sets together twice, I’ll bring this snippet in.


I’m excited to see what the next 3 years bring in Rails tools changes, and to see how this list stacks up then!

December 28, 2013

Rails Project Setup Best Practices

Filed under: ResearchAndDevelopment — Ryan Wilcox @ 1:26 am

As a long time Rails consultant, every new Rails project I come on to I go through the same dance:

  1. Does this project have a useful README.markdown? Maybe with setup instructions?
  2. No? Just Rails’ default README.markdown? Shucks.
  3. Does this project have a database.yml file?
  4. No? Great. Does this project have a sample database.yml file I can copy and get sane defaults for this project?
  5. Does this project have a .ruby-gemset file?
  6. Yes? Great. Does this .ruby-gemset have sane defaults, or is the gemset named project or something non-obvious?
  7. Is there a redis or solr config file I need to edit?
  8. Do I need to set up machine specific variables or settings anywhere? (For example, in .env, config/settings.yml, or config/secrets.yml, or even just in environments/development.rb?).
  9. No? Ok, great, does the app assume I’m running it on OS X with packages installed via Homebrew? (Because I’m usually not. And if I am running your project on bare metal, I prefer Macports.)
  10. Is there a Procfile?
  11. Yes? Great. Does that Procfile work for development, or is it a production-only Procfile?
  12. No Procfile? What services do I need to run to use the app? Redis? Solr? Some kind of worker queue mechanism?
  13. How do I run all the tests?
  14. rake db:setup
  15. rake spec
  16. Did something fail because I missed a config setting or some service isn’t running? If true, fix and GOTO 15.
  17. Awesome, it worked.
  18. Are there Cucumber or Selenium tests?
  19. Run those, somehow.
  20. Fire up Rails server
  21. When I visit the development version of the site, is there a special account I log in as? Or do I register a user account then browse through the code figuring out how to make myself an admin or registered user or what, then do that via rails console?

You could split these questions up into configuration questions and runtime questions. This blog entry will show best practices I try to install on (almost) every Rails project I touch.


Runtime is the easiest, so I’ll tackle it first.

In my mind this is mostly solved by Foreman and a good Procfile, or set of Procfiles.

Setup with Procfiles and Foreman

A Procfile will launch all the services your app needs to run. Maybe you need Solr, Redis, the Rails server, and MongoDB up: you can set up a Procfile to launch and quit those services all together when you’re done.

Heroku uses Procfiles to make sure everything’s up for your app. Heroku’s usually my default, “pre-launch to mid-traction point” hosting choice because of its easy scaling and 2 minute setup process.

Heroku also provides tons of addons, adding features to your app. Sometimes these features are bug reporting or analytics, and sometimes the Heroku addons provide additional services. Two addons that do this are Redis 2 Go, and ElasticSearch from Bonsai.io.

If an app uses Redis, is deployed to Heroku, and uses the Redis 2 Go addon, then the app doesn’t need to have Redis in its Procfile.

However, when I’m developing the app I need these services running locally.

Foreman takes care of this, reading a Procfile (a default Procfile or one I specify) and firing up all of the services just like Heroku does. Don’t Repeat Yourself in action.

When I’m setting up a project that’s getting deployed to Heroku I create two Procfiles: one Procfile and one Procfile.development.sample. (I add Procfile.development to the .gitignore file in Git).

The Procfile.development.sample is important for two reasons:

  1. It lists all the services I’ll need to be running as a developer
  2. It can be used as is, or if a developer has say Mongo already running via a startup script, but the Procfile.development.sample tries to launch it again, they can copy the file, rename it to Procfile.development, and remove the line about starting up Mongo.
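The second point can be sketched in a few lines of Ruby. The Procfile contents below are made up for illustration (a Procfile is a list of name: command lines), and without_services is a hypothetical helper that does what the developer would otherwise do by hand after copying the sample.

```ruby
# Sketch: derive Procfile.development from the sample, dropping services
# a developer already runs some other way. Sample contents are illustrative.
SAMPLE_PROCFILE = <<~PROCFILE
  web: bundle exec rails server
  redis: redis-server
  mongo: mongod
PROCFILE

# Returns the Procfile text without the lines for the named services.
def without_services(procfile_text, *names)
  procfile_text.lines.reject { |line| names.include?(line.split(":").first) }.join
end

puts without_services(SAMPLE_PROCFILE, "mongo")
```

The result would be saved as Procfile.development and started with foreman start -f Procfile.development.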

When I’m not deploying to Heroku I’ll still create a Procfile.development.sample for ease of starting up servers.

Running all the tests

Testing is big in the Rails world, and there’s a lot of ways to test Rails apps. RSpec with maybe Cucumber is usually what I see, but sometimes there’s a Selenium suite or Steak or something.

When I’m setting up a Rails project I write a quick Rake task to run all the test suites. For RSpec + Cucumber it looks something like this:

namespace :test do

  desc "Run both RSpec tests and Cucumber tests"
  task "all" => ["spec", "cucumber"]
end

As a developer on the project – especially a new developer – I just want to type in one command and know I’ve tested all the app there is to test.


When I’m setting up a project I create sample files for each configuration file that might be modified by a developer. So, files with names like:

  • config/database.sample.yml
  • ruby-gemset.sample
  • config/redis.sample.yml
  • .env.sample
  • config/secrets.sample.yml

But this still doesn’t solve our song and dance from the beginning of the blog entry: there’s still a lot to configure, even if I have sample files to copy and rename!

Like any good geek, I’ve replaced this frustration with a small shell script (template). Each project is different, and so each bin/install.sh will look a little different, but here’s a recent one I made for a non-trivial project:


#!/usr/bin/env bash

# If you want to go fancier, see some prompts in
# <http://stackoverflow.com/questions/226703/how-do-i-prompt-for-input-in-a-linux-shell-script>

# Each block: copy the sample into place if the real file is missing,
# then offer to open it in $EDITOR.

if [ ! -e Procfile.development ]; then
    cp Procfile.development.sample Procfile.development

    echo "Do you wish to edit Procfile.development?"
    select yn in "Yes" "No"; do
        case $yn in
            Yes ) $EDITOR Procfile.development; break;;
            No ) break;;
        esac
    done
fi

if [ ! -e config/database.yml ]; then
    cp config/database.yml.example config/database.yml
    echo "See the default database.yml?"
    select yn in "Yes" "No"; do
        case $yn in
            Yes ) cat config/database.yml.example; break;;
            No ) break;;
        esac
    done

    echo "Do you wish to edit this database.yml?"
    select yn in "Yes" "No"; do
        case $yn in
            Yes ) $EDITOR config/database.yml; break;;
            No ) break;;
        esac
    done
fi

if [ ! -e config/redis.yml ]; then
    cp config/redis.yml.example config/redis.yml
    echo "Do you wish to edit redis.yml?"
    select yn in "Yes" "No"; do
        case $yn in
            Yes ) $EDITOR config/redis.yml; break;;
            No ) break;;
        esac
    done
fi

# Opinionated file: ask before creating it at all.
if [ ! -e .ruby-gemset ]; then
    echo "Do you wish to create a .ruby-gemset file and edit it?"
    select yn in "Yes" "No"; do
        case $yn in
            Yes ) cp .ruby-gemset.copy .ruby-gemset; $EDITOR .ruby-gemset; break;;
            No ) break;;
        esac
    done
fi

if [ ! -e .env ]; then
    cp .env.sample .env
    echo "Do you wish to edit .env?"
    select yn in "Yes" "No"; do
        case $yn in
            Yes ) $EDITOR .env; break;;
            No ) break;;
        esac
    done
fi

It’s not the prettiest example of a shell script ever, but it’s easy and fast to modify and should run in any shell that supports select, such as bash, ksh, or zsh (I avoided fancy zsh tricks, even though zsh is my primary shell).

Run this and it will guide you through all the files you need to copy, asking you if you want to edit the config file when it’s in place. For opinionated files, like .ruby-gemset, the script will ask what you want to do.

Each of my sample files contains sane default values, which should work for the developer, but they don’t have to.

Thoughtbot has some initial thoughts on project setup too (they call it bin/setup), but they take a slightly different approach (and automatically set up different things). You could use their shell script along with mine if you wished.

My Ultimate New-To-This-Project Developer Experience

Since we’re talking about developer automation and project setup, I’d like to share my own dream experience:

  1. checkout code from Git repo
  2. “Oh, look, a Vagrantfile”
  3. $ vagrant up
  4. (15 minute coffee break while Vagrant boots up box and provisions it all for me, including Ruby setup)
  5. (During 15 minute coffee break, glance through the project’s README.markdown, see mention of running bin/install.sh)
  6. $ vagrant ssh
  7. $ cd $PROJECT
  8. $ bin/install.sh
  9. (Answers questions in install.sh and gets settings tweaked for this VM)
  10. $ rake db:setup
  11. $ foreman start -f Procfile.development
  12. $ rake test:all in a new tab. All tests pass.

Low barriers to entry, very automated project setup – help me get it set up right the first time. Help me be more productive faster.

You’ll notice I called rake db:setup, which creates a new database, loads the schema, and loads db/seeds.rb. Replace this step with “run migrations from 0” and “load in initial bootstrap data” if you wish. I’m usually in the “migrate from 0” camp, but usually find myself in a minority.

Anyway, if you compare the top list with this list you’ll see that the steps followed are very different. The first set of steps is hesitant: does this thing exist? Do I need to do X? The second set of steps is confident: the machine set this up for me and so hopefully everything is right.

In Summary

Here are the best practices to take away from this blog entry:

  1. Consider creating a Vagrant setup for your project, including provisioning.
  2. Documentation in the README.markdown with basic “how to setup this project” instructions.
  3. Sample config files with values that are opinionated, but since they’re copied into place, easily changeable.
  4. A bin/install.sh script like mine, or bin/setup script, like Thoughtbot’s.
  5. A Procfile just for developers
  6. A way to run all the tests, easily
  7. Load just enough sample data on a developer’s machine to allow them to get to major sections of your app without having to learn how to use it on day 1.

The easier it is for a developer to get up to speed on your project, the faster they can start getting real work done!

January 7, 2013

Develop For Good with Open Source (Sandy Disaster Recovery)

Filed under: General Information — Ryan Wilcox @ 10:03 am

A client of mine contacted me the other week. He lives in NYC, and was hit as part of Hurricane Sandy.

The trouble with all the relief efforts is that it’s hard to know what to do. What is your organization doing to help, and are there things you can do to get involved?

He explains it best:

Over the past several months a few developers and I have
created a collaborative work order system for disaster
recovery. We are making the
project open source, and providing it as a gift to the
disaster recovery community, for use in future disasters. The
platform implements a “Craigslist” philosophy to recovery
efforts: organizations that are aware of work orders enter them
into the system, and organizations with capacity to
help can claim and perform the work without any centralized
organization getting in the way. This should minimize
duplication and maximize efficiency.

Interested? He also created a video

What’s awesome about this project is that it’s open source, on Google Code.

Or read the introductory blog post on the Disaster Recovery Work Order System

If you have some time, and Google App Engine experience, consider jumping in and helping!

October 28, 2012

Modern Cocoa Concurrency / Asynchronous Processing Patterns

Filed under: ResearchAndDevelopment — Ryan Wilcox @ 9:35 pm


In the last several years Apple has been pushing us Cocoa developers away from thread based concurrency. Thread based concurrency tends to be bug-ridden, ceremony filled, and just unpleasant code.

There have also been some changes in the wider world of development that have changed how we as developers think about concurrency.

Today there are six different patterns to choose from: NSOperationQueue, callbacks, futures, subscribers, GCD, and making your async APIs synchronous anyway.

This is meant to be a whirlwind tour/survey of the various techniques Cocoa developers use to solve the thorny problems of asynchronicity and/or concurrency.


NSOperationQueue

A really nice Cocoa class for doing The Right Thing when given N tasks to perform. I like it with blocks.

It’s a simple API which means it’s great for quick things, but it also lacks power. But sometimes you don’t need power!

You can also limit the number of operations running at a time. For example to limit the number of concurrent downloads.
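
The concurrency cap is easier to see in a small sketch. This is not Cocoa: it uses Python’s standard concurrent.futures as a stand-in for an operation queue, with max_workers playing roughly the role of NSOperationQueue’s maxConcurrentOperationCount. The fake_download function and the limit of 2 are made up for illustration.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 2  # assumed limit, analogous to maxConcurrentOperationCount

lock = threading.Lock()
running = 0  # tasks executing right now
peak = 0     # highest value 'running' ever reached


def fake_download(n):
    """Hypothetical task: pretend to download item n and track concurrency."""
    global running, peak
    with lock:
        running += 1
        peak = max(peak, running)
    time.sleep(0.05)  # simulate network latency
    with lock:
        running -= 1
    return n * 2


def run_downloads(count):
    # The pool never runs more than MAX_CONCURRENT tasks at the same time;
    # the rest wait in line, just like queued operations.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
        return list(pool.map(fake_download, range(count)))


results = run_downloads(6)
```

Six tasks go in, but peak never climbs past the limit, which is exactly what you want when throttling downloads.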


Callbacks

Imagine some code that looks like this:

[myObject makeANetworkRequestWithCompletionHandler: ^(id result, NSError* err) {

  NSLog(@"I am done");

}];

A lot of network APIs/Frameworks for Cocoa look like this (cough, coughFacebookcough, cough).

Callbacks are fine when you can decouple time from your program. “Whenever I get this data that’s fine, it’s just background data anyway”. Except imagine if it’s a list of documents stored on some server somewhere – parts of your UI might want that information to display data in a table or a count.

Yes, in some cases you can use interface feedback to tell the user to wait… Certainly a more responsive UI is better. But if you’re waiting for data from the network in order to parse and make another request on the network, you end up with callback/block soup.


Futures

Getting an object now that I can check and get my result “later” is appealing to me.

The most promising is DeferredKit – it returns you a deferred object which you can easily add or chain callbacks to.

RESTKit uses the future pattern and allows you to assign callback blocks to properties on the object you are creating. These blocks are then called as part of the RESTKit object callback chain-of-events. Example:

record.onDidLoadResponse = ^(RKResponse* response) {
  // ...
};

Mike Ash also has a Friday Q&A about Futures
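
To make the pattern concrete outside of any particular Cocoa library, here is a minimal sketch using Python’s standard concurrent.futures.Future. It is not DeferredKit or RESTKit; fetch_documents is a hypothetical API, and the future is fulfilled by hand so the example stays deterministic.

```python
from concurrent.futures import Future

log = []


def fetch_documents():
    """Hypothetical async API: hands back a future immediately, no result yet."""
    return Future()


future = fetch_documents()

# Attach callbacks now; they fire once the result arrives.
future.add_done_callback(lambda f: log.append(("count", len(f.result()))))
future.add_done_callback(lambda f: log.append(("first", f.result()[0])))

pending_before = not future.done()

# Later, whoever performs the work fulfils the future,
# and the callbacks run in the order they were added.
future.set_result(["a.txt", "b.txt"])
```

The caller gets an object right away and decides later how to consume the value, instead of nesting everything inside one completion block.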

Reactive (Subscribers) model

GitHub has a ReactiveCocoa framework that does a bunch of crazy things, allowing you to subscribe to event completions and event changes.

I don’t yet completely understand the framework or mindset behind this. However, when my projects require a more comprehensive and unified approach to async or concurrency – and I have a week to spare to properly grok this framework – I’ll be looking into ReactiveCocoa.

Grand Central Dispatch

Grand Central Dispatch is Apple’s approach to C level concurrency. There’s a lot in GCD, and I usually prefer using NSOperationQueue or something higher in the stack, but sometimes you really want a low level tool.

Like today. I needed to make 25 HTTP requests to a server, parse them, and let some other part of the app know when these requests had been parsed.

Grand Central Dispatch has three tools I tried:

  1. dispatch_async(dispatch_get_main_queue(), ^{...});
  2. dispatch_group_async
  3. dispatch_semaphore


dispatch_async

When given a task (the code in a block) it will run it at some point in the future. Coupled with dispatch_get_main_queue you’ll be able to run things on the main loop at some point in the future.

Wait, why would you want to do that? Answer? cough, coughFacebookcough, cough.


dispatch_group

You can create a group of dispatched tasks using dispatch_group_create() and dispatch_group_async(). You can even wait for all the tasks in the group to complete with dispatch_group_wait().
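
The “submit a batch, then wait for all of it” shape can be sketched with Python’s standard library. This is an analogy rather than GCD itself: the executor stands in for a dispatch queue, the list of futures stands in for the group, and parse_response is a hypothetical task.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def parse_response(n):
    """Hypothetical task standing in for 'make request n and parse it'."""
    return "parsed-%d" % n


with ThreadPoolExecutor(max_workers=4) as pool:
    # Every submitted task joins the "group".
    group = [pool.submit(parse_response, n) for n in range(25)]

    # Blocks the calling thread until every task has finished,
    # much as dispatch_group_wait() does.
    done, not_done = wait(group)

results = sorted(f.result() for f in done)
```

Once wait() returns, the rest of the app can be told that all 25 responses have been parsed.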


dispatch_semaphore

Apple also provides a cheap, reference counting semaphore. dispatch_semaphore_create, dispatch_semaphore_signal and dispatch_semaphore_wait are the important functions here.

You can even construct your own dispatch_sync out of these primitives

dispatch_group_wait(), dispatch_semaphore and the current thread

The *_wait functions have one disadvantage: they are blocking. So, the use case here is to do something like:

dispatch_group_t group = dispatch_group_create();

dispatch_group_async(group,
    dispatch_queue_create("MINE", DISPATCH_QUEUE_SERIAL),
    ^ {
      NSLog(@"hello world");
    });

dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
NSLog(@"done");

Notice that the dispatch_group_async call above runs its block on a newly created dispatch queue. This means that “hello world” will be printed out in some other thread. The current thread will halt, waiting for your dispatch to be complete, then it will print out “done”.

Now, this behavior is bad if you’re already on the context you are dispatching to. For example, imagine a method that is called on the main thread, like a button action.

- (IBAction) handleButton: (id) sender {
  dispatch_group_t group = dispatch_group_create();

  dispatch_group_async(group, dispatch_get_main_queue(), ^{

    NSLog(@"Hello world");

  });

  // Never returns: the block above needs the thread we are blocking here
  dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
}

This will actually cause a deadlock – you’ve told the main thread to sit there, waiting, until something finishes, but you’ve also scheduled something to happen on the main thread. Nothing will happen.
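
The same trap can be reproduced away from Cocoa. In this sketch a single-worker Python executor plays the part of the main queue (an assumption for illustration, not how GCD is implemented), and a timeout is added only so the example terminates instead of hanging forever.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# One worker thread plays the part of the main queue.
main_queue = ThreadPoolExecutor(max_workers=1)


def handle_button():
    # We are already running on the only worker...
    inner = main_queue.submit(lambda: "Hello world")
    try:
        # ...so waiting here can never succeed: the inner task needs
        # the very thread we are blocking.
        return inner.result(timeout=0.2)
    except TimeoutError:
        return "would deadlock"


outcome = main_queue.submit(handle_button).result()
main_queue.shutdown()
```

Without the timeout, handle_button would wait forever, which is the situation the button handler above gets itself into.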

Apple’s documentation isn’t exactly explicit about this, and I had to prove this to myself with some experimentation and some Stack Overflow reading. However, all is not lost.

Synchronous APIs in an Async World

My main reason for doing this is interacting with Facebook. Facebook’s network APIs are cranky when they’re not on the main thread.

I also really want to avoid the ceremony that comes with asynchronous callback handling. Setting up something to process the data whenever it shows up, some kind of wait indicator for the users, and potentially confused users when they look at an unexpectedly blank table (because we haven’t gotten the data yet).

So, for very limited but very “save myself a lot of typing and re-architecting of this whole chunk of code” situations I actually want to stop the flow of code at the callsite. I don’t want to block a thread waiting for results (especially since I may need to execute GCD dispatches on this thread)… but I do want to defer execution of the next lines of code until the dispatch is complete.

I ended up implementing ASYNC_WAIT_ON_SAME_THREAD for exactly this purpose.


In order to wait for an operation to complete we’ll want to avoid using GCD semaphores or waiting mechanisms. I eventually came up with this implementation of ASYNC_WAIT_ON_SAME_THREAD:

void ASYNC_WAIT_ON_SAME_THREAD(dispatch_queue_t queue, dispatch_block_t users_block) {

    __block BOOL allDone = NO;

    dispatch_async(queue, ^{
        // Run the caller's work, then flag completion
        users_block();
        allDone = YES;
    });

    while( !allDone ){
        // Allow the run loop to do some processing of the stream
        [[NSRunLoop currentRunLoop] runUntilDate:[NSDate dateWithTimeIntervalSinceNow:0.5]];
    }
}

Yes, it polls for a result. It’s inelegant, but the best solution I could come up with given the constraints.
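
Stripped of Cocoa, the trick is “keep pumping the loop until my block has run”. Here is a toy sketch of that idea. RunLoop is a hypothetical stand-in written for this example (it is not NSRunLoop), and wait_on_same_loop mirrors the shape of the function above under that assumption.

```python
from collections import deque


class RunLoop:
    """Toy single-threaded run loop (hypothetical, not NSRunLoop)."""

    def __init__(self):
        self._pending = deque()

    def call_soon(self, fn):
        self._pending.append(fn)

    def run_once(self):
        # Run at most one queued callable.
        if self._pending:
            self._pending.popleft()()


def wait_on_same_loop(loop, users_block):
    all_done = False

    def wrapper():
        nonlocal all_done
        users_block()
        all_done = True

    loop.call_soon(wrapper)

    # Keep servicing the loop instead of blocking the thread, so work
    # queued ahead of us still runs while we "wait".
    while not all_done:
        loop.run_once()


log = []
loop = RunLoop()
loop.call_soon(lambda: log.append("earlier work"))
wait_on_same_loop(loop, lambda: log.append("users block"))
log.append("after wait")
```

The call site reads as if it were synchronous, yet work that was already queued on the same loop still gets its turn first.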


Although things are happening in the main thread (your dispatches are running) the UI might feel very unresponsive to the user if you have many small dispatches lined up.

In fact, if you have an operation that takes 1 second, the user will see that lag. They press a button, your code uses ASYNC_WAIT and the app just sits there for 1 second. So ASYNC_WAIT is not a panacea, but in certain cases the code advantages outweigh the UI disadvantages.

Consider use of ASYNC_WAIT a code smell. It’s not that use of it is bad, but it represents a potentially dangerous area of code.

Then again, code smells are sometimes OK. In the C++ world we have reinterpret_cast which tells the compiler, “Trust me, I know that this bit of data is actually this thing”. Use of a few reinterpret_casts to get yourself out of hairy situations is fine, but not every nail should be screwed with that particular socket wrench.


I hope this survey of the concurrency / asynchronous design patterns in Cocoa has been useful. I look forward to more and more work being done in this area by the community.

July 30, 2012

Testing URLs in Django (like Rails route testing)

Filed under: General Information — Ryan Wilcox @ 9:10 pm

I’m doing more Django work and find myself contrasting how Rails does things and how things are done in Django.

Routing is one of those things.

Both Django and Rails want you to use their systems to dynamically create URLs to other places on the site, instead of hard-coding the path in the href part of the a tag. This makes life easier both now and in the future.

In Django, routes are configured manually by matching regular expressions to view functions. In Rails, routing happens mostly by convention: a domain specific language builds routes by suffixing and prefixing various parts of the object and call graph together.

Rails has this interesting feature called route testing. The idea is that, since you’re testing the rest of your application, you should also make sure that Rails is handling your URL paths the way you expect it to.

Django doesn’t have a testing best practice for this, and this article attempts to create one.

First, let’s see what URL paths we have defined

The first time I played with Django I was confused. In Rails I’m used to running rake routes and getting a list of my routes and the URL paths they might match. I couldn’t find such a tool for Django at the time.

Now the Django community has the django-extensions app. Django-Extensions adds new commands to manage.py, one of which is show_urls.

Let’s see part of show_urls in action, for a simple Django app:

$ python manage.py show_urls

/admin/logout/ django.contrib.admin.sites.logout logout
/blogs home.views.blog_list home.views.blog_list
/blogs/<slug>/ home.views.blogs_show home.views.blogs_show

I’m only showing you the most interesting parts of show_urls, but yes I have the Django admin turned on and I have a blog app.

Next, let’s test against those URLs

The slightly annoying thing about Django is that since you’re building up your URLs by configuring regular expressions (which, by the way, are order specific as Django goes with the first expression found)… the match is dependent on the data fed into the path.

In our case we have a /blogs/SLUG route. But perhaps your regular expression forgets something (like perhaps it doesn’t handle URL escaped text, which your slug might be made up of). /blogs/today+was+a+good+day should match the home.views.blogs_show route just the same as /blogs/todaywasa.

This seems like the thing automated testing was made for – making sure that a simple test URL path goes to the view we want, and testing a more complicated match, and testing that Django doesn’t accidentally pick the “wrong” view because us fallible humans screwed up some regex or placement.
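
To see how a slightly-too-strict regex plus “first match wins” sends a request to the wrong place, here is a small sketch using only Python’s re module. It imitates the matching behavior described above; it is not Django’s resolver, and the patterns and the catch_all view name are made up for illustration.

```python
import re

# Hypothetical URL table: (regex, view name), checked top to bottom.
URL_PATTERNS = [
    (r"^/blogs/?$", "blog_list"),
    (r"^/blogs/(?P<slug>\w+)/$", "blogs_show"),    # \w+ does not match '+'
    (r"^/blogs/(?P<anything>.+)$", "catch_all"),   # greedy fallback
]


def resolve_first(path):
    """Return (view name, captured kwargs) for the first pattern that matches."""
    for regex, view in URL_PATTERNS:
        match = re.match(regex, path)
        if match:
            return view, match.groupdict()
    return None, {}


simple = resolve_first("/blogs/todaywasa/")
escaped = resolve_first("/blogs/today+was+a+good+day/")
```

The plain slug lands on blogs_show, but the escaped one slips past it and is swallowed by the fallback. That is precisely the kind of mistake a URL test should catch.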

So, you want me to make a ton more client requests?!!!

We want to do this quickly – we don’t want to build up huge test cases to test obscure URL path names. Thankfully Django provides the tools we need to test our paths:

from django.core.urlresolvers import reverse, resolve

So, no – “just add URL related tests to your existing tests” is not the best answer here.

Requirements for URL testing in Django

Let’s think about how we want to test URLs and their patterns:

  1. We want to have a hard coded URL path: as if a browser or a user had typed it in
  2. We, as humans, know which URL pattern name we expect that to match to
  3. We know what (keyword) arguments should be extracted from the URL string
  4. It has to be super fast – ideally without having to instantiate test data or make a single request to the Django application server.

We also know we want to test this backwards and forwards: first taking the URL path and seeing if we get our URL pattern name out, then trying to construct our URL (with Django’s automatic URL creation tools) and seeing if we get our hard coded URL path out again.
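
Before involving Django at all, the backwards-and-forwards idea can be sketched with a tiny hand-rolled router. Everything here is an assumption for illustration: ROUTES is a made-up table, and resolve and reverse are simplified stand-ins that only imitate the roles of Django’s functions of the same names.

```python
import re

# Hypothetical route table: name -> (regex for resolving, template for reversing).
ROUTES = {
    "blog_list": (r"^/blogs$", "/blogs"),
    "blogs_show": (r"^/blogs/(?P<slug>[^/]+)/$", "/blogs/{slug}/"),
}


def resolve(path):
    """Path -> (route name, kwargs); stand-in for Django's resolve()."""
    for name, (regex, _template) in ROUTES.items():
        match = re.match(regex, path)
        if match:
            return name, match.groupdict()
    raise LookupError(path)


def reverse(name, kwargs=None):
    """Route name + kwargs -> path; stand-in for Django's reverse()."""
    return ROUTES[name][1].format(**(kwargs or {}))


def round_trips(path, name, kwargs=None):
    # Forwards: the path must resolve to the expected name and kwargs.
    # Backwards: reversing that name must rebuild the same path.
    forwards = resolve(path) == (name, kwargs or {})
    backwards = reverse(name, kwargs) == path
    return forwards and backwards
```

No test data and no HTTP requests are needed, which is what keeps this kind of check fast.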

Defining an API

Let’s imagine for a minute and create a test:

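from django.test import TestCase

class TestURLs(TestCase):
    def test_blog_routes(self):
        routes_to_test = (
            dict(url_path="/blogs", pattern_name="home.views.blog_list"),
            # routes that capture arguments also carry a kwargs dict
        )

        for stringOne, stringTwo in test_paths(routes_to_test):
            self.assertEqual(stringOne, stringTwo)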

Here we have a list of routes to test and the attributes of each route: the url_path (what we would type into a browser address bar), the pattern_name (the name of the pattern / the pattern name we would use when creating our model’s get_absolute_url method), and lastly the kwargs we expect to be passed into our view by Django.

Implementing test_paths

test_paths ends up being quite simple – simple enough to put in a helper library!

from django.core.urlresolvers import reverse, resolve

def test_paths(routes_to_test):
    for route in routes_to_test:
        path    = route["url_path"]
        pattern = route["pattern_name"]
        kwparams = route.get("kwargs")

        # Backwards: build the URL from the pattern name, compare to the path
        if kwparams:
            yield reverse(pattern, kwargs=kwparams), path
        else:
            yield reverse(pattern), path

        # Forwards: resolve the path, compare to the pattern name
        yield resolve(path).url_name, pattern


Testing URLs in Django apps is simple with test_paths!

July 14, 2012

Rails 3.2 attr_accessible, RailsAdmin, and “accessible by admins”

Filed under: ResearchAndDevelopment — Ryan Wilcox @ 10:59 pm

First of all, my blog is now back up! Instead of self hosting my WordPress blog, now the fine people at ZippyKid host it. These guys are awesome: taking a mess of an import and making it Just Work. A+++ WOULD DO AGAIN

Now, back to real news…

The Problem: Security in Rails: Say hello to the secure boss (different from the old boss)

Because of some high profile Rails sites being hacked, Rails 3.2 changed the default Rails model behavior to “only let users (and developers) update attributes in this list”. This list varies by model.

A refresher into the hack

Rails has this clever feature where you can say, “update this record with the form data passed in”. A semi-clever hacker could use this ability to change fields that the Rails developer didn’t intend to be changed (“shove this value into the POST params, even though there’s no field named that on the HTML for this form”).

What Rails 3.2 did about it

Before Rails 3.2 you used to have models that look like:

class User < ActiveRecord::Base
end

In Rails 3.2, now you have models that look like:

class User < ActiveRecord::Base
  # Setup accessible (or protected) attributes for your model

  attr_accessible :name, :email, :password, :password_confirmation
end

The attr_accessible block says, “hey, that ‘update this record with form data passed in’ feature? That’s only allowed to touch these fields”.

In this example that feature is allowed to set the name, email, password, and password_confirmation, but is NOT allowed to edit anything else. Perhaps you store access keys, or middle name in the User model. You have to explicitly change these values, and not use the ‘update this record with form data’ shortcut.

Enough background: You said something about RailsAdmin?

RailsAdmin is a clever piece of software that automates creating an admin interface for your Rails site. You have simple access to create, read, update, or delete records in your site.

… but how does it play with that attr_accessible thing?

Glad you asked

Normally it works very well. You can see all the fields in your model, and if the attribute is not attr_accessible, then RailsAdmin will display the value as read-only.

Read-only you say? But I have values I want admins to be able to edit, but I only want admins (not everyone) to edit them. How do I?

Returning to the access_key example, you want admins to be able to edit this value in RailsAdmin. You don’t want to make that attr_accessible because then anyone can edit that setting (introducing a security hole).

The solution: attr_accessible + as (a user)

attr_accessible has an oft-forgotten as parameter. This allows you to say, “this is allowed, only if I’m doing this as a ______________ user”.

Using this feature you can declare models like

class User < ActiveRecord::Base
  # Setup accessible (or protected) attributes for your model

  attr_accessible :name, :email, :password,
    :password_confirmation, :as => [:default, :admin]

  attr_accessible :access_key, as: :admin
end

access_key will only be changeable when you’re doing something as the admin role, and the other attributes will be enabled both for the default role and the admin role.
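
If the role idea feels abstract, the filtering can be sketched in a few lines of plain Python. This is not how Rails implements it; ACCESSIBLE and permitted are hypothetical names, and the sketch simply drops disallowed keys rather than logging or raising.

```python
# Hypothetical per-role whitelist, mirroring the attr_accessible calls above.
ACCESSIBLE = {
    "default": {"name", "email", "password", "password_confirmation"},
    "admin": {"name", "email", "password", "password_confirmation", "access_key"},
}


def permitted(params, role="default"):
    """Keep only the keys the given role is allowed to mass-assign."""
    allowed = ACCESSIBLE[role]
    return {key: value for key, value in params.items() if key in allowed}


# A form post that sneaks in a field the HTML form never offered.
form_data = {"name": "Pat", "access_key": "s3cret"}

as_default = permitted(form_data)
as_admin = permitted(form_data, role="admin")
```

The same form data produces different results depending on who is asking: the default role loses access_key, the admin role keeps it.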

Configuring RailsAdmin to use the admin role

In your config/initializers/rails_admin.rb file, add the following line in the RailsAdmin.config do |config| block:

config.attr_accessible_role { :admin }


And that’s all there is to it: use as: :admin, and configure RailsAdmin to post things “as an admin”. Good to go!
