Workflow Protocols¶
These are guidelines that the Connext team uses internally to coordinate our development. Suggestions for improvements are welcome!
Goals¶
Coordinate several devs working across several time zones to maximize energy spent on useful stuff & minimize wasted work.
“Useful” work to be maximized:
Building new features
Fixing existing bugs w/out introducing new ones
Reviewing & providing feedback on work other teammates have done
“Wasteful” work to be minimized:
Putting out fires caused by new bugs making it to prod
Re-doing any work that’s already been done in another branch
Fixing new bugs that someone else pushed/merged w/out noticing (the person who introduces a bug probably has more context and is the one who can fix it the fastest)
Resolving ugly merge conflicts
Protocols¶
Merging code to sync up development (aka CI)¶
Merge feature -> staging frequently. CI must pass before merging and either CD must pass or someone else must have reviewed your code ie
CI && (CD || Review)
If staging CD is failing, branch off to fix it (ie don’t push commits directly to staging). These CD-hotfix branches should be merged into staging as soon as they’re ready w/out necessarily waiting for a review.
Code reviews: at the start of each day, everyone should review and merge other people’s “pending review” PRs. Once the “pending review” queue is cleared (or only contains our own PRs), then we’re free to work on our own feature branches for the day.
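The merge rule CI && (CD || Review) can be written out as a small shell check. This is only an illustrative sketch: merge_decision is a hypothetical helper that does not exist in the repo, and the "pass"/"fail" arguments stand in for whatever the real CI/CD status is.

```shell
# Hypothetical helper (not part of the repo): prints "merge" when
# CI && (CD || Review) holds, otherwise prints "block".
# Each argument is the string "pass" or "fail".
merge_decision() {
  ci=$1; cd_status=$2; review=$3
  if [ "$ci" = "pass" ] && { [ "$cd_status" = "pass" ] || [ "$review" = "pass" ]; }; then
    echo "merge"
  else
    echo "block"
  fi
}

merge_decision pass fail pass   # prints "merge": CI passed and the code was reviewed
merge_decision fail pass pass   # prints "block": CI must always pass
```

Note that a passing CD run or a review can substitute for each other, but nothing substitutes for CI.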
Deploying to staging/prod (aka CD)¶
Note: Staging CD tests a local copy of the code (doesn’t use any @connext/* npm packages). Master CD tests/deploys connext packages from the npm registry.
Run the following after staging CD has passed & before merging into master to trigger deploying to prod:
bash ops/npm-publish.sh
Don’t manually merge staging into master; use this helper script instead:
bash ops/deploy-indra.sh
This script will:
Detect & abort if any merge conflicts exist between staging & master.
Ask you for a version number & run some sanity checks to make sure this version is valid
Update the version in the root package.json
Merge staging into master & then fast-forward merge master into staging so they’re at the same point.
Tag & push this new merge commit
Once master has been pushed, the CD pipeline will automatically be activated.
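The text does not say which sanity checks the deploy script runs on the version number. As a rough illustration only, one plausible check is that the new version has the form a.b.c and is strictly greater than the current one. The check_version function below is a hypothetical sketch under that assumption, not the actual logic of ops/deploy-indra.sh.

```shell
# Hypothetical sanity check, NOT the actual logic of ops/deploy-indra.sh.
# Prints "valid" if both versions look like a.b.c and new > current, else "invalid".
check_version() {
  current=$1; new=$2
  for v in "$current" "$new"; do
    if ! printf '%s\n' "$v" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
      echo "invalid"; return 0
    fi
  done
  # Split both versions on "." so that $1..$3 = current and $4..$6 = new.
  old_ifs=$IFS; IFS=.
  set -- $current $new
  IFS=$old_ifs
  if [ "$4" -gt "$1" ] ||
     { [ "$4" -eq "$1" ] && [ "$5" -gt "$2" ]; } ||
     { [ "$4" -eq "$1" ] && [ "$5" -eq "$2" ] && [ "$6" -gt "$3" ]; }; then
    echo "valid"
  else
    echo "invalid"
  fi
}

check_version 2.4.4 2.4.5   # prints "valid"
check_version 2.4.5 2.4.4   # prints "invalid": version must increase
```

The comparison is numeric per field, so 2.10.0 counts as newer than 2.9.0.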
Handling Backwards Incompatibilities¶
A set of checks is in place to partially detect backwards-incompatible changes. As we build out our integration tests, future checks will be more complete.
There’s a version hard-coded in Makefile (look for a variable called backwards_compatible_version). If this variable is set to 2.3.20, for example, then the CD test suite for indra v2.4.5 will test it against both of:
indra_test_runner:2.3.20
indra_test_runner:2.4.5
To get the backwards compatibility check to pass after introducing a breaking change, you must change the backwards_compatible_version in Makefile to a compatible version. Ideally, you’d increment the major version eg 2.4.4 -> 3.0.0 and then set the backwards_compatible_version to 3.0.0 so that, moving forward, all 3.x.x versions are tested for compatibility w 3.0.0. (An automated calculation is commented out but later could be activated to always run test-runner version a.0.0 for any version a.b.c.)
Continuing the wild-west-style version management: if you want to increment 2.4.4 -> 2.4.5 after introducing a breaking change, you can also set backwards_compatible_version to be 2.4.5.
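The "a.0.0 for any version a.b.c" rule is simple enough to sketch in shell. Both functions below are hypothetical illustrations: the commented-out calculation in the Makefile may look different, and test_runner_tags merely prints the two image tags that the example above says CD would test against.

```shell
# Hypothetical sketch of the "a.0.0 for any a.b.c" rule; the real
# (commented-out) Makefile calculation may differ.
compat_version() {
  echo "${1%%.*}.0.0"   # keep the major version, zero out minor & patch
}

# Print the test-runner image tags CD would run for a release:
# first the backwards-compatible version, then the release itself.
test_runner_tags() {
  version=$1; compat=$2
  echo "indra_test_runner:$compat"
  echo "indra_test_runner:$version"
}

compat_version 3.2.7            # prints 3.0.0
test_runner_tags 2.4.5 2.3.20   # prints the two tags from the example above
```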
Rolling back prod¶
Situation: recent prod release is broken & we want to roll-back to an old version
ssh onto the prod server, then run:
cd indra
git fetch --all
git checkout <target-version>
make restart-prod
Important note: restarting locally on the server (instead of automatically deploying via CD) means the repo’s secrets (see github -> indra -> settings -> secrets) don’t get injected. In this situation, the env vars used come from ~/.bashrc instead, so verify that this file has the env vars we need.
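Verifying that ~/.bashrc has the env vars we need can be done by hand, or with a small check like the sketch below. missing_env_vars is a hypothetical helper, not something shipped in the repo, and it assumes the variables are defined with lines of the form "export NAME=value". The variable names in the usage example come from the CD secrets list in this document.

```shell
# Hypothetical check (not part of the repo): print every required variable
# that has no "export NAME=" line in the given file.
missing_env_vars() {
  file=$1; shift
  for name in "$@"; do
    grep -Eq "^[[:space:]]*export[[:space:]]+$name=" "$file" || echo "$name"
  done
}

# Example usage on the server, skipped if there is no ~/.bashrc.
if [ -f "$HOME/.bashrc" ]; then
  missing_env_vars "$HOME/.bashrc" INDRA_ADMIN_TOKEN INDRA_LOGDNA_KEY MAINNET_ETH_PROVIDER
fi
```

An empty result means every listed variable is exported in the file. It does not confirm that the values are correct.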
Hot-fixing prod while staging is broken (via CD)¶
Situation: staging needs some repairs before it’s ready to be deployed but there’s something urgent we need to fix in prod.
Create hotfix branch directly off of master & develop/cherry-pick the hotfix here
Push hotfix branch & make sure it passes all CI/CD steps
Merge hotfix branch into master if it looks good.
Wait ~20 mins for CD to deploy change to prod.
Check out hotfix & make sure it does what you want. See rollback instructions if it makes things more broken.
Hot-fixing (skip CD)¶
Situation: we need to get a change deployed to prod as quickly as possible.
Write hotfix and push change to master.
Build/push images: If you have a beefy laptop & fast internet you can do this manually via make prod && make push-release (or use make push-staging to hotfix staging). Otherwise, it’s probably faster to just wait for CD to build/push stuff as part of the build step.
Once images are pushed:
ssh onto the target server & cd into indra clone
git fetch --all
git checkout <target commit/release>
make restart-prod
Check to make sure your hotfix got deployed & does what we expect. Be aware that CD is probably still testing this commit in the background and it will be redeployed maybe 10 mins after you manually deployed it.
Tagging docker images¶
Docker images are tagged & pushed automatically during CD, so you shouldn’t ever have to push/pull images manually. There are 3 important image tag types:
Commit tags eg 3dffdc17: these are built & pushed during the first step of either feature or staging CD. Later steps of feature/staging CD will pull & run tests against these images (and deploy them if staging and tests pass). These are built using local code (ie local modules aren’t pulled from the npm registry).
Release tags eg 1.2.3: these are built & pushed during the first step of master CD and then tested/deployed during later steps. These images use code from the npm registry, not local code.
The latest tag is always pushed when pushing either commit or release tagged images. These latest images are only used in CD as a cache to make building go faster, so if they’re corrupted then everything’ll be fine but building will take longer. These images will be overwritten frequently so don’t pull them expecting anything specific (if you want specific images, use commit-tagged ones). Local images built & run by make start will be tagged latest, so beware: they will be overwritten if you make pull-latest (which shouldn’t ever be necessary to do during normal dev workflows).
Under the hood: the helper scripts ops/push-images and ops/pull-image are used by make commands. They:
Both accept one arg: the version to push or pull eg 3dffdc17 or 1.2.3
Contain a list of all the images to push/pull (eg node, database, proxy, etc)
Push latest images too whenever we ask to push commit or release tagged images
Protect us from overwriting an already pushed image eg can’t push images tagged 3dffdc17 more than once.
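To make the three tag types concrete, here is a rough sketch that guesses a tag's type from its shape. tag_type is a hypothetical function and is not part of ops/push-images. It assumes commit tags are 8 lowercase hex characters (like 3dffdc17) and release tags look like a.b.c, which matches the examples above but is not guaranteed by the scripts.

```shell
# Hypothetical classifier (not part of ops/push-images): guess the tag type.
# Assumes commit tags are 8 lowercase hex chars and release tags look like a.b.c.
tag_type() {
  case "$1" in
    latest) echo "latest" ;;
    *)
      if printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+\.[0-9]+$'; then
        echo "release"
      elif printf '%s\n' "$1" | grep -Eq '^[0-9a-f]{8}$'; then
        echo "commit"
      else
        echo "unknown"
      fi ;;
  esac
}

tag_type 3dffdc17   # prints "commit"
tag_type 1.2.3      # prints "release"
tag_type latest     # prints "latest"
```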
Setting up CI/CD¶
All auto-deployment config can be found in .github/workflows/. See GitHub Actions Documentation for docs.
The auto-deployer needs an ssh key so it can log in to the prod server. We should use a fresh one instead of re-using existing ssh keys. Run this command to generate a new ssh key pair:
ssh-keygen -t rsa -b 4096 -C "autodeployer" -m pem -f .ssh/autodeployer
The ops/setup-ubuntu.sh script will look for a public key called $HOME/.ssh/autodeployer.pub and try to add it to the server’s ~/.ssh/authorized_keys. If we ever change the autodeployer’s ssh key, we can add the new keys to our servers by re-running bash ops/setup-ubuntu.sh $SERVER_IP.
Env vars controlling CD are stored in: GitHub -> Indra Repo -> Settings -> Secrets. The following env vars are used:
DOCKER_USER & DOCKER_PASSWORD: Login credentials for someone with push access to the docker repository specified by the registry vars at the top of the Makefile & ops/start-prod.sh.
INDRA_ADMIN_TOKEN: An admin token for controlling access to sensitive parts of the dashboard.
INDRA_AWS_ACCESS_KEY_ID & INDRA_AWS_SECRET_ACCESS_KEY: Login credentials for an AWS storage repo. If provided, the database will automatically send DB snapshots to AWS.
INDRA_LOGDNA_KEY: Credentials for LogDNA. If provided, all logs will be sent to this service for further analysis.
MAINNET_ETH_PROVIDER & RINKEBY_ETH_PROVIDER: Ethereum RPC URLs (eg Alchemy or Infura) that let us read from/write to the blockchain.
SSH_KEY: The autodeployer private ssh key.