




This article shares some history of DevOps testing in France over the last few years: what it looked like when I started in the industry in 2004, and what it looks like almost 15 years later today.
Just a 1337c0d3r who somehow made it this far with the help of a lot of shadow hacking, now somewhat known as “DevOps” in some places …
In 2004 I was hacking for Jouve, which had high-tech digital printers that could produce small batches of books, at a time when the market was filled with offset printers whose minimum run was 1500 copies or so.
Under the lead of Juan Arias, with coaching from Jerome Saada, I coded a PHP website in 3 months where visitors could build their own quote to self-publish their book in small quantities.
Needless to say, it was with the help of hundreds of hackers answering my questions on the internet. That explains why I still give back to the great internets and answer questions myself - and I’m still pretty active on git :)
Anyway, on the last day we sent a zip folder with the source code and it was online after a little while. I had absolutely no idea of what was happening behind that email. Afterwards I was assigned to WinDev development, and after a while I decided to quit and practice exclusively with Open Source Software and its community !
2005 was when Dedibox started to offer dedicated servers for 15 bucks a month or so. I was already using Linux and compiling kernels because some pirate had told me to use Debian, and not to try Gentoo Linux because I was a noob. I said oh ok, I will Gentoo Linux you !!!!!
At some point, with the help of @neofutur (who is now with YourLabs !) and another hacker called m3t4ph4z3, and after sometimes spending a week installing a Gentoo Linux server, I had a little hosting business running on hardened Gentoo with grsecurity and Linux-VServer, which sometimes meant merging the patches myself to get the latest version of Linux+GrSec+VServer … at 19 years old without any college degree it sure took lots of determination; even though I had learned C at 9 years old, I was honestly a big noooob :D
I had a build server building binpkgs for three distros derived from Gentoo:
In 2009 I ended up refactoring my bash scripts into a mini framework with a bunch of modules (including one for cracking wifis lawls) and of course one for dealing with my little VPSes. That repo had only one pull request and wow, what a PR; I guess I had no idea how to comment back then and just merged it when I could without understanding most of it :D
The first deployment integration testing framework I used was test-kitchen, when I was writing Chef code at a hosting provider back in early 2014.
The developer experience with Chef cookbooks was amazing: they have rubocop and test-kitchen, and cookbooks can also be unit tested with mocks and the like. After a month, I hadn’t found a good way to orchestrate Chef with Chef, and at that time coworkers suggested chef-metal, but not just yet, because it wasn’t ready.
That’s how I ended up using Ansible to automate deployment for all OpenStack services plus my own (NATaaS, FirewallaaS, LBaaS and VPNaaS were the 4 services I had developed based on Numergy’s specifications for high security). In my first month with Ansible, I had coded the infra based on the configuration that the other level 3 engineers were maintaining on a wiki.
Unfortunately, the rest of the level 3 team didn’t keep up and decided to veto the first of a month’s worth of deployments that we had planned with the level 2 team. There was also a policy to use Chef, despite the fact that nobody could orchestrate it as well as a human being or Ansible could.
Ansible really gave me the passion for automation, and I started automating everything, even my own laptop: I would do everything in a playbook and apply it. Today I reckon this was largely overkill, but it was so much fun I can’t regret it !
Anyway, a couple of years later they invited me to a party ! A level 2 friend told me they had achieved the same software versions and configurations on both production and staging … thanks to the playbooks ! You can’t imagine how much joy I felt for them when they reached “preprod iso prod” !
Another anecdote from this wicked party, about shadow IT. A friend from sales told me they loved a piece of software I had done in “shadow” back then, which means without receiving a specific order for it; it just automated my own work … a command line to manage the customer infrastructure, users, and projects/tenants. Like the 4 OpenStack agents I developed there (NATaaS, FWaaS, LBaaS, VPNaaS), it inherited from the OpenStack Python API like a framework, with our overrides, to create/update/drop complete tenants with a single command. And the drop command took care of the dependencies in the appropriate order … (tenant networks, interfaces, vms…).
In late 2015 I joined PeopleDoc, one of the most hacker-friendly companies in France, with the mission to maintain a core Continuous Delivery pipeline around a SaltStack recipe repository. The stack: Jenkins, SaltStack and LXC, which I was also in charge of training the company’s hackers on.
After a couple of months we had a CI server and a pipeline to test every role against LXC, and I kept adding optimizations and documentation over time. I believe we took the situation we had as far as possible, with notable help from a dear friend and former colleague, one of the best hackers I know. That means we had wrapped SaltStack in our own command to ensure there would be no false positives or false negatives in production pipelines, and reached 99.9% success in that, “the truthfulness of the pipeline outcome”.
But we were somewhat still blocked for several reasons:
So, I asked around amongst the happy Salt users I had the chance to meet at conferences. One demonstrated code they had written to expose an HTTP API orchestrating salt runs, which had shell-injection vulnerabilities (but they told me it was ok because the API consumers would sanitize input for them …).
Anyway, back at PeopleDoc I proposed to write a simple web service to monitor Salt runs and orchestrate them, because that seemed to be the only way to move forward. We didn’t proceed with this proposal, and instead started orchestrating our salt-runs with Ansible itself …
An interesting anecdote: one SaltStack core dev I pushed to hire as a contractor was maintaining his PaaS in Salt, and he was still there when we started orchestrating our Salt with Ansible …
Needless to say, he’s migrated what he most maintains to Ansible and appears to have left the SaltStack core development team.
So, at this point we kept moving forward with adopting Ansible: I would visit each product team one by one and pair-code with one of them to migrate their Salt calls from the company-wide recipe repository into their own project repository.
Of course we wanted to keep having deployment pipelines over several environments. There was no test tool at the time, well, except Molecule, which was about a month old. We already had an LXC infrastructure to actually do the work, and the official Ansible Testing Strategies guide describes the concepts for testing Ansible with Ansible, but does not demonstrate a specific pattern, except in Ansible’s own test code.
So I put together a quick Ansible role that would create system containers with LXC on the fly, and another one for LXD that works the same way, with a test.yml file containing:
- hosts: localhost
  become: true
  become_user: root
  become_method: sudo
  pre_tasks:
    # (container boot tasks, elided in the original post)

- hosts: testboot
  roles:
    - yourroletotest
Note that instead of “yourroletotest” you can have a relative path, such as “.” if the above test.yml playbook is at the root of a role to test.
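The pre_tasks above are not reproduced in this post; judging from the Molecule playbook.yml shown further down, here is a hedged sketch of what they can look like (the exact role invocation and the empty lxc_container_config are assumptions, not the original role’s code):

pre_tasks:
  - name: Declare the container to create in the in-memory inventory
    add_host:
      name: testboot
      groups: testboot
      lxc_container_config: {}    # container options consumed by the boot role

  - name: Create and start the container
    include_role:
      name: ansible-boot-lxc      # the boot role described above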
In the talk about Molecule, they make the following list of pros and cons for testing Ansible with Ansible (at minute 18):
Benefits:
Issues:
I have to say that I don’t find it very persuasive to be honest.
For me, your deployment code should always try to prevent failure, or fail without leaving the infra broken. So there are two things:
In this case two solutions are possible: implement another engine, or use different modules from the same engine to validate the previous module’s execution.
Of course, there is still a chance that all the modules involved are bugged, but there is less chance of a bug overall because there is less code involved when only Ansible is used.
That’s especially why deployments should repeat healthchecks until they pass, and display logs anyway.
BTW, I love having the logs displayed after the healthchecks; it has saved me so much time, because most of the time the error is in the logs, so having them in your deployment job output is awesome. I suggest you try it and let me know how that works for you ;)
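To make that concrete, here is a minimal sketch of that pattern, assuming a hypothetical service that exposes an HTTP /health endpoint and runs under a hypothetical systemd unit (none of this is taken from my actual playbooks):

- name: Wait for the service healthcheck to pass
  uri:
    url: "http://{{ inventory_hostname }}:8000/health"   # hypothetical endpoint
    status_code: 200
  register: healthcheck
  until: healthcheck.status == 200
  retries: 30
  delay: 2
  ignore_errors: true

- name: Fetch the service logs
  command: journalctl -u myservice --no-pager -n 100      # hypothetical unit name
  register: service_logs
  changed_when: false

- name: Display the logs in the deployment job output
  debug:
    var: service_logs.stdout_lines

- name: Fail the deployment if the healthcheck never passed
  fail:
    msg: Healthcheck did not pass, see the logs above
  when: healthcheck is failed

The until/retries loop keeps polling until the service answers, and the logs end up in the job output whether the deployment succeeded or not.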
The cost of developing the Ansible code is un-cuttable, and so is the cost of supporting containerized deployment, since that’s what we used for developer machines.
As such, running the playbooks on containerized hosts in CI comes at an extra cost of zero. It’s all about reproducing the interface you have when deploying to production: if it’s an SSH server, then an SSH server in LXD will do fine.
To compare the overall cost of using Molecule versus testing Ansible with Ansible, let’s look at the SLOC count of each solution.
Consider a hundred-SLOC provisioner role:
Well, there’s just one requirement for the boot-role method: have Ansible and LXC or LXD working. For that we had a standalone bash script that could set up a developer machine, a control host, or provision CI, with custom Ansible versions or branches, and that contained dirty optimisations which could easily have been dropped by the next maintainer, in 239 SLOC:
$ cloc ansible-setup
1 text file.
1 unique file.
0 files ignored.
Another requirement is to have a functional ssh-agent, but that’s handled by another mini role, peopledoc.ssh-agent, in 90 SLOC:
Compare to Molecule:
Talking about development cost, we’re comparing a 180+239+90=509 SLOC solution composed of three loosely coupled components that make sense on their own, with a tightly coupled 6286 SLOC solution: that’s roughly a 12x ratio for the test stack, without even counting testinfra.
So from that perspective it seems testing ansible with ansible is still relevant.
But that’s only for the testing tool. Luckily, Molecule was later added to the provisioning role that was used both for development and automated testing, a little role called peopledoc.boot-lxc, so we can study this use case too.
This case is a little bit particular because we’re testing the provisioner itself. This is what we had, a test.yml:
- hosts: localhost
  become: true
  become_user: root
  pre_tasks:
    # (container boot tasks, elided in the original post)

- hosts: testboot
  tasks:
    # (test tasks, elided in the original post)
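The boot and test tasks are elided here as well; as a minimal hedged sketch, reusing the /srv/tests/canary path that shows up in the testinfra test below, the test play could have looked like this:

- hosts: testboot
  tasks:
    - name: Check that we can SSH in and find the canary word
      command: grep inception /srv/tests/canary
      changed_when: false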
So, at the end of the execution we test that we can SSH into the host we created, and that we can find the word “inception”, which comes from the “grep inception” itself. Then, test.yml was removed and instead we can see a whole molecule/default directory in the repo:
$ tree
.
├── create.yml
├── destroy.yml
├── INSTALL.rst
├── molecule.yml
├── playbook.yml
├── prepare.yml
├── setup_dnsmasq.sh
└── tests
    ├── canary
    └── test_default.py
create.yml configures molecule:
destroy.yml as well:
The molecule.yml file defines a couple of linter calls that could just as well have been added to .travis.yml:
dependency:
  name: galaxy
driver:
  name: delegated
  options:
    ansible_connection_options:
      connection: local
lint:
  name: yamllint
  enabled: False
platforms:
  # (rest of the file elided in the original post)
In playbook.yml, we find the test.yml code again, which exercises the role’s tasks:
- add_host:
    name: delegated-travis-instance
    ansible_python_interpreter: /home/travis/virtualenv/python2.7/bin/python
  changed_when: False

- add_host:
    name: testboot.lxc
    groups: testboot
    lxc_container_config:
      # (values elided in the original post)

- role: ansible-boot-lxc
A whole new prepare.yml file is also present, replacing ansible-setup, which had the advantage of being able to test against a specific branch of Ansible in a virtualenv, like tox does.
This allowed having a test matrix with multiple Ansible versions, including devel. I prefer to test against the current stable and master, so that you can fix BC breaks as they arrive in master and are ready when the new release is out.
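As a hedged illustration (not the project’s actual CI configuration, and the ansible-setup invocation is an assumption), a Travis-style matrix testing against the current stable release and devel could look like this:

# Hypothetical .travis.yml fragment: one CI job per Ansible version.
env:
  - ANSIBLE_VERSION=stable-2.7    # current stable branch at the time
  - ANSIBLE_VERSION=devel         # upstream development branch, to catch BC breaks early
install:
  - ./ansible-setup "$ANSIBLE_VERSION"    # hypothetical invocation of the setup script
script:
  - ansible-playbook test.yml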
Note that the new prepare.yml also calls a new bash script to finish LXC setup.
Next, there is a tests directory, which contains a test written in Python:
import os
import yaml
import testinfra
def test_lxc_container():
# testinfra uses a plain file for Ansible inventory: create it
updated_inventory = '%s.new.yml' % os.environ['MOLECULE_INVENTORY_FILE']
inventory = {
'all': {
'children': {
'lxc': {
'hosts': {
'testboot.lxc': {
'ansible_user': 'root'
}
}
},
'ungrouped': {}
}
},
}
with open(updated_inventory, 'w') as output:
output.write(yaml.dump(inventory))
host = testinfra.get_host('ansible://testboot.lxc?ansible_inventory=%s' %
updated_inventory)
f = host.file('/srv/tests/canary')
assert f.exists
assert f.contains("inception")
assert not f.contains("not there")
So it also works; but all this, to replace the following, was maybe a bit overkill:
- hosts: localhost
  pre_tasks:
    # (container boot tasks, elided in the original post)

- hosts: testboot
  tasks:
    # (test tasks, elided in the original post)
My golden rules for Ansible testing are more about a practice than a tool: have a deployment in this order:
When you have this, testing on containers, with Molecule or with Ansible, doesn’t matter, as long as you take preventive action when you discover new errors during the life of your pipeline.
Molecule is also pretty nice, so is ansible-container, but at the end of the day only the practice of CI/CD on CI/CD code itself matters.
So, having an Open Source test-kitchen in Python is cool of course, but they are going to have to pull more tricks to attract the people who weren’t using test-kitchen even though they had no problem with Ruby.
In 2017 I decided I wanted to use Ansible as a container orchestrator to deal with immutable images, as suggested by Bruno Dupuis (dear friend and former PeopleDoc colleague), rather than maintain live hosts: Pets vs. Cattle.
I started using Ansible as an orchestrator for Docker, which meant that all the existing roles that were setting up a host with config files became useless: it was time to deploy containers configured through env vars as much as possible.
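For illustration, here is a hedged sketch of what that looks like, with a hypothetical image, host group and variables (this is not the actual playlabs code):

- hosts: production
  tasks:
    - name: Run the application container from an immutable image
      docker_container:
        name: myapp                              # hypothetical container name
        image: registry.example.com/myapp:1.2.3  # hypothetical immutable image tag
        restart_policy: unless-stopped
        env:
          DATABASE_URL: "{{ database_url }}"     # configuration goes through env vars, not config files
        published_ports:
          - "8000:8000"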
After Summer 2018 I had rewritten the playbooks as an Open Source version. It’s still pretty early and most roles have not been patched for the update yet. But the update makes it a lot cleaner: we have an Ansible CLI wrapper that deploys roles with commands like playlabs install docker,nginx,netdata,firewall user@host somevar=foo.
From the CLI wrapper emerged two core products: processmanager, a Python 3 POSIX-compatible alternative to pexpect, and clitoo, a simple CLI to execute Python callables, i.e. clitoo your.mod:callback.name yourkwarg=1 arg0 --foo --bar=test calls your.mod:callback.name("arg0", yourkwarg=1) with a clitoo.context.args of ['foo'] and a clitoo.context.kwargs of {'bar': 'test'}.
From clitoo emerged another product called djcli, which is simply a full-featured CRUD CLI for Django that does not require any installation besides pip install … and of course it’s also part of the yourlabs/python distribution so we can always have it in CI.
playlabs init user@host, for example, will set up my user, with SSH key etc … all automatically; playlabs install to deploy roles; playlabs deploy to execute the “project” role; playlabs backup/restore/logs …

Note that playlabs is not ready for public use yet, it deserves another epic iteration on tests and documentation ! It’s a lot of work and keeps challenging me to pull new tricks ! I want to get rid of pexpect in favor of processmanager, remove its CLI parser and call clitoo instead, just like djcli, which is also unfinished because … well, I have to deliver just a bit more features on the webdev part I’m currently doing to get that backlog back under control …
Well, Kubernetes of course ! It won’t solve all your problems, but I find that the “Mastering Kubernetes” book gives a good idea of what k8s is and what it is not.
We have our own containerized distribution of Python built on kubernetes that includes ansible, as well as Playlabs, our distribution of Ansible (wip!!!). This integrates very well with any CI that runs commands in containers, such as GitLab-CI, CircleCI, Drone-CI …
Playlabs also supports k8s, and as such has a playlabs install k8s command to automate the creation of users/certificates/project namespaces based on a versioned inventory … so far validated by what I happen to have read in “Mastering Kubernetes”. I’m glad to finally be on the right path to never write HA/ZDD orchestration in Ansible again, and to use Ansible for infra network automation only, k8s being part of that network …
Looking forward to writing the next chapter of that story in 2019 and proposing an alternative to k8s for hobbyists that should serve me for the next 15 years in my practice of CD ;)
Yours Sincerely
Shadow Hacking to serve you since 2005
With LOVE
From POITOU CHARENTE !