




The last article demonstrated how we could simply route any HTTPS flow going out of a virtual bridge network interface through mitmproxy, in order to cache anything.
In this article, we’ll focus on the logic to “cache anything” from within our mitmproxy script.
We’d like each fetched file to be stored as is in a directory, which makes it easy to tweak the cached content.
So, for example:
# for apt
http://http.debian.net/debian/pool/main/r/rtmpdump/librtmp-dev_2.4+20111222.git4e06e21-1_amd64.deb
# would be:
cache_directory/http.debian.net/debian/pool/main/r/rtmpdump/librtmp-dev_2.4+20111222.git4e06e21-1_amd64.deb
# for pip
https://pypi.python.org/simple/python-memcached
# would be:
cache_directory/pypi.python.org/simple/python-memcached/index.html
# for elasticsearch plugin installer
https://github.com/mobz/elasticsearch-head/archive/master.zip
# would be:
cache_directory/github.com/mobz/elasticsearch-head/archive/master.zip
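The mapping in these examples can be sketched as a small Python function. This is only an illustration under assumptions: the name cache_path is hypothetical, the host name is always kept as the first path component, a URL whose last segment has no dot is treated as a directory and gets index.html appended, and query strings are ignored.

```python
import os
from urllib.parse import urlsplit


def cache_path(cache_dir: str, url: str) -> str:
    """Map a URL to a file path below cache_dir (hypothetical helper)."""
    parts = urlsplit(url)
    # Drop empty, "." and ".." segments so the result stays inside cache_dir.
    segments = [s for s in parts.path.split("/") if s not in ("", ".", "..")]
    # Assumption: a last segment without a dot names a directory-like
    # resource, which is stored as index.html (as in the pip example).
    if not segments or parts.path.endswith("/") or "." not in segments[-1]:
        segments.append("index.html")
    return os.path.join(cache_dir, parts.hostname or "", *segments)


if __name__ == "__main__":
    print(cache_path("cache_directory",
                     "https://github.com/mobz/elasticsearch-head/archive/master.zip"))
```

Keeping the layout identical to the URL is what makes the cache easy to edit by hand: replacing a cached file is enough to change what the proxy serves.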
We also want HTTP error codes to be reproduced. A quick and hackish way for our PoC to do that is to make the cached file a symlink whose target is the error code. For example:
# for apt
http://debian.saltstack.com/debian/dists/wheezy-saltstack-2014-07/main/i18n/Translation-en.lzma
# would be a symlink to the non-existent 404 file
cache_directory/debian.saltstack.com/debian/dists/wheezy-saltstack-2014-07/main/i18n/Translation-en.lzma -> 404
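Here is a minimal sketch of that symlink trick, assuming a POSIX file system. The function names store_error and cached_status are hypothetical, and treating a regular file as a 200 response is an assumption of this sketch.

```python
import os


def store_error(path: str, status: int) -> None:
    """Record an HTTP error as a dangling symlink named after the status code."""
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    if os.path.lexists(path):
        os.remove(path)
    # The target ("404") does not have to exist; only its name carries meaning.
    os.symlink(str(status), path)


def cached_status(path: str):
    """Return the recorded error code, 200 for a regular file, None if absent."""
    if os.path.islink(path):
        target = os.readlink(path)
        if target.isdigit():
            return int(target)
    if os.path.isfile(path):
        return 200
    return None
```

A side benefit of this layout is that `ls -l` on the cache directory shows at a glance which URLs were cached as errors.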
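So, we’re going to implement this mapping from URLs to cache paths, appending /index.html when the URL does not name a file, along with the symlink trick for error codes. There you go: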
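What follows is a minimal sketch of such a script, not necessarily the exact one used for the tests described here. It assumes mitmproxy 7 or later, where an addon exposes request and response hooks and http.Response.make builds a reply. The class name CacheAnything, the CACHE_DIR value and the cache_path helper are hypothetical, and the path mapping is included in compact form so the script stands alone. Response headers, redirects and query strings are deliberately ignored.

```python
import os
from urllib.parse import urlsplit

try:
    from mitmproxy import http  # available when loaded with: mitmdump -s script.py
except ImportError:  # lets the cache logic be exercised without mitmproxy
    http = None

CACHE_DIR = "cache_directory"  # assumption: relative to the working directory


def cache_path(url: str, cache_dir: str = CACHE_DIR) -> str:
    """Map a URL to cache_dir/host/path, adding index.html for directory-like URLs."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s not in ("", ".", "..")]
    if not segments or "." not in segments[-1]:
        segments.append("index.html")
    return os.path.join(cache_dir, parts.hostname or "", *segments)


class CacheAnything:
    def __init__(self, cache_dir: str = CACHE_DIR):
        self.cache_dir = cache_dir

    def request(self, flow):
        """Answer from the cache when an entry exists, else let the request through."""
        path = cache_path(flow.request.pretty_url, self.cache_dir)
        if os.path.islink(path) and os.readlink(path).isdigit():
            # A symlink to "404" (or any other code) replays that error.
            flow.response = http.Response.make(int(os.readlink(path)))
        elif os.path.isfile(path):
            with open(path, "rb") as f:
                flow.response = http.Response.make(200, f.read())

    def response(self, flow):
        """Store what the upstream server answered, unless already cached."""
        path = cache_path(flow.request.pretty_url, self.cache_dir)
        if os.path.lexists(path):
            return  # never overwrite an entry, so hand-edited files survive
        os.makedirs(os.path.dirname(path), exist_ok=True)
        status = flow.response.status_code
        if status == 200:
            with open(path, "wb") as f:
                f.write(flow.response.content)
        elif status >= 400:
            os.symlink(str(status), path)


addons = [CacheAnything()]
```

Because a cached entry is never overwritten, a file tweaked by hand in the cache directory keeps being served until it is deleted.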
I’ve noticed that mitmproxy sometimes hangs during an apt-get update call. I’m waiting for this to be reproducible again to debug it, but it might have to do with the IP stack.
It seems like all our tests are passing with this (our job matrix - deployment of 13 major products - is still running at the time of writing).
I like this kind of solution because it’s:
The downsides are:
I think I’m going to continue on this path to make an awesome generic caching proxy for CI purposes.