Problem Faced:
I was trying to reduce the Docker image size of a Python application. One of the ideas that struck me was to remove the pip cache, so I tried it to see how much it would reduce the overall image size.
What is pip caching?
As per docs,
When making any HTTP request, pip will first check its local cache to determine if it has a suitable response stored for that request which has not expired. If it does then it returns that response and doesn’t re-download the content.
If it has a response stored but it has expired, then it will attempt to make a conditional request to refresh the cache which will either return an empty response telling pip to simply use the cached item (and refresh the expiration timer) or it will return a whole new response which pip can then store in the cache.
While this cache attempts to minimize network activity, it does not prevent network access altogether. If you want a local install solution that circumvents accessing PyPI, see Installing from local packages.
Changed in version 23.3: A new cache format is now used, stored in a directory called http-v2 (see below for this directory’s location). Previously this cache was stored in a directory called http in the main cache directory. If you have completely switched to newer versions of pip, you may wish to delete the old directory.
These files will get stored inside:
- %LocalAppData%\pip\Cache on Windows
- ~/Library/Caches/pip on Mac
- ~/.cache/pip on Linux
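To get a feel for how much space the cache takes on your own machine, here is a minimal Python sketch. It assumes pip 20.1 or newer (which provides the `pip cache dir` command) and that caching is enabled; the helper names `dir_size_bytes` and `pip_cache_dir` are made up for this illustration.

```python
import os
import subprocess
import sys


def dir_size_bytes(path):
    """Return the total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so a linked file is not counted twice.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def pip_cache_dir():
    """Ask pip where its cache lives (assumes pip 20.1+ with caching enabled)."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "cache", "dir"],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if pip reports an error
    )
    return result.stdout.strip()
```

Calling `dir_size_bytes(pip_cache_dir())` returns the cache size in bytes for the Python interpreter running the script.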
Do we need this cache inside a Docker image?
pip keeps this cache so that future installations can reuse local files instead of hitting the network. But as we know, a Docker container is a short-lived, immutable box in which we rarely install packages again, so the cache serves no purpose there. We can remove it and see whether that reduces the image size.
How to remove the cache during installation?
In pip, we can provide the --no-cache-dir flag to avoid storing these files in the image/container.
Let's see this with an example:
requirements.txt
Flask==2.0.1
gunicorn==20.1.0
Dockerfile
# Use a minimal base image
FROM python:3.8-slim
# Set the working directory
WORKDIR /app
# Copy the requirements file
COPY requirements.txt .
# Install dependencies
RUN pip install --no-cache-dir -r requirements.txt
CMD ["hostname"]
The RUN instruction ensures that the --no-cache-dir option is passed to pip during the installation of the Python dependencies listed in requirements.txt.
Comparison: with and without --no-cache-dir
I built the image with and without --no-cache-dir. Comparing the two, there is no drastic reduction in image size; the flag contributes only modestly to a leaner Docker image.
But by using --no-cache-dir, you instruct pip not to use or create a cache directory, which can be useful in Docker builds to avoid potential issues related to cached packages and to keep builds more reproducible.
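To repeat the size comparison yourself, the following Python sketch may help. It assumes both variants of the image are already built locally under the hypothetical tags `demo:with-cache` and `demo:no-cache`, and that the `docker` CLI is on the PATH. The function names are made up for this illustration.

```python
import subprocess


def image_size_bytes(tag):
    """Return the size in bytes that Docker reports for a local image."""
    result = subprocess.run(
        ["docker", "image", "inspect", "--format", "{{.Size}}", tag],
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError if the image does not exist
    )
    return int(result.stdout.strip())


def saving(with_cache, without_cache):
    """Return (bytes saved, percent saved) between two image sizes."""
    saved = with_cache - without_cache
    percent = 100.0 * saved / with_cache if with_cache else 0.0
    return saved, percent
```

For example, `saving(image_size_bytes("demo:with-cache"), image_size_bytes("demo:no-cache"))` gives the difference in bytes and as a percentage of the larger image.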
Final Thoughts
Using --no-cache-dir helps a little with image size.
It also helps in producing more reproducible builds and in resolving one of the common anti-patterns.