The Internet Censorship Dashboard is a project that aggregates data fetched from the OONI API to provide an overview of the current state of Internet censorship experienced by users, mainly in Southeast Asia. The current version was built a couple of years ago, and it recently received funding to be updated to work better with the new APIs.
I will likely cross-post the technical report once it is finalized, so this post is mainly about the #TMI tech side of the project, as a note2self.
Currently there are 3 different components in the project, namely the frontend, the crawler, and the backend API.
There were a couple of security warnings from GitHub over the years regarding the frontend part of the project. However, for some reason, upgrading its dependencies was never straightforward. Therefore, one of the main goals of the current iteration was to get everything properly updated.
First things first, we updated Yarn to Yarn 2:
$ yarn set version berry
Also, after a lot of painful checking, we had to switch back to the node-modules linker instead of the new and shiny PnP mode, by adding the following line to .yarnrc.yml:
nodeLinker: node-modules
I spent quite some time rebuilding the project from scratch, only to find out that the new PnP mode was the reason the project wouldn't build (some dependencies were not yet compatible with it at the time).
Then we needed to upgrade all the dependencies. First, we had to install a plugin:
$ yarn plugin import interactive-tools
After that, the upgrade-interactive command becomes available:
$ yarn upgrade-interactive
ReactJS has gone through quite a number of huge changes in recent years. One of them is the introduction of hooks, such as the effect hook. Redux, which I mostly rely on for state management, offers similar hooks through React Redux (useSelector and useDispatch), and they pair well with Redux Toolkit. So there is no more confusion over container widgets and all the horrible class definitions. Not only is the markup/code for a widget greatly simplified, a lot of boilerplate is gone too.
Also, there is no more complicated code to rebuild state in a functional style: reducers written with Redux Toolkit's createSlice can use plain mutating syntax (Immer produces the immutable update behind the scenes), which greatly helps readability.
The backend, on the other hand, went through a slightly more involved reorganization. In the previous iteration I was still using my own venv wrapper script (while transitioning to pipenv), so the dependencies were written directly inside the Dockerfile. Now that I use Poetry for nearly everything, I ported the crawler and the backend API into sub-projects managed by Poetry.
The main benefit of using Poetry here is that the virtual environment can be reused while building the project. I used the opportunity to figure out how to do multi-stage builds while learning how to properly build a container image from a project managed by Poetry.
I ended up contributing an answer to the linked question above, which is also reproduced below:
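FROM python:3.9-slim as base
ENV PYTHONFAULTHANDLER=1 \
    PYTHONHASHSEED=random \
    PYTHONUNBUFFERED=1
# Compilers and headers needed to build packages with C/C++ extensions
RUN apt-get update && apt-get install -y gcc libffi-dev g++
WORKDIR /app
# Builder stage: install the dependencies and build the project's wheel
FROM base as builder
ENV PIP_DEFAULT_TIMEOUT=100 \
    PIP_DISABLE_PIP_VERSION_CHECK=1 \
    PIP_NO_CACHE_DIR=1 \
    POETRY_VERSION=1.1.3
RUN pip install "poetry==$POETRY_VERSION"
# Create the venv ourselves at a fixed path so Poetry installs into it
RUN python -m venv /venv
# Copy only the dependency manifests first so this layer stays cached
COPY pyproject.toml poetry.lock ./
# --no-dev skips dev dependencies; --no-root skips installing the project itself
RUN . /venv/bin/activate && poetry install --no-dev --no-root
COPY . .
RUN . /venv/bin/activate && poetry build
# Final stage: reuse the populated venv and install only the built wheel
FROM base as final
COPY --from=builder /venv /venv
COPY --from=builder /app/dist .
COPY docker-entrypoint.sh ./
RUN . /venv/bin/activate && pip install *.whl
CMD ["./docker-entrypoint.sh"]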
The following line was added
RUN . /venv/bin/activate && poetry install --no-dev --no-root
because without an active virtualenv, Poetry would create its own in a folder whose name contains a hash string. To override that behavior, we create the venv ourselves and activate it before calling poetry install.
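To make that reasoning concrete, here is a simplified model of the decision in Python. It is not Poetry's actual code: the function name pick_env_path, the cache directory, and the exact folder naming scheme (project name, a short hash of the project directory, Python version) are assumptions loosely modeled on Poetry 1.1's behavior. The point it illustrates is that sourcing /venv/bin/activate exports VIRTUAL_ENV, and Poetry reuses an active environment instead of creating its own.

```python
import base64
import hashlib
import os
import sys
from typing import Mapping


def pick_env_path(project_name: str, project_dir: str, cache_dir: str,
                  environ: Mapping[str, str]) -> str:
    """Hypothetical, simplified model of how Poetry picks a virtualenv."""
    # `. /venv/bin/activate` exports VIRTUAL_ENV=/venv; if it is set,
    # that environment is reused as-is.
    active = environ.get("VIRTUAL_ENV")
    if active:
        return active

    # Otherwise a new environment is created under the cache directory, in a
    # folder whose name embeds a hash of the project directory (assumed scheme).
    digest = hashlib.sha256(project_dir.encode("utf-8")).digest()
    short_hash = base64.urlsafe_b64encode(digest).decode("ascii")[:8]
    py_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    folder = f"{project_name.lower()}-{short_hash}-py{py_version}"
    return os.path.join(cache_dir, "virtualenvs", folder)


if __name__ == "__main__":
    cache = "/root/.cache/pypoetry"  # illustrative location only
    # Without activation: a hashed folder name that is awkward to COPY later.
    print(pick_env_path("crawler", "/app", cache, environ={}))
    # With /venv activated first, the predictable path is used.
    print(pick_env_path("crawler", "/app", cache, environ={"VIRTUAL_ENV": "/venv"}))  # prints /venv
```

A fixed, predictable path is what lets the final stage simply run COPY --from=builder /venv /venv.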
The --no-dev flag can be removed if a developer-tools friendly image is required (with formatter, linter and pytest). The benefit of using a virtualenv in a multi-stage build becomes more apparent in the final stage: instead of running pip install for all the dependencies again, we just copy the whole environment over, and only the project's own wheel still needs to be installed.
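One caveat with copying the environment between stages: a virtualenv is not relocatable, because absolute paths are written into it at creation time. That is presumably why it pays to create it at /venv in the builder stage, copy it to the same /venv path, and build both stages from the same base image. The standard-library sketch below (the helper name baked_in_paths is hypothetical) creates a throwaway venv and checks that its own location appears in the activate script, and that pyvenv.cfg records where the base interpreter lives.

```python
import os
import tempfile
import venv


def baked_in_paths(env_dir: str) -> dict:
    """Create a bare venv and report the absolute paths stored inside it."""
    # with_pip=False keeps this quick; symlinks avoid copying the interpreter on POSIX.
    venv.create(env_dir, with_pip=False, symlinks=(os.name != "nt"))

    bin_dir = "Scripts" if os.name == "nt" else "bin"
    with open(os.path.join(env_dir, bin_dir, "activate"), encoding="utf-8") as f:
        activate = f.read()

    # pyvenv.cfg is a simple "key = value" file.
    cfg = {}
    with open(os.path.join(env_dir, "pyvenv.cfg"), encoding="utf-8") as f:
        for line in f:
            key, sep, value = line.partition("=")
            if sep:
                cfg[key.strip()] = value.strip()

    return {
        # The activate script hard-codes the venv's own location.
        "activate_mentions_env_dir": env_dir in activate,
        # "home" is the directory of the base Python that created the venv.
        "home": cfg.get("home"),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(baked_in_paths(os.path.join(tmp, "venv")))
```

If the venv were copied to a different path, or onto an image whose Python lives elsewhere, these stored paths would point to the wrong place.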
Also, as Poetry makes exposing a script easy, docker-entrypoint.sh can be as simple as:
#!/bin/sh
set -e
. /venv/bin/activate
exec crawler
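The exec crawler line works because Poetry's [tool.poetry.scripts] table can declare a console script (for example crawler = "crawler.main:main", where the module path is hypothetical), and installing the wheel with pip writes a small /venv/bin/crawler wrapper that imports that function and calls it. A rough standard-library sketch of that resolve-and-call step, using the hypothetical helper load_entry_point:

```python
import importlib
from functools import reduce
from typing import Any, Callable


def load_entry_point(spec: str) -> Callable[..., Any]:
    """Resolve a "module:attr" spec (e.g. "crawler.main:main") to a callable."""
    module_name, _, attr_path = spec.partition(":")
    if not module_name or not attr_path:
        raise ValueError(f"expected 'module:attr', got {spec!r}")
    module = importlib.import_module(module_name)
    # Entry points may name a nested attribute such as "module:obj.method".
    return reduce(getattr, attr_path.split("."), module)


if __name__ == "__main__":
    # A generated wrapper does roughly: sys.exit(load_entry_point("crawler.main:main")())
    # Demonstrated here with a standard-library target instead:
    join = load_entry_point("os.path:join")
    print(join("venv", "bin"))
```

Because the activated venv puts /venv/bin first on PATH, the shell finds that wrapper without any extra configuration.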
Even for the hug web application, it is just:
#!/bin/sh
set -e
. /venv/bin/activate
exec gunicorn -w 4 --bind 0.0.0.0:8000 backend.index:__hug_wsgi__
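In backend.index:__hug_wsgi__, gunicorn expects module:variable, where the variable is a WSGI application; hug generates __hug_wsgi__ for the module's API, so nothing else has to be written. For readers unfamiliar with WSGI, the sketch below shows what such a callable boils down to, using only the standard library. The app and call_app names and the /measurements path are illustrative, not the project's actual API.

```python
import json
from typing import Callable, Iterable, List, Tuple
from wsgiref.util import setup_testing_defaults


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Minimal WSGI application: report the requested path as JSON."""
    body = json.dumps({"path": environ.get("PATH_INFO", "/")}).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "application/json"),
        ("Content-Length", str(len(body))),
    ])
    # A WSGI app returns an iterable of bytes chunks.
    return [body]


def call_app(path: str) -> Tuple[str, bytes]:
    """Invoke the app directly, roughly the way a WSGI server such as gunicorn does."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills in REQUEST_METHOD, SERVER_NAME, wsgi.* keys
    statuses: List[str] = []

    def start_response(status: str, headers: list) -> None:
        statuses.append(status)

    body = b"".join(app(environ, start_response))
    return statuses[0], body


if __name__ == "__main__":
    print(call_app("/measurements"))
```

Saved as a module, the same app could be served with gunicorn -w 4 --bind 0.0.0.0:8000 module:app, which is exactly the shape of the entrypoint command above.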
While it works for now, I am sure that in a couple of years I will discover new things and, given the opportunity, apply them to this project (feel free to contact SinarProject if you are interested in sponsoring it).