Dataflow - No such file or directory: 'ffprobe': 'ffprobe'

Hello!

I am using a Python library called Pydub to work with audio. It works very well in Colab Enterprise, but when I try to run it in a Dataflow job, I get the following error:

No such file or directory: 'ffprobe': 'ffprobe'

After searching online and through the issues in the library's official repository (https://github.com/jiaaro/pydub/issues?page=3&q=not+found), I saw that the recommended solution is to add /usr/bin/ffprobe to the PATH variable.

Given that the Dataflow flex template works with a Dockerfile, I am adding the ffprobe path to the PATH environment variable in the Dockerfile, at build time. However, I still get the same error message.

What else can I do to fix this error?

This is my Dockerfile:

FROM gcr.io/dataflow-templates-base/python3-template-launcher-base

ARG WORKDIR=/template
RUN mkdir -p ${WORKDIR}
WORKDIR ${WORKDIR}

ARG PYTHON_PY_FILE=insights_interpreter.py

COPY . .

ENV PYTHONPATH ${WORKDIR}

ENV FLEX_TEMPLATE_PYTHON_PY_FILE="${WORKDIR}/${PYTHON_PY_FILE}"
ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="${WORKDIR}/requirements.txt"
ENV FLEX_TEMPLATE_PYTHON_SETUP_FILE="${WORKDIR}/setup.py"

RUN apt-get update \
&& apt-get install ffmpeg libavcodec-extra libav-tools -y \
&& pip install --upgrade pip \
&& pip install google-cloud-texttospeech pydub \
# Download the requirements to speed up launching the Dataflow job.
&& pip download --no-cache-dir --dest /tmp/dataflow-requirements-cache -r $FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE

ENV PATH="/usr/bin/ffprobe:$PATH"

RUN echo $PATH # Verification

# Since we already downloaded all the dependencies, there's no need to rebuild everything.
ENV PIP_NO_DEPS=True

ENTRYPOINT ["/opt/google/dataflow/python_template_launcher"]

 

I look forward to your replies in the comments.

--
Best regards
David Regalado

1 REPLY

I don't have a solution, but I have some thoughts. In the Dockerfile you posted, I see this line:

ENV PATH="/usr/bin/ffprobe:$PATH"

A couple of thoughts on this.  Looking here, the syntax appears to be:

ENV PATH "/usr/bin/ffprobe:$PATH"

I don't know if the "=" throws something off.

The other thing is ... is the ffprobe command in /usr/bin?  You point to Linux executables by including the containing directory in the path, not the path to the executable itself.  Might this be better ...

ENV PATH "/usr/bin:$PATH"
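
To illustrate the point about directories, here is a small Python sketch. It is not taken from your pipeline: the fake ffprobe file and the helper name lookup are made up for the demo. It uses shutil.which, which resolves a command against a PATH-style string in roughly the way a shell does, and assumes a Linux-like system.

```python
import os
import shutil
import stat
import tempfile


def lookup(command, path_value):
    """Resolve `command` using only `path_value` as the search path."""
    return shutil.which(command, path=path_value)


def demo():
    """Compare a PATH entry that is a file with one that is a directory."""
    with tempfile.TemporaryDirectory() as bin_dir:
        # Hypothetical stand-in for /usr/bin/ffprobe.
        fake = os.path.join(bin_dir, "ffprobe")
        with open(fake, "w") as fh:
            fh.write("#!/bin/sh\nexit 0\n")
        # Mark the stand-in as executable so the lookup can accept it.
        mode = os.stat(fake).st_mode
        os.chmod(fake, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)

        # PATH entry is the executable itself: nothing is found.
        via_file = lookup("ffprobe", fake)
        # PATH entry is the containing directory: the executable is found.
        via_dir = lookup("ffprobe", bin_dir)
        return via_file, via_dir, fake


if __name__ == "__main__":
    via_file, via_dir, fake = demo()
    print("PATH = path to the file      ->", via_file)
    print("PATH = containing directory  ->", via_dir)
```

If this behaves the same way on your system, the first lookup returns None and the second returns the full path to the file, which is why /usr/bin rather than /usr/bin/ffprobe belongs in PATH.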

I'd also suggest that you start a container from the image locally, but have it run /bin/bash instead of the template launcher. Open a shell inside the container and look for the ffprobe executable. Convince yourself that it is indeed present inside the container.
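
Along the same lines, a check like the one below could be run from a Python prompt inside the container, or called at the start of your pipeline code, to see whether the tools resolve in the environment where the code actually executes. This is only a sketch: missing_tools and require_tools are hypothetical helper names, and the list of tool names is my assumption about what Pydub needs.

```python
import shutil


def missing_tools(names):
    """Return the names that cannot be resolved on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


def require_tools(names=("ffmpeg", "ffprobe")):
    """Raise early with a clear message if any tool is not on PATH."""
    missing = missing_tools(names)
    if missing:
        raise FileNotFoundError(
            "Not found on PATH: " + ", ".join(missing)
        )
    # Map each tool name to the full path it resolves to.
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    try:
        print(require_tools())
    except FileNotFoundError as err:
        print(err)
```

If the check passes in your local container but the Dataflow job still fails, that would suggest the code is running in an environment other than the image you built.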