this is the installation for a locally used package in the task fyi, so it's imported from the training script
Hi NaughtyFish36
c++ module fails to import, anyone have any insight? required c++ compilers seem to be installed on the docker container.
Can you provide the log for the failed Task?
BTW: if you need build-essential, you can add it to the Task startup script: apt-get install build-essential
AgitatedDove14 Yeah, I added it to the initial bash script to test whether that would fix the issue. The task is created using the SDK in the model training script, i.e. Task.init(). I was under the impression the local package would be installed because the agent replicates the environment I initialised the task in. However, I've tried the add_requirements("leap") function and just seem to be getting an "isadirectory" error. I also tried manually adding leap==0.4.1 in the task UI, which didn't work. The environment in the logs does show that leap is being installed, potentially from a cache? - leap @ file:///opt/keras-hannd
AgitatedDove14 fyi I do install build-essential manually in the logs I just sent you, and it still fails
Unfortunately, installing build-essential at startup didn't work.
AgitatedDove14 Unfortunately that didn't work either. I agree that should run the setup.py correctly, but something still seems to be breaking. I've sent you the most recent logs.
So could it be that pip install --no-deps . is the missing piece?
what happens if you add to the installed packages "/opt/keras-hannd" ?
I think it is to do with the build-essential issue. Let me talk you through the process:
1. Run a docker image locally called keras-hannd-cml (i.e. the one that is later used by the agent as the base image).
2. Run the training script to register the task. This works fine and all dependencies work, i.e. the C++ packages work correctly in that container.
3. Execute the task on an agent running in docker mode with the same image the task was registered with, i.e. keras-hannd-cml. The task fails since it's somehow missing the C++ module.
I've sent you the most recent logs. Can you see anything incorrect with the above work process?
We have
ext_modules=[
    Extension(
        'leap.learn.data_tools.file_io.extio',
        sources=['leap/learn/data_tools/file_io/extio.cpp'],
        depends=['leap/learn/data_tools/file_io/samples.h'],
        define_macros=[('NPY_NO_DEPRECATED_API', 'NPY_1_9_API_VERSION')],
        extra_compile_args=['-std=c++11'],
        libraries=['rt'] if platform.system() == 'Linux' else [],
        include_dirs=[GetNumpyIncludeDirectoryLazy()],
        optional=True
    ),
in our setup.py, which I believe isn't being built correctly when the task is running on the agent.
Manually I was installing the leap package through python -m pip install . when building the docker container. My thinking was that when the task's environment was then replicated on the agent, the leap package would be installed correctly through its setup.py with the Extension which I've listed above.
AgitatedDove14 DM'd you the log file for the failed task. I have tried using a task startup script to install g++, gcc etc., but it didn't seem to work. I'll try build-essential too. I'm also interested in the way that the environments are set up in clearml. I read in the docs that the task looks for a requirements.txt file to construct the env, but does this prevent a local package being built correctly, i.e. through setup.py, when running a remote task?
So I see this in the build, which means it works and compiles. What is missing?
` Building wheels for collected packages: leap
Building wheel for leap (setup.py) ... done
Created wheel for leap: filename=leap-0.4.1-cp38-cp38-linux_x86_64.whl size=1052746 sha256=1dcffa8da97522b2611f7b3e18ef4847f8938610180132a75fd9369f7cbcf0b6
Stored in directory: /root/.cache/pip/wheels/b4/0c/2c/37102da47f10c22620075914c8bb4a9a2b1f858263021ca437
Successfully built leap
Installing collected packages: leap
Attempting uninstall: leap
Found existing installation: leap 0.4.1
Not uninstalling leap at /usr/local/lib/python3.8/dist-packages, outside environment /root/.clearml/venvs-builds/3.8
Can't uninstall 'leap'. No files were found to uninstall.
Successfully installed leap-0.4.1 `
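The "Not uninstalling leap at /usr/local/lib/python3.8/dist-packages, outside environment" line suggests two copies of leap may be visible to the interpreter. A minimal sketch for checking which copy would actually be imported inside the agent's venv follows. The helper name describe_package_location is made up for illustration and is not part of clearml or pip.

```python
import importlib.util
import sys
from pathlib import Path


def describe_package_location(name):
    """Return (location, inside_current_env) for a top-level module or package.

    Hypothetical helper for illustration; returns (None, False) if the
    name cannot be found on sys.path.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None, False
    if spec.submodule_search_locations:
        # Regular packages expose the directory they were found in.
        location = Path(list(spec.submodule_search_locations)[0])
    elif spec.origin and spec.origin not in ("built-in", "frozen"):
        # Single-file modules only have an origin path.
        location = Path(spec.origin)
    else:
        return spec.origin, False
    env_root = Path(sys.prefix).resolve()
    inside = env_root in location.resolve().parents
    return str(location), inside


if __name__ == "__main__":
    # "leap" is the package from this thread; pass another name to check it instead.
    target = sys.argv[1] if len(sys.argv) > 1 else "leap"
    location, inside = describe_package_location(target)
    print(f"{target}: location={location}, inside {sys.prefix}: {inside}")
```

If the reported location is under /usr/local/lib/python3.8/dist-packages rather than the venv, the task is importing the copy baked into the image, not the wheel the agent just built.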
Manually I was installing the leap package through python -m pip install . when building the docker container.
NaughtyFish36 what happens if you add /opt/keras-hannd to your "installed packages"? This should translate to "pip install /opt/keras-hannd", which seems like exactly what you want, no?
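To reproduce that install step by hand inside the container, a minimal sketch follows. It assumes the repo lives at /opt/keras-hannd as in the logs. The function names are made up for illustration, and it only mirrors the pip commands quoted in this thread, not what the agent does internally.

```python
import subprocess
import sys


def build_pip_install_command(package_path, no_deps=False):
    """Build the pip command for installing a local package directory."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if no_deps:
        # Matches the Dockerfile step quoted in this thread.
        cmd.append("--no-deps")
    cmd.append(package_path)
    return cmd


def install_local_package(package_path, no_deps=False):
    """Run the install with the current interpreter; raises on failure."""
    subprocess.run(build_pip_install_command(package_path, no_deps), check=True)


if __name__ == "__main__":
    # Path taken from the logs above; adjust if the repo is elsewhere.
    install_local_package("/opt/keras-hannd", no_deps=True)
```

Using sys.executable keeps the install tied to whichever interpreter runs the script, which matters here because the image's system Python and the agent's venv are separate environments.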
AgitatedDove14 Sorry, what .so are you referring to here? I can't see that in the logs. The docker image installs the package by first installing the requirements, i.e. RUN pip install --no-cache-dir -r /tmp/requirements.txt. The repo is then copied locally, and leap is installed through RUN cd /opt/keras-hannd && pip install --no-deps .
containing the Extension module
Not sure I follow, what is the Extension module? What were you running manually that is not just pip install /opt/keras-hannd?
AgitatedDove14 The issue seems to be that the setup.py containing the Extension module we need isn't being run in the clearml virtual environment within the docker container. What is the correct process for installing local packages so they're replicated correctly when running remotely on an agent?
NaughtyFish36
No module named 'leap.learn.data_tools.merge_data.merge_data'
This seems to be the error, but I cannot see leap in the installed packages. Notice that if the Task has an "Installed Packages" section, the agent will use that and not the "requirements.txt". Only if this section is empty will it revert to the "requirements.txt" in the repo.
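The fallback rule described above can be sketched as a small function. This is only an illustration of the rule as stated in this thread, not ClearML's actual code. The function name and the treatment of whitespace-only text as "empty" are assumptions.

```python
def choose_requirements_source(installed_packages_text, repo_requirements_text):
    """Pick which requirement list would be used, per the rule in this thread.

    Hypothetical illustration: a non-empty "Installed Packages" section wins,
    otherwise fall back to the repo's requirements.txt content.
    """
    installed = [
        line.strip()
        for line in installed_packages_text.splitlines()
        if line.strip()
    ]
    if installed:
        return "installed_packages", installed
    fallback = [
        line.strip()
        for line in repo_requirements_text.splitlines()
        if line.strip()
    ]
    return "requirements.txt", fallback


if __name__ == "__main__":
    source, entries = choose_requirements_source(
        "leap @ file:///opt/keras-hannd\n", "numpy\n"
    )
    print(source, entries)
```

The practical consequence is that editing requirements.txt in the repo has no effect on a Task whose "Installed Packages" section is already populated.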
How did you create the Task in the first place?
I see that you added "leap" to the initial bash script. Actually, you should add it to the requirements with Task.add_requirements("leap"), called before Task.init().
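A minimal sketch of that ordering in the training script follows. The wrapper function and its project_name and task_name arguments are placeholders for illustration, and whether this resolves the "isadirectory" error mentioned earlier is not confirmed in this thread.

```python
def register_task_with_leap(project_name, task_name):
    """Create the Task with "leap" recorded as a requirement.

    Hypothetical wrapper for illustration; the names passed in are placeholders.
    """
    # Imported lazily so the module can be loaded without clearml installed.
    from clearml import Task

    # add_requirements must be called before Task.init() to take effect.
    Task.add_requirements("leap")
    return Task.init(project_name=project_name, task_name=task_name)


if __name__ == "__main__":
    task = register_task_with_leap("example-project", "train-model")
```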
The point is, "leap" is properly installed; this is the main issue. And although it is installed, it is missing the ".so"? What am I missing? What are you doing manually that does not show in the log?
In other words, how did you install it "manually" inside the docker when you mentioned it worked for you when running without the agent?
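One way to answer the ".so" question directly is to list the compiled extension files in each installed copy of the package and compare them. A minimal sketch follows. The helper name is made up for illustration, and the directory shown in the usage line is the system site-packages path from the log. Note that the Extension in the setup.py above is declared with optional=True, which tells the build to skip the extension rather than abort if compilation fails, so a "Successfully built leap" line does not by itself prove the extension was compiled.

```python
import importlib.machinery
from pathlib import Path


def find_compiled_extensions(package_dir):
    """Return sorted relative paths of compiled extension files under package_dir.

    Hypothetical helper for illustration; matches the suffixes the running
    interpreter accepts for extension modules (e.g. ".so" on Linux).
    """
    root = Path(package_dir)
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    return sorted(
        str(path.relative_to(root))
        for path in root.rglob("*")
        if path.is_file() and path.name.endswith(suffixes)
    )


if __name__ == "__main__":
    # System copy from the log above; run again against the venv copy to compare.
    for rel_path in find_compiled_extensions("/usr/local/lib/python3.8/dist-packages/leap"):
        print(rel_path)
```

If the system copy lists extension files that the venv copy lacks, the build inside the agent's environment skipped or failed to compile them.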
function and just seem to be getting an "isadirectory" error?
Can you post here what you are getting? Which clearml version are you using?
also tried manually adding leap==0.4.1 in the task UI which didn't work.
That has to work, if it did not, can you send the log for the failed Task (or the Task that did not install it)?
The environment in the logs does show that leap is being installed potentially from a cache?
- leap @ file:///opt/keras-hannd
This is true. I have double-checked your logs and you are correct, it seems to be installed.
So I do not get how come you get ModuleNotFoundError: No module named 'leap.learn.data_tools.merge_data.merge_data'
Could it be you are installing the wrong version, or maybe the wrong package?
Is this the leap you need? Where do you install it from?
Lastly, does this still relate to the "build-essential" issue? It seems that we are talking about a whole different issue?!