Basically it hooks into any torch.save function (monkey patching in real time)
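To illustrate the idea (a conceptual sketch only, not ClearML's actual implementation; the names here are made up):
` import torch

# Keep a reference to the original function, then replace it with a wrapper
_original_torch_save = torch.save

def _patched_save(obj, f, *args, **kwargs):
    # A framework could register/upload the saved model here before delegating
    print(f"torch.save intercepted, target: {f}")
    return _original_torch_save(obj, f, *args, **kwargs)

torch.save = _patched_save `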
Ohh, yes, we need to map the correct clearml.conf, sorry, try (I fixed both the clearml.conf mapping and the .ssh folder mapping):
` docker run -t --gpus "device=1" -e CLEARML_WORKER_ID=Gandalf:gpu1 -e CLEARML_DOCKER_IMAGE=nvidia/cuda:11.4.0-devel-ubuntu18.04 -v /home/dwhitena/.git-credentials:/root/.git-credentials -v /home/dwhitena/.gitconfig:/root/.gitconfig -v /home/dwhitena/clearml.conf:/root/clearml.conf -v /home/dwhitena/.ssh:/root/.ssh -v /home/dwhitena/.clearml/apt-cache.1:/var/cache/apt/arc...
Can you send the console output of this entire session please ?
Wait IrritableOwl63 this looks like it worked, am I right ? huggingface was correctly installed
Yes my bad 😞
Let's try again:
` docker run -it --gpus "device=1" -e CLEARML_WORKER_ID=Gandalf:gpu1 -e CLEARML_DOCKER_IMAGE=nvidia/cuda:11.4.0-devel-ubuntu18.04 -v /home/dwhitena/.git-credentials:/root/.git-credentials -v /home/dwhitena/.gitconfig:/root/.gitconfig -v /tmp/.clearml_agent.7rjdh80a.cfg:/root/clearml.conf -v /tmp/clearml_agent.ssh.ppsd9sze:/root/.ssh -v /home/dwhitena/.clearml/apt-cache.1:/var/cache/apt/archives -v /home/dwhitena/.clearml/pip-cache:/root/.cache/pip ...
Also in the same open docker session, can you try:
$LOCAL_PYTHON -m clearml_agent execute --disable-monitoring --id <task_id_here>
Where the Task ID is one of the failed executions (just reset it first)
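BTW if the UI is less convenient, the reset can also be done from Python (a sketch, assuming Task.reset() is available in your clearml version; the ID is a placeholder):
` from clearml import Task

# Fetch the failed Task by ID and reset it so the agent can execute it again
task = Task.get_task(task_id="<task_id_here>")
task.reset() `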
My typos are killing us, apologies :
change -t
to -it
it will make it interactive (i.e. you can use bash 🙂 )
CluelessFlamingo93 I would also fix the pip version requirements to:
pip_version: ["<20.2 ; python_version < '3.10'", "<22.3 ; python_version >= '3.10'"]
BattyLion34 I have a theory, I think that any Task on the "default" queue will fail if a Task is running on the "service" queue.
Could you create a toy Task that just prints ".", sleeps for 5 seconds, and then prints again?
Then while that Task is running, from the UI launch the Task that passed on the "default" queue. If my theory holds it should fail, and then we will be getting somewhere 🙂
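Something like this should do as the toy Task (a minimal sketch, project/task names are just placeholders):
` import time
from clearml import Task

# Toy Task: print, sleep 5 seconds, print again
task = Task.init(project_name="debug", task_name="toy sleep task")
print(".")
time.sleep(5)
print(".") `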
Hi RipeGoose2
when creating a task the default path is still there
What do you mean by "path"? Do you want to provide a path for the config file? Is this for trains manual execution or for the agent?
hmm that would explain it failing
Let me take a look, what's the clearml-server version and clearml python version?
For example, opening a project or experiment page might take half a minute.
This implies a MongoDB performance issue
What's the size of the mongo DB?
You can see the class here:
https://github.com/allegroai/clearml/blob/9b962bae4b1ccc448e1807e1688fe193454c1da1/clearml/binding/frameworks/__init__.py#L52
Basically you do:
` from clearml.binding.frameworks import WeightsFileHandler

def my_callback(load_or_save, model):
    # type: (str, WeightsFileHandler.ModelInfo) -> WeightsFileHandler.ModelInfo
    assert load_or_save in ('load', 'save')
    # do something, e.g. decide whether this model should be skipped
    skip = False
    if skip:
        return None
    return model

WeightsFileHandler.add_pre_callback(my_callback) `
Yey 🙂 !
So now you can add some logic based on the model object passed as the second argument (see WeightsFileHandler.ModelInfo).
The easiest is to base it on the model name, see model.local_model_path
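For example (just a sketch, the filename check is an arbitrary illustration):
` def my_callback(load_or_save, model):
    # Skip registering checkpoints whose local file name contains "tmp"
    if load_or_save == 'save' and 'tmp' in (model.local_model_path or ''):
        return None
    return model `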
Found the issue, fix in the next RC (soon to be out)
RipeGoose2 you mean to have the preview html on S3 work as expected (i.e. click on it, add credentials, open in a new tab)?
Oh I do not think this is possible, this is really deep in a background thread.
That said we can sample the artifacts and re-register the html as a debug media:
` url = Task.current_task().artifacts['notebook preview'].url
Task.current_task().get_logger().report_media('notebook', 'notebook', iteration=0, url=url) `
Once the html is uploaded, it will keep updating on the same link so no need to keep registering the "debug media". wdyt?
Hi RoughTiger69
seems to not take the packages that are in the requirements.txt
The reason for not taking the entire list of python packages is that it will most likely break when trying to run inside the agent.
The directly imported packages will essentially pull in their required packages, and thus create a stable env on the remote machine. The agent will then store the entire env, as it assumes it will be able to fully replicate it the next time it runs.
If the "Installed Packages" section is empty...
RipeGoose2
HTML file is not a standalone and has some dependencies that require networking..
Really? I thought that when jupyter converts its own notebook it packages everything into a single html, no?
RipeGoose2 yes that will work 🙂
That said, we should probably fix the S3 credentials popup 😉
there is probably some way to make an S3 path open up in the browser by default
You should have a pop-up asking for credentials ...
Could you check that if you add the credentials in the profile page it works ?
That's the theory, I still see it is not there
Any chance you can zip the entire folder? I can't figure out what's missing, specifically "from config_files", i.e. I have no package or file named config_files
Python3.8 I can quickly check, give me a minute
well at this point I'm not sure it is still essential, we have 3 run modes (offline, local-server, cloud-server) and this option made it work for all of them.. it could be that it is not required anymore and it's just legacy..
LOL, sure if you have so many setups, that makes sense 🙂
this is strange.. you ran it with the dataclass config I added?
Yes but I had to remove the:
from config_files import cfg
and instead used:
` @hydra.main(config_path="config_files", config_name="confi...
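For reference, the full pattern looks roughly like this (a sketch; the config name "config" and the printout are placeholders):
` import hydra
from omegaconf import DictConfig

# cfg is loaded from config_files/<config_name>.yaml instead of being imported directly
@hydra.main(config_path="config_files", config_name="config")
def main(cfg: DictConfig):
    print(cfg)

if __name__ == "__main__":
    main() `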
Hi DepressedChimpanzee34
Why do you need to have the configuration added manually? Isn't the clearml.conf easier? If not, I think OS environment variables are easier, no? I ran the above code, everything worked with no exception/warning... What does the try/except solve exactly?
RipeGoose2 That sounds familiar. Could you test with the latest RC?
pip install trains==0.16.4rc0
I have a client that runs clearml-session and I saw from the agent's logs that the installation of vscode fails.
That makes sense, it downloads vscode at runtime. Do you have an alternative location? Or maybe it is easier to build a container with vscode pre-installed?