What if I register the artifact manually?
task.upload_artifact('local folder', artifact_object=' ')
This one should be quite quick, it's updating the experiment
It fails during the add_step stage for the very first step, because task_overrides contains invalid keys
I see, yes I guess it makes sense to mark the pipeline as Failed 🙂
Could you add a GitHub issue on this behavior, so we do not miss it ?
Hi NonchalantGiraffe17
You mean this documentation?
https://clear.ml/docs/latest/docs/references/api/tasks#post-tasksclone
However, the pipeline experiment is not visible in the project experiment list.
I mean press on the "full details" in the pipeline page
Well, that depends on how you think about the automation. If you are running your experiments manually (i.e. you specifically call/execute them), then at the beginning of each experiment (or function) call Task.init, and when you are done call Task.close. This can be done in parallel if you are running them from separate processes.
If you want to automate the process, you can start using the trains-agent which could help you spin those experiments on as many machines as you l...
I believe AnxiousSeal95 is.
ElatedFish50 any specific reason for the question?
Hi @<1523702969063706624:profile|PoisedShark13>
However, the INSTALLED PACKAGES of my task misses many of the installed packages (any idea why?)
It automatically detects the directly imported packages, literally analyzing your code base and looking for imports
The derivative packages (i.e. the ones that any of the "main" packages need) will be listed after the first time the agent installs everything
If something specific is missing, you can manually add it with:
Task.add_requiremen...
I can share some code
Please do 🙂
Thank you @<1689446563463565312:profile|SmallTurkey79> !!!
Hmm that is a good question, are you mounting the clearml.conf somehow ?
repeat it until they are all dead 🙂
Hi ApprehensiveFox95
I think this is what you are looking for:

step1 = Task.create(
    project_name='examples', task_name='pipeline step 1 dataset artifact',
    repo=' ',
    working_directory='examples/pipeline',
    script='step1_dataset_artifact.py',
    docker='nvcr.io/nvidia/pytorch:20.11-py3'
).id
step2 = Task.create(
    project_name='examples', task_name='pipeline step 2 process dataset',
    repo=' ',
    working_directory='examples/pipeline',
    script='step2_data_pr...
No worries, and I will make sure we output a warning if section names are not used 🙂
Okay, let's take a step back and I'll explain how things work.
When running the code (initially) and calling Task.init
A new experiment is created on the server; it automatically stores the git repo link, commit ID, and the local uncommitted changes. These are all stored on the experiment in the server.
Now assume the trains-agent is running on a different machine (which is effectively always the case, even if it is physically the same machine).
The trains-agent will create a new virtual-environmen...
Wait I might be completely off.
Does this line "hang"?
task.execute_remotely(..., exit_process=True)
OutrageousGrasshopper93 is "--gpus all" working ?
Okay, I'll make sure we always quote ", since it seems to work either way.
We will release an RC soon, with this fix.
Sounds good?
Hi OutrageousGrasshopper93
Are you working with venv or docker mode?
Also notice that if you need all GPUs you can pass --gpus all
The Cloud Access section is in the Profile page.
Any storage credentials (S3 for example) are only stored on the client side (never on the trains-server); this is the reason we need to configure them in trains.conf. When the browser needs to access those URLs (e.g. when downloading an artifact) it also needs the secret/key, so it automatically displays a popup requesting them, and stores them in this section. Notice they are stored in the browser session (as a cookie).
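For reference, this is roughly what the client-side S3 credentials section looks like in trains.conf / clearml.conf (the bucket name and keys are placeholders):

```
sdk {
    aws {
        s3 {
            credentials: [
                {
                    bucket: "my-bucket"
                    key: "AWS_ACCESS_KEY_ID"
                    secret: "AWS_SECRET_ACCESS_KEY"
                }
            ]
        }
    }
}
```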
BTW:
Error response from daemon: cannot set both Count and DeviceIDs on device request.
Googling it points to a docker issue (which makes sense considering):
https://github.com/NVIDIA/nvidia-docker/issues/1026
What is the host OS?
Hi OutrageousGrasshopper93
When the Task is executed on a worker, the presence of spaces breaks the URLs, and from the UI I cannot access the resources on the bucket
You are saying the URLs generated in a remote execution are "broken" and on local execution are working, even though it is the same project/task name ?
Maybe we should rename it?! it actually creates a Task but will not auto connect it...
Hmm seems like everything is working, can you check in the UI if you see the serving session ID in the DevOps project? Maybe there are two, and you configured one and the docker-compose is running another?
Hmm, notice that it does store symlinks to parent data versions (to save on multiple copies of the same file). If you call get_mutable_local_copy() you will get a standalone copy.