Hi ApprehensiveFox95
I think this is what you are looking for:

step1 = Task.create(
    project_name='examples', task_name='pipeline step 1 dataset artifact',
    repo=' ',
    working_directory='examples/pipeline',
    script='step1_dataset_artifact.py',
    docker='nvcr.io/nvidia/pytorch:20.11-py3'
).id
step2 = Task.create(
    project_name='examples', task_name='pipeline step 2 process dataset',
    repo=' ',
    working_directory='examples/pipeline',
    script='step2_data_pr...
As I understand it, providing this param at Task.init() inside the subtask is too late, because the step has already started.
If you are running the task on an agent (which I assume you do), then one way would be to configure "default_output_uri" in the agent's clearml.conf file.
The other option is to change the task at creation time: task.storage_uri = 's3://...'
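For example, a minimal sketch of the second option (the bucket path and task arguments below are placeholders, not from the original message):

from clearml import Task

# create the step as a draft and point its output storage at S3 before it is enqueued;
# setting it here avoids relying on Task.init() inside the step, which would be too late
step1 = Task.create(
    project_name='examples',
    task_name='pipeline step 1 dataset artifact',
    script='step1_dataset_artifact.py',
)
step1.storage_uri = 's3://my-bucket/clearml-outputs'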
This is definitely a bug, in the super class it should have the same condition (the issue is checking whether you are trying to change the "main" task)
Thanks ApprehensiveFox95
I'll make sure we push a fix 🙂
Draft created successfully, but it doesn't contain a property with the docker command.
Could you help me?
ApprehensiveFox95 could you test with the latest RC, I think there was a fix:
pip install clearml==0.17.5rc5
the services queue (where the scaler runs) will be automatically exposed to the new EC2 instance?
Yes, using this extra_clearml_conf parameter you can add configuration that will be passed to the clearml.conf of the instances it will spin up.
Now, an example of the values you want to add:
agent.extra_docker_arguments: ["-e", "ENV=value"]
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L149
wdyt?
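Since whatever you put in extra_clearml_conf is appended to the clearml.conf of the spun-up instances, you can also pass S3 credentials that way. A rough sketch of such a value (the keys below are placeholders, and note these credentials will be visible to every task running on those instances):

agent.extra_docker_arguments: ["-e", "ENV=value"]
sdk.aws.s3.key: "<aws-access-key>"
sdk.aws.s3.secret: "<aws-secret-key>"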
AFAIK that's the only way right now (see my comment here - https://clearml.slack.com/archives/CTK20V944/p1657720159903739?thread_ts=1657699287.630779&cid=CTK20V944 )
Or then if you have the ClearML paid service, I believe there is a "vaults" service, right AgitatedDove14 ?
Yep UnevenDolphin73 :)
VexedCat68 actually a few users already suggested we auto log the dataset ID used as an additional configuration section, wdyt?
Yes, I was referring to logging the "clearml-data" Dataset ID on the Task itself, not to an external database.
Make sense?
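Until that is auto-logged, a hedged sketch of doing it manually (project, dataset and section names below are illustrative):

from clearml import Task, Dataset

task = Task.init(project_name='examples', task_name='training with dataset')
dataset = Dataset.get(dataset_project='examples', dataset_name='my_dataset')

# record the dataset ID as an extra configuration section on the task,
# so it shows up in the UI next to the other configuration objects
task.connect({'dataset_id': dataset.id}, name='datasets')

local_copy = dataset.get_local_copy()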
Hi ExuberantParrot61 the odd thing is this message:
No repository found, storing script code instead
when you are actually running from inside the repo...
is it saying that on a specific step, or is it on the pipeline logic itself?
Also any chance you can share the full console output ?
BTW:
you can manually specify a repo branch for a step:
https://github.com/allegroai/clearml/blob/a492ee50fbf78d5ae07b603445f4983feb9da8df/clearml/automation/controller.py#L2841
Example:
https:/...
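Roughly, it looks like this (a sketch only; the function, repository URL and branch below are placeholders, and the exact argument names may differ between clearml versions):

from clearml import PipelineController

def my_step(param):
    return param * 2

pipe = PipelineController(name='example pipeline', project='examples', version='1.0.0')

# run this step from a specific repository and branch instead of the pipeline's own repo
pipe.add_function_step(
    name='step_one',
    function=my_step,
    function_kwargs={'param': 21},
    repo='https://github.com/your-org/your-repo.git',
    repo_branch='my-feature-branch',
)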
Hi ExuberantParrot61
Is the pipeline logic code running from inside the repo?
Oh sorry, from the docstring, this will work:
:param bool continue_last_task: Continue the execution of a previously executed Task (experiment)
.. note::
When continuing the execution of a previously executed Task,
all previous artifacts / models / logs are intact.
New logs will continue iteration/step based on the previous-execution maximum iteration value.
For example:
The last train/loss scalar reported was iteration 100, the next report will b...
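In code that would look roughly like this (a sketch; project and task names are placeholders):

from clearml import Task

# continue the previously executed task instead of creating a new one;
# new scalar reports keep counting from the last reported iteration
task = Task.init(
    project_name='examples',
    task_name='training',
    continue_last_task=True,
)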
Hi VivaciousWalrus21
After restarting training huge gaps appear in iteration axis (see the screenshot).
The Task.init actually tries to understand what the last reported iteration was and continue from that iteration. I'm assuming your code does that as well, which creates the "double shift" you see as the jump. I think the next version will try to be "smarter" about it and detect this double gap.
In the meantime, you can do:
task = Task.init(...)...
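The original snippet is cut off here; one plausible shape of that workaround, assuming the intent is to reset the automatic iteration offset (the call below is my assumption, not a quote from the thread):

from clearml import Task

task = Task.init(project_name='examples', task_name='training', continue_last_task=True)

# undo the automatic "continue from last reported iteration" offset,
# since the training code already restores its own step counter
task.set_initial_iteration(0)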
VivaciousWalrus21 I took a look at your example from the github issue:
https://github.com/allegroai/clearml/issues/762#issuecomment-1237353476
It seems to do exactly what you expect, and it stores its own last iteration as part of the checkpoint. When running the example with continue_last_task=int(0), you get exactly what you expect.
(Do notice that TB visualizes these graphs in a very odd way, and it took me a few clicks to verify it...)
Hi VivaciousWalrus21 I tested the sample code, and the gap was evident in Tensorboard as well. This is not clearml generating this jump; it is internal (like the auto de/serialization and continuation of the code base)
Hi MotionlessCoral18
You can set all mount points here:
https://github.com/allegroai/clearml-agent/blob/6e31171d314a6e9b276c36d45314025783956b00/docs/clearml.conf#L241
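For reference, that section looks roughly like this (the keys and paths below are illustrative; check the linked clearml.conf template for the exact names and defaults):

agent {
    docker_internal_mounts {
        sdk_cache: "/clearml_agent_cache"
        pip_cache: "/root/.cache/pip"
        ssh_folder: "/root/.ssh"
    }
}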
So far, I modified the code to set DOCKER_ROOT_CONF_FILE to what I want!!!
Interesting, do you think a PR is a good next step? How would one configure it?
I think you are correct, we should move the definition so you can control it from the clearml.conf, make sense to you?
MotionlessCoral18 so did it solve the issue ?
GrumpySeaurchin29 you can pass S3 credentials for the autoscaler, but all the tasks will have them. Are you saying two different sets of credentials is the issue, or is it the visibility?
Hmm this is odd. When you press on the parent dataset in the UI, go to full-details, and then the INFO tab, can you copy everything from there here?
Everything seems correct...
Let's try to set it manually.
create a file ~/trains.conf , then copy paste the credentials section from the UI, it should look something like:

api {
    web_server: http://127.0.0.1:8080
    api_server: http://127.0.0.1:8008
    files_server: http://127.0.0.1:8081
    credentials {
        "access_key" = "access"
        "secret_key" = "secret"
    }
}
Let's see if that works
BTW: I think it was fixed in the latest trains package as well as the clearml package