Ohh I see.
In your web app, look for the "?" icon (bottom-left corner) and click on it; it should open the full platform documentation
Actually with `base-task-id` it uses the cached venv, thanks for this suggestion! Seems like this is equivalent to cloning via UI.
Exactly!
But “cloning” via the UI runs an exact copy of the code/config, not a variant. You can override the commit/branch and get the latest ...
Run the experiment, tweak the code/configs in the IDE, or tweak the configs via the CLI, and have it re-run in the exact same venv (with no install overhead, etc.). So you can actually launch it remotely directly from the code:
...
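Roughly something like this (just a sketch, assuming Task.execute_remotely is what's meant; project/queue names are placeholders):
```python
from clearml import Task

# placeholder project / task / queue names
task = Task.init(project_name="examples", task_name="my experiment")

# stop executing locally and enqueue this task to run on an agent;
# the agent sets up (or reuses its cached) venv and runs the code remotely
task.execute_remotely(queue_name="default", exit_process=True)
```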
it will constantly try to resend logs
Notice this happens in the background; in theory you will just get stderr messages when it fails to send, but the training should continue
Thank you @<1523701949617147904:profile|PricklyRaven28> !!!
Let me see if we can reproduce and how to solve it
Hi ProudChicken98
How about saving it as a local YAML and uploading the file itself as an artifact?
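Something along these lines (a sketch; names and values are placeholders):
```python
import yaml
from clearml import Task

task = Task.init(project_name="examples", task_name="upload config")

# placeholder config - dump it to a local YAML file
config = {"lr": 0.001, "batch_size": 32}
with open("config.yaml", "w") as f:
    yaml.safe_dump(config, f)

# upload the file itself as an artifact attached to the task
task.upload_artifact(name="config", artifact_object="config.yaml")
```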
So to conclude: it has to be executed manually first, then with trains-agent?
Yes. That said, as you mentioned, you can always edit the "installed packages" manually once; from that point on you are basically cloning the experiment, including the "installed packages", so it should work if the original worked.
Make sense?
Hi PanickyMoth78
Yes, I think you are correct, this looks like GS throttling your connection. You can control the number of concurrent uploads with `max_workers=1`
https://github.com/allegroai/clearml/blob/cf7361e134554f4effd939ca67e8ecb2345bebff/clearml/datasets/dataset.py#L604
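Roughly like this (a sketch; the bucket path is a placeholder, and the exact argument name may vary a bit between clearml versions):
```python
from clearml import Dataset

dataset = Dataset.create(dataset_project="examples", dataset_name="my dataset")
dataset.add_files(path="./data")

# limit concurrent uploads to 1 to avoid GS throttling the connection
dataset.upload(output_url="gs://my-bucket/datasets", max_workers=1)
dataset.finalize()
```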
Let me know if it works
Or do you want to generate it from a previously executed run?
MelancholyElk85 if you are manually adding models with OutputModel, then when you call update_weights(...) the upload will start in the background (if the process ends, it will wait until the upload is completed). You can also specify auto_delete_file, which will delete the local copy once the upload completes
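For example, something like this (a sketch; the file name is a placeholder):
```python
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="manual model upload")

output_model = OutputModel(task=task, framework="PyTorch")

# upload starts in the background; the local copy is deleted once the upload completes
output_model.update_weights(weights_filename="model.pt", auto_delete_file=True)
```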
JitteryCoyote63 How can I reproduce it quickly?
Isn't that risky? Not knowing you need a package?
How do you actually install it on the remote machine with the agent?
JitteryCoyote63
IAM role to the web app could access
you mean the web client key/secret to access the S3 data?
SweetGiraffe8
That might be it, could you test with the Demo server?
So the naming is a byproduct of the many TB files created (one per experiment); if you add different naming to the TB files, then this is what you'll be seeing in the UI. Make sense?
Will they get ordered ascending or descending?
Good point, I'll check the docs... but I think they do not specify
https://clear.ml/docs/latest/docs/references/sdk/task#taskget_tasks
From the code it seems the order is not guaranteed.
You can, however, pass order_by=['-last_update'], which will give you the latest updated first:
```
task_filter = {
    'page_size': 2,
    'page': 0,
    'order_by': ['last_metrics.{}.{}'.format(title, series), '-last_update']
}
Task.get_tasks(..., task_filter=task_filter)
```
I'm glad you were able to solve the issue!
WackyRabbit7 I could not reproduce it, what did you pass in "GOOGLE_APPLICATION_CREDENTIALS"?
is there something else in the conf that I should change?
I'm assuming the google credentials?
https://github.com/allegroai/clearml/blob/d45ec5d3e2caf1af477b37fcb36a81595fb9759f/docs/clearml.conf#L113
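For reference, GOOGLE_APPLICATION_CREDENTIALS usually points at the service-account JSON key, e.g. (path is a placeholder):
```python
import os

# path to your GCP service-account JSON key (placeholder);
# must be set before the upload/download starts
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "/path/to/service_account.json"
```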
Hi SubstantialElk6
Yes, this is the queue the glue will pull jobs from and push into k8s. You can create a new queue from the UI (go to the Workers & Queues page, open the Queues tab, and press "create new").
Ignore it 🙂 this is if you are using config maps and need TCP routing to your pods.
As you noted, this is basically all the arguments you need to pass for (2). Ignore them for the time being.
This is the k8s overrides to use if launching the k8s job with kubectl (basically --override...
The log is missing, but the Kedro logger prints to sys.stdout in my local terminal.
I think the issue might be that it starts a new subprocess, and that subprocess is not "patched" to capture the console output.
That said, if an agent is running the entire pipeline, then everything is logged from the outside, so whatever is written to stdout/stderr is captured.
Hmm I assume it is not running from the code directory...
(I'm still amazed it worked the first time)
Are you actually using "." ?
Yes, in tandem with the experiments (because they constantly log to the server).
That said, with 0.16 we added offline mode, so you can run in offline mode, then import the experiment into the system.
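Something like this (a sketch; the zip path is whatever the offline run prints when it finishes):
```python
from clearml import Task

# run fully offline - nothing is sent to the server, everything is stored locally
Task.set_offline(offline_mode=True)
task = Task.init(project_name="examples", task_name="offline run")
# ... training code ...
task.close()

# later, import the offline session into the server
Task.import_offline_session(session_folder_zip="/path/to/offline_session.zip")
```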
SubstantialElk6
Hmm do you have torch in the "installed packages" section of the Task?
(This is what the agent uses to set up the environment inside the docker, running as a pod)
Sure, set the OS environment variable `CLEARML_NO_DEFAULT_SERVER=1`
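e.g. from Python before anything from clearml is used (or just export it in your shell):
```python
import os

# must be set before clearml loads its configuration,
# so the SDK won't fall back to the default demo server
os.environ["CLEARML_NO_DEFAULT_SERVER"] = "1"
```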