Hi SubstantialElk6
32 CPU cores, 64GB ram
Should be plenty; this sounds like a network bottleneck issue, I can't imagine the server is actually CPU bound
Yeah, the "-e ." seems to fit this problem the best.
🙂
It seems like whatever I add to
docker_bash_setup_script
is having no effect.
If this is running with the k8s glue, the console output of the `docker_bash_setup_script` is currently not logged into the Task (this bug will be solved in the next version), but the code is being executed. You can see the full logs with kubectl, or test with a simple export in the
docker_bash_setup_script:
export MY...
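A minimal way to verify the script actually runs (the variable name and echo line here are hypothetical, just for the check):

```shell
# Hypothetical sanity check: put these lines in docker_bash_setup_script,
# then look for the echo line in the pod logs via kubectl
export MY_TEST_VAR=hello
echo "setup script ran, MY_TEST_VAR=${MY_TEST_VAR}"
```

If the echo shows up in `kubectl logs <pod>`, the script is executing even though it is not (yet) mirrored into the Task console.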
Quite hard for me to try this right
🙂
How do I reproduce it ?
clearml-agent daemon --detached --queue manual_jobs automated_jobs --docker --gpus 0
If the user running this command can run "docker run", then you should be fine
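A quick way to check that (a sketch; it only tests docker daemon access, not GPU passthrough):

```shell
# Does the user running the agent have access to the docker daemon?
if docker info >/dev/null 2>&1; then
  echo "docker access OK"
else
  echo "no docker access (is the user in the docker group?)"
fi
```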
A few implementation / design details:
When you run code with Trains (and call init) it will record your environment (python packages, git code, uncommitted changes, etc.). Everything is stored on the Task object in the trains-server.
When you clone a task you literally create a copy of the Task object (i.e. a second experiment). On the cloned experiment you can edit everything (parameters, git, base docker image, etc.).
When you enqueue a Task you add its ID to the execution queue list; a trains-a...
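The clone-and-enqueue flow above can be sketched like this (the task ID and queue name are placeholders; an installed and configured clearml/trains SDK is assumed):

```python
def clone_and_enqueue(source_task_id, queue_name="default", new_name=None):
    """Clone an existing Task (a full copy of the Task object) and
    add the clone's ID to an execution queue. Sketch only."""
    from clearml import Task  # requires clearml installed and a configured server

    cloned = Task.clone(source_task=source_task_id, name=new_name)
    Task.enqueue(cloned, queue_name=queue_name)
    return cloned
```

An agent listening on `queue_name` would then pull the cloned experiment and reproduce the recorded environment.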
Yeah I can write a script to transfer it over, I was just wondering if there was a built in feature.
Unfortunately no 🙂
Maybe if you have a script we can put it somewhere?
How is this different from argparser btw?
Not different, just a dedicated section 🙂 Maybe we should do that automatically; the only "downside" is you will have to name the Dataset when getting it (so it will have an entry name in the Dataset section), wdyt?
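Naming the Dataset would mean fetching it by name, roughly like this (the project and name values are placeholders):

```python
def get_named_dataset(project="my_project", name="my_dataset"):
    """Fetch a Dataset by its entry name in the Dataset section. Sketch only."""
    from clearml import Dataset  # requires clearml installed and a configured server

    return Dataset.get(dataset_project=project, dataset_name=name)
```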
LOL, Let me look into it, could it be the calling file is somehow deleted ?
SmarmySeaurchin8 could you test with the latest RC?
pip install clearml==0.17.5rc2
Hi @<1523701949617147904:profile|PricklyRaven28>
I'm trying to figure out if I have a way to report pipeline-step artifact paths in the main pipeline task (so I don't need to dig into steps to find the artifacts).
Basically this is the monitor_artifacts argument
:param monitor_artifacts: Optional, log the step's artifacts on the pipeline ...
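A hedged sketch of wiring that up with `PipelineController.add_function_step` (pipeline, step, and artifact names are placeholders):

```python
def train_step():
    # Placeholder step body; imagine it uploads an artifact named "model"
    pass

def build_pipeline():
    """Sketch: ask the pipeline to mirror a step artifact onto the pipeline Task."""
    from clearml.automation import PipelineController  # requires clearml installed

    pipe = PipelineController(name="example-pipeline", project="examples", version="1.0")
    pipe.add_function_step(
        name="train",
        function=train_step,
        monitor_artifacts=["model"],  # log this step artifact on the pipeline Task
    )
    return pipe
```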
Hi LazyFish41
Could it be some permission issue on /home/quetalasj/.clearml/cache/ ?
it works if I run the same command manually.
What do you mean?
Can you do:
docker run -it <my container here> bash
Then immediately get an interactive bash?
NastyFox63 ask SuccessfulKoala55 tomorrow, I think there is a way to change the default settings even with the current version.
(I.e. increase the default 100 entries limit)
Hi SharpDove45
what was suggested about how it fails on bad/missing credentials
Yes, this is correct; since you specifically set the hosts, worst case you will end up with wrong credentials 🙂
pass: task_filter=dict(system_tags=['-archived'])
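For example, with `Task.get_tasks` (the project name is a placeholder):

```python
def get_non_archived_tasks(project_name="my_project"):
    """Query tasks while excluding archived ones. Sketch only."""
    from clearml import Task  # requires clearml installed and a configured server

    return Task.get_tasks(
        project_name=project_name,
        task_filter=dict(system_tags=['-archived']),  # '-' prefix excludes the tag
    )
```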
Hi @<1692345677285167104:profile|ThoughtfulKitten41>
Is it possible to trigger a pipeline run via API?
Yes! A pipeline is, in the end, a Task: you can take the pipeline ID, clone it, and enqueue it
pipeline_task = Task.clone("pipeline_id_here")
Task.enqueue(pipeline_task, queue_name="services")
You can also monitor the pipeline with the same Task interface.
wdyt?
Is there any way to make that increment from last run?
pipeline_task = Task.clone("pipeline_id_here", name="new execution run here")
Task.enqueue(pipeline_task, queue_name="services")
wdyt?
SubstantialElk6 on the client side?
It seems to try to pull with SSH credentials; add your user/pass (or better, an API key) to the clearml.conf
(look for git_user /git_pass)
Should solve the issue
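In clearml.conf it would look roughly like this (values are placeholders; a Git API key/token is preferable to a password):

```
agent {
    git_user: "my_git_user"
    git_pass: "my_git_api_key"
}
```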
SoggyBeetle95 maybe it makes sense to configure the agent with an access-all credentials? Wdyt
SoggyBeetle95 you can configure the credentials in the clearml.conf running on the agent machines:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L320
(I'm assuming these are storage credentials)
If you need general-purpose env variables, you can add them here:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L149
with ["-e", "MY_VAR=MY_VALUE"]
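I.e. something like this in the agent's clearml.conf (the variable/value pair is the placeholder from above):

```
agent {
    extra_docker_arguments: ["-e", "MY_VAR=MY_VALUE"]
}
```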
FierceRabbit20 it seems the Pipeline Task that was created is missing the "installed requirements" section. How are you creating the actual pipeline Task? is this from code?
EnviousPanda91 please feel free to PR if it works 🙂
https://github.com/allegroai/clearml/blob/86586fbf35d6bdfbf96b6ee3e0068eac3e6c0979/clearml/binding/frameworks/catboost_bind.py#L114
Go to the Workers & Queues page, right side panel, 3rd icon from the top
Hi NastyFox63 yes I think the problem was found (actually backend side).
It will be solved in the upcoming release (due after this weekend 🙂)
https://clear.ml/docs/latest/docs/references/sdk/task#mark_stopped
Maybe we should add an argument so you could do:
mark_stopped(force=False, message='it was me who stopped it')
And we will automatically add the user name as well?
Ohh, then yes, you can use the https://github.com/allegroai/clearml/blob/bd110aed5e902efbc03fd4f0e576e40c860e0fb2/clearml/automation/monitor.py#L10 class to monitor changes in the dataset/project