Wait @<1715900788393381888:profile|BitingSpider17>, are you passing it on a single Task? These values are read by the daemon (i.e. the agent running on the host), which means they are not taken from the Task context, so setting them there has no effect on the mount points.
Notice that in newer versions of clearml-agent the SDK mount point was changed to sdk_cache: "/clearml_agent_cache", specifically to support non-root containers:
https://github.com/allegroai/clearml-agent/blob/6b31883e4579...
That experiment says it's completed; does that mean the autoscaler is running or not?
Not running. It would show as "running" if it were actually being executed.
I think we were able to fix it; let me check whether it was pushed.
Are you doing "from keras import ..." or "from tensorflow.keras import ..."?
Hmm, so the concept of "company"-wide configuration is supported in the enterprise version.
I'm trying to think of a "hack" to just pass these env/conf ...
How are you spinning the agent machines?
I have a task where I create a dataset, but I also create a set of matplotlib figures, some numeric statistics and a pandas table describing the data, which I want associated with the dataset and viewable from the ClearML web page for the dataset.
Oh sure, use the Dataset's get_logger() (https://clear.ml/docs/latest/docs/references/sdk/dataset#get_logger); whatever you report through it will be visible on the Dataset page for that version.
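A minimal sketch of that flow, assuming a recent clearml SDK: the figures, the numeric stats and the table all go through the logger returned by Dataset.get_logger(). The helper names (log_dataset_summary, create_dataset_with_summary) and the titles/series strings are illustrative only, and exactly which UI tab each report type lands in may depend on your version. The clearml import is deferred so the logging helper can be tried without a server.

```python
from typing import Any, Dict


def log_dataset_summary(logger: Any, figures: Dict[str, Any],
                        stats: Dict[str, float], table: Any) -> None:
    """Report figures, scalar stats and a table through a ClearML Logger.

    `logger` is expected to be the object returned by Dataset.get_logger();
    `table` can be a pandas DataFrame.
    """
    for title, fig in figures.items():
        # one plot per matplotlib figure
        logger.report_matplotlib_figure(title=title, series="summary",
                                        figure=fig, iteration=0)
    for name, value in stats.items():
        # one scalar per statistic, grouped under a single "stats" title
        logger.report_scalar(title="stats", series=name, value=value, iteration=0)
    # the descriptive table (e.g. df.describe())
    logger.report_table(title="summary", series="table", iteration=0, table_plot=table)


def create_dataset_with_summary(project: str, name: str, folder: str,
                                figures: Dict[str, Any], stats: Dict[str, float],
                                table: Any) -> str:
    # Deferred import: only this function needs clearml and a configured server.
    from clearml import Dataset

    ds = Dataset.create(dataset_project=project, dataset_name=name)
    ds.add_files(path=folder)
    log_dataset_summary(ds.get_logger(), figures, stats, table)
    ds.upload()
    ds.finalize()
    return ds.id
```

Keeping the reporting in its own function means the same summary code can also be pointed at a regular Task's logger if you ever want it there instead.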
Hi IrritableJellyfish76
https://clear.ml/docs/latest/docs/references/sdk/task#taskget_tasks
task_name (str) - The full name or partial name of the Tasks to match within the specified project_name (or all projects if project_name is None). This method supports regular expressions for name matching. (Optional)
You are right, this is a bit confusing. I will make sure that we add in the docstring an examp...
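Since task_name is treated as a regular expression, a partial name matches any Task that contains it. A minimal sketch for matching one exact name, assuming anchors (^...$) and backslash escapes behave the same in the server's regex engine as in Python's re (an assumption, not something stated above); exact_name_regex and get_tasks_by_exact_name are hypothetical helper names:

```python
import re


def exact_name_regex(name: str) -> str:
    """Build a regex that matches `name` exactly, not as a substring."""
    # re.escape neutralises characters such as "(" or "." that often appear in task names
    return "^" + re.escape(name) + "$"


def get_tasks_by_exact_name(project_name: str, name: str):
    from clearml import Task  # deferred: needs a configured ClearML server

    return Task.get_tasks(project_name=project_name, task_name=exact_name_regex(name))
```

Without the escaping, a name like "step one (v2)" would be read as a regex group and would not match itself.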
I can see that the data is reloaded each time, even if the machine was not shut down in between.
You can verify this in the Task's log: it contains all the docker arguments, and one of them should be the cache folder mount.
(Once you have verified the fix, open a PR and I'll make sure it is merged.)
Hi HappyLion37
It seems that you are "reusing" the Tasks, which means that the second time you open them you are essentially resetting the old run and starting over.
Try to do:
from clearml import Task

task1 = Task.init('examples', 'step one', reuse_last_task_id=False)
print('do stuff')
task1.close()
task2 = Task.init('examples', 'step two', reuse_last_task_id=False)
print('do some more stuff')
task2.close()
but not as a component (using the decorator)
Hmm yes, I think that a component calling another component as an external component is not supported yet.
(Basically the difference is whether it actually runs as a function, or runs on a different machine as another pipeline component.)
I noticed that when a pipeline step returns an instance of a class, it tries to pickle.
Yes, this is how the serialization works when we pass data from one node to another (by design it supports multiple mach...
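Because a step's return value gets pickled on the way to the next node, whatever a step returns has to survive pickle. A minimal pure-Python sketch of what does and does not pass, assuming plain pickle semantics (the class and helper names here are illustrative, not part of clearml):

```python
import pickle
import threading
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class StepResult:
    # Plain data fields pickle cleanly; define the class at module level so the
    # receiving node can import it by name.
    name: str
    scores: List[float] = field(default_factory=list)


class ResultWithLock:
    # Holding an OS-level resource (here a lock) makes the object unpicklable.
    def __init__(self) -> None:
        self.lock = threading.Lock()


def round_trip(obj: Any) -> Any:
    """Serialize then deserialize, a local stand-in for passing data between nodes."""
    return pickle.loads(pickle.dumps(obj))


def is_picklable(obj: Any) -> bool:
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False
```

If a step really needs to hand over something like a client or a lock, returning the data needed to rebuild it (paths, IDs, config) is usually the easier route.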
Using this, is it possible to add to a task's requirements with task_overrides?
Correct, but you will be replacing (not adding) requirements
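Since the override replaces the list, "adding" means sending the existing requirements plus the new ones. A minimal sketch of that merge; the override key "script.requirements.pip" is a hypothetical assumption (check which field your clearml version expects), and where the current requirements text comes from (e.g. the base step Task) is left to you:

```python
import re
from typing import Dict, Iterable


def _package_name(line: str) -> str:
    # "pandas>=2.0" -> "pandas", "torch[cuda]==2.1" -> "torch"
    return re.split(r"[<>=!~\[;@ ]", line.strip(), maxsplit=1)[0].lower()


def merged_requirements(current: str, extra: Iterable[str]) -> str:
    """Return the full requirements text: current lines plus `extra`.

    The override replaces the whole list, so existing entries are carried over;
    an extra entry wins over an existing one for the same package.
    """
    extra = list(extra)
    extra_names = {_package_name(e) for e in extra}
    kept = [
        line.strip() for line in current.splitlines()
        if line.strip() and not line.strip().startswith("#")
        and _package_name(line) not in extra_names
    ]
    return "\n".join(kept + extra) + "\n"


def requirements_override(current: str, extra: Iterable[str]) -> Dict[str, str]:
    # ASSUMPTION: "script.requirements.pip" is a hypothetical override key for the
    # step's pip requirements; verify it against your clearml version.
    return {"script.requirements.pip": merged_requirements(current, extra)}
```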
Hi SubstantialElk6
32 CPU cores, 64GB ram
That should be plenty. This sounds like a network bottleneck; I can't imagine the server is actually CPU-bound.
Yeah, the "-e ." seems to fit this problem best.
It seems like whatever I add to docker_bash_setup_script is having no effect.
If this is running with the k8s glue, the console output of docker_bash_setup_script is currently not logged into the Task (this bug will be solved in the next version), but the code is being executed. You can see the full logs with kubectl, or test with a simple export in docker_bash_setup_script:
export MY...
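A minimal sketch of such a check, assuming the Task's setup script is set through Task.set_base_docker(docker_image=..., docker_setup_bash_script=...). The variable name SETUP_SCRIPT_CHECK and both helper names are hypothetical. The echo line is there so the value shows up in kubectl logs even while the Task console misses it; whether the exported variable also reaches the Task process (os.environ) is an assumption worth testing on your setup.

```python
import shlex
from typing import Any, Dict, List


def build_setup_script(env_vars: Dict[str, str]) -> List[str]:
    """Bash lines that export each variable and echo it back for the pod log."""
    lines = []
    for name, value in env_vars.items():
        lines.append(f"export {name}={shlex.quote(value)}")
        lines.append(f'echo "{name}=${name}"')
    return lines


def attach_setup_script(task: Any, docker_image: str, env_vars: Dict[str, str]) -> None:
    # `task` is expected to be a clearml Task; the agent runs the stored script
    # inside the container before the task code starts.
    task.set_base_docker(docker_image=docker_image,
                         docker_setup_bash_script=build_setup_script(env_vars))
```

Inside the task code, printing os.environ.get("SETUP_SCRIPT_CHECK") then tells you whether the value made it through.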
Quite hard for me to try this right
How do I reproduce it?
clearml-agent daemon --detached --queue manual_jobs automated_jobs --docker --gpus 0
If the user running this command can run "docker run", then you should be fine.
A few implementation / design details:
When you run code with Trains (and call init), it records your environment (python packages, git code, uncommitted changes, etc.). Everything is stored on the Task object in the trains-server. When you clone a Task you literally create a copy of the Task object (i.e. a second experiment), and on the cloned experiment you can edit everything (parameters, git, base docker image, etc.). When you enqueue a Task you add its ID to the execution queue list a trains-a...
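The clone / edit / enqueue cycle described above maps onto a few SDK calls. A minimal sketch using the clearml package (what Trains was renamed to), assuming Task.get_task, Task.clone, set_parameters and Task.enqueue as in current clearml; clone_and_enqueue is a hypothetical helper, and the Task class is passed in so the flow can be exercised with a stand-in:

```python
from typing import Any, Dict, Optional


def clone_and_enqueue(task_cls: Any, source_task_id: str, queue_name: str,
                      overrides: Optional[Dict[str, Any]] = None,
                      new_name: Optional[str] = None) -> Any:
    """Copy an existing Task, optionally edit its parameters, then enqueue the copy.

    `task_cls` is expected to be clearml.Task.
    """
    source = task_cls.get_task(task_id=source_task_id)
    # clone() creates a new Task object (a second experiment); the source is untouched
    cloned = task_cls.clone(source_task=source, name=new_name)
    if overrides:
        # e.g. {"Args/lr": 0.01}, using the "Section/name" parameter convention
        cloned.set_parameters(overrides)
    # enqueue() only puts the Task ID on the queue; an agent listening on it runs it
    task_cls.enqueue(cloned, queue_name=queue_name)
    return cloned
```

Typical use would be clone_and_enqueue(Task, "<task id>", "default", {"Args/lr": 0.01}) after from clearml import Task.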
Yeah I can write a script to transfer it over, I was just wondering if there was a built in feature.
Unfortunately, no.
Maybe if you have a script we can put it somewhere?
How is this different from argparser btw?
Not different, just a dedicated section. Maybe we should do that automatically; the only "downside" is that you will have to name the Dataset when getting it (so it will have an entry name in the Dataset section), wdyt?
LOL, let me look into it. Could it be that the calling file was somehow deleted?
SmarmySeaurchin8, could you test with the latest RC:
pip install clearml==0.17.5rc2
Hi @<1523701949617147904:profile|PricklyRaven28>
I'm trying to figure out if I have a way to report pipeline-step artifact paths in the main pipeline task (so I don't need to dig into the steps to find the artifacts).
Basically this is the monitor_artifacts argument
:param monitor_artifacts: Optional, log the step's artifacts on the pipeline ...
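A minimal sketch of passing monitor_artifacts when adding a step to a PipelineController, assuming a recent clearml where each entry is either an artifact name (kept as-is on the pipeline) or a (step_artifact_name, pipeline_artifact_name) pair; the helper name and the artifact names are hypothetical. The decorator-based components are assumed to accept the same argument.

```python
from typing import Any, List, Tuple, Union

ArtifactSpec = Union[str, Tuple[str, str]]


def add_step_with_monitored_artifacts(pipe: Any, step_name: str, base_project: str,
                                      base_task: str, artifacts: List[ArtifactSpec]) -> None:
    """Add a step whose artifacts are also listed on the pipeline Task itself.

    `pipe` is expected to be a clearml PipelineController.
    """
    pipe.add_step(
        name=step_name,
        base_task_project=base_project,
        base_task_name=base_task,
        # "name" keeps the same name on the pipeline; ("a", "b") renames a -> b
        monitor_artifacts=artifacts,
    )
```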
Hi LazyFish41
Could it be some permission issue on /home/quetalasj/.clearml/cache/ ?
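One quick way to check that: run a small script as the same user as the failing process and look at the directory's owner and access bits. A minimal sketch (describe_dir is a hypothetical helper; pass it the cache path, e.g. "~/.clearml/cache"):

```python
import os
import stat
from pathlib import Path
from typing import Any, Dict


def describe_dir(path: str) -> Dict[str, Any]:
    """Summarise ownership and permissions of a directory such as the ClearML cache."""
    p = Path(path).expanduser()
    info: Dict[str, Any] = {"path": str(p), "exists": p.exists()}
    if p.exists():
        st = p.stat()
        info.update(
            mode=stat.filemode(st.st_mode),  # e.g. "drwxr-xr-x"
            owner_uid=st.st_uid,
            current_uid=os.getuid() if hasattr(os, "getuid") else None,
            readable=os.access(p, os.R_OK),
            writable=os.access(p, os.W_OK),
        )
    return info
```

If owner_uid differs from current_uid and writable is False, the cache was most likely created by another user (for example root inside a container).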
it works if I run the same command manually.
What do you mean?
Can you do:
docker run -it <my container here> bash
and immediately get an interactive bash?
NastyFox63, ask SuccessfulKoala55 tomorrow; I think there is a way to change the default settings even with the current version.
(I.e. increase the default 100 entries limit)
Hi SharpDove45
what ... suggested about how it fails on bad/missing credentials
Yes, this is correct. Since you specifically set the hosts, worst case you will end up with wrong credentials.
Pass: task_filter=dict(system_tags=['-archived'])
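Assuming this is the task_filter argument of Task.get_tasks, a minimal sketch of a query that skips archived Tasks; the "-" prefix on the system tag excludes Tasks carrying it. non_archived_tasks is a hypothetical helper, and the Task class is passed in so it can be tried with a stand-in:

```python
from typing import Any, Optional


def non_archived_tasks(task_cls: Any, project_name: str, name_regex: Optional[str] = None):
    """Query tasks while excluding archived ones.

    `task_cls` is expected to be clearml.Task.
    """
    return task_cls.get_tasks(
        project_name=project_name,
        task_name=name_regex,
        # "-archived" means: system tag "archived" must NOT be present
        task_filter=dict(system_tags=["-archived"]),
    )
```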