WickedGoat98 Basically you have two options:
1. Build a docker image with wget installed, then in the UI specify this image as the "Base Docker Image".
2. Configure the trains.conf file on the machine running the trains-agent with the above script. This will cause trains-agent to install wget on any container it is running, so it is available for you to use (saving you the trouble of building your own container).
With either of these two, by the time your code is executed, wget is installed and available.
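For option 2, the snippet referred to above would typically go into the agent section of trains.conf on the machine running the agent. A minimal sketch, assuming the agent.extra_docker_shell_script key (check the reference configuration shipped with your trains-agent version for the exact key name and syntax):

```
agent {
    # Shell commands executed inside every container the agent spins up,
    # before the task's own code runs
    extra_docker_shell_script: ["apt-get update", "apt-get install -y wget"]
}
```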
It looks somewhat familiar ... 🙂
SuccessfulKoala55 any idea?
Correct 🙂
I'm assuming the Task object is not your Current task, but a different one?
Generally speaking, I would say the NVIDIA deep learning AMI:
https://aws.amazon.com/marketplace/pp/prodview-7ikjtg3um26wq
Hi GrotesqueDog77
What do you mean by share resources? Do you mean compute or storage?
How come the second one is one line?
Hi @<1545216070686609408:profile|EnthusiasticCow4>
My biggest concern is what happens if the TaskScheduler instance is shutdown.
Good question; as a follow-up, what happens to the cron service machine if it fails?!
And yes, you are correct: if someone stops the TaskScheduler instance, it is the equivalent of stopping the cron service...
btw: we are working on moving some of the cron/triggers capabilities to the backend, but it will not be as flexible...
Anyhow, if StorageManager.upload was fast: upload_artifact calls that exact same function, so I don't think we actually have an issue here. What do you think?
JitteryCoyote63
So there will be no concurrent cached files access in the cache dir?
No concurrent creation of the same entry 🙂 It is optimized...
but I have no idea what's behind 1, 2 and 3 compared to the first execution
This is why I would think multiple experiments, since it will store all the arguments (and I think these arguments are somehow being lost).
wdyt?
okay, this seems like a broken pip install for python3.6
Can you verify it fails on another folder? (Maybe it's a permissions thing; for example, if you run in docker mode, the permissions will be root, as the docker is creating those folders.)
TrickyFox41 are you saying that if you add Task.init in the code it works, but when you are calling "clearml-task" it does not work? (in both cases editing the Args/overrides?)
Correct, but do notice that (1) task names are not unique and you can change them after the Task was executed, and (2) when you clone the Task you can actually rename it. When an agent is running the Task, the init function is basically ignored, because the Task already exists. Make sense?
So it should cache the venvs right?
Correct,
path: /clearml-cache/venvs-cache
Just making sure, this is the path to the host cache folder
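The venvs cache is configured in clearml.conf on the machine running the agent. A minimal sketch, assuming the agent.venvs_cache section (key names per the clearml-agent reference configuration; the path value below is the one from this thread, and in docker mode it should point at a host folder mounted into the containers):

```
agent {
    venvs_cache: {
        # maximum number of cached virtual environments
        max_entries: 10
        # host folder holding the cached venvs
        path: /clearml-cache/venvs-cache
    }
}
```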
ClumsyElephant70 I think I lost track of the current issue 🙂 what's exactly not being cached (or working)?
Notice that you can embed links to specific view of an experiment, by copying the full address bar when viewing it.
How much free RAM / disk do you have there now? How's the CPU utilization? How many Tasks are working with this machine at the same time?
For the on-prem, you can check the k8s helm charts; they can spin agents for you (static agents).
For the GKE the best solution is the k8s glue:
https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py
Seems correct.
I'm assuming something is wrong with the key/secret quoting?!
Could you generate another one and test it?
(you can have multiple key/secret pairs on the same user)
is everything on the same network?
OutrageousGiraffe8 so basically replacing it with:
self.d1 = ReLU()
yes, so you have a few options 🙂
I assume clearml has some period of time after which it shows this message. Am I right?
Yes, you are 🙂
is this configurable?
It is 🙂
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
Hit Ctrl-F5 (reload the page). Do you still get the same error? Is it limited to a specific experiment?
Yep, everything (both conda and pip)
ThickDove42 you need the latest clearml-agent RC for the docker setup script (next version due next week):
pip install clearml-agent==0.17.3rc0