Reputation
Badges 1
25 × Eureka!I'm so glad you mentioned the cron job, it would have taken us hours to figure
JitteryCoyote63 any chance you have a log of the failed torch 1.7.0 ?
I see, so in theory you could call add_step with a pipeline parameter (i.e. pipe.add_parameter etc.)
But currently the implementation is such that if you are starting the pipeline from the UI
(i.e. rerunning it with a different argument), the pipeline DAG is deserialized from the Pipeline Task (the idea that one could control the entire DAG externally without changing the code)
I think a good idea would be to actually allow the pipeline class to have an argument saying always create from cod...
Thanks OutrageousGiraffe8
Any chance you can expand the example code to be a fully a reproducible toy code? (I would really like to make sure we fix it)
callbacks.append( tensorflow.keras.callbacks.TensorBoard( log_dir=str(log_dir), update_freq=tensorboard_config.get("update_freq", "epoch"), ) )
Might be! what's the actual value you are passing there?
LOL I keep typing clara without noticing (maybe it's the nvidia thing I keep thinking about)
Carla makes much more sense 😄
ReassuredTiger98 quick update, the issue was located, next RC will already contain a fix.
In the mean time, you can avoid it by using limiting pip version:
https://github.com/allegroai/clearml-agent/blob/715f102f6d98a44131d5bee909ee779b456c6229/docs/clearml.conf#L67pip_version: "<20.2"
ReassuredTiger98 do you know if tensorboard (not tensorboardX) also supports gif there ?
So was the issue solved?
so all models are part of the same experiment and has the experiment name in their name.
Oh that explains it, (1) you can use the model filename to control the model name in clearml (2) you can disable the autologging and manually upload the model, then you can control the model name
wdyt?
MagnificentSeaurchin79 are you using the latest RC ?
(I think this was exactly the issue)
EDIT:
try to create the version withe the file removed after you upgrade to the latest RC (0.17.5rc3) in the summary you should see 1 file removed.
LOL, if you can get it to run any python code, I can help with the rest. We just need to make sure we can capture the output, and then start the VScode remote debugging feature directly from the extension.
Is this reproducible? I tried to run the same example code on my machine, and it started training ...
Do you have issues with other pytorch examples? Could you try simple reporting example:
https://github.com/allegroai/clearml/blob/master/examples/reporting/scalar_reporting.py
No worries, just wanted to make sure it doesn't slip away 😉
https://hub.docker.com/layers/nvidia/cuda/10.1-cudnn7-runtime-ubuntu18.04/images/sha256-963696628c9a0d27e9e5c11c5a588698ea22eeaf138cc9bff5368c189ff79968?context=explore
the docker image is missing the cudnn which is a must for TF to work 🙂
now that I know I could use the right click I'll use it like in google drive etc.
That was the initial thought, but I think right clicking on a web page is not you "go to action", especially for Mac ppl ...
Correct 🙂
You can spin it in two modes, either venv or docker (notice that even in docker mode, it will still clone the code into the docker and install the packages inside the docker, but it also inherits from the docker preinstalled system packages, so that the installation process is a lot faster, but you have the ability to change packages without having to build an entire new docker image)
Hmm, what's the clearml-agent version ?
EnviousPanda91 the host checks if you have a .ssh folder on the machine, if you do, it will copy+mount it into the container, then it will delete the copy when the container is down.
Specifically /tmp/clearml_agent.ssh.rbw8o0t7
is the copy of the .ssh that the agent created, and now it is mounting it into the container
RoundMosquito25 this is a good point, I mean in theory it could be done, the question is the actual Bayesian optimization you are using.
Is it optuna (OptimizerOptuna) or OptimizerBOHB?
Do you mean it recently become part of enterprise version?
I do not think so, but it seems this the support for the open-source is more like a PoC
https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py
I see, is this what you are looking for?
https://allegro.ai/docs/task.html#trains.task.Task.init
continue_last_task='task_id'
Hi VexedCat68
The scheduler is set to run once per hour but even now I've got around 40+ anonymous running tasks.
Based on the screenshots these are the Datasets (which are also a Task with specific type etc).
I would actually name the Datasets you are creating You need to specify the parent version (i.e. how would it know it is a child dataset changeset) I'm assuming they are all uploading everything, hence still running?BTW: you can use the argument single_instance=True
maki...
trains-agent RC (which they tell me will be out tomorrow) will have a switch to do that, just so it is easier 🙂
(I suspect you are correct, but I'm missing some information in order to understand where the problem is)
WackyRabbit7 can you send mock code that explains how you create the pipeline ?
Hmm, is there a way to do this via code?
Yes, clone the Task Task.clone
Then do data=task.export_task()
and edit the data object (see execution section)
Then update back with task.update_task(data)
Hi @<1523701168822292480:profile|ExuberantBat52>
What do you mean by:
- dataset_1 -> script_2 -> dataset_2a dataset creates a script ?
AstonishingSeaturtle47 I think there's a workaround for the GitHub multiple repo issue. See https://gist.github.com/gubatron/d96594d982c5043be6d4