I added the following to the clearml.conf file
The conf file that is on the worker machine?
Hi NastySeahorse61
Did you archive and then delete the experiments from the archive?
BTW: I think this question belongs to
Sounds great! I really like that approach, thanks GrotesqueDog77 !
Hmm I think you are correct:
:param auto_create: Create new dataset if it does not exist yet
it should have created it, this seems like a bug, I'll make sure to pass it along 🙂
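For reference, this is roughly the call I'd expect to just work once that's sorted (a minimal sketch, the project/dataset names are placeholders):
from clearml import Dataset

# Get the dataset, or create a new empty one if it does not exist yet
# (per the auto_create docstring quoted above); names are placeholders
dataset = Dataset.get(
    dataset_project="examples",
    dataset_name="my_dataset",
    auto_create=True,
)
print(dataset.id)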
Can you please tell me whether it is necessary to rewrite the Docker Compose file?
Not by default, it should basically work out of the box as long as you create the same data folders on the host machine (e.g. /opt/clearml)
Thanks PompousBaldeagle18 !
Which software did you use to create the graphics?
Our designer, should I send along your compliments? 🙂
You should add which tech is being replaced by each product.
Good point! We are also missing a few products from the website; they will be there soon, hence the "soft launch"
Hi GrotesqueDog77
What do you mean by share resources? Do you mean compute or storage?
Hi SkinnyPanda43
No idea what the ImageId actually is.
That's the AMI image string that the new EC2 instance will be started with, make sense?
I guess I just have to make sure that the total memory usage of all parallel processes is not higher than my GPU's memory.
Yep, unfortunately I'm not aware of any way to do that automatically 🙂
I think a task.init flag would be great!
🙂
ZanyPig66 is this reproducible? This sounds like a bug, what's the TB version and OS you are using?
Is this example working for you (i.e. do you see debug images)?
https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_tensorboard.py
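i.e. roughly what that example boils down to (a minimal sketch, project/task names are placeholders); the add_image calls should show up under the task's debug samples:
import torch
from torch.utils.tensorboard import SummaryWriter
from clearml import Task

# clearml hooks TensorBoard, so image reports are captured as debug samples
task = Task.init(project_name="examples", task_name="tensorboard debug images")
writer = SummaryWriter()
for step in range(3):
    # random 3x64x64 CHW image tensor, just a placeholder payload
    writer.add_image("debug/sample", torch.rand(3, 64, 64), global_step=step)
writer.close()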
'relaunch_on_instance_failure'
This argument is not part of the Pipeline any longer, are you running the latest clearml python package version?
Hope you don't mind linking to that repo
LOL 🙂
Hi ExcitedFish86
In Pytorch-Lightning I use DDP
I think a fix for pytorch multi-node / process distribution was committed to 1.0.4rc1, could you verify it solves the issue? (rc1 should fix this specific issue)
BTW: no problem working with clearml-server < 1
Hi PanickyMoth78
can it receive access to a GCP project and use GKE to spin up clusters and workers, or would that be on the customer to manage?
It does, and also supports AWS.
That said, only the AWS one is part of the open source, but both are part of the paid tier (I think Azure is in testing)
IrritableOwl63 in the profile page, look at the bottom right corner
The only workaround I can think of is: series = series + 'IoU>X'
It doesn't look that bad 🙂
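i.e. something along these lines (a rough sketch, the title / threshold / values are placeholders):
from clearml import Task, Logger

task = Task.init(project_name="examples", task_name="iou report")  # placeholder names
series = "val"
# Workaround: fold the IoU threshold into the series name itself
Logger.current_logger().report_scalar(
    title="metrics",
    series=series + " IoU>0.5",
    value=0.83,
    iteration=0,
)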
AbruptHedgehog21 the bucket and the full link are registered on the model object itself, you can see them in the UI, under the Models tab. The only thing you actually need to pass inside is the credentials. Make sense?
How did you try to restart them?
Yes, but how did you restart the agent on the remote machine?
Do you happen to know if there are any plans for an implementation using the logger variable, so that if needed it would be possible to write to different tables?
CheerfulGorilla72 what do you mean by "an implementation using the logger variable"? pytorch-lightning defaults to the TB logger, which clearml will automatically catch and log into the clearml-server; you can always add additional logs with the clearml interface: Logger.current_logger().report_???
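For example, reporting a table explicitly would look something like this (a sketch, the dataframe and names are placeholders):
import pandas as pd
from clearml import Task, Logger

task = Task.init(project_name="examples", task_name="table report")  # placeholder names
# placeholder dataframe, reported as a table on the current task
df = pd.DataFrame({"epoch": [1, 2], "accuracy": [0.80, 0.91]})
Logger.current_logger().report_table(
    title="results",
    series="validation",
    iteration=0,
    table_plot=df,
)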
What am I mis...
No, clearml uses boto; this is an internal boto error, which points to a bucket size limit, see the error itself
DefeatedOstrich93 can you verify lightning actually only stored it once?
Hmm I just tested on the community version and it seems to work there; let me check with the frontend guys. Can you verify it works for you on https://app.community.clear.ml/ ?
ReassuredTiger98 I ❤ the DAG in ASCII!!!
port = task_carla_server.get_parameter("General/port")
This looks great, and it will achieve exactly what you are after.
BTW: when you are done you can do: task_carla_server.mark_aborted(force=True)
And it will shut down the Carla Task 🙂
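Putting it together, something along these lines (the Task lookup here is a placeholder, however you already got hold of the Carla server Task is fine):
from clearml import Task

# placeholder lookup for the running Carla server Task
task_carla_server = Task.get_task(project_name="examples", task_name="carla_server")

# read the port the Carla server stored as a parameter
port = task_carla_server.get_parameter("General/port")
print(port)

# ... use the Carla server ...

# when done, shut the Carla server Task down
task_carla_server.mark_aborted(force=True)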
Hi OddAlligator72
it says that they do not support PBT.
The optimization algorithms themselves are usually external (although the trivial ones are included within Trains)
Do you have a specific PBT implementation you are considering?
GiganticTurtle0 is it just --stop that throws this error?
BTW: if you add --queue default to the command line I assume it will work. The thing is, without --queue it will look for any queue with the "default" tag on it, and since there are none, we get the error.
Regardless, that should not happen with --stop
I will make sure we fix it
Just so we do not forget, can you please open an issue on the clearml-agent GitHub?