Let's say I don't have the data on my local machine, but only an S3 bucket.
You can still register it, but make sure you do not delete it from the S3 bucket, because the dataset will keep a link to it.
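For example, something along these lines (a rough sketch, assuming a recent clearml version; the project/dataset names and bucket path are placeholders):
from clearml import Dataset

# create a dataset entry that only references the files already sitting in S3
ds = Dataset.create(dataset_name="my_s3_dataset", dataset_project="datasets")
ds.add_external_files(source_url="s3://my-bucket/path/to/data/")  # registers links, does not copy the data
ds.upload()    # uploads only the file listing / metadata
ds.finalize()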
Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known')': /
What did you put in output_uri?
ERROR: Error checking for conflicts. ... AttributeError: _DistInfoDistribution__dep_map
Try adding '--network host' to the docker args on the task you are launching.
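If you are setting it from code, something like this should do it (a sketch; the image name is a placeholder, and older clearml versions take a single docker command string instead of the split keyword arguments):
from clearml import Task

task = Task.init(project_name="examples", task_name="network host test")
# ask the agent to run this task's container with host networking
task.set_base_docker(docker_image="nvidia/cuda:11.1.1-runtime-ubuntu20.04",
                     docker_arguments="--network host")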
Hi ScantChimpanzee51
Having the ClearML autoscaler at all is super great, and it is an impressive tool!
Thank you! 🙂
As all data resides within the container, it is lost afterwards.
Nothing to fear there: if you are using the StorageManager, the destination is always the cache folder, which the agent automatically mounts to the host machine.
That said, if the EC2 instance is taken down (i.e. when idle), then the cache is lost with it.
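i.e. something like (a minimal sketch, the URL is a placeholder):
from clearml import StorageManager

# downloads into the agent's cache folder, which is mounted from the host,
# so repeated runs on the same machine reuse the local copy
local_path = StorageManager.get_local_copy(remote_url="s3://my-bucket/data.zip")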
Make sense?
I'm sorry JitteryCoyote63, no 😞
I do know that the enterprise edition has these features (a.k.a. vault & permissions), basically to address exactly these types of situations.
quick video of the search not working
Thank you! This is very helpful, passing it along to the front-end guys 🙂
and Ctrl-F (in the browser) doesn't work because the lines below are not loaded (even when you scroll, it removes the lines that are no longer visible, so you can't Ctrl-F them)
Yeah, that's because they are added lazily
Exactly, that's my problem: I want to remove it to make sure it is reinstalled (because the version can change)
JitteryCoyote63 yes, this is definitely a pip bug... can you test with the latest pip version, maybe it was fixed? (i.e. git+https:// link)
I'm not sure this is configurable from the outside 🙂
Awesome! Thank you so much!
1.0.2 will be out in an hour
Hi @<1635088270469632000:profile|LividReindeer58>
You mean the clearml.conf?
You can do:
from clearml.config import config_obj
you should have the entire configuration file as an object (dict interface)
fyi: under the hood it uses pyHOCON
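e.g. a quick sketch (the keys below are just examples of dotted paths into the config):
from clearml.config import config_obj

# the parsed clearml.conf, addressable with dotted paths (pyHOCON under the hood)
web_server = config_obj.get("api.web_server")
cache_dir = config_obj.get("sdk.storage.cache.default_base_dir", None)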
you mean the Task already exists, or you want to create a Task from the code?
Wow, thank you very much. And how would I bind my code to the task?
you mean the code that creates the pipeline Tasks?
(remember the pipeline itself is a Task in the system; basically, if your pipeline code is a single script, it will pack the entire thing)
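If the goal is to register an existing script/repo as a Task without running it locally, this is one way (a sketch; the repo URL, branch, script and queue name are placeholders):
from clearml import Task

# register a script from a git repo as a Task, without executing it here
task = Task.create(
    project_name="examples",
    task_name="my training task",
    repo="https://github.com/me/my-repo.git",
    branch="main",
    script="train.py",
)
# later it can be enqueued for an agent to run
Task.enqueue(task, queue_name="default")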
ReassuredTiger98 I ❤ the DAG in ASCII!!!
port = task_carla_server.get_parameter("General/port")
This looks great! And it will achieve exactly what you are after.
BTW: when you are done you can do: task_carla_server.mark_aborted(force=True)
And it will shut down the Carla Task 🙂
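Putting the two together, roughly this flow (just a sketch using the calls above; the polling interval is arbitrary):
import time

# wait until the Carla server task reports its port, then use it
port = None
while not port:
    port = task_carla_server.get_parameter("General/port")
    time.sleep(5)

# ... run the experiment against the Carla server on that port ...

# when done, shut the Carla server task down
task_carla_server.mark_aborted(force=True)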
Could you see if that makes a difference?
my experiment logic
you mean the actual code doing the training?
so that it gets lazily executed and not at task definition time
Task definition time -> when creating the Pipeline Task? Remember, the base_task_factory at the end creates a Task object (it does not run the code itself).
BTW: if you have simple training logic you can use pipeline decorators, it might be a better fit?
https://clear.ml/docs/latest/docs/fundamentals/pipelines#pipeline-from-function-decorator
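A minimal sketch of what that looks like (function names, project name and values are placeholders):
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["data"])
def prepare_data():
    return [1, 2, 3]

@PipelineDecorator.component(return_values=["score"])
def train(data):
    return sum(data)

@PipelineDecorator.pipeline(name="simple pipeline", project="examples", version="0.1")
def run_pipeline():
    data = prepare_data()
    print(train(data))

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # remove this to enqueue the steps on agents instead
    run_pipeline()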
trains-agent runs a container from that image, then clones ...
That is correct
I'd like the base_docker_image to not only be defined at runtime
I see, may I ask why not just build it once, push it into the artifactory, and then have trains-agent use it? (it will be much faster)
Are you sure you added the pytorch channel in clearml.conf?
https://github.com/allegroai/clearml-agent/blob/822984301889327ae1a703ffdc56470ad006a951/docs/clearml.conf#L64
Run clearml-agent and enqueue the pipeline? What am I missing?
Hmm, could you try to upload to your files server (not S3)?
Maybe some credentials error?
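i.e. something like (a sketch; with output_uri=True the upload destination is the files server defined in your clearml.conf):
from clearml import Task

# output_uri=True -> artifacts/models are uploaded to the ClearML files server
task = Task.init(project_name="examples", task_name="upload test", output_uri=True)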
Well, it should fail, but I think the error message should be fixed 🙂
maybe: ValueError: dataset 'tmp_datset' not found in project 'lavi-testing'
wdyt?
Hi ReassuredTiger98
Are you running the agent in venv mode?
I guess this can be built in as a feature into ClearML at some future point.
VexedCat68 you mean referencing an external link?
I have to admit, mounting it to a different drive is a good reason to bring this feature back; the reasoning was that it means the agent needs to make sure it manages them (e.g. multiple agents running on the same machine).
@<1523716917813055488:profile|CloudyParrot43> yes, the server upgrades deleted it 😞 we are redeploying a copy, it should take a few minutes
Shout-out to Emilio for quickly stumbling on this rare bug and letting us know. If you have a feeling your process is stuck on exit, just upgrade to 1.0.1 🙂
BTW: GreasyPenguin14 you can also upload them as debug samples (when setting the output_uri, the debug samples will be uploaded to the same destination)
https://github.com/allegroai/clearml/blob/6b9297660e0ed83a77bce3da2fab384c552206fd/examples/reporting/image_reporting.py#L21
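e.g. roughly (a sketch; the project name, output_uri and image path are placeholders, and per the note above the debug samples follow the output_uri destination):
from clearml import Task

task = Task.init(project_name="examples", task_name="image reporting", output_uri="s3://my-bucket/outputs")
# images reported like this become debug samples, uploaded alongside the other outputs
task.get_logger().report_image("debug", "sample", iteration=0, local_path="./sample.jpg")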
My question is how to recover: must I recreate the agents, or is there another way?
Yes, you have to recreate the Task (I assume they failed, no?!)