SubstantialElk6 is this the pip used to install the agent, or the pip the agent uses to install the packages for the specific experiment ?
HighOtter69
By default, if you are continuing an experiment, it will start from the last iteration of the previous run. You can reset it with:
task.set_initial_iteration(0)
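For context, a minimal sketch of how the two fit together (project/task names are placeholders, and I'm assuming you resume via continue_last_task):
from clearml import Task

# resume the previous run of this experiment instead of creating a new one
task = Task.init(project_name="examples", task_name="my_experiment", continue_last_task=True)

# reporting would otherwise continue from the last iteration of the previous run;
# reset the counter so new reports start from iteration 0
task.set_initial_iteration(0)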
Hi CleanPigeon16
I was wondering how (or if) you handle interruptions.
Good question! Basically (and I might be missing a few details, but I think that's the general gist):
A new instance will be spun up (spot/regular, based on your "compute budget") as long as there is a job in the "monitored" queue. That means that if a worker was kicked by Amazon (i.e. it is a spot instance), another one will be spun up in its place as long as there is a job in the queue. That means that what is probably missing in you...
So inside the pipeline logic you can do Task.current_task().id
Or inside a component Task.current_task().parent
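A minimal sketch of where each call would sit, assuming the decorator-based pipeline (names are placeholders):
from clearml import Task
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["result"])
def step_one():
    # inside a component: the current Task is the component's Task,
    # and .parent holds the ID of the pipeline controller Task
    print("controller id:", Task.current_task().parent)
    return 42

@PipelineDecorator.pipeline(name="my_pipeline", project="examples", version="0.1")
def pipeline_logic():
    # inside the pipeline logic: the current Task is the controller itself
    print("pipeline task id:", Task.current_task().id)
    step_one()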
Hi ShallowArcticwolf27
First of all:
If the answer to number 2 is no, I'd loveee to write a plugin.
Always appreciated
Now actually answering the Q:
Any torch.save (or any other framework save) will either register or automatically upload the file (or folder) in the system. If it is a folder it will be zipped and uploaded; if it is a file it is just uploaded to the assigned storage output (the clearml-server, any object storage service, or a shared folder). I'm not actually sure I...
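As an illustration, a minimal sketch (project/task names and the checkpoint path are placeholders; output_uri=True just sends the file to the default files server):
import torch
from clearml import Task

# initializing the Task hooks the framework save calls
task = Task.init(project_name="examples", task_name="checkpoint_upload", output_uri=True)

model = torch.nn.Linear(10, 2)

# this plain save is picked up automatically and registered/uploaded
# to the assigned storage output
torch.save(model.state_dict(), "checkpoint.pt")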
Btw it seems the docker runs in
network=host
Yes, this is so that if you have multiple agents running on the same machine, they can each find a new open port
I can telnet the port from my mac:
Okay this seems like it is working
[Assuming the above is what you are seeing]
What I "think" is happening is that the Pipeline creates it's own Task. When the pipeline completes, it closes it's own Task, basically making any later calls to Tasl.current_task() return None, because there is no active Task. I think this is the reason that when you are calling process_results(...) you end up with None.
For a quick fix, you can do:
pipeline = Pipeline(...)
MedianPredictionCollector.process_results(pipeline._task)
Maybe we should...
I'm guessing some network issue, though I can't figure out why it cannot connect while curl seems to work
Ohh I see, could you copy-paste what you put there (instead of the secret and key, *** will do)
Hi FiercePenguin76
should return all datasets from all projects?
Correct
Hi DisturbedWalrus17
This is a bit of a hack, but it will work:
from clearml.backend_interface.metrics.events import UploadEvent
UploadEvent._file_history_size = 10
Maybe we should expose it somewhere, what do you think?
AstonishingWorm64
You can turn on the venv cache, it will just handle its own full env caching
See here:
https://github.com/allegroai/clearml-agent/blob/4f7407084d1900a79d455570c573e60f40208742/docs/clearml.conf#L100
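Roughly the relevant block in clearml.conf looks like this (from memory of the default agent config linked above, so double-check the exact keys there):
agent {
    venvs_cache: {
        # maximum number of cached venvs
        max_entries: 10
        # minimum free space (GB) required to keep adding cache entries
        free_space_threshold_gb: 2.0
        # uncomment the path to enable the cache
        path: ~/.clearml/venvs-cache
    }
}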
This seems to be more complicated than it looks (a UI/backend combination). It's not that we are not working on it, just that it might take some time, as it passes control to the backend (which by design does not touch external storage points).
Maybe we should create an S3 cleanup service, listing buckets and removing objects whose Task ID no longer exists. wdyt?
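Very rough sketch of what I mean (bucket name and key layout are hypothetical; it assumes the Task ID can be parsed out of the object key, which may not match your structure):
import boto3
from clearml import Task

BUCKET = "my-artifacts-bucket"  # hypothetical

def task_exists(task_id):
    try:
        return Task.get_task(task_id=task_id) is not None
    except Exception:
        return False

def cleanup_bucket():
    s3 = boto3.client("s3")
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=BUCKET):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            # assumes keys look like "<project>/<task_name>.<task_id>/artifacts/..."
            parts = key.split("/")
            task_id = parts[1].rsplit(".", 1)[-1] if len(parts) > 1 else ""
            if task_id and not task_exists(task_id):
                s3.delete_object(Bucket=BUCKET, Key=key)

cleanup_bucket()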
SmarmySeaurchin8 what's the mount command you are using?
Hi SmoothSheep78
Do you need to import the previous state of the trains-server, or are you starting from scratch ?
If you take a look here, the returned objects are automatically serialized and stored on the files server or object storage, and also deserialized when passed to the next step.
https://github.com/allegroai/clearml/blob/master/examples/pipeline/pipeline_from_decorator.py
You can of course do the same manually
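For example, a minimal sketch of doing it manually with artifacts (project/task/artifact names are placeholders):
from clearml import Task

# step A: store an object as an artifact on its task
task_a = Task.init(project_name="examples", task_name="step_a")
task_a.upload_artifact(name="dataset_split", artifact_object={"train": [1, 2, 3], "test": [4, 5]})
task_a.close()

# step B: fetch the artifact from step A and deserialize it
task_b = Task.init(project_name="examples", task_name="step_b")
source = Task.get_task(project_name="examples", task_name="step_a")
restored = source.artifacts["dataset_split"].get()
print(restored)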
Eg, I'm creating a task using clearml.Task.create, often it doesn't get the git diff correctly,
ShakyJellyfish91 Task.create does not store any "git diff" automatically. Is there a reason not to use Task.init ?
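For comparison, a minimal sketch (names are placeholders); Task.init captures the repo, branch, commit, and uncommitted diff from the script's local repository automatically, while Task.create only records what you explicitly pass:
from clearml import Task

# records the git repo, branch, commit and the local uncommitted diff automatically
task = Task.init(project_name="examples", task_name="with_git_diff")

# only stores the repository reference you pass in, no local diff is captured
skeleton = Task.create(
    project_name="examples",
    task_name="from_repo",
    repo="https://github.com/allegroai/clearml.git",
    branch="master",
    script="examples/pipeline/pipeline_from_decorator.py",
)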
Legit, if you have a cached_file (i.e. it exists and is accessible), you can return it to the caller
(I think it is the empty config file)
@<1527459125401751552:profile|CloudyArcticwolf80> what are you seeing in the Args section ?
what exactly is not working ?
DefeatedOstrich93 what do you mean by "I am wondering why do I need to create files before applying diff ?"
git diff will not list files unless they are added (until then they are marked as "untracked"), think temp files, logs, etc. Until you add a file to git it will basically ignore that file. Make sense ?
but I can't tell if that is the only way to use the services queue, or can I experiment with that?
UnevenOstrich23 I'm not sure what exactly the question is, but if you are asking whether this is limited, the answer is no, it is not limited to that use case.
Specifically you can run as many agents in "services-mode" pulling from any queue/s that you need, and they can run any Task that is enqueued on those queues. There is no enforced limitation. Did that answer the question ?
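For instance, something along these lines starts an agent in services mode against a queue of your choice (the queue name is just an example, and flags may vary a bit by version):
clearml-agent daemon --services-mode --queue services --docker --detached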
WackyRabbit7 How do I reproduce it ?