Hi, is there a way to stop a clearml-agent from within an experiment?
It is possible but only in the paid tier (it needs backend support for that) 😞
My use case is: I'm on a spot instance marked for termination in 2 minutes by AWS
Basically what you are saying is you want the instance to spin down after the job is completed, correct?
Okay let me see if I can think of something...
Basically crashing on the assertion here ?
https://github.com/ultralytics/yolov5/blob/d95978a562bec74eed1d42e370235937ab4e1d7a/train.py#L495
Could it be you are passing "Args/resume" True, but not specifying the checkpoint ?
https://github.com/ultralytics/yolov5/blob/d95978a562bec74eed1d42e370235937ab4e1d7a/train.py#L452
I think I know what's going on:
https://github.com/ultralytics/yolov5/blob/d95978a562bec74eed1d42e370235937ab4e1d7a/train...
ScantMoth28 it should work; I think the default deployment also has an NGINX reverse proxy on it, switching from " http://clearml-server.domain.com/api " to " http://api.clearml-server.domain.com "
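Something like this in the NGINX config would do the subdomain routing (a sketch only; the server name and the upstream port follow the default docker-compose and may differ in your deployment):

server {
    listen 80;
    server_name api.clearml-server.domain.com;

    location / {
        proxy_pass http://127.0.0.1:8008;  # default clearml apiserver port
        proxy_set_header Host $host;
    }
}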
Hi VexedCat68
What type of data is it? And what type of annotations?
Streaming data into the training process is great, but is it post quality control?
Thanks BattyLion34 I fixed the code snippet :)
My main query is: do I wait for it to reach a sufficient batch size, or do I just send each image for training as soon as it arrives?
This is usually a cost-optimization issue. Generally speaking, if GPU uptime is not an issue, then since the process is stochastic anyhow, waiting for a batch or not is not the most important factor (unless you use a batchnorm layer, in which case waiting for a batch is basically a must)
I would not be able to split the data into train/test splits, and it would be very expensiv...
What about the epochs though? Is there a recommended number of epochs when you train on that new batch?
I'm assuming you are also using the "old" images ?
The main factor here is the ratio between the previously used data and the newly added data; you might also want to resample (i.e. train on more of the) new data vs the old data, along the lines of the sketch below. Make sense ?
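For instance, a minimal PyTorch sketch (the 5x weight for new samples is an arbitrary illustration, and the datasets are toy stand-ins):

import torch
from torch.utils.data import TensorDataset, ConcatDataset, WeightedRandomSampler, DataLoader

# toy stand-ins for the previously used data and the newly added data
old_ds = TensorDataset(torch.randn(1000, 8))
new_ds = TensorDataset(torch.randn(100, 8))
dataset = ConcatDataset([old_ds, new_ds])

# give every new sample 5x the sampling weight of an old sample,
# so each epoch trains on proportionally more of the new data
weights = torch.cat([torch.ones(len(old_ds)), torch.full((len(new_ds),), 5.0)])
sampler = WeightedRandomSampler(weights, num_samples=len(dataset), replacement=True)
loader = DataLoader(dataset, batch_size=32, sampler=sampler)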
I understand that it uses time in seconds when there is no report being logged... but it has already logged three times.
Hmm could it be the reporting started 3 min after the Task started ?
MagnificentSeaurchin79 you can delay it with:
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)
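In context it would look like this (the project/task names are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="my experiment")
# only fall back to seconds-based resource reporting if no iteration
# was reported within the first 30 minutes of the run
task.set_resource_monitor_iteration_timeout(seconds_from_start=1800)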
Correct (if this is running on k8s, these are most likely passed via env variables, CLEARML_WEB_HOST etc.)
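e.g. (the hostnames here are placeholders):

CLEARML_WEB_HOST=https://app.clearml.example.com
CLEARML_API_HOST=https://api.clearml.example.com
CLEARML_FILES_HOST=https://files.clearml.example.com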
btw: what's the OS and python version?
Hi GrievingTurkey78
task.models['output'][-1]
should return the last stored model.
What do you have under task.models['output'][-1].url ?
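For reference, you can check it like this from another script (the task ID is a placeholder):

from clearml import Task

task = Task.get_task(task_id="<your-task-id>")  # placeholder ID
last_model = task.models['output'][-1]  # last stored output model
print(last_model.url)  # remote location of the stored weights file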
Correct:
extra_docker_shell_script: ["apt-get install -y awscli", "aws codeartifact login --tool pip --repository my-repo --domain my-domain --domain-owner 111122223333"]
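i.e. in the agent section of the clearml.conf (the repo/domain/account values are the examples from above):

agent {
    extra_docker_shell_script: [
        "apt-get install -y awscli",
        "aws codeartifact login --tool pip --repository my-repo --domain my-domain --domain-owner 111122223333"
    ]
}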
clearml doesn’t do any “magic” in regard to this for tensorflow, pytorch etc right?
No 😞 and if you have an idea on how, that will be great.
Basically the problem is that there is no "standard" way to know which layer is in/out
I can share some code
Please do 🙂
I can probably have a python script that checks if there are any tasks running/pending, and if not, runs docker-compose down to stop the clearml-server, then uses boto3 to trigger the creation of an EBS snapshot, waits until it is finished, then restarts the clearml-server, wdyt?
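Something like this rough sketch, maybe (the volume ID and docker-compose path are placeholders, and the status filter assumes clearml's standard task states):

import subprocess
import boto3
from clearml import Task

# any task still queued or running means the server must stay up
active = Task.get_tasks(task_filter={"status": ["queued", "in_progress"]})
if not active:
    subprocess.run(["docker-compose", "down"], cwd="/opt/clearml", check=True)

    ec2 = boto3.client("ec2")
    snap = ec2.create_snapshot(
        VolumeId="vol-0123456789abcdef0",  # placeholder EBS volume ID
        Description="clearml-server data backup",
    )
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

    subprocess.run(["docker-compose", "up", "-d"], cwd="/opt/clearml", check=True)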
I'm pretty sure there is a nice way, let me check something
Hi JitteryCoyote63
experiments logs ...
You mean the console outputs ?
GrievingTurkey78 please feel free to send me code snippets to test 🙂
Funny, it's the "h5" extension, it takes a different execution path inside keras...
Let me see what can be done 🙂
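For reference, the file extension alone selects the serialization path in Keras, e.g. (a TF2-style sketch):

import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])
model.compile(optimizer="adam", loss="mse")

model.save("model.h5")     # ".h5" goes through the legacy HDF5 writer
model.save("saved_model")  # no extension goes through the SavedModel writer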
@<1687643893996195840:profile|RoundCat60> can you access the web UI over https ?
OddAlligator72 FYI you can also import / export an entire Task (basically allowing you to create it from scratch/json, even without calling Task.create):
Task.import_task(...)
Task.export_task(...)
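For example (the source task ID is a placeholder):

from clearml import Task

# export an existing task into a plain JSON-serializable dict
task_data = Task.export_task(task="<source-task-id>")

# ...optionally tweak the dict, then recreate it as a new draft task
new_task = Task.import_task(task_data)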
I would guess that for some reason loglevel is DEBUG, could that be the case?
Oh my bad, post 0.17.5 😞
RC will be out soon, in the meantime you can install directly from github:
pip install git+
if I use automatic code analysis it will not find all packages because of importlib.
But you can manually add them with Task.add_requirements, no?
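A minimal sketch of that (the package names are just illustrations), called before Task.init:

from clearml import Task

# packages loaded dynamically via importlib are invisible to the
# automatic analysis, so register them explicitly before Task.init
Task.add_requirements("pandas")
Task.add_requirements("scikit-learn", "1.3.0")  # optionally pin a version

task = Task.init(project_name="examples", task_name="dynamic imports")  # placeholder names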
For now I come to the conclusion that keeping a requirements.txt and making clearml parse it is the way to go.
Maybe we could just have that as another option?
Yes, the mechanisms under the hood are quite complex, the automagic does not come for "free" 🙂
Anyhow, your perspective is understood. And as you mentioned, I think your use case might be a bit less common. Nonetheless we will try to come up with a solution (probably an argument for Task.init so you could specify a few more options for the auto package detection)
It seems like the naming of Task.create causes a lot of confusion (we are always open to suggestions and improvements). ReassuredTiger98 from your suggestion, it sounds like you would actually like more control in Task.init (let's leave Task.create aside, as its main function is Not to log the current running code, but to create an auxiliary Task).
Did I understand you correctly ?
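For reference, the distinction in code (the project/repo/script values are placeholders):

from clearml import Task

# Task.init logs the currently running code into a Task
task = Task.init(project_name="examples", task_name="my experiment")

# Task.create builds an auxiliary Task from scratch, without running it
aux = Task.create(
    project_name="examples",
    task_name="auxiliary task",
    repo="https://github.com/user/repo.git",  # placeholder repo
    script="train.py",
)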