JitteryCoyote63 I think I failed to explain myself.
- I think the problem with the controller is that you are interacting with (i.e. changing the hyperparameters of) a Task created with a new SDK version, using an older SDK version. Specifically, we added section names to the hyperparameters, and only the new version of the SDK is aware of them.
Make sense?
- Regarding the actual problem: it seems this is somehow related to the first one. The task at run time is using an older SDK version, and I t...
This line 🙂
Notice that Triton (and therefore clearml-serving) needs the PyTorch model to be converted to TorchScript, so that the Triton backend can load it.
What do you have here in your docker compose:
DefeatedCrab47 if TB has it as image, you should find it under "debug_samples" as image.
Can you locate it there ?
Wait IrritableOwl63 this looks like it worked, am I right? huggingface was correctly installed
GrievingTurkey78 in your clearml.conf do you have:
    agent.package_manager.type: conda
Or
https://github.com/allegroai/clearml-agent/blob/73625bf00fc7b4506554c1df9abd393b49b2a8ed/docs/clearml.conf#L59
I assume the account name and key refer to the storage account credentials that you can get from Azure Storage Explorer?
correct
(It would be nice to have all the Pypi releases tagged in github btw)
I wanted to say, we listen ... and point to the tag , but for some reason it was not pushed LOL.
Why? The task should have completed successfully, how is this aborting?
Early stopping by the HPO process, like hyper-band, e.g. this training model is going nowhere let's stop it.
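To make "this training is going nowhere, let's stop it" concrete, here is a minimal sketch of one round of successive halving, the building block that Hyperband repeats. It is an illustration only, not ClearML's optimizer code; the function name `split_trials` and the score dictionary are hypothetical.

```python
def split_trials(scores, keep_fraction=0.5):
    """Return (kept, aborted) trial ids after one successive-halving round.

    `scores` maps a trial id to its latest validation score (higher is better).
    The trials in `aborted` are the ones an HPO controller would stop early.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep], ranked[n_keep:]


kept, aborted = split_trials({"a": 0.9, "b": 0.2, "c": 0.5, "d": 0.7})
```

A trial that lands in `aborted` finished nothing wrong; it was simply stopped by the optimizer, which is why it shows up as aborted rather than completed.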
Okay, the type is inferred from the default value of the step function itself. That means that both:
    data_frame = step_one(pickle_url, extra=1337)
and
    data_frame = step_one(pickle_url, 1337)
will pass extra as int.
That said, if the default value of the argument is missing, it will revert to str.
In order to use the type hints as casting hints, we actually need to improve task.connect to support the type casting (the type hints are stored).
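The inference rule can be sketched in plain Python. This is not the ClearML code path, only an illustration that assumes argument values come back as strings once they have been stored on the Task; `cast_like_defaults` and the body of `step_one` are hypothetical.

```python
import inspect


def cast_like_defaults(func, *args, **kwargs):
    """Cast each argument to the type of the parameter's default value.

    Parameters without a default (or with a None default) fall back to str,
    mirroring the behaviour described above.
    """
    signature = inspect.signature(func)
    bound = signature.bind(*args, **kwargs)
    casted = {}
    for name, value in bound.arguments.items():
        default = signature.parameters[name].default
        if default is inspect.Parameter.empty or default is None:
            # Nothing to infer a type from: keep the value as a string.
            casted[name] = str(value)
        else:
            casted[name] = type(default)(value)
    return casted


def step_one(pickle_url, extra=1337):  # hypothetical pipeline step
    return pickle_url, extra


casted = cast_like_defaults(step_one, "s3://bucket/data.pkl", "42")
```

Here `extra` comes back as an int whether it was passed positionally or by keyword, while `pickle_url`, which has no default, stays a str.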
Are you saying you have a single line in the console output of the component Task?
Hi ShortElephant92
No, this is opt-in, so other than checking for updates once in a while, there is no traffic at all
I think the main issue is that, for some reason, the running container changed one of the files inside the temp folder. The host machine is then "stuck" with a file that the root user owns/changed, and now it cannot reuse or delete the temp folder.
I think the fix is to make sure the container deletes the temp folder when it is done
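A minimal sketch of that fix, assuming the process inside the container is the one that creates and writes to the temp folder. The helper name `run_with_temp_folder` and the `job` callback are hypothetical and not part of clearml-agent.

```python
import shutil
import tempfile
from pathlib import Path


def run_with_temp_folder(job):
    """Run job(temp_dir) and always delete the folder afterwards."""
    temp_dir = Path(tempfile.mkdtemp(prefix="job_"))
    try:
        return job(temp_dir)
    finally:
        # Clean up from the same process (and user) that created the files,
        # so nothing owned by that user is left behind for the host.
        shutil.rmtree(temp_dir, ignore_errors=True)
```

Because the cleanup sits in a `finally` block, the folder is removed even when the job raises.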
Hi ArrogantBlackbird16
but it returns a task handle even after the Task has been closed.
It should not ... That is a good point!
Let's fix that 🙂
K8s can schedule pod with different priorities.
I'm not sure I agree here, could you refer me to the docs on this ability in k8s ?
So maybe no real scheduling means there is no ClearML scheduling after applying pod to k8s.
That is correct 🙂
Will it be implemented in the future?
Yes, this is an enterprise feature. In the community version you can specify the --max-pods limit (which will cause it to never pull a job once it hits the max-pods limit)
Hi RoundMosquito25
Hi, are there available somewhere examples of testing in ClearML? For example unit tests that check if parameters are passed correctly to new tasks etc.?
What do you mean by "testing in ClearML" ?
For example unit tests that check if parameters are passed correctly
Passed where / how? Are we thinking agents here ?
With k8s glue going, want to finally look at clearml-session and how people are using it.
If used with the k8s glue, you will have to run the glue with --ports-mode. Then clearml-session will know how to connect to the container itself, since at runtime the container will register the gateway + port for the clearml-session client to connect to
JitteryCoyote63 so now everything works as expected ?
VexedCat68 yes 🙂 you can also pass the parent folder and it will zip the entire subfolders into a single artifact
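What "zip the entire subfolders into a single artifact" amounts to can be sketched with the standard library. This is not ClearML's implementation, only an illustration; `zip_folder` is a hypothetical helper, and the archive is assumed to be written outside the folder being packed.

```python
import zipfile
from pathlib import Path


def zip_folder(folder, archive_path):
    """Pack every file under `folder`, including subfolders, into one zip."""
    folder = Path(folder)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(folder.rglob("*")):
            if path.is_file():
                # Store paths relative to the parent folder so the
                # subfolder layout is preserved inside the archive.
                archive.write(path, path.relative_to(folder).as_posix())
    return Path(archive_path)
```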
Thanks for answering, Yes, this is exactly what I wanted
Hmm should be possible, how slow is the update that we want to save the time ?
Your git execution needs this file, just like your machine does, to know where the server is and how to authenticate. You have to manually pass it to your git action.
WackyRabbit7
Cool - so that means the fileserver which comes with the host will stay empty? Or is there anything else being stored there?
Debug Images and artifacts will be automatically stored to the file server.
If you want your models to be automagically uploaded, add the following:
    task = Task.init('example', 'experiment', output_uri=' ')
(You can obviously point it to any other http/S3/GS/Azure storage)
I'm trying to queue a task in python but I'd like to reuse the prior task ID.
Is it your own Task, i.e. are you enqueuing yourself? If this is the case, use task.execute_remotely; it will do just that.
If this is another Task and it is aborted, you can just enqueue it; by definition it will continue with the same Task ID.
try to break it into parts and understand what produces the error
for example:
    increase(test12_model_custom:Glucose_bucket[1m])
    increase(test12_model_custom:Glucose_sum[1m])
    increase(test12_model_custom:Glucose_bucket[1m])/increase(test12_model_custom:Glucose_sum[1m])
and so on
DefeatedOstrich93 what do you mean by "I am wondering why do I need to create files before applying diff ?"
git diff will not list files unless they are added (until then they are marked as "untracked"); think temp files, logs, etc. Until you add a file to git, it will basically ignore that file. Make sense?
DeliciousBluewhale87 fyi, the new version of the pipeline (hopefully pushed towards the end of this week), will allow you to more easily write steps as functions (not only as Tasks, as the current implementation)
Also check the new Trigger and Scheduler both intended to trigger these pipelines:
https://github.com/allegroai/clearml/blob/fe3c481c37e70881c44d67c1cf9bbce00a84747e/clearml/automation/scheduler.py#L457
https://github.com/allegroai/clearml/blob/fe3c481c37e70881c44d67c1cf9bbce00a8...