UpsetTurkey67 are you saying there is a symlink in the original repository, and when it copies it, it breaks the symlink?
Hmm ConvincingSwan15
WARNING - Could not find requested hyper-parameters ['Args/patch_size', 'Args/nb_conv', 'Args/nb_fmaps', 'Args/epochs'] on base task
Is this correct? Can you see these arguments on the original Task in the UI (i.e. the Args section, parameter "epochs")?
Okay, so I think it doesn't find the correct Task, otherwise it wouldn't print the warning.
How do you set up the HPO class? Could you copy-paste the code?
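For reference, a minimal sketch of the expected shape (the parameter names and metric names below are placeholders; the optimizer lines are commented out because they need a live base task on a ClearML server):

```python
# The search-space keys must match the base task's Configuration exactly,
# in "<section>/<name>" form -- e.g. the "Args" section that argparse creates.
# A mismatch is what produces the "Could not find requested hyper-parameters" warning.
search_space = {
    "Args/epochs": [5, 10, 20],
    "Args/patch_size": [32, 64, 128],
}

# Hypothetical usage (requires a ClearML server and a real base task id):
# from clearml.automation import HyperParameterOptimizer, DiscreteParameterRange
# optimizer = HyperParameterOptimizer(
#     base_task_id="<base_task_id>",  # placeholder
#     hyper_parameters=[
#         DiscreteParameterRange(name, values=vals)
#         for name, vals in search_space.items()
#     ],
#     objective_metric_title="validation",  # placeholder metric
#     objective_metric_series="loss",
#     objective_metric_sign="min",
# )
# optimizer.start()
```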
Hi @<1713001673095385088:profile|EmbarrassedWalrus44>
So Triton has load/unload model, but these are slowwww, meaning you cannot use them inside a request (you'll just hit the request timeout every time it tries to load the model)
as you can see this is classified as "wish-list" , this is not trivial to implement and requires large CPU RAM to store the entire model, so "loading" becomes moving CPU to GPU memory (which also is not the fastest but the best you can do). As far as I understand ...
That speed depends on model sizes, right?
in general yes
Hope that makes sense. This would not work under heavy loads, but e.g. we have models used only once a week. They would just stay unloaded until use, and could be offloaded afterwards.
but then you still might encounter timeout the first time you access them, no?
Hi @<1631102016807768064:profile|ZanySealion18>
ClearML (remote execution) sometimes doesn't "pick-up" GPU. After I rerun the task it picks it up.
What do you mean by "does not pick up"? Is it that the container is up but was not started with --gpus, so there is no GPU access?
Create a new version of the dataset by choosing what increment in SEMVER standard I would like to add for this version number (major/minor/patch) and upload
Oh, this is already there:
```
cur_ds = Dataset.get(dataset_project="project", dataset_name="name")
# if version is not given, it will auto-increment based on semantic versioning,
# bumping the last number: 1.2.3 -> 1.2.4
new_ds = Dataset.create(dataset_project="project", dataset_name="name", parents=[cur_ds.id])
```
currently I'm doing it by fetching the latest dataset, incrementing the version and creating a new dataset version
This seems like a very good approach, how would you improve it?
BroadMole98 thank you for noticing!
I'll make sure it is fixed (a few other properties are also missing there, not sure why, I'll ask them to take a look)
task._wait_for_repo_detection()
You can use the above, to wait until repository & packages are detected
(If this is something users need, we should probably make it a "public function" )
MelancholyElk85
How do I add files without uploading them anywhere?
The files themselves need to be packaged into a zip file (so we have an immutable copy of the dataset). This means you cannot "register" existing files (in your example, files on your S3 bucket?!). The idea is to make sure your dataset is protected against changes on the one hand, but on the other to allow you to change it, and only store the changeset.
Does that make sense ?
Hi UpsetBlackbird87
I might be wrong, but it seems like ClearML does not monitor GPU pressure when deploying a task to a worker, rather than relying only on its configured queues.
This is kind of accurate. The way the agent works is that you allocate a resource for the agent (specifically a GPU), then set the queues (plural) for it to listen to (by default, priority ordered). Then each agent individually pulls jobs and runs them on the allocated GPU.
If I understand you correctly, you want multiple ...
Hi BurlyPig26
I think you can easily change the Web port, but not the API (8008) or files (8081) port
How are you deploying it?
PanickyMoth78 RC is out: pip install clearml==1.6.3rc1
🤞
ZanyPig66 it sounds like you need to add the docker args for binding, just add to the Task.create the argument: 'docker_args="-v /mnt/host:/mnt/container"'
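As a sketch of what that looks like in code (the project and task names are placeholders, and the Task.create call itself is commented out since it needs a ClearML server):

```python
# Build the bind-mount argument the agent will pass through to docker run:
host_dir = "/mnt/host"
container_dir = "/mnt/container"
docker_args = "-v {}:{}".format(host_dir, container_dir)

# Hypothetical usage against a ClearML server:
# from clearml import Task
# task = Task.create(
#     project_name="examples",       # placeholder
#     task_name="train-with-mount",  # placeholder
#     docker_args=docker_args,
# )
```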
Hi @<1618056041293942784:profile|GaudySnake67>
Task.create is designed to create an external Task, not one from the current running process. Task.init is for creating a Task from your current code, which is why it has all the auto_connect parameters. Does that make sense?
By default the remote Task (i.e. the Task you are creating with Task.create) will have all the auto-logging turned on.
For finer control we kind of assume you have Task.init inside your remote script, and then just pass add_task_init_call=False.
Does that make sense?
Do you think we should have a way to configure those auto_connect args when creating the Task?
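As a sketch of the current workaround (assuming the remote script owns its own Task.init; the clearml calls are commented out because they need a server, and all names are placeholders):

```python
# In the remote script (e.g. train.py), keep your own Task.init and choose
# the auto-logging behavior there:
init_kwargs = {
    "auto_connect_frameworks": {"matplotlib": False, "tensorboard": True},
    "auto_connect_arg_parser": True,
}
# from clearml import Task
# task = Task.init(project_name="examples", task_name="train", **init_kwargs)

# On the side creating the remote Task, skip the injected init call so the
# script's own Task.init (with the kwargs above) takes effect:
# Task.create(..., add_task_init_call=False)
```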
Oh no 😞 I wonder if this is connected to:
Any chance the logger is running (or you have) from a subprocess ?
Hi @<1690896098534625280:profile|NarrowWoodpecker99>
Once a model is loaded into GPU memory for the first time, does it stay loaded across subsequent requests,
yes it does.
Are there configuration options available that allow us to control this behavior?
I'm assuming you're thinking of dynamically loading/unloading models from memory based on requests?
I wish Triton added that 🙂 this is not trivial, and in reality, to be fast enough the model has to live in RAM and then be moved to the GPU (...
This one should work:
```
path = task.connect_configuration(path, name=name)
if task.running_locally():
    my_params = read_from_path(path)
    my_params = change_params(my_params)  # change some stuff
    # store back the change; my_params is assumed to be the content of the param file (text)
    task.set_configuration_object(name=name, config_text=my_params)
```
I'm sorry JitteryCoyote63, no 😞
I do know that the enterprise edition has these features (a.k.a. vault & permissions), basically to answer these types of situations.
I think your use case is the original idea behind "use_current_task" option, it was basically designed to connect code that creates the Dataset together with the dataset itself.
I think the only caveat in the current implementation is that it should "move" the current Task into the dataset project / set the name. wdyt?
Hmm interesting...
of course you can do:
dataset._task.connect(...)
But maybe it should be public?!
How are you using that (I mean in the context of a Dataset)?
Or am I forced to do a get, check if the latest version is finalized,
A Dataset must be finalized before you use it. The only situation where it is not is when it is still in the "upload" state.
, then increment the version and create my new version?
I'm assuming there is a data processing pipeline pushing new data?! How do you know you have new data to push?
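The manual "fetch latest, increment, create" approach described above can be sketched like this (bump_patch is a hypothetical helper; the Dataset calls are commented out since they need a ClearML server):

```python
def bump_patch(version: str) -> str:
    """Increment the last (patch) component of a semver string, e.g. 1.2.3 -> 1.2.4."""
    major, minor, patch = version.split(".")
    return "{}.{}.{}".format(major, minor, int(patch) + 1)

# Hypothetical usage against a ClearML server:
# from clearml import Dataset
# cur_ds = Dataset.get(dataset_project="project", dataset_name="name")
# new_ds = Dataset.create(
#     dataset_project="project",
#     dataset_name="name",
#     parents=[cur_ds.id],
#     dataset_version=bump_patch(cur_ds.version),
# )
```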
Command-line arguments for the argparser should be passed via the "Args" section in the Configuration tab.
What is the working directory on the experiment ?
Glad to hear that! 🙂