DeliciousBluewhale87 You can have multiple queues for the k8s glue, in priority order:
```
python k8s_glue_example.py --queue glue_q_high glue_q_low
```
Then if someone is running 100 experiments (say HPO), they push them into "glue_q_low", which means the agent will first pop Tasks from the high-priority queue, and only if it is empty will it pop from the low-priority queue.
Does that make sense ?
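If it helps, here is a rough sketch of what pushing the HPO clones into the low-priority queue could look like from code (the project/task names are made up, only the queue name comes from the example above):
```python
from clearml import Task

# hypothetical HPO loop: clone a template task and enqueue every clone
# into the low-priority queue so work on glue_q_high is served first
template = Task.get_task(project_name="my_project", task_name="template")
for i in range(100):
    trial = Task.clone(source_task=template, name=f"hpo_trial_{i}")
    Task.enqueue(trial, queue_name="glue_q_low")
```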
JitteryCoyote63 I think there is a ClearML logger, no?
(I think it is the empty config file)
Hi RipeGoose2
You can also report_table them, what do you think?
https://github.com/allegroai/clearml/blob/master/examples/reporting/pandas_reporting.py
https://github.com/allegroai/clearml/blob/9ff52a8699266fec1cca486b239efa5ff1f681bc/clearml/logger.py#L277
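For reference, a minimal report_table sketch (project/task names are placeholders):
```python
import pandas as pd
from clearml import Task

task = Task.init(project_name="examples", task_name="table reporting")

df = pd.DataFrame({"epoch": [1, 2, 3], "val_accuracy": [0.71, 0.78, 0.83]})
# the DataFrame shows up as a table plot under the task's Plots section
task.get_logger().report_table(
    title="validation", series="accuracy per epoch", iteration=0, table_plot=df
)
```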
The worker just installs the package by name from pip, and it installs a different package, not mine!
Oh dear ...
Did you configure additional pip repositories in the Agent's clearml.conf ? https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L77 It might be that (1) is not enough, as pip will first try to search for the package in the pip repository, and only then in the private one. To avoid that, in your code you can point directly to the https URL of your package Ta...
Hmm, can you try with an additional configuration? Next to "secure: true" in your clearml.conf, can you add "verify: false"?
MuddySquid7
Are you saying that for some reason the models pick up the artifacts? Is that reproducible? (they are two different things)
Can you see the df.pkl on the Models section of the Task (in the UI) ?
Does this mean that I need to create multiple ssh keys? 1 key for each user?
I think so
Use .git-credentials
This might also support multiple user/repo
What exactly do you get automatically on the "Installed Packages" (meaning the "my_package" line)?
Do StorageManager.upload and upload_artifact use the same methods?
Yes they both use StorageManager.upload
Is the only difference the task being async?
Two differences:
1. Upload being async
2. Registering the artifact on the experiment

StorageManager will only upload, whereas upload_artifact will make sure the file is registered as an artifact on the experiment, together with all of the artifact's properties.
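To make the difference concrete, a rough sketch (paths, bucket, and names are made up; the StorageManager call shown here is upload_file):
```python
from clearml import StorageManager, Task

task = Task.init(project_name="examples", task_name="artifact demo")

# plain upload: the file lands in the target storage, nothing is recorded on the task
StorageManager.upload_file(local_file="data/df.pkl", remote_url="s3://my-bucket/data/df.pkl")

# artifact upload: same upload under the hood, but the file is also registered
# on the experiment (Artifacts tab), together with its properties
task.upload_artifact(name="df", artifact_object="data/df.pkl")
```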
PlainSquid19 No worries 🙂
btw: If you could check whether the mangling of the working dir / script path happens with the latest trains, that would be appreciated, because if you were running the script in the first place from "stages/", then trains should have caught it ...
RoundMosquito25 how is that possible ? could it be they are connected to a different server ?
I see, something like:
```python
from clearml import Task
from mystandalone import my_func_that_also_calls_task_init

def task_factory():
    task = Task.create(
        project_name="my_project",
        task_name="my_experiment",
        script="main_script.py",
        add_task_init_call=False,
    )
    return task
```
if the pipeline and the my_func_that_also_calls_task_init
are in the same repo, this should actually work.
You can quickly test this pipeline with
```python
pipe = PipelineController()
pipe.add_step(preprocess, ...)
pipe.add_step(base_task_facto...
```
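and then start the controller, e.g. (just a sketch, the exact start call depends on where you want the controller itself to run):
```python
pipe.start()  # launch the controller; each step is enqueued for an agent to pick up
```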
This is done in the background while accessing the cache, so it should not have any slowdown effect
I will create a minimal example.
Many thanks ReassuredTiger98 !
at that point we define a queue and the agents will take care of training
This is my preferred way as well :)
I suspect it's the localhost - and the trains-agent is trying too hard to access the port, but for some reason does not report an error ...
Error 101 : Inconsistent data encountered in document: document=Output, field=model
Okay, this points to a migration issue from 0.17 to 1.0
First try to upgrade to 1.0, then to 1.0.2
(I would also upgrade a single apiserver instance first; once it is done, you can spin up the rest)
Make sense ?
My current experience is that there is only a printout in the console but no training graph
Yes, Nvidia TLT needs to actually use TensorBoard for ClearML to catch it and display it.
I think that in the latest version they added that. TimelyPenguin76 might know more
DisturbedWorm66 it does, I think there is an example here:
https://github.com/allegroai/nvidia-clearml-integration/tree/main/tlt
But there is no need for 2FA for cloning a repo
I've seen that the file location of a task is saved
What do you mean by that? is it the execution section "entry point" ?
I reached over 1M API calls in about one week using clearml-serving
Oh that makes sense now 🙂
If I remember correctly, adding an additional model to a single clearml-serving instance should not actually change the number of API calls; they are mostly affected by the number of clearml-serving instances / containers and not by the number of models.
Hi DilapidatedDucks58
eg, we want max validation accuracy and all other metric values for the corresponding epoch
Is this the equivalent of nested sort ?
Wouldn't you get the requested behavior if you add all metric columns but sort based on the "accuracy" column ?
The point is, " leap"
is proeperly installed, this is the main issue. And although installed it is missing the ".so" ? what am I missing? what are you doing manually that does Not show in the log?
In other words, how did you install it "manually" inside the docker when you mentioned it worked for you when running without the agent ?
I’d definitely prefer the ability to set a docker image/docker args/requirements config for the pipeline controller too
That makes sense, any chance you can open a GitHub issue with a feature request so that we do not forget ?
The current implementation will upload the result of the first component, and then the first thing the next component will do is download it.
If they are on the same machine, it should be cached when accessed the 2nd time
Wouldn’t it be more performant f...