
Hi @<1546303293918023680:profile|MiniatureRobin9>!
Would you mind sending me a screenshot of the model page (incl the model path) both for the task you trained locally as well as the one you trained on the agent?
With the screenshots above: for the locally run experiment (left), does the model URL field contain an HTTP URL? The one you whited out?
Hi @<1523701949617147904:profile|PricklyRaven28> sorry that this is happening. I tried to run your minimal example, but get an `IndexError: Invalid key: 5872 is out of bounds for size 0` error. That said, I get the same error without the code running in a pipeline. There seems to be no difference between simply running the code and the pipeline (for me). Do you have an updated example, maybe also including getting a local copy of an artifact, so I can check?
Hi! Have you tried adding custom metrics to the experiment table itself? You can add any scalar as a column in the experiment list. It doesn't have color formatting, but it might be closer to what you want than the compare functionality 🙂
Yes you can! The filter syntax can be quite confusing, but for me it helps to print `task.__dict__` on an existing task object to see what options are available. You can get values in a nested dict by joining the nested keys into a single string with a `.` separator.
Example code:
```python
from clearml import Task

task = Task.get_task(task_id="17cbcce8976c467d995ab65a6f852c7e")
print(task.__dict__)

list_of_tasks = Task.query_tasks(task_filter={
    "all": dict(fields=['hyperparams.General.epochs.value'], p...
```
Thank you so much! In the meantime, I checked once more and the closest I could get was using report_single_value(). It forces you to report each and every row though, but the comparison looks a little better this way. No color coding yet, but maybe it can already help you a little 🙂
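For reference, this is roughly what I used (a minimal sketch, the metric names and values are just placeholders):
```python
from clearml import Task

# minimal sketch: project/task/metric names are placeholders
task = Task.init(project_name="examples", task_name="single value metrics")
logger = task.get_logger()

# each single value shows up on the experiment and can be added as a
# column in the experiment table, which makes comparing runs a bit easier
logger.report_single_value(name="precision", value=0.92)
logger.report_single_value(name="recall", value=0.87)
```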
Cool! 🙂 Yeah, that makes sense.
So (just brainstorming here) imagine you have your dataset with all samples inside. Every time N new samples arrive they're just added to the larger dataset in an incremental way (with the 3 lines I sent earlier).
So imagine if we could query/filter that large dataset to only include a certain datetime range. That range filter is then stored as hyperparameter too, so in that case, you could easily rerun the same training task multiple times, on differe...
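Roughly, the incremental part plus the "range as hyperparameter" idea could look like this (just a sketch: dataset/project names and the date values are placeholders, and the actual range-filtering logic would still live in your own code):
```python
from clearml import Dataset, Task

# sketch only: dataset/project names and dates are placeholders
parent = Dataset.get(dataset_project="demo", dataset_name="all_samples")

# every time N new samples arrive, create a new version on top of the old one
child = Dataset.create(
    dataset_name="all_samples",
    dataset_project="demo",
    parent_datasets=[parent.id],
)
child.add_files(path="./new_samples")
child.upload()
child.finalize()

# in the training task, store the chosen datetime range as a hyperparameter,
# so a cloned task can be rerun on a different range just by editing it in the UI
task = Task.init(project_name="demo", task_name="train on range")
params = {"date_from": "2023-01-01", "date_to": "2023-02-01"}
task.connect(params)
```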
Hi UnevenBee3, the OptimizerOptuna class should already be able to prune any bad tasks, provided the model itself is iteration-based (early stopping needs iterations, so e.g. SVMs won't work). You can read our blogpost here: https://clear.ml/blog/how-to-do-hyperparameter-optimization-better/
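For reference, a minimal sketch of wiring up OptimizerOptuna (the base task ID, parameter range and objective metric names below are placeholders):
```python
from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformIntegerParameterRange
from clearml.automation.optuna import OptimizerOptuna

# controller task for the optimization itself
task = Task.init(project_name="examples", task_name="HPO with Optuna",
                 task_type=Task.TaskTypes.optimizer)

optimizer = HyperParameterOptimizer(
    base_task_id="<your_base_task_id>",  # the (iteration-based) task to optimize
    hyper_parameters=[
        UniformIntegerParameterRange("General/epochs", min_value=5, max_value=50, step_size=5),
    ],
    objective_metric_title="validation",
    objective_metric_series="loss",
    objective_metric_sign="min",
    optimizer_class=OptimizerOptuna,  # Optuna handles pruning of under-performing trials
    execution_queue="default",
    max_number_of_concurrent_tasks=2,
    total_max_jobs=20,
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```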
Hi NuttyCamel41!
Your suspicion is correct: there should be no need to specify the `config.pbtxt` manually, normally this file is generated automatically from the information you provide on the command line.
It might be somehow silently failing to parse your CLI input and so not building the `config.pbtxt` correctly. One difference I see immediately is that you opted for the `"[1, 64]"` notation instead of the `1 64` notation from the example. Might be worth a try to change the input for...
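For comparison, this is roughly the shape of the command in the examples (endpoint name, model id, preprocess file and shapes here are placeholders; note the space-separated `1 64` input size):
```bash
clearml-serving --id <service_id> model add \
  --engine triton \
  --endpoint "my_model" \
  --model-id <model_id> \
  --preprocess "preprocess.py" \
  --input-name "input__0" --input-type float32 --input-size 1 64 \
  --output-name "output__0" --output-type float32 --output-size -1 10
```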
I can see 2 kinds of errors: `Error: Failed to initialize NVML` and `Unable to allocate pinned system memory, pinned memory pool will not be available: CUDA driver version is insufficient for CUDA runtime version`
These 2 lines make me think something went wrong with the GPU itself. Chances are you won't be able to run `nvidia-smi`; this looks like a non-clearml issue 🙂 It might be that Triton hogs the GPU memory if not properly shut down (double ctrl-c). It says the driver ver...
It's part of the design, I think. It makes sense that if we want to keep track of changes, we always build on top of what we already have 🙂 I think of it like a commit: I'm adding files in a NEW commit, not in the old one.
So you train the model only on those N preprocessed data points then? Never combined with the previous datapoints before N?
For the record, this is a minimal reproducible example:
Local folder structure:
```
├── remove_folder
│   ├── batch_0
│   │   ├── file_0_0.txt
│   │   ├── file_0_1.txt
│   │   ├── file_0_2.txt
│   │   ├── file_0_3.txt
│   │   ├── file_0_4.txt
│   │   ├── file_0_5.txt
│   │   ├── file_0_6.txt
│   │   ├── file_0_7.txt
│   │   ├── file_0_8.txt
│   │   └── file_0_9.txt
│   └── batch_1
│       ├── file_1_0.txt
│       ├── file_1_1.txt
│       ├── file_1_2.txt
│       ├── file_1_3.txt
│       └── fi...
```
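The script itself got cut off here; as a sketch, reproducing it could look something like this (dataset/project names and the remove pattern are assumptions on my part):
```python
from clearml import Dataset

# version 1: contains the whole folder structure above
ds_v1 = Dataset.create(dataset_name="remove_folder_test", dataset_project="examples")
ds_v1.add_files(path="remove_folder")
ds_v1.upload()
ds_v1.finalize()

# version 2: builds on top of version 1 and removes one sub-folder
ds_v2 = Dataset.create(
    dataset_name="remove_folder_test",
    dataset_project="examples",
    parent_datasets=[ds_v1.id],
)
ds_v2.remove_files(dataset_path="batch_0/*")
ds_v2.upload()
ds_v2.finalize()
```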
Hi CurvedHedgehog15, so my previous reply does assume you have reported a scalar for each individual FAR level. Then you can add individual levels as shown in the gif. But like you said, that might actually cause you to lose your overview in the scalars tab.
So I don't think there's an immediate way to do this in ClearML right now, but would you mind opening an issue on github for it? It might be interesting to add it to the tool?
Hi @<1547028116780617728:profile|TimelyRabbit96> Awesome that you managed to get it working!
Yes, you will indeed need to add all ensemble endpoints separately 🙂
Hi there! There are several services that need persistent storage, check here for an overview diagram.
If I'm not mistaken, there's the fileserver, elastic, mongo and redis. All info is scattered over these (e.g. model files on fileserver, logs on elastic) so there is no one server holding everything.
I'm not a k8s expert, but I think that even a dynamic PVC should not delete itself. Just to be sure though, you can indee...
Hi Jax! We have a blogpost explaining how to use it almost ready to go. I'll ping you here when it's out.
In the meantime you can check out the https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tao/resources/tao-getting-started page of TAO. Download the zip file with examples, and under notebooks>tao_launcher_starter_kit>detectnet_v2 you'll find a notebook with an example of how to use the integration.
It depends on how complex your configuration is, but if config elements are all that will change between versions (i.e. not the code itself) then you could consider using parameter overrides.
A ClearML Task can have a number of "hyperparameters" attached to it. But once that task is cloned and in draft mode, one can EDIT these parameters and change them. If then the task is queued, the new parameters will be injected into the code itself.
A pipeline is no different, it can have pipeline par...
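As a small sketch of what that looks like in code (parameter names and values are placeholders):
```python
from clearml import Task

# placeholders for whatever configuration your training code needs
task = Task.init(project_name="examples", task_name="configurable training")

config = {"epochs": 10, "batch_size": 32, "backbone": "resnet50"}
# after this call the values show up as hyperparameters on the task;
# when the task is cloned and the values are edited in the UI, the edited
# values are injected back into this dict at runtime
config = task.connect(config)

print(config["epochs"])  # uses the (possibly overridden) value
```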
Hey @<1539780305588588544:profile|ConvolutedLeopard95>, unfortunately this is not built into the YOLOv8 tracker. Would you mind opening an issue on the YOLOv8 github page and tagging me? (I'm thepycoder on github)
I can then follow up the progress on it, because it makes sense to expose this parameter through the yaml.
That said, to help you right now, please change [this line](https://github.com/ultralytics/ultralytics/blob/fe61018975182f4d7645681b4ecc09266939dbfb/ultralytics/yolo/uti...
Does it help to also run docker login in the init bash script?
You should be able to access your AWS credentials from the environment (the agent will inject them based on your config)
Do you have a screenshot of what happens? Have you checked the console when pressing f12?
Thanks! I know that you posted these locations before in text, I just wanted to make sure that they are the ones I was thinking. It seems like the model isn't properly uploaded to the clearml server. Instead, it's saving only the local path to the model file.
Normally that's what the `output_uri=True` argument in the `Task.init(...)` call is for, but it seems there is a bug that's not uploading the model.
Would you mind testing out [manual model uploading](https://clear.ml/docs/latest/docs/clea...
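The docs link above got cut off; what I mean by manual uploading is roughly this (names and paths are placeholders):
```python
from clearml import Task, OutputModel

task = Task.init(
    project_name="examples",
    task_name="manual model upload",
    # upload models to the server (or any storage URI) instead of keeping local paths
    output_uri=True,
)

output_model = OutputModel(task=task, name="my_model")
# this should upload the weights file and register it on the task,
# rather than only recording the local file path
output_model.update_weights(weights_filename="model.pt")
```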
Sure! This is an example of running a custom model. It basically boils down to defining a `preprocess`, `process` and `postprocess` function. Inside the `process` function can be anything, including just a basic call to huggingface to run inference 🙂
I have not tested this myself mind you, but I see no reason why it wouldn't work!
In fact, I think even Triton itself supports running on CPU these days, so you still ...
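To give an idea, a custom-engine preprocess file could look roughly like this (untested sketch following the pattern from the clearml-serving custom example; the huggingface pipeline and field names are only an illustration):
```python
from typing import Any

from transformers import pipeline


class Preprocess:
    def __init__(self):
        # loaded once when the serving instance spins up
        self.model = pipeline("sentiment-analysis")

    def preprocess(self, body: dict, state: dict, collect_custom_statistics_fn=None) -> Any:
        # pull the raw text out of the request payload
        return body["text"]

    def process(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> Any:
        # anything can go here, e.g. a plain huggingface inference call (no Triton needed)
        return self.model(data)

    def postprocess(self, data: Any, state: dict, collect_custom_statistics_fn=None) -> dict:
        # shape the response returned to the caller
        return {"predictions": data}
```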
Sorry, I jumped the gun before I fully understood your question 🙂 So with a simple docker compose file, you mean you don't want to use the docker-compose-triton.yaml file and so want to run the huggingface model on CPU instead of Triton?
Or do you want to know if the general docker compose version is able to handle a huggingface model?
That wasn't my intention! Not a dumb question, just a logical one 🙂
Unfortunately, ClearML HPO does not "know" what is inside the task it is optimizing. It is like that by design, so that you can run HPO with no code changes inside the experiment. That said, this also limits us in not being able to "smartly" optimize.
However, is there a way you could use caching within your code itself? Such as using functools' LRU cache? This is built into Python and will cache function return values when the function is called again with the same input arguments.
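For example (a minimal sketch, the function is a stand-in for your own expensive preprocessing):
```python
from functools import lru_cache
import time


@lru_cache(maxsize=None)
def expensive_preprocessing(dataset_id: str) -> str:
    # stand-in for expensive work that only depends on its arguments
    time.sleep(2)
    return dataset_id.upper()


expensive_preprocessing("abc")  # slow: actually runs
expensive_preprocessing("abc")  # fast: returns the cached result
```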
There also see...
Hey @<1541592213111181312:profile|PleasantCoral12> thanks for doing the profiling! This looks pretty normal to me, although 37 seconds for a dataset.get is definitely too much. I just checked and for me it takes 3.7 seconds. Mind you, the `.get()` method doesn't actually download the data, so the dataset size is irrelevant here.
But the slowdowns do seem to only occur when doing api requests. Possible next steps could be:
- Send me your username and email address (maybe dm if you don't wa...