It's part of the design, I think. It makes sense that if we want to keep track of changes, we always build on top of what we already have 🙂 I think of it like a commit: I'm adding files in a NEW commit, not in the old one.
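To make the commit analogy concrete, here's a minimal sketch (dataset and project names are hypothetical) of how a new dataset version is built on top of a parent with the ClearML SDK; the parent stays immutable and the child only records the delta:

```python
def create_child_version(parent_dataset_id: str, new_file_paths: list) -> str:
    # Lazy import so this sketch can be defined without a ClearML server.
    from clearml import Dataset

    # Like a new git commit on top of an existing one: the parent is
    # untouched, the child only adds/overwrites files on top of it.
    child = Dataset.create(
        dataset_name="my_dataset",        # hypothetical name
        dataset_project="my_project",     # hypothetical project
        parent_datasets=[parent_dataset_id],
    )
    for path in new_file_paths:
        child.add_files(path)
    child.upload()
    child.finalize()
    return child.id
```
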
I agree, I came across the same issue too. But your post helps make it clear, so hopefully it can be pushed! 🙂
That looks like a bug — would you mind copy-pasting this into a GitHub issue? 🙂 AgitatedDove14 is there something else this could be?
Ok, so I think I recreated your issue. The problem is, HPO was designed to handle more possible combinations of items than is reasonable to test. In this case though, there are only 11 possible parameter "combinations". But by default, ClearML sets the maximum number of jobs much higher than that (check the advanced settings in the wizard).
It seems like HPO doesn't check for duplicate experiments though, so that means it will keep spawning experiments (even though it might have executed the exact s...
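To illustrate why this happens, a quick back-of-the-envelope check (the parameter values here are made up) of the grid size versus the job budget:

```python
from itertools import product

# Hypothetical discrete search space: 11 possible combinations in total
# (e.g. a single hyperparameter swept over 11 values).
search_space = {
    "learning_rate": [round(0.001 * i, 4) for i in range(1, 12)],  # 11 values
}

combinations = list(product(*search_space.values()))
max_jobs = 100  # a typical "max jobs" setting, far above the grid size

print(f"unique combinations: {len(combinations)}, job budget: {max_jobs}")
# With max_jobs > len(combinations) and no duplicate check, the optimizer
# can launch the same combination more than once.
```
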
Hi @<1523701062857396224:profile|AttractiveShrimp45> , I'm checking your issue myself. Do you see any duplicate experiments in the summary table?
Hi @<1546303293918023680:profile|MiniatureRobin9> !
Would you mind sending me a screenshot of the model page (incl the model path) both for the task you trained locally as well as the one you trained on the agent?
Wow, awesome! Really nice find! Would you mind compiling your findings into a GitHub issue? Then we can help you search better :) This info is enough to get us going at least!
I'm not quite sure what you mean here? From the docs it seems like you should be able to simply send an HTTP request to the localhost url to get the metrics. Is this not working for you? Otherwise, all the metrics end up in Prometheus, so you can also query that instead or use something like Grafana to visualize it
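As a sketch of what that looks like (the port and metric names below are assumptions; the real ones depend on your serving setup), the endpoint returns the standard Prometheus text exposition format, which is plain text and easy to scrape and parse:

```python
import urllib.request


def fetch_metrics(url: str = "http://localhost:9999/metrics") -> str:
    # Plain HTTP GET; the Prometheus exposition format is just text.
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode()


def parse_metrics(text: str) -> dict:
    # Minimal parser: skip comments, split "name value" lines.
    metrics = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics


# Example exposition payload (metric names are made up):
sample = """# HELP requests_total Total requests
# TYPE requests_total counter
requests_total{endpoint="predict"} 42
latency_seconds_sum 1.5
"""
print(parse_metrics(sample))
```
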
Ah I see 😄 I have submitted a ClearML patch to Huggingface transformers: None
It is merged, but not in a release yet. Would you mind checking if it works if you install transformers from github? (aka the latest master version)
This looks to me like a permission issue on the GCP side. Do your GCP credentials have the compute.images.useReadOnly permission set? It looks like the worker needs that permission to be able to pull the images correctly 🙂
Hey! So several things here:
1. As per the plotly docs, you have to specify which axis you want to format; in your example, plotly can't know. If you look closely at their example, you'll see it's a nested dict, with the key being 'xaxis'.
2. Your numpy data has x values of 1 2 3, but your extra layout has values 1 3 4, which don't match. Plotly took the first element of each subarray to be the x value.
If we fix these 2 things, your example becomes:
` task.logger.report_line_plot('this_is_the_title',
...
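For reference, here is a self-contained sketch of the corrected pieces (the data and tick labels are placeholders): the x values in the data and in extra_layout now match, and the layout is nested under the 'xaxis' key:

```python
import numpy as np

# x values 1, 2, 3 — these must match the tick values in extra_layout.
series_data = np.array([[1, 10], [2, 20], [3, 30]])

extra_layout = {
    "xaxis": {  # nested dict: plotly needs to know WHICH axis to format
        "tickmode": "array",
        "tickvals": [1, 2, 3],
        "ticktext": ["one", "two", "three"],
    }
}

# Sanity check: the first element of each subarray is the x value.
x_values = [row[0] for row in series_data.tolist()]
assert x_values == extra_layout["xaxis"]["tickvals"]
```

This dict can then be passed as the extra_layout argument of report_line_plot.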
Can you please post the result of running df -h in this chat? Chances are quite high your actual machine does indeed have no more space left 🙂
Not exactly sure what is going wrong without an exact error or reproducible example.
However, passing around the dataset object is not ideal: passing info from one step to another in a pipeline requires ClearML to pickle said object, and I'm not sure a Dataset object is picklable.
Next to that, running get_local_copy() in the first step does not guarantee that you can access that data from the other step. Both might be executed in different docker containers or even on different...
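A more robust pattern (the step and project names here are hypothetical) is to pass the dataset ID string between steps and fetch a local copy inside each step; a string pickles trivially, and each step resolves a local path valid inside its own container:

```python
import pickle


def step_one() -> str:
    from clearml import Dataset  # lazy import: sketch only
    ds = Dataset.get(dataset_project="my_project", dataset_name="my_dataset")
    return ds.id  # pass the ID string, not the Dataset object


def step_two(dataset_id: str) -> None:
    from clearml import Dataset
    # Each step fetches its own local copy, valid inside ITS container.
    local_path = Dataset.get(dataset_id=dataset_id).get_local_copy()
    print(local_path)


# The reason this works: a plain string survives pickling between steps.
payload = pickle.dumps("a-dataset-id")
assert pickle.loads(payload) == "a-dataset-id"
```
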
Yes, you will indeed need to add all ensemble endpoints separately 🙂
AgitatedDove14 I was able to recreate the error. Simply by running Lavi's example on clearml==1.6.3rc1 in a fresh env. I don't know what is unique to the flow itself, but it does seem reproducible.
RoundMosquito25 it is true that the TaskScheduler requires a task_id, but that does not mean you have to run the pipeline every time 🙂
When setting up, you indeed need to run the pipeline once, to get it into the system. But from that point on, you should be able to just use the task_scheduler on the pipeline ID. The scheduler should automatically clone the pipeline and enqueue it. It will basically use the 1 existing pipeline as a "template" for subsequent runs.
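A sketch of that setup (the queue name and the monthly cadence are assumptions), using the existing pipeline's task ID as the template:

```python
def schedule_pipeline(pipeline_task_id: str, queue: str = "default") -> None:
    # Lazy import: only needed when the scheduler actually runs.
    from clearml.automation import TaskScheduler

    scheduler = TaskScheduler()
    # The existing pipeline task acts as a template: on every trigger
    # it gets cloned and enqueued, just like clicking "new run".
    scheduler.add_task(
        schedule_task_id=pipeline_task_id,
        queue=queue,
        day=1,  # e.g. run monthly; pick whatever cadence you need
    )
    scheduler.start()
```
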
Hey @<1539780305588588544:profile|ConvolutedLeopard95> , unfortunately this is not built into the YOLOv8 tracker. Would you mind opening an issue on the YOLOv8 GitHub page and tagging me? (I'm thepycoder on GitHub)
I can then follow up the progress on it, because it makes sense to expose this parameter through the yaml.
That said, to help you right now, please change [this line](https://github.com/ultralytics/ultralytics/blob/fe61018975182f4d7645681b4ecc09266939dbfb/ultralytics/yolo/uti...
Hi GrittyHawk31 ! ClearML is integrated with a bunch of frameworks from which it tries to automatically gather information. You can find a list here: https://clear.ml/docs/latest/docs/integrations/libraries
For example, if you're already reporting scalars to tensorboard, you won't have to add any clearml code, it will automatically be captured. The same will happen with e.g. LightGBM. Take a look at the example codes in the link to find what is automatically supported for your framework.
...
ExuberantBat52 The dataset alias thing giving you multiple prompts is still an issue I think, but it's on the backlog of our devs 😄
HomelyShells16 Thanks for the detailed write-up and minimal example. I'm running it now too
Here is an example of deploying an sklearn model using ClearML serving.
However, please note that sklearn-like models don't have input and output shapes in the same sense as deep learning models have. Setting the I/O shapes using the CLI is usually meant for GPU-based deep learning models that need to know the sizes for better GPU allocation. In the case of sklearn on CPU, all you have to do is set up your preprocess...
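As a rough sketch of that preprocessing side (the field names are made up; check the clearml-serving examples for the exact interface your version expects), the preprocess class just reshapes the request body into the 2D feature array an sklearn model consumes, and wraps the prediction back up:

```python
from typing import Any


class Preprocess:
    # clearml-serving instantiates a class like this around the loaded model.
    def preprocess(self, body: dict, state: dict,
                   collect_custom_statistics_fn=None) -> Any:
        # Turn the JSON body into the 2D feature array sklearn expects.
        return [[body["feature_a"], body["feature_b"]]]

    def postprocess(self, data: Any, state: dict,
                    collect_custom_statistics_fn=None) -> dict:
        # Wrap the raw prediction back into a JSON-friendly dict.
        return {"prediction": list(data)}


# Quick local check with a fake request body:
p = Preprocess()
features = p.preprocess({"feature_a": 1.0, "feature_b": 2.0}, state={})
print(features)
```
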
You're not the first one with this problem, so I think I'll ask the devs to maybe add it as a parameter for clearml-agent. That way it will show up in the docs and you might have found it sooner. Do you think that would help?
Could you use tags for that? In that case you can easily filter on which group you're interested in, or do you have a more impactful UI change in mind to implement groups? 🙂
No worries! Just so I understand fully though: you were already using the patch with success from my branch. Now that it has been merged into the transformers main branch, you installed it from there, and that's when you started having issues with models not being saved? Then installing transformers 4.21.3 fixes it (which should have the old clearml integration, even before the patch)?
Yes, auto-starting containers is definitely a thing with docker 🙂 We set the containers to restart automatically (a reboot will do that too), so that when a container crashes it immediately restarts, say in a production environment.
So the best thing to do there is to use docker ps to get all running containers and then kill them using docker kill <container_id>. ChatGPT tells me this command should kill all currently running containers: docker rm -f $(docker ps -aq)
And I...
That's what happens in the background when you click "new run". A pipeline is simply a task in the background. You can find the task using querying, and you can clone it too! It is placed in a "hidden" folder called .pipelines as a subfolder of your main project. Check out the settings; you can enable "show hidden folders".
I still have my tasks I ran remotely and they don't show any uncommitted changes. @<1540142651142049792:profile|BurlyHorse22> are you sure the remote machine is running transformers from the latest github branch, instead of from the package?
If it all looks fine, can you please install transformers from this repo (branch main) and rerun? It might be that not all my fixes came through
Hi @<1523701949617147904:profile|PricklyRaven28> sorry that this is happening. I tried to run your minimal example, but get an IndexError: Invalid key: 5872 is out of bounds for size 0 error. That said, I get the same error without the code running in a pipeline. There seems to be no difference between simply running the code and running it in the pipeline (for me). Do you have an updated example, maybe also including getting a local copy of an artifact, so I can check?