This sounds like you don't have clearml installed in the Ubuntu container. Either that, or the `clearml.conf` in the container is not pointing to the server, which is why all the information is missing.
I'd rather suggest you change the approach and run a `clearml-agent` set up with Docker, and when you want to run YOLOv5 training you execute it remotely on the queue that the agent is listening to.
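For the remote-execution part, a minimal sketch (the project, task, and queue names here are placeholders, not from your setup):

```python
from clearml import Task

# placeholder project/task names; replace with your own
task = Task.init(project_name="yolov5", task_name="train")

# stop local execution and enqueue this task on the queue the agent listens to
task.execute_remotely(queue_name="default")
```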
Wait, my config looks a bit different. What clearml package version are you using?
Hey Pawel, thanks for opening the PR on Ultralytics’ side. The full support should come from them, so if it’s missing for YOLOv8 it means they didn’t enable it. Still, you can try `clearml-task` for auto-logging support in the case of remote execution.
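For instance (a sketch; the project, script, and queue names are placeholders): `clearml-task --project yolov8 --name train --script train.py --queue default` will create a task from your local script and enqueue it for remote execution.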
Also, you could easily use a ClearML dataset ID as input to YOLOv8 with a few lines of code, by downloading/`get`-ing the dataset by ID yourself and passing the local path to it as input to the ultralytics...
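Something along these lines (a sketch; the dataset ID is a placeholder, and I'm assuming the dataset folder contains the data YAML that YOLOv8 expects):

```python
from clearml import Dataset
from ultralytics import YOLO

# fetch the ClearML dataset by ID and get a local (cached) copy of it
dataset_path = Dataset.get(dataset_id="YOUR_DATASET_ID").get_local_copy()

# assumption: the dataset folder contains the data.yaml YOLOv8 expects
model = YOLO("yolov8n.pt")
model.train(data=f"{dataset_path}/data.yaml", epochs=10)
```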
You can try adding the `force_download=True` flag to `.get()` to ignore the locally cached content. Let me know if it helps.
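If this is a task artifact, a sketch of what I mean (the task ID and artifact name are placeholders):

```python
from clearml import Task

task = Task.get_task(task_id="SOURCE_TASK_ID")  # placeholder ID
# force_download=True re-fetches the artifact instead of using the local cache
obj = task.artifacts["my_artifact"].get(force_download=True)
```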
Hey @<1546303293918023680:profile|MiniatureRobin9> , to help narrow down the problem, could you try to manually download None and open it with `pickle`?
Also, is your agent running on the same machine as your server and the example pipeline code? And what Python version are you using for all three components? Because I see there's a warning `could not locate requested Python version 3.11, reverting t...
Can you please attach the code for the pipeline?
And how many agents do you have listening on the `services` queue?
It happens due to an internal use of `Dataset.get`; the larger the dataset, the more verbose it will be. We’ll fix this in the upcoming releases.
I see you want to use the `services` queue for both the pipeline controller and the pipeline steps, but you have only one worker/agent listening to this queue. In this case you need at least 2 agents listening to the services queue. Try spawning an additional agent that listens to this queue (see the command below) and let me know how it goes.
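For example, on the machine that should host the extra worker, something like `clearml-agent daemon --queue services --detached` should spawn a second agent on the `services` queue (a sketch; adjust the flags to your setup).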
Hey @<1554275802437128192:profile|CumbersomeBee33> , aborted usually means that someone manually stopped the pipeline or one of its experiments. Can you provide us with the code you used to run it?
That's not that much. You can use the AWS autoscaler and provision a spot g4dn GPU instance with a bit more disk. This should cost you less than 50 cents an hour
Hey @<1681836303299121152:profile|RoundElk14> , it seems you are using a self-hosted ClearML server. This error you're getting happens because your email is not configured in the server. Ask your admin to perform the following steps:
- [The admin] Go to Settings > Users & Groups > Users and click on "+ Add User" where they will be prompted to specify the user's email
- [The user] Once the admin confirms that they did step 1, the user should first Sign In with their email to the server
- [The...
Are you referring to the `clearml-serving` project?
Can you please attach the full traceback here?
Hey @<1523701066867150848:profile|JitteryCoyote63> , could you please open a GH issue on our repo too, so that we can track this issue more effectively? We are working on it now, btw.
Can you paste here the code of the pipeline that you're trying to run?
Hey @<1545216070686609408:profile|EnthusiasticCow4> , for requirements pointing to packages in git repositories you need to make sure that the environment the agent is running in has valid credentials to access the repo. In your case (`git+ssh`) it means you need a pair of SSH keys, and the public key should be registered with the repo.
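For example, you can generate a key pair with `ssh-keygen -t ed25519`, make the private key available in the agent's environment (e.g. mount it into the container), and register the public key as a deploy key on the repo. The exact details depend on your setup and git host.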
To link a dataset to a task you need to pass the `alias=` parameter to `Dataset.get()`. See here: https://clear.ml/docs/latest/docs/clearml_data/clearml_data_sdk#accessing-datasets
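A minimal sketch (the dataset ID and alias are placeholders):

```python
from clearml import Dataset

# alias= links the retrieved dataset to the currently running task
ds = Dataset.get(
    dataset_id="YOUR_DATASET_ID",  # placeholder
    alias="training_data",
)
local_path = ds.get_local_copy()
```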
The line before the last in your code snippet above: `pipe.start_locally`.
Yes, you can do that. But it may make it harder to identify the task later on
Hello @<1523710243865890816:profile|QuaintPelican38> , could you try `Dataset.get`-ing an existing dataset and tell me whether there are any errors or not?
I think you can set the CUDA version in `clearml.conf`; alternatively, you can have the agent use a Docker image with your required version of CUDA instead of setting up the environment directly on the machine.
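A sketch of the relevant `clearml.conf` keys (verify them against the configuration reference for your installed version; the versions and image name are just examples):

```
agent {
    # pin the CUDA/cuDNN versions the agent resolves packages against
    cuda_version: "11.8"
    cudnn_version: "8.6"

    # or run everything inside a docker image that ships the CUDA you need
    default_docker: {
        image: "nvidia/cuda:11.8.0-runtime-ubuntu22.04"
    }
}
```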
Hey @<1661904968040321024:profile|SpotlessOwl43> that's a great question!
> how the metric should be saved, via report_single_value?
That's correct.
> what should I enter into the title and series fields in Project Dashboard?
The title should be "Summary" and the series is the name of the single value you reported.
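A minimal sketch (the project/task names and metric value are placeholders):

```python
from clearml import Task

task = Task.init(project_name="demo", task_name="report-metric")  # placeholders
# "accuracy" is the single-value name, i.e. what goes in the dashboard's series field
task.get_logger().report_single_value(name="accuracy", value=0.93)
```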
Yes, works with GCP too
To my knowledge, no. You'd have to create your own front-end and use the model served with clearml-serving via an API
If your git credentials are stored in the agent's `clearml.conf`, it means these are an HTTPS username/password pair. But you specified that the package should be downloaded via git SSH, for which I assume you don't have credentials in the agent's environment. So it can't authenticate with SSH, and pip doesn't know how to switch from `git+ssh` to `git+https`, because the downloading of the package is done by pip, not by clearml.
And there probably are auth errors if you scroll through the entire log...
Hey @<1639074542859063296:profile|StunningSwallow12> what exactly do you mean by "training in production"? Maybe you can also elaborate on what kind of models.
ClearML in general assigns a unique Model ID to each model, but if you need some other way of versioning, we have support for custom tags, and you can apply those programmatically on the model
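A sketch of tagging a model programmatically (names are placeholders; I'm assuming here that you register the model through `OutputModel`):

```python
from clearml import Task, OutputModel

task = Task.init(project_name="prod", task_name="train")  # placeholders

# apply your own versioning scheme via tags when registering the model
model = OutputModel(task=task, name="my-model", tags=["v1.2", "candidate"])
model.update_weights("model.pt")  # registers the weights file with ClearML
```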
Thanks for pointing this out, we will need to update our documentation. Still, if you manually inspect the `~/clearml.conf` file you will see the available configuration options.
Hey @<1547390438648844288:profile|ScaryJellyfish75> , can you provide the whole code for the pipeline, and also mention which clearml version you're using?