I think you can set the CUDA version in `clearml.conf`. Alternatively, you can have the agent use a docker image with your required version of CUDA instead of setting up the environment directly on the machine.
Hey @<1569858449813016576:profile|JumpyRaven4> , about your first point, what exactly is the question?
About your second point: you can try to manually save the final model and give it a proper file name; that way we will show it in the UI with the name you provided. Make sure to use `xgboost.save_model` and not raw pickle.
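A minimal sketch of saving the final model under an explicit name, assuming `booster` is a trained xgboost model object that exposes `save_model(fname)`. The helper name `save_final_model`, the `models` folder and the file name are hypothetical, chosen only for illustration.

```python
from pathlib import Path


def save_final_model(booster, output_dir="models", file_name="final_model.json"):
    """Save a trained booster under an explicit, readable file name.

    `booster` is assumed to be a trained xgboost model object exposing
    save_model(fname); the default folder and file name are placeholders.
    """
    target = Path(output_dir) / file_name
    target.parent.mkdir(parents=True, exist_ok=True)
    # Use xgboost's own serializer instead of pickle.dump(booster, ...)
    booster.save_model(str(target))
    return target
```

The file name you pass here is the one that should then show up for the model in the UI, provided the call happens in a process where the task has already been initialized.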
For your final question, given that your models have customised code, I can suggest trying to use `clearml.OutputModel`, which will register the file you provide ...
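As a rough sketch of what registering a file through `OutputModel` could look like: the helper name, project name and task name below are hypothetical, and the use of `update_weights` is my assumption about how you would attach the file. The model file has to exist on disk before the call.

```python
from pathlib import Path


def register_custom_model(weights_path, project_name="examples", task_name="custom model"):
    """Register an already-saved model file with ClearML (hypothetical helper)."""
    weights_path = Path(weights_path)
    if not weights_path.is_file():
        raise FileNotFoundError(f"model file not found: {weights_path}")

    # Imported here so the file check above works before ClearML is configured.
    from clearml import OutputModel, Task

    task = Task.init(project_name=project_name, task_name=task_name)
    output_model = OutputModel(task=task, name=weights_path.stem)
    # auto_delete_file=False keeps the local file after it has been uploaded.
    output_model.update_weights(
        weights_filename=str(weights_path),
        auto_delete_file=False,
    )
    return output_model
```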
Do you mean that you want your published experiments to be either “approved” or “not approved” based on the presence of the attachments you mentioned?
What happens if you comment out or remove the `pipe.set_default_execution_queue('default')` call and use `run_locally` instead of `start_locally`?
Because in the current setup, you are basically asking to run the pipeline controller task locally, while the rest of the steps need to run on an agent machine. If you make the changes I suggested above, you will be able to run everything on your local machine.
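A hedged sketch of the "everything on the local machine" setup. It assumes a `PipelineController` object, for which the equivalent I am aware of is `start_locally(run_pipeline_steps_locally=True)`; for decorator-based pipelines, `PipelineDecorator.run_locally()` plays the same role. The step function, pipeline name and project name are hypothetical.

```python
def double(x):
    """Toy pipeline step (hypothetical)."""
    return x * 2


def run_pipeline_on_this_machine():
    # Imported lazily: creating the controller needs a configured ClearML setup.
    from clearml import PipelineController

    pipe = PipelineController(
        name="local pipeline demo",
        project="examples",
        version="1.0.0",
    )
    pipe.add_function_step(
        name="double",
        function=double,
        function_kwargs={"x": 21},
        function_return=["doubled"],
    )
    # Note: no pipe.set_default_execution_queue("default") call here.
    # Both the controller and the steps run on the local machine.
    pipe.start_locally(run_pipeline_steps_locally=True)
```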
That is not specific enough. Can you show the code? And ideally also the console log of the pipeline
Is this a Jupyter notebook or something? Can you download it properly as either a .ipynb or .py file?
Hey, yes, the reason for this issue seems to be our currently limited support for Lightning 2.0. We will improve the support in upcoming releases. For now, one workaround I can recommend is to use `torch.save` if possible, because we fully support automatic model capture on `torch.save` calls.
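A small sketch of that workaround, assuming `model` is a `torch.nn.Module` (a Lightning module is one). The helper name and the default file name are hypothetical.

```python
def save_weights_with_torch(model, path="final_model.pt"):
    """Save a model's weights through torch.save (hypothetical helper).

    `model` is assumed to be a torch.nn.Module, which includes Lightning modules.
    """
    import torch  # imported here so the helper can be defined before torch is loaded

    # Saving the state_dict goes through torch.save, the call that is
    # captured automatically once the task has been initialized.
    torch.save(model.state_dict(), path)
    return path
```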
Hey @<1523701066867150848:profile|JitteryCoyote63> , could you please open a GH issue on our repo too, so that we can track this issue more effectively? We are working on it now, btw.
Hey @<1535069219354316800:profile|PerplexedRaccoon19> , yes it does. Take a look at this example, and let me know if there are any more questions: None
Yes, you can do that. But it may make it harder to identify the task later on
Hello @<1523710243865890816:profile|QuaintPelican38> , could you try calling `Dataset.get` on an existing dataset and tell me whether there are any errors?
You can create a new dataset and specify all the previous ones as its parent datasets. Is that something that would work for you?
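A minimal sketch of that idea, assuming the parent datasets are already finalized and that you have their ids at hand. The helper name, dataset name and project name are hypothetical.

```python
def merge_datasets(parent_ids, name="merged dataset", project="examples"):
    """Create a dataset whose parents are all the given datasets (hypothetical helper)."""
    parent_ids = list(parent_ids)
    if not parent_ids:
        raise ValueError("at least one parent dataset id is required")

    from clearml import Dataset  # lazy import: needs a configured ClearML setup

    merged = Dataset.create(
        dataset_name=name,
        dataset_project=project,
        parent_datasets=parent_ids,
    )
    # No new files are added here; the content comes from the parents.
    merged.upload()
    merged.finalize()
    return merged.id
```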
Hey @<1577468626967990272:profile|PerplexedDolphin99> , yes, this method call will help you limit the number of files you have in your cache, but not the total size of your cache. To control the size, I’d recommend checking the `sdk.storage.cache` section of the `~/clearml.conf` file.
Thanks for pointing this out, we will need to update our documentation. Still, if you manually inspect the `~/clearml.conf` file, you will see the available configurations.
Wait, my config looks a bit different, what clearml package version are you using?
Yes, that is correct. Btw, now it looks more like my `clearml.conf`.
Hey Pawel, thanks for opening the PR on Ultralytics’ side. The full support should come from them, so if it’s missing for YOLOv8 it means they didn’t enable it. Still, you can try `clearml-task` for auto-logging support in case of remote execution.
Also, I’d say you could easily use a ClearML dataset id as input to YOLOv8 with a few lines of code, basically by downloading (`get`ting) the dataset by id yourself and passing its path as input to the ultralytics...
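Something along these lines should do for the download part. This is only a sketch; the helper name is hypothetical, and how you hand the returned folder to the training code is up to your own entry point.

```python
def local_dataset_path(dataset_id):
    """Download a ClearML dataset by id and return the local folder path."""
    if not dataset_id:
        raise ValueError("dataset_id must be a non-empty string")

    from clearml import Dataset  # lazy import: needs a configured ClearML setup

    dataset = Dataset.get(dataset_id=dataset_id)
    # get_local_copy() returns the path of a cached, read-only copy of the files.
    return dataset.get_local_copy()
```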
Hey @<1547390444877385728:profile|ThickSnake12> , how exactly do you access the artifact next time? Can you provide a code sample?
Hey @<1681836314334334976:profile|GrotesqueSeaturtle83> , yes, it is possible to do so, but you must configure the docker `--entrypoint` argument (as part of the `docker_arguments`) and the docker image for said task. In general this isn't a recommended approach. Instead, prefer a setup where your task code invokes functionality defined in other scripts that are pre-baked into the image.
See docker args here:
[None](https://clear.ml/docs/latest/docs/references/sdk/task/...
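For illustration, a sketch of setting the image and the `--entrypoint` argument on a task through `set_base_docker`. The helper names are hypothetical, and `task` is assumed to be a ClearML `Task` object you already hold.

```python
import shlex


def build_docker_arguments(entrypoint):
    """Build the docker argument string that overrides the image entrypoint."""
    return f"--entrypoint {shlex.quote(entrypoint)}"


def configure_container(task, image, entrypoint):
    """Attach the image and docker arguments to a ClearML task (hypothetical helper)."""
    task.set_base_docker(
        docker_image=image,
        docker_arguments=build_docker_arguments(entrypoint),
    )
```

Keep in mind that this only covers the not-recommended route; calling pre-baked scripts from the task code needs none of it.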
Hey @<1639799308809146368:profile|TritePigeon86> , given that you want to retry on connection error, wouldn't it be easier to use `retry_on_failure` from `PipelineController` / `PipelineDecorator.pipeline`?
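A hedged sketch of the callback form of `retry_on_failure`. Note that this callback retries on any step failure, not only on connection errors; the retry limit, function names, pipeline name and project name are hypothetical.

```python
MAX_RETRIES = 3  # hypothetical limit, pick whatever suits your pipeline


def retry_on_failure_callback(pipeline, node, retries):
    """Return True to re-run a failed step, False to give up.

    `retries` is the number of retries already attempted for this step.
    """
    print(f"step {node.name!r} failed after {retries} retries")
    return retries < MAX_RETRIES


def build_pipeline():
    from clearml import PipelineController  # lazy import: needs ClearML configured

    return PipelineController(
        name="pipeline with retries",
        project="examples",
        version="1.0.0",
        # A plain integer (number of retries) can be passed instead of a callback.
        retry_on_failure=retry_on_failure_callback,
    )
```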
Hey @<1526734437587357696:profile|ShaggySquirrel23> , what version of the clearml-agent are you using? Also, if I were you I’d check how much free disk space there is on the machine running the agents.
@<1637624992084529152:profile|GlamorousChimpanzee22> since you're using localhost, I'm assuming it's MinIO. Is the S3 path you're trying to access something like this: None <some file or dir>?
Hey Yasir, to use TensorFlow prefetch your data needs to be (1) chunked and (2) stored on some server/bucket/network-attached FS. Unless both conditions are satisfied, TF prefetch won't help you.
How large is the dataset we're talking about?
Yes, works with GCP too
That's not that much. You can use the AWS autoscaler and provision a spot g4dn GPU instance with a bit more disk. This should cost you less than 50 cents an hour
Hey @<1546303293918023680:profile|MiniatureRobin9> , to help narrow down the problem, could you try to manually download None and open it with `pickle`?
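Opening the downloaded file by hand can be as short as the sketch below. The helper name is hypothetical, and the path is whatever location you saved the download to. Only unpickle files you trust.

```python
import pickle
from pathlib import Path


def load_pickled_artifact(path):
    """Load a manually downloaded pickle file and return the stored object."""
    with Path(path).open("rb") as fh:
        return pickle.load(fh)
```

If this raises an error, the traceback should tell us whether the file itself is broken or the problem is somewhere else.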
Also, is your agent running on the same machine as your server and the example pipeline code? And what Python version are you using for all three components? Because I see there's a warning `could not locate requested Python version 3.11, reverting t...
Which gives me an idea. Could you please remove the entrypoint from the docker image altogether and try again?
Overriding the entrypoint in the image can lead to docker run/docker exec failing to work properly, because instead of a shell it will use your entrypoint to run everything.
This is the method you're looking for: None. But make sure you have a model saved on disk before using it. And if you don't want the model to be deleted from disk afterwards, make sure to set `auto_delete_file=False`.
To copy the artifacts please refer to docs here: None