For on-premise deployment with premium features we have the enterprise plan 🙂
If your git credentials are stored in the agent's clearml.conf, it means they are an HTTPS username/password pair. But you specified that the package should be downloaded via git SSH, for which I assume the agent's environment has no credentials, so it can't authenticate over SSH. And pip doesn't know how to switch from git+ssh to git+https, because the package download is done by pip, not by clearml.
And there are probably auth errors if you scroll through the entire log...
And how many agents do you have listening on the “services” queue?
Can you update the clearml version to latest (1.11.1) and see whether the issue is fixed?
Hello @<1523710243865890816:profile|QuaintPelican38>, could you try Dataset.get-ing an existing dataset and tell whether there are any errors or not?
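Something along these lines should be enough to check (a minimal sketch; the project and dataset names are hypothetical, point them at a dataset that already exists):

```python
from clearml import Dataset

try:
    # Hypothetical project / dataset names
    ds = Dataset.get(dataset_project="my_project", dataset_name="my_existing_dataset")
    print("Found dataset:", ds.id)
    print("Local copy at:", ds.get_local_copy())  # downloads or reuses the cached copy
except Exception as exc:
    print("Dataset.get failed with:", exc)
```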
Hey @<1681836314334334976:profile|GrotesqueSeaturtle83>, yes, it is possible to do so, but you must configure the docker --entrypoint argument (as part of the docker_arguments) and the docker image for said task. In general this isn't a recommended approach. Instead, prefer a setup where your task code invokes the functionality defined in other scripts that are pre-baked in the image.
See docker args here: https://clear.ml/docs/latest/docs/references/sdk/task/...
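For reference, from the SDK side it could look roughly like this (a sketch: the project/task names, image, and entrypoint path are hypothetical, and I'm assuming a clearml version where Task.set_base_docker accepts docker_image / docker_arguments):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="custom-entrypoint")  # hypothetical names

# Ask the agent to run this task inside a specific image, overriding its entrypoint.
# As noted above, overriding --entrypoint is generally discouraged.
task.set_base_docker(
    docker_image="my_registry/my_image:latest",             # hypothetical image
    docker_arguments="--entrypoint /opt/app/my_script.sh",  # hypothetical entrypoint
)
task.execute_remotely(queue_name="default")  # hand the task over to an agent
```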
Glad I could be of help
About the first question - yes, it will use the destination URI you set.
About the second point - did you archive or properly delete the experiments?
Hey @<1523705721235968000:profile|GrittyStarfish67>, we have just released 1.12.1 with a fix for this issue
The issue may be related to the fact that right now we have some edge cases when working with lightning >= 2.0; we should have better support in the upcoming release.
Hey @<1654294828365647872:profile|GorgeousShrimp11>, can you abort all pending experiments that are waiting to be fetched from this queue and try again? Off the top of my head, it could be that the clearml-agent can't pull the custom docker image. In general you should treat docker images not as step definitions but only as the environment, hence setting the entrypoint is not necessary.
Hey @<1671689458606411776:profile|StormySeaturtle98> we do support something called "Model Design" previews, basically an architecture description of the model, a la Caffe protobufs. For example, we store this info automatically with Keras.
Do you mean that you want your published experiments to be either “approved” or “not approved” based on the presence of the attachments you mentioned?
You can try to add the force_download=True flag to .get() to ignore the locally cached content. Let me know if it helps.
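If the object in question is a task artifact (an assumption on my part), the call might look roughly like this, with the task id and artifact name being hypothetical:

```python
from clearml import Task

task = Task.get_task(task_id="<your_task_id>")                   # hypothetical task id
value = task.artifacts["my_artifact"].get(force_download=True)   # bypass the local cache
```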
Can you please attach the full traceback here?
I can't quite reproduce your issue. From the traceback it seems it has something to do with torch.load. I tried both your code snippet and creating a PyTorch model and then loading it; neither led to this error.
Could you provide a code snippet that is closer to the code causing the issue? Also, can you tell us which clearml version you are using, and what the Model URL in the UI is? You can use the same filters in the UI as the ones you used for Model.query_models to find th...
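For reference, querying models from code could look roughly like this (a sketch; the project/name filters are hypothetical and should mirror the ones you used in the UI):

```python
from clearml import Model

# Hypothetical filters -- use the same ones you applied in the UI
models = Model.query_models(project_name="my_project", model_name="my_model")
for m in models:
    print(m.id, m.name, m.url)  # m.url should match the Model URL shown in the UI
```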
Hey @<1639799308809146368:profile|TritePigeon86>, given that you want to retry on connection error, wouldn't it be easier to use retry_on_failure from PipelineController / PipelineDecorator.pipeline?
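A minimal sketch of what I mean, assuming the decorator-based pipeline and a hypothetical retry policy (retry_on_failure also accepts a plain integer):

```python
from clearml import PipelineDecorator

def retry_on_connection_error(pipeline, node, retries):
    # Hypothetical policy: retry any failed step up to 3 times
    return retries < 3

@PipelineDecorator.pipeline(
    name="my_pipeline", project="my_project", version="1.0",  # hypothetical names
    retry_on_failure=retry_on_connection_error,
)
def my_pipeline():
    ...
```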
Which gives me an idea. Could you please remove the entrypoint from the docker image altogether and try again?
Overriding the entrypoint in the image can lead to docker run/docker exec failing to work properly, because instead of a shell it will use your entrypoint to run everything.
This sounds like a use case for the enterprise version of ClearML. In it you can set read/write permissions. Publishing is considered a "write", so you can limit who can do it. Another thing that might be useful in your scenario is to try using "Reports", and connect the "approved" experiments info to a report and then publish it. Here's a short video introducing reports.
By the way, please note that if the experiment/report/whatever is publis...
Hey @<1529271085315395584:profile|AmusedCat74>, I may be wrong, but I think you can't attach a GPU to an e2 instance; it should be at least an n1, no?
Sounds interesting. But my main concern with this kind of approach is that if the surface of (hparam1, hparam2, objective_fn_score) is non-convex, your method may not reach the best set of hyperparameters. Maybe try using smarter search algorithms, like BOHB or TPE, if you have a large search space; otherwise, you can do a few rounds of manual random search, narrowing the search space around the region of the most likely best hyperparameters after every round.
As for why struct...
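If you want to try BOHB through ClearML, a rough sketch could look like the following (the base task id, parameter names, and metric title/series are all hypothetical, and OptimizerBOHB needs the hpbandster package installed):

```python
from clearml.automation import HyperParameterOptimizer, UniformParameterRange
from clearml.automation.hpbandster import OptimizerBOHB  # requires hpbandster to be installed

optimizer = HyperParameterOptimizer(
    base_task_id="<template_task_id>",             # hypothetical task cloned per trial
    hyper_parameters=[
        UniformParameterRange("General/hparam1", min_value=1e-3, max_value=1e-1),
        UniformParameterRange("General/hparam2", min_value=0.1, max_value=0.9),
    ],
    objective_metric_title="validation",           # hypothetical metric title
    objective_metric_series="objective_fn_score",  # hypothetical metric series
    objective_metric_sign="max",
    optimizer_class=OptimizerBOHB,
    execution_queue="default",
    max_number_of_concurrent_tasks=2,
    total_max_jobs=30,
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```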
Hey @<1678212417663799296:profile|JitteryOwl13>, just to make sure I understand, you want to make your imports inside the pipeline step function, and you're asking whether this will work correctly?
If so, then the answer is yes, it will work fine if you move the imports inside the pipeline step function
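For example, a step defined like this (a sketch; the package and function body are hypothetical) will resolve its imports only when the step actually runs, possibly on a remote agent:

```python
from clearml import PipelineDecorator

@PipelineDecorator.component(return_values=["n_rows"], cache=False)
def preprocess(data_path: str):
    # Imports placed inside the step function are executed when the step runs
    import pandas as pd  # hypothetical dependency of this step
    df = pd.read_csv(data_path)
    return len(df)
```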
Ah, I see now. There are a couple of ways to achieve this (both sketched after the list).
- You can enforce that the pipeline steps execute within a predefined docker image that has all these submodules - this is not very flexible, but doesn't require your clearml-agents to have access to your Git repository
- You can enforce that the pipeline steps execute within a predefined git repository, where you have all the code for these submodules - this is more flexible than option 1, but will require clearml-agents to have acce...
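Roughly, assuming a recent clearml version where PipelineDecorator.component accepts docker / repo arguments (the image, repo URL, and module names below are hypothetical):

```python
from clearml import PipelineDecorator

# Option 1: pin the step to a docker image that already contains the submodules
@PipelineDecorator.component(docker="my_registry/image_with_submodules:latest")
def step_in_docker(x):
    from my_submodule import helper  # pre-baked into the image
    return helper(x)

# Option 2: pin the step to a git repository that holds the submodule code
@PipelineDecorator.component(repo="https://github.com/my-org/my-repo.git", repo_branch="main")
def step_in_repo(x):
    from my_submodule import helper  # resolved from the cloned repository
    return helper(x)
```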
Hey @<1603198163143888896:profile|LonelyKangaroo55>, if you only use the summary writer, does it report properly to both TB and ClearML?
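Something as small as this should be enough to check (a sketch with hypothetical project/task names; ClearML auto-captures TensorBoard scalars once Task.init has been called):

```python
from clearml import Task
from torch.utils.tensorboard import SummaryWriter

task = Task.init(project_name="examples", task_name="tb-only-check")  # hypothetical names
writer = SummaryWriter(log_dir="./tb_logs")
for step in range(10):
    writer.add_scalar("debug/metric", step * 0.1, step)  # should show up in both TB and ClearML scalars
writer.close()
```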
Wait, my config looks a bit different, what clearml package version are you using?
Yes, that is correct. Btw, now it looks more like my clearml.conf
Hey @<1577468626967990272:profile|PerplexedDolphin99>, yes, this method call will help you limit the number of files you have in your cache, but not the total size of your cache. To be able to control the size, I'd recommend checking the ~/clearml.conf file in the sdk.storage.cache section.
Thanks for pointing this out, we will need to update our documentation. Still, if you manually inspect the ~/clearml.conf file you will see the available configurations.