And the quota is not cumulative, otherwise we’d run out of storage with the oldest accounts 😃
Hey @<1644147961996775424:profile|HurtStarfish47>, you can use S3 for debug images specifically, see here: https://clear.ml/docs/latest/docs/references/sdk/logger/#set_default_upload_destination but the metrics (everything you report, like scalars, single values, histograms, and other plots) are stored in the backend. The fact that you are almost running out of storage could be because of either t...
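If it helps, here is a minimal sketch of that debug-image setup (project, task, and bucket names are placeholders):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="debug-images-to-s3")  # placeholder names

# Debug images reported after this call are uploaded to your own S3 bucket;
# scalars, single values, histograms and other plots stay in the ClearML backend.
task.get_logger().set_default_upload_destination("s3://my-bucket/debug-samples/")
```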
Ah, I think I understand. To execute a pipeline remotely you need to use None `pipe.start()`, not `task.execute_remotely()`. Do note that you can run tasks remotely without exiting the current process/closing the notebook (see the `exit_process` argument here: None ), but you won't be able to return any values from such a task....
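Roughly, assuming the usual flow (project, task, and queue names are placeholders):
```python
from clearml import Task
from clearml.automation.controller import PipelineController

# Pipeline: launched remotely with start(), not with Task.execute_remotely()
pipe = PipelineController(name="my-pipeline", project="examples", version="1.0")
# ... pipe.add_step(...) / pipe.add_function_step(...) ...
pipe.start(queue="services")  # launches the pipeline controller on the "services" queue

# Regular task (separate script): send a clone to an agent without killing the
# current process / notebook kernel by disabling exit_process
task = Task.init(project_name="examples", task_name="remote-task")
task.execute_remotely(queue_name="default", clone=True, exit_process=False)
```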
Hey @<1523701083040387072:profile|UnevenDolphin73>, sorry for the late reply. I'm now investigating the issue you mentioned, where running a remote task with `create_function_task` fails. I can't quite reproduce it; could you please provide a complete, runnable code snippet that fails the way you described?
Hey @<1671689458606411776:profile|StormySeaturtle98> we do support something called "Model Design" previews, basically an architecture description of the model, à la Caffe protobufs. None For example, we store this info automatically with Keras.
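For other frameworks you can also attach such a design description yourself; a small sketch (names and the config text are made up):
```python
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="model-design-demo")  # placeholder names

model = OutputModel(task=task)
# Stores an architecture/design description that is previewed with the model in the UI
model.update_design(config_text="""
layer { name: "conv1" type: "Convolution" }
layer { name: "relu1" type: "ReLU" }
""")
```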
Hey @<1547390438648844288:profile|ScaryJellyfish75>, can you provide the whole code for the pipeline, and also mention which clearml version you are using?
Hey @<1523701083040387072:profile|UnevenDolphin73> what you're building here sounds like a useful tool. Let me make sure I understand what you're trying to achieve, and please correct me if I'm wrong:
- You want to create a set of `Step` classes with which you can define pipelines that will be executed either locally or remotely.
- The pipeline execution is triggered from a notebook.
- The `steps` are predefined transformations; the user normally won't have to create their own steps.
Did I get all...
Can you please check with the latest 1.10.2 SDK version whether the checkpointing issue still happens? As for the example code that couldn't be reproduced, we're already working on it and should have a fix in the next minor SDK version.
@<1637624992084529152:profile|GlamorousChimpanzee22> since you're using localhost I'm assuming it's MinIO. Is the S3 path you're trying to access something like this: None <some file or dir> ?
I’m afraid serializing an entire class won’t be possible, but `create_function_task` will send the entire environment for remote execution, so you can still access your code.
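For reference, a minimal sketch of how `create_function_task` is typically used (function and argument names are placeholders):
```python
from clearml import Task

def process(batch_size, epochs):  # placeholder function defined in your code base
    print("processing with", batch_size, epochs)

task = Task.init(project_name="examples", task_name="controller")

# Creates a new task that will run `process`; the current repo/environment is
# captured, so the remote run can still import and use your code.
func_task = task.create_function_task(
    func=process,
    func_name="process",       # entry-point name of the generated task
    task_name="process step",  # how the new task appears in the UI
    batch_size=32,
    epochs=10,
)
```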
Sounds interesting. But my main concern with this kind of approach is that if the surface of `(hparam1, hparam2, objective_fn_score)` is non-convex, your method may not reach the best set of hyperparameters. Maybe try using smarter search algorithms like BOHB or TPE if you have a large search space; otherwise, you can do a few rounds of manual random search, reducing the search space around the region of the most likely best hyperparameters after every round.
As for why struct...
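If you want to try the smarter-search route, here's a rough sketch using ClearML's HyperParameterOptimizer with BOHB (the task id, parameter names, and metric names are placeholders; BOHB needs `pip install hpbandster`):
```python
from clearml.automation import HyperParameterOptimizer, UniformParameterRange
from clearml.automation.hpbandster import OptimizerBOHB  # requires the hpbandster package

optimizer = HyperParameterOptimizer(
    base_task_id="<your_base_task_id>",
    hyper_parameters=[
        UniformParameterRange("General/hparam1", min_value=1e-4, max_value=1e-1),
        UniformParameterRange("General/hparam2", min_value=0.1, max_value=0.9),
    ],
    objective_metric_title="validation",
    objective_metric_series="objective_fn_score",
    objective_metric_sign="max",
    optimizer_class=OptimizerBOHB,
    execution_queue="default",
    max_number_of_concurrent_tasks=2,
)
optimizer.start()
optimizer.wait()   # block until the optimization finishes
optimizer.stop()
```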
Hey @<1523704157695905792:profile|VivaciousBadger56>, I was playing around with the Pipelines a while ago and managed to create one where a few steps at the beginning create ClearML datasets like `users_dataset`, `sessions_dataset`, and `preferences_dataset`, then a step combines all 3, and then an independent data-quality step runs in parallel with the model training. Also, if you want to have some fun, you can try to parametrize your pipelines and run HPO on...
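A rough outline of that kind of pipeline using function steps (all names are placeholders and the step bodies are stubs):
```python
from clearml.automation.controller import PipelineController

def make_users_dataset():
    return "users_dataset_id"        # stub: would create a ClearML Dataset and return its id

def make_sessions_dataset():
    return "sessions_dataset_id"     # stub

def combine(users_id, sessions_id):
    return "combined_dataset_id"     # stub: merges the parent datasets

def data_quality(combined_id):
    print("running data-quality checks on", combined_id)

def train(combined_id):
    print("training on", combined_id)

pipe = PipelineController(name="demo-pipeline", project="examples", version="1.0")
pipe.add_function_step(name="users_dataset", function=make_users_dataset, function_return=["users_id"])
pipe.add_function_step(name="sessions_dataset", function=make_sessions_dataset, function_return=["sessions_id"])
pipe.add_function_step(
    name="combine",
    function=combine,
    function_kwargs={"users_id": "${users_dataset.users_id}",
                     "sessions_id": "${sessions_dataset.sessions_id}"},
    function_return=["combined_id"],
)
# These two only depend on "combine", so they can run in parallel
pipe.add_function_step(name="data_quality", function=data_quality,
                       function_kwargs={"combined_id": "${combine.combined_id}"})
pipe.add_function_step(name="train", function=train,
                       function_kwargs={"combined_id": "${combine.combined_id}"})
pipe.start(queue="services")
```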
That seems strange. Could you provide a short code snippet that reproduces your issue?
I think you can set the CUDA version in `clearml.conf`; alternatively, you can have the agent use a docker image with your required CUDA version instead of setting up the environment directly on the machine.
Hey @<1681836314334334976:profile|GrotesqueSeaturtle83>, yes, it is possible to do so, but you must configure the docker `--entrypoint` argument (as part of the `docker_arguments`) and the docker image for said task. In general this isn't a recommended approach; prefer a setup where your task code invokes the functionality defined in other scripts that are pre-baked into the image.
See docker args here:
[None](https://clear.ml/docs/latest/docs/references/sdk/task/...
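If you still want to go that way, it could look roughly like this (the image path and entrypoint script are made up):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="custom-entrypoint")

# Override the container entrypoint via docker_arguments; generally prefer
# pre-baking the scripts into the image and calling them from the task code.
task.set_base_docker(
    docker_image="my-registry/my-image:latest",
    docker_arguments=["--entrypoint", "/opt/scripts/run.sh"],
)
task.execute_remotely(queue_name="default")
```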
Hello @<1533257278776414208:profile|SuperiorCockroach75> , thanks for asking. It’s actually unsupervised, because modern LLMs are all trained to predict next/missing words, which is an unsupervised method
Hey @<1523704757024198656:profile|MysteriousWalrus11> , given your use case, did you consider passing the path to the dataset? Like an address to an S3 bucket
Hello @<1523710243865890816:profile|QuaintPelican38>, could you try `Dataset.get`ting an existing dataset and tell me whether there are any errors?
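Something along these lines (project/name are placeholders for a dataset you know exists):
```python
from clearml import Dataset

ds = Dataset.get(dataset_project="examples", dataset_name="my_dataset")
print(ds.id)
print(ds.list_files()[:5])  # any error would typically be raised by the get() above
```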
Can you paste here the code of the pipeline that you're trying to run?
Hey @<1529271085315395584:profile|AmusedCat74>, I may be wrong, but I think you can’t attach a GPU to an e2 instance; it should be at least an n1, no?
I can't quite reproduce your issue. From the traceback it seems to have something to do with `torch.load`. I tried both your code snippet and creating a PyTorch model and then loading it, and neither led to this error.
Could you provide a code snippet that is closer to the code that is actually causing the issue? Also, can you please tell me what clearml version you are using, and what the Model URL is in the UI? You can use the same filters in the UI as the ones you used for `Model.query_models` to find th...
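For reference, a quick way to print the URLs of the matching models (the filters are placeholders):
```python
from clearml import Model

models = Model.query_models(project_name="examples", model_name="my_model", tags=["latest"])
for m in models:
    print(m.id, m.name, m.url)  # m.url is the stored weights URI (the model URL shown in the UI)
```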
Hey @<1582542029752111104:profile|GorgeousWoodpecker69> can you please tell me whether you're running this Jupyter notebook as part of a repo or as a standalone file, and what command you ran to launch your clearml-agent?
And how many agents do you have listening on the “services” queue?
If your git credentials are stored in the agent's `clearml.conf`, it means they are an HTTPS username/password pair. But you specified that the package should be downloaded via git over SSH, for which I assume there are no credentials in the agent's environment. So it can't authenticate over SSH, and pip doesn't know how to switch from git+ssh to git+https, because the package download is done by pip, not by clearml.
There are probably auth errors if you scroll through the entire log ...
You can create a new dataset and specify the parent datasets as all the previous ones. Is that something that would work for you?
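Something like this, passing the previous dataset ids as parents (ids and paths are placeholders):
```python
from clearml import Dataset

child = Dataset.create(
    dataset_name="my-dataset",
    dataset_project="examples",
    parent_datasets=["<previous_dataset_id_1>", "<previous_dataset_id_2>"],
)
child.add_files("new_data/")   # only add what changed; the parents' contents are inherited
child.upload()
child.finalize()
```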
This sounds like a use case for the enterprise version of ClearML. In it you can set read/write permissions. Publishing is considered a "write", so you can limit who can do it. Another thing that might be useful in your scenario is to try using "Reports", and connect the "approved" experiments info to a report and then publish it. Here's a short video introducing reports.
By the way, please note that if the experiment/report/whatever is publis...
You can try adding the `force_download=True` flag to `.get()` to ignore the locally cached content. Let me know if it helps.
This is the method you're looking for None . But make sure you have a model saved on disk before using it. And if you don't want the model to be deleted from disk afterwards, make sure to set `auto_delete_file=False`.
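A minimal sketch, assuming the weights were already written to e.g. `model.pt` (this goes through `OutputModel.update_weights`, which `Task.update_output_model` wraps):
```python
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="register-model")  # placeholder names

# ... training code that saved "model.pt" to disk ...

output_model = OutputModel(task=task)
output_model.update_weights(
    weights_filename="model.pt",
    auto_delete_file=False,   # keep the local file after it has been uploaded
)
```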
Could you please run the misbehaving example, add a breakpoint in `clearml/backend_interface/task/task.py` in `Task.update_output_model` on the line with `url = output_model.update_weights(`, and tell me what the value of `model_path` is? If you're using a virtual environment, the clearml library should be installed somewhere in `<virtual env directory>/lib/python3.10/site-packages/clearml/`.
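If it helps with locating the file, something like this prints the path of the installed module:
```python
import os
import clearml

# Path of the file where the breakpoint should go
print(os.path.join(os.path.dirname(clearml.__file__), "backend_interface", "task", "task.py"))

# Then, inside Task.update_output_model, right before the line
#     url = output_model.update_weights(
# add a temporary `print("model_path =", model_path)` or `breakpoint()`.
```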