I googled the ES error es_rejected_execution_exception, and it seems to be caused by excessive load on ES. Apparently the hardware cannot keep up with the pace at which you're sending event batches. I would recommend working with smaller batches and checking whether the error goes away.
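Here is a minimal sketch of the smaller-batches idea in plain Python. The names `iter_batches`, `send_in_batches`, `send_fn`, `batch_size` and `pause_seconds` are hypothetical and only for illustration; they are not part of ClearML or Elasticsearch, so plug in whatever call you currently use to send events.

```python
import time
from typing import Callable, Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each holding at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def send_in_batches(
    events: Sequence[T],
    send_fn: Callable[[List[T]], None],
    batch_size: int = 100,
    pause_seconds: float = 0.0,
) -> int:
    """Send `events` through `send_fn` in small batches; return the number of batches sent."""
    sent = 0
    for batch in iter_batches(events, batch_size):
        send_fn(batch)  # hypothetical hook: replace with your real reporting call
        sent += 1
        if pause_seconds > 0:
            time.sleep(pause_seconds)  # optional breather between batches
    return sent
```

Lowering `batch_size` (and optionally adding a short pause) reduces how much work hits ES at once, which is the thing to experiment with here.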
Hi @<1635088270469632000:profile|LividReindeer58> , I think the best ways are either using tags or metadata on the model itself. What do you think?
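A rough sketch of both options, assuming the clearml SDK's `OutputModel` and `Model` classes. The helper names, the model name, and the tag and metadata values are made up for illustration.

```python
from typing import Dict, Sequence


def create_tagged_model(task, name: str, tags: Sequence[str], metadata: Dict[str, object]):
    """Create an OutputModel that carries both tags and key/value metadata."""
    # Imported here so the sketch can be loaded even where clearml is not installed.
    from clearml import OutputModel

    model = OutputModel(task=task, name=name, tags=list(tags))
    for key, value in metadata.items():
        # set_metadata expects string values, so convert explicitly.
        model.set_metadata(key, str(value))
    return model


def find_models_by_tag(project_name: str, tag: str):
    """Return the models in a project that carry the given tag."""
    from clearml import Model

    return Model.query_models(project_name=project_name, tags=[tag])
```

Tags are convenient for filtering in the UI and in queries, while metadata is better suited to values you want to read back later.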
Hi @<1749602841338580992:profile|ImpressionableSparrow64> , the S3 configuration (Credentials) is always done on the client side. You don't need to configure anything server side. Also good that you configured the agent.
Not really, you can even point the files_server in clearml.conf to S3. The files server is there so there would be some basic storage solution attached.
@<1523701295830011904:profile|CluelessFlamingo93> , just so I understand - you want to upload a string as the artifact?
JitteryCoyote63 , Hi 🙂
The new system takes a bit of time to get used to but it's really nice afterwards. You can right click an experiment and click on details to open the info of the experiment
JitteryCoyote63 what browser/os are you on?
Hi @<1668427977265778688:profile|BurlyOtter33> , you mean you're deploying the server itself on a mac?
Are you running a self hosted server?
My bad, should have asked you to go to Network as well to see if anything returns errors
Hi @<1545216070686609408:profile|EnthusiasticCow4> , go into settings -> configuration, there is an option to show hidden projects. This way you should be able to view pipelines in the experiments view
Hi @<1545216070686609408:profile|EnthusiasticCow4> , start_locally() has the run_pipeline_steps_locally parameter for exactly this 🙂
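Something along these lines, as a sketch only: it assumes a function-based pipeline built with `PipelineController`, and the pipeline name, project, and the toy `double` step are placeholders I made up.

```python
def double(x):
    """Toy step: return twice the input."""
    return x * 2


def run_pipeline_on_this_machine():
    """Build a one-step pipeline and run both controller and steps locally."""
    # Imported here so the sketch can be loaded even where clearml is not installed.
    from clearml import PipelineController

    pipe = PipelineController(
        name="local-debug-pipeline",  # placeholder name
        project="examples",           # placeholder project
        version="1.0.0",
    )
    pipe.add_function_step(
        name="double",
        function=double,
        function_kwargs={"x": 21},
        function_return=["y"],
    )
    # Run the controller in this process and the steps as local subprocesses,
    # instead of sending them to an agent queue.
    pipe.start_locally(run_pipeline_steps_locally=True)
    return pipe
```

With `run_pipeline_steps_locally=False` (the default) only the controller logic runs locally and the steps are still enqueued for agents.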
Interesting. I'll try to reproduce and see if it happens for me as well 🙂
Looks decent, give it a try and let us know if it's working 🙂
Which version of clearml are you using?
Exact same usage?
VexedCat68 , just making sure, when talking about tags for models, you mean the tags viewable in the 'models' tab in UI, correct?
VexedCat68 , I was about to mention it myself. Maybe keeping only the last few or the best recent checkpoints would be best in this case. I think the SDK also supports this quite well 🙂
I think these are the relevant methods 🙂
https://clear.ml/docs/latest/docs/references/sdk/task#register_artifact
https://clear.ml/docs/latest/docs/references/sdk/task#unregister_artifact
And later you can use https://clear.ml/docs/latest/docs/references/sdk/task#upload_artifact when you have a finalized version of what you want
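Roughly how the three calls fit together, as a sketch: `track_then_finalize` and the artifact names are hypothetical, `task` is assumed to be a clearml `Task` (for example from `Task.init`), and `frame` is assumed to be a pandas DataFrame, since that is what `register_artifact` works with.

```python
def track_then_finalize(task, frame, name="stats"):
    """Register a changing DataFrame during the run, then upload a final copy."""
    # While the frame is still changing, a registered artifact is kept in sync
    # with the server.
    task.register_artifact(name=name, artifact=frame)

    # ... update `frame` here as the run progresses ...

    # Stop syncing once the frame no longer changes.
    task.unregister_artifact(name=name)

    # One-off upload of the finalized version under its own name.
    final_name = name + "_final"
    task.upload_artifact(name=final_name, artifact_object=frame)
    return final_name
```

The split matters because a registered artifact keeps being re-uploaded as it changes, while `upload_artifact` stores a single snapshot.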
VexedCat68 , are you running the scheduler from the same machine? Where is the folder located?
VexedCat68 , I don't think such an example exists, but if you create one it would be great if you opened a PR for the open source 🙂
From the error you provided it looks like virtualenv isn't installed in the environment
I think the pipeline runs from start to end, starting when the first step starts
You can delete locally but it should not affect the remote data.
The data itself is stored on the files server. Whatever you do locally does not affect the remote storage; changes are stored only when you create a new version (for example, with 'clearml-data sync').
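If you prefer the SDK over the CLI, creating a new version from a modified local folder could look roughly like this. It is a sketch assuming the clearml `Dataset` class; `create_child_version` and its arguments are names I made up.

```python
def create_child_version(parent_id: str, local_folder: str, name: str, project: str) -> str:
    """Create a new dataset version on top of `parent_id` from a local folder."""
    # Imported here so the sketch can be loaded even where clearml is not installed.
    from clearml import Dataset

    child = Dataset.create(
        dataset_name=name,
        dataset_project=project,
        parent_datasets=[parent_id],
    )
    # Record added, modified and removed files relative to the parent version.
    child.sync_folder(local_path=local_folder)
    child.upload()    # push the changed files to storage
    child.finalize()  # close this version so it can no longer be modified
    return child.id
```

Until something like this runs, edits or deletions in the local folder stay local and the stored versions are untouched.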
VexedCat68 , in the screenshot you provided it looks like the location is being printed. Did you check to see if something is there?
VexedCat68 Hi 🙂
Can you please provide snippets of how you're saving and retrieving the files?
VexedCat68 , you need to run the agent on a machine with a GPU. The server doesn't need to have a GPU 🙂
Regarding the queue:
Yes, if someone enqueues a task onto queue X, an agent listening to that queue will run the task.
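For completeness, enqueuing an existing task from code could look something like this. It is a sketch assuming the clearml `Task` class; `enqueue_existing_task`, the task ID, and the queue name are placeholders.

```python
def enqueue_existing_task(task_id: str, queue_name: str):
    """Put an existing task onto a queue so an agent listening there picks it up."""
    # Imported here so the sketch can be loaded even where clearml is not installed.
    from clearml import Task

    task = Task.get_task(task_id=task_id)
    # Any agent serving `queue_name` can now pull and run this task.
    Task.enqueue(task, queue_name=queue_name)
    return task
```

Which machine runs it depends only on which agents are listening to that queue.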