Hi @<1716987924207112192:profile|CostlyOctopus40>
is OpenSearch supported in ClearML instead of Elasticsearch? Please shed some light on that
Long story short, maybe?! but this is not officially supported.
We only support Elasticsearch; the OpenSearch fork is not officially supported, and since we keep adopting more advanced Elastic features, the API might not stay compatible in the future.
Out of curiosity, why are you using opensearch?
ContemplativeCockroach39 unfortunately not directly as part of ClearML 😞
I can recommend the NVIDIA Triton inference server (I'm hoping we will have out-of-the-box integration soon)
Meanwhile you can run it manually, see the docs:
https://developer.nvidia.com/nvidia-triton-inference-server
Docker container here:
https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver
Hi PunyGoose16 ,
the next release includes it (ETA after this weekend 😉)
Hi @<1566596960691949568:profile|UpsetWalrus59>
All correct, with the exception of "...or 1GB Metric": this is a limit, since metrics (and metadata) are always stored on the clearml-server, they are metered. There is also an API limit, basically anti-abuse, which of course resets every month, but if you are running tens of experiments at the same time you will hit this limit. Make sense?
But I'm sure there is a cleaner way to proceed.
Maybe?!
path = task.get_output_destination().replace('file://', '', 1)
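For context, a minimal sketch of that idea (assuming the output destination is a local file-server URI; Task.current_task() is just one way to get a task handle):
```python
from clearml import Task

# Strip the "file://" scheme from the task's output destination
# to get a plain local path (assumes the destination is a local/file URI).
task = Task.current_task()  # or any Task object you already have
path = task.get_output_destination().replace('file://', '', 1)
print(path)
```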
Hi RobustGoldfish9 Kudos on the mount, and my apologies for forgetting to mention it.
You are absolutely right, I'll make sure we have it in the documentation, there is no way to know that obscure env variable 🙂
HurtWoodpecker30 currently the open source version only supports AWS; I know the SaaS Pro version supports it (I'm assuming Enterprise as well).
You can, however, manually spin up an instance on GCP and launch an agent on that instance (like you would on any machine)
Hi LudicrousParrot69
Not sure I follow, is this pyfunc running remotely ?
Or are you looking for interfacing with previously executed Tasks ?
No need, it should auto close it if you started it with Task.init (or the agent executed it)
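A minimal sketch of what I mean (project/task names are just placeholders):
```python
from clearml import Task

# A task created with Task.init() is closed automatically when the process exits,
# so no explicit task.close() call is needed.
task = Task.init(project_name="examples", task_name="auto-close demo")  # placeholder names

# ... your experiment code ...

# no task.close() here; it happens automatically at exit
```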
Maybe you should make naming_function a public variable in the SearchStrategy class, or allow changing it in the HyperParameterOptimizer class?
I like this idea, let's do that
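Something along these lines, a sketch of how the suggestion could look (note: naming_function is the proposed argument from this discussion, it does not exist yet; everything else uses the existing HyperParameterOptimizer interface, with placeholder values):
```python
from clearml.automation import HyperParameterOptimizer, UniformParameterRange

# Existing interface; the proposed naming hook is sketched as a comment only.
optimizer = HyperParameterOptimizer(
    base_task_id="<base_task_id>",  # placeholder
    hyper_parameters=[UniformParameterRange("General/lr", 0.001, 0.1)],
    objective_metric_title="validation",
    objective_metric_series="loss",
    objective_metric_sign="min",
    # naming_function=lambda base_name, params: "{} lr={}".format(base_name, params["General/lr"]),
    #   ^ the proposed (not yet existing) way to control the cloned tasks' names
)
```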
Just making sure, you hit the 1024 character limit on S3 path?
If this is the case we should also fix the "artifact naming" to take that into account (it already does and has a limit, see here:
https://github.com/allegroai/clearml/blob/24464b7c1019f7a7b3149ecb80a379...
FlutteringWorm14 Can you verify that even with the clearml.conf it has no effect?
There was an issue in some versions where seaborn plots were blank. Is that the case?
AgitatedTurtle16 could you check with the latest clearml RC (I remember a similar issue was fixed)?
pip install clearml==0.17.5rc3
Then run again:
clearml-task ...
Feel free to add to the UI request list:
https://github.com/allegroai/trains/issues/81
Hi RoughHedgehog31
I'm assuming your git diff is just too big to be stored as is (probably some binary files)
it should not really have any effect on the execution, it just means the clearml-agent will not be able to reproduce the uncommitted changes.
Make sense ?
Please attach the log 🙂
This works.
great!
So it is still in master and should be included in 1.0.5?
correct, RC will be released soon with this fix included
To store all the debug samples, and it can also store all the models (if you configure output_uri='http://file_server_here:8081').
Yes: instead of the file server use 's3://<ip_of_minio>:9000/bucket', and make sure you add the credentials for the MinIO in the trains.conf.
Yes, basically once you have the credentials in the trains.conf, you can do StorageManager.get_local_copy('s3://<minio>:9000/bucket/file') (and also upload, of course 🙂)
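A minimal sketch of that flow (bucket name, MinIO address and the project/task names are placeholders; it assumes the matching s3 credentials are already in your trains.conf):
```python
from clearml import Task, StorageManager

# Assumes trains.conf already lists the MinIO credentials (host "<minio_ip>:9000",
# key/secret) under the s3 credentials section.
task = Task.init(
    project_name="examples",
    task_name="minio storage demo",
    output_uri="s3://<minio_ip>:9000/bucket",  # models/artifacts will be uploaded to MinIO
)

# Fetch a local copy of an object stored in the MinIO bucket
local_path = StorageManager.get_local_copy("s3://<minio_ip>:9000/bucket/file")
print(local_path)
```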
RoundMosquito25 are you using clearml-agent daemon --stop or are you killing them ?
killing them basically means you lose them in the UI when they time out; the backend does not see them for 10 min so it assumes they died. When you call clearml-agent daemon --stop they will unregister themselves and disappear immediately
TrickySheep9
Is there a way to see a roadmap on such things?
Hmm I think we have some internal one, I have to admit these things change priority all the time (so it is hard to put an actual date on them).
Generally speaking, pipelines with functions should be out in a week or so, TaskScheduler + Task Triggers should be out at about the same time.
UI for creating pipelines directly from the web app is in the works, but I do not have a specific ETA on that
@<1533619716533260288:profile|SmallPigeon24> , a failed task should not actually be reused (i.e. cached). Are you saying a failed Task is being reused? Or are you saying that you want to "invalidate" the cache in the execution but still leave the Task as completed?
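If the caching in question is the pipeline step cache, a hedged sketch of forcing a step to re-run (names are placeholders; cache_executed_step is the existing flag that controls reuse):
```python
from clearml.automation.controller import PipelineController

# Placeholder names; cache_executed_step=False forces the step to run again
# instead of reusing a previously executed Task.
pipe = PipelineController(name="example pipeline", project="examples", version="1.0.0")
pipe.add_step(
    name="train",
    base_task_project="examples",
    base_task_name="train task",
    cache_executed_step=False,
)
pipe.start()
```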
CooperativeFox72 we are aware of the Pool throwing an exception that causes things to hang. The fix will be deployed in 0.16 (due to be released tomorrow).
Do you have a code to reproduce it, so I can verify the fix solves the issue?
JitteryCoyote63 not yet 😞
I actually wonder how popular https://github.com/pallets/click is?
agent.cuda_driver_version = ...
agent.cuda_runtime_version = ...
Interesting idea! (I assume for reporting only, not configuration)
... The agent mentioned uses output from nvcc (2) ...
The dependencies I shared are not how the agent works, but how Nvidia CUDA works 🙂
Regarding the CUDA check with nvcc: I'm not saying this is a perfect solution, I just mentioned that this is how it is currently done.
I'm actually not sure if there is an easy way to get it from nvid...
This is strange, let me see if we can get around it, because I'm sure it worked 🙂
Is it possible to support changing the server address, so that the images are pulled from the new server instead?
The link itself (the full link) is stored inside the server. Can I assume the access is IP-based, not host-based (i.e. DNS)?