I'd like to understand this as well. I moved my data & model versioning to AWS S3, so can I get rid of the fileserver? And can I use CloudWatch to work with logs rather than (what I assume is being done by) Elasticsearch?
Hi @<1523701070390366208:profile|CostlyOstrich36> , sorry to tag you directly. Is this something that you have clarity on? Our team is currently exploring how to optimize the pipeline we've already defined for the edge device.
DistinctShark58 I don't think Hyper Datasets is available in self-hosted ClearML community. Correct me if I am wrong.
- Yes, in this scenario both the Agent & the code were present on the same machine
- The queue being assigned to default was something we had changed after some debugging, yes.
- We verified from the ClearML UI that the queue that the task is being assigned to wasn't default.
- The pipeline only worked through remote execution when the entrypoint script was in the root of the git repo (which kept getting picked up as the working directory)
@<1523701087100473344:profile|SuccessfulKoala55> for training experiments, I do see scalars such as the learning rate being picked up from the model training. They update in real time along with the GPU & CPU monitoring scalars I showed above. However, the console doesn't show any logs at all and just remains a blank screen
Other information about the experiments is visible, yes.
I do see plots, yes.
Scalars, sometimes I see and sometimes I don't. I am not reporting any scalars manually in my experiments. Whenever I do see scalars, they have been the CPU & GPU monitoring values, like below.
Hey @<1523701087100473344:profile|SuccessfulKoala55> that was actually my first approach. I used the command `clearml-agent daemon --stop <worker-id>`
However, this is what I saw:
For the people who are going to visit this thread later, what I found works is providing all the defining parameters of the ClearML worker and then passing the `--stop` flag in the command.
Ex: `sudo clearml-agent daemon --detached --queue gpu_default gpu_priority --gpus 0 --docker --stop`
What didn't work for me: `sudo clearml-agent daemon --stop <worker-id>`
@<1523701205467926528:profile|AgitatedDove14> this worked and gave me what I exactly needed. Thanks.
But I think I want to clarify something here after re-reading your message. My question is whether the API server can read credentials, or the web-login username & password, from something like Secrets Manager rather than from a local file like secure.conf or apiserver.conf.
Hey CostlyOstrich36, I have shared the log file that prints out device IDs, credentials, endpoints, etc. in a private chat, to avoid accidentally sharing anything I might not have identified as a security problem. I have redacted most of those things, but just wanted to be sure.
Hi @<1532170113770328064:profile|DelightfulElephant81> , there is a self-hosted version of ClearML that has access to all of the community edition features. You can use docker-compose/Helm to run it on your server. If you want to self-host the server while needing paid features like HyperDatasets, RBAC, ClearML Serving etc, you can choose to have the enterprise version.
See this: None
@<1523701087100473344:profile|SuccessfulKoala55> What would be the recommended command to delete/unregister an agent? None of the CLI commands seemed relevant to this operation other than `clearml-agent daemon --stop`, which only seems to stop running instances of the agent, not unregister it.
SuccessfulKoala55 Thanks for letting me know. I'll immediately give this a try.
Is there a way I can auto-increment the iteration value instead of specifying an integer? SuccessfulKoala55
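For reference while this was being discussed, one workaround I considered is a thin wrapper that keeps its own counter, since `Logger.report_scalar` wants an explicit `iteration` integer. This is just a sketch, not a ClearML feature: the `report` callable below stands in for the real logger method, and `AutoIterLogger` is a name I made up.

```python
import itertools

class AutoIterLogger:
    """Wraps a report callable and auto-increments the iteration number."""

    def __init__(self, report):
        # e.g. report = Logger.current_logger().report_scalar in real use
        self._report = report
        self._counter = itertools.count()

    def report_scalar(self, title, series, value):
        # Pass the next iteration index automatically instead of a hand-picked int.
        self._report(title=title, series=series, value=value,
                     iteration=next(self._counter))

# Usage with a stand-in reporter that just records the calls:
calls = []
logger = AutoIterLogger(lambda **kw: calls.append(kw))
logger.report_scalar("loss", "train", 0.9)
logger.report_scalar("loss", "train", 0.7)  # recorded with iteration=1
```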
Thanks @<1523701070390366208:profile|CostlyOstrich36> happy to contribute to the community even if it is bug reports 🙂
Hi @<1523701087100473344:profile|SuccessfulKoala55> this is what I see in the Settings page
- WebApp: 1.8.0-254 • Server: 1.8.0-254 • API: 2.22
What is weird is that (I think?) it worked Friday when I used it for another model object. Not sure what is happening here.
Hey CostlyOstrich36 , do you want me to share a specific part of the execution section? There is quite a bit of content hidden behind scrolling.
Thanks @<1523701070390366208:profile|CostlyOstrich36> . We were only changing this for each output_uri, but this makes our lives a bit easier. Thanks :)
Hi SuccessfulKoala55 I want to trigger a retraining pipeline every set cadence of a few months.
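For anyone else looking at recurring triggers: ClearML ships a `TaskScheduler` in `clearml.automation` that is worth checking for the actual scheduling (verify the exact API in the docs). The cadence math itself, computing the next run date N months after the last one, can be sketched with the standard library; `next_trigger` is my own helper name, not part of any SDK.

```python
from datetime import date

def next_trigger(last_run: date, cadence_months: int) -> date:
    """Return the date of the next retraining run, cadence_months after last_run."""
    month_index = last_run.month - 1 + cadence_months
    year = last_run.year + month_index // 12
    month = month_index % 12 + 1
    # Clamp the day so e.g. Jan 31 + 1 month lands on Feb 28/29 instead of raising.
    leap = year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)
    days_in_month = [31, 29 if leap else 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]
    day = min(last_run.day, days_in_month[month - 1])
    return date(year, month, day)

print(next_trigger(date(2023, 1, 31), 3))  # → 2023-04-30
```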
Great start! I'm not sure if I'll be able to commit time during the hackathon, but I'd like to help with this extension once I have a leaner period at work. I hope this will be open-source & accept contributions @<1541954607595393024:profile|BattyCrocodile47>
Sure @<1523701435869433856:profile|SmugDolphin23> . Thanks for the quick response.
Let me know if you need more information from me. I can add that information in a git issue if you want to track it that way.
Hi @<1523701087100473344:profile|SuccessfulKoala55> , I'd like Secrets Manager to store the users' credentials and ClearML to use those instead of a file in a local directory. Alternatively, we could work with options where the credentials are stored in one of the databases the ClearML server already uses.
The AWS Secrets Manager data would live in the same accounts where we have set up the ClearML server on AMI/EC2 instances.
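To make the idea concrete, the workaround we were weighing is fetching the secret at container start and rendering it into the fixed-users config layout ClearML expects. This is only a sketch: the JSON layout of the secret and the `render_conf` helper are my assumptions, and the real fetch would use boto3's `get_secret_value` (omitted here so the snippet stays self-contained).

```python
import json

def render_conf(secret_json: str) -> str:
    """Render a Secrets Manager JSON payload into a fixed-users conf snippet.

    secret_json plays the role of the SecretString a boto3 get_secret_value
    call would return; only the rendering step is demonstrated, not the fetch.
    """
    creds = json.loads(secret_json)
    return (
        'auth {\n'
        '    fixed_users {\n'
        '        enabled: true\n'
        '        users: [\n'
        f'            {{username: "{creds["username"]}", password: "{creds["password"]}"}}\n'
        '        ]\n'
        '    }\n'
        '}\n'
    )

snippet = render_conf('{"username": "alice", "password": "s3cret"}')
```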
Hi @<1523701070390366208:profile|CostlyOstrich36> , I took a look at the CLI & SDK documentation for the Dataset class, but it didn't look like I had an option to control the preview. Am I looking in the wrong place? Apologies if I missed something in the documentation.
For other people's reference: None
@<1523701070390366208:profile|CostlyOstrich36> any thoughts?
It says that the environment setup is completed, and it has cloned the project as well, but why is it facing an issue here?