But from your other answer, I think I'm understanding that you can have multiple agents on a single instance listening to the same queue.
Correct
So we could maybe initialize 4 instances of the agent on a single EC2 instance, which would allow us to handle a higher volume of small batches concurrently without tying up the entire instance.
Correct (that said I do not understand how come a single Task does not utilize the CPU, I was under the impression it is run...
Correct.
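For reference, launching several agents against the same queue is just running the daemon multiple times; a rough sketch (queue name and worker ids here are placeholders, adjust to your setup):

```bash
# Four agents on one machine, all pulling from the same queue.
# Unique worker ids keep the daemons from colliding; --detached backgrounds them.
CLEARML_WORKER_ID=ec2-worker:0 clearml-agent daemon --queue default --detached
CLEARML_WORKER_ID=ec2-worker:1 clearml-agent daemon --queue default --detached
CLEARML_WORKER_ID=ec2-worker:2 clearml-agent daemon --queue default --detached
CLEARML_WORKER_ID=ec2-worker:3 clearml-agent daemon --queue default --detached
```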
It starts with the initial script (entry point); if it is self-contained (i.e. does not interact with the rest of the repo) it will only analyze that script, otherwise it will analyze the entire repo's code.
SoggyBeetle95 maybe it makes sense to configure the agent with access-all credentials? Wdyt
BattyLion34 Okay, I'll try to see if we can solve the multi-instance issue on Windows (because obviously it should be automatic)
GiganticTurtle0 is it in the same repository?
If it is, it should have detected that it needs to analyze the entire repository (not just the standalone script), and then discovered tensorflow
Is there an option to do this from a pipeline, from within the add_step method? Can you link a reference to cloning and editing a task programmatically?
Hmm, I think there is an open GitHub issue requesting a similar ability, let me check on the progress...
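In the meantime, cloning and editing a task programmatically looks roughly like this (the task ID and parameter name below are made up for illustration):

```python
from clearml import Task

# Clone an existing task, tweak a parameter, then enqueue the clone for an agent.
template = Task.get_task(task_id="<template-task-id>")   # placeholder task ID
cloned = Task.clone(source_task=template, name="cloned run")
cloned.set_parameter("General/learning_rate", 0.001)     # hypothetical parameter
Task.enqueue(cloned, queue_name="default")
```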
nope, it works well for the pipeline when I don't choose to continue_pipeline
Could you send the full log please?
EnviousPanda91 so which frameworks are being missed? Is it a request to support a new framework, or are you saying there is a bug somewhere?
Hi PompousBeetle71 I'm with SteadyFox10 on this one. Unless you choose a file name based on epoch or step, you are literally overwriting the model file, which Trains will reflect. If you use epoch in the filename you will end up with all your models logged by Trains. BTW we are actively working on integration with pytorch ignite, so if you have any suggestions now is the time :)
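For example, a minimal sketch (the model and training loop are stand-ins), where the epoch-based filename means every checkpoint is a distinct file for Trains to log:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)  # stand-in model for illustration

for epoch in range(3):
    # ... training step would go here ...
    # Epoch in the filename => each save is a new file, so every
    # checkpoint gets logged instead of one file being overwritten.
    torch.save(model.state_dict(), f"model_epoch_{epoch:03d}.pt")
```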
I want in my CI tests to reproduce a run in an agent
you mean to run it on the CI machine?
because the env changes and some things break in agents and not locally
That should not happen, no? Maybe there is a bug that needs fixing on clearml-agent?
BTW:
```
In [4]: str('\.')
Out[4]: '\\.'

In [5]: str(('\.', ))
Out[5]: "('\\\\.',)"
```
This is just Python str casting; str() on a tuple uses the repr of each element, hence the extra escaping.
I'm wondering what happens if I were to host the instance and one of these were to go down from time to time in production, as the deployments provided by the helm chart are not redundant.
Long story short, it will break the clearml-server. Please do not take them down; if you do need to do that, take down the clearml-server as well. The Python clients will wait until it is up again, so no session will be destroyed.
YummyFish22 can you point to the huggingface example you are using?
correct, you can pass them as keys on the "task_filter" argument, e.g.: Task.get_tasks(..., task_filter={'status': ['failed']})
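A fuller sketch (the project name is an assumption):

```python
from clearml import Task

# Fetch all failed tasks in a project via the task_filter argument.
failed = Task.get_tasks(
    project_name="examples",                # assumed project name
    task_filter={"status": ["failed"]},
)
for task in failed:
    print(task.id, task.name)
```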
Registering some metadata as a model doesn't feel correct to me.
Yes I'm with you 🙂
BTW what kind of metadata would need versions during the lifetime of a Task?
PompousBeetle71 let me know if it solves your problem
You need to adjust it to your setup, specifically change the queue name to one you have. Does that make sense?
Yey! BTW: what's the setup you are running it with? Does it include "manual" tasks? Do you also report on completed experiments (not just failed ones)? Do you filter by iteration numbers?
Because of that, I cannot create a task in this project programmatically locally because it tries to access the bucket and fails. And there is no easy way to change the default output location (not in the web UI, not in the sdk)
JitteryCoyote63 hmm that is a pickle ...
let me check the code ...
ETA for the next release is end of the month/early March, it is planned to include many other improvements 🙂
Hmm could you try to upload to your files server (not the S3)?
Maybe some credentials error?
What I mean is that I don't need to have cudatoolkit installed in the current conda env, right?
Wait, are you using conda as the package manager?
EDIT: meaning configured in trains.conf as package manager
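i.e. something like this in trains.conf (a sketch of the relevant agent section; adjust to your file):

```
agent {
    package_manager {
        # use conda (instead of the default pip) to build the task environment
        type: conda,
    }
}
```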
Hi @<1523710243865890816:profile|QuaintPelican38>
What's the clearml version?
hi @<1546303293918023680:profile|MiniatureRobin9>
I can still see the metrics in Grafana.
It will not delete them from Grafana; it means they are no longer collected. Make sense?
Hmm I guess doable 🙂 could you open a GitHub issue with a feature request?
If we have enough support it will bump up the priority 🤞
Just wanted to know how many people are actively working on clearml.
probably 30+ 🙂
ReassuredTiger98 are you afraid of a lack of support? Or are you offering some (it is always welcome)?
Hi SubstantialElk6
If you are using boto to access anything that is not AWS S3, you have to add both address and port, and make sure you configure the "security" flag.
See example in clearml.conf :
https://github.com/allegroai/clearml-agent/blob/176b4a4cdec9c4303a946a82e22a579ae22c3355/docs/clearml.conf#L247
```
aws {
    s3 {
        credentials: [
            {
                host: "my-minio-host:9000"
                key: "12345678"
                secret: "12345678"
                ...
```
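On the client side, uploads then go through an s3:// URI that includes the host and port from the config above; a sketch (the bucket name is made up):

```python
from clearml import Task

# Route task artifacts/models to the minio endpoint configured in clearml.conf;
# for non-AWS S3 the URI carries host:port, matching the credentials entry above.
task = Task.init(
    project_name="examples",
    task_name="minio upload test",
    output_uri="s3://my-minio-host:9000/my-bucket",   # assumed bucket name
)
```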
For that I need more info; what exactly do you need (or what are you trying to achieve)?
New version will contain much more advanced search (including all the task fields)
are there any more fields in this function with partial matching? for example project? tags?
Yes they can all be filtered (basically everything you see in the UI)
notice: tags are strings (you can provide list of tags), project is an ID of the project
(Use Task.get_project_id, I think)
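e.g. a sketch combining both (project and tag names are assumptions):

```python
from clearml import Task

# Filter by tags (a list of strings) and by project ID rather than project name.
project_id = Task.get_project_id(project_name="examples")   # assumed project
tasks = Task.get_tasks(
    tags=["production"],                     # assumed tag
    task_filter={"project": [project_id]},
)
```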