In the Profile section, yes, they are well defined (bucket, secret, key, and endpoint)
@CostlyOstrich36 I added None btw
Always great to find a bug! I'll make relevant SDK updates then.
Feels like we've been over this. Have there been any new developments perhaps?
It's essentially that this - https://clear.ml/docs/latest/docs/guides/advanced/multiple_tasks_single_process cannot work in a remote execution.
Either one would be nice to have. I kinda like the instant search option, but could live with an ENTER to search.
I opened this meanwhile - https://github.com/allegroai/clearml-server/issues/138
Generally, it would also be good if the pop-up presented some hints about what went wrong with fetching the experiments. Here, I know the pattern is incomplete and invalid. A less advanced user might not understand what's up.
I have seen this quite frequently as well tbh!
I'll kill the agent and try again but with the detached mode
Hm, that seems less than ideal. I was hoping I could pass some CSV locations. I'll try and find a workaround for that. Thanks!
I just used this to create the dual_gpu queue:
clearml-agent daemon --queue dual_gpu --create-queue --gpus 0,1 --detached
Hah. Now it worked.
Yes, a lot of moving pieces here as we're trying to migrate to AWS and set up autoscaler and more
Yeah I managed to work around those former two, mostly by using Task.create instead of Task.init. It's actually the whole bunch of daemons running in the background that takes a long time, not the zipping.
Regarding the second - I'm not doing anything per se. I'm running in offline mode and I'm trying to create a dataset, and this is the error I get...
There is a data object in it, but there is no script object attached to it (presumably again because of pytest?)
The key/secret is also shared internally so that sounds like a nice mitigation actually!
Which environment variable am I looking for? I couldn't spot anything specifically in that environment variables page
There's not much (or anything) in the log to provide...
` (.venv) 15:42 [0:user@server$~] CLEARML_CONFIG_FILE=~/agent_clearml.conf clearml-agent daemon --queue default on_prem --detached --order-fairness
Environment variables set from configuration: ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_DEFAULT_REGION']
...
We have the following, works fine (we also use internal zip packaging for our models):
model = OutputModel(task=self.task, name=self.job_name, tags=kwargs.get('tags', self.task.get_tags()), framework=framework)
model.connect(task=self.task, name=self.job_name)
model.update_weights(weights_filename=cc_model.save())
Could also be that the use of ./ is the issue? I'm not sure what else I can provide you with, SweetBadger76
Will try!
Curious - is there a temporary changelog for 1.2.0? Always fun to poke at the upcoming features
EDIT: Wait, should the clearml RC be installed outside the venv for the agent as well?
Anyway sounds good!
Or well, because it's not geared for tests, I'm just encountering weird shit. Just calling task.close() takes a long time
Last but not least - can I cancel the offline zip creation if I'm not interested in it?
EDIT: I see not, guess one has to patch ZipFile
...
Actually SuccessfulKoala55 , there is something like that happening behind the scenes.
I have an AWS Autoscaler running on a services queue, so the autoscaler inherits the configuration used by the services agent, right?
Now, when my autoscaler launched new EC2 instances, they used the same fileserver as the one that was defined in the services agent too
This was a long time running since I could not access the macbook in question to debug this.
It is now resolved and indeed a user error - they had implicitly defined CLEARML_CONFIG_FILE to e.g. /home/username/clearml.conf instead of /Users/username/clearml.conf as is expected on Mac.
I guess the error message could be made clearer in this case (i.e. CLEARML_CONFIG_FILE='/home/username/clearml.conf' file does not exist). Thanks for the support! ❤
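For anyone hitting the same thing, here is a minimal sketch of the kind of pre-flight check I had in mind. It is not how ClearML itself validates the variable; the function name check_clearml_config_file is hypothetical and it only uses the Python standard library:

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def check_clearml_config_file(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Hypothetical helper: return an error message if CLEARML_CONFIG_FILE
    is set but does not point to an existing file, otherwise return None."""
    env = os.environ if env is None else env
    raw = env.get("CLEARML_CONFIG_FILE")
    if not raw:
        # Variable is unset or empty, so there is nothing to validate here.
        return None
    # Expand a leading "~" so values such as ~/agent_clearml.conf resolve.
    path = Path(raw).expanduser()
    if not path.is_file():
        return f"CLEARML_CONFIG_FILE='{raw}' file does not exist"
    return None


if __name__ == "__main__":
    problem = check_clearml_config_file()
    if problem:
        print(problem)
```

Running this on the affected machine before starting the agent or a script would have flagged the /home/username path right away.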
So some UI that shows the contents of users.get_all?
Thanks SuccessfulKoala55 ! Could I change this during runtime, so for example, only the very first task goes through this process?
Happens with the latest version indeed.
I can't share our code, but the gist of it is:
pipe = PipelineController(name=..., project=..., version=...)
pipe.add_function_step(...) # Many calls
pipe.set_default_execution_queue(...)
pipe.start(queue=..., wait=True)
Not sure if @SuccessfulKoala55 or @JuicyFox94 maybe knows?