I'll kill the agent and try again but with the detached mode 🤔
Hm, that seems less than ideal. I was hoping I could pass some CSV locations. I'll try and find a workaround for that. Thanks!
I just used this to create the dual_gpu queue:
clearml-agent daemon --queue dual_gpu --create-queue --gpus 0,1 --detached
Hah. Now it worked.
yes, a lot of moving pieces here as we're trying to migrate to AWS and set up the autoscaler and more 🙂
Yeah, I managed to work around the former two, mostly by using Task.create instead of Task.init. It's actually the whole bunch of daemons running in the background that takes a long time, not the zipping.
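Roughly what that workaround looks like (project/task names here are placeholders, just a sketch):

from clearml import Task

# Task.create registers a task without attaching to the running process the way
# Task.init does, which is why it behaves better under pytest for us
task = Task.create(project_name="tests", task_name="unit-test-run")
# ... do whatever the test needs, then close explicitly
task.close()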
Regarding the second - I'm not doing anything per se. I'm running in offline mode and I'm trying to create a dataset, and this is the error I get...
There is a data object in it, but there is no script object attached to it (presumably again because of pytest?)
The key/secret is also shared internally so that sounds like a nice mitigation actually!
Which environment variable am I looking for? I couldn't spot anything specifically in that environment variables page
There's not much (or anything) in the log to provide...
` (.venv) 15:42 [0:user@server$~] CLEARML_CONFIG_FILE=~/agent_clearml.conf clearml-agent daemon --queue default on_prem --detached --order-fairness
Environment variables set from configuration: ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_DEFAULT_REGION']
...
We have the following, works fine (we also use internal zip packaging for our models):
# Register the trained weights on the task as an output model
model = OutputModel(task=self.task, name=self.job_name, tags=kwargs.get('tags', self.task.get_tags()), framework=framework)
model.connect(task=self.task, name=self.job_name)
# cc_model.save() returns the path to our zipped weights file
model.update_weights(weights_filename=cc_model.save())
Could also be that the use of ./ is the issue? I'm not sure what else I can provide you with, SweetBadger76
Will try!
Curious - is there a temporary changelog for 1.2.0? 🙂 Always fun to poke at the upcoming features
EDIT: Wait, should the clearml RC be installed outside the venv for the agent as well?
Anyway sounds good! 🙂
Or well, because it's not geared for tests, I'm just encountering weird shit. Just calling task.close() takes a long time.
Last but not least - can I cancel the offline zip creation if I'm not interested in it 🤔
EDIT: I see not, guess one has to patch ZipFile
...
Actually SuccessfulKoala55 , there is something like that happening behind the scenes.
I have an AWS Autoscaler running on a services queue, so the autoscaler inherits the configuration used by the services agent, right?
Now, when my autoscaler launched new EC2 instances, they used the same fileserver as the one that was defined in the services agent too 🤔
This one was open for a long time since I could not access the MacBook in question to debug it.
It is now resolved and indeed a user error - they had implicitly defined CLEARML_CONFIG_FILE to e.g. /home/username/clearml.conf instead of /Users/username/clearml.conf as is expected on Mac.
I guess the error message could be made clearer in this case (i.e. CLEARML_CONFIG_FILE='/home/username/clearml.conf' file does not exist). Thanks for the support! ❤️
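Something along these lines is what I had in mind (just a sketch, not the actual SDK loading code):

import os

config_path = os.path.expanduser(os.environ.get("CLEARML_CONFIG_FILE", "~/clearml.conf"))
if not os.path.isfile(config_path):
    # Surface the resolved path so a wrong CLEARML_CONFIG_FILE is obvious
    raise FileNotFoundError(f"CLEARML_CONFIG_FILE='{config_path}' file does not exist")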
So some UI that shows the contents of users.get_all?
Thanks SuccessfulKoala55 ! Could I change this during runtime, so for example, only the very first task goes through this process?
Happens with the latest version indeed.
I canβt share our code, but the gist of it is:
pipe = PipelineController(name=..., project=..., version=...)
pipe.add_function_step(...) # Many calls
pipe.set_default_execution_queue(...)
pipe.start(queue=..., wait=True)
Not sure if @<1523701087100473344:profile|SuccessfulKoala55> or @<1523701827080556544:profile|JuicyFox94> maybe knows?
Then perhaps Mac treats missing environment variables as empty and Linux just crashes? Anyway, the config loading should be deferred, shouldn't it?
Well, -ish. Ideally what we're after is one of the following:
1. Couple a task with a dataset, and keep it visible in its destined location.
2. Create a dataset separately from the task, and have control over its visibility and location. If it's hidden, it should not affect normal UI interaction (most annoying is having to click twice on the same project name when there are hidden datasets, which do not appear in the project view).
That's what I found as well, but it did not like it after all (boto is fine with it, but underlying urllib and requests were not?)
It's fine -- I see the added benefit in making sure the users set up their clearml.conf, and I've made a script to edit it to our needs as part of the installation process 🙂 Thanks Martin!
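For reference, the script is basically this kind of thing (the files_server URL is a placeholder, assuming the default SaaS entry is in clearml.conf, not our real setup):

from pathlib import Path

conf = Path.home() / "clearml.conf"
text = conf.read_text()
# Point the files_server at our internal storage instead of the default
text = text.replace("files_server: https://files.clear.ml",
                    "files_server: https://files.internal.example.com")
conf.write_text(text)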
TimelyPenguin76 CostlyOstrich36 It seems a lot of manual configuration is required to get the EC2 instances up and running.
Would it not make sense to update the autoscaler (and example script) so that the config.yaml that's used for the autoscaler service is implicitly copied to the EC2 instances, and then any extra_clearml_conf values are used/overwritten?
I commented on your suggestion about this on GH. Uploading the artifacts would happen via some SDK call before switching to remote execution.
When cloning a task (via WebUI or SDK), a user should have an option to also clone these input artifacts or simply link to the original. If linking to the original, then if the original task is deleted - it is the user's mistake.
Alternatively, this potentially suggests "Input Datasets" (as we're imitating now), such that they are not tied to the original t...
I think ClearML boots up only afterwards, so those environment variables may not be available yet.
You should set them manually in the bootstrap code, unfortunately.
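Something like this at the top of the bootstrap script should do (values are placeholders, obviously):

import os

# Must run before any ClearML / boto3 storage calls are made
os.environ["AWS_ACCESS_KEY_ID"] = "<access-key>"
os.environ["AWS_SECRET_ACCESS_KEY"] = "<secret-key>"
os.environ["AWS_DEFAULT_REGION"] = "eu-west-1"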