So apparently the NVIDIA AMI ( https://aws.amazon.com/marketplace/pp/prodview-e7zxdqduz4cbs ) doesn't have the aws-cli installed. So I install it in the `extra_vm_bash_script`, and now it wants a configuration. Is there any way to get that from the ENV vars you create? Do you think I should create my own AMI just for this?
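For context, this is roughly what I'm adding via `extra_vm_bash_script` — my assumption being that the aws-cli picks up the standard `AWS_*` environment variables, so no `aws configure` step should be needed:
```python
# a sketch of my extra_vm_bash_script entry in the autoscaler config;
# the exported names are the standard aws-cli env vars, and whether the
# autoscaler already sets them on the instance is my assumption
extra_vm_bash_script = "\n".join([
    "apt-get update -y && apt-get install -y awscli",
    "export AWS_ACCESS_KEY_ID=<key>",         # placeholder
    "export AWS_SECRET_ACCESS_KEY=<secret>",  # placeholder
    "export AWS_DEFAULT_REGION=us-east-1",
])
```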
And for some reason this clone is marked as completed. Not sure why, as it failed
right, of course 🙂 so just to make sure I'm running it correctly: I ran `python aws_autoscaler.py --run` on my laptop and I see the Task on ClearML. Then I took a completed task, cloned it, and enqueued it to the queue defined on the autoscaler. That should spin up an instance, right? (it currently doesn't, and I'm not sure where to debug)
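For reference, the programmatic version of what I did in the UI, in case that helps pinpoint it (task id and queue name are placeholders):
```python
from clearml import Task

# take a previously completed task as a template (id is a placeholder)
template = Task.get_task(task_id="<completed_task_id>")
# clone it and push the clone onto the queue the autoscaler watches
cloned = Task.clone(source_task=template, name="autoscaler smoke test")
Task.enqueue(cloned, queue_name="aws_autoscaler_queue")  # queue name is a guess
```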
I looked there, but couldn't find it. I'm currently experimenting with your free hosted server
I did not. I see that there's a field for `extra_trains_conf`, but couldn't find clear documentation on how to use it. Is it just a reference to a `trains_conf` (maybe `clearml_conf`?)?
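If it helps, here's what I guessed it might be — raw trains.conf text that gets appended on the spun-up instance (the keys below are just an example):
```python
# my guess at how extra_trains_conf is meant to be used: a string of extra
# trains.conf / clearml.conf content dropped onto each instance
extra_trains_conf = """
agent.git_user: "<git_user>"    # placeholder credentials
agent.git_pass: "<git_token>"
"""
```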
Sounds promising, any ETA for the next version?
`CLEARML_DOCKER_IMAGE=nvidia/cuda:10.1-runtime-ubuntu18.04`
How do I pull the image using the agent?
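In case it matters, my understanding is that with an agent running in docker mode (`clearml-agent daemon --docker ...`) you can also set the image on the task side, something like:
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="docker image test")
# tells an agent running in docker mode which base image to pull
# and run this task inside
task.set_base_docker("nvidia/cuda:10.1-runtime-ubuntu18.04")
```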
Hey AgitatedDove14 thanks, that works! The docker is now up and running, great success.
I have a follow-up, maybe you can help debug. Now for some reason `git clone` doesn't work through the agent, but if I log in to the machine myself and run the same command that fails in the log, it works. The error I see is:
```
cloning: git@gitlab.com:<repo_path>
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Reposito...
```
Hey, I tried doing that but sadly it doesn't seem to work. As suggested by the ECR docs, I added:
`aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <ECR URI>`
to the `extra_vm_bash_script` in the config file. I even added a `docker pull`, which I think worked (because it took much longer for the instances to spin up), but I still got the same error message 😞 Is there any way to debug these sessions through ClearML? Thanks!
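For completeness, roughly what my `extra_vm_bash_script` ends up as (ECR URI and image are placeholders):
```python
# the two commands I added, as they appear in my autoscaler config
extra_vm_bash_script = "\n".join([
    "aws ecr get-login-password --region us-east-1 "
    "| docker login --username AWS --password-stdin <ECR URI>",
    "docker pull <ECR URI>/<image>:<tag>",  # placeholders
])
```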
I was thinking about sending the parameters programmatically. We have different pipelines that can generate tasks; I would like to be able to tell trains which user started the pipeline.
so no magic "username" key? 😛
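in the meantime I figure I can just attach it myself when the pipeline creates the task, something like this (the section/key names are my own choice):
```python
from clearml import Task

task = Task.init(project_name="pipelines", task_name="generated task")
# no built-in "username" key, so record the launching user explicitly
task.set_parameters({"General/launched_by": "<user>"})  # placeholder value
task.add_tags(["user:<user>"])
```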
I have access to the machine using SSH from my computer.
There doesn't seem to be any other error in the debug mode.
```
Remote machine is ready
Setting up connection to remote session
Starting SSH tunnel
SSH tunneling failed, retrying in 3 seconds
Starting SSH tunnel
```
legit, I was thinking only about task tracking, less about user-based credentials. good point
`python -m script.as.a.module first_arg second_arg --named_arg value` <- something like that
Thanks! A followup question - can I make the steps in the pipeline use the latest commit in the branch?
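what I'm imagining is something like this on the step's base task — assuming the SDK's `set_script` helper is available and that an empty commit means "branch HEAD", both of which are my guesses:
```python
from clearml import Task

step_template = Task.get_task(task_id="<step_template_id>")  # placeholder
# my assumption: keeping the branch but clearing the pinned commit makes the
# agent check out the latest commit on that branch when the step runs
step_template.set_script(branch="main", commit="")
```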
AgitatedDove14 is there any update on the open issue you talked about before? I think it's this one: https://github.com/allegroai/clearml/issues/214
Hooray! That works AND the feature works!
Quick follow up question, is there any way to abort a pipeline and all of the tasks it ran?
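the manual fallback I can think of is something like this (filtering on "parent" is my guess at how the step tasks are linked to the controller):
```python
from clearml import Task

pipeline = Task.get_task(task_id="<pipeline_task_id>")  # placeholder
pipeline.mark_stopped()  # abort the controller itself
# then abort every task the controller spawned; the "parent" filter field
# is my assumption about how the step tasks reference the pipeline task
for child in Task.get_tasks(task_filter={"parent": pipeline.id}):
    child.mark_stopped()
```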
Update 2: it works with the public repo using https ( https://gitlab.com/gitlab-org/gitlab-foss.git ) but not with the private one, with:
`fatal: could not read Username for '': terminal prompts disabled`
yup, it's there in draft mode so I can get the latest git commit when it's used as a base task
Also, tried the `continue_pipeline` option; it didn't work, as it couldn't parse the previous step that ran:
`ValueError: Could not parse reference '${run_experiment.models.output.-1.url}', step run_experiment could not be found`
when I ran the script it autogenerated the YAML, so I should manually copy it to the remote services agents?
I just want to use auth0 (which we already use in the company) in order to manage the users...
Is there an option to do this from a pipeline, from within the `add_step` method? Can you link a reference to cloning and editing a task programmatically? nope, it works well for the pipeline when I don't choose to `continue_pipeline`
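i.e. something along these lines is what I'm hoping `add_step` supports (constructor args may differ by version; ids are placeholders):
```python
from clearml.automation import PipelineController

pipe = PipelineController(name="my-pipeline", project="examples", version="0.1")
pipe.add_step(
    name="run_experiment",
    base_task_id="<template_task_id>",  # placeholder
    # parameter_override edits the cloned task's parameters; whether the git
    # branch/commit can be overridden the same way is exactly my question
    parameter_override={"General/learning_rate": 0.01},
)
```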
nope, only port 22 is open for SSH. Is there any way to set that as the port for clearml-session?
cool! just to verify - I'll still need to have the credentials created in the server, right?
what about using ENV variables? is it possible to override the config file's credentials?
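e.g., this is the kind of thing I mean — setting them before the SDK initializes (var names from the docs, values are placeholders):
```python
import os

# these must be set before Task.init() so the SDK picks them up instead of
# the credentials in clearml.conf
os.environ["CLEARML_API_HOST"] = "https://api.clear.ml"
os.environ["CLEARML_API_ACCESS_KEY"] = "<access_key>"  # placeholder
os.environ["CLEARML_API_SECRET_KEY"] = "<secret_key>"  # placeholder

from clearml import Task

task = Task.init(project_name="examples", task_name="env override test")
```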