
AgitatedDove14 is there any update on the open issue you talked about before? I think it's this one: https://github.com/allegroai/clearml/issues/214
Also, I tried the continue_pipeline option, but it didn't work because it couldn't parse the previous step that ran: `ValueError: Could not parse reference '${run_experiment.models.output.-1.url}', step run_experiment could not be found`
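For context, the reference in that error has the shape `${<step>.<field path>}`. Below is a minimal sketch of why such a reference fails when the referenced step is not among the steps the (continued) pipeline knows about. The pattern and the `resolve_step` helper are hypothetical illustrations, not ClearML's actual parser.

```python
import re

# Hypothetical pattern for references of the form ${step_name.path.to.field}.
# It mimics the shape seen in the error message; it is NOT ClearML's parser.
REF_PATTERN = re.compile(r"^\$\{([^.}]+)\.([^}]+)\}$")


def resolve_step(reference: str, known_steps: set) -> tuple:
    """Split a reference into (step, field_path) and check the step exists."""
    match = REF_PATTERN.match(reference)
    if match is None:
        raise ValueError(f"Could not parse reference '{reference}'")
    step, field_path = match.group(1), match.group(2)
    if step not in known_steps:
        # Same failure mode as in the report: the step is not registered,
        # so the reference cannot be resolved to an output.
        raise ValueError(
            f"Could not parse reference '{reference}', "
            f"step {step} could not be found"
        )
    return step, field_path.split(".")
```

With `{"run_experiment", "segment_slides"}` as the known steps, the reference from the error resolves to `("run_experiment", ["models", "output", "-1", "url"])`; with only `{"segment_slides"}` it raises the same kind of message seen above.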
something needs to run the autoscaler; I thought it would be the machine that serves the services queue, no?
So apparently the NVIDIA AMI https://aws.amazon.com/marketplace/pp/prodview-e7zxdqduz4cbs doesn't have the aws-cli installed, so I install it in the extra_vm_bash_script, and now it asks for a configuration. Is there any way to get that from the ENV vars you create? Or do you think I should create my own AMI just for this?
Is there an option to do this from a pipeline, from within the add_step method? Can you link a reference on cloning and editing a task programmatically?
nope, it works well for the pipeline when I don't choose continue_pipeline
Hey, I tried doing that but sadly it doesn't seem to work. As suggested by the ECR docs, I added `aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin <ECR URI>` to the extra_vm_bash_script in the config file. I even added a docker pull, which I think worked (because the instances took much longer to spin up), but I still got the same error message 😞 Is there any way to debug these sessions through clearml? Thanks!
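A minimal sketch of composing the extra_vm_bash_script lines discussed above: configure the AWS CLI non-interactively, then log Docker in to ECR with the command from the ECR docs. The helper name and its parameters are hypothetical, and it assumes the AWS CLI is already installed by an earlier line of the script; `aws configure set` and `aws ecr get-login-password` are standard AWS CLI commands.

```python
import shlex


def build_ecr_login_script(region: str, registry_uri: str,
                           access_key_id: str, secret_access_key: str) -> str:
    """Return bash lines that configure the AWS CLI and log Docker in to ECR.

    Hypothetical helper: the output is meant to be pasted into the
    autoscaler's extra_vm_bash_script. Assumes the AWS CLI is installed.
    """
    lines = [
        # Non-interactive configuration instead of the `aws configure` prompt
        f"aws configure set aws_access_key_id {shlex.quote(access_key_id)}",
        f"aws configure set aws_secret_access_key {shlex.quote(secret_access_key)}",
        f"aws configure set region {shlex.quote(region)}",
        # The login command suggested by the ECR docs
        f"aws ecr get-login-password --region {shlex.quote(region)} "
        f"| docker login --username AWS --password-stdin {shlex.quote(registry_uri)}",
    ]
    return "\n".join(lines)
```

If AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION are already exported in the script's environment, the AWS CLI reads them directly and the three `configure set` lines can be dropped; whether the autoscaler exports them there is the open question from the AMI message.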
Hey AgitatedDove14, thanks, that works! The docker is now up and running, great success.
I have a follow-up, maybe you can help debug. For some reason git clone doesn't work through the agent, but if I log in to the machine myself and run the same command that fails in the log, it works. The error I see is:
```
cloning: git@gitlab.com:<repo_path>
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
Reposito...
```
Update: got the same error while trying to clone a public repo: git@gitlab.com:gitlab-org/gitlab-foss.git
Hooray! That works AND the feature works!
Quick follow up question, is there any way to abort a pipeline and all of the tasks it ran?
yeah, maybe as an option in the Task.init
it's in the docker image. Doesn't the git clone command run inside the container?
Sounds promising, any ETA for the next version?
so no magic "username" key? 😛
No, I use an SSH connection, which worked with the regular clearml-agent; we prefer to work with SSH instead of creating a git user.
ok, hopefully last question on this subject 🙂
I want to use Jenkins for some pipelines. What I would like is to have one set of credentials saved on Jenkins, and then, whenever a user triggers a pipeline, that user should be marked as the task's user.
If I understand the options you suggested, I currently need to either (1) keep a mapping between users and their credentials, with all the credentials saved on Jenkins; or (2) have each user manually add 2 environment varia...
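Option (1) could look roughly like this minimal sketch: the Jenkins job looks up the triggering user's ClearML key pair and exports it as CLEARML_API_ACCESS_KEY / CLEARML_API_SECRET_KEY, the environment variables the ClearML SDK reads for credentials, so the task is presumably attributed to that user. The `USER_CREDENTIALS` mapping, the user names, and `clearml_env_for` are hypothetical; in practice the keys would come from Jenkins' credential store rather than a dict.

```python
import os
from typing import Dict, Mapping, Tuple

# Hypothetical mapping: Jenkins user ID -> (ClearML access key, secret key).
# In a real setup these would be loaded from Jenkins' credential store.
USER_CREDENTIALS: Dict[str, Tuple[str, str]] = {
    "alice": ("ALICE_ACCESS_KEY", "ALICE_SECRET_KEY"),
    "bob": ("BOB_ACCESS_KEY", "BOB_SECRET_KEY"),
}


def clearml_env_for(user: str,
                    credentials: Mapping[str, Tuple[str, str]] = USER_CREDENTIALS,
                    base_env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return a copy of base_env with the triggering user's ClearML keys set.

    Raises KeyError if the user has no registered credentials, so a pipeline
    never silently runs under someone else's identity.
    """
    access_key, secret_key = credentials[user]
    env = dict(base_env)  # copy, so the Jenkins agent's own env is untouched
    env["CLEARML_API_ACCESS_KEY"] = access_key
    env["CLEARML_API_SECRET_KEY"] = secret_key
    return env
```

The returned dict can be passed as `env=` to `subprocess.run` when the Jenkins job launches the pipeline script.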
I have access to the machine using SSH from my computer.
There doesn't seem to be any other error in debug mode.
```
Remote machine is ready
Setting up connection to remote session
Starting SSH tunnel
SSH tunneling failed, retrying in 3 seconds
Starting SSH tunnel
```
nope, only port 22 is open for SSH. Is there any way to set that as the port for clearml-session?
cool! just to verify - I'll still need to have the credentials created in the server, right?
what about using ENV variables? is it possible to override the config file's credentials with them?
I looked there, but couldn't find it. I'm currently experimenting with your free hosted server
when I ran the script, it auto-generated the YAML, so should I manually copy it to the remote services agents?
Sure, redacted most of the params as they are sensitive:
```
run_experiment {
  base_task_id = "478cfdae5ed249c18818f1c50864b83c"
  queue = null
  parents = []
  timeout = null
  parameters {
    # Redacted the parameters
  }
  executed = "d1d361d1059c4f0981200f59d7683773"
}
segment_slides {
  base_task_id = "ae13cc979855482683474e9d435895bb"
  queue = null
  parents = ["run_experiment"]
  timeout = null
  parameters {
    Args/param = """
[
#...
```
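To reason about what a continued run should skip, the step table in that config can be modeled as a small DAG. A minimal sketch, assuming `parents` lists a step's upstream steps and a non-null `executed` task ID marks a step that already ran; the dict re-types only the non-redacted fields, and since `segment_slides`' `executed` value is cut off in the snippet, it is assumed unset here. `steps_to_run` is a hypothetical helper, not a ClearML API.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List

# Non-redacted fields from the pipeline config, retyped as Python data.
# segment_slides' "executed" is truncated in the paste; assumed None here.
STEPS: Dict[str, dict] = {
    "run_experiment": {"parents": [], "executed": "d1d361d1059c4f0981200f59d7683773"},
    "segment_slides": {"parents": ["run_experiment"], "executed": None},
}


def steps_to_run(steps: Dict[str, dict]) -> List[str]:
    """Return steps that have not executed yet, parents before children."""
    for name, spec in steps.items():
        for parent in spec["parents"]:
            if parent not in steps:
                raise ValueError(f"step {parent} could not be found (parent of {name})")
    # TopologicalSorter takes node -> predecessors and yields parents first
    graph = {name: set(spec["parents"]) for name, spec in steps.items()}
    order = TopologicalSorter(graph).static_order()
    return [name for name in order if steps[name].get("executed") is None]
```

Under these assumptions `steps_to_run(STEPS)` returns `["segment_slides"]`: `run_experiment` is skipped, but its outputs still have to be resolvable by `segment_slides`.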
yeah, totally. Are there any services like this out of the box (OOB)?
I just want to use Auth0 (which we already use in the company) to manage the users...