Hey Friends, How Do You Configure Clearml To Use An S3 Bucket? Specifically: Does

Hey friends, how do you configure ClearML to use an S3 bucket?

Specifically: does every data scientist have to have hard-coded AWS credentials with read/write access to the artifacts bucket in their own clearml.conf file?

That seems problematic for 3 reasons:

  • Operationally, it'd be difficult to have each data scientist acquire those credentials and then place them into their own clearml.conf file (manual steps and all that)
  • We use temporary credentials. In general, we try to favor AWS SSO so that users don't even have long-term credentials on their machine. Access keys expire every few hours. It'd be difficult to get our data scientists to refresh their tokens and then place them into their clearml.conf file. Additionally, the credentials could expire in the middle of a ClearML task run, which would (probably) have unpredictable behavior
  • Lastly, how does this work with the AWS autoscaler? Do the autoscaled instances simply rely on a role that has access to that bucket?
  
  
Posted one year ago

Answers 7


Hi @<1541954607595393024:profile|BattyCrocodile47>, sorry - I seem to have missed this... 🙏
The easiest way to achieve #1 would be to use the AWS_PROFILE env var supported by boto3. For #2, if you have an IAM role assigned to an instance, you'll need to make sure the use_credentials_chain option is set to true.
#3 should work 🙂
Regarding the last questions, yes you can - there's an init bash script you can provide where you can do whatever you wish 🙂
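
A minimal sketch of the AWS_PROFILE suggestion, assuming the profile is called `mlops-tooling` (the name used in the question). The helpers `env_with_profile` and `run_with_profile` are hypothetical names for illustration and are not part of ClearML or boto3. The sketch only prepares the environment for a child process; boto3 then reads AWS_PROFILE from it when it builds its default session.

```python
import os
import subprocess
import sys


def env_with_profile(profile, base_env=None):
    """Return a copy of the environment with AWS_PROFILE set.

    The input mapping is never modified, so the caller's own
    environment keeps whatever profile it had before.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["AWS_PROFILE"] = profile
    return env


def run_with_profile(script_path, profile):
    """Run a Python script in a child process under the given AWS profile."""
    return subprocess.run(
        [sys.executable, script_path],
        env=env_with_profile(profile),
        check=True,
    )


# On an instance with an IAM role no profile is needed. Per the answer
# above, clearml.conf should then enable the credentials chain:
#   sdk.aws.s3.use_credentials_chain: true
```

For example, `run_with_profile("train.py", "mlops-tooling")` would start a (hypothetical) `train.py` with that profile selected, without touching clearml.conf.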

  
  
Posted one year ago

Glad I can help, Eric!
More or less. It may not be the best (or even a good) approach, but it seems like a decent workaround.
You can read the credentials in with Boto

import boto3

# Build a session from a named profile in the shared AWS config/credentials files
session = boto3.Session(profile_name='some-profile')
dev_s3_client = session.client('s3')

from the shared config, but yes, you may need a separate script to update the values (which I don't think necessarily needs to use Boto), or maybe load the clearml.conf from a shared space (of course, that adds other dependencies).
The configuration vault definitely seems to address this (it's also an Enterprise feature). It looks like you can update the configuration through the UI and have it merged into the config file while ClearML and ClearML Agent are in use, but I'm not sure whether that could still cause any blips. It comes with this "Productivity tip: Keep the vault disabled while you edit your configuration, and enable it when the configuration is ready."
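
As a rough sketch of the "separate script" idea: rather than rewriting clearml.conf, this variant resolves the profile's current credentials with boto3 and maps them to the standard AWS environment variables, which is a swapped-in alternative and not something confirmed in this thread. `AwsCreds`, `load_profile_creds`, and `as_env_vars` are hypothetical names; boto3 is imported lazily so the mapping logic can be used without it.

```python
from typing import Dict, NamedTuple, Optional


class AwsCreds(NamedTuple):
    access_key: str
    secret_key: str
    token: Optional[str] = None  # present only for temporary credentials


def load_profile_creds(profile_name: str) -> AwsCreds:
    """Resolve the current credentials of a named profile via boto3."""
    import boto3  # third-party; imported here so the rest works without it

    session = boto3.Session(profile_name=profile_name)
    creds = session.get_credentials()
    if creds is None:
        raise RuntimeError(f"no credentials found for profile {profile_name!r}")
    frozen = creds.get_frozen_credentials()
    return AwsCreds(frozen.access_key, frozen.secret_key, frozen.token)


def as_env_vars(creds: AwsCreds) -> Dict[str, str]:
    """Map credentials to the standard AWS environment variable names."""
    env = {
        "AWS_ACCESS_KEY_ID": creds.access_key,
        "AWS_SECRET_ACCESS_KEY": creds.secret_key,
    }
    if creds.token:
        env["AWS_SESSION_TOKEN"] = creds.token
    return env
```

Something like `as_env_vars(load_profile_creds("some-profile"))` could then be merged into the environment of the process that runs the task. Note that the values are a snapshot: they are not refreshed once exported.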

  
  
Posted one year ago

Thanks, Vasil! Can you elaborate on what you mean by using boto3? Do you mean writing a script using boto that pulls the credentials down and writes them to the user's clearml.conf?

Also, I've been seeing references to "credentials vault" in the docs. I can see this is the problem that it solves.

  
  
Posted one year ago

Yay! Man, I want to do ClearML with "hard mode" (non-enterprise, self-hosted) first, before trying to sell BENlabs (my work) on it. I could see us paying for enterprise to get the Hyper Datasets and Vault features if our scientists/developers fall in love with it--they probably will if we can get them to adopt it since right now we have a homemade system that isn't nearly as nice as ClearML.

@<1523701087100473344:profile|SuccessfulKoala55> how exactly do you configure ClearML to use the credentials on the client machine? Is there a way to set the AWS profile somehow (maybe from the clearml.conf file) to achieve something similar to what @<1550289509273309184:profile|CooperativeBeetle24> was saying?

For example, we just created an "MLOps Tooling" AWS account. And our developers/scientists all have an AWS profile called [mlops-tooling] in their ~/.aws/credentials file (well, actually it's in the ~/.aws/config file since we use AWS SSO AKA AWS Identity Center).

  1. Setting the profile in clearml.conf would be a really nice solution to this for us because we use the exact workflow you just described. (Except, we don't use the default profile, is that what you were suggesting?) Is this possible? I don't see a setting called "aws_profile" in the boto section of clearml.conf
  2. I guess our auto-scaled instances could do something similar, but with an IAM role. Would giving those instances a role and leaving their clearml.conf files alone achieve that?

  3. And laaaaaast question: we'll need to authorize our AWS Autoscaled ClearML agents to git clone our repositories. I'm assuming if a developer's laptop is being used as an agent, ClearML will just use their SSH keys at the default location of ~/.ssh/id_rsa*.

Can we have the autoscaled ClearML EC2 instances in AWS run a script to download those keys from AWS SecretsManager on startup? (that would work nicely because, presumably, the instances will have an IAM role that allows that).
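
A startup step like that could look roughly like the sketch below. It assumes the private key is stored as a plain-text SecretString and that the instance role is allowed to read it. The function name `install_ssh_key` and the secret name `clearml/agent-ssh-key` are hypothetical. The Secrets Manager client is passed in, so the only boto3 call the sketch relies on is `get_secret_value`.

```python
import os
from pathlib import Path


def install_ssh_key(secrets_client, secret_id, dest):
    """Fetch a private key from Secrets Manager and write it with 0600 permissions.

    `secrets_client` is expected to behave like boto3.client("secretsmanager").
    """
    response = secrets_client.get_secret_value(SecretId=secret_id)
    key_text = response["SecretString"]
    if not key_text.endswith("\n"):
        key_text += "\n"  # ssh expects the key file to end with a newline

    dest = Path(dest).expanduser()
    dest.parent.mkdir(mode=0o700, parents=True, exist_ok=True)

    # Create the file owner-only from the start, then enforce the mode in
    # case the file already existed with looser permissions.
    fd = os.open(dest, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(key_text)
    os.chmod(dest, 0o600)
    return dest


# Possible use from an instance init script (requires boto3 and an IAM role
# with secretsmanager:GetSecretValue on the secret):
#   import boto3
#   install_ssh_key(boto3.client("secretsmanager"),
#                   "clearml/agent-ssh-key", "~/.ssh/id_rsa")
```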

  
  
Posted one year ago

@<1541954607595393024:profile|BattyCrocodile47> to answer the original question, I'm not sure each user having their own credentials is such a deviation from the way AWS expects you to use their credentials anyway - seems to me this is their standard mode of operation - you come in to work, use your MFA to generate a new token, then use it for 12 hours 🙂
Basically you can configure ClearML to just use the existing AWS credentials set on the machine without changing the clearml.conf every time (that can be the developer's workstation, or the training machine), so anything you would normally be doing with AWS setup is already supported

  
  
Posted one year ago

Just chiming in regarding the vault feature: yes, it does address this issue and allows centralized configuration as well as per-user configuration :)

  
  
Posted one year ago

Maybe you've already seen this and it doesn't help but:
Based on ClearML's S3 setup guide, it looks like you can handle 1 and 2 with Boto3 and have a config that sets the keys per session / profile, so they would be picked up from a shared config file or the AWS config file. Not 100% sure if that will avoid the credential expiry mid-task issue, but I think it should, maybe with some additional code for you to run a check for cred updates if you get kicked out of your session.
Apologies if I'm repeating things you've read but hope this helps a bit!
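
The "check for cred updates" idea could be sketched as a small expiry test like the one below. It assumes you can get hold of an expiry timestamp for the temporary credentials (for example the `Expiration` value that comes back with STS-issued credentials); how that timestamp is obtained is left open here. The names `parse_expiry` and `needs_refresh` and the 15-minute margin are hypothetical choices.

```python
from datetime import datetime, timedelta, timezone


def parse_expiry(text):
    """Parse an ISO-8601 timestamp such as '2024-01-01T12:00:00+00:00'."""
    expiry = datetime.fromisoformat(text.replace("Z", "+00:00"))
    if expiry.tzinfo is None:
        # Assumption: timestamps without an offset are in UTC.
        expiry = expiry.replace(tzinfo=timezone.utc)
    return expiry


def needs_refresh(expires_at, now=None, margin=timedelta(minutes=15)):
    """Return True when the credentials expire within `margin` of `now`."""
    if now is None:
        now = datetime.now(timezone.utc)
    return expires_at - now <= margin
```

A long-running task could call `needs_refresh(...)` before each upload and ask the user to log in again (or re-resolve credentials) when it returns True.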

  
  
Posted one year ago