You will need to provide more context than that if you don't want the answer: "Have you tried turning it off and on again?"
Once you have manually installed your package inside the Docker container, check that your file module_b/templates/my_template.yml is where it should be.
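One way to run that check from inside the container is to ask Python where the installed package keeps the file. This is a sketch using the standard-library `importlib.resources` (Python 3.9+). `module_b` and the template path come from the message above, and the helper name `packaged_file_exists` is hypothetical. The file will only be found if it was declared as package data when the package was built.

```python
from importlib.resources import files


def packaged_file_exists(package: str, relative_path: str) -> bool:
    """Return True if `relative_path` exists inside the installed `package`."""
    try:
        resource = files(package)
    except ModuleNotFoundError:
        # The package itself is not importable in this environment.
        return False
    # Walk the path one component at a time so it works for any Traversable.
    for part in relative_path.split("/"):
        resource = resource / part
    return resource.is_file()


if __name__ == "__main__":
    # Paths taken from the message above; adjust to your own layout.
    print(packaged_file_exists("module_b", "templates/my_template.yml"))
```

If this prints `False` while the package imports fine, the template was most likely left out of the built wheel rather than misplaced at runtime.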
Should I put that in the clearml.conf file?
Not sure how that works with Docker and machines that are not set up with an SSH public key... We will go down that path sometime in the future, so I am also quite interested in how people do it without an SSH public key.
I mean, what happens if I import and use a function from another .py file, and that function's code changes?
Or do you expect the code to be frozen, with only the parameters changing between runs?
What should I put in there? What is the syntax for a git package?
@CostlyOstrich36 Thanks!! That looks much cleaner than task.export_task()['runtime']['gpu_type'] :D
@AgitatedDove14
What is the env var name for Azure Blob Storage? That's the one we use for our artifacts.
Also, is there a function call rather than an env var?
In our case it would be simpler to call a function that sets the credentials for ClearML than to fetch the secrets and set env vars before running the Python code.
If env vars are the only option, I am thinking of fetching the secrets and setting the env vars from Python, e.g. os.environ["MY_VARIABLE"] = "hello" ...
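A minimal sketch of that idea: fetch the secrets first, then set the environment variables from Python before `clearml` is imported, so the SDK sees them when it reads its storage configuration. The variable names `AZURE_STORAGE_ACCOUNT` and `AZURE_STORAGE_KEY` are an assumption here (check the ClearML docs for your SDK version), and both the secret names and the `fetch_secret` callable are hypothetical stand-ins for whatever secret store you use.

```python
import os
from typing import Callable, Dict

# ASSUMED env var names for Azure Blob Storage credentials; verify them
# against the ClearML documentation for your SDK version.
# Maps env var name -> (hypothetical) secret name in your secret store.
AZURE_ENV_VARS: Dict[str, str] = {
    "AZURE_STORAGE_ACCOUNT": "storage-account-name",
    "AZURE_STORAGE_KEY": "storage-account-key",
}


def export_storage_credentials(fetch_secret: Callable[[str], str]) -> None:
    """Set each credential env var from the secret store.

    `fetch_secret` is a hypothetical callable mapping a secret name to its value.
    Call this before `import clearml` so the SDK can pick the values up.
    """
    for env_var, secret_name in AZURE_ENV_VARS.items():
        os.environ[env_var] = fetch_secret(secret_name)
```

Usage would be `export_storage_credentials(my_fetcher)` followed by `from clearml import Task`. Passing the fetcher in keeps the secret-store client out of this helper, so the same code works with Key Vault, a CI secret, or a test stub.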
but then it is still missing a bunch of libraries in the task (which succeeded) > Execution > INSTALLED PACKAGES
So when I clone that task and try to run the clone, it fails because it is missing Python packages 😞
So in your case, the clearml-agent conf contains multiple credentials, one for each cloud storage you might use?
Right, in which case you want to change it dynamically from your code, not from the config file. This is where Logger.set_default_output_upload comes in.
I don't think ClearML is designed to handle secrets other than git and storage credentials...
Just to confirm: is "output_uri to log everything to S3" set in the server config or in the client config (the clearml.conf where the code actually runs)?
Where the model will be saved/uploaded is defined by the client and not the server.
What about the log around when it actually tries to clone your repo?
If you are using a self-hosted ClearML server spun up with docker-compose, you can just mount your NAS to /opt/clearml/fileserver on the host machine before starting the ClearML server with docker-compose up.
With an SSH public key, if I can git clone from a terminal, then so can the ClearML agent, since it runs on behalf of a local user. That applies to both local machines and VMs.
For local agents running on-prem, we use a Service Principal or have each user log in to authenticate with Azure, and then mount ~/.azure into the container.
This looks like you have configured git to track a bunch of files that should not be tracked, like uv.lock or .DS_Store...
Pretty sure GCP has all the equivalents.
Sure:
```python
def main():
    repo = "redacted"
    commit = "redacted"
    bands = ["redacted"]
    test_size = 0.2
    batch_size = 64
    num_workers = 12
    img_size = (128, 128)
    random_seed = 42
    epoch = 20
    learning_rate = 0.1
    livbatch_list = get_livbatch_list(repo, commit)
    lbs = download_batches(repo, commit, livbatch_list)
    df, label_map = get_annotation_df(lbs, bands)
    df_train, df_val = deterministic_train_val(df, test_size=test_siz...
```
We use Azure, and we store secrets in Key Vault.
The VM/node that runs the agent has an Azure identity with permission to read those secrets.
To pull a secret, we simply run az login --identity [--client-id foobar] before az keyvault secret ....
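The same flow can be scripted from Python by shelling out to the Azure CLI. This is a sketch that assumes the agent's VM has a managed identity with read access to the vault and that the `az` CLI is installed. The vault and secret names are placeholders, the helper names are hypothetical, and `--client-id` follows the message above (older CLI versions use `--username` for a user-assigned identity).

```python
import subprocess
from typing import List, Optional


def login_command(client_id: Optional[str] = None) -> List[str]:
    """Build `az login --identity`, optionally for a user-assigned identity."""
    cmd = ["az", "login", "--identity"]
    if client_id:
        cmd += ["--client-id", client_id]
    return cmd


def secret_show_command(vault_name: str, secret_name: str) -> List[str]:
    """Build a command that prints only the secret's value as plain text."""
    return [
        "az", "keyvault", "secret", "show",
        "--vault-name", vault_name,
        "--name", secret_name,
        "--query", "value",
        "--output", "tsv",
    ]


def login(client_id: Optional[str] = None) -> None:
    """Authenticate once with the VM's managed identity (requires the az CLI)."""
    subprocess.run(login_command(client_id), check=True, capture_output=True)


def fetch_secret(vault_name: str, secret_name: str) -> str:
    """Read one secret value after `login()` has succeeded."""
    result = subprocess.run(
        secret_show_command(vault_name, secret_name),
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

Logging in once and then calling `fetch_secret` for each secret avoids re-authenticating per value; `check=True` makes a missing permission fail loudly instead of returning an empty string.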
Hi.
How do you tell the server to use my Azure storage instead of the local drive on the host machine? Isn't it done by setting azure.storage in /opt/clearml/config/clearml.conf?
Never mind, I didn't read properly...
I guess when the pods simply crash or disconnect, the ClearML agent won't have a chance to report to the ClearML server: "hey, the network is about to be cut"...
You will need some k8s logic to report back to the DS that the node just died for xyz reason...
Got it working. I was using CLEARML_AGENT_SKIP_PIP_VENV_INSTALL.
Now I just use agent.package_manager.system_site_packages=true
OK, so if the git commit or uncommitted changes differ from the previous run, the cache is "invalidated" and the step will run again?
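As a toy illustration of that idea (not ClearML's actual implementation), a step's cache key can be derived from everything that should trigger a re-run: the commit, any uncommitted diff, and the step parameters. If any of them changes, the key changes and the cached result no longer matches. All names here are hypothetical.

```python
import hashlib
import json
from typing import Any, Callable, Dict

# In-memory cache: key -> step result (a real system would persist this).
_cache: Dict[str, Any] = {}


def step_cache_key(commit: str, uncommitted_diff: str, params: Dict[str, Any]) -> str:
    """Hash the code version and parameters into one cache key."""
    payload = json.dumps(
        {"commit": commit, "diff": uncommitted_diff, "params": params},
        sort_keys=True,  # parameter order must not change the key
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run_step_cached(
    commit: str,
    uncommitted_diff: str,
    params: Dict[str, Any],
    step_fn: Callable[..., Any],
) -> Any:
    """Run `step_fn(**params)` only if this code/params combination is new."""
    key = step_cache_key(commit, uncommitted_diff, params)
    if key not in _cache:
        _cache[key] = step_fn(**params)
    return _cache[key]
```

In this sketch, a change to a helper in another .py file only invalidates the cache if that file is part of the hashed commit or diff; code that lives outside the repo would not be captured.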