@<1523701070390366208:profile|CostlyOstrich36> Thanks for the response. Can I ask a second question? My script main.py is in my Docker image at the path "/", but when ClearML starts my container on the agent, it tries to execute it from "/root/.clearml/venvs-builds/3.10/code/". Do you know how to change this behavior? For example, I tried the --cwd argument, but clearml-task tells me "repository(Error: working directory '{}', must be relative to repository root)", even though I don't use a repository...
Yep, it's inside the repo. The steps are the same as in the documentation:
clearml-task --project examples --name remote_test --script /path/to/my/script.py \
  --packages "keras" "tensorflow>2.2" --args epochs=1 batch_size=64 \
  --queue dual_gpu
I have found the solution: I should store the script outside the git repo. Still, I think autodetection in ClearML is not a good choice; there should be a flag like --no_autodetect_git.
And about the agent... The agent is listening to the queue, but the problem is that I can't put a task in the queue without --script or a module. Here is the code from clearml-task:
"if raise_on_missing_entries and not base_task_id:
    if not script and not module:
        raise ValueError("Entry point script not provided")
    if not repo and not folder and (script and not Path(script).is_file()):
        raise ValueError("Script file '{}' could not be found".format(script))" But wh...
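To make the check quoted above easier to reason about, here is a minimal standalone sketch of the same logic. The function name validate_entry_point and its signature are hypothetical (they are not the ClearML API); only the two conditions and the error messages come from the quoted snippet.

```python
from pathlib import Path
from typing import Optional


def validate_entry_point(
    script: Optional[str] = None,
    module: Optional[str] = None,
    repo: Optional[str] = None,
    folder: Optional[str] = None,
    base_task_id: Optional[str] = None,
    raise_on_missing_entries: bool = True,
) -> None:
    """Hypothetical wrapper around the two checks quoted from clearml-task."""
    # The checks are skipped entirely when a base task is being cloned.
    if raise_on_missing_entries and not base_task_id:
        # Either a script or a module must be given as the entry point.
        if not script and not module:
            raise ValueError("Entry point script not provided")
        # Without a repo or folder, the script must exist on the local disk.
        if not repo and not folder and (script and not Path(script).is_file()):
            raise ValueError("Script file '{}' could not be found".format(script))
```

Read this way, the snippet only skips the entry point requirement when a base task ID is supplied, and only skips the local file check when a repo or folder is given.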
As I understand it, that's CLEARML_AGENT_FORCE_CODE_DIR? From the documentation I'm trying to understand: should I specify these variables in the agent Dockerfile, or can I specify them dynamically?
I checked, and there is no script at that path. Where should it be?
Thanks, I've seen this option. I thought it would be possible to do it via Task. I tried the method via Pipeline. The pod starts and the library is installed, but in the end the file is missing. What could be the problem, can you tell me? Here are the logs:
Environment setup completed successfully
Starting Task Execution:
/root/.clearml/venvs-builds/3.10/bin/python: can't open file '/root/.clearml/venvs-builds/3.10/task_repository/clearml-agent.git/test_remote_execution.py': [Errno 2] No such fi...
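A rough sketch of how the path in that log line could be composed, assuming (this is an assumption, not confirmed agent behavior) that the agent joins the cloned repository root, the task's working directory, and the entry point. The helper resolve_entry_point is hypothetical.

```python
from pathlib import PurePosixPath


def resolve_entry_point(repo_root: str, working_dir: str, entry_point: str) -> str:
    """Hypothetical helper: join repo root, working dir and entry point (POSIX paths)."""
    # PurePosixPath only manipulates strings; it never touches the filesystem.
    return str(PurePosixPath(repo_root) / working_dir / entry_point)


path = resolve_entry_point(
    "/root/.clearml/venvs-builds/3.10/task_repository/clearml-agent.git",
    ".",
    "test_remote_execution.py",
)
print(path)
```

Under that assumption, the script is looked up inside the cloned clearml-agent.git repository, so a file that was never committed there would produce exactly this "can't open file" error.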
But it's weird. What if I want to run the code without a repository, for example through "execute_remotely" or "add_function_step"? By default it is assumed that a repository is not needed, isn't it?
@<1523701070390366208:profile|CostlyOstrich36> I may not fully understand the functionality of remote code execution. Do I always need to have a git repository for this?
I'm currently unsure about the correct approach. Would you kindly review my attempts and point out where I might have made a mistake? Here's what I've tried:
- I've added the default URL in the agent Helm chart:
clearml:
  ...
  clearmlConfig: |-
    sdk {
      development {
        default_output_uri: ""
      }
    }
- I've added the URL in the agent section:
agentk8sglue:
  ...
  fileServerUrlReference:
- In the Python fil...
OK, guys, I did it by manually uploading the model.
task = Task.init(project_name='test', task_name='PyTorch MNIST train filserver dataset')
output_model = OutputModel(task=task, framework="PyTorch")
output_model.set_upload_destination(uri=" None ")
tmp_dir = os.path.join(gettempdir(), "mnist_cnn.pt")
torch.save(model.state_dict(), tmp_dir)
output_model.update_weights(weights_filename=tmp_dir)
The pod can easily download the dataset and upload logs to the fileserver, but it can't upload the model 😀
Ok, maybe someone knows: how does a pod created by a K8s agent know the model registry URL? When I added the output_uri parameter in the Task, like output_uri=" None ", it doesn't show anything now. Previously, without this parameter, it showed a path like " None ...." in WebUI->Experiments->Artifacts
I ran the code from the pod created by the agent and the model was uploaded. But when the task was started by the agent command, it didn't upload) Magic
Hi @<1523701070390366208:profile|CostlyOstrich36> , I tried this, but it doesn't work. Should it be the fileserver URL?
I didn't save it in any way. I relied on the auto-save from ClearML.
@<1523701070390366208:profile|CostlyOstrich36> Hi, yes, I tried these options, but nothing happened. Do you know how to debug this problem?
@<1523701070390366208:profile|CostlyOstrich36> You didn't understand me)) I want to push one function from my code to the agent, wait for the calculation to finish, and then continue the code. I don't need to push the whole script.
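For illustration, a minimal sketch of wrapping a single function as a pipeline step with PipelineController.add_function_step. The function heavy_computation, the pipeline and project names, and the default queue name are hypothetical, and this is only one possible approach, not necessarily the recommended one for this use case.

```python
def heavy_computation(x: int, y: int) -> int:
    """The single function to run on an agent (hypothetical example)."""
    return x * y


def build_pipeline(queue: str = "default"):
    """Wrap heavy_computation as one pipeline step.

    Requires clearml to be installed and configured; nothing is sent
    anywhere until the returned controller is started.
    """
    # Imported lazily so this sketch can be loaded without clearml installed.
    from clearml import PipelineController

    pipe = PipelineController(
        name="single_function_pipeline",  # hypothetical pipeline name
        project="examples",               # hypothetical project name
        version="0.0.1",
    )
    pipe.add_function_step(
        name="compute",
        function=heavy_computation,
        function_kwargs={"x": 6, "y": 7},
        function_return=["result"],
        execution_queue=queue,            # queue an agent is listening on
    )
    return pipe
```

Calling build_pipeline(...) and then starting the controller (for example with start_locally(), which as far as I know keeps the controller logic in the current process while steps go to the queue) would be the next step. Whether the step result can be consumed inline by the rest of a plain script is not something this sketch shows.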
Yes, you are right. The path is file:///home/jovyan/mlops/model.pkl, so it's not a file server. I can't understand how this path appeared :))
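A small sketch for spotting this situation: it checks the URI scheme of a registered model path to tell a local file path from a remote destination. The helper is_remote_destination and the set of schemes are assumptions made for illustration, not part of ClearML.

```python
from urllib.parse import urlparse

# Schemes treated as remote storage in this sketch (an assumed list).
REMOTE_SCHEMES = {"http", "https", "s3", "gs", "azure"}


def is_remote_destination(uri: str) -> bool:
    """Return True if the URI scheme points at remote storage rather than local disk."""
    return urlparse(uri).scheme.lower() in REMOTE_SCHEMES


# The path from the message above is a local file URI, so this prints False.
print(is_remote_destination("file:///home/jovyan/mlops/model.pkl"))
```

A file:// path like the one above means the model was only registered with its local location, so nothing was actually uploaded.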
OK, I found out that with scikit-learn the model is uploaded, but with PyTorch it isn't.
You are right. Based on this page S3 is supported. My file with this function:
from clearml import Task
task = Task.init(project_name='s3_upload_models', task_name='sklearn', output_uri=True)
task.update_output_model(auto_delete_file=False, name='v0.0.1', model_path='s3://<BUCKET_NAME>/MLOps/models/sklearn/sklearn.pkl')
task.close()
My clearml.conf:
aws {
    s3 {
        # default, used for any bucket not specified below
        key: ""
        secret: ""
        region: ""...
@<1523701087100473344:profile|SuccessfulKoala55> Hi, thank you. So, does ClearML have any direct interface to copy/upload files from S3 to its fileserver? Or do we need to download files locally first? I found InputModel.import_model. Is this the recommended way to import models from S3 to ClearML?
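For reference, a minimal sketch of what calling InputModel.import_model with an S3 URL might look like. The helpers build_s3_url and register_s3_model, the bucket name, and the framework value are hypothetical; whether this is the recommended route is exactly the open question above. As far as I understand, this registers the model by its URL and does not copy the file to the fileserver.

```python
def build_s3_url(bucket: str, key: str) -> str:
    """Hypothetical helper: compose an s3:// URL from a bucket name and object key."""
    return "s3://{}/{}".format(bucket.strip("/"), key.lstrip("/"))


def register_s3_model(weights_url: str, name: str, framework: str = "ScikitLearn"):
    """Register an existing remote weights file as a ClearML model.

    Requires clearml to be installed and configured with S3 credentials.
    """
    # Imported lazily so this sketch can be loaded without clearml installed.
    from clearml import InputModel

    return InputModel.import_model(
        weights_url=weights_url,
        name=name,
        framework=framework,
    )


# "my-bucket" is a placeholder; the key mirrors the path used earlier in the thread.
url = build_s3_url("my-bucket", "MLOps/models/sklearn/sklearn.pkl")
print(url)
```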