SweetBadger76
It’s not a credential issue, because I do upload artifacts manually with tsk.upload_artifact(...)
I’ll try changing the extension, but I have to admit that in the past (I haven’t used clearml for a while and updated it recently to the latest version) it did get this file extension right
CostlyOstrich36
Is that command evaluated prior to the task creation?
Or only after the task is executed remotely?
AgitatedDove14
I'm not sure.
In my case I'm not trying to reproduce a local environment in the agent, but to run a script inside a docker which already has the environment built in.
The environment is conda based.
CostlyOstrich36 am I doing anything wrong here?
AgitatedDove14
It's still failing.
I updated clearml-agent to 1.2.0rc7 and also:
docker_setup_bash_script=["export PATH=""/workspace/miniconda/bin:$PATH", "export LOCAL_PYTHON=/workspace/miniconda/bin/python3", "conda activate"])
But the conda activate (base env) returns:
CommandNotFoundError: Your shell has not been properly configured to use 'conda activate'.
I noticed that conda ...
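A possible workaround for the `conda activate` error above, sketched under assumptions: `conda activate` is a shell function that only exists once conda's shell hook has been loaded, so the setup script can source `etc/profile.d/conda.sh` before activating. The helper name `conda_setup_lines` is hypothetical, and the sketch assumes a standard Miniconda layout under `/workspace/miniconda`.

```python
def conda_setup_lines(conda_root="/workspace/miniconda", env_name="base"):
    """Build bash lines that load conda's shell hook before activating an env.

    Assumes a standard Miniconda layout, where etc/profile.d/conda.sh
    defines the `conda` shell function that `conda activate` relies on.
    """
    return [
        f'export PATH="{conda_root}/bin:$PATH"',
        f"export LOCAL_PYTHON={conda_root}/bin/python3",
        # Sourcing conda.sh is what makes `conda activate` usable
        # in a non-interactive shell.
        f"source {conda_root}/etc/profile.d/conda.sh",
        f"conda activate {env_name}",
    ]


setup_lines = conda_setup_lines()
print("\n".join(setup_lines))
```

The resulting list could then be passed as `docker_setup_bash_script=setup_lines`, in place of the three lines quoted in the message above.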
Oh wow AgitatedDove14 . Appreciate it!
Are you sure it’s just a matter of the python version?
The same experiment script was working on the exact same docker image in the past (with older clearml versions though…).
For example this experiment log:
Oh!
That was so silly on my side...
What does that actually mean?
2022-07-17 07:59:40,339 - clearml.storage - ERROR - Failed uploading: Parameter validation failed: Invalid type for parameter ContentType, value: None, type: <class 'NoneType'>, valid types: <class 'str'>
SuccessfulKoala55
We are using the fileserver, which is configured in clearml.conf to point to a path on a network drive (i.e. the NAS):
files_server: file:///mnt/clearml_storage
SuccessfulKoala55
Is it a rare use case to use a NAS as the fileserver?
What would you suggest?
Thanks ExasperatedCrab78
AgitatedDove14 - attached
Because we want all our data to be stored on premises.
SuccessfulKoala55
We're using the community server
AgitatedDove14 ,
From the experiment’s console log:
` - boto3==1.16.2
- botocore==1.19.2 `
So I run the same script as part of a git repo - but unfortunately the package is still missing.
I'm not sure if it matters but 'kwcoco' is being imported inside one of the repo's functions and not on the script's header.
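To illustrate why the import location might matter, here is a small sketch using Python's `ast` module. It is not ClearML's actual detection logic, only a standard-library demonstration: a scan limited to top-level statements misses an import made inside a function, while a full tree walk finds it. The sample source and the helper name `imported_modules` are made up for illustration.

```python
import ast

# Hypothetical script: 'kwcoco' is imported inside a function, not in the header.
SAMPLE = '''
import os

def load_dataset(path):
    import kwcoco
    return kwcoco
'''


def imported_modules(nodes):
    """Collect top-level package names from Import / ImportFrom nodes."""
    found = set()
    for node in nodes:
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found


tree = ast.parse(SAMPLE)
top_level_only = imported_modules(tree.body)   # statements in the module header only
whole_tree = imported_modules(ast.walk(tree))  # also visits function bodies

print(sorted(top_level_only))
print(sorted(whole_tree))
```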
can you test what happens if you pass the credentials in the global scope as well, i.e. here:
That didn’t help
AgitatedDove14
Yes, I'd like to point to a specific binary, which is in a conda environment.
(BTW, how can I specify the python version on the Task?)
UnevenDolphin73 Thanks! I'll look into it and reach out if needed
It's mounted automatically.
The local .ssh folder is copied to a temp folder, which is mounted by the agent in the docker run command:
Executing: ['docker', 'run', '-t', ......., '-v', '/tmp/clearml_agent.ssh.fuu4r8ta:/root/.ssh', .... , '--rm', 'nvidia/cuda:10.1-cudnn7-runtime-ubuntu18.04', ....]
Yes you are right.
This is the default docker image from clearml, and I was thinking that the agent would install conda if it's not already there (like it installs pip...). Doesn't it?
CostlyOstrich36
Well, I doubt that it is the case in my situation.
Is there any API where I can read the metrics per experiment?
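One way this might be approached, as a hedged sketch rather than a definitive answer: the clearml SDK exposes `Task.get_task(...)` and `Task.get_reported_scalars()`. The nested `{title: {series: {"x": [...], "y": [...]}}}` layout handled below is an assumption about the returned structure, and the helper names and the sample data are hypothetical.

```python
def fetch_reported_scalars(task_id):
    """Fetch scalars for one experiment (needs clearml and a configured server)."""
    from clearml import Task  # imported lazily so the helper below runs without it

    task = Task.get_task(task_id=task_id)
    return task.get_reported_scalars()


def last_values(scalars):
    """Reduce {title: {series: {"x": [...], "y": [...]}}} to the last y per series."""
    summary = {}
    for title, series_map in scalars.items():
        for series, points in series_map.items():
            ys = points.get("y") or []
            if ys:
                summary[f"{title}/{series}"] = ys[-1]
    return summary


# Hand-written sample in the assumed layout, not real experiment output.
sample = {"loss": {"train": {"x": [0, 1, 2], "y": [0.9, 0.5, 0.3]}}}
print(last_values(sample))
```

With a reachable server, `last_values(fetch_reported_scalars(task_id))` would give one number per metric series for the chosen experiment.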
AgitatedDove14 .
Note that the actual error is /workspace/miniconda/bin/python3: No module named clearml_agent since all the packages (including clearml_agent) were already installed by the agent on the default (non conda) python binary.
AgitatedDove14 , did you test it using a worker, or with local execution?
I just tested https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_mnist.py with a (docker based) worker and it yields the same error
` 2022-07-17 07:59:40,330 - clearml.Task - INFO - Waiting to finish uploads
2022-07-17 07:59:40,330 - clearml.storage - INFO - Starting upload: /tmp/.clearml.upload_model_0_4d_ikk.tmp => tapsff.local:9000/clearml/examples/PyTorch MNIST train.02ed1df11bf54...
Can you provide some more details, please? Do you intend to store your artifacts locally or remotely?
Does manual reporting also fail?
If you could also give your clearml package versions, it could help
I store the artifacts on a minio server (in my LAN).
If I run the python script locally (i.e. without execute_remotely()), it works fine.
I use the latest clearml 1.6.2
Did you by any chance save the checkpoint without any file extension? Or with a weird name containing sl...
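The question about the file extension ties in with the `ContentType ... value: None` error quoted earlier in this thread. As a sketch, assuming the content type is derived from the file name (for example via Python's `mimetypes` module, which is an assumption about the uploader, not something confirmed here), a name without a recognized extension yields `None`. The helper `content_type_or_default` is hypothetical and shows one common fallback.

```python
import mimetypes


def content_type_or_default(path, default="application/octet-stream"):
    """Guess a MIME type from the file name, falling back to a generic one."""
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or default


# A file name with no extension gives no guess at all.
print(mimetypes.guess_type("checkpoint")[0])
# The fallback keeps the value a string, which is what the error asks for.
print(content_type_or_default("checkpoint"))
print(content_type_or_default("notes.txt"))
```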
After signing in with Google, the login page is stuck at this
The script is intended to be executed remotely.
Can I declare an absolute path in this case?
And I'm using the latest clearml / clearml-agent
CostlyOstrich36