
SuccessfulKoala55
Is it a rare use case to use a NAS as the fileserver?
What would you suggest?
@<1523701070390366208:profile|CostlyOstrich36> am I doing anything wrong here?
Not sure I understand the purpose of this.
it means pip will look for wheels at this URL?
SuccessfulKoala55
We're using the community server
SuccessfulKoala55
I'm not sure I understand your suggestion
And I'm using the latest clearml / clearml-agent
@<1523701070390366208:profile|CostlyOstrich36>
from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformIntegerParameterRange, DiscreteParameterRange

task = Task.init(
    project_name="examples",
    task_name="HP optimizer",
    task_type=Task.TaskTypes.optimizer,
    reuse_last_task_id=False,
)
task.execute_remotely(queue_name="services")
an_optimizer = HyperParameterOptimizer(
    base_task_id="c7618e30ff5c4955b4942971b410f72d",
    ...
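(For context, the elided arguments would typically be the hyperparameter ranges matching the imports above, plus the target metric; a sketch with placeholder names, not the actual setup:)
` an_optimizer = HyperParameterOptimizer(
    base_task_id="c7618e30ff5c4955b4942971b410f72d",
    # ranges matching the imports above; parameter names are illustrative
    hyper_parameters=[
        UniformIntegerParameterRange('General/epochs', min_value=5, max_value=20),
        DiscreteParameterRange('General/batch_size', values=[32, 64, 128]),
    ],
    # the scalar the optimizer tracks (title/series are placeholders)
    objective_metric_title='validation',
    objective_metric_series='loss',
    objective_metric_sign='min',
) `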
AgitatedDove14 it's running inside a docker-based worker.
Are you interested in the full pip freeze of that docker?
Something like:
import torch

model = SomePytorchModel()  # placeholder for the actual model class
checkpoint = {'model_state_dict': model.state_dict()}
torch.save(checkpoint, "model.tar")
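(For completeness, the matching load side would be something like this, using the same placeholder model as above:)
` # restore the weights saved above into the same model class
checkpoint = torch.load("model.tar")
model.load_state_dict(checkpoint['model_state_dict']) `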
Can you test what happens if you pass the credentials in the global scope as well, i.e. here:
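(Presumably this refers to the top-level s3 key/secret in clearml.conf, as opposed to the per-host entry under credentials; a sketch with placeholder values:)
` sdk {
    aws {
        s3 {
            # global scope - used as a fallback for any bucket/host
            key: "minio-access-key"
            secret: "minio-secret-key"
            credentials: [
                {
                    # per-host scope for the minio server
                    host: "tapsff.local:9000"
                    key: "minio-access-key"
                    secret: "minio-secret-key"
                    multipart: false
                    secure: false
                }
            ]
        }
    }
} `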
That didn’t help
Oh!
That was so silly on my side...
What does that actually mean?
2022-07-17 07:59:40,339 - clearml.storage - ERROR - Failed uploading: Parameter validation failed: Invalid type for parameter ContentType, value: None, type: <class 'NoneType'>, valid types: <class 'str'>
AgitatedDove14 ,
From the experiment’s console log:
` - boto3==1.16.2
- botocore==1.19.2 `
/mnt/clearml_storage
is the mount point of the NAS on one Linux machine.
On macOS it would be /Volumes/clearml_storage,
on Windows - //NAS/clearml_storage
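(For reference, one way to point the SDK at such a mount is the default output URI in clearml.conf; a sketch, assuming the Linux mount point above:)
` sdk {
    development {
        # store models/artifacts on the NAS mount instead of the fileserver
        default_output_uri: "/mnt/clearml_storage"
    }
} `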
Hi @<1523701070390366208:profile|CostlyOstrich36> ,
The idea is indeed to control the object via the API, but in that particular case I don't want the seed to be specified by the API - I just want it set to the current timestamp.
Could you think of a better use?
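(For illustration, a minimal sketch of that idea, assuming ClearML's Task.set_random_seed is the mechanism in question:)
` import time
from clearml import Task

# seed from the current timestamp instead of an API-supplied value;
# set_random_seed must be called before Task.init()
Task.set_random_seed(int(time.time()))
task = Task.init(project_name="examples", task_name="timestamp seed") `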
Can you provide some more details please? Do you intend to store your artifacts locally or remotely?
Does the manual reporting also fail?
If you could also give your clearml package versions, it would help
I store the artifacts on a MinIO server (in my LAN).
If I run the Python script locally (i.e. no execute_remotely()), it works fine.
I use the latest clearml 1.6.2
Did you by any chance save the checkpoint without any file extension? Or with a weird name containing sl...
SweetBadger76
It’s not a credential issue, because I do upload artifacts manually with task.upload_artifact(...)
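(For reference, the manual upload in question looks roughly like this; artifact name and object are placeholders:)
` from clearml import Task

task = Task.current_task()  # the running task created by Task.init()
# a manual upload like this succeeds, so the minio credentials themselves are fine
task.upload_artifact(name="debug_stats", artifact_object={"acc": 0.9}) `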
I’ll try changing the extension, but I have to admit that in the past (I haven’t used clearml for a while and updated it recently to the latest version) it did get this file extension right
Thanks AgitatedDove14 !
I’ll use clearml 1.4.1 until the fix is out.
AgitatedDove14 , did you test it using a worker, or with local execution?
I just tested https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_mnist.py with a (docker based) worker and it yields the same error
` 2022-07-17 07:59:40,330 - clearml.Task - INFO - Waiting to finish uploads
2022-07-17 07:59:40,330 - clearml.storage - INFO - Starting upload: /tmp/.clearml.upload_model_0_4d_ikk.tmp => tapsff.local:9000/clearml/examples/PyTorch MNIST train.02ed1df11bf54...
CostlyOstrich36
I do get errors - failing to launch the clearml images.
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
standard_init_linux.go:228: exec user process caused: exec format error
Well, after diving into this, it seems like the clearml images were built using amd64 (on top of amd64 base images...)
AgitatedDove14 .
Note that the actual error is /workspace/miniconda/bin/python3: No module named clearml_agent
since all the packages (including clearml_agent) were already installed by the agent on the default (non-conda) python binary.
Anyway, when I add the binary's path to PATH, it still won't work.
I call
task.set_base_docker(docker_image='my/docker/image', docker_setup_bash_script=["export PATH=/workspace/miniconda/bin:$PATH"])
just after Task.init.
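(Spelled out, the pattern described here would be something like the following sketch; image name and path are from the setup above:)
` from clearml import Task

task = Task.init(project_name="examples", task_name="conda docker")
# prepend the container's conda binaries to PATH before the agent runs
task.set_base_docker(
    docker_image='my/docker/image',
    docker_setup_bash_script=["export PATH=/workspace/miniconda/bin:$PATH"],
) `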
On execution, the agent installs all pip requirements with the python binary at /usr/bin/python3.6,
but eventually the task fails with: /workspace/miniconda/bin/python3: No module named clearml_agent
Yes.
It's kwcoco
This is my conda env export:
name: clearml
channels:
- defaults
dependencies:
- ca-certificates=2021.10.26=hecd8cb5_2
- certifi=2021.10.8=py39hecd8cb5_0
- libcxx=12.0.0=h2f01273_0
- libffi=3.3=hb1e8313_2
- ncurses=6.3=hca72f7f_2
- openssl=1.1.1l=h9ed2024_0
- pip=21.2.4=py39hecd8cb5_0
- python=3.9.7=h88f2d9e_1
- readline=8.1=h9ed2024_0
- setuptools=58.0.4=py39hecd8cb5_0
- sqlite=3.36.0=hce871da_0
- tk=8.6.11=h7bc2e8c_0
- tzdata=2021e=hda174b...
So I run the same script as part of a git repo - but unfortunately the package is still missing.
I'm not sure if it matters, but 'kwcoco' is being imported inside one of the repo's functions and not at the top of the script.
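(For reference, ClearML's Task.add_requirements can force-include a package the import analyzer misses; a sketch, assuming it is called before Task.init():)
` from clearml import Task

# force-include kwcoco, which is only imported inside a function
Task.add_requirements("kwcoco")
task = Task.init(project_name="examples", task_name="kwcoco run") `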
AgitatedDove14 , here's the log