
CostlyOstrich36 am I doing anything wrong here?
```python
from clearml import Task
from clearml.automation import HyperParameterOptimizer, UniformIntegerParameterRange, DiscreteParameterRange

task = Task.init(
    project_name="examples",
    task_name="HP optimizer",
    task_type=Task.TaskTypes.optimizer,
    reuse_last_task_id=False,
)
task.execute_remotely(queue_name="services")

an_optimizer = HyperParameterOptimizer(
    base_task_id="c7618e30ff5c4955b4942971b410f72d",
    ...
```
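The snippet above is cut off after `base_task_id`. For reference, here is a hedged sketch of how the rest of a `HyperParameterOptimizer` setup typically looks, modeled on ClearML's public HPO example - the hyperparameter names, metric title/series, and queue name below are illustrative placeholders, not values from this thread:

```python
from clearml.automation import (
    HyperParameterOptimizer,
    UniformIntegerParameterRange,
    DiscreteParameterRange,
)

# All values below are placeholders - substitute your own base task ID,
# the hyperparameter names as they appear in the base task's configuration,
# and the scalar metric you want to optimize.
an_optimizer = HyperParameterOptimizer(
    base_task_id="<your-base-task-id>",
    hyper_parameters=[
        UniformIntegerParameterRange("General/epochs", min_value=2, max_value=12, step_size=2),
        DiscreteParameterRange("General/batch_size", values=[32, 64, 128]),
    ],
    objective_metric_title="accuracy",   # scalar title reported by the base task
    objective_metric_series="total",     # scalar series within that title
    objective_metric_sign="max",         # maximize the metric
    max_number_of_concurrent_tasks=2,
    execution_queue="default",           # queue the trial tasks are enqueued to
)
an_optimizer.start()
an_optimizer.wait()   # block until the optimization finishes
an_optimizer.stop()   # make sure background trials are stopped
```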
Can you provide some more details please? Do you intend to store your artifacts locally or remotely?
Does the manual reporting also fail?
If you could also give your clearml package versions, it could help.
I store the artifacts on a minio server (in my LAN).
If I run the python script locally (i.e. no execute_remotely()) it works fine.
I use the latest clearml, 1.6.2.
Did you by any chance save the checkpoint without any file extension? Or with a weird name containing sl...
AgitatedDove14 , here's the log
CostlyOstrich36
I do get errors - failing to launch the clearml images:
```
WARNING: The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested
standard_init_linux.go:228: exec user process caused: exec format error
```
Well, after diving into this, it seems like the clearml images were built using amd64 (on top of amd64 base images...)
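For what it's worth, a common workaround for this exact mismatch is to explicitly request amd64 emulation on an arm64 host. A sketch (the image name `allegroai/clearml:latest` is illustrative - use whichever image fails for you; emulation needs qemu/binfmt or Docker Desktop's built-in support):

```shell
# Detect the host architecture and, on arm64, force the amd64 image
# to run under emulation via --platform.
ARCH=$(uname -m)
if [ "$ARCH" = "arm64" ] || [ "$ARCH" = "aarch64" ]; then
    PLATFORM_FLAG="--platform linux/amd64"
else
    PLATFORM_FLAG=""
fi
# Build the command and inspect it before actually running it:
CMD="docker run $PLATFORM_FLAG allegroai/clearml:latest"
echo "$CMD"
```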
So I run the same script as part of a git repo - but unfortunately the package is still missing.
I'm not sure if it matters, but 'kwcoco' is being imported inside one of the repo's functions and not in the script's header.
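That would explain it - ClearML's import analysis can miss packages that are only imported lazily inside a function. A sketch of the usual remedy (untested here; the project/task names are illustrative): declare the package explicitly with `Task.add_requirements` before calling `Task.init`:

```python
from clearml import Task

# Explicitly declare a package that ClearML's static import analysis
# can miss (e.g. one imported inside a function rather than at module
# level). This must be called BEFORE Task.init().
Task.add_requirements("kwcoco")

task = Task.init(project_name="examples", task_name="my task")  # illustrative names
```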
AgitatedDove14 it's running inside a docker based worker.
Are you interested in the full pip freeze of that docker?
That would be a very useful feature.
What is the status of that issue? I haven't found it on github.
Something like:
```python
model = SomePytorchModel()
checkpoint = {'model_state_dict': model.state_dict()}
torch.save(checkpoint, "model.tar")
```
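As an aside, `torch.save` serializes a checkpoint dict like this with Python pickling under the hood. A minimal stdlib sketch of the same round-trip pattern - pickle stands in for torch here so it runs anywhere, and plain lists stand in for tensors:

```python
import os
import pickle
import tempfile

# Stand-in for a model's state_dict (plain lists instead of tensors).
state_dict = {"layer1.weight": [0.1, 0.2], "layer1.bias": [0.0]}
checkpoint = {"model_state_dict": state_dict, "epoch": 3}

# Same save/load pattern as torch.save(checkpoint, "model.tar") /
# torch.load("model.tar"), including the .tar filename from the thread.
path = os.path.join(tempfile.mkdtemp(), "model.tar")
with open(path, "wb") as f:
    pickle.dump(checkpoint, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored["epoch"])  # → 3
```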
AgitatedDove14 ,
From the experiment’s console log:
```
- boto3==1.16.2
- botocore==1.19.2
```
Thanks for the great explanation! Now it makes much more sense.
You are right about the issue that 'kwcoco' isn't being detected. I'm actually running this as a single script, and kwcoco is not imported directly (but from within another package).
I'll try running it from a repo and see how it works.
SuccessfulKoala55
We're using the community server
SuccessfulKoala55
We are using the fileserver, which is configured in clearml.conf to a path on a network drive (i.e. the NAS): `files_server: file:///mnt/clearml_storage`
SuccessfulKoala55
I'm not sure I understand your suggestion
Because we want all our data to be stored on premises.
/mnt/clearml_storage is the mount point of the NAS on one linux machine.
On macOS it would be /Volumes/clearml_storage,
on Windows - //NAS/clearml_storage
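In other words, each machine's clearml.conf would have to point `files_server` at its own local mount of the same share. A sketch, assuming the standard clearml.conf layout and the mount points above:

```
# clearml.conf (sketch) - files_server must match where THIS machine
# mounts the NAS share:
#   Linux:   file:///mnt/clearml_storage
#   macOS:   file:///Volumes/clearml_storage
#   Windows: file://NAS/clearml_storage
api {
    files_server: "file:///mnt/clearml_storage"
}
```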
AgitatedDove14 , did you test it using a worker, or with local execution?
I just tested https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_mnist.py with a (docker based) worker and it yields the same error
```
2022-07-17 07:59:40,330 - clearml.Task - INFO - Waiting to finish uploads
2022-07-17 07:59:40,330 - clearml.storage - INFO - Starting upload: /tmp/.clearml.upload_model_0_4d_ikk.tmp => tapsff.local:9000/clearml/examples/PyTorch MNIST train.02ed1df11bf54...
```
AgitatedDove14 Yes, that's correct.
It's in my local conda environment though.
AgitatedDove14
I'm not sure.
In my case I'm not trying to reproduce a local environment in the agent, but to run a script inside a docker which already has the environment built in.
The environment is conda based.
Not sure I understand the purpose of this.
Does it mean pip will look for wheels at this URL?
Yes you are right.
This is the default docker image from clearml, and I was thinking that the agent would install conda if it's not already there (like it installs pip...). Doesn't it?
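As far as I know it doesn't - conda has to already be present in the image (or on the worker), and the agent only selects which package manager to use via its configuration. A sketch of the relevant clearml.conf section:

```
# clearml.conf on the agent machine (sketch)
agent {
    package_manager {
        # "pip" (default) or "conda" - conda itself must already be
        # available inside the docker image / on the worker.
        type: conda
    }
}
```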
SuccessfulKoala55
Is it a rare use case to use a NAS as the fileserver?
What would you suggest?
can you test what happens if you pass the credentials in the global scope as well, i.e. here:
That didn’t help
This is actually in runtime (i.e. when running the code),
My script looks like that
```python
import clearml

clearml.Task.force_requirements_env_freeze(force=False, requirements_file="requirements.txt")
task = clearml.Task.init(...)
task.set_script(working_dir=dir, entry_point="my_script.py")
task.execute_remotely(queue_name='default')
# rest of script goes here....
```
When you refer to runtime, do you mean when the script is executed remotely, or when I run my_script.py locally (in order to ...
Yes.
It's kwcoco
This is my conda env export:
```
name: clearml
channels:
  - defaults
dependencies:
  - ca-certificates=2021.10.26=hecd8cb5_2
  - certifi=2021.10.8=py39hecd8cb5_0
  - libcxx=12.0.0=h2f01273_0
  - libffi=3.3=hb1e8313_2
  - ncurses=6.3=hca72f7f_2
  - openssl=1.1.1l=h9ed2024_0
  - pip=21.2.4=py39hecd8cb5_0
  - python=3.9.7=h88f2d9e_1
  - readline=8.1=h9ed2024_0
  - setuptools=58.0.4=py39hecd8cb5_0
  - sqlite=3.36.0=hce871da_0
  - tk=8.6.11=h7bc2e8c_0
  - tzdata=2021e=hda174b...
```