Thu Mar 11 17:52:45 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.56       Driver Version: 460.56       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |                      | ...
conda env update -p .clearml/venvs-builds/3.8 --file ./environment.yml
with environment.yml:
name: clearml
channels:
- pytorch
- anaconda
- conda-forge
- defaults
dependencies:
- pytorch==1.8.0
You suggested this fix earlier, but I am not sure why it didn't work then.
Can you actually reproduce my problem when also using conda_freeze: true?
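To be explicit about what I mean by conda_freeze: true, this is a sketch of the relevant clearml.conf section on the agent machine (assuming the option sits under agent.package_manager, as in the default config):

agent {
    package_manager {
        type: conda,
        conda_freeze: true,
    }
}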
==> 2021-03-11 12:50:38 <==
# cmd: /home/tim/miniconda3/condabin/conda create --yes --mkdir --prefix /home/tim/.clearml/venvs-builds/3.8 python=3.8
--
==> 2021-03-11 12:50:40 <==
# cmd: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c pytorch cudatoolkit=11.0 --quiet --json
--
==> 2021-03-11 12:50:43 <==
# cmd: /home/tim/miniconda3/condabin/conda install -p /home/tim/.clearml/venvs-builds/3.8 -c defaults -c conda-forge -c p...
Thanks a lot. To summarize: To me clearml is a framework, but I would rather have it be a library.
Other than that I am very happy with clearml and it is probably my favorite machine learning related package of the last two years! 🙂 And thanks for taking so much time to talk to me!
Also I can see that clearml correctly loads the config: STORAGE S3BucketConfig(bucket='clearml', host='myhost:9000', key='mykey', secret='mysecret', token='', multipart=False, acl='', secure=True, region=None, verify=True, use_credentials_chain=False)
But does this mean the logger will still use the default fileserver, or not?
Is sdk.development.default_output_uri used with s3://ip:9000/clearml or with ip:9000/clearml?
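To make the question concrete, this is a sketch of the config I have in mind (the values mirror the S3BucketConfig above; whether the s3:// prefix belongs in the URI is exactly what I am unsure about):

sdk {
    development {
        # or should this be "ip:9000/clearml" without the scheme?
        default_output_uri: "s3://myhost:9000/clearml"
    }
    aws {
        s3 {
            credentials: [
                {
                    host: "myhost:9000"
                    bucket: "clearml"
                    key: "mykey"
                    secret: "mysecret"
                    multipart: false
                    secure: true
                }
            ]
        }
    }
}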
This is the error I get from setting the logger upload destination:
botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the PutObject operation: The AWS Access Key Id you provided does not exist in our records.
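For reference, this is roughly how I set the destination (a sketch only; the project/task names are placeholders, and I pass the same bucket URI in both places):

from clearml import Task

# placeholders: project/task names and the MinIO endpoint
task = Task.init(
    project_name="examples",
    task_name="minio upload test",
    output_uri="s3://myhost:9000/clearml",  # artifacts and models
)
# debug samples/images reported through the logger
task.get_logger().set_default_upload_destination("s3://myhost:9000/clearml")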
No, it is just a pain to find files that have been deleted by a user, but are actually not deleted in the fileserver/s3 🙂
But no worries, it is nothing crucial.
I created this issue today, which can alleviate the pain temporarily: https://github.com/allegroai/clearml-server/issues/133
  apiserver:
    command:
    - apiserver
    container_name: clearml-apiserver
    image: allegroai/clearml:latest
    restart: unless-stopped
    volumes:
    - /opt/clearml/logs:/var/log/clearml
    - /opt/clearml/config:/opt/clearml/config
    - /opt/clearml/data/fileserver:/mnt/fileserver
    depends_on:
    - redis
    - mongo
    - elasticsearch
    - fileserver
    - fileserver_datasets
    environment:
      CLEARML_ELASTIC_SERVICE_HOST: elasticsearch
      CLEARML_...
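For reference, a sketch of what the extra fileserver_datasets service could look like, mirroring the stock fileserver service from the clearml-server compose file (the host path and the 8082 host port are my assumptions):

  fileserver_datasets:
    command:
    - fileserver
    container_name: clearml-fileserver-datasets
    image: allegroai/clearml:latest
    restart: unless-stopped
    volumes:
    - /opt/clearml/logs:/var/log/clearml
    - /opt/clearml/data/fileserver_datasets:/mnt/fileserver
    ports:
    - "8082:8081"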
Exactly. I don't want people to circumvent the queue 🙂
Thanks! I am fascinated by what you guys offer with clearml 🙂
For example, I get the following error if I simply clone and rerun:
ERROR: Could not find a version that satisfies the requirement ruamel_yaml_conda>=0.11.14 (from conda==4.10.1->-r /tmp/cached-reqs6wtc73be.txt (line 28)) (from versions: none)
ERROR: No matching distribution found for ruamel_yaml_conda>=0.11.14 (from conda==4.10.1->-r /tmp/cached-reqs6wtc73be.txt (line 28))
I see, so it is actually not related to clearml 🎉
In the first run the package only existed because it is preinstalled in the docker image. Afaik, in the second run it is also preinstalled, but pip first tries to resolve it and only then sees whether it already exists. But I am not too sure about this.
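If it really is just pip failing to resolve a conda-only package, a possible workaround sketch would be to drop it from the auto-captured requirements before Task.init (my assumption that Task.ignore_requirements does exactly that; I have not verified it):

from clearml import Task

# assumption: ignore_requirements removes the package from the
# auto-detected "installed packages" list before the task is created
Task.ignore_requirements("ruamel_yaml_conda")
task = Task.init(project_name="examples", task_name="clone test")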
No no, I was just wondering how much effort it is to create something like ClearML. And your answer gives me a rough estimate 🙂