I want to design a pipeline: 1. process the local dataset. 2. upload the local dataset to the ClearML server (self-hosted). 3. start training using this dataset. 4. save the model to the ClearML server.
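The four steps could be sketched roughly as below. This is only a minimal outline under assumptions: the function names, the "keep non-empty files" processing rule, and the placeholder "training" are hypothetical, and the ClearML calls (`Dataset.create`, `add_files`, `upload`, `finalize`, `Dataset.get`, `Task.init`, `OutputModel.update_weights`) assume a configured `clearml` SDK pointing at the self-hosted server.

```python
import shutil
import tempfile
from pathlib import Path


def process_local_dataset(raw_dir: Path, out_dir: Path) -> list:
    """Step 1 (placeholder rule): copy every non-empty file from raw_dir to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    kept = []
    for src in sorted(raw_dir.iterdir()):
        if src.is_file() and src.stat().st_size > 0:
            shutil.copy2(src, out_dir / src.name)
            kept.append(src.name)
    return kept


def upload_dataset(out_dir: Path, name: str, project: str) -> str:
    """Step 2: register the processed folder as a ClearML dataset version."""
    from clearml import Dataset  # lazy import so step 1 runs without clearml

    ds = Dataset.create(dataset_name=name, dataset_project=project)
    ds.add_files(path=str(out_dir))
    ds.upload()    # push the files to the server's file storage
    ds.finalize()  # close the version so other tasks can consume it
    return ds.id


def train(dataset_id: str, project: str) -> None:
    """Steps 3 and 4: fetch the dataset, 'train', and register the model file."""
    from clearml import Dataset, OutputModel, Task

    # output_uri=True asks ClearML to store outputs on the default files server
    task = Task.init(project_name=project, task_name="train", output_uri=True)
    data_dir = Path(Dataset.get(dataset_id=dataset_id).get_local_copy())

    # Placeholder for real training: just record how many files were seen.
    weights = Path(tempfile.mkdtemp()) / "model.txt"
    weights.write_text("files seen: %d\n" % len(list(data_dir.iterdir())))

    OutputModel(task=task).update_weights(weights_filename=str(weights))
    task.close()
```

A run would then look like `process_local_dataset(...)`, followed by `dataset_id = upload_dataset(...)` and `train(dataset_id, ...)`. Each function could also become its own task if the steps are later wired into a pipeline.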
I will give it a try. Thank you so much.
But I am still stuck on the login page.
It works now. Thank you so much.
First, make sure all containers are running normally, especially clearml-elastic. This container may restart every few seconds due to errors, and that can cause authorization issues.
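One way to spot a container that is not running normally is to look at the status column of `docker ps -a`. The sketch below is an illustration only: the helper names are hypothetical, and it assumes the docker CLI is available on the server. It flags any container whose status does not start with "Up". A container stuck in a restart loop may briefly report "Up" between restarts, so a very short uptime is worth a second look too.

```python
import subprocess


def parse_docker_ps(output: str) -> dict:
    """Map container name -> status from tab-separated `docker ps` output."""
    statuses = {}
    for line in output.splitlines():
        if "\t" in line:
            name, status = line.split("\t", 1)
            statuses[name.strip()] = status.strip()
    return statuses


def unhealthy(statuses: dict) -> list:
    """Names of containers whose status is not 'Up ...' (e.g. Restarting, Exited)."""
    return sorted(n for n, s in statuses.items() if not s.startswith("Up"))


def docker_statuses() -> dict:
    """Query the local Docker daemon; requires the docker CLI on PATH."""
    out = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_docker_ps(out)
```

Calling `unhealthy(docker_statuses())` on the server returns the names to investigate with `docker logs <name>`.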
I’m a beginner with ClearML. Thank you so much for the advice. I should read more of the documentation.
Try the options below:
1. sudo chown -R 1000:1000 /opt/clearml
2. In docker-compose.yaml, under elasticsearch -> volumes, change
- /opt/clearml/data/elastic_7:/usr/share/elasticsearch/data
to
- /opt/clearml/data/elastic_7:/var/lib/elasticsearch/data
I had the same issue. It was solved by option 2.
Hope it’s helpful.
Yes. I followed the steps to spin up an agent. There is no other task in this queue.
Copy the credentials from the web UI to ~/clearml.conf (Linux).
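After pasting, a quick check can confirm that the file really contains a key pair. The sketch below is a rough text scan rather than a real HOCON parser. It assumes the pasted block uses the `"access_key"` and `"secret_key"` field names, and the helper name `read_credentials` is hypothetical.

```python
import re
from pathlib import Path


def read_credentials(conf_text: str) -> tuple:
    """Return (access_key, secret_key) found in the text; None for a missing one."""
    # Ignore commented-out lines so example values in comments are not picked up.
    lines = [l for l in conf_text.splitlines() if not l.strip().startswith("#")]
    text = "\n".join(lines)

    def find(key: str):
        m = re.search(r'"?%s"?\s*[:=]\s*"([^"]*)"' % key, text)
        return m.group(1) if m else None

    return find("access_key"), find("secret_key")


def check_conf(path: Path = Path.home() / "clearml.conf") -> bool:
    """True if the file exists and both keys are present and non-empty."""
    if not path.is_file():
        return False
    access, secret = read_credentials(path.read_text())
    return bool(access) and bool(secret)
```

`check_conf()` only verifies that a key pair is present locally. Whether the server still accepts it is a separate question.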
Hi Evgeny. Thank you so much for your help. I tried this setting. There are no more errors in the log.
There is no problem if I run the task manually.
Then delete the browser’s cookies and cache, and log in to the web UI again to check whether the credentials are still there.
Also check the apiserver and elastic logs for errors.
I found that the created credentials go missing.
Based on my understanding, the key/secret is only for agent services on the server side.
I start clearml-agent on another AWS EC2 instance using the command below:
clearml-agent daemon --queue default
I just ran the example examples/pipeline/pipeline_from_tasks.py after committing all the files (step1*, step2*, step3*, etc.) to my Bitbucket repo.
It always shows Running.
The logs are:
Adding venv into cache: /home/ubuntu/.clearml/venvs-builds/3.10
Running task id [e7f7792081ef438bb8a6f993c71a0515]:
[.]$ /home/ubuntu/.clearml/venvs-builds/3.10/bin/python -u pipelines/pp_task.py
Summary - installed python packages:
pip:
- attrs==23.2.0
- boto3==1.34.78
- botocore==1.34.79
- certifi==2024.2.2
- charset-normalizer==3.3.2
- clearml==1.15.0
- Cython==3.0.10
- furl==2.1.3
- idna==3.6
- jmespath==1.0.1
- jsonschema==4.21.1
- jsonsch...
But I can run the experiments “Step 1…”, “Step 2…” manually (ClearML server -> Projects -> examples -> pipeline step 1 dataset artifact).