I want to design a pipeline: (1) process the local dataset, (2) upload the local dataset to the ClearML server (self-hosted), (3) start training using this dataset, and (4) save the model to the ClearML server.
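The four steps above can be sketched in plain Python to show how each step hands its output to the next. This is a framework-free sketch: the step functions and the `FAKE_SERVER` dict are hypothetical stand-ins, not ClearML APIs. In a real ClearML pipeline each function would become a step (for example via `PipelineController.add_function_step`), step 2 would use the `Dataset` class, and step 4 would register the model on the server.

```python
import tempfile
from pathlib import Path

# Hypothetical stand-in for the self-hosted ClearML server (objects stored by id).
FAKE_SERVER: dict[str, object] = {}


def process_local_dataset(raw_dir: Path) -> list[str]:
    """Step 1: read the local .txt files, strip lines and drop blank ones."""
    rows: list[str] = []
    for path in sorted(raw_dir.glob("*.txt")):
        rows.extend(line.strip() for line in path.read_text().splitlines() if line.strip())
    return rows


def upload_dataset(rows: list[str]) -> str:
    """Step 2: 'upload' the processed rows and return a dataset id."""
    dataset_id = f"dataset-{len(FAKE_SERVER)}"
    FAKE_SERVER[dataset_id] = rows
    return dataset_id


def train(dataset_id: str) -> dict:
    """Step 3: 'train' on the uploaded data; the toy model is a row count plus vocabulary."""
    rows = FAKE_SERVER[dataset_id]
    return {"num_rows": len(rows), "vocab": sorted({w for r in rows for w in r.split()})}


def save_model(model: dict) -> str:
    """Step 4: store the model and return a model id."""
    model_id = f"model-{len(FAKE_SERVER)}"
    FAKE_SERVER[model_id] = model
    return model_id


def run_pipeline(raw_dir: Path) -> str:
    # Each step only consumes the previous step's output, so the steps map
    # cleanly onto separate pipeline tasks.
    rows = process_local_dataset(raw_dir)
    dataset_id = upload_dataset(rows)
    model = train(dataset_id)
    return save_model(model)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "a.txt").write_text("hello world\n\nhello clearml\n")
        model_id = run_pipeline(Path(tmp))
        print(model_id, FAKE_SERVER[model_id])
```

Passing only ids between steps mirrors how pipeline steps usually exchange references (dataset id, model id) rather than the data itself.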
But I can run the experiments “Step 1…”, “Step 2…” manually (ClearML server -> Projects -> examples -> pipeline step 1 dataset artifact).
Hope it’s helpful.
It always shows Running.
The logs are:
Adding venv into cache: /home/ubuntu/.clearml/venvs-builds/3.10
Running task id [e7f7792081ef438bb8a6f993c71a0515]:
[.]$ /home/ubuntu/.clearml/venvs-builds/3.10/bin/python -u pipelines/pp_task.py
Summary - installed python packages:
pip:
- attrs==23.2.0
- boto3==1.34.78
- botocore==1.34.79
- certifi==2024.2.2
- charset-normalizer==3.3.2
- clearml==1.15.0
- Cython==3.0.10
- furl==2.1.3
- idna==3.6
- jmespath==1.0.1
- jsonschema==4.21.1
- jsonsch...
Based on my understanding, the key/secret is only for the agent services on the server side.
Also check the apiserver and elastic logs and see whether there are any errors.
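A small Python helper can make that log check repeatable. This sketch assumes the default container names from the ClearML docker-compose file (`clearml-apiserver`, `clearml-elastic`) and that the `docker` CLI is available on the host; the error keywords are a heuristic assumption, not something ClearML defines.

```python
import re
import subprocess

# Heuristic: whole-word ERROR/FATAL/Traceback, or any "...Exception" class name.
ERROR_PATTERN = re.compile(r"\b(?:ERROR|FATAL|Traceback)\b|Exception", re.IGNORECASE)


def find_error_lines(log_text: str) -> list[str]:
    """Return the log lines that look like errors."""
    return [line for line in log_text.splitlines() if ERROR_PATTERN.search(line)]


def container_logs(name: str, tail: int = 500) -> str:
    """Fetch the last `tail` log lines of a container via the docker CLI."""
    result = subprocess.run(
        ["docker", "logs", "--tail", str(tail), name],
        capture_output=True, text=True, check=False,
    )
    # docker logs forwards the container's stdout and stderr separately.
    return result.stdout + result.stderr
```

For example, `find_error_lines(container_logs("clearml-elastic"))` lists suspicious lines from the Elasticsearch container; repeat with `clearml-apiserver`.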
I started clearml-agent on another AWS EC2 instance using the command below:
clearml-agent daemon --queue default
I just ran the example examples/pipeline/pipeline_from_tasks.py after committing all files (step1*, step2*, step3*, etc.) to my Bitbucket repo.
Hi Evgeny, thank you so much for your help. I tried this setting and there are no more error logs,
but I am still stuck on the login page.
It works now. Thank you so much.
Then I checked the status using the WebApp.
I will give it a try. Thank you so much.
Copy the credentials from the web UI to ~/clearml.conf (Linux).
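After copying, a quick check can confirm that the keys actually landed in `~/clearml.conf`. The web UI generates an `api { ... credentials { "access_key" = "..." "secret_key" = "..." } }` section; this sketch assumes that simple form and uses a regex instead of a real HOCON parser, so treat it as a rough sanity check. `read_credentials` and `check_conf` are hypothetical helper names.

```python
import re
from pathlib import Path
from typing import Optional

# Matches `"access_key" = "..."` (quotes around the key optional, `=` or `:`).
KEY_PATTERN = r'"?{key}"?\s*[=:]\s*"([^"]*)"'


def read_credentials(conf_text: str) -> dict[str, str]:
    """Return the non-empty access_key / secret_key values found in the text."""
    found = {}
    for key in ("access_key", "secret_key"):
        match = re.search(KEY_PATTERN.format(key=key), conf_text)
        if match and match.group(1).strip():
            found[key] = match.group(1)
    return found


def check_conf(path: Optional[Path] = None) -> bool:
    """Print which credentials are missing; return True if both are present."""
    path = path or Path.home() / "clearml.conf"
    if not path.is_file():
        print(f"{path} not found")
        return False
    creds = read_credentials(path.read_text())
    missing = [key for key in ("access_key", "secret_key") if key not in creds]
    if missing:
        print(f"{path} is missing: {', '.join(missing)}")
    return not missing
```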
Try the options below:
1. sudo chown -R 1000:1000 /opt/clearml
2. In docker-compose.yaml, under elasticsearch -> volumes, change
   - /opt/clearml/data/elastic_7:/usr/share/elasticsearch/data
   to
   - /opt/clearml/data/elastic_7:/var/lib/elasticsearch/data
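The volume change is a single-line edit, so it can be applied mechanically as sketched below. This assumes the old volume line appears verbatim in `docker-compose.yaml` and keeps a `.bak` copy of the original; `patch_compose_text` and `patch_compose_file` are hypothetical helper names. The containers need to be recreated afterwards (for example `docker-compose down` followed by `docker-compose up -d`) for the new mount to take effect.

```python
from pathlib import Path

OLD_VOLUME = "/opt/clearml/data/elastic_7:/usr/share/elasticsearch/data"
NEW_VOLUME = "/opt/clearml/data/elastic_7:/var/lib/elasticsearch/data"


def patch_compose_text(text: str) -> tuple[str, int]:
    """Return the patched text and how many occurrences were replaced."""
    count = text.count(OLD_VOLUME)
    return text.replace(OLD_VOLUME, NEW_VOLUME), count


def patch_compose_file(path: Path) -> int:
    """Patch the file in place, writing e.g. docker-compose.yaml.bak first."""
    original = path.read_text()
    patched, count = patch_compose_text(original)
    if count:
        path.with_suffix(path.suffix + ".bak").write_text(original)
        path.write_text(patched)
    return count
```

Running it twice is harmless: the second call finds no old line and leaves the file untouched.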
First, make sure all containers are running normally, especially clearml-elastic: this container may restart every few seconds due to errors, and that can cause authorization issues.
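One way to spot a restart loop is to list container statuses and flag any ClearML container that is not plainly `Up`. This sketch assumes the container names start with `clearml` (as in the default docker-compose file) and relies on `docker ps -a --format`; a looping container typically shows a status such as `Restarting (1) 3 seconds ago`. The helper names are hypothetical.

```python
import subprocess

# Tab-separated "name<TAB>status" for every container, running or not.
DOCKER_PS = ["docker", "ps", "-a", "--format", "{{.Names}}\t{{.Status}}"]


def problem_containers(ps_output: str, prefix: str = "clearml") -> dict[str, str]:
    """Map name -> status for matching containers that are not Up or are unhealthy."""
    problems = {}
    for line in ps_output.splitlines():
        if "\t" not in line:
            continue
        name, status = line.split("\t", 1)
        if not name.startswith(prefix):
            continue
        if not status.startswith("Up") or "(unhealthy)" in status:
            problems[name] = status
    return problems


def docker_ps() -> str:
    """Run docker ps (requires the docker CLI and permission to use it)."""
    result = subprocess.run(DOCKER_PS, capture_output=True, text=True, check=True)
    return result.stdout
```

Calling `problem_containers(docker_ps())` a few times in a row is useful: a container that keeps showing a very short uptime is also likely restarting.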
I had the same issue; it was solved by option 2.
Then delete the browser’s cookies and cache and log in to the web UI again to check whether the credentials are still there.
I found that the created credentials go missing.
I’m a beginner with ClearML. Thank you so much for the advice; I should read more of the documentation.
Yes, I followed the steps to spin up an agent. There are no other tasks in this queue.
There is no other output after “starting Task Execution”.
There is no problem if I run the task manually.