~/.local/bin/clearml-agent daemon --foreground
He tried to help me in another thread but I still couldn't make things work
@<1523701070390366208:profile|CostlyOstrich36>
Should I leave it as is or fill in the values in docker-compose for agent-services? I set it to localhost since agent-services runs together with the other clearml containers on one machine. Not sure why you have to fill in those values.
CLEARML_HOST_IP: "<my_clearml_server_ip>"
CLEARML_WEB_HOST: "None"
CLEARML_API_HOST: "None"
CLEARML_FILES_HOST: "http://127.0.0.1...
Hi @<1722061389024989184:profile|ResponsiveKoala38> , I am using those specific versions because my previous ClearML installation runs with them; they are in the docker compose file. The version of the ClearML image is 1. Afaik the latest is 1.16.2. My goal is to move ClearML to a different machine, so I need to stick to those versions.
My current setup is:
sdk.development.default_output_uri=< None > # no port, no bucket
sdk.aws.s3.key=<my-access-key>
sdk.aws.s3.secret=<my-secret-key>
sdk.aws.s3.region=<my-region> # I think it can be skipped, but somewhere in the clearml code it says it must be specified if it's not the default (us-east-1 or similar)
sdk.aws.s3.credentials.bucket=<my-bucket> # just a bucket name
sdk.aws.s3.credentials.host=< None : 443> # the same as output...
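For illustration, a minimal Python sketch of how this configuration would typically be exercised; the project/task names and the s3://<host>:<port>/<bucket> value below are placeholders, not values from the thread:
from clearml import Task

# Relies on the sdk.aws.s3.* credentials from clearml.conf for authentication.
# output_uri can also be set here explicitly instead of via
# sdk.development.default_output_uri; host, port and bucket are placeholders.
task = Task.init(
    project_name="My Project",
    task_name="s3 output check",
    output_uri="s3://<my-host>:443/<my-bucket>",
)
# Any artifact upload should now go to the configured S3 endpoint.
task.upload_artifact(name="sample", artifact_object={"hello": "world"})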
Console output of clearml-agent daemon --foreground?
After I run my experiment I get a console error that says I am missing security headers. This is a custom XML response. The same behaviour can be reproduced by curling the endpoint or opening it in the browser. When I use e.g. a boto3 client where I explicitly specify the endpoint, access key, secret key and bucket, I can do whatever I want. So it seems to me ClearML is trying to reach this endpoint in some incorrect way.
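For comparison, a boto3 sketch along the lines described above; the endpoint, keys, region and bucket are placeholders:
import boto3

# Explicit endpoint + credentials, mirroring what works outside of ClearML.
# All values below are placeholders.
s3 = boto3.client(
    "s3",
    endpoint_url="https://<my-host>:443",
    aws_access_key_id="<my-access-key>",
    aws_secret_access_key="<my-secret-key>",
    region_name="<my-region>",
)
# Listing the bucket succeeds here, while ClearML's access to the same
# endpoint fails with the "missing security headers" XML error.
print(s3.list_objects_v2(Bucket="<my-bucket>", MaxKeys=5))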
SmugDolphin23 That fixed the issue, thank you very much!
I looked through the agent-services logs and found a new error I haven't seen before:
clearml_agent: ERROR: Connection Error: it seems *api_server* is misconfigured. Is this the ClearML API server http://<my_ip>:8008 ?
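A quick way to check whether anything is answering on that address is to hit the API server's debug.ping health endpoint; a small sketch, assuming the standard ClearML server layout (the host is a placeholder):
import requests

# The ClearML API server normally listens on port 8008; debug.ping is its
# health-check endpoint. <my_ip> is a placeholder.
resp = requests.get("http://<my_ip>:8008/debug.ping", timeout=5)
print(resp.status_code, resp.text)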
SucculentCrab55 I've had this problem when I tried to open the UI too quickly; try waiting a bit and then check the UI again.
@<1523701087100473344:profile|SuccessfulKoala55> I run it from my local machine, that's right. When I run the task it says it can't clone the repository. In the web UI my task has a REPOSITORY string. It's a correct ssh URL to my repo, but it's missing git@ after ssh://. If I add the git part by editing the task and queuing it again, it works. In my config file I have the option force_git_ssh_user: git enabled.
@<1523701087100473344:profile|SuccessfulKoala55> I reloaded the agent a couple of times, cleared the cache, and for some reason it works now! Anyway, thanks for your help!
Could the problem be that the backups were made while ClearML was running, not stopped, as the docs suggest? @<1523701070390366208:profile|CostlyOstrich36>
CostlyOstrich36 It seems the agent-services container is missing on my server. It's not running. Could that be the issue?
@<1523701070390366208:profile|CostlyOstrich36>
What is agent-services doing on startup? It seems something is preventing it from working properly. I already added a command to the entrypoint to configure pip.conf, since we have to use a trusted mirror to download python packages. Also, I managed to connect a local agent to the ClearML server by using 127.0.0.1 as the host in the credentials. Still no luck with the remote agent.
@<1523701087100473344:profile|SuccessfulKoala55>
from random import random
from clearml import Task, TaskTypes
import pandas as pd
task: Task = Task.init(
project_name="My Project",
task_name='Sample task',
task_type=TaskTypes.inference
)
args = {}  # placeholder: `args` was not defined in the original snippet
task.connect(args)
task.execute_remotely(queue_name="default")
value = random()
task.get_logger().report_single_value(name="sample_value", value=value)
df = pd.DataFrame.from_dict({'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']})...
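The snippet is cut off after the DataFrame; presumably it is then reported to ClearML, e.g. along these lines (the title and series names are placeholders):
# Report the DataFrame as a table plot on the task's logger.
task.get_logger().report_table(
    title="sample_table",
    series="col_1/col_2",
    iteration=0,
    table_plot=df,
)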
@<1523701087100473344:profile|SuccessfulKoala55>
I managed to create the clearml.conf file with clearml-agent init after fixing the proxy problem, and now I'm trying to run the daemon with this conf file. I suspect something is missing from it, since the request validator fails with a missing attribute.
Thank you, got it. I tried it because I couldn't figure out how to make auto-detection work. When I run a task from my local project folder (which is also a git repo) via Task.init, it says that no repository was found. There is also the Task.create method, which lets you pass a git URL, but I suspect Task.init is the preferable method.
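For reference, a hedged sketch of the two approaches mentioned; the repository URL, branch and script path are placeholders:
from clearml import Task

# Task.init: run from inside the cloned git repo so the repository,
# branch and commit are auto-detected.
task = Task.init(project_name="My Project", task_name="auto-detect repo")

# Task.create: the repository is passed explicitly instead of being detected.
# All values below are placeholders.
created = Task.create(
    project_name="My Project",
    task_name="explicit repo",
    repo="ssh://git@<my-git-host>/<my-repo>.git",
    branch="main",
    script="path/to/script.py",
)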
@<1523701087100473344:profile|SuccessfulKoala55>
So, I did it with debug and got this stacktrace error:
type_checker=validator.TYPE_CHECKER.redefine_many({
AttributeError: type object 'Draft4Validator' has no attribute 'TYPE_CHECKER'
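That attribute only exists in newer versions of the jsonschema package (TYPE_CHECKER was added in jsonschema 3.0), so a check like this can confirm whether an older jsonschema is being picked up; a diagnostic sketch, not a ClearML API:
import jsonschema
from jsonschema import Draft4Validator

# On jsonschema < 3.0 the second line prints False, which matches the
# AttributeError in the stack trace above.
print(jsonschema.__version__)
print(hasattr(Draft4Validator, "TYPE_CHECKER"))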
Yeah, I mean a fresh installation using the old docker compose file, just without the backups (/clearml/data). So the solution seems to be:
- Migrate to the latest version of elastic on old installation
- Make a backup
- Deploy latest ClearML installation with that backup
482e96243041 allegroai/clearml:latest "python3 -m jobs.asy…" 18 months ago Up 7 weeks 8008/tcp, 8080-8081/tcp async_delete
26c677f2b70f allegroai/clearml:1 "/opt/clearml/wrappe…" 18 months ago Up 16 months 8008/tcp, 8080-8081/tcp, 0.0.0.0:8080->80/tcp, :::8080->80/tcp clearml-webserver-
7e2cf4462f44 allegroai/clearml:1 "/opt/clearml/wrappe…" 18 months ago Up 7 months 0.0.0.0:8008->8008/tcp, :::8008->8008/tcp, 8080-8081/tcp clearml-apiserv...
The terminal hangs on the command
CostlyOstrich36 Am I right that I should also provide these URLs in the agent-services section of the docker-compose file?
CLEARML_HOST_IP: ${CLEARML_HOST_IP:-}
CLEARML_WEB_HOST: ${CLEARML_WEB_HOST:-}
CLEARML_API_HOST: http://apiserver:8008
@<1523701070390366208:profile|CostlyOstrich36> I understand, but the description of the error seems to point not to database conflicts but to connectivity between the apiserver and elastic. I couldn't find info about this on the internet. I think I ruled out inconsistent image versions. Are there any more suggestions? Thanks.
I'll get back to you in a minute
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  100k  100  100k    0     0  10236      0  0:00:10  0:00:10 --:--:-- 21354
Warning: Transient problem: HTTP error Will retry in 10 seconds. 10 retries left.
100  100k  100  100k    0     0  10237      0  0:00:10  0:00:10 --:--:-- 21345
Warning: Transient problem: HTTP error Will retry in 10 seconds. 9 retries left...
@<1523701087100473344:profile|SuccessfulKoala55> Right
I don't think so. Just some info about cluster state in there
CostlyOstrich36 Yep, it seems that was the case. I did not provide the API credentials in docker compose. I did that, but now agent-services just keeps restarting. I looked into the container's logs and it seems to be a proxy error. Why is this container trying to connect somewhere?