I don't think so. Just some info about cluster state in there
All log entries have "level": "INFO"
@<1523701070390366208:profile|CostlyOstrich36>
Versions in compose are:
image: allegroai/clearml:1
image: elasticsearch:7.6.2
image: mongo:4.4.9
I am not quite sure that the backups were made on those versions. Is there a way to see the service versions from a backup?
SmugDolphin23 That fixed the issue, thank you very much!
CostlyOstrich36 It seems the agent-services container is missing on my server. It's not running. Could that be the issue?
~/.local/bin/clearml-agent daemon --foreground
Right, seems the lib was severely outdated
I looked through the agent-services logs and found a new error I haven't seen before:
clearml_agent: ERROR: Connection Error: it seems *api_server* is misconfigured. Is this the ClearML API server http://<my_ip>:8008 ?
@<1523701087100473344:profile|SuccessfulKoala55> So I have to provide a host for it to work and there's no other way around it?
@<1523701087100473344:profile|SuccessfulKoala55> I run it from my local machine, that's right. When I run the task it says it can't clone the repository. In the web UI, my task has a REPOSITORY string. It's a correct ssh URL to my repo, but it's missing git@ after ssh://
If I add the git@ part by editing the task and enqueuing it again, it works. In my config file I have the option force_git_ssh_user: git enabled.
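The manual fix described above (adding the missing git@ after ssh://) can be sketched as a small helper. This only illustrates the edit made by hand in the UI; it is not how clearml-agent applies force_git_ssh_user. The function name `ensure_ssh_user` and the example repository URL are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def ensure_ssh_user(repo_url: str, user: str = "git") -> str:
    """Return repo_url with `user@` added when an ssh:// URL has no user part."""
    parts = urlsplit(repo_url)
    # Leave non-ssh URLs and URLs that already carry a user untouched.
    if parts.scheme != "ssh" or parts.username is not None:
        return repo_url
    # Prefix the host (and optional port) with the user name.
    return urlunsplit(parts._replace(netloc=f"{user}@{parts.netloc}"))
```

For example, `ensure_ssh_user("ssh://example.com/team/repo.git")` gives `ssh://git@example.com/team/repo.git`, while a URL that already contains a user is returned unchanged.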
clearml 1.9.0
clearml-agent 1.5.1
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
Console output of clearml-agent init with no clearml.conf:
...
ClearML Hosts configuration:
Web App: None
API: None
File Store: None
Verifying credentials ...
Error: could not verify credentials: key=ak secret=sk
...
Console output of clearml-agent daemon --foreground with clearml.conf created by clearml-init is missing. No output.
...
@<1523701070390366208:profile|CostlyOstrich36>
What does agent-services do on startup? It seems like something is preventing it from working properly. I already added a command to the entrypoint to configure pip.conf, since we have to use a trusted mirror to download Python packages. I also managed to connect a local agent to the ClearML server by using the 127.0.0.1 host in the credentials. Still no luck with the remote agent
@<1523701087100473344:profile|SuccessfulKoala55>
So, I did it with debug and got this stack trace error:
type_checker=validator.TYPE_CHECKER.redefine_many({
AttributeError: type object 'Draft4Validator' has no attribute 'TYPE_CHECKER'
Sorry, guys, maybe I am not expressing myself clearly or I am missing something. I am not a native speaker, so I'll try to rephrase. What we have is an enterprise solution built on S3 technology. I don't have access to the servers it runs on, and I don't have a port. All I have been provided with are a secret key, an access key, an endpoint that looks like a regular web URL, and a bucket name. Using these creds I can access this cloud storage just fine by any means except ClearML
Could the problem be that the backups were made while ClearML was running, not stopped as the docs suggest? @<1523701070390366208:profile|CostlyOstrich36>
@<1523701070390366208:profile|CostlyOstrich36>
Try to run docker ps and check if all of your clearml containers are up and running (should be 8 total)
@<1523701435869433856:profile|SmugDolphin23> Thanks a lot, that actually worked! It was very difficult to figure out that you have to plug in exactly these values when you have an https endpoint:
- Using the s3 protocol instead of https, together with the bucket name, in the output URI
- Not providing a bucket name in the credentials section, where it is by default
- Providing the default secure port for both the host and the output URI
- Disabling the credentials chain
I think a common use case for many people is that they get S3 storage wi...
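The four points in the list above can be sketched as a helper that turns an https endpoint and a bucket name into the values that go into the config. This is a sketch under assumptions: the helper name `clearml_s3_settings`, the endpoint `https://s3.example.com` and the bucket `my-bucket` are hypothetical, and the dictionary keys mirror the sdk.aws.s3 section of clearml.conf as I understand it, so check them against your own config file.

```python
from urllib.parse import urlsplit


def clearml_s3_settings(endpoint: str, bucket: str, key: str, secret: str) -> dict:
    """Derive output URI and credential entry from an http(s) S3 endpoint."""
    parts = urlsplit(endpoint)
    secure = parts.scheme == "https"
    # Always spell out the port; fall back to the scheme's default one.
    port = parts.port or (443 if secure else 80)
    host = f"{parts.hostname}:{port}"
    return {
        # s3:// protocol plus bucket name, not the https URL itself.
        "output_uri": f"s3://{host}/{bucket}",
        # The credentials chain is switched off, as in the list above.
        "use_credentials_chain": False,
        # Note: no bucket name in the credentials entry.
        "credentials": {"host": host, "key": key, "secret": secret, "secure": secure},
    }
```

With the placeholder values, `clearml_s3_settings("https://s3.example.com", "my-bucket", "ak", "sk")["output_uri"]` is `s3://s3.example.com:443/my-bucket`, and the same `s3.example.com:443` appears as the credentials host.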
The terminal hangs on the command
@<1523701087100473344:profile|SuccessfulKoala55> I reloaded the agent a couple of times, cleared the cache, and for some reason it works now! Anyway, thanks for your help!
Console output of clearml-agent daemon --foreground?
@<1523701087100473344:profile|SuccessfulKoala55> I figured out where to find a region, but we don't have an AWS dashboard. We have a custom S3 solution for our own enterprise servers, like many companies do; the data is not stored on Amazon servers. That is why we have an endpoint which is a URL starting with http://
If I were to connect to our bucket via boto3, I would pass the endpoint to the client with endpoint_url
@<1523701435869433856:profile|SmugDolphin23> I didn't use a region at first and that was not working. Now I use a region and it still doesn't work.
With boto3 in Python I can create a session where I specify the ak and sk, and then create a client from that session, passing service_name and endpoint_url. It works just fine
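The boto3 flow described above looks roughly like this. boto3 must be installed separately; `make_s3_client` is a hypothetical helper name, and any keys, endpoint or bucket passed to it here are placeholders rather than real values.

```python
def make_s3_client(access_key: str, secret_key: str, endpoint_url: str):
    """Create an S3 client for a custom (non-AWS) endpoint."""
    # Third-party dependency; imported here so the module loads without it.
    import boto3

    # The session carries the access key and secret key.
    session = boto3.Session(
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    # The client is pointed at the custom endpoint instead of AWS.
    return session.client(service_name="s3", endpoint_url=endpoint_url)
```

A call such as `make_s3_client("ak", "sk", "http://s3.example.internal").list_objects_v2(Bucket="my-bucket")` would then list objects in the bucket, assuming the endpoint is reachable from the machine running it.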
It works like I mentioned before: the terminal jumps to a new line and sits there, with no output after that; nothing is happening in the console. But if you go to the UI you see that "Last used" is updating
CostlyOstrich36 Any thoughts?
Also, the previous problem was caused by an incorrect proxy configuration on the agent machine