SmugDolphin23 Sorry to bother again: should output_uri be a URI to the S3 endpoint or to the ClearML fileserver? If it's not provided, artifacts are stored locally, right?
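For reference, this is where I'm setting it; a minimal sketch of the two forms (bucket path hypothetical):
from clearml import Task

# output_uri controls where task outputs (e.g. model snapshots) are uploaded:
# True -> the ClearML fileserver configured in clearml.conf;
# an explicit "s3://..." URI -> S3-compatible storage.
task = Task.init(
    project_name="My Proj",
    task_name="Sample task",
    output_uri="s3://my-bucket/artifacts",  # hypothetical bucket/path; or output_uri=True
)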
@<1523701070390366208:profile|CostlyOstrich36>
What does agent-services do on startup? Something seems to be preventing it from working properly. I already added a command to the entrypoint to configure pip.conf, since we have to use a trusted mirror to download Python packages. I also managed to connect a local agent to the ClearML server by using the 127.0.0.1 host in the credentials. Still no luck with the remote agent.
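The pip.conf I added looks roughly like this (mirror URL hypothetical):
[global]
index-url = https://pypi-mirror.example.com/simple
trusted-host = pypi-mirror.example.com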
@<1722061389024989184:profile|ResponsiveKoala38> Shouldn't the escaping backslash come before the quote?
Traceback (most recent call last):
  File "/home/<home>/.local/bin/clearml-agent", line 8, in <module>
    sys.exit(main())
  File "/home/<home>/.local/lib/python3.8/site-packages/clearml_agent/__main__.py", line 83, in main
    return run_command(parser, args, command_name)
  File "/home/<home>/.local/lib/python3.8/site-packages/clearml_agent/__main__.py", line 46, in run_command
    return func(**args_dict)
  File "/home/<home>/.local/lib/python3....
CostlyOstrich36
The error appears regardless of the --foreground flag. This is not the full stack trace; I will provide it in the next message.
clearml 1.9.0
clearml-agent 1.5.1
Ubuntu 18.04.6 LTS
SmugDolphin23 That fixed the issue, thank you very much!
clearml 1.9.0
clearml-agent 1.5.1
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
from random import random
from clearml import Task, TaskTypes

args = {}
task: Task = Task.init(
    project_name="My Proj",
    task_name='Sample task',
    task_type=TaskTypes.inference,
    auto_connect_frameworks=False
)
task.connect(args)
task.execute_remotely(queue_name="default")
value = random()
task.get_logger().report_single_value(name="sample_value", value=value)
with open("some_artifact.txt", "w") as f:
    f.write(f"Some random value: {value}\n")
task.upload_artifact(name="test...
@<1523701304709353472:profile|OddShrimp85> I fixed my SSL error by putting REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt in my .bashrc file
I'm a bit overwhelmed by the configuration, since there's an agent, a server, and a bunch of configuration files; it's easy to mess something up.
@<1523701087100473344:profile|SuccessfulKoala55> Fixed it by setting an env var with the path to the certificates. I was sure that wouldn't help, since I can curl and send a Python GET request to my endpoint from the shell just fine. Now it says I am missing security headers, so it seems to be something on my side. I'll try to fix this.
@<1523701087100473344:profile|SuccessfulKoala55> Could you provide a sample of how to properly fill in all the necessary config values to make S3 work, please? My endpoint starts with https:// and I don't know what my region is; the endpoint URL doesn't contain it. Right now I fill it in like this:
aws.s3.key = <access-key>
aws.s3.secret = <secret-key>
aws.s3.region = <blank>
aws.s3.credentials.0.bucket = <just_bucket_name>
aws.s3.credentials.0.key = <access-key>
aws.s3.credentials.0.secret ...
@<1523701435869433856:profile|SmugDolphin23> I actually don't know where to get the region for the S3 creds I am using. From what I figured, I have to plug my sk, ak, and bucket into the credentials in the agent, and the output URI must be my S3 endpoint, a complete URI with protocol. Is that correct?
@<1523701435869433856:profile|SmugDolphin23> I didn't use a region at first and that was not working. Now I use a region and it still doesn't work.
From boto3 inside Python, I can create a session where I specify the ak and sk, and then create a client from the session where I pass service_name and endpoint_url. It works just fine.
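Roughly what I'm doing in boto3 (endpoint and bucket hypothetical):
import boto3

# session with explicit access key / secret key, no region
session = boto3.session.Session(
    aws_access_key_id="<access-key>",
    aws_secret_access_key="<secret-key>",
)
# client pointed straight at the custom HTTPS endpoint
s3 = session.client(
    service_name="s3",
    endpoint_url="https://storage.example.com",  # hypothetical endpoint
)
s3.list_objects_v2(Bucket="my-bucket")  # works fine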
@<1523701435869433856:profile|SmugDolphin23> Thanks a lot, that actually worked! It was very difficult to figure out that you have to plug in those exact values, given an https endpoint:
- Using the s3 protocol instead of https, together with the bucket name, in the output URI
- Not providing a bucket name in the credentials section, where it is present by default
- Providing the default secure port for both the host and the output URI
- Disabling the credentials chain
I think a common use case for many people is that they get S3 storage wi...
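Concretely, the combination that worked looks roughly like this in clearml.conf, in the same dotted notation as above (endpoint and bucket hypothetical):
aws.s3.use_credentials_chain = false
aws.s3.credentials.0.host = "storage.example.com:443"  # endpoint host plus default HTTPS port, no protocol
aws.s3.credentials.0.key = <access-key>
aws.s3.credentials.0.secret = <secret-key>
aws.s3.credentials.0.secure = true
# note: no credentials.0.bucket entry
with the output URI set to s3://storage.example.com:443/my-bucket (s3 protocol, bucket name included).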
SmugDolphin23 Got it. Now I am a bit confused about the region parameter in the s3 section. The Amazon docs say that the region can be a regular URL with a protocol, like https://etc.etc, which my endpoint actually is. I plugged it into the s3 section in clearml.conf. Should it stay that way?
@<1523701304709353472:profile|OddShrimp85> I haven't done that; for me it worked as-is.
SmugDolphin23 Thank you very much!
That's the clearml.conf for ClearML end users, right?
The code is run from another machine where clearml.conf is configured to connect to the ClearML server; no other configuration is provided.
@<1722061389024989184:profile|ResponsiveKoala38> Hello. It seems that it didn't work for me. I made a backup, moved it to another machine, and tried to run the ClearML service (latest docker compose). Now async-delete, apiserver, mongo, fileserver, and elastic are constantly restarting.
@<1523701087100473344:profile|SuccessfulKoala55> I reloaded the agent a couple of times, cleared the cache, and for some reason it works now! Anyway, thanks for your help!
Thank you, got it. I tried it because I couldn't figure out how to make auto-detection work. When I run a task from my local project folder (which is also a git repo) via Task.init, it says that no repository was found. There is also the Task.create method, which lets you pass a git URL, but I suspect Task.init is the preferable method.
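For reference, the Task.create route with an explicit repo, as a minimal sketch (repo URL hypothetical):
from clearml import Task

# Task.create builds a task from an explicit repo instead of auto-detection
task = Task.create(
    project_name="My Project",
    task_name="Sample task",
    repo="ssh://git@gitlab.example.com/group/project.git",  # hypothetical URL
    branch="main",
    script="train.py",
)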
@<1523701087100473344:profile|SuccessfulKoala55>
from random import random
from clearml import Task, TaskTypes
import pandas as pd

args = {}  # parameters dict for task.connect() below
task: Task = Task.init(
    project_name="My Project",
    task_name='Sample task',
    task_type=TaskTypes.inference
)
task.connect(args)
task.execute_remotely(queue_name="default")
value = random()
task.get_logger().report_single_value(name="sample_value", value=value)
df = pd.DataFrame.from_dict({'col_1': [3, 2, 1, 0], 'col_2': ['a', 'b', 'c', 'd']})...
@<1523701087100473344:profile|SuccessfulKoala55> I run it from my local machine, that's right. When I run the task, it says it can't clone the repository. In the web UI my task has a REPOSITORY string; it's the correct ssh URL to my repo, but it's missing git@ after ssh://. If I add the git part by editing the task and queuing it again, it works. In my config file I have the option force_git_ssh_user: git enabled.
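The relevant part of my clearml.conf, roughly (repo URL in the comment hypothetical):
agent {
    # rewrite detected SSH repo URLs to use the "git" user, e.g.
    # ssh://gitlab.example.com/group/proj.git -> ssh://git@gitlab.example.com/group/proj.git
    force_git_ssh_user: git
}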
SuccessfulKoala55 So my question is how to set up auto-detection properly, so the worker knows which git repo to pull from.
Sorry, guys, maybe I am not expressing myself clearly, or there's something I am missing; I am not a native speaker, so I'll try to reformulate. What we have is an enterprise solution built on S3 technology. I don't have access to the servers it runs on, and I don't have a port. All I have been provided with are: a secret key, an access key, an endpoint that looks like a regular web URL, and a bucket name. Using these creds I can access this cloud storage just fine by any means except ClearML.
@<1523701087100473344:profile|SuccessfulKoala55> I provided the following env vars:
CLEARML_HOST_IP: "<my_ip>"
CLEARML_WEB_HOST: "http://<my_ip>:8080"
CLEARML_API_HOST: "http://<my_ip>:8008"
CLEARML_FILES_HOST: "http://<my_ip>:8081"
CLEARML_API_ACCESS_KEY: <my_access_key>
CLEARML_API_SECRET_KEY: <my_secret_key>
I also changed the IP in the entrypoint from apiserver:8008 to <my_ip>:8008
Yes, I run both commands from the same place — dedicated user on my worker m...
@<1523701087100473344:profile|SuccessfulKoala55>
I managed to create the clearml.conf file with clearml-agent init after fixing the proxy problem, and now I'm trying to run the daemon with this conf file. I suspect something is missing from it, since the request validator fails with a missing attribute.
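The commands I'm running, roughly (queue name assumed):
clearml-agent init                                 # generates ~/clearml.conf
clearml-agent daemon --queue default --foreground  # daemon picks up ~/clearml.conf by default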
@<1523701070390366208:profile|CostlyOstrich36>