
Reputation
Badges 1
25 × Eureka!Itβs the correct way to do it, right?
Yep π that said this is not running as a service you will need to spin it on your machine. that said you can definitely connect it with the free SaaS server, and spin the serving on your machine with docker-compose
Hmm, in the credentials popup there should be a "secure connect" checkbox, it tells it to use https instead of http. Can you verify?
Ohh try to add --full-monitoring
to the clearml-agent execute
None
The agent is installing the "Installed Paclages" section of the Task (think of it as requirements.txt)
And again, what do you have there? Is it the outcome of the Task.init auto populating it?
SmarmySeaurchin8 regarding the original question:task.set_project(project_id)
Task.get_projects() to get all the project names/ids
btw: what's the OS and python version?
Hi MammothGoat53
Basically what you are missing are the headers with the Token you have:
https://blog.logrocket.com/secure-rest-api-jwt-authentication/
Hi @<1540142641931358208:profile|FancyBaldeagle86>
You mean in the UI? i.e. clone an experiment hover over the Configuration / Hyperparameter section and clicking edit ?
The pod has an annotation with a AWS role which has write access to the s3 bucket.
So assuming the boto environment variables are configured to use the IAM role, it should be transparent, no? (I can't remember what the exact envs are, but google will probably solve it π _
AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_SESSION_TOKEN. I was expecting clearml to pick them by default from the environment.
Yes it should, the OS env will always override the configuration file sect...
None of them is problematic, this is what I'm trying to say π
I think the minio browser gets confused.
if you want to test the upload time on the client you can try:task.flush(wait_for_uploads=True) tic = time() task.upload_artifact('test', '/tmp/localfile') task.flush(wait_for_uploads=True) print(time() - tic)
AstonishingRabbit13
https://github.com/googleapis/google-cloud-python/issues/4941#issuecomment-369472576
check the openssl and the date, this seems like SSL low level error (even before authentication)
IrritableGiraffe81 could it be the pipeline component is not importing pandas inside the function? Notice that a function decorated with pipeline component become a stand-alone, this means that if you need pandas you need to import inside the function. The same goes for all the rest of the packages used.
When you are running with run_loclly or debug_pipeline you are using your local env , as opposed to the actual pipeline where a new env is created inside the repo.
Can you send the Entire p...
os.environ['CLEARML_PROC_MASTER_ID'] = ''
Nice catch! (I'm assuming you also called Task.init somewhere before, otherwise I do not think this was necessary)
I think i solved it by deleting the project and running the base_task one time before the hyper parameter optimzation
So isit working now? everything is there ?
However I'm quite confident, that plots and scalars are not visible online only when 'git diff to large to store' appears.
These should be unrelated, are you seeing console outputs ?
Hmm let me check something
GentleSwallow91 what you are looking for is here π
https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L149
SourSwallow36 okay, let's assume we have the base experiment (the original one before the HP process).
What we do is we clone that experiment (either in UI or with code or with code automation, aka HP optimizer. Then each clone of the original gets a set of new HP, then we enqueue the 10 experiments into the execution queue. In parallel, we run trains-agent on a machine, and connect it to the queue. It will pull the experiments, one after the other, run them and log their results. We will en...
in the UI, find the task (just search for the Task ID, it will find it), then tight click it, and select "reset"
And after having called
Task.init()
the second time, the automatic logging of resources and tensorboard plots works as well. I would recommend adding explanation to the docs for
Oh yeah! you always need to call Task.init first, Task,current_task should be called from anywhere you like but after the Task.init was called.
And what is exactly missing from the "installed packages" ? Is "help_models" an additional wheel you have to install ?
Just making sure here, but remember that if your original code did not have a git repo, the only thing that is "copied" to the trains-server is the initial script, so any accompanying scripts will be missing in the trains-agent environment
TenseOstrich47 this sounds like a good idea.
When you have a script, please feel free to share, I think it will be useful for other users as well π
I'm just trying to see what is the default server that is set, and is it responsive
I'm assuming you mean your own server, not the demo server, is that correct ?
and then second part is to check if it is up and alive
Yes, you can curl
to the ping endpoint :
https://clear.ml/docs/latest/docs/references/api/debug#post-debugping
I would ideally just want to have NVIDIA drivers and Docker on the on-prem nodes (along with the clearML agents). Would that allow me to get by with basic job scheduling/queues through clearML?
Yes this is fully supported and very easy to setup.
Regrading limiting users usage. This is doable, I think the easiest solution both for users and management of the cluster is introducing priority into the queue, basically a user can push job into low priority, and only some users can push into high...
regrading the actual artifact access, this is the usual Task.artifacts access: see example here:
https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts_retrieval.py
Hi GracefulDog98
The agent will map the ~/.ssh folder automatically into the docker's /root/.ssh
It will also convert http links to ssh pull if you set force_git_ssh_protocol
in your clearml.conf :
https://github.com/allegroai/clearml-agent/blob/351f0657c3dcf707659875d7e0a52fa387709978/docs/clearml.conf#L25
But the same configuration does not work on the machine with the trains-agent?
Seems like
Task.create
is the correct use-case then, since again this is about testing flows using e.g. pytest,
Make sense
This seems to be fine for now, ...
Sounds good! thanks UnevenDolphin73