Everything seems correct...
Let's try to set it manually.
create a file ~/trains.conf , then copy-paste the credentials section from the UI, it should look something like:
` api {
    web_server: http://127.0.0.1:8080
    api_server: http://127.0.0.1:8008
    files_server: http://127.0.0.1:8081
    credentials {
        "access_key" = "access"
        "secret_key" = "secret"
    }
} `
Let's see if that works
Just making sure, the machine that you were running the "trains-init" on can access the API server ?
if it ain't broke, don't fix it
Up to you, just a few features & nicer UI.
BTW: everything is backwards compatible, there is no need to change anything; all the previous trains/trains-agent packages will keep working as-is
(This even includes the configuration file, so you can keep the current ~/trains.conf and work with whatever combination you like of trains/clearml on the same machine)
Hi @<1546303293918023680:profile|MiniatureRobin9>
I'm not sure I understand the difference between a worker and an agent.
hmm we should probably make that clearer
agent = the clearml-agent instance running on the machine
worker is the system term representing the instance of the agent
You can have one machine with multiple agents (i.e. multiple workers) running on it.
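For example, a minimal sketch (assuming a queue named "default" and a machine with two GPUs; adjust queue names and GPU indices to your setup) of spinning two agents on the same machine, which will then show up as two workers:
` # first agent/worker, bound to GPU 0
clearml-agent daemon --queue default --gpus 0 --detached
# second agent/worker on the same machine, bound to GPU 1
clearml-agent daemon --queue default --gpus 1 --detached `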
Does that make sense ?
In the documentation it warns about .close() : "Only call Task.close if you are certain the Task is not needed."
Maybe this is not clear enough; it means you no longer need to automatically Add/Log/Track things into the Task in the current process.
This does Not mean you cannot access the Task or its artifacts
Mark closed means to externally (i.e. not from the process that created the Task, maybe even from a different machine) close and mark the task as completed (this...
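As a rough illustration (a minimal sketch; the project/task names are just placeholders for this example):
` from clearml import Task

# in the process that created the Task: stop logging/tracking in this process
task = Task.init(project_name="examples", task_name="demo")
task.close()  # this object is done, but the Task itself still exists on the server

# from anywhere else (even another machine): fetch the Task and mark it completed
same_task = Task.get_task(project_name="examples", task_name="demo")
same_task.mark_completed() `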
Notice that you can embed links to specific view of an experiment, by copying the full address bar when viewing it.
Hi @<1573119962950668288:profile|ObliviousSealion5>
Hello, I don't really like the idea of providing my own github credentials to the ClearML agent. We have a local ClearML deployment.
if you own the agent, that should not be an issue, no?
forward my SSH credentials using ssh -A and then starting the clearml agent?
When you are running the agent and you force git cloning with SSH, it will automatically map the .ssh into the container for git to use
Ba...
owning the agent helps, but still it's much better if the credentials don't show up in logs,
They are not, they are always filtered out.
- how does force_git_ssh_protocol help please? it doesn't solve the issue of the agent simply not having access
It automatically maps the host .ssh into the container, so that git can use SSH to clone.
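For reference, a minimal sketch of the relevant setting in the agent machine's clearml.conf (assuming the default conf location):
` agent {
    # force git to clone over SSH instead of HTTPS, so the host's ~/.ssh is mounted and used
    force_git_ssh_protocol: true
} `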
What exactly is not working?
and how are you configuring it?
Hi @<1572395184505753600:profile|GleamingSeagull15>
Is there an official place to report bugs and add feature requests for the app.clear.ml website?
GitHub issues is usually the place, or the
Assuming GitHub, but just making sure you don't have another PM tool you'd rather use.
Really appreciate you asking! It is always hard to keep track
Thank you so much @<1572395184505753600:profile|GleamingSeagull15> !
looks like your faq.clear.ml site is missing from your main site's sitemap files,
Thank you for noticing! I'll check with the webdevs
Also missing the robots meta tag on that site,
Last tip is to add a link on the faq.clear.ml site back to clear.ml for search index relevancy (connects the two sites as being related in content...
Hi @<1572395181150310400:profile|DeterminedHare56>
Yes, Slack is not the best for knowledge sharing, but it is the easiest for users to communicate over, and it is the easiest to set up and scale.
Specifically, you can find the historical log of the Slack channel here: None
Which we hoped Google would index, but it seems this is still not working as expected; if you have any input on improving it, that would be great
RattySeagull0 I think you are correct, python 3.6 is the one installed inside the docker. Is it important to have 3.7? You might need another docker (or change the installation script and install python 3.7 inside)
when I duplicate the experiment and clone it remote, the call is ignored and the recorded values are used?
Yes ScantChimpanzee51 exactly.
Think of it as the initial value you want to put on the Task when you are running the code on your machine; later, when you clone the Task, you can edit the base docker image in the UI (or with the API). Of course the new value is used when the agent spins this Task, and to avoid the actual docker (the one you changed in the UI) to be overwritten by ...
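For example, a minimal sketch of setting that initial value from code (the docker image name below is just a placeholder):
` from clearml import Task

task = Task.init(project_name="examples", task_name="base docker demo")
# set the default docker image the agent should use when running this Task remotely;
# it can later be overridden from the UI on the cloned Task
task.set_base_docker("nvidia/cuda:11.4.3-runtime-ubuntu20.04") `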
Won't they be printed out as well in the Web UI?
They would in the log, but it will not be stored back on the Task (the idea is these are "agent specific" additions no need for them to go with the Task)
So I've tried the approach and it does work,
ScantChimpanzee51 What do you mean it does not work? what exactly are you trying with task.connect that does not work?
Is there a way to inject environment variables into a Task or into its container?
Yes you can with:
` task.s...
The remaining problem is that this way, they are visible in the ClearML web UI which is potentially unsafe / bad practice, see screenshot below.
Ohhh that makes sense now, thank you π
Assuming this is a one-time credential for every agent, you can add these arguments in the "extra_docker_arguments" in clearml.conf
Then make sure they are also listed in: hide_docker_command_env_vars
which should cover the console log as well
https://github.com/allegroai/clearml-agent/blob/26e6...
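Roughly, a minimal sketch of what that could look like in the agent's clearml.conf (the variable name and value are made up for this example):
` agent {
    # pass the credential into every container the agent starts
    extra_docker_arguments: ["-e", "MY_SECRET_TOKEN=abc123"]

    # make sure it is also masked in the console log / docker command
    hide_docker_command_env_vars {
        enabled: true
        extra_keys: ["MY_SECRET_TOKEN"]
    }
} `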
Hi ScantChimpanzee51
Is it possible to run multiple agent on EC2 machines started by the Autoscaler?
I think that by default you cannot,
having the Autoscaler start 1x p3.8xlarge (4 GPU) on AWS might be better than 4x p3.2xlarge (1 GPU) in terms of availability, but then we'd need one Agent per GPU.
I think that this multi-GPU setup is only available in the enterprise tier.
That said, the AWS pricing is linear, it costs the same having 2 instances with 1 GPU as 1 instanc...
BTW, this one seems to work ....
` from time import sleep
from clearml import Task

Task.set_offline(True)
task = Task.init(project_name="debug", task_name="offline test")
print("starting")
for i in range(300):
    print(f"{i}")
    sleep(1)
print("done") `
It might be broken for me, as I said the program works without the offline mode but gets interrupted and shows the results from above with offline mode.
How could I reproduce this issue ?
But there might be another issue in between of course - any idea how to debug?
I think I missed this one, what exactly is the issue ?
EnviousStarfish54 you can also run the docker-compose on one of the machines on your local LAN, but then you will not be able to access it from home
with tensorboard logging, it works fine when running from my machine, but not when running remotely in an agent.
This is odd, could you send the full Task log?
Hi @<1729309131241689088:profile|MistyFly99>
notice that the files server needs to have an "address" that can be accessed from the browser, since data is stored in a federated manner. This means your browser is directly accessing the files server, not going through the API server. I'm assuming the address is not valid?
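For instance, a minimal sketch of the relevant section in clearml.conf (the hostname below is just a placeholder; use an address the browser can actually reach):
` api {
    web_server: http://clearml-host.example.com:8080
    api_server: http://clearml-host.example.com:8008
    # must be reachable directly from the browser, not just from the API server
    files_server: http://clearml-host.example.com:8081
} `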
Yeah, but I still need to update the links in the clearml server
yes... how many are we talking about here?