You do not need the cudatoolkit package, it is automatically installed if the agent uses conda as the package manager. See your clearml.conf for the exact configuration you are running:
https://github.com/allegroai/clearml-agent/blob/a56343ffc717c7ca45774b94f38bd83fe3ce1d1e/docs/clearml.conf#L79
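Roughly, the relevant bit of clearml.conf looks like this (just a sketch, your file may differ):
agent {
    package_manager {
        # set to "conda" and the agent will resolve cudatoolkit for you
        type: conda
    }
}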
link with "localhost" in it Oo
Hmm, I think this is the main issue: for some reason the dataset's default upload destination is "localhost". What do you have configured in your clearml.conf under files server?
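For reference, this is roughly the entry in question (the address below is just a placeholder for your own server):
api {
    # should point at your actual files server, not localhost
    files_server: http://<your-server-ip>:8081
}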
Hi @<1547028052028952576:profile|ExuberantBat52>
from clearml import Task

task = Task.get_task(...)
print(task.data)
wdyt?
odd message though ... it should have said something about boto3
Hmm, what's the OS and Python version?
Is this simple example working for you?
Hi UpsetTurkey67
repository discovery stores the GitHub repo in the form:
...
while for others
git@github.com:...
Yes that depends on how they locally cloned the repo (via SSH or user/pass/token)
Interestingly, in the former case the ssh config is ignored and cloning the repository breaks on the worker
If you have passed a git user/pass to the agent it should use them, not SSH. How did you configure the agent?
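Something like this in clearml.conf (the values are placeholders):
agent {
    git_user: "my-git-user"
    git_pass: "my-access-token"
    force_git_ssh_protocol: false
}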
btw: what's the OS and python version?
Yes, the left side is the location of the file on the host machine, the right side is the location of the file inside the docker. In our case it is the same location
This will mount the trains-agent machine's hosts file into the docker
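For illustration (assuming the standard /etc/hosts path), that mapping is the usual docker volume syntax, which could for example be passed through the agent's extra_docker_arguments:
agent {
    # host path on the left, container path on the right
    extra_docker_arguments: ["-v", "/etc/hosts:/etc/hosts"]
}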
I am struggling with configuring ssh authentication in docker mode
GentleSwallow91 Basically the agent will automatically mount the .ssh into the container, just make sure you set the following in the clearml.conf:
force_git_ssh_protocol: true
https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L30
Hi @<1539055479878062080:profile|FranticLobster21>
Like this?
https://github.com/allegroai/clearml/blob/4ebe714165cfdacdcc48b8cf6cc5bddb3c15a89f[…]ation/hyper-parameter-optimization/hyper_parameter_optimizer.py
BTW: UnevenDolphin73 you should never actually do "task = clearml.Task.get_task(clearml.config.get_remote_task_id())"
You should just do "Task.init()", it will automatically take the "get_remote_task_id" and do all sorts of internal setups; you will end up with the same object, but in an ordered fashion
Yes, even without any arguments given to Task.init(), it has everything from the server
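A minimal sketch of the recommended pattern (project/task names are only illustrative):
from clearml import Task

# Locally this creates a new Task; when executed by an agent it re-attaches
# to the existing remote Task instead of creating a new one.
task = Task.init(project_name="examples", task_name="my experiment")
print(task.id)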
JitteryCoyote63 I think I found the bug in clearml-task
it adds it at the end instead of before everything else
Or you can do:
param = {'key': 123}
task.connect(param)
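Putting it together, a minimal runnable sketch (names are illustrative):
from clearml import Task

task = Task.init(project_name="examples", task_name="connect example")
param = {'key': 123}
task.connect(param)  # logged as hyperparameters; values can be overridden on remote runs
print(param['key'])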
Hi @<1523715429694967808:profile|ThickCrow29> , thank you for pinging!
We fixed the issue (hopefully), can you verify with the latest RC, 1.14.0rc0?
@<1566959357147484160:profile|LazyCat94>
I found the issue: the import of clearml should be before anything else, this way it patches the ArgumentParser before it is used
from clearml import Task
Move it to the first line, everything should work 🙂
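i.e. something along these lines (the argument itself is just an example):
from clearml import Task  # must come before anything else so the ArgumentParser gets patched
import argparse

task = Task.init(project_name="examples", task_name="argparse example")

parser = argparse.ArgumentParser()
parser.add_argument('--lr', type=float, default=0.001)
args = parser.parse_args()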
mostly by using Task.create instead of Task.init.
UnevenDolphin73, now I'm confused, Task.create is not meant to be used as a replacement for Task.init; it is so you can manually create an additional Task (not the current process Task). How are you using it?
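To illustrate the difference (names are placeholders):
from clearml import Task

# Task.init attaches the *current* process to a Task (creating one if needed)
current_task = Task.init(project_name="examples", task_name="current run")

# Task.create only registers an additional, separate Task entry;
# it does not capture the current process
extra_task = Task.create(project_name="examples", task_name="another task")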
Regarding the second - I'm not doing anything per se. I'm running in offline mode and I'm trying to create a dataset, and this is the error I get...
I think the main thing we need to...
FYI all the git pulls are cached even in docker mode so there is no "tax" to pay for pulling the sub-modules (only the first time of course)
Actually I saw that the
RuntimeError: context has already been set
appears when the task is initialised outside
if name == "main":
Is this when you execute the code locally, or when the agent runs it?
Also, what's the OS of your machine / agent?
Yes, actually ensuring pip is there cannot be skipped (I think in the past it caused too many issues, hence the version limit etc.)
Are you saying it takes a lot of time when running? How long is the actual process that the Task is running (just to normalize times here)?
I think the easiest way is to add another glue instance and connect it with CPU pods and the services queue. I have to admit that it has been a while since I looked at the chart, but there should be a way to do that.
ElegantCoyote26 It means we need to have a keras logger that logs everything to trains, then we need to hook it automatically.
Do you feel like PR-ing the logger (the hooking I can take care of 🙂 )?
Hi FierceFly22
Hi, does anyone know where trains stores tensorboard data
TensorBoard data is stored wherever you point your file-writer to 🙂
What trains does is: while TensorBoard writes its own data to disk, it takes the data (in-flight) and sends it to the trains-server. The trains-server puts everything in the DB, so later everything is viewable & searchable.
Basically you don't need to store your TB files after your experiment is done, you have all the data in the trains-s...
Here you go 🙂
(using trains_agent for easier access to all the data)
from trains_agent import APIClient

client = APIClient()
log_events = client.events.get_scalar_metric_data(task='11223344aabbcc', metric='valid_average_dice_epoch')
print(log_events)
and this
server_info['url'] = f"http://{server_info['hostname']}:{server_info['port']}/{server_info['base_url']}/"
Do you have to have a value there?
Hi ReassuredTiger98
I think DefiantCrab67 solved it 🙂
https://clearml.slack.com/archives/CTK20V944/p1617746462341100?thread_ts=1617703517.320700&cid=CTK20V944