PanickyMoth78 'tensorboard_logger' is an old, deprecated package meant to create TB events without TB; it was created before TensorBoard became a separate package. Long story short, it is not supported. That said, if you run the same code and replace tensorboard_logger with tensorboard, you should see all the scalars in the UI
background:
ClearML logs TB events as they are created, in real time. tensorboard_logger is not TB; it creates events and dumps them directly into a TB-equivalent event file
Yes it should
here is a fastai example, just in case 🙂
https://github.com/allegroai/clearml/blob/master/examples/frameworks/fastai/fastai_with_tensorboard_example.py
Correct, but do notice that (1) task names are not unique, and you can change them after the Task was executed; (2) when you clone the Task, you can actually rename it. When an agent is running the Task, the Task.init
call is basically ignored, because the Task already exists. Make sense?
So yes, it creates the Task on your machine (with the name and project etc.)
Then it stops the local process and pushes it into the execution queue; when the agent pulls it and re-executes the code, it will ignore the Task.init
I am struggling with configuring ssh authentication in docker mode
GentleSwallow91 Basically the agent will automatically mount the .ssh folder into the container; just make sure you set the following in clearml.conf: force_git_ssh_protocol: true
https://github.com/allegroai/clearml-agent/blob/178af0dee84e22becb9eec8f81f343b9f2022630/docs/clearml.conf#L30
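For reference, a minimal sketch of the relevant clearml.conf fragment (based on the linked default config; check your own file for the exact section layout):

```
agent {
    # Force the agent to clone repositories over SSH, so the mounted
    # ~/.ssh keys are used instead of HTTPS credentials
    force_git_ssh_protocol: true
}
```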
I'm assuming you cannot directly access port 10022 (default ssh port on the remote machine) from your local machine, hence the connection issue. Could that be?
GentleSwallow91 notice that on the Task you have "Installed Packages"; this is the equivalent of requirements.txt. You can edit it and add a missing package, or programmatically add it in code (though usually directly imported packages are automatically registered; how come this one is missing?)
to add a package in code:
Task.add_requirements(package_name="my_package", package_version=">=1")
task = Task.init(...)
base docker image but clearML has not determined it during the script ru...
Here are my extra_docker_arguments that make the thing working:
GentleSwallow91 Nice!
BTW: in theory there should be no need to add the specific "-v","/home/nino/.ssh:/home/testuser/.ssh"; the agent should do that automatically
So if I do this in my local repo, will it mess up my git state, or should I do it in a fresh directory?
It will install everything fresh into the target folder (including venv and code + uncommitted changes)
this is not the case as all the scalars report the same iterations
MassiveHippopotamus56 could it be the machine statistics? (i.e. CPU/GPU etc.; these are considered scalars as well...)
SIGINT (Ctrl-C) only.
Because flushing state (i.e. sending the final request) might take time, we only do that when the user interactively hits Ctrl-C. Make sense?
MassiveHippopotamus56
the "iteration" entry is actually the max reported iteration over all graphs; each graph has its own, different max iteration. Make sense?
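A toy sketch of what that means (plain Python with made-up data, not the actual ClearML code):

```python
# Hypothetical reported iterations per scalar graph
graphs = {
    "loss": [0, 1, 2, 3, 4, 5],
    "accuracy": [0, 1, 2, 3],
    "lr": [0, 1],
}

# Each graph has its own max iteration...
per_graph_max = {name: max(iters) for name, iters in graphs.items()}

# ...while the single "iteration" entry is the max over all graphs
overall_iteration = max(per_graph_max.values())
print(per_graph_max)      # {'loss': 5, 'accuracy': 3, 'lr': 1}
print(overall_iteration)  # 5
```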
MoodyCentipede68 from your log
clearml-serving-triton | E0620 03:08:27.822945 41 model_repository_manager.cc:1234] failed to load 'test_model_lstm2' version 1: Invalid argument: unexpected inference output 'dense', allowed outputs are: time_distributed
This seems to be the main issue: triton failing to load the model
Does that make sense to you? how did you configure the endpoint model?
based on this one:
https://stackoverflow.com/questions/31436407/git-ls-remote-returns-fatal-no-remote-configured-to-list-refs-from
I think this is a specific issue of the local git repo configuration, can you verify?
(btw: I tested with git 2.17.1; git ls-remote --get-url
will return the remote url, without an error)
StaleKangaroo85 check https://demoapp.trains.allegro.ai/projects/0e152d03acf94ae4bb1f3787e293a9f5/experiments/193ac2bced184c49a57658fceb4bd7f9/info-output/metrics/plots?columns=type&columns=name&columns=status&columns=project.name&columns=user.name&columns=started&columns=last_update&columns=last_iteration&order=last_update on the demo server, seems okay to me...
Hi OutrageousSheep60
Is there a way to instantiate a clearml-task while providing it a Dockerfile that it needs to build prior to executing the task?
Currently not really, as at the end the agent does need to pull a container.
But you can achieve basically the same by passing the "Dockerfile" content as a script via --docker_bash_setup_script
Notice of course that this is an actual bash script, not a Dockerfile, so no need for the "RUN" prefix.
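For example, if the Dockerfile had RUN lines installing system and pip packages, the equivalent bash setup script would look something like this (hypothetical packages, just to illustrate the shape):

```shell
#!/bin/bash
# Runs inside the container before the task starts;
# plain bash, no Dockerfile "RUN" prefix
apt-get update
apt-get install -y ffmpeg
pip install av
```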
wdyt?
Hi @<1523715429694967808:profile|ThickCrow29> , thank you for pinging!
We fixed the issue (hopefully); can you verify with the latest RC, 1.14.0rc0?
LOL 🙂
Make sure that when you train the model or create it manually you set the default "output_uri"
task = Task.init(..., output_uri=True)
or
task = Task.init(..., output_uri="s3://...")
Sometimes it is working fine, but sometimes I get this error message
@<1523704461418041344:profile|EnormousCormorant39> can I assume there is a gateway at --remote-gateway <internal-ip> ?
Could it be that this gateway has some network firewall blocking some of the traffic ?
If this is all local network, why do you need to pass --remote-gateway ?
Hmm so you are saying you have to be logged out to make the link work? (I mean pressing the link will log you in and then you get access)
Hi GrotesqueOctopus42
In theory it can be built; the main hurdle is getting ELK/Mongo/Redis containers for arm64...
Yes, could you send the full log? screen grab ?
Hi YummyFish22
Looks like the task does not have a "Task.init" call in the main script (or an import of clearml)? Could that be the case?
This part is odd:
SCRIPT PATH: tmp.7dSvBcyI7m
How did you end up with this random filename? How are you running this code?
Hi @<1523711619815706624:profile|StrangePelican34>
Hmm, I think this is missing from the docs, let me ping the guys about that 🙏
And same behavior if I make the dependency explicit via the return of the first one
Wait, are you saying that in the code above, when you abort "step_a", then "step_b" is executed?
BTW:
If I try to find the right model in the task.models["output"] (this time there is just one, but in my code there may be several), it appears with the (see other attached screenshot).
What would make sense here? (I have to be honest, I'm not sure.)
To be specific, there is the "model name", which is not unique, and there is the model-key, which is unique to the Task (i.e. task.models["output"]["model-key"])
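To illustrate the difference (a plain-dict sketch mimicking the structure, not the actual clearml objects):

```python
# Hypothetical output models of a Task: the *name* may repeat,
# but the model-key is unique within the Task
output_models = {
    "best_checkpoint": {"name": "my_model"},
    "last_checkpoint": {"name": "my_model"},  # same name, different key
}

# Looking up by name is ambiguous...
by_name = [m for m in output_models.values() if m["name"] == "my_model"]
print(len(by_name))  # 2

# ...while the unique key selects exactly one model
model = output_models["best_checkpoint"]
```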
Okay let me check if we can reproduce, definitely not the way it is supposed to work 😞
ShortElephant92 yep, this is definitely enterprise feature 🙂
But you can configure user/pass on the open source version, and even store the passwords hashed if you need.