PanickyMoth78 'tensorboard_logger' is an old, deprecated package that was meant to create TB events without TB; it was created before TB became a separate package. Long story short, it is not supported. That said, if you run the same code and replace tensorboard_logger with tensorboard, you should see all the scalars in the UI
background:
ClearML logs TB events as they are created, in real time. tensorboard_logger is not TB; it creates events and dumps them directly into a TB-equivalent event file
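For example, a minimal sketch of the replacement (assuming PyTorch's torch.utils.tensorboard; the project/task names are illustrative). Once Task.init is called, ClearML picks the TB scalars up automatically:

from clearml import Task
from torch.utils.tensorboard import SummaryWriter

task = Task.init(project_name="examples", task_name="tb scalars")
writer = SummaryWriter()
for step in range(10):
    writer.add_scalar("loss", 1.0 / (step + 1), step)  # shows up under Scalars in the ClearML UI
writer.close()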
How does it work with k8s?
You need to install the clearml k8s glue, and then on the Task request the container. Notice you need to preconfigure the glue with the correct Job YAML
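Very roughly, the template you preconfigure the glue with is a standard pod/job spec; this is only a sketch (the image, resource values and names here are illustrative, not ClearML defaults), the glue then adjusts it per Task:

apiVersion: v1
kind: Pod
metadata:
  name: clearml-task-pod
spec:
  containers:
    - name: clearml-task
      image: nvidia/cuda:11.8.0-runtime-ubuntu22.04
      resources:
        limits:
          nvidia.com/gpu: 1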
Hi @<1801424298548662272:profile|ConvolutedOctopus27>
I am getting errors related to invalid git credentials. How do I make sure that it's using credentials from local machine?
configure the git_user/git_pass (app key) inside your clearml.conf on the machine with the agent:
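For example, in the agent section of clearml.conf (a sketch; use your git username and an app token / app password, not your account password):

agent {
    git_user: "my-git-username"
    git_pass: "my-git-app-token"
}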
Okay, I was able to reproduce it (this is odd) let me check ...
These paths are pathlib.Path. Would that be a problem?
No need to worry, it should work (I'm assuming "/src/clearml_evaluation/" actually exists on the remote machine, otherwise it's useless 🙂)
that might be it.
Is the web UI working properly ?
What ports are you using?
(BTW: any reason not to use the agent?)
after generating a fresh set of keys
when you have a new set, copy-paste them directly into the clearml.conf (should be at the top, can't miss it)
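The block sits at the top of clearml.conf, roughly like this (a sketch; the server URLs are the hosted clear.ml defaults, replace them if you are self-hosted):

api {
    web_server: https://app.clear.ml
    api_server: https://api.clear.ml
    files_server: https://files.clear.ml
    credentials {
        "access_key" = "GENERATED_ACCESS_KEY"
        "secret_key" = "GENERATED_SECRET_KEY"
    }
}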
FreshParrot56 we could add this capability, but the main caveat is that if your version depends on multiple parent versions, you still need to download and extract all the parent versions, which means that when you clear them you might hurt later performance. Does that make sense? What is the use-case / scenario for you?
trains-agent doesn't run the clone, it is pip...
basically calling "pip install git+https://..."
Not sure you can pass extra arguments
Also, this is not a setup problem, otherwise it would have been consistently failing ... this actually looks like a network issue.
The only thing I can think of is retrying the install if we get a network error (not sure what the exit code of pip is though, maybe 9?)
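Something along these lines is what I have in mind, just a sketch of the idea (the repo URL is a placeholder), not something the agent does today:

for attempt in 1 2 3; do
    pip install "git+https://github.com/user/repo.git" && break
    echo "pip install failed (attempt $attempt), retrying..." >&2
    sleep 10
done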
Hi AdventurousRabbit79
Try:"extra_clearml_conf" : "aws { s3 {key: A, secret : B, region: C, }} ",Generally speaking no need for the quotes on the secret/key
You also need the comma to separate between keys.
You can test if it is working by adding the same string to your local clearml.conf and importing the clearml package
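In your local clearml.conf that string corresponds to (a sketch; the key/secret/region values are placeholders):

sdk {
    aws {
        s3 {
            key: "A"
            secret: "B"
            region: "C"
        }
    }
}

Then something like python -c "import clearml" should complain if the section is malformed.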
Hi WackyRabbit7
So I'm assuming after the start_locally is called ?
Which clearml version are you using ?
(just making sure, calling Task.current_task() before starting the pipeline returns the correct Task?)
Hi OutrageousGrasshopper93
Are you working with venv or docker mode?
Also notice that if you need all GPUs you can pass --gpus all
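For example, in docker mode (a sketch; the queue name is illustrative):

clearml-agent daemon --queue default --docker --gpus all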
@<1523720500038078464:profile|MotionlessSeagull22> you cannot have two graphs with the same title; the left side panel lists graph titles. That means that you cannot have title=loss series=train and title=loss series=test on two different graphs, they will always be displayed on the same graph.
That said, when comparing experiments, all graph pairs (i.e. title+series) will be displayed as a single graph, where the different series are the experiments.
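In code, title/series map directly to the Logger arguments, e.g. (a minimal sketch; project/task names and values are illustrative):

from clearml import Task

task = Task.init(project_name="examples", task_name="scalar demo")
logger = task.get_logger()
logger.report_scalar(title="loss", series="train", value=0.5, iteration=1)
logger.report_scalar(title="loss", series="test", value=0.7, iteration=1)  # same graph, second series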
Are you asking regrading the k8s integration ?
(This is not a must, you can run the clearml-agent bare-metal on any OS)
Hi FranticCormorant35, the Reporter is the internal implementation the Logger uses. In general you should use the Logger.
ShinyLobster84
fatal: could not read Username for '': terminal prompts disabled
This is the main issue, it needs git credentials to clone the repo code containing the pipeline logic (this is the exact same behaviour as pipeline v1 execute_remotely(), which is now the default; could it be that before, you executed the pipeline logic locally?)
WackyRabbit7 could the local/remote pipeline logic apply in your case as well?
It will not create another 100 tasks, they will all use the main Task. Think of it as if they "inherit" it from the main process. If the main process never created a task (i.e. no call to Task.init) then they will create their own tasks (i.e. each one will create its own task and you will end up with 100 tasks)
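A minimal sketch of what I mean (assuming the default fork start method on Linux; names are illustrative):

from multiprocessing import Pool
from clearml import Task

def worker(i):
    # inside the subprocess this returns the main process' task, no new task is created
    return Task.current_task().id

if __name__ == "__main__":
    task = Task.init(project_name="examples", task_name="multiprocess demo")
    with Pool(4) as p:
        print(set(p.map(worker, range(100))))  # a single task id, not 100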
git config --system credential.helper 'store --file /root/.git-credentials'
Maybe we should use this hack for cloning with user/token in general ...
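With the store helper the credentials file is just a list of URLs with the user/token embedded, e.g. (a sketch; the user and token are placeholders):

echo "https://my-user:my-token@github.com" > /root/.git-credentials
git config --system credential.helper 'store --file /root/.git-credentials'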
yes, that makes sense to me.
What is your specific use case, meaning when/how do you stop / launch the hpo?
Would it make sense to continue from a previous execution and just provide the Task ID? Wdyt?
Correct (copied == uploaded)
👍
Okay. But we should definitely output an error on that
I suspect it failed to create one on the host and then mount into the docker
So dynamic or static are basically the same thing, just in dynamic, I can edit the artifact while running the experiment?
Correct
Second, why would it be overwritten if I run a different run of the same experiment?
Sorry, I meant in the same run: if you reuse the artifact name you will be overwriting it. Obviously, different runs have different artifacts :)
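For example (a sketch; the artifact name and objects are illustrative):

from clearml import Task

task = Task.init(project_name="examples", task_name="artifact demo")
task.upload_artifact("results", artifact_object={"acc": 0.90})
# reusing the same name in the same run overwrites the previous object
task.upload_artifact("results", artifact_object={"acc": 0.95})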
So you have two options
- Build the container from your docker file and push it to your container registry. Notice that if you built it on the machine with the agent, that machine can use it as the Task's base container (see the sketch after this list)
- Use the FROM container as the Task's base container and have the rest as a docker startup bash script. Wdyt?
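For the first option the plain docker flow looks like this (a sketch; the registry and image names are illustrative), then you set that image as the Task's base container from the UI or via Task.set_base_docker():

docker build -t my-registry.example.com/my-project/base:latest .
docker push my-registry.example.com/my-project/base:latest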
ValueError: Missing key and secret for S3 storage access
Yes that makes sense, I think we should make sure we do not suppress this warning it is too important.
Bottom line: a configuration section is missing in your clearml.conf