I'm assuming some package imports absl (the TF define package), and that's the reason you see the TF defines. Does that make sense?
Okay, progress.
What are you getting when running the following from the git repo folder: git ls-remote --get-url origin
BattyLion34 if everything is installed and used to work, what's the difference from the previous run that worked?
(You can compare the working vs non-working runs in the UI and check the installed packages, it would highlight the diff, maybe the answer is there)
but the requirement was already satisfied.
I'm assuming it is satisfied on the host python environment, do notice that the agent is creating a new clean venv for each experiment. If you are not running in docker-mode, then you ca...
Hi ElegantCoyote26
If there is, it will have to be using the docker-mode, but I do not think this is actually possible because this is not a feature of docker. It is possible to do on k8s, but that's a different level of integration :)
EDIT:
FYI we do support k8s integration
hmm can you share the log of the Task? (the clearml-session created Task)
Could it be the Args section of the task it clones does not have the "input_train_data" argument ?
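A quick way to check this is to look at the flat parameter dict of the template task. This is only a sketch: find_missing_args is a hypothetical helper (not part of clearml), and the commented-out lines assume clearml is installed and configured, with a placeholder task ID.

```python
from typing import List, Mapping


def find_missing_args(parameters: Mapping[str, object],
                      required: List[str],
                      section: str = "Args") -> List[str]:
    """Return the required argument names that are absent from the given section.

    Expects the flat "Section/name" keys that Task.get_parameters() returns.
    """
    return [name for name in required if f"{section}/{name}" not in parameters]


# With a real task (assumption: clearml installed, placeholder task ID):
#   from clearml import Task
#   params = Task.get_task(task_id="<template_task_id>").get_parameters()
#   print(find_missing_args(params, ["input_train_data"]))

# Stand-in dict so the sketch runs without a server
example_params = {"Args/epochs": "3", "General/seed": "1"}
print(find_missing_args(example_params, ["input_train_data", "epochs"]))
```

If "input_train_data" shows up in the returned list, the cloned task has no such argument to override.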
A few epochs is just fine
should I update nodejs in centos image ?
I think so, it might have been forgotten
AstonishingRabbit13
https://github.com/googleapis/google-cloud-python/issues/4941#issuecomment-369472576
check the openssl and the date, this seems like SSL low level error (even before authentication)
Hmm seems like everything is working, can you check in the UI if you see the serving session ID in the DevOps project? Maybe there are two, and you configured one and the docker-compose is running another?
overrides -> "kubectl run --overrides "
template -> "kubectl apply -f template.yaml"
The bug was fixed :)
Can you test with the credentials also in the global section
key: "************"
secret: "********************"
Also what's the clearml python package version
Hi RipeGoose2
You can also report_table them? what do you think?
https://github.com/allegroai/clearml/blob/master/examples/reporting/pandas_reporting.py
https://github.com/allegroai/clearml/blob/9ff52a8699266fec1cca486b239efa5ff1f681bc/clearml/logger.py#L277
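Roughly along the lines of the linked pandas example, a minimal sketch of reporting a table. report_as_table is a hypothetical wrapper, and the project/task names in the comments are made up; only Logger.report_table itself comes from clearml.

```python
from typing import Any, Optional


def report_as_table(logger: Any, title: str, series: str, table: Any,
                    iteration: Optional[int] = None) -> None:
    """Forward a table (e.g. a pandas DataFrame) to the logger's report_table."""
    logger.report_table(title=title, series=series,
                        iteration=iteration, table_plot=table)


# Typical use (assumption: clearml and pandas installed, server configured):
#   import pandas as pd
#   from clearml import Task
#   task = Task.init(project_name="examples", task_name="table reporting")
#   df = pd.DataFrame({"metric": ["acc", "loss"], "value": [0.91, 0.27]})
#   report_as_table(task.get_logger(), "summary", "final", df, iteration=0)
```

The table should then show up under the task's plots in the UI.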
the time taken to upload halved. It is puzzling because as you say it's not that much to upload.
Maybe it was the load on the server? meaning dealing with multiple requests at the same time delayed the requests?!
For now I've whittled down the number of entries to a more select but useful few and that has solved the issue. If it crops up again I will try connect_configuration properly.
Thanks for your help!
My pleasure :)
Done :)
PompousBeetle71 quick question, will you ever want to pass an empty string? The reason for asking is that it is either one or the other, there is no way for Trains to actually differentiate (from the web UI perspective, this is just an empty string field...)
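To illustrate the point (this is a toy sketch, not the actual Trains code; to_ui_field is a hypothetical name): once a value is rendered as a text field, None and an empty string collapse to the same thing.

```python
from typing import Optional


def to_ui_field(value: Optional[str]) -> str:
    """Render an argument value as the text a UI field would hold."""
    return "" if value is None else str(value)


# Both render as an empty field, so the original value cannot be recovered
print(to_ui_field(None) == to_ui_field(""))
```

So reading the field back, it has to be interpreted as one or the other.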
SmallDeer34 I have to admit this reference is relatively old, maybe we should update it to author http://clearml.ml (would that make sense?)
We just don't want to pollute the server when debugging.
Why not ?
you can always remove it later (with Task.delete) ?
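Something like this could do the cleanup afterwards. A sketch only: delete_debug_tasks is a hypothetical helper and the "debug" project name is made up. As far as I know, Task.delete also removes the task's artifacts and models by default, so use with care.

```python
from typing import Any, Iterable


def delete_debug_tasks(tasks: Iterable[Any]) -> int:
    """Call .delete() on each task and count how many reported success."""
    deleted = 0
    for task in tasks:
        if task.delete():
            deleted += 1
    return deleted


# With real tasks (assumption: clearml installed, hypothetical project name):
#   from clearml import Task
#   debug_tasks = Task.get_tasks(project_name="debug")
#   print(delete_debug_tasks(debug_tasks))
```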
that should have worked, do you want to send the log?
restart_period_sec
I'm assuming development.worker.report_period_sec, correct?
The configuration does not seem to have any effect, scalars appear in the web UI in close to real time.
Let me see if we can reproduce this behavior and quickly fix
I'm with you on this one :) it's better to make a company-wide decision on these things and not allow too much flexibility (just two options to choose from, and it should be enough, I think)
Hi JollyChimpanzee19
What are the versions (clearml, TF, PT)? Also, could you add one more line from the stack (i.e. which call triggered the exception)
DeliciousKoala34 any chance you are using PyCharm 2022 ?
Do you have a link on how to set up a task scheduler to run in service mode in k8s?
basically spin the agent pod and add an argument to the agent itself (this is the --services-mode)
https://clear.ml/docs/latest/docs/clearml_agent#services-mode
Actually it cannot be deferred. Long story short, when the agent is running the same code we have to verify and pass arguments at import time. I have to wonder, I'm expecting the env variables to be preset (i.e. previously set for the entire environment), how come they are manually set inside the code (and wouldn't that break when running with an agent)?