Hi RattyBat71
Do you tend to create separate experiments for each fold?
If you really want to parallelize the workload, then splitting it into multiple executions (i.e. passing the fold index of the same CV as an argument) makes sense; you can then compare / sort the results based on a specific metric. That said, if speed is not important, a single script running all the CV folds might be easier to implement?!
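For illustration, a minimal sketch of the per-fold execution idea (the argument, project and metric names are just placeholders):

```python
import argparse
from clearml import Task

# parse the fold index passed to this execution
parser = argparse.ArgumentParser()
parser.add_argument("--fold", type=int, default=0, help="index of the CV fold to run")
args = parser.parse_args()

# one experiment per fold, named after the fold index
task = Task.init(project_name="examples", task_name=f"cv-fold-{args.fold}")

# ... train / evaluate on the selected fold ...
score = 0.0  # placeholder for this fold's metric

# report the metric you later compare / sort the folds by
task.get_logger().report_scalar(title="cv", series="score", value=score, iteration=args.fold)
```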
Can you see the repo itself? The commit id?
Hi @<1625303806923247616:profile|ItchyCow80>
Could you add some prints? Is it working without the Task.init call? The code looks okay, and the "No repository found" message basically means it is logged as a standalone script (which makes sense)
GiddyTurkey39
A flag would be really cool, just in case there's any problem with the package analysis.
Trying to think whether this should be a system-wide flag (i.e. in trains.conf) or a flag in task.init.
What do you think?
Hmm I just noticed:
'--rm', '', 'bash'
This is odd, this is an extra argument passed as "empty text". How did that end up there? Could it be you did not provide any docker image or default docker container?
Hi GrievingTurkey78
the artifacts are downloaded to the cache folder (and by default the last 100 accessed artifacts are maintained there).
node executes the task all the info will be erased or does this have to be done explicitly?
Are you referring to the trains-agent running a docker?
By default the cache is persistent between executions (i.e. saving time on repeated downloads across experiments)
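As a small illustration of how the cache is reused (the task id and artifact name below are hypothetical):

```python
from clearml import Task

# fetch a previously executed task by id (hypothetical id)
task = Task.get_task(task_id="aabbccdd11223344")

# the artifact is downloaded into the local cache folder; on subsequent calls
# the cached copy is returned instead of being downloaded again
local_path = task.artifacts["my_artifact"].get_local_copy()
print(local_path)
```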
Hmm yeah I think that makes sense. Can you post here the arguments?
I'm assuming you have something like '1.23a' in the arguments?
I think latest:
clearml==1.17.0
matplotlib==3.6.2
shap==0.46.0
Python 3.10
think this is because of the version of xgboost that serving installs. How can I control these?
That might be
I absolutely need to pin the packages (incl main DS packages) I use.
you can basically change CLEARML_EXTRA_PYTHON_PACKAGES
https://github.com/allegroai/clearml-serving/blob/e09e6362147da84e042b3c615f167882a58b8ac7/docker/docker-compose-triton-gpu.yml#L100
for example:
export CLEARML_EXTRA_PYTHON_PACKAGES="xgboost==1.2.3 numpy==1.2.3"
Hi EnviousStarfish54
The Enterprise edition extends Trains functionality.
It adds security, scale and full data management (data management and versioning being the key difference)
You can get it as a saas solution or on prem.
If you need more information, you can leave contact details on the website, I'm sure sales will be happy to help :)
If this is a simple two level nesting:
You can use the section name:
task.connect(param['data'], name='data')
task.connect(param['model'], name='model')
Would that help?
The comparison reflects the way the data is stored in the configuration context, that means section name & key value (which is what the code above does)
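A minimal sketch of the two-level case (the parameter values are made up):

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="nested-config")

# hypothetical two-level parameter dict
param = {
    "data": {"batch_size": 32, "path": "/tmp/data"},
    "model": {"lr": 0.001, "layers": 4},
}

# connect each nested dict under its own section name, so the comparison
# shows them as "data/..." and "model/..." entries
task.connect(param["data"], name="data")
task.connect(param["model"], name="model")
```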
Could you verify the Task.init call is inside the main function and not in the global scope? We have noticed some issues with global-scope calls in some cases
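Something along these lines (a minimal sketch):

```python
from clearml import Task

def main():
    # Task.init is called inside the entry function, not at module import time
    task = Task.init(project_name="examples", task_name="init-in-main")
    # ... rest of the experiment ...

if __name__ == "__main__":
    main()
```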
Okay let me check if we can reproduce, definitely not the way it is supposed to work 😞
But I think this error has only appeared since I upgraded to version 1.1.4rc0
Hmm let me check something
Is there a way to force clearml not to upload these models?
DistressedGoat23 is it uploading models or registering them? To disable both, set auto_connect_frameworks https://clear.ml/docs/latest/docs/clearml_sdk/task_sdk#automatic-logging
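For example, a minimal sketch of disabling the automatic framework logging (the framework names listed are just examples, adjust to what you use):

```python
from clearml import Task

# disable automatic model logging / registration for specific frameworks
task = Task.init(
    project_name="examples",
    task_name="no-auto-models",
    auto_connect_frameworks={"pytorch": False, "tensorflow": False},
)
```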
Their names only contain the task name and some unique id, so how can I know to which exact training
You mean the models or the experiments being created ?
The confusion matrix shows up under debug samples, but the image is empty, is that correct?
SolidSealion72 this makes sense, clearml deletes artifacts/models after they are uploaded, so I have to assume these are torch internal files
AstonishingSeaturtle47 that's awesome! Could you explain the hack, it might be helpful for others (I assume :))
Yes, the easiest is an os.environ call before the import
Regarding Azure blob:
The general Azure env vars should work because they configure the underlying Azure SDK, but I would double check
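Roughly something like this (a sketch only; the exact Azure variable names depend on your setup, so verify them before relying on this):

```python
import os

# hypothetical Azure storage credentials, set before clearml / the azure sdk is imported
os.environ["AZURE_STORAGE_ACCOUNT"] = "my_account"
os.environ["AZURE_STORAGE_KEY"] = "my_key"

from clearml import Task  # import only after the environment is configured

task = Task.init(project_name="examples", task_name="azure-blob-output")
```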
Generally speaking
Generic Override Format
ClearML allows you to override any config entry using this format:
```bash
CLEARML__<section>__<key>=<value>
```
Double underscores __ separate the hierarchy levels.
All keys and values are treated as strings.
This works for nested entries in clearml.conf.
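For example, a sketch of overriding a nested entry from Python before the config is loaded (the key and value are illustrative, double check them against your clearml.conf):

```python
import os

# maps to sdk.development.default_output_uri in clearml.conf (illustrative key / value)
os.environ["CLEARML__sdk__development__default_output_uri"] = "s3://my-bucket/artifacts"

from clearml import Task  # the override must be set before clearml reads its configuration
```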
@<1541954607595393024:profile|BattyCrocodile47> first let me say I ❤ the dark theme you have going on there, we should definitely add that 🙂
When I run python set_triggers.py; python basic_task.py, they seem to execute, b
Seems like you forgot to start the trigger, i.e.
None
(this will cause the entire script of the trigger inc...
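For reference, a minimal sketch of registering a trigger and then actually starting the scheduler (the project, queue and task id below are placeholders):

```python
from clearml.automation import TriggerScheduler

trigger = TriggerScheduler(pooling_frequency_minutes=3)

# launch a copy of an existing task whenever a model is published in the project
trigger.add_model_trigger(
    name="my-model-trigger",        # hypothetical trigger name
    schedule_task_id="aabbccdd",    # hypothetical task to clone and enqueue
    schedule_queue="default",
    trigger_project="examples",
    trigger_on_publish=True,
)

# without this call the triggers are only registered, never evaluated
trigger.start()
```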
From your jupyterlab, can you do: !curl
Hi @<1610083503607648256:profile|DiminutiveToad80>
This sounds like the wrong container ? I think we need some more context here
ZanyPig66 is this reproducible? This sounds like a bug, what's the TB version and OS you are using?
Is this example working for you (i.e. do you see debug images)?
https://github.com/allegroai/clearml/blob/master/examples/frameworks/pytorch/pytorch_tensorboard.py
The address is valid. If I just go to the files server address in my browser,
@<1729309131241689088:profile|MistyFly99> what is the exact address of those files (including the http prefix), and what is the address of the web application?
Hi SmallDeer34
Can you see it in TB ? and if so where ?
JitteryCoyote63 any chance the trains-agent-1 is running in services mode ?
Which means it will spin more than a single experiment at once