Hi DeliciousBluewhale87
When you say "workflow orchestration", do you mean like a pipeline automation ?
Even if you had any packages, I'm pretty sure there is nothing for you to worry about, it will just list them, and if they are preinstalled, the preinstalled will be used
Hi @<1533620191232004096:profile|NuttyLobster9>
base_task_factory is a function that gets the node definition and returns a Task to be enqueued.
Pseudo code looks like:
def my_node_task_factory(node: PipelineController.Node) -> Task:
    task = Task.create(...)
    return task
Make sense ?
Hmm maybe we should add a test once the download is done, comparing the expected file size and the actual file size, and if they are different we should redownload ?
(currently I think the implementation expects that if the download completed, it was successful)
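A minimal sketch of the size check suggested above, not the actual ClearML implementation. The function name `download_with_size_check`, the injected `fetch` callable, and the retry count are all hypothetical; the expected size is assumed to be known in advance (for example from stored metadata).

```python
import os
from typing import Callable


def download_with_size_check(
    fetch: Callable[[str, str], None],
    url: str,
    dest: str,
    expected_size: int,
    max_attempts: int = 3,
) -> str:
    """Download `url` to `dest`, retrying when the size on disk is wrong."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    actual_size = -1
    for _ in range(max_attempts):
        fetch(url, dest)
        # A missing file is treated the same as a size mismatch.
        actual_size = os.path.getsize(dest) if os.path.exists(dest) else -1
        if actual_size == expected_size:
            return dest
        # Drop the partial file before trying again.
        if os.path.exists(dest):
            os.remove(dest)
    raise IOError(
        f"{url}: expected {expected_size} bytes, got {actual_size} "
        f"after {max_attempts} attempts"
    )
```

Any callable taking (url, destination path) can be passed as `fetch`, for instance `urllib.request.urlretrieve`.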
Thanks BitterStarfish58 !
For example, ServerA stores files at /opt/clearml but ServerB stores them at /some_path/clearml
As long as you adjust your docker-compose yaml file, should be just fine
But I do not have anything linked correctly since I rely on conda installing cuda/cudnn for me
From the log it installed: cudatoolkit==11.1.1
based on the CUDA it found on the host machine: agent.cuda_version = 110
But for some reason it installed the pytorch from the conda "pytorch" repo without the cuda support.
And you have the exact same folder structure / content, and server A/B give a different set of experiments ?
(is serverB empty, meaning no experiments at all?)
PleasantGiraffe85 you can disable the SSL verification on the client end:
https://github.com/allegroai/clearml-agent/blob/21c4857795e6392a848b296ceb5480aca5f98e4b/docs/clearml.conf#L12
Basically you can just manually create the clearml.conf with only the following:
api {
    api_server:
    web_server:
    files_server:
    credentials {"access_key": "EGRTCO8JMSIGI6S39GTP43NFWXDQOW", "secret_key": "x!XTov_G-#vspE*Y(h$Anm&DIc5Ou-F)jsl$PdOyj5wG1&E!Z8"}
    # verify...
Hi GrittyKangaroo27
Is it possible to import user-defined modules when wrapping tasks/steps with functions and decorators?
Sure, any package (local included) can be imported, and will be automatically listed in the "installed packages" section of the pipeline component Task
(This of course assumes that on a remote machine you could do the "pip install <package>")
Make sense ?
How did you add the args? Is it argparser? If so, the help is automatically picked up so you can see it in the UI. BTW, the ability to provide a list of options is a really cool feature to have, I'll make sure to pass it to product
I guess I would need to put this in the extra_vm_bash_script param of the auto-scaler, but it will reboot in loop right? Isn't there an easier way to achieve that?
You can edit the extra_vm_bash_script, which means the next time the instance is booted you will have the bash script executed.
In the meantime, you can ssh to the running instance and change the ulimit manually, wdyt?
Hi @<1644147961996775424:profile|HurtStarfish47>
I see Add image.jpg being printed for all my data items ...
I assume you forgot to call upload? The sync "marks" files for upload / deletion, but the upload call actually does the work,
Kind of like git add / push , if that makes sense ?
BTW: for future reference, if you set the ulimit in the bash, all processes created after that should have the new ulimit
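The inheritance behaviour mentioned above can be checked with a small sketch using Python's standard `resource` module (Unix only). The function name and the target value of 256 are arbitrary choices for illustration; the sketch only ever lowers the soft open-files limit, since raising it may require privileges.

```python
import resource
import subprocess


def lower_nofile_and_check_child(target_soft: int = 256):
    """Lower this process's soft open-files limit and report what a child sees."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Only lower the soft limit; keep it unchanged if it is already smaller.
    if soft == resource.RLIM_INFINITY or soft > target_soft:
        new_soft = target_soft
    else:
        new_soft = soft
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    # A process started after the change inherits the new limit.
    child_view = subprocess.run(
        ["sh", "-c", "ulimit -n"], capture_output=True, text=True, check=True
    ).stdout.strip()
    return new_soft, child_view
```

The same applies to a shell: running `ulimit -n <value>` before launching the agent means the agent and everything it spawns should see that value.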
TenseOstrich47
I noticed that with one agent, only one task gets executed at one time
Yes you can
Also, you are correct, a single agent will run a single Task at a time. That said, you can have multiple agents running on the same machine, and when you launch them you specify which GPUs they use (in theory they can share the same GPU, but your code might not like it)
You can see a few examples here:
https://github.com/allegroai/clearml-agent#running-the-clearml-agent
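As a rough illustration of one agent per GPU, the sketch below only builds the command lines and does not start anything. The helper name `agent_commands` is hypothetical and the queue name "default" is an assumption; the `daemon`, `--detached`, `--gpus` and `--queue` arguments follow the examples in the linked README.

```python
import shlex
from typing import List


def agent_commands(gpu_groups: List[str], queue: str = "default") -> List[List[str]]:
    """Build one `clearml-agent daemon` command line per GPU group."""
    return [
        ["clearml-agent", "daemon", "--detached", "--gpus", gpus, "--queue", queue]
        for gpus in gpu_groups
    ]


if __name__ == "__main__":
    # Two agents on the same machine, each pinned to its own GPU.
    for cmd in agent_commands(["0", "1"]):
        print(shlex.join(cmd))
```

Each printed line can be run in a shell as-is; passing a group such as "0,1" gives a single agent both GPUs.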
Specifically for model files, if you set the Task.init(..., output_uri=True) it will automatically upload any saved model to the files server (you can also point to any object storage / shared folder)
What's the framework you are using ?
SoggyBeetle95 is this secret a per Task secret, or is it for the agent itself (i.e. for all Tasks the agent will spin up)?
Hi @<1734020162731905024:profile|RattyBluewhale45>
What's the clearml agent version? And could you verify with the latest RC?
Lastly, how are you running the agent, docker mode? What's the base container?
Hi @<1730033904972206080:profile|FantasticSeaurchin8>
Does this only relate to this
https://github.com/coqui-ai/Trainer/issues/7
Or is it a clearml sdk issue?
Hi @<1603198134261911552:profile|ColossalReindeer77>
When you select poetry as package manager the agent passes control to poetry, this means poetry needs to decide on the correct torch wheel based on your cuda. I do not think poetry can do that, but I do think you can specify the extra index url to take the torch wheel from:
In the Task log itself it will say the version of all the packages, basically I wonder maybe it is using an older clearml version, and this is why I cannot reproduce it..
SoreDragonfly16 as SmallDeer34 mentioned, you can iterate over the Tasks, pull metrics (with either task.get_last_scalar_metrics or task.get_reported_scalars) then report them on the Task that runs the loop itself with the Logger.
wdyt?
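A minimal sketch of that loop, assuming `get_last_scalar_metrics()` returns a nested dict shaped like {title: {series: {"last": ..., "min": ..., "max": ...}}}. The function name `report_last_scalars` is hypothetical, and the objects are duck-typed so the sketch runs without a server; in real use `tasks` could come from `Task.get_tasks(...)` and `logger` from `Logger.current_logger()`.

```python
from typing import Any, Iterable


def report_last_scalars(tasks: Iterable[Any], logger: Any, stat: str = "last") -> int:
    """Re-report each task's last scalar values on the caller's logger.

    Returns the number of scalar values reported.
    """
    reported = 0
    for index, task in enumerate(tasks):
        # Assumed shape: {title: {series: {"last": v, "min": v, "max": v}}}
        metrics = task.get_last_scalar_metrics()
        for title, series_map in metrics.items():
            for series, stats in series_map.items():
                if stat not in stats:
                    continue
                # The task's position in the list is used as the x-axis value,
                # so each graph shows one metric across all tasks.
                logger.report_scalar(
                    title=title, series=series, value=stats[stat], iteration=index
                )
                reported += 1
    return reported
```

Using the task index as the iteration is only one choice; a hyperparameter value or a timestamp would work just as well as the x-axis.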
because comparing experiments using graphs is very useful. I think it is a nice to have feature.
So currently when you compare the graphs you can select the specific scalars to compare, and it updates in real time!
You can also bookmark the actual URL and it is fully reproducible (i.e. full state is stored)
You can also add custom columns to the experiment table (with the metrics) and sort / filter based on them, and create a summary dashboard (again like all pages in the web app, URL is...
PompousBeetle71 so in one project the experiment works as expected, while in the other it fails on credentials ? both running on the same trains-agent machine ?