@<1687643893996195840:profile|RoundCat60> I'm assuming we are still talking about the S3 credentials, sadly no
Are you familiar with boto and IAM roles ?
Hi @<1687653458951278592:profile|StrangeStork48>
secrets manager per se,
Quick question, are you running the trains-server over http or https ?
As we can't create keys in our AWS due to infosec requirements
Hmmm
I suggest a bump in the GitHub issue
Still not supported
Hi FierceFly22
Hi, does anyone know where trains stores tensorboard data
Tensorboard data is stored wherever you point your file-writer to
What trains is doing is: while tensorboard writes its own data to disk, it takes the data (in-flight) and sends it to the trains-server. The trains-server puts everything in the DB, so later everything is viewable & searchable.
Basically you don't need to store your TB files after your experiment is done, you have all the data in the trains-s...
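To make that concrete, here is a minimal sketch (the project/task names and log dir are placeholders, not from this thread): once Task.init() runs, the SDK hooks the TensorBoard file-writer, so the scalars written locally also end up on the trains-server.
from trains import Task  # `from clearml import Task` on newer versions
from torch.utils.tensorboard import SummaryWriter

task = Task.init(project_name="examples", task_name="tb-autolog")  # placeholder names

writer = SummaryWriter(log_dir="/tmp/tb_logs")  # TB event files still land here locally
for step in range(100):
    writer.add_scalar("train/loss", 1.0 / (step + 1), step)  # also captured in-flight and sent to the server
writer.close()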
Just curious, if
is a value I can set, where is it used?
It is used when creating a dataset from inside the cluster (i.e. when launching using the clearml k8s glue).
It will have no effect on what users have on their local machines, i.e. they can always point to a different server.
That said, when users create their initial clearml.conf and copy paste the info from the web UI, this value (or it might be another one, I'll double check later) will set the initial configuration the c...
Welp, it's been a day with the new settings, and stats went up 140K for API calls
... going to check again tomorrow to see if any of that was spill over from yesterday
140K calls a day, how often are you sending scalars? How long is it running? How many experiments are running?
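(For anyone hitting a similar call count: a hedged sketch of one way to cut scalar traffic is to report every N iterations instead of every step. The names and the value of N below are placeholders, not from this thread.)
from clearml import Task

task = Task.init(project_name="examples", task_name="training")  # placeholder names
logger = task.get_logger()

REPORT_EVERY = 50  # report 1 out of every 50 steps to keep API calls down
for it in range(10_000):
    loss = train_one_step()  # hypothetical training step
    if it % REPORT_EVERY == 0:
        logger.report_scalar(title="train", series="loss", value=loss, iteration=it)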
Hmm, there was a commit there that was "fixing" some stuff; apparently it also broke it
Let me see if we can push an RC (I know a few things are already waiting to be released)
This will fix it, the issue is the "no default value" that breaks the casting:
@PipelineDecorator.component(cache=False)
def step_one(my_arg=""):
100% of things with task_overrides would be the most convenient way
I think the issue is that you have to pass the project ID, not the project name (the project unique ID is the property that is actually stored on the Task)
@<1523707653782507520:profile|MelancholyElk85> can you check the following works:
pipe.add_task(..., task_overrides={'project': Task.get_project_id(project_name='examples')})
@<1523707653782507520:profile|MelancholyElk85>
What's the clearml version you are using?
Just making sure... base_task_id has to point to a Task that is in "draft" mode, for the pipeline to use it
I ended up using task_overrides for every change, and this way I only need 2 tasks (a base task and a step task, thus I use clone_base_task=True and it works as expected - yay!)
Very cool!
BTW: you can also provide a function to create the entire Task, see the base_task_factory argument in add_step
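Rough sketch of the base_task_factory idea (names are placeholders and the exact signature may differ by clearml version, so treat this as illustration only):
from clearml import Task
from clearml.automation import PipelineController

def make_step_task(node):
    # hypothetical factory: build or clone whatever Task this pipeline step should run
    return Task.clone(source_task="<base_task_id>", name=node.name)

pipe = PipelineController(name="pipeline", project="examples", version="1.0.0")
pipe.add_step(name="step_one", base_task_factory=make_step_task)
pipe.start()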
I think it's still an issue, not critical though, because we have another way to do it and it works
I could not reproduce it, I think the issue w...
Is "project_name" diff for diff steps ? i.e. PipelineController(..., target_project='my_new_project')
is not enough?
@<1523707653782507520:profile|MelancholyElk85> I just run a single step pipeline and it seemed to use the "base_task_id" without cloning it...
Any insight on how to reproduce ?
@<1523707653782507520:profile|MelancholyElk85> what are you trying to change ? maybe there is a better way?
BTW: if you do step_base_task.export_task() you can use the parts that you need in the dict and pass them to the task_overrides argument in add_step (you might need to flatten the nested arguments with '.', and thinking about it, maybe we should do that automatically?!)
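A hedged sketch of that suggestion (the overridden fields are picked purely for illustration):
from clearml import Task
from clearml.automation import PipelineController

step_base_task = Task.get_task(task_id="<base_task_id>")
exported = step_base_task.export_task()  # plain dict with the task definition

# take what you need from the export and flatten nested keys with '.'
overrides = {
    "project": Task.get_project_id(project_name="examples"),
    "script.branch": exported.get("script", {}).get("branch"),
}

pipe = PipelineController(name="pipeline", project="examples", version="1.0.0")
pipe.add_step(name="step_one", base_task_id=step_base_task.id, task_overrides=overrides)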
So you are uploading a local file (stored in a Dataset) into a GS bucket? May I ask why?
Regarding usage (I might have a typo but this is the gist):
from clearml import StorageManager

StorageManager.upload_file(
    local_file=separated_file_posix_path,
    remote_url=remote_file_path + str(separated_file_posix_path.relative_to(files_rgb))
)
Notice that you need to provide the full upload URL (including path and file name to be used on your GS storage)
Oh, did you try task.connect_configuration?
https://allegro.ai/docs/examples/reporting/model_config/#using-a-configuration-file
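A minimal sketch of that call (the file name is just an example; on older trains installs the import is from trains instead of clearml):
from clearml import Task

task = Task.init(project_name="examples", task_name="config-example")  # placeholder names

# the file content is stored with the task; when running remotely the returned
# path points at the configuration fetched back from the server
config_path = task.connect_configuration("model_config.json")
with open(config_path, "rt") as f:
    model_config = f.read()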
Hi UptightBeetle98
The hyper parameter example assumes you have agents (trains-agent) connected to your account. These agents will pull the jobs from the queue (which they are now, aka pending), set up the environment for the jobs (venv or docker+venv) and execute the job with the specific arguments the optimizer chose.
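To make the flow concrete, a rough sketch under the assumption of a queue called "default" (names and values are placeholders; this is roughly what the optimizer does per trial):
# on a worker machine, start an agent listening on the queue, e.g.:
#   trains-agent daemon --queue default
from trains import Task

base_task = Task.get_task(project_name="examples", task_name="base experiment")  # placeholders
trial = Task.clone(source_task=base_task, name="trial lr=0.01")
trial.set_parameter("General/learning_rate", 0.01)  # the specific argument the optimizer chose
Task.enqueue(trial, queue_name="default")  # stays "pending" until an agent pulls it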
Make sense ?
Hi PungentLouse55 ,
Yes, Hydra integration has been on the to-do list since it was first released; we actually know the guy behind Hydra, he is awesome!
What are your thoughts on integration? We would love to get feedback and pointers (Hydra itself is quite capable, and we were waiting until we had multiple configuration support; it was added in v0.16, so now it is actually possible)
Hi ProudMosquito87
My apologies, there is still no concrete ETA ...
That said I think a good toy example would really help accelerate this process.
How about opening a PR with a nice hydra example, then we can start discussing implementation details based on the toy example ?
GrievingTurkey78 Actually it is in progress, see the GitHub issue for details:
https://github.com/allegroai/trains/issues/219
TrickyRaccoon92 actually Click is on the to-do list as well ...
Hi SmarmyDolphin68
You have two options:
1. Automatically upload the models while training by passing output_uri to Task.init. For example, output_uri=True will upload to the clearml-server, output_uri='s3://bucket/folder' will upload to S3, etc.
2. Manually upload a model that you have locally: https://github.com/allegroai/clearml/blob/9ff52a8699266fec1cca486b239efa5ff1f681bc/examples/reporting/model_config.py#L37
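A minimal sketch of the first option (the bucket path is just an example):
from clearml import Task

# upload model snapshots to the clearml file server
task = Task.init(project_name="examples", task_name="train", output_uri=True)

# or to an S3 bucket instead:
# task = Task.init(project_name="examples", task_name="train", output_uri="s3://bucket/folder")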