My main issue with this approach is that it breaks the workflow into an "a-sync" set of tasks:
This is kind of the way you depicted it, meaning, there is an initial dataset, then an "offline process" (i.e. external labeling), and then an ingest process.
I was wondering if the "waiting" operator can actually be a part of the pipeline.
This way it would be clearer what workflow we are executing.
Hmm, so the pipeline is "aborted", then the trigger relaunches the pipeline, and the pipeli...
Hi ElegantCoyote26 , in theory there is no limit, but that depends on how you spun up the services queue agent:
https://clear.ml/docs/latest/docs/clearml_agent/clearml_agent_daemon
See services mode : to limit the number of simultaneous tasks run in services mode, pass the maximum number immediately after the --services-mode option (e.g. --services-mode 5 )
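For example, a minimal launch command (the queue name "services" here is an assumption, adjust to your setup):

clearml-agent daemon --services-mode 5 --queue services --docker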
Hi @<1697056701116583936:profile|JealousArcticwolf24>
Can you run your pipeline on an agent (i.e. remotely), but launch it from the UI (not the TaskScheduler)?
Hi HappyLion37
It seems that you are "reusing" the Tasks, which means the second time you open them you are essentially resetting the old run and starting all over.
Try to do:

task1 = Task.init('examples', 'step one', reuse_last_task_id=False)
print('do stuff')
task1.close()

task2 = Task.init('examples', 'step two', reuse_last_task_id=False)
print('do some more stuff')
task2.close()
Could it be you have some custom SSL certificate installed, or a policy?
Can you reach other HTTPS sites? (for example, your clearml-server)
file and redirect the public URL to the k8s DNS URL?
Yes! That would work, nice!
You can add it into the extra_docker_shell_script ; it will be executed in any pod the clearml-glue spins up (obviously this needs to be configured on the pod running the clearml k8s glue)
https://github.com/allegroai/clearml-agent/blob/ba2db4e727b90e595df2b13f458d9580659bf12e/docs/clearml.conf#L152
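For reference, a sketch of how that section of clearml.conf could look (the shell commands themselves are just placeholders):

agent {
    # executed at the start of every container/pod the glue spins up
    extra_docker_shell_script: ["apt-get install -y ca-certificates", "update-ca-certificates"]
}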
SmarmySeaurchin8 I might be missing something in your description. The way the pipeline works,
the Tasks in the DAG are pre-executed (either with "execute_remotely" or actually fully executed once).
The DAG nodes themselves are executed by the trains-agent , which means it reproduces the code / env for every cloned Task in the DAG (not on the original Tasks).
WDYT?
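For context, a minimal sketch of such a controller using the current clearml API (project/task names are placeholders):

from clearml.automation import PipelineController

pipe = PipelineController(name='pipeline demo', project='examples', version='1.0')
# each step clones its pre-executed base Task and enqueues the clone
pipe.add_step(name='stage_one', base_task_project='examples', base_task_name='step one')
pipe.add_step(name='stage_two', parents=['stage_one'], base_task_project='examples', base_task_name='step two')
pipe.start(queue='pipelines')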
SpotlessFish46 So the expected behavior is to have the single script inside the diff, but you get an empty string?
GrievingTurkey78 did you open the 8008 / 8080 / 8081 ports on your GCP instance? I have to admit I can't remember where exactly in the admin panel you do that, but I can assure you it is there :)
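If you prefer the CLI over the admin panel, something along these lines should do it (the rule name is a placeholder):

gcloud compute firewall-rules create clearml-ports --allow tcp:8008,tcp:8080,tcp:8081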
clearml should detect the "main" packages used in the repository (not just the main script); the derivatives will be installed automatically by pip when the agent is installing the environment. Once the agent is done setting up the environment, it updates the Task back with the full list of packages, including all required ones.
Follow-up question: how does clearML "inject" the argparse arguments before the task is initialized?
It patches the actual parse_args call; to make sure it works, you just need to make sure clearml was imported before the actual call takes place.
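In practice that just means calling Task.init (or at least importing clearml) before parse_args ; a minimal sketch (project/task names and the argument are placeholders):

import argparse
from clearml import Task

# Task.init patches argparse, so the parse_args call below is intercepted
task = Task.init(project_name='examples', task_name='argparse demo')

parser = argparse.ArgumentParser()
parser.add_argument('--lr', type=float, default=0.01)
# arguments are logged to the Task (and can be overridden when running via an agent)
args = parser.parse_args()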
I had to do another workaround, since when torch.distributed.run called its ArgumentParser , it was getting the arguments from my script (and from my Task) instead of the ones I passed it
Are you saying...
Basically the idea is that you do not need to configure the Experiment manually; it is created when you actually develop the code and run/debug it, or you have the CLI take everything from your machine and populate it
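For the CLI route, a hedged example (script name and queue are assumptions):

clearml-task --project examples --name remote-run --script train.py --queue default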
Hi ItchyHippopotamus18
The iteration reporting is automatically detected if you are using TensorBoard or matplotlib, or explicitly with trains.Logger
I'm assuming there were no reports, so the monitoring falls back to reporting every 30 seconds, where the "iterations" are seconds from start (the thing is, this is a time series, so you have to have an x-axis...)
Make sense?
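For explicit reporting, a minimal sketch using the Logger (titles and values are placeholders):

from trains import Task

task = Task.init('examples', 'explicit reporting')
logger = task.get_logger()
for i in range(100):
    loss = 1.0 / (i + 1)  # placeholder metric
    logger.report_scalar(title='loss', series='train', value=loss, iteration=i)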
First let's verify with the manual change, but yes
RipeGoose2 yes that will work 🙂
That said, we should probably fix the S3 credentials popup 🙂
WhimsicalLion91
What would you say is the use case for running an experiment with iterations?
That could be loss value per iteration, or accuracy per epoch (an iteration is just a name for the x-axis, in a sense; this is equivalent to a time series)
Make sense?
WackyRabbit7 I might be missing something here, but the pipeline itself should be launched on the "pipelines" queue. Is the pipeline itself running, or is it the step itself that is stuck in the "queued" state?
@<1523701079223570432:profile|ReassuredOwl55> did you try adding it manually?
./path/to/package
You can also do that from code:
# notice you need to call Task.add_requirements before Task.init
Task.add_requirements("./path/to/package")
task = Task.init(...)
BTW: if you feel like writing a wrapper, it could be cool 🙂
Yes, experiments are standalone as they do not have to have any connecting thread.
When would you say it is a new "run" vs. a new "experiment"? When you change a parameter? Change data? Change code?
If you want to "bucket" them, use projects 🙂 it is probably the easiest now that we have support for nested projects.
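For example, a nested project is just a "/" in the project name (names here are placeholders):

from clearml import Task

task = Task.init(project_name='examples/sub-project', task_name='run 1')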
No worries, I'll see what I can do 🙂
Just making sure I understand: you want to upload your models with clearml to the Yandex-compatible S3 storage?
Could it be pandas was not installed on the local machine?