Also, what do you have in the "Configuration" section of the serving inference Task?
Is the agent idle? Or is it running something else?
It doesn't seem to be related to the upload. The upload itself finished... What's your Trains version?
Okay, this is more complicated but possible.
The idea is to write a glue layer (service) that pulls from the (i.e. UI) queue,
sets up the Slurm job,
and puts it in a pending queue (so you know the job is waiting in the Slurm scheduler).
There is a template here:
https://github.com/allegroai/trains-agent/blob/master/trains_agent/glue/k8s.py
I would love to help and set up a Slurm glue in a similar manner,
what do you think?
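For illustration, here is a rough sketch of what such a glue service could look like. The queue names, the sbatch invocation, and the exact APIClient response fields are all assumptions to adapt to your setup:
```
# Rough sketch of a Slurm glue service - queue names, sbatch command line
# and API response fields are assumptions, adapt to your cluster.
import subprocess
import time

from trains import Task
from trains.backend_api.session.client import APIClient

client = APIClient()
launch_queue_id = client.queues.get_all(name="slurm_launch")[0].id

while True:
    # pull the next Task waiting in the launch (UI) queue
    result = client.queues.get_next_task(queue=launch_queue_id)
    task_id = result.entry.task if result and result.entry else None
    if not task_id:
        time.sleep(15)
        continue
    # submit a Slurm job that will execute this specific Task
    subprocess.run(
        ["sbatch", "--wrap", "trains-agent execute --id {}".format(task_id)],
        check=True,
    )
    # move the Task to a "pending" queue so users can see it is
    # waiting for the Slurm scheduler
    Task.enqueue(task=task_id, queue_name="slurm_pending")
```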
RobustSnake79 this one seems like a scalar-type graph + summary table, correct?
BTW: I'm not sure how to include the "Recommendation" part 🙂
Hi ReassuredOwl55
The easiest is to configure it as the default output_uri in the clearml.conf file of the agent, wdyt?
https://github.com/allegroai/clearml-agent/blob/ebb955187dea384f574a52d059c02e16a49aeead/docs/clearml.conf#L430
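For example (a sketch; the bucket URI is a placeholder, and the key lives under the sdk section of the agent's clearml.conf):
```
sdk {
    development {
        # every Task executed by this agent uploads models/artifacts here
        default_output_uri: "s3://my-bucket/clearml-outputs"
    }
}
```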
I see what you mean.
```
an_optimizer = HyperParameterOptimizer(
    base_task_id='39d2c27baa8145929b2e21f686a17046',
    hyper_parameters=[],
    objective_metric_title='epoch_accuracy',
    objective_metric_series='epoch_accuracy',
    objective_metric_sign='max',
    optimizer_class=aSearchStrategy,
    max_iteration_per_job=0,
    total_max_jobs=0,
    auto_connect_task=False,
)
print(an_optimizer.get_top_experiments(top_k=5))
```
You might need to play around a bit; it might be that you need StorageHelper.get('gs://bucket') and then helper.list('folder/*')
Let me know what worked 🙂
Yes, offline got broken in 1.3.0 🙂 , RC fixed it:
```
pip install clearml==1.3.1rc0
```
Stable release later this week
If the right properties are set, can the profile tab be added?
I guess that is doable; that said, some of the graphs are not straightforward to support, like this one:
https://www.tensorflow.org/guide/images/tf_profiler/trace_viewer.png
GrievingTurkey78 short answer: no 🙂
Long answer: the files are stored as differentiable sets (think change sets relative to the previous version(s)). The collection of files is then compressed and stored as a single zip. The zip itself can be stored on Google, but on their object storage (not GDrive). Notice that the default storage for clearml-data is the clearml-server; that said, you can always mix and match (even between versions).
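As a quick sketch (assuming the newer clearml Dataset API; project and bucket names are placeholders), mixing storage looks like:
```
from clearml import Dataset

# create a new version; parent versions are referenced, not copied
ds = Dataset.create(dataset_project="my_project", dataset_name="my_dataset")
ds.add_files("local_data/")  # collected as a change set vs. the parent version(s)
# the compressed zip goes to object storage (GCS here), not the clearml-server
ds.upload(output_url="gs://my-bucket/datasets")
ds.finalize()
```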
Hi GrievingTurkey78 ,
Yes, this is a per-file download, but I think you can list the bucket and download everything.
Try:
```
from trains import StorageManager
from trains.storage.helper import StorageHelper

helper = StorageHelper.get('gs://bucket/folder')
remote_files = helper.list('*')
for f in remote_files:
    StorageManager.get_local_copy(f)
```
Containers are not running
? But you are running docker-compose, so how come no containers are running?
@<1560074028276781056:profile|HealthyDove84> if you want you can PR a fix; it should be very simple, basically:
None
```
elif np_dtype == str:
    return "STRING"
elif np_dtype == np.object_ or np_dtype.type == np.bytes_:
    return "BYTES"
return None
```
Lambdas are designed to be short-lived; I don't think it's a good idea to run one in a loop, TBH.
Yeah, you are right, but maybe it would be fine to launch, have the lambda run for 30-60 sec (i.e. checking idle time for 1 min, stateless, only keeping track inside the execution context), then take it down.
What I'm trying to solve here is (1) a quick way to understand if the agent is actually idling or just between Tasks, and (2) still keep the "idle watchdog" short-lived, so that it can...
Hi @<1691620877822595072:profile|FlutteringMouse14>
In the latest project I created, Hydra conf is not logged automatically.
Any chance the Task.init call is not in the main script (where the Hydra is)?
So this should be easier to implement, and would probably be safer.
You can basically query all the workers (i.e. agents) and check if they are running a Task; if they are not (for a while), remove the "protection flag".
wdyt?
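Something along these lines (a sketch using the APIClient; field names may differ slightly between server versions):
```
from clearml.backend_api.session.client import APIClient

client = APIClient()
for worker in client.workers.get_all():
    # a worker that reports no current task is idling
    if not getattr(worker, "task", None):
        print("{} is idle (last activity: {})".format(
            worker.id, worker.last_activity_time))
```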
No, I just want to register a new model in the storage.
If the model file is already uploaded, you can register it without a Task:
```
InputModel.import_model(...)
```
https://github.com/allegroai/clearml/blob/b3a2b3425c5098ebfc0598c9dfb3e670d4a87706/clearml/model.py#L521
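For example (a sketch; the URI and name are placeholders, assuming the weights file is already in storage):
```
from clearml import InputModel

model = InputModel.import_model(
    # points at the already-uploaded weights file, nothing is re-uploaded
    weights_url="s3://my-bucket/models/model.pkl",
    name="my-registered-model",
)
```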
I need to create a separate task for this right?
If you want the model to be uploaded, then yes you have to create a Task.
So far my local and remote gitlab repositories are synchronized. I suspect that the
Failed applying git diff, see diff above
error is caused by a cached repository from which clearml tries to run the process. I've cleaned the cache, but it hasn't helped.
Hmm, can you test with empty "uncommitted changes"?
Just making sure: when you say it still doesn't work, you are not trying to run the Task with the git diff that includes the binary data, right?
Yes, but where I can fi...
Bad news: there isn't a nice interface to get the table from the Optimizer object (I will make sure we add it, no reason not to).
But you can very easily get all the information you need and more:
```
all_the_tasks = an_optimizer.get_top_experiments(top_k=100)
```
Then for every task in the list you can get all the information:
```
for task in all_the_tasks:
    task_params_as_dict = task.get_parameters()
    task_scalars = task.get_last_scalar_metrics()
```
Basically the Task object enables you to que...
Hi @<1523702868694011904:profile|AbruptCow41>
Check what you are getting when running git status
inside the working directory; this is essentially how it works. Are you expecting to later run it with an agent?
Hi GrievingTurkey78
I'm assuming something similar to https://github.com/pallets/click/ ?
Auto connect and store/override all the parameters?
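i.e. something like this hypothetical sketch, where the click options would be captured (and overridable) automatically:
```
import click
from trains import Task

@click.command()
@click.option("--count", default=1, help="Number of greetings.")
def hello(count):
    # the idea: parameters like `count` get logged / overridden by the Task
    Task.init(project_name="examples", task_name="click app")
    for _ in range(count):
        click.echo("Hello!")

if __name__ == "__main__":
    hello()
```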
Hmmm, can you view the settings? That's the only thing I can think of at the moment that would be different between your setup and the working one...
Also, is there a way for you to put the trains-server behind https (on your GCP)?
EnviousStarfish54
it seems that if I don't use plt.show() it won't show up in Allegro, is this a must?
Yes, at plt.show() / plt.savefig() Trains will capture the plot and send it to the backend.
BTW: when you hover over the empty plot area, do you see the plotly objects, or is it all blank ?
EnviousStarfish54 Sure, see scatter2d
https://allegro.ai/docs/examples/reporting/scatter_hist_confusion_mat_reporting/#2d-scatter-plots
TenseOstrich47 notice:
```
task.logger.report_matplotlib_figure(
    title=f"Performance Heatmap - {name}",
    series="Device Brand Predictions",
    iteration=0,
    figure=figure,
    report_image=True,
)
```
report_image=True means it will be uploaded as an image, not a plot (like imshow); the default is False, which puts it under the Plots section.
Could you add a few prints and see where it hangs? There's no reason for it to hang (even the plot upload is done ...
Hi UnsightlyShark53, I think you are absolutely right, there is no reason for the
```
trains.errors.UsageError: ArgumentParser.parse_args() ...
```
error.
As you mentioned, if auto_connect_arg_parser=False, it should just ignore what it picked up automatically.
I will make sure the error is resolved, and I will also make sure you will still be able to connect the argparse manually with task.connect(parser) after the Task has been created. Thanks for the reference! I took a look o...
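i.e. a minimal sketch of the intended behavior (project/task names are placeholders):
```
from argparse import ArgumentParser
from trains import Task

parser = ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)

# automatic argparse detection disabled, nothing is picked up implicitly
task = Task.init(project_name="examples", task_name="manual argparse",
                 auto_connect_arg_parser=False)
task.connect(parser)  # explicitly register the parser's arguments
args = parser.parse_args()
```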