JitteryCoyote63 I think that without explicitly adding torch to the requirements, the agent will not be able to automatically resolve the correct CUDA/torch version. Basically, you should add torch to the requirements.txt file and provide it to Task.create, or use Task.force_requirements_env_freeze.
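A minimal sketch of the second option, assuming the requirements file is written by your own script. The helper names `write_requirements` and `freeze_from_requirements` are hypothetical, and the package list is only an illustration. The `clearml` import is kept inside the function so the file-writing helper can be tried without the package installed.

```python
from pathlib import Path


def write_requirements(path, packages):
    """Write one requirement specifier per line and return the file path."""
    path = Path(path)
    path.write_text("\n".join(packages) + "\n")
    return path


def freeze_from_requirements(path):
    """Ask ClearML to record requirements from a file instead of analysing imports.

    Intended to be called before Task.init().
    """
    from clearml import Task  # local import: only needed when actually used

    Task.force_requirements_env_freeze(force=True, requirements_file=str(path))


# Typical order of calls (hypothetical file name and package list):
#   req = write_requirements("requirements.txt", ["torch", "clearml"])
#   freeze_from_requirements(req)
#   task = Task.init(project_name="example", task_name="training")
```

Listing torch explicitly in the file is what gives the agent something to resolve against the CUDA version it finds on the machine.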
GentleSwallow91 how come it does not already find the correct PyTorch version inside the docker container? What clearml-agent version are you using?
it means it should work in ~/clearml.conf, no?
Yes exactly
I was hoping to be able to set the default server-wide
I think this type of server-wide default is not supported in the open-source version.
But in most cases, setting it up on the clearml-agents is probably the important thing. BTW: you can also set it with an OS environment variable, CLEARML_DEFAULT_OUTPUT_URI
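As a rough sketch of the environment-variable route: the variable name comes from the message above, while the bucket URI and the helper name `set_default_output_uri` are hypothetical. The variable has to be in the environment before the task is initialised, so on an agent it would normally be exported in the shell or service definition rather than set from Python.

```python
import os

# Hypothetical destination; replace with your own bucket or file server.
DEFAULT_OUTPUT_URI = "s3://my-bucket/clearml-models"


def set_default_output_uri(uri=DEFAULT_OUTPUT_URI):
    """Set the variable only if the environment has not already defined it."""
    return os.environ.setdefault("CLEARML_DEFAULT_OUTPUT_URI", uri)


# Returns whichever value is now in effect (an existing one wins).
effective_uri = set_default_output_uri()
print(effective_uri)
```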
if project_name is None and Task.current_task() is not None: project_name = Task.current_task().get_project_name()
This should have fixed it, no?
This would be my only improvement, otherwise awesome!!!
output_model.update_weights(weights_filename=os.path.join(training_data_path, 'runs', 'train', 'yolov5s6_results', 'weights', 'best.onnx'))
Probably less secure though :)
I think this is the issue, it was search-and-replaced. The thing is, I'm not sure the helm chart has been updated to clearml. Let me check.
Where can I find information about that? I'd love to join!
This is awesome, we have a few things in mind that we would love to improve. Do you have a lot of experience working with Trains? If you do, what would be most appealing to you?
(We should probably better state it in the GitHub readme)
LuckyRabbit93 We do!!!
Hi ConvolutedBee40
If we deploy a task to clearml-server, will it automatically scale?
The way it works is with agents and the agent glue, basically using k8s as a resource allocator and the clearml agent as the orchestrator. Did that answer the question?
Think multiple hyper-parameter sections that we need to reference
(under the Task's Configuration tab, the hyper-parameters can have multiple sections)
See Args section in the screenshot
"Args/counter"
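The "section/name" convention can be illustrated with a small stand-alone sketch. The function name `group_by_section` and the sample values are hypothetical; the flat dictionary mimics the "Section/name" keys discussed here, and placing unprefixed keys under "General" is an assumption made for this example only.

```python
from collections import defaultdict


def group_by_section(flat_params):
    """Group flat 'Section/name' keys into {section: {name: value}}."""
    grouped = defaultdict(dict)
    for key, value in flat_params.items():
        section, sep, name = key.partition("/")
        if not sep:
            # Assumption for this sketch: keys without a section go to "General".
            section, name = "General", key
        grouped[section][name] = value
    return dict(grouped)


# Hypothetical parameters, keyed the same way as "Args/counter".
params = {"Args/counter": "3", "Args/lr": "0.01", "Model/depth": "18"}
sections = group_by_section(params)
print(sections["Args"]["counter"])
```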
DistressedGoat23 notice the last argument in report_histogram, 'extra_layout'
https://clear.ml/docs/latest/docs/references/sdk/logger#report_histogram
You can then specify the plotly histogram orientation, full details here:
https://plotly.com/javascript/reference/bar/
I'm assuming the one you are after is 'orientation'
https://plotly.com/javascript/reference/bar/#bar-orientation
HealthyStarfish45 the PyCharm plugin is mainly for remote debugging. You can of course use it for local debugging, but there the only value is being able to configure your user credentials and trains-server.
In remote debugging, it makes sure the correct git repo/diff are stored alongside the experiment (this is because PyCharm will not sync the .git folder to the remote machine, so without the plugin Trains will not know the git repo etc.)
Is that helpful ?
Could it be it checks the root target folder and you do not have permissions there only on subfolders?
OK, the doc needs a fix
suggestion?
Ephemeral Dataset, I like that! Is this like, for example, splitting a dataset, then training/testing, and deleting it when done? Making sure the entire pipeline is reproducible, but without storing the data long term?
Hi NastyOtter17
"Project" is so ambiguous
LOL yes, this is something GCP/GS is using:
https://googleapis.dev/python/storage/latest/client.html#module-google.cloud.storage.client
So sharing with the agent is also not possible.
But they can see each others experiments, so why wouldn't the agent be able to have a read-only access ?
BTW:
ReassuredTiger98 you can put your user/pass into the git URL link, but I'm not sure this will solve the privacy issue 😉
Thanks OutrageousGiraffe8
Any chance you can expand the example code into fully reproducible toy code? (I would really like to make sure we fix it)
Hi ShinyPuppy47
getting this error pretty sporadically
What do you mean by "sporadically"? This should be consistent: either there is access to the clearml.conf file or there is not, no?!
What is your setup? Is this coming from the agent or from manual execution?
I guess that was never the intention of the function, it just returns the internal representation. Actually my question would be, how do you use it, and why? :)
This only talks about bug reporting and enhancement suggestions
I'll make sure this is fixed 🙂
Hi ReassuredTiger98
To distinguish between minio and S3 we use:
s3://bucket/file for the AWS S3 service, and s3://server:port/bucket/file for minio.
This means that if your S3 links had been s3://<minio-address>:<port>/bucket/file.bin, the UI would have popped up the credentials window.
Make sense?
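The rule described above (a host with an explicit port means minio, a bare bucket name means AWS S3) can be sketched with the standard library. This is only an illustration of the convention, not ClearML's actual implementation; the function name `classify_s3_url` and the example addresses are hypothetical.

```python
from urllib.parse import urlsplit


def classify_s3_url(url):
    """Return ('minio', host, port) or ('aws', bucket, None) for an s3:// URL."""
    parts = urlsplit(url)
    if parts.scheme != "s3":
        raise ValueError("expected an s3:// URL, got: " + url)
    if parts.port is not None:
        # s3://server:port/bucket/file -> self-hosted endpoint such as minio
        return "minio", parts.hostname, parts.port
    # s3://bucket/file -> the first component is the AWS bucket name
    return "aws", parts.netloc, None


print(classify_s3_url("s3://bucket/file"))
print(classify_s3_url("s3://minio.local:9000/bucket/file.bin"))
```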
I see now.
Let's assume you know which snapshot that was:
` from clearml import Task
prev_task = Task.get_task(task_id='the_first_training_task_id')
# get the second from last checkpoint
prev_task.models['output'][-2].url
prev_scalars = prev_task.get_reported_scalars()
new_task = Task.init('example', 'new task')
logger = new_task.get_logger()
# do some for loop and report the prev_scalars with logger.report_scalar
new_task.flush(wait_for_uploads=True)
new_task.set_initial_iteration(22000)
# start the training `
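The "for loop" step might look like the sketch below. It assumes the scalars come back as a nested dictionary of the form {title: {series: {'x': [...], 'y': [...]}}}; check the actual structure returned for your task before relying on it. `replay_scalars` and `RecordingLogger` are hypothetical names; the latter is a stand-in used here so the loop can be tried without a server, and in real use you would pass `new_task.get_logger()` instead.

```python
def replay_scalars(prev_scalars, logger):
    """Re-report every (iteration, value) point through logger.report_scalar."""
    count = 0
    for title, series_map in prev_scalars.items():
        for series, points in series_map.items():
            for x, y in zip(points["x"], points["y"]):
                logger.report_scalar(
                    title=title, series=series, value=y, iteration=int(x)
                )
                count += 1
    return count


class RecordingLogger:
    """Stand-in that records calls instead of sending them anywhere."""

    def __init__(self):
        self.calls = []

    def report_scalar(self, title, series, value, iteration):
        self.calls.append((title, series, value, iteration))


# Hypothetical data in the assumed layout.
sample = {"loss": {"train": {"x": [0, 1], "y": [0.9, 0.7]}}}
recorder = RecordingLogger()
reported = replay_scalars(sample, recorder)
```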
For reporting the console logs you can use: logger.report_text("my log line here", print_console=False)
https://github.com/allegroai/clearml/blob/b4942321340563724bc16f60ea5dd78c9161778d/clearml/logger.py#L120