Reputation
Badges 1
25 × Eureka!Hi ExasperatedCrocodile76
It seems like it is using conda package manager, were you using conda when you run the code manually ?ERROR: This cross-compiler package contains no program /home/ivan/miniconda3/envs/clearML/bin/x86_64-conda_cos6-linux-gnu-gfortran
Why is it trying to install from source code?
BTW: can you test with the latest agent RC? ( pip install clearml-agent==1.4.0rc4
)
I think this is the issue, it was search and replaced . The thing is I'm not sure the helm chart is updated to clearml. Let me check
Where can I find information about that? I'd love to join!
This awesome , we have a few things in mind that we would love to improve. Do you have a lot of experience working with Trains? If you do, what would be most appealing for you ?
(We should probably better state it in the GitHub readme)
LuckyRabbit93 We do!!!
Hi ConvolutedBee40
If we deploy a task to
clearml-server
, will it automatically scale?
The way it works is with agents and agent glue, basically using k8s as a resource allocator and the clearml agent as orchestrator, did that answer the question ?
i think it can only run on multiple GPU at one node
Okay, the first step is to make sure your code is multi-node enabled, there is no magic for that 🙂
Hi DeliciousBluewhale87
This sounds like a great workflow to implement.
I guess my first question is how do you imagine the manager/director interacting with the system? What will they be shown, to allow them to approve/decline the model promotion ?
Think multiple hyper-paremter sections that we need to reference
(under the Tasks Configuration Tab, the Hyper parameters can have multiple sections)
See Args section in the screenshot
"Args/counter"
DistressedGoat23 notice the last argument in report_histogram, 'extra_layout'
https://clear.ml/docs/latest/docs/references/sdk/logger#report_histogram
You can then specify the plotly histogram orientation, full details here:
https://plotly.com/javascript/reference/bar/
I'm assuming the one you are after is 'orientation '
https://plotly.com/javascript/reference/bar/#bar-orientation
HealthyStarfish45 the pycharm plugin is mainly for remote debugging, you can of course use it for local debugging but the value is just to be able to configure your user credentials and trains-server.
In remote debbugging, it will make sure the correct git repo/diff are stored alongside the experiment (this is due to the fact that pycharm will no sync the .git folder to the remote machine, so without the plugin Trains will not know the git repo etc.)
Is that helpful ?
Could it be it checks the root target folder and you do not have permissions there only on subfolders?
Ok the doc needs fix (edited)
suggestion?
Ephemeral Dataset, I like that! Is this like splitting a dataset for example, then training/testing, when done deleting. Making sure the entire pipeline is reproducible, but without storing the data long term?
How can i get loaded model in Preporcess class in ClearML Serving?
ComfortableShark77
You mean your preprocess class needs a python package or is it your own module ?
Hi NastyOtter17
"Project" is so ambiguous
LOL yes, this is something GCP/GS is using:
https://googleapis.dev/python/storage/latest/client.html#module-google.cloud.storage.client
So sharing with the agent is also not possible.
But they can see each others experiments, so why wouldn't the agent be able to have a read-only access ?
BTW:
ReassuredTiger98 you can put your user/pass into the git URL link, but I'm not sure this will solve the privacy issue 😉
Thanks OutrageousGiraffe8
Any chance you can expand the example code to be a fully a reproducible toy code? (I would really like to make sure we fix it)
Hi ShinyPuppy47
getting this error pretty sprotically
What do you mean by "sporadically" ? This should be consistent ,either there is access to the clearml.conf, file or not. no ?!
What is your setup? Is this coming from the agent or manual execution ?
I guess that was never the intention of the function, it just returns the internal representation. Actually my question would be, how do you use it, and why? :)
This only talks about bugs reporting and enhancement suggestions
I'll make sure this is fixed 🙂
Hi ReassuredTiger98
To separate between minio and S3 we use:
s3://bucket/file for AWS S3 service and s3://server :port/bucket/file
for minio.
this means if your S3 links would have been s3://<minio-address>:<port>/bucket/file.bin
the UI would have popped the cred window.
Make sense ?
I see now.
Let's assume you know which snapshot that was:
` prev_task = Task.get_task(task_id='the_first_training_task_id')
get the second from last checkpoint
task.models['output'][-2].url
prev_scalars = prev_task.get_reported_scalars()
new_task = Task.init('example', 'new task')
logger = new_task.get_logger()
do some fpr loop and report the prev_scalars with logger.report_scalars
new_task.flush(wait_for_uploads=True)
new_task.set_initial_iteration(22000)
start the train `
For reporting the console logs you can use :logger.report_text("my log line here", print_console=False)
https://github.com/allegroai/clearml/blob/b4942321340563724bc16f60ea5dd78c9161778d/clearml/logger.py#L120
Parent makes sense if you are changing the data of the parent version, but some data is preserved. Which will make the delta-based storage only store the diff.
If everything is different, and you call sync
for example, then it will not reference any previous "snapshot", so there will be no redundancy in storage, but you still get a pointer to the "parent" version.
Make sense ?