
Hi RotundSquirrel78
How did you end up with this command line?
/home/sigalr/.clearml/venvs-builds/3.8/code/unet_sindiff_1_level_2_resblk --dataset humanml --device 0 --arch unet --channel_mult 1 --num_res_blocks 2 --use_scale_shift_norm --use_checkpoint --num_steps 300000
The arguments passed are odd (there should be none; they are passed inside the Task's execution section), and I suspect this is the issue.
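For reference, a minimal sketch (project/task names and arguments are placeholders) of how the arguments are normally captured: ClearML picks up the argparse values automatically at Task.init(), and the agent injects whatever is stored on the Task at execution time, so nothing should be appended to the command line itself.

import argparse
from clearml import Task

# Placeholder project/task names; the argparse values end up in the Task's Args section
task = Task.init(project_name="examples", task_name="unet training")

parser = argparse.ArgumentParser()
parser.add_argument("--dataset", default="humanml")
parser.add_argument("--num_steps", type=int, default=300000)
args = parser.parse_args()  # when run by the agent, the stored Task values override these defaults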
Hi LovelyHamster1
That is a good point. I think the safest / most robust way is to configure both to use the same DNS name(s), so both (internal/external) are accessible.
Some background: the URL stored on the artifact is basically standalone; once registered on the Task, the UI will not replace it but use it as-is (the UI has no "understanding" of which server it points to, it will just fetch the file).
Are you also using a diff port on the load balancer ?
(because the easiest fix is on your external ...
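For example, a minimal sketch (the DNS name, port and artifact contents are placeholders) of registering artifacts against a single shared DNS name, so the stored URL works from both inside and outside:

from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="artifact upload",                           # placeholder names
    output_uri="http://clearml-files.mycompany.com:8081",  # shared DNS name + fileserver port (placeholder)
)
# The URL registered for this artifact is stored as-is and fetched verbatim by the UI
task.upload_artifact(name="results", artifact_object={"accuracy": 0.9})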
Hi NastyFox63
It seems like most of the reports are converted to PNGs (which is what the automagic does if it fails to convert the matplotlib figure into an interactive plot).
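For reference, a minimal sketch (placeholder names) of the two reporting paths: the automatic capture of plt.show(), which falls back to a PNG when the figure cannot be converted, and an explicit Logger call that reports the figure as an image on purpose.

import matplotlib.pyplot as plt
from clearml import Task

task = Task.init(project_name="examples", task_name="plot demo")  # placeholder names

fig = plt.figure()
plt.plot([1, 2, 3], [4, 5, 6])
plt.show()  # auto-captured; converted to an interactive plot when possible, PNG otherwise

task.get_logger().report_matplotlib_figure(
    title="manual report", series="series A", iteration=0,
    figure=fig, report_image=True,  # force reporting as an image (PNG)
)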
no more than 114 plots are shown in the plots tab.
Are you saying there is a 114-plot limit?
Is this true for "full screen" mode as well (i.e. not in the experiments table, but after switching to the full detailed view)?
actually no
hmm, are those packages correct ?
Hi CharmingShrimp37
Go to your newly forked repo on GitHub; you should see a green button suggesting to take your branch and turn it into a PR. It is that simple 🙂
Hi DeliciousBluewhale87
You mean per Task? Is it reporting? Is it like the project overview?
Yes, including this. (There was a fix for an issue with trains-agent and disabling frameworks; it is already part of 0.16.3.)
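For reference, a minimal sketch (placeholder names) of disabling specific framework auto-logging at Task.init() time, which is the behavior that fix relates to:

from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="no framework logging",  # placeholder names
    # only the listed frameworks are switched off, everything else keeps auto-logging
    auto_connect_frameworks={"pytorch": False, "matplotlib": False},
)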
RoughTiger69
Apparently, ... doesn't populate that dict with any keys that don't already exist in it.
Are you saying new entries are not added to the dict even if they exist on the Task (i.e. only entries that already exist in the dict are populated)?
But you already have all the entries defined here:
https://github.com/allegroai/clearml/blob/721569bb77d89d89e5b4f32a0ed98311c4574650/examples/services/aws-autoscaler/aws_autoscaler.py#L22
Since all this is ha...
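For context, a minimal sketch (placeholder names and keys) of how connecting a dict works: the keys you define locally are registered on the Task, and when an agent executes the Task their values are replaced with whatever is stored on it, which is why the autoscaler example pre-defines every entry it relies on.

from clearml import Task

task = Task.init(project_name="examples", task_name="config demo")  # placeholder names

hyper_params = {
    "cloud_provider": "",     # every key the code relies on is declared up front
    "max_idle_time_min": 15,
}
hyper_params = task.connect(hyper_params)  # values can be overridden from the UI / stored Task
print(hyper_params)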
Send the full Task log; you can DM it if that is easier.
I can see that the data is reloaded each time, even if the machine was not shut down in between.
You can verify by looking into the Task's log; it will contain all the docker arguments, and one of them should be the cache folder mount.
DepressedChimpanzee34
I am actually curious now, why is the default like this? maybe more people are facing similar bottlenecks?
On "regular" load there is no need for multiple processes, and the memory consumption might be more important than reply lag (at least before you start to scale)
DisturbedWalrus17
By spawning multiple processes for the API server, it looks like we utilise the CPU more now but the UI and API calls are still lagging a lot
Can you try with even more ...
SarcasticSquirrel56 when the process dies (i.e. is killed) it does not have time to update the state, so the server watchdog will set the state to aborted after a period of inactivity (the default is 2 hours).
Martin, if you want, feel free to add your answer in the stackoverflow so that I can mark it as a solution.
Will do 🙂 give me 5
Let me know if there is an issue 🙂
SmarmySeaurchin8 regarding the original question:
task.set_project(project_id)
Task.get_projects() to get all the project names/ids
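Putting the two together, a minimal sketch (the project/task names are placeholders):

from clearml import Task

task = Task.init(project_name="examples", task_name="move me")  # placeholder names

projects = Task.get_projects()  # all projects, each with a name and an id
target = next(p for p in projects if p.name == "destination_project")  # placeholder project name
task.set_project(project_id=target.id)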
Ohh that's why you don't have it 🙂
Wait, with the Port it does not work?
Notice that since this is an external S3 you have to specify the port, so it knows this is not AWS S3 but a different compatible service.
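For example, a minimal sketch (host, port and bucket are placeholders) of pointing the output at an S3-compatible service; the explicit host:port in the URI is what marks it as non-AWS (the matching credentials for that host still come from clearml.conf):

from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="s3 output",                      # placeholder names
    output_uri="s3://my-minio:9000/my-bucket",  # host:port/bucket, placeholder values
)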
not sure if this is considered a bug or not! but I'd happily make an issue on GitHub if needed.
I think we should, at least for the sake of transparency and visibility 🙂
thanks again for all your help.
My pleasure 🙂
that must have been it. Here's the installed packages when not using -m:
Hmm yes, can you open a GitHub issue on that? (this seems like a bug)
if I run my own ClearML self-hosted server?
Then you have everything on your end; it will not communicate with the SaaS offering, meaning no limits whatsoever.
(That said some of the cloud auto-scaling and compute features are not part of the open source)
BTW: I tested the code you previously attached, and it showed the plot in the "Plots" section
(Tested with latest trains from GitHub)
I wanted to know what the best way to create and register the SSL keys is.
Oh I see, so basically you need to add nginx with SSL certificates on top of the hosted service (or configure the docker-compose nginx container to add that).
Then you need to add the self-signed SSL certificate to any host machine (I'm assuming these are not "valid" SSL certificates generated by a reputable SSL provider).
But generally speaking if you are using self hosted clearml-server on a local machine that n...
TroubledHedgehog16 generally speaking you can expect about 10 API calls per minute if you have many reports, and about 3 per minute with low reporting. We just optimized the SDK so that lots of consecutive reports are batched better; I would recommend the latest RC.
WickedGoat98 Actually the fileserver replied, so it all looks fine to me.
Try to run the text example again and see if you are still getting the fileserver error.
Could it be you have old OS environment variables overriding the configuration file?
Can you change the IP of the server in the conf file, and make sure it has an effect (i.e. the error changed)?
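To quickly check for leftover environment overrides, a minimal sketch that just prints the host variables which, as far as I know, take precedence over clearml.conf when set (the TRAINS_* ones are the legacy names):

import os

# Any of these being set overrides the corresponding clearml.conf entry
for var in ("CLEARML_API_HOST", "CLEARML_WEB_HOST", "CLEARML_FILES_HOST",
            "TRAINS_API_HOST", "TRAINS_WEB_HOST", "TRAINS_FILES_HOST"):
    print(var, "=", os.environ.get(var))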
Oh I see, what you need is to pass '--script script.py' as the entry point and '--cwd folder' as the working dir.