There are an astounding number of such channels, actually. It probably depends on your style. Would you like me to recommend some?
Of course we can always create a channel here as well... One more can't hurt
Okay, I'll make a channel here today, and its sticky post will be a list of other channels.
For now here is one of my favorites:
Hi SubstantialElk6, have a look at Task.execute_remotely; it's designed especially for that. For instance, in the recent webinar I used pytorch-cpu on my laptop with task.execute_remotely, and the agent automatically installed the GPU version. Example: https://github.com/abiller/events/blob/webinars/webinars/flower_detection_rnd/A1_dataset_input.py
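A minimal sketch of that pattern (the project, task, and queue names here are placeholders, and it assumes a configured ClearML/Trains installation):

```python
def train_remotely(queue: str = "default") -> None:
    """Sketch: develop locally on CPU, then hand the task to an agent.

    Assumes `clearml` is installed and configured; all names are placeholders.
    """
    from clearml import Task

    task = Task.init(project_name="examples", task_name="flower_detection")
    # Everything up to this call runs locally (e.g. with pytorch-cpu).
    # execute_remotely() enqueues the task and stops the local process;
    # the agent then rebuilds the environment on the worker, where it
    # resolves the GPU build of torch on its own.
    task.execute_remotely(queue_name=queue, exit_process=True)
```

The key point is that nothing after `execute_remotely()` runs locally, so the local environment never needs the GPU packages.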
Another tip - if you have uncommitted changes on top of a commit, you will have to push that commit before the agent can successfully apply the diff in remote mode.
I need to check something for you, EnviousStarfish54 - I think one of our upcoming versions should have something to "write home about" in that regard.
EnviousStarfish54, let's refine the discussion - are you looking at structured data (tables, etc.) or unstructured (audio, images, etc.)?
EnviousStarfish54 that is the intention - it is cached. But you might need to manage your cache settings if you have many of those, since the cache size starts out at a sane default. Hope this helps.
This looks like a genuine git fetch issue. Trains would have problems figuring out the diff if git cannot find the base commit...
Do you have submodules on the repo? Did the DS push their commits?
submodules == git submodules
That's how it is supposed to work - let us know if it does not.
There are several ways of doing what you need, but none of them are 'magical' in the way we pride ourselves on. For that, we would need user input like yours in order to find the commonalities.
That's interesting, how would you select experiments to be viewed by the dashboard?
WackyRabbit7 It is conceptually different from actually training, etc.
The services agent is usually one without a GPU; it runs several tasks, each in its own container - for example, the autoscaler and the orchestrators for our hyperparameter optimization and/or pipelines. I think it even uses the same hardware (by default?) as the trains-server.
Also, if I'm not mistaken some people are using it (planning to?) to push models to production.
I wonder if anyone else can share their view since this is a relati...
It's built in, and it's for... "Services"
https://github.com/allegroai/trains-server#trains-agent-services--
Did you try archiving all the experiments and then deleting the project?
Hi, I was just answering your previous question. Can you explain a bit what you mean by "under-utilized"? E.g. do you have 2 GPUs and are using only one of them for a task?
Or are you maxing out resources but not reaching 100% utilization (which might be a data-pipeline issue)?
EnviousStarfish54 I recognize this table - I'm glad you are already talking with the right person. I hope you will get all your questions answered.
SubstantialBaldeagle49 not at the moment, but it is just a matter of implementing an apiclient call. You can open a feature request for a demo on GitHub - that will help make it happen sooner rather than later.
If she does not push, Trains stores a commit ID for the task that does not exist on the git server. If she does not commit, Trains will hold the entire diff from the last commit that is on the server.
Hi SubstantialBaldeagle49 ,
Certainly, if you upload all the training images or even all the test images, it will have a huge bandwidth/storage cost (I believe bandwidth does not matter, e.g. if you are using S3 from EC2). If you need to store all the detection results (for example, for QA or regression testing), you can always save the detections JSON as an artifact and view it later in your dev environment when you need it. The best option would be to only upload "control" images and "interesting" im...
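For the detections-as-artifact idea, here is a rough sketch (the detection fields and artifact name are made up for illustration):

```python
import json

# Hypothetical detection results you would otherwise keep as annotated images
detections = [
    {"image": "frame_0001.jpg", "boxes": [[34, 50, 120, 200]], "scores": [0.93]},
    {"image": "frame_0002.jpg", "boxes": [], "scores": []},
]

# Serialize once so the same payload could also be written to disk
payload = json.dumps(detections, indent=2)

try:
    from clearml import Task  # only used when clearml is installed

    task = Task.current_task()
    if task is not None:
        # Store the lightweight JSON instead of the full images;
        # it can be downloaded and rendered later in a dev environment.
        task.upload_artifact(name="detections", artifact_object=detections)
except ImportError:
    pass  # clearml not available; keep the JSON payload locally instead
```

The JSON stays small regardless of image resolution, which is what keeps the bandwidth/storage cost down.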
I'm not sure I can help with the technicality, but here is a basic question you'll be asked - are you able to download anything from your minio using ClearML?
BattyLion34 this is up to the discretion of the meetup organizers. In any case, I am going to use the same demos to create several of my stuffed animal videos (we can also upload the same videos without the stuffed animals if there is demand for that).
I think the most up-to-date documentation for that is currently on the github repo, right SuccessfulKoala55 ?
https://github.com/allegroai/clearml-server-helm
Hi! Looks like all the processes are calling torch.save, so it's probably reflecting what Lightning did behind the curtain. Definitely not a feature, though. Do you mind reporting this to our GitHub repo? Are you also getting duplicate experiments?
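If the goal is to avoid N processes writing N checkpoints, the usual workaround is a rank-zero guard. A torch-free sketch of the idea (in a real PyTorch job you would use torch.save instead of json.dump, and `global_rank` would come from your launcher or trainer):

```python
import json


def save_checkpoint(state: dict, path: str, global_rank: int) -> bool:
    """Write the checkpoint from rank 0 only.

    In DDP every process executes the training script, so an unguarded
    save runs once per process. Returns True if this process wrote the
    file. (Sketch: swap json.dump for torch.save in a real job.)
    """
    if global_rank != 0:
        return False
    with open(path, "w") as handle:
        json.dump(state, handle)
    return True
```

With the guard in place, only one file is written no matter how many worker processes Lightning spawns.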