i mean linking more in the UI.. as when i go to a model detail page, i can see that a given experiment created this model and click on that to see its detail... so something similar to that for ensemble models
so as you say.. i don't think the issue i am seeing is due to this error
i don't need this right away.. i just wanted to know the possibility of dividing the current machine into multiple workers... i guess if it's not readily available then maybe you guys can discuss to see if it makes sense to have it on the roadmap..
i am simply proxying it using ssh port forwarding
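for reference, something like this (assuming the default trains-server ports 8080 webapp / 8008 api / 8081 fileserver; `user@trains-host` is a placeholder):

```shell
# forward the default trains-server ports over ssh
# (8080 = web UI, 8008 = API server, 8081 = file server)
# "user" and "trains-host" are placeholders for the real values
ssh -N \
  -L 8080:localhost:8080 \
  -L 8008:localhost:8008 \
  -L 8081:localhost:8081 \
  user@trains-host
```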
TimelyPenguin76 also is there any reason for treating `show` and `imshow` differently
allegroai/trains
image hash f038c8c6652d
allegroai/trains-agent-service
image hash 03dc85869afe
or is there any plan to fix it in upcoming release
TimelyPenguin76 yeah when i run matplotlib with `show`, plots do land under the Plot section... so it's mainly the `imshow` part.. i am wondering why the distinction and what the usual way is to emit plots to debug samples
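in case it's useful, one workaround sketch i had in mind (my own assumption, not necessarily the official way): render the figure to an array and report it explicitly as an image, instead of relying on the automatic `imshow` capture:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend, no window needed
import matplotlib.pyplot as plt

def figure_to_array(fig):
    """Render a matplotlib figure to an RGB uint8 array."""
    fig.canvas.draw()
    rgba = np.asarray(fig.canvas.buffer_rgba())
    return rgba[:, :, :3].copy()

fig, ax = plt.subplots()
ax.matshow(np.random.rand(4, 4))  # e.g. a confusion matrix
img = figure_to_array(fig)

# then report it as a debug sample, e.g. (assuming a trains Task is set up):
# Task.current_task().get_logger().report_image(
#     "confusion matrix", "val", iteration=0, image=img)
```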
i know it's not magic... all linux subsystem underneath.. just need to configure it as needed 🙂 for now i think i will stick with the current setup of cpu-only mode and coordinate within the team. later on when the need comes.. we will see if we go for k8s or not
simply changing to `show` doesn't work in my case as i am displaying a CM.. what about if i use `matshow`
seems like CORS issue in the console logs
TimelyPenguin76 is there any way to do this using the UI directly or as a schedule... otherwise i think i will run the cleanup_service as given in the docs...
ok will give it a try and let you know
yeah i still see it.. but that seems to be due to the dns address being blocked by our datacenter
whereas i am using simple matplotlib now
not just fairness, but the scheduled workloads will be starved of resources if, say, someone runs a training job which by default takes all the available cpus
thanks for letting me know.. but it turns out that after i recreated my whole system environment from scratch, trains-agent is working as expected..
Cross-Origin Request Blocked: The Same Origin Policy disallows reading the remote resource at
http://localhost:8081/Trains%20Test/LightGBM.56ca0c9c9ebf4800b7e4f537295d942c/metrics/LightGBM%20Feature%20Importance%20above%200.0%20threshold/plot%20image/LightGBM%20Feature%20Importance%20above%200.0%20threshold_plot%20image_00000000.png . (Reason: CORS request did not succeed).
seems like port forwarding had an issue.. fixed that.. now running the test again to see if things work out as expected
i guess maybe i was not so clear.. say e.g. you are running lightgbm model training; by default it will take all the cpus available on the box and run that many threads. now another task gets scheduled on the same box and you have 2x threads with the same amount of cpu to schedule on. so yes the jobs will progress, but the progression will not be the same due to context switches, which will happen way more often than if we had allowed only 1/2x the threads for each job
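as a concrete sketch of what i mean (the helper name is my own; the actual cap would be passed via LightGBM's `num_threads` / `n_jobs` parameter):

```python
import os

def threads_per_job(n_jobs, total_cpus=None):
    """Split the box's cores evenly across co-scheduled jobs,
    instead of letting every job default to all cores."""
    total_cpus = total_cpus or os.cpu_count() or 1
    return max(1, total_cpus // n_jobs)

# e.g. on an 8-core box with 2 co-scheduled jobs, give each 4 threads:
# lgb.train({"num_threads": threads_per_job(2), ...}, train_set)
print(threads_per_job(2, total_cpus=8))  # -> 4
```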
AgitatedDove14 it seems i am having issues when i restart the agent... it fails in creating/setting up the env again... when i clean up the `.trains/venv-builds` folder and run a job through the agent, it is able to create the env fine and run the job successfully.. when i restart the agent it fails with messages like
` Requirement already satisfied: cffi@ file:///home/conda/feedstock_root/build_artifacts/cffi_1595805535531/work from file:///home/conda/feedstock_root/build_artifacts/cffi_1595805535...