For setting up trains-server I would recommend the docker-compose; it is very easy to set up, and you just need a single fixed compute instance. Details: https://github.com/allegroai/trains-server/blob/master/docs/install_linux_mac.md
With regards to the "low prio clusters", are you asking how they could be connected with the trains-agent, or if running code that uses trains will work on them?
and I found our lab seems to only share the user files, because I installed trains on one node but it doesn't appear on the others
Do you mean there is no shared filesystem among the different machines ?
And do you need to run your code inside a docker, or is venv enough ?
ShaggyHare67 in the HPO the learning rate parameter should be (based on the above): General/training_config/optimizer_params/Adam/learning_rate
Notice the "General" prefix (notice it is case sensitive)
Yes, this seems like the problem, you do not have an agent (trains-agent) connected to your server.
The agent is responsible for pulling the experiments and executing them.
pip install trains-agent
trains-agent init
trains-agent daemon --gpus all
trains-agent doesn't run the clone, it is pip...
basically calling "pip install git+https://..."
Not sure you can pass extra arguments
Also, this is not a setup problem, otherwise it would have been failing consistently ... this actually looks like a network issue.
The only thing I can think of is retrying the install if we get a network error (not sure what the exit code of pip is though, maybe 9?)
Hi SubstantialElk6
try:
--docker "<image_name> --privileged"
Notice the quotes
I hope it can run the same day too.
Fix should be in the next RC 🙂
-rw------- 1 1000 1000 0 Feb 28 23:41 config
Yes (Mine isn't and it is working 🙂 )
Sure, venv mode
MagnificentPig49 that's a good question, I'll ask the guys 🙂
BTW, I think the main issue is actually making sure there is enough documentation on how to compile it...
Anyhow I'll update here
Hi, is there a way to force the requirements.txt?
You mean to ignore the "Installed Packages" ?
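If the goal is to have the agent install from a requirements.txt instead of the captured "Installed Packages", here is a sketch of one way to do it, assuming your SDK version exposes Task.force_requirements_env_freeze (project/task names are placeholders):
from clearml import Task

# Must be called *before* Task.init() so the "Installed Packages" section is taken
# from the given requirements.txt instead of the automatically analyzed imports.
Task.force_requirements_env_freeze(force=True, requirements_file='requirements.txt')
task = Task.init(project_name='examples', task_name='force requirements')  # placeholder names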
Hmm, I think I need more info to try and reproduce it. What exactly did you do, and what was the expected behavior vs. reality?
I found "scheduler" on allegroai github, is it something related to the case I want to make?
MoodyCentipede68 it is exactly what you are looking for 🙂
Do notice that you need to make sure you have your services queue configured and running for that to work 🙂
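As a rough sketch of how that could look, assuming the clearml.automation.TaskScheduler interface (the task id, schedule and queue names below are placeholders), with a clearml-agent running in services mode on the "services" queue:
from clearml.automation import TaskScheduler

scheduler = TaskScheduler()
# Re-launch an existing template task every day at 07:30 on the "default" queue
scheduler.add_task(
    schedule_task_id='<template_task_id>',  # placeholder
    queue='default',
    minute=30,
    hour=7,
)
# The scheduler itself should run as a service, e.g. on the "services" queue
scheduler.start_remotely(queue='services')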
Thank you AttractiveWoodpecker16 !
Removing the uncommitted changes so that you can launch it from an agent? Or is it visual only?
I can raise this as an issue on the repo if that is useful?
I think this is a good idea, at least increased visibility 🙂
Please do 🙂
Hmm SuccessfulKoala55 any chance the nginx http was pushed to v1.1 on the latest cloud helm chart?
Hi JitteryCoyote63
could you check if the problem exists in the latest RC?
pip install clearml==1.0.4rc1
Is the agent itself registered on the clearml-server (a.k.a can you see it in the UI?)
SweetGiraffe8 Works when I'm using plotly...
Can you please copy paste the code with the plotly, it's probably something I'm missing
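Just as a point of reference, a minimal plotly snippet that should report cleanly, assuming an SDK version that exposes Logger.report_plotly (project/task names are placeholders):
import plotly.graph_objects as go
from clearml import Task

task = Task.init(project_name='examples', task_name='plotly report')  # placeholder names
fig = go.Figure(data=go.Scatter(x=[1, 2, 3], y=[4, 1, 7], mode='markers'))
task.get_logger().report_plotly(title='my plot', series='scatter', iteration=0, figure=fig)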
Hi @<1523702786867335168:profile|AdventurousButterfly15>
I do not think they log more than that ?!
(what happens if you use TB?)
EnviousStarfish54 Sure, see scatter2d
https://allegro.ai/docs/examples/reporting/scatter_hist_confusion_mat_reporting/#2d-scatter-plots
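Something along these lines (a minimal sketch based on that example; project/task names are placeholders):
import numpy as np
from clearml import Task

task = Task.init(project_name='examples', task_name='2d scatter')  # placeholder names
scatter = np.random.randint(10, size=(10, 2))  # Nx2 array of (x, y) points
task.get_logger().report_scatter2d(
    title='example_scatter',
    series='series_xy',
    iteration=0,
    scatter=scatter,
    xaxis='x',
    yaxis='y',
    mode='markers',
)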
how can I for example convert it back to a pandas dataframe?
You can always report a csv file with report_media as well, or if this is not just for debugging, maybe an artifact?
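For example, if the dataframe is uploaded as an artifact it can be pulled back as a pandas DataFrame from another script; a sketch (task id and artifact name are placeholders):
import pandas as pd
from clearml import Task

# Producing side
task = Task.init(project_name='examples', task_name='df artifact')  # placeholder names
df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})
task.upload_artifact(name='my_dataframe', artifact_object=df)

# Consuming side (another script/process)
source = Task.get_task(task_id='<producer_task_id>')  # placeholder id
df_back = source.artifacts['my_dataframe'].get()  # returns a pandas DataFrame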
Your code should have worked, i.e. you should see the 'model.h5' in the artifacts tab. What do you have there?
It should look something like this one:
https://demoapp.trains.allegro.ai/projects/531785e122644ca5b85b2e19b0321def/experiments/e185cf31b2634e95abc7f9fbdef60e0f/artifacts/output-model
BTW:
To manually register any model:
from trains import Task, OutputModel
task = Task.init('examples', 'my model')
OutputModel().update_weights('my_best_model.h5')