Switching to process Pool might be a bit of an overkill here (I think)
wdyt?
Just making sure, can the machine that you were running "trains-init" on access the API server?
Hi ShakyJellyfish91
Check mount default here:
https://github.com/allegroai/clearml-agent/blob/e416ab526ba9fe05daa977b34c9e46b50fb214a0/docs/clearml.conf#L186
Is this what you are after, or do you actually want to change the start up script?
you should have something like 192.168... or 10.0 ....
AgitatedTurtle16 from the screenshot, it seems the Task is stuck in the queue, which means there is no agent running to actually run the interactive session.
Basic setup:
- A machine running clearml-agent (this is the "remote machine")
- A machine running clearml-session (let's call it laptop)
You need to first start the agent on the "remote machine" (basically call clearml-agent daemon --docker --queue default ). Once the agent is running on the remote machine, from your laptop ru...
I think the clearml-session CLI is missing the ability to add a custom port to the external address, does that make sense?
But out of curiosity, what's the point of doing a hyperparameter search on the value of the loss at the last epoch of the experiment?
The problem is that you might end up with a global min that is really nice, but it was 3 epochs ago, and you only have the last checkpoint ...
BTW, global min and last min should not be very different if the model converges, wdyt?
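To make the "global min vs. last value" point concrete, here is a minimal sketch in plain Python. The loss curve is made up for illustration and the helper name `last_vs_global_min` is hypothetical, not part of any library:

```python
from typing import Sequence, Tuple


def last_vs_global_min(losses: Sequence[float]) -> Tuple[float, float, int]:
    """Return (last value, global minimum, epochs since the global minimum)."""
    if not losses:
        raise ValueError("losses must not be empty")
    # Index of the epoch with the lowest loss (first one on ties)
    best_epoch = min(range(len(losses)), key=lambda i: losses[i])
    return losses[-1], losses[best_epoch], len(losses) - 1 - best_epoch


# Hypothetical validation-loss curve: the best value is 3 epochs before the end
curve = [0.90, 0.55, 0.31, 0.48, 0.52, 0.50]
last, best, age = last_vs_global_min(curve)
print(f"last={last}, global_min={best}, epochs_ago={age}")
```

If the run converged, the two numbers should be close; a large gap means the checkpoint you actually have is not the one that produced the reported minimum.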
I understand, but then the toml file needs to be parsed to ensure poetry is used. It's just a tool entry in the pyproject.toml.
Probably too much for the agent... and specifically it seems poetry actually managed to parse it?! what are you getting in the log?
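For reference, checking a pyproject.toml for a poetry tool entry could look roughly like the sketch below. This only illustrates the idea and is not how the agent decides anything; the function name `uses_poetry` is hypothetical, and the regex fallback for interpreters without `tomllib` (added in Python 3.11) is a simplification:

```python
import re

try:
    import tomllib  # standard library in Python 3.11+
except ImportError:  # older interpreters: fall back to a simple header scan
    tomllib = None


def uses_poetry(pyproject_text: str) -> bool:
    """Return True if the TOML text declares a [tool.poetry] table."""
    if tomllib is not None:
        data = tomllib.loads(pyproject_text)
        return "poetry" in data.get("tool", {})
    # Fallback: look for a [tool.poetry] or [tool.poetry.*] table header
    pattern = r"^\s*\[tool\.poetry(\.|\])"
    return re.search(pattern, pyproject_text, re.MULTILINE) is not None


print(uses_poetry('[tool.poetry]\nname = "demo"\n'))
print(uses_poetry('[build-system]\nrequires = ["setuptools"]\n'))
```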
Well, if the "video" from TB is not in mp4/gif format then someone will have to encode it.
I was just pointing that for the encoding part we might need additional package
The problem is, the configuration is loaded at import time, so there is no "time" to pass anything other than environment variable.
That said, if the only difference is the server config you can use Task.set_credentials
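A minimal sketch of that approach, assuming the `clearml` package and the keyword names `api_host`, `web_host`, `files_host`, `key` and `secret` for `Task.set_credentials`. The server profiles, hosts and keys are placeholders, and the helpers `credentials_for` and `connect` are hypothetical:

```python
from typing import Dict

# Hypothetical server profiles; every host and key here is a placeholder
SERVERS: Dict[str, Dict[str, str]] = {
    "staging": {
        "api_host": "https://api.staging.example.com",
        "web_host": "https://app.staging.example.com",
        "files_host": "https://files.staging.example.com",
        "key": "STAGING_ACCESS_KEY",
        "secret": "STAGING_SECRET_KEY",
    },
    "prod": {
        "api_host": "https://api.prod.example.com",
        "web_host": "https://app.prod.example.com",
        "files_host": "https://files.prod.example.com",
        "key": "PROD_ACCESS_KEY",
        "secret": "PROD_SECRET_KEY",
    },
}


def credentials_for(name: str) -> Dict[str, str]:
    """Return a copy of the credentials for the named server profile."""
    return dict(SERVERS[name])


def connect(name: str):
    """Point the SDK at the chosen server; call this before Task.init()."""
    from clearml import Task  # imported lazily so the helpers above work without it

    Task.set_credentials(**credentials_for(name))
    return Task


# Usage (needs clearml installed and real credentials):
# Task = connect("staging")
# task = Task.init(project_name="examples", task_name="set_credentials demo")
```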
Hi HandsomeCrow5 hmm interesting use case,
we have seen html reports as artifacts, then you can press "download" and it should open in another tab, what would you expect on "debug samples" ?
Can you copy the "Installed Packages" here, and point to the package causing the issue?
CleanPigeon16 Can you also send the "Configuration Object" "Pipeline" section?
Ohh so the setup.py is the one containing these requirements, oops I totally missed that :( let me check what pep has to say about that ... (Basically this is not a clearml issue but a pip one...)
... grab the model artifacts for each, put them into the parent HPO model as its artifacts, and then go through and archive everything.
Nice. Wouldn't it make more sense to "store" a link to the "winning" experiment? So you know how to reproduce it, and the set of HP that were chosen?
Not that the model is bad, but how would I know how to reproduce it, or retrain when I have more data etc..
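As a rough sketch of what "storing a link to the winner" could mean, the snippet below picks the best trial and keeps its task ID together with the chosen hyperparameters. The trial list, field names and the helper `winning_record` are all hypothetical, made up for illustration:

```python
import json
from typing import Any, Dict, List


def winning_record(trials: List[Dict[str, Any]], metric: str = "val_loss") -> Dict[str, Any]:
    """Pick the trial with the lowest metric and keep what is needed to reproduce it."""
    best = min(trials, key=lambda t: t[metric])
    return {
        "winner_task_id": best["task_id"],
        "hyperparameters": best["hyperparameters"],
        metric: best[metric],
    }


# Hypothetical HPO results
trials = [
    {"task_id": "task-a", "hyperparameters": {"lr": 0.1, "batch": 32}, "val_loss": 0.42},
    {"task_id": "task-b", "hyperparameters": {"lr": 0.01, "batch": 64}, "val_loss": 0.31},
    {"task_id": "task-c", "hyperparameters": {"lr": 0.001, "batch": 64}, "val_loss": 0.37},
]
print(json.dumps(winning_record(trials), indent=2))
```

A record like this could be attached to the parent HPO task next to the model, so the winning experiment can still be found after the rest is archived.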
Here you go
(using trains_agent for easier access to all the data)
from trains_agent import APIClient
client = APIClient()
log_events = client.events.get_scalar_metric_data(task='11223344aabbcc', metric='valid_average_dice_epoch')
print(log_events)
Hi PompousBeetle71
Could you test the latest RC, I think the warnings were fixed:
pip install trains==0.16.2rc0
Let me know...
GiddyTurkey39
as others will also be running the same scripts from their own local development machine
Which would mean trains will update the installed packages, no?
This is why I was inquiring about the requirements.txt file,
My apologies, of course this is supported
If you have no "installed packages" (i.e. the field is empty in the UI) the trains-agent will revert to installing the requirements.txt from the git repo itself, then it...
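The fallback described above can be pictured with the small sketch below. It only illustrates the decision (use "installed packages" when present, otherwise the repo's requirements.txt) and is not the agent's actual code; `packages_to_install` is a hypothetical helper:

```python
from typing import List, Optional


def _parse(text: str) -> List[str]:
    """Keep non-empty lines that are not comments."""
    lines = (ln.strip() for ln in text.splitlines())
    return [ln for ln in lines if ln and not ln.startswith("#")]


def packages_to_install(installed_packages: Optional[str],
                        repo_requirements: Optional[str]) -> List[str]:
    """Prefer the task's "installed packages"; fall back to requirements.txt."""
    if installed_packages and installed_packages.strip():
        return _parse(installed_packages)
    return _parse(repo_requirements or "")


# Empty "installed packages" field: the repo file is used instead
print(packages_to_install("", "numpy\n# pinned for training\ntorch==1.6\n"))
```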
Hi JitteryCoyote63
So the main issue is backing up the elastic & mongo DB while they are running; once they are backed up/restored, the server will spin as is. (Let me check regarding Redis, it might be that since it is used for caching there is no need to actually back up the content, only the configuration)
Hi PompousBeetle71 , what exactly is the scenario / problem we are trying to solve ?
A true mystery
That said, I hardly think it is directly related to the trains-agent ...
Do you have any more insights on when / how it happens ?
Hi UnevenDolphin73
If you "remove" the lock file the agent will default to pip.
You can hack it with uncommitted changes section?
Hi HealthyStarfish45
- is there an advantage in using tensorboard over your reporting?
Not unless your code already uses TB or has some built in TB loggers.
- html reporting looks powerful, can one inject some javascript inside?
As long as the JS is self contained in the html script, anything goes :)
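A minimal sketch of a "self contained" report: the script is inlined and the data is embedded as JSON, so the page needs no external files. The helper `build_report`, the page layout and the output file name are made up for illustration:

```python
import html
import json
import tempfile
from pathlib import Path

# Plain template with placeholders, so the JS braces need no escaping
TEMPLATE = """<!DOCTYPE html>
<html>
<head><meta charset="utf-8"><title>__TITLE__</title></head>
<body>
<h1>__TITLE__</h1>
<p id="out"></p>
<script>
  // Inline script: the data is embedded, nothing is fetched from outside
  const values = __DATA__;
  const total = values.reduce((a, b) => a + b, 0);
  document.getElementById("out").textContent = "sum = " + total;
</script>
</body>
</html>
"""


def build_report(title: str, values: list) -> str:
    """Return a single HTML document with its JS and data inlined."""
    page = TEMPLATE.replace("__TITLE__", html.escape(title))
    return page.replace("__DATA__", json.dumps(values))


report_path = Path(tempfile.gettempdir()) / "self_contained_report.html"
report_path.write_text(build_report("demo report", [1, 2, 3]), encoding="utf-8")
print(report_path)
```

The resulting file could then be reported or uploaded like any other html report.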
JitteryCoyote63 did you add the bash script here: https://github.com/allegroai/trains-agent/blob/master/docs/trains.conf#L99
So actually while we're at it, we also need to return back a string from the model, which would be where the results are uploaded to (S3).
Is this being returned from your Triton Model? or the pre/post processing code?