With pleasure, I'll make sure we officially release RC1 soon :)
PompousParrot44 the venv created inside the docker always inherits from the docker's system-wide packages, so in essence, if you are using the same set of Python packages, nothing will get reinstalled.
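A minimal sketch of the mechanism. It assumes the agent's venv behaves like a standard Python venv created with `system_site_packages=True`, which is the stdlib equivalent of `python -m venv --system-site-packages`. In that setup, packages already installed system-wide in the container are visible inside the venv, so they don't need to be installed again. The function name and directory are placeholders for this example.

```python
import tempfile
import venv
from pathlib import Path


def create_inheriting_venv(env_dir: str) -> dict:
    """Create a venv that can see the interpreter's system-wide site-packages.

    Returns the key/value pairs parsed from the generated pyvenv.cfg.
    """
    # with_pip=False keeps the example fast and offline; only packages that
    # are missing system-wide would still need to be installed afterwards.
    venv.create(env_dir, system_site_packages=True, with_pip=False)
    cfg = {}
    for line in (Path(env_dir) / "pyvenv.cfg").read_text().splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            cfg[key.strip()] = value.strip()
    return cfg


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = create_inheriting_venv(str(Path(tmp) / "venv"))
        print(cfg["include-system-site-packages"])  # prints: true
```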
Yep :) but only in RC (or GitHub)
Hi @ColorfulKitten60
I think we need some context for this question :)
Oh, then no, you should probably do the opposite :)
What is the flow like now? (meaning, what are you using Kubeflow for and how?)
Does adding external files not upload them to the dataset output_uri?
@CooperativeOtter46 If you are adding the links with add_external_files, these files are not re-uploaded.
Thank you @MotionlessSeagull22, always great to hear :)
BTW, if you feel like sharing your thoughts with us, consider filling out our survey; it should not take more than 5 min.
Hi @BattyCrocodile47
Does ClearML have a good story for offline/batch inference in production?
Not sure I follow, do you mean like a case study?
Triggering:
We'd want to be able to trigger a batch inference:
- (rarely) on a schedule
- (often) via a trigger in an event-based system, like maybe from an AWS Lambda function
(2) Yes, there is a great API for that. Check out the GitHub Actions, it is essentially the same idea (a REST API is also available) ...
Hi SubstantialElk6
No need for that. You can use the Helm chart (or spin them up once with kubectl), and then they take care of scheduling by themselves.
You can also use the k8s glue (basically, it spins up Kubernetes pods automatically for you, based on the Tasks that you push into the ClearML queue):
https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py
In short, there are two possible deployments:
- Static k8s pod running the agent (then the agent runs all the experiments inside t...
CrookedWalrus33 can you test what happens if you pass the credentials in the global scope as well, i.e. here:
https://github.com/allegroai/clearml/blob/397dcfacda8f133af0acc7d2f9a124dde38ecc4a/docs/clearml.conf#L80
If I encounter the need for that, I will adapt and open a PR.
Great!
Oh sorry:
pip install clearml-agent==1.2.0rc4
It also automatically detects if you have an active venv inside the container and uses it instead of the system-wide Python.
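This is not the clearml-agent source. It is a hypothetical plain-Python sketch of how "is there an active venv?" detection can work, based on two common signals: the `VIRTUAL_ENV` variable that `activate` sets, and `sys.prefix` differing from `sys.base_prefix` when a venv interpreter is running. The function names are illustrative.

```python
import os
import sys
from typing import Optional


def active_venv() -> Optional[str]:
    """Return the path of the active virtual environment, or None."""
    # Signal 1: set by the venv's activate script.
    env = os.environ.get("VIRTUAL_ENV")
    if env:
        return env
    # Signal 2: the running interpreter itself belongs to a venv.
    if sys.prefix != sys.base_prefix:
        return sys.prefix
    return None


def pick_python() -> str:
    """Prefer the active venv's interpreter over the system-wide one."""
    venv_dir = active_venv()
    if venv_dir:
        # POSIX layout; a Windows venv would use Scripts\python.exe instead.
        return os.path.join(venv_dir, "bin", "python")
    return sys.executable
```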
Switching to a process Pool might be a bit of overkill here (I think).
wdyt?
Just making sure: can the machine you were running "trains-init" on access the API server?
You should have something like 192.168... or 10.0...
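Here is a quick reachability check you can run on that machine. The URL is a placeholder; 8008 is the API server port in the standard trains-server docker-compose setup, so adjust it if yours differs. A loopback address such as 127.0.0.1 or localhost only works when the server runs on the same machine, which is why a LAN address like 192.168.x.x or 10.x.x.x is expected here. `check_api_server` is a hypothetical helper, not part of trains.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def check_api_server(url: str, timeout: float = 3.0) -> dict:
    """Report whether an API server URL is loopback-only and whether its port accepts TCP connections."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        is_loopback = ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not a literal IP (e.g. a hostname); treat "localhost" as loopback.
        is_loopback = host == "localhost"
    try:
        # A plain TCP connect: proves the port is open, not that it is the API server.
        with socket.create_connection((host, port), timeout=timeout):
            reachable = True
    except OSError:
        reachable = False
    return {"host": host, "port": port, "loopback": is_loopback, "reachable": reachable}


if __name__ == "__main__":
    print(check_api_server("http://192.168.1.10:8008"))
```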
AgitatedTurtle16 from the screenshot, it seems the Task is stuck in the queue, which means there is no agent running to actually run the interactive session.
Basic setup:
- A machine running clearml-agent (this is the "remote machine")
- A machine running clearml-session (let's call it the laptop :) )
You need to first start the agent on the "remote machine" (basically call clearml-agent daemon --docker --queue default). Once the agent is running on the remote machine, from your laptop ru...
But out of curiosity, what's the point of doing a hyperparameter search on the value of the loss at the last epoch of the experiment?
The problem is that you might end up with a global min that is really nice, but it was 3 epochs ago, and you only have the last checkpoint...
BTW, the global min and the last value should not be very different if the model converges, wdyt?
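To make the trade-off concrete, here is a small sketch that compares the last-epoch loss with the global minimum of a loss curve. The curve is made-up data chosen so that the best value is reached 3 epochs before the end. If you only keep the last checkpoint, `epochs_since_best` is exactly the mismatch described above; for a converged model, `last` and `global_min` should be close.

```python
from typing import Sequence


def summarize_loss(losses: Sequence[float]) -> dict:
    """Compare the last-epoch loss with the global minimum over all epochs."""
    # Index of the lowest loss (first one wins on ties).
    best_epoch = min(range(len(losses)), key=losses.__getitem__)
    return {
        "last": losses[-1],
        "global_min": losses[best_epoch],
        "best_epoch": best_epoch,
        # How many epochs before the end the best value was reached.
        "epochs_since_best": len(losses) - 1 - best_epoch,
    }


if __name__ == "__main__":
    # Hypothetical loss curve: the minimum (0.30) is at epoch 3 of 0..6.
    curve = [0.90, 0.55, 0.41, 0.30, 0.33, 0.34, 0.35]
    print(summarize_loss(curve))
```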
I understand, but then the toml file needs to be parsed to ensure poetry is used. It's just a tool entry in the pyproject.toml.
Probably too much for the agent... and specifically, it seems Poetry actually managed to parse it?! What are you getting in the log?
Can you copy the "Installed Packages" here, and point to the package causing the issue?
CleanPigeon16 Can you send also the "Configuration Object" "Pipeline" section ?
Ohh, so the setup.py is the one containing these requirements. Oops, I totally missed that :( Let me check what the PEP has to say about that... (Basically, this is not a ClearML issue but a pip one...)
... grab the model artifacts for each, put them into the parent HPO model as its artifacts, and then go through and archive everything.
Nice. Wouldn't it make more sense to "store" a link to the "winning" experiment, so you know how to reproduce it and which set of HPs was chosen?
Not that the model is bad, but how would I know how to reproduce it, or retrain it when I have more data, etc.?
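A hypothetical plain-Python sketch of "store a link to the winner". This is not ClearML's API: `Trial`, `HPOSummary`, and the task IDs are illustrative. The idea is that the summary keeps the winning experiment's ID and its exact hyperparameters next to the list of all trials. That is enough to find, reproduce, or retrain the winner later.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Trial:
    task_id: str               # ID of the experiment that ran this trial
    params: Dict[str, float]   # hyperparameters used by this trial
    objective: float           # e.g. validation loss (lower is better)


@dataclass
class HPOSummary:
    winner_task_id: str                  # "link" to the winning experiment
    winner_params: Dict[str, float]      # the HP set that was chosen
    all_task_ids: List[str] = field(default_factory=list)


def summarize_hpo(trials: List[Trial]) -> HPOSummary:
    """Keep a reference to the winning experiment, not only its model."""
    best = min(trials, key=lambda t: t.objective)
    return HPOSummary(
        winner_task_id=best.task_id,
        winner_params=dict(best.params),
        all_task_ids=[t.task_id for t in trials],
    )
```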
Here you go :)
(using trains_agent for easier access to all the data)
from trains_agent import APIClient
client = APIClient()
log_events = client.events.get_scalar_metric_data(task='11223344aabbcc', metric='valid_average_dice_epoch')
print(log_events)
GiddyTurkey39
as others will also be running the same scripts from their own local development machine
Which would mean trains will update the installed packages, no?
This is why I was inquiring about the requirements.txt file,
My apologies, of course this is supported :)
If you have no "installed packages" (i.e. the field is empty in the UI), the trains-agent will revert to installing the requirements.txt from the git repo itself, then it...
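A hypothetical sketch of that fallback order, not the trains-agent implementation. It uses the task's "installed packages" if any are recorded, and otherwise reads requirements.txt from the repository root. `resolve_requirements` and its parameters are illustrative names.

```python
from pathlib import Path
from typing import List


def resolve_requirements(installed_packages: str, repo_root: str) -> List[str]:
    """Pick the requirement lines to install for a task."""

    def parse(text: str) -> List[str]:
        # Drop blank lines and comments; keep everything else verbatim.
        lines = (line.strip() for line in text.splitlines())
        return [line for line in lines if line and not line.startswith("#")]

    # 1) Prefer the packages recorded on the task ("installed packages").
    packages = parse(installed_packages)
    if packages:
        return packages
    # 2) Otherwise fall back to requirements.txt in the repository root.
    req_file = Path(repo_root) / "requirements.txt"
    if req_file.is_file():
        return parse(req_file.read_text())
    return []
```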
Hi JitteryCoyote63
So the main issue is backing up Elasticsearch and MongoDB while they are running. Once they are backed up/restored, the server will spin up as is. (Let me check regarding Redis. Since it is used for caching, there might be no need to actually back up the content, only the configuration.)
A true mystery :)
That said, I hardly think it is directly related to the trains-agent...
Do you have any more insights on when/how it happens?