Hi @<1689808977149300736:profile|CharmingKoala14> , let me double check that
Obviously if you click on them you will be able to compare based on specific metric / parameters (either as table or in parallel coordinates)
Just to make sure I understand: running locally creates the Args/command correctly (i.e. execute_remotely creates the correct Args/command), but when the agent actually executes it on the remote machine, it updates the Args/command back as a list. Is that a correct description?
BTW: we are now adding "datasets chunks for a more efficient large dataset storage"
Hi @<1730396272990359552:profile|CluelessMouse37>
However, the caching doesn't seem to be working correctly. Despite not changing the configuration, the first step runs every time.
How are you creating the cached component?
is this a standalone script or a git repo link?
These parameters are dictionaries of specific configurations (dict of dict) that are the same but might not be taken into account properly by the caching mechanism.
hmm for the component to be cached (or reuse...
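A quick way to rule out the dict-of-dict arguments as the cause is to fingerprint them on your side and compare the value between runs. The sketch below is only a debugging aid under that assumption; it is not how ClearML computes its cache key, and `config_fingerprint` is a hypothetical helper name.

```python
import hashlib
import json


def config_fingerprint(config):
    """Hash a nested dict so that equal content always gives the same digest."""
    # sort_keys makes the serialization independent of key insertion order,
    # at every nesting level.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content, different key order (illustrative values only).
first = {"model": {"lr": 0.1, "layers": 3}, "data": {"split": 0.8}}
second = {"data": {"split": 0.8}, "model": {"layers": 3, "lr": 0.1}}

print(config_fingerprint(first) == config_fingerprint(second))
```

If the fingerprint changes between two runs that should be identical, the arguments themselves differ (for example a float, a path, or a timestamp inside the dict), and the step rerunning is expected.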
or can I directly open a PR?
Open a direct PR and link to this thread, I will make sure it is passed along 🙂
compression=ZIP_DEFLATED if compression is None else compression
wdyt?
Just dropping this here but I've had some funky compressions with very small datasets!
Odd deflate behavior ...?!
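For reference, the one-line default proposed above can be tried in isolation with the standard zipfile module. This is a minimal sketch, not the ClearML code: `make_archive` and `compression_methods` are hypothetical helpers that only illustrate the "DEFLATE unless the caller chose something else" behavior.

```python
import io
import zipfile
from zipfile import ZIP_DEFLATED, ZIP_STORED


def make_archive(files, compression=None):
    """Hypothetical helper: zip {name: bytes} in memory and return the archive bytes."""
    # Fall back to DEFLATE only when the caller did not choose a method.
    compression = ZIP_DEFLATED if compression is None else compression
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, mode="w", compression=compression) as archive:
        for name, payload in files.items():
            archive.writestr(name, payload)
    return buffer.getvalue()


def compression_methods(archive_bytes):
    """Return {filename: compression method} for an in-memory archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {info.filename: info.compress_type for info in archive.infolist()}
```

The explicit `is None` check matters because `ZIP_STORED` is `0`: a shorter `compression or ZIP_DEFLATED` would silently override a caller who asked for no compression.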
FierceHamster54 are you saying that inside the container it took 20 min to run, or that spinning up the GCP instance until it registered as an Agent took 20 min?
Most of the time is taken by building wheels for numpy and pandas ...
BTW: This happens if there is a version mismatch and pip decides it needs to build numpy from source. Can you send the full logs of that? Maybe we can somehow avoid that?
Hi RoundMole15
What exactly triggers the "automagic" logging of the model and weights?
The framework's save call, for example torch.save or joblib.dump
I've pulled my simple test project out of jupyter lab and the same problem still exists,
What is "the same problem" ?
Hi WickedBee96
Queue1 will take 3GPUs, Queue2 will take another 3GPUs, so in Queue3 can I put 2-4 GPUs??
Yes exactly !
If there are idle GPUs, will it take them to process the task?
Correct. Basically you are saying this queue needs a minimum of 2 GPUs, but if you have more, allocate them to the Task it pulled (with a maximum of 4 GPUs)
Make sense ?
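To make the min/max rule concrete, here is a minimal sketch using the 2-4 range from the question. It only illustrates the rule as described in this thread; `gpus_for_task` is a hypothetical function and not the agent's actual scheduling code.

```python
def gpus_for_task(idle_gpus, min_gpus=2, max_gpus=4):
    """Hypothetical rule: how many GPUs to give the pulled task, or 0 to keep waiting."""
    if idle_gpus < min_gpus:
        # Not enough idle GPUs to satisfy the queue's minimum.
        return 0
    # Take every idle GPU, but never more than the queue's maximum.
    return min(idle_gpus, max_gpus)


for idle in (1, 2, 3, 8):
    print(idle, "idle ->", gpus_for_task(idle), "allocated")
```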
Hi ExasperatedCrocodile76
It seems like it is using the conda package manager. Were you using conda when you ran the code manually?
ERROR: This cross-compiler package contains no program /home/ivan/miniconda3/envs/clearML/bin/x86_64-conda_cos6-linux-gnu-gfortran
Why is it trying to install from source code?
BTW: can you test with the latest agent RC? (pip install clearml-agent==1.4.0rc4)
In regards to the YAML how would you pass data? Like the pipeline from tasks example?
The agent works when I run it from a virtual environment, but it gets stuck in the same place every time when I use Docker
Can you please provide a log? I'm not sure what "stuck" means here
Thank you @<1523720500038078464:profile|MotionlessSeagull22> always great to hear 🙂
btw, if you feel like sharing your thoughts with us, consider filling out our survey; it should not take more than 5 min
Yes, actually ensuring pip is there cannot be skipped (I think in the past it caused too many issues, hence the version limit etc.)
Are you saying it takes a lot of time when running? How long is the actual process that the Task is running (just to normalize times here)
So are you asking why we need to install a specific pip version?
You can "disable it" by selecting a very high version:
pip_version: "<40"
https://github.com/allegroai/clearml-agent/blob/077148be00ead21084d63a14bf89d13d049cf7db/docs/clearml.conf#L67
BoredHedgehog47 you need to configure the clearml k8s glue to spin pods (instead of allocating agents per pods statically) does that make sense ?
SmarmySeaurchin8 could you test with the latest RC:
pip install clearml==0.17.5rc2
LOL, Let me look into it, could it be the calling file is somehow deleted ?
Quite hard for me to try this right
👍
How do I reproduce it ?
SubstantialElk6 is this the pip to install the agent, or the pip the agent is using to install the packages for the specific experiment ?
UnevenDolphin73 if the repo does not include a poetry file it will revert to pip
Hi AgitatedTurtle16
My question is how to use it to manage my experiments (docker containers). Simply put, let's say:
So basically once you see an experiment in the UI, it means you can launch it on an agent.
There is no need to containerize your experiment (actually that's kind of the idea, removing the need to always containerize everything).
The agent will clone the code, apply uncommitted changes & install the packages in the base-container-image at runtime.
This allows you to u...
Is the clearml server a worker I can serve models on?
The serving is done by one of the clearml-agents.
Basically you spin an agent, then this agent is spinning the model serving engine container (fully managed).
(1) install and run clearml-agent (2) run the clearml-serving CLI to configure and spin the serving engine
Hi @<1523703397830627328:profile|CrookedMonkey33>
If you click on "Task Information" (on the Version Info panel, right-hand side), it will open the Task details page. There you have the "hamburger" menu at the top right, where you can publish
(Maybe we should add that to the main right click menu?!)
Hi SubstantialElk6 I believe you just need to use clearml 1.0.5, and make sure you are passing the correct OS environment to the agent