Well it seems we forgot that one 😞 I'll quickly make sure it is there.
As a quick solution (no need to upgrade): `task.models["output"]._models.keys()`
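For example, a minimal sketch (the task ID is a placeholder, and `_models` is an internal attribute, so it may change between versions):
```python
from clearml import Task

task = Task.get_task(task_id="<task_id>")  # placeholder ID
# list the names of the task's registered output models
print(list(task.models["output"]._models.keys()))
```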
MoodyCentipede68 seems you did not pass any configuration (OS env or conf file), so it does not know how to find the server and authenticate. Make sense?
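For reference, the minimal environment variables would look something like this (a sketch; the host and keys are placeholders):
```bash
export CLEARML_API_HOST=http://<server>:8008
export CLEARML_API_ACCESS_KEY=<access_key>
export CLEARML_API_SECRET_KEY=<secret_key>
```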
But there is no need for 2FA for cloning a repo
I am thinking about just installing this manually on the worker ...
If you install them system-wide (i.e. with sudo) and add `agent.package_manager.system_site_packages`
then they will always be available for you 🙂
And then also use
`priority_optional_packages: ["carla"]`
This means the agent will always try to install the package carla first, but if it fails, it will not raise an error.
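Put together, a minimal sketch of the relevant clearml.conf agent section (values are illustrative):
```
agent {
    package_manager {
        # reuse packages that were installed system-wide
        system_site_packages: true
        # try to install these first; a failure will not raise an error
        priority_optional_packages: ["carla"]
    }
}
```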
BTW: this would be a good use case for dockers, just saying ...
Manually I was installing the `leap` package through `python -m pip install .` when building the docker container.
NaughtyFish36 what happens if you add `/opt/keras-hannd` to your "installed packages"? This should translate to `pip install /opt/keras-hannd`, which seems like exactly what you want, no?
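To illustrate, the edited section would look something like this (requirements.txt syntax, using the path from your message):
```
# Task "Installed Packages" section
/opt/keras-hannd
```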
Hmm I think this is not doable ... 😞
(the underlying data is stored in DBs and changing it is not really possible without messing about with the DB)
Seems the apiserver is out of connections, this is odd...
SuccessfulKoala55 do you have an idea ?
FlutteringWorm14 an RC is out (1.7.3rc1) with the ability to configure this from clearml.conf:
you can now set `sdk.development.worker.report_event_flush_threshold`
from clearml.conf
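A minimal sketch of the relevant clearml.conf section (the threshold value here is illustrative):
```
sdk {
    development {
        worker {
            # flush reported events in larger batches
            report_event_flush_threshold: 100
        }
    }
}
```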
might it be related to the docker socket not being mounted to the agent daemon running inside a docker container?
Oh yes, if the daemon is running inside a docker container then you need both --privileged and mounting of the docker socket to get it to work
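Something along these lines (a sketch; the agent image name and queue are placeholders):
```bash
docker run --privileged \
    -v /var/run/docker.sock:/var/run/docker.sock \
    <agent-image> \
    clearml-agent daemon --queue default --docker
```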
... training script was set to upload every epoch. Seems like this resulted in a torrent of metrics being uploaded.
oh that makes sense, so basically you were bombarding the server with requests, ending up with a kind of denial of service
Hi @<1658281093108862976:profile|EncouragingPenguin15>
Should work, I'm assuming multiple nodes are running agents ? or are you saying Ray spins the jobs and clearml logs them ?
Sure thing, thanks FlutteringWorm14 !
```python
task = Task.init(...)
if task.running_locally():
    # wait for the repo detection and requirements update
    task._wait_for_repo_detection()
    # reset requirements
    task._update_requirements(None)
```
🙂
S3 access would return a different error...
Can you do:
```python
from clearml.storage.helper import StorageHelper

helper = StorageHelper.get("s3://<bucket>/<foo>/local/<env>/<project-name>/v0-0-1/2022-05-12-30-9-rocketclassifier.7b7c02c4dac946518bf6955e83128bc2/models/2022-05-12-30-9-rocketclassifier.pkl.gz")
print("helper", helper)
```
What do you mean? every Model has a unique ID, what do you consider a version?
one can containerise the whole pipeline and run it pretty much anywhere.
Does that mean the entire pipeline will be running on the instance spinning the container ?
From here, this is what I understand:
https://kedro.readthedocs.io/en/stable/10_deployment/06_kubeflow.html
My thinking was I can use one command and run all steps locally while still registering all "nodes/functions/inputs/outputs etc" with clearml such that I could also then later go into the interface and clone an...
GrumpyPenguin23 could you help and point us to an overview/getting-started video?
LOL, if this is important we probably could add some support (meaning you will be able to specify it in the "installed packages" section, per Task).
If you find an actual scenario where it is needed, I'll make sure we support it 🙂
I see, so basically fix old links that are no longer accessible? If this is the case, you might need to manually change the documents on the mongodb running in the backend
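For example, a sketch with pymongo, assuming the default clearml-server layout (database `backend`, collection `model` with a `uri` field); please verify against your deployment and back up the DB first:
```python
import re
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # adjust to your mongo address
models = client["backend"]["model"]

old_prefix = "http://<old-host>:8081"  # placeholder: the old server address
new_prefix = "http://<new-host>:8081"  # placeholder: the new server address

# rewrite the prefix of every model URL that still points at the old server
for doc in models.find({"uri": {"$regex": "^" + re.escape(old_prefix)}}):
    models.update_one(
        {"_id": doc["_id"]},
        {"$set": {"uri": doc["uri"].replace(old_prefix, new_prefix, 1)}},
    )
```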
Hi GiganticTurtle0
ClearML will only list the directly imported packages (not their requirements), meaning in your case it will only list "tf_funcs" (which you imported).
But I do not think there is a package named "tf_funcs", right?
And is "requirements-dev.txt" in your git root folder?
What is your clearml-agent version?
Hmm, makes sense. Then I would call export_task once (kind of the easiest way to get the entire Task object description pre-filled for you); with that, you can just create as many as needed by calling import_task.
Would that help?
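A minimal sketch of that flow (the source task ID is a placeholder):
```python
from clearml import Task

source = Task.get_task(task_id="<source_task_id>")  # placeholder ID
task_data = source.export_task()  # entire Task description, pre-filled

# create as many copies as needed from the exported description
for _ in range(3):
    new_task = Task.import_task(task_data)
```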
There is a version coming out next week, the one after it (probably 2/3 weeks later) will have this feature
Okay that might explain the issue...
MysteriousBee56 so what you are saying is `python3 -m trains-agent --help`
does NOT work
but `trains-agent --help`
does work?
I want to run only that sub-dag on all historical data in an ad-hoc manner
But wouldn't that be covered by the caching mechanism ?
Hi @<1556450111259676672:profile|PlainSeaurchin97>
While testing the migration, we found that all of our models had their MODEL URL set to the IP of the old server.
Yes, all the artifacts/models/debug-samples are stored "as is", meaning that if you configured your original setup with an IP, it is kind of stuck there; this is why it is always preferred to use a host-name ...
you apparently also need to rename all model URLs
Yes 😞