
WickedGoat98 for such pods instantiating additional workers listening on queues,
I would recommend creating a "devops" user and having its credentials spread across all agents. Sounds good?
EDIT:
There is no limit on the number of users on the system, so log in as a new one and create credentials in the "profile" page :)
ReassuredTiger98 no, but I might be missing something.
What do you mean by project-specific?
Could you test if this is working:
https://github.com/allegroai/clearml/blob/master/examples/reporting/matplotlib_manual_reporting.py
LOL totally 🙂
Basically it hooks into any torch.save function (monkey patching in realtime)
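To illustrate what that kind of runtime monkey patching looks like (a minimal sketch only, not ClearML's actual implementation; the wrapper name and the report call are made up):
` import torch
from clearml import Task

_original_torch_save = torch.save  # keep a reference to the real function

def _patched_save(obj, f, *args, **kwargs):
    # call the real torch.save, then report the saved file on the current task
    result = _original_torch_save(obj, f, *args, **kwargs)
    task = Task.current_task()
    if task is not None:
        task.get_logger().report_text("torch.save called for: {}".format(f))
    return result

torch.save = _patched_save  # from here on every torch.save call is intercepted `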
If you cannot change the "TrainerState" (i.e. inherit and pass it into the code),
you could also monkey-patch it, something like:
` class OurTrainerState(TrainerState):
    def __init__(self, ...):
        ...

    @classmethod
    def load_from_json(cls, json_path: str):
        state = super().load_from_json(json_path)
        # upload the freshly loaded state as a ClearML artifact
        Task.current_task().upload_artifact(...)
        return state

trainer.state = OurTrainerState(trainer.state) `
but could you try with the latest RC?
is everything on the same network?
what do you mean? the same env for all components ? if they are using/importing exactly the same packages, and using the same container, then yes it could
The task pod (experiment) started reaching out to an IP associated with malicious activity. The IP was associated with 1000+ domain names. The activity was identified in AWS guard duty with a high severity level.
BoredHedgehog47 What is the pod container itself ?
EDIT:
Are you suggesting the default "ubuntu:18.04" is somehow contaminated ?
https://hub.docker.com/layers/library/ubuntu/18.04/images/sha256-d5c260797a173fe5852953656a15a9e58ba14c5306c175305b3a05e0303416db?context=explore
Is there any better way to avoid the upload of some artifacts of pipeline steps?
How would you pass "huge datasets (some GBs)" between different machines without storing it somewhere?
(BTW, I would also turn on component caching, so if a step runs the same code with the same arguments, the pipeline step is reused instead of re-executed all over again; see the sketch below)
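A minimal sketch of component caching, assuming the decorator-based pipeline API (the step name, project and arguments here are placeholders):
` from clearml.automation.controller import PipelineDecorator

# cache=True: if the step code and its arguments did not change,
# the previously executed step (and its outputs) is reused
@PipelineDecorator.component(cache=True, return_values=["local_folder"])
def preprocess(dataset_id: str):
    from clearml import Dataset
    local_folder = Dataset.get(dataset_id=dataset_id).get_local_copy()
    # ... actual preprocessing would go here ...
    return local_folder

@PipelineDecorator.pipeline(name="example pipeline", project="examples", version="1.0")
def run_pipeline(dataset_id: str):
    local_folder = preprocess(dataset_id)

if __name__ == "__main__":
    PipelineDecorator.run_locally()
    run_pipeline(dataset_id="<dataset id>") `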
BTW: you can always point to a different config file with an environment variable: CLEARML_CONFIG_FILE="path/to/config/file"
Are you suggesting the default "ubuntu:18.04" is somehow contaminated ?
This is an official Ubuntu container (nothing to do with ClearML), this is Very Very odd...
- Be able to trigger the "pure" function (e.g. train()) locally, without any clear.ml code running, while driving it from a configuration, e.g. path to the data.
When you say "without any http://clear.ml code", do you mean without the agent, or without using the Clearml.Dataset ?
- Be able to trigger the "decorator" (e.g. train_clearml()) while driving it from configuration, e.g. dataset_id
Hmm I can think of:
` def train_clearml(local_folder=None, dataset_id=None):
    ... `
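A rough sketch of how such a train_clearml() could look (assuming train() is the "pure" function from the question; the body here is an illustration, not the original code):
` from clearml import Dataset

def train_clearml(local_folder=None, dataset_id=None):
    # when a dataset_id is given, fetch a local copy of the ClearML dataset
    if dataset_id:
        local_folder = Dataset.get(dataset_id=dataset_id).get_local_copy()
    # then drive the "pure" training function (train) with a plain folder path
    return train(local_folder) `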
Unfortunately not, the queues tab only shows the number of tasks in the queue, not the resources being used.
Oh, yes, that makes sense to add, I like that 🙂
(the main question is what data is there in the backend DBs, let me know what I can get)
Hi SmarmyDolphin68
I see this in between my training epochs, what could be causing this?
This is basically saying we are saving a second model on the same Task, and even though both are logged, only the last one is stored on the Task itself.
This will change, as in the next version a Task will be able to hold references to multiple models in the artifactory 🙂
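A minimal sketch of the scenario being described (project, task and file names are made up): two torch.save calls in the same script, both picked up by the automatic logging:
` import torch
from clearml import Task

task = Task.init(project_name="examples", task_name="two checkpoints")

model = torch.nn.Linear(4, 2)
# both saves are intercepted and logged as output models,
# but currently only the last one is referenced by the Task itself
torch.save(model.state_dict(), "checkpoint_epoch_1.pt")
torch.save(model.state_dict(), "checkpoint_epoch_2.pt") `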
LazyLeopard18 nice. Maybe we should add it to the FAQ / Install. Could you send the exact docker-compose you used and the command line? I'll ask the guys to add it 🙂
It's in my local conda environment though.
Meaning this is a wheel installed manually in conda? or is it a folder inside the conda environment ?
Ohh okay, something seems to only half work in terms of configuration: the agent has enough configuration to register itself, but fails to pass it to the task.
Can you test with the latest agent RC: 0.17.2rc4
In your Additional ClearML Configuration
(which is basically the clearml.conf configuration)
add the following:
` environment {
    GOOGLE_APPLICATION_CREDENTIALS="~/gs.cred"
}
files {
    gsc {
        contents: "<this is your GCP storage credentials file>"
        path: "~/gs.cred"
    }
} `
Reference:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L421
Thanks! Let me check something
ClumsyElephant70 yes there is 🙂
` clearml-agent build --id <task id> --target <folder> `
(I might have a typo there, but you can basically check the full help: clearml-agent build --help )
but I still need the load balancer ...
No, you are good to go. As long as someone registers the pods' IP automatically on a DNS service (local/public), you can use the registered address instead of the IP itself (obviously with the port suffix).
Thanks for your support
With pleasure!
I think it should look something like:
` files {
    gsc {
        contents: """{"type": "service_account", "project_id": "ai-platform", "private_key_id": "9999", "private_key": "-----BEGIN PRIVATE KEY-----==\n-----END PRIVATE KEY-----\n", "client_email": "a@ai.iam.gserviceaccount.com", "client_id": "111", "auth_uri": "...", "token_uri": "...", "auth_provider_x509_cert_url": "...", "client_x509_cert_url": "..."}"""
        path: "~/gs.cred"
    }
} `
Hi WackyRabbit7
I have a pipeline controller task, which launches 30 tasks. Semantically there are 10 applications, and I run 3 tasks for each (those 3 are sequential, so in the UI it looks like 10 lines of 3 tasks).
🙂
In one of those 3 tasks that run for every app, I save a dataframe under the name "my_dataframe".
I'm assuming as an artifact:
What I want to achieve is once all tasks are over, to collect all those "my_dataframe" artifacts (10 in number), extract a sin...
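Assuming they were stored with upload_artifact, a rough sketch for collecting them once all tasks are done (the project and task name filters are placeholders):
` from clearml import Task

# placeholder filters: adjust project_name / task_name (or tags) to match the 10 tasks
tasks = Task.get_tasks(project_name="my_project", task_name="step_that_saves_dataframe")

dataframes = []
for t in tasks:
    artifact = t.artifacts.get("my_dataframe")
    if artifact is not None:
        # .get() downloads and deserializes the artifact (a pandas DataFrame here)
        dataframes.append(artifact.get()) `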
Specifically notice steps (1) and (2); they are important for the Windows docker service to be able to run the elastic container and the mongo container.