others from the local environment and this causes a conflict when importing the attr module
Inside the docker? "local environment"?
This is all under "root" no?
Hmm worked now...
When Task.init is called with output_uri='s3://my_bucket/sub_folder', the artifact ends up at:
s3://my_bucket/sub_folder/examples/upload issue.4c746400d4334ec7b389dd6232082313/artifacts/test/test.json
I lost you SmallBluewhale13, is this the Task.init call you used?
```
task = Task.init(
    project_name="examples",
    task_name="load_artifacts",
    output_uri="s3://company-clearml/artifacts/bethan/sales_journeys/",
)
```
ReassuredTiger98 oh wow, I did not realize you actually call importlib to import your libraries (any reason not to call import?)
And yes, I think we will miss it, as the package analysis is actually static text analysis of the code
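For illustration, a minimal sketch of one possible workaround, assuming Task.add_requirements fits your use case (the package name here is just an example):
```
from clearml import Task
import importlib

# Since the requirements analysis is static, a dynamic import like the one
# below is invisible to it. Register the package explicitly instead,
# *before* Task.init is called:
Task.add_requirements("attrs")  # example package, adjust to what you import

task = Task.init(project_name="examples", task_name="dynamic-import")

# this import will not be picked up by the static code analysis
attr = importlib.import_module("attr")
```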
Task.current_task().get_logger().flush(wait=True)  # <-- WILL HANG HERE
Okay, a bit of theoretical "how it actually works" (and I might be mistaken here...)
Console logging is being reported because the underlying DDP infra (gloo) is piping stdout to the main process, where clearml will catch it (I think). The scalars not working on the subprocesses & the flush wait getting stuck are, I think, related, as the wait actually waits for the flush process, and it seems it cannot actually "talk" to i...
now realise that the ignite event callbacks seem to not be fired
So this is an ignite issue ?
Okay that seems to explain it. Now the question is why it installed it in the wrong place.
See here:
https://pip.pypa.io/en/stable/user_guide/#environment-variables
Pass these environment variables as part of the YAML template you are using with the k8s.
Should work for both 🙂
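As a sketch of what that could look like in the pod template (the image and values are placeholders; PIP_INDEX_URL / PIP_TRUSTED_HOST are pip variables from the page linked above):
```yaml
# hypothetical pod spec fragment, adapt to your actual k8s template
containers:
  - name: clearml-agent
    image: allegroai/clearml-agent:latest   # placeholder image
    env:
      - name: PIP_INDEX_URL
        value: "https://my.private.pypi/simple"   # placeholder
      - name: PIP_TRUSTED_HOST
        value: "my.private.pypi"                  # placeholder
```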
Was trying to figure out how the method knows that the docker image ID belongs to ECR. Do you have any insight into that?
Basically you should have the docker service login before running the agent, then the agent uses docker to run the image from the ECR.
Make sense ?
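For example, a rough sketch of that flow (region, account ID and queue name are placeholders):
```bash
# log docker into ECR first (fill in your region/account)
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com

# then start the agent in docker mode; it pulls the ECR image via docker
clearml-agent daemon --queue default --docker
```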
You do not need the cudatoolkit package, this is automatically installed if the agent is using conda as package manager. See your clearml.conf for the exact configuration you are running
https://github.com/allegroai/clearml-agent/blob/a56343ffc717c7ca45774b94f38bd83fe3ce1d1e/docs/clearml.conf#L79
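i.e. roughly this section of clearml.conf (a sketch; see the link above for the full set of options):
```
agent {
    package_manager {
        # "pip" (default), "conda", or "poetry"
        type: conda
    }
}
```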
The additional edges in the graph suggest that these steps somehow contain dependencies that I do not wish them to have.
PanickyMoth78 I think I understand what you are saying, but it is hard to see if there is a "bug" here or a feature...
Can you post the full code of the pipeline?
I tried to export them to json and they don't take more than 50KB each, but maybe they take more memory internally?
Ballpark should be the same.
I'm already at 300MB of usage with just 15 tasks
Maybe it was not updated yet? Meaning you had more and deleted? (I think this is updated asynchronously, with a max of 24h)
Hi ShinyRabbit94
system_site_packages: true
This is set automatically when running in "docker mode", no need to worry 🙂
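For reference, the matching clearml.conf entry is roughly (a sketch):
```
agent {
    package_manager {
        # base the task venv on the (container's) system packages
        system_site_packages: true
    }
}
```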
What is exactly the error you are getting ?
Could it be the container itself has the python packages installed in a venv not as "system packages" ?
Yes this seems like it is stuck, could you test with the demo server ?
(basically remove the clearml.conf it will connect automatically)
So the TB issue was that reported images were not logged.
We are now talking about the caching, which is actually a UI thing. Which clearml-server version are you using?
And where are the images stored (the default files server or is it S3/GS etc.) ?
```
@PipelineDecorator.component(
    name="my step",
    return_values=['data_frame'],
    cache=True,
    task_type=TaskTypes.data_processing,
)
def step_one(pickle_data_url: str, extra: int = 43):
    # stuff here
    ...
```
This seemed to work for me
You described getting a secret key pair from the UI and feeding it back into the compose file. Does this mean it's not possible to seed the secrets in the compose file, starting from a clean state? If so, that would explain why I can't get it to work.
Long story short, no. This would basically mean you have pre-built credentials in the docker, which sounds dangerous 🙂
I'm not sure I'm following the use case here, what exactly are we trying to do?
(or maybe I missed something here?)
so it would be better just to use the original code files and the same conda env, if possible…
Hmm you can actually run your code in "agent mode" assuming you have everything else setup.
This basically means you set a few environment variables prior to launching the code:
Basically:
```
export CLEARML_TASK_ID=<The_task_id_to_run>
export CLEARML_LOG_TASK_TO_BACKEND=1
export CLEARML_SIMULATE_REMOTE_TASK=1
python my_script_here.py
```
Hi ElegantCoyote26
what's the clearml version you are using?
Hi GrievingTurkey78
Turning off pytorch auto-logging:
```
Task.init(..., auto_connect_frameworks={'pytorch': False})
```
To manually log a model:
```
from clearml import OutputModel
OutputModel().update_weights('my_best_model.pt')
```
on the host machine or inside the containers that are spinning on the host machine ?
Then, in the bash console, after some time, I see some messages being logged from clearml
JitteryCoyote63 Hmm that is strange, let me check something
@<1651395720067944448:profile|GiddyHedgehong81> just to be clear, Dataset.get_local_copy returns a path to your files.
You have to Manually add the additional path to the specific files you need to use. It does Not know that in advance.
That was the initial issue you had, and I assume it is the same one here. does that make sense ?
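Roughly, as a sketch (dataset name/project and the file path are placeholders):
```
import os
from clearml import Dataset

# get_local_copy() returns the root folder of the dataset's local copy
dataset_root = Dataset.get(
    dataset_project="my_project", dataset_name="my_dataset"  # placeholders
).get_local_copy()

# you still have to point at the specific file yourself
file_path = os.path.join(dataset_root, "subfolder", "my_file.json")  # placeholder
```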
but then the error occurs, after the training and the validation were successfully completed
It seems it is failing on the last eval? Could it be the test set is missing? Is it the same dataset? Can you verify the file is there? (Notice I see a mix of / and \ in the file name, which is odd: Windows uses \ and Linux/Mac use /, you should never have a mix)
Is there any way to make that increment from last run?
```
pipeline_task = Task.clone("pipeline_id_here", name="new execution run here")
Task.enqueue(pipeline_task, queue_name="services")
```
wdyt?
New RC hopefully solves it @<1643060801088524288:profile|HarebrainedOstrich43> could you check if it works for you now?
pip install clearml==1.14.0rc0
Ohh that cannot be pickled... how would you suggest storing it in a file?
Hi @<1643423185791619072:profile|DashingCentipede5>
Notice that you called "start_locally"; it tries to run the code locally inside your jupyter notebook, and it assumes everything, including the code, already exists. Is that your case?