NastySeahorse61 I would try to open in incognito mode (i.e. no cookies etc.), did you also change the address of the server?
So basically development on a "shared" GPU?
but I cannot compare between them
I think we noticed it, and this will be fixed in the next server update (again, some plotly.js issue there)
Hmm I see, add this for example
extra_docker_shell_script: ["rm ~/.bashrc", "echo removed bashrc"]
CrookedWalrus33 any chance you can think of a sample code to reproduce?
Here this new entry in the log is 2 min after env completed =>
1702378941039 box132 DEBUG 2023-12-12 11:02:16,112 - clearml.model - INFO - Selected model id: 9be79667ca644d7dbdf26732345f5415
This seems to be something in your code, just add print("starting") in your entry python file, Before any imports (because they might actually do something)
Because form the agent's perspective after printing Starting Task Execution: it literally calls the python script, nothing else...
In order to clone the Task it needs to complete sync, which implies closing. I guess the use case for execute remotely while still running was not considered. How / why is this your workflow? Specifically how does Jupyter get into the picture?
For future readers, see discussion here:
https://clearml.slack.com/archives/CTK20V944/p1629840257158900?thread_ts=1629091260.446400&cid=CTK20V944
Hi ScaryBluewhale66
TaskScheduler I created. The status is still
running
. Any idea?
The TaskScheduler needs to actually run in order to trigger the jobs (think cron daemon)
Usually it will be executed on the clearml-agent services queue/mahine.
Make sense ?
using only a subset of the features
ShallowGoldfish8 if you have some parameter that controls it (i.e. select different features) then you can launch it with two sets f parameters.
Am I missing something?
for example:
` my_features_select = {"type": "set_a"}
Task.current_task().connect(my_features_select)
if my_features_select["type"] == "set_a":
do something
else
do something else `wdyt?
Verified, and already fixed with 1.0.6rc2
I don't want a new task every 5 minutes as that will create a lot of tasks over a day. It would be better if I had just one task.
Oh you mean the Task that will be launched will override the previous "instance", correct ?
https://stackoverflow.com/questions/60860121/plotly-how-to-make-an-annotated-confusion-matrix-using-a-heatmap
MagnificentSeaurchin79 see plotly example here:
https://allegro.ai/clearml/docs/docs/examples/reporting/plotly_reporting.html
JitteryCoyote63
are the calls from the agents made asynchronously/in a non blocking separate thread?
You mean like request processing on the apiserver are multi-threaded / multi-processed ?
Correct, but do notice that (1) task names are not unique and you can change them after the Task was executed (2) when you clone the Task, you can actually rename it, when an agent is running the Task, basically the init function is ignored, because the Task already exists. Make sense ?
As we can’t create keys in our AWS due to infosec requirements
Hmmm
😢 any chance you have a toy pytest that replicates it ?
it will constantly try to resend logs
Notice this happens in the background, in theory you will just get stderr messages when it fails to send but the training should continue
Thanks @<1523701868901961728:profile|ReassuredTiger98>
From the log this is what conda is installing, it should have worked
/tmp/conda_env1991w09m.yml:
channels:
- defaults
- conda-forge
- pytorch
dependencies:
- blas~=1.0
- bzip2~=1.0.8
- ca-certificates~=2020.10.14
- certifi~=2020.6.20
- cloudpickle~=1.6.0
- cudatoolkit~=11.1.1
- cycler~=0.10.0
- cytoolz~=0.11.0
- dask-core~=2021.2.0
- decorator~=4.4.2
- ffmpeg~=4.3
- freetype~=2.10.4
- gmp~=6.2.1
- gnutls~=3.6.13
- imageio~=2.9.0
-...
It's in my local conda environment though.
Meaning this is a wheel installed manually in conda? or is it a folder inside the conda environment ?
@<1523701079223570432:profile|ReassuredOwl55> did you try adding manually ?
./path/to/package
You can also do that from code:
Task.add_requirements("./path/to/package")
# notice you need to call Task.add_requirements before Task.init
task = Task.init(...)
But once i see it on the UI means it is already launched somewhere so i didn't quite get you.
The idea is you run it locally once (think debugging your code, or testing it)
While running the code the Task is automatically created, then once in the system you can clone / launch it.
Also, I want to launch my experiments on a kubernetes cluster and i don't actually have any docs on how to do that, so an example can be helpful here.
We are working on documenting the full process, ...
SmugDog62 so on plain vanilla Jupyter/lab everything seems to work.
What do you think is different in your setup ?
. However, despite having imported the required types from the
typing
library in the script where the function decorated with
PipelineDecorator.component
is defined, later in the generated script the
typing
library is not imported outside the scope of the function
Actually the typing part is not passed to the "created step" , because there are no global imports, for eexample:
` def step(a: pd.DataFrame):
import pandas as pd
...
My bad you have to pass it to the container itself:
https://github.com/allegroai/clearml-agent/blob/a5a797ec5e5e3e90b115213c0411a516cab60e83/docs/clearml.conf#L149extra_docker_arguments: ["-e", "CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1"]