Reputation
Badges 1
25 × Eureka!PompousBeetle71 , basically reset experiment will clear all the outputs, and input model model is well, input, it is not cleared. In the next execution it will be overridden. There is actually a way to change it from the UI, and override the initial model weights.
One way to circumvent this btw would be to also add/use theΒ
--python
Β flag forΒ
virtualenv
Notice that when creating the venv , the cmd that is used is basically pythonx.y -m virtualenv ...
By definition this will create a new venv based on the python that executes the venv.
With all that said, it might be there is a bug in virtualenv and in some cases, it does Not adhere to this restriction
@<1523701868901961728:profile|ReassuredTiger98> if you use the latest RC! i sent and run with --debug
in the log you will see the full /tmp/conda_envaz1ne897.yml
content
Here it is copied from your log, do you want to see if this one works:
channels:
- defaults
- conda-forge
- pytorch
dependencies:
- blas~=1.0
- bzip2~=1.0.8
- ca-certificates~=2020.10.14
- certifi~=2020.6.20
- cloudpickle~=1.6.0
- cudatoolkit~=11.1.1
- cycler~=0.10.0
- cytoolz~=0.11.0
- dask-core~=2021.2.0
- de...
(This is why we recommend using pip, because it is stable and clearml-agent takes care of pytorch/cuda verions)
but who exactly executes agent in this case?
with both execute
/ build
commands, you execute it on your machine, for debugging purposes. make sense ?
Hi SlimyElephant79
As you can imagine, wandb's tracking code would be present across the code modules and I was hoping for a structured approach that would help me transition to ClearMLs experiment tracking.
Do you guys a have a layer in between that does the reporting, or is the codebase riddled with direct reporting calls ? if the latter, then I guess search and replace ? or maybe a module that "converts" wandb call to clearml call ? wdyt?
Hi VexedCat68
Could it be you are trying to update a committed dataset?
RobustGoldfish9 do you see the trains-agent listed as a machine in the UI (under workers)
The way ClearML thinks about it is the execution graph would be something like:
script_1 -> script_2 -> script_3 ->
Where each script would have in/out, so that you can trace the usage.
Trying to combine the two into a single "execution" graph might not represent the orchestration process.
That said visualizing them could be done.
I mean in theory there is no reason why we could add those "datasets" as other types of building blocks, for visualization purposes only
(Of course this would o...
Thanks SubstantialElk6 !
I believe an initial a fix was pushed π A full one (merging Task --env with k8s template) will be added soon
Hi @<1545216070686609408:profile|EnthusiasticCow4>
The auto detection of clearml is based on the actual imported packages, not the requirements.txt of your entire python environment. This is why some of them are missing.
That said you can always manually add them
Task.add_requirements("hydra-colorlog") # optional add version="1.2.0"
task = Task.init(...)
(notice to call before Task.init)
@<1671689437261598720:profile|FranticWhale40> could you test the fix? just pull & run
allegroai/clearml-serving-triton:1.3.1
allegroai/clearml-serving-inference:1.3.1
I can't seem to figure out what the names should be from the pytorch example - where did INPUT__0 come from
This is actually the latyer name in the model:
https://github.com/allegroai/clearml-serving/blob/4b52103636bc7430d4a6666ee85fd126fcb49e2e/examples/pytorch/train_pytorch_mnist.py#L24
Which is just the default name Pytorch gives the layer
https://discuss.pytorch.org/t/how-to-get-layer-names-in-a-network/134238
it appears I need to converted into TorchScript?
Yes, this ...
UnevenDolphin73 following the discussion https://clearml.slack.com/archives/CTK20V944/p1643731949324449 , I suggest this change in the pseudo code
` # task code
task = Task.init(...)
if not task.running_locally() and task.is_main_task():
# pre-init stage
StorageManager.download_folder(...) # Prepare local files for execution
else:
StorageManager.upload_file(...) # Repeated for many files needed
task.execute_remotely(...) `Now when I look at is, it kinds of make sense to h...
Hi @<1532532498972545024:profile|LittleReindeer37>
Yes you are correct it should capture the entire jupyter notebook in sagemaker studio.
Just verifying this is the use case, correct ?
The issue is uploading reporting fro http uploads (object storage will report upload). Basically the http upload is post with urllib that does not support upload callbacks for progress report. If you have an idea here, we will gladly add it (as you mentioned it can be quite annoying to have to open network manager to verify the upload is progressing)
Hmm that is odd.
Can you verify with the latest from GitHub?
Is this reproducible with the pipeline example code?
The problem is that clearml installsΒ
cudatoolkit=11.0
Β butΒ
cudatoolkit=11.1
Β is needed.
You suggested this fix earlier, but I am not sure why it didnt work then.
Hmm , could you test with the clearml-agent 0.17.2 ? making surethis actually solves the problem
where is it running? could you restart all the dockers ? Is it running on your machine?
DilapidatedDucks58 You might be able to, check the links, they might be embedded into the docker, so you can map diff png file from the host π
BTW: what would you change the icons to?
I think this is great! That said, it only applies when you are spining agents (the default helm is for the server). So maybe we need another one? or an option?
So you are saying 156 chunks, with each chunk about ~6500 files ?
an implementation of this kind is interesting for you or do you suggest to fork
You mean adding a config map storing a default trains.conf for the agent?