AstonishingSeaturtle47 that's awesome! Could you explain the hack? It might be helpful for others (I assume :))
MelancholyBeetle72 it would be great if you could also open an issue on Trains and reference the PyTorch Lightning issue. Could you please do that?
MelancholyBeetle72 there is an RC with a fix, check the GitHub issue for details :)
Hi MelancholyBeetle72 , that's a very interesting case. I can totally understand how storing a model and then immediately renaming it breaks the upload. A few questions: is there a way for PyTorch Lightning not to rename the model? Also, I wonder if this scenario (storing a model and then changing it) happens a lot. I think the best solution is for Trains to create a copy of the file and upload it in the background. That said, the name will still end with .part. What do you think?
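If I understand the scenario, it is roughly this pattern (a rough sketch with made-up file names, not PyTorch Lightning's actual checkpoint code):
```python
import os

import torch

model = torch.nn.Linear(4, 2)
torch.save(model.state_dict(), "checkpoint.ckpt")     # Trains picks this path up and starts uploading
os.rename("checkpoint.ckpt", "checkpoint-best.ckpt")  # the file being uploaded is renamed mid-upload
```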
@<1523704157695905792:profile|VivaciousBadger56>
Is the idea here the following? You want to use inversion-of-control such that I provide a function `f` to a component that takes the above dict as an input. Then I can do whatever I like inside the function `f` and return a different dict as output. If the output dict of `f` changes, the component is rerun; otherwise, the old output of the component is used?
Yes, exactly! This way you...
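Something like this minimal sketch, assuming the PipelineDecorator interface (component and pipeline names here are illustrative):
```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["params"], cache=True)
def f(config: dict) -> dict:
    # user-provided logic: transform the input dict however you like
    return {"lr": config["lr"] * 0.1, "epochs": config["epochs"]}

@PipelineDecorator.component(return_values=["result"], cache=True)
def train(params: dict) -> dict:
    # cached step: reused as long as the incoming `params` (and the step code) are unchanged
    return {"loss": params["lr"] * params["epochs"]}

@PipelineDecorator.pipeline(name="ioc example", project="examples", version="0.1")
def run(config: dict):
    params = f(config)
    return train(params)

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # debug the whole pipeline in the local process
    run(config={"lr": 1.0, "epochs": 10})
```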
Yes, the same will work with artifacts: just pass the full URL as the artifact_object and it should just register it as is.
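For example, something like this minimal sketch (project, task, and bucket/path names are made up):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="register remote artifact")
# register an already-uploaded object by its full URL instead of uploading a local file
task.upload_artifact(name="training_data", artifact_object="s3://my-bucket/datasets/train.csv")
```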
I think it would make sense to have one task per run to make the comparison of hyper-parameters easier
I agree. Could you maybe open a GitHub issue on it, I want to make sure we solve this issue 🙂
Right, I think the naming is a by-product of Hydra / TB
I assume it is reported into TB, right?
It's a running number because PL is creating the same TB file for every run
Hi @<1668427971179843584:profile|GrumpySeahorse51>
Could you provide the full stack log?
This error seems to originate from psutil (which is used under the hood), but it lacks the clearml-session context
GloriousPanda26 wouldn't it make more sense for a multi-run to create multiple experiments?
GloriousPanda26 Are you getting multiple Tasks or is it a single Task?
Oh, so the pipeline basically makes itself their parent, which means you can get their IDs:
steps_ids = Task.query_tasks(task_filter=dict(parent=<pipeline_id_here>))
for task_id in steps_ids:
    task = Task.get_task(task_id)
If a Task is in the 'Completed' state, I think the only option is to 'Reset' it (see image).
In the UI yes, in code you can do task.mark_aborted(force=True)
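For example (a minimal sketch, assuming you already have the Task ID at hand):
```python
from clearml import Task

task = Task.get_task(task_id="<completed_task_id>")
task.mark_aborted(force=True)  # force the state change even though the Task is Completed
```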
You do clear the previous run execution but I think for a repetitive task this is fine.
I would avoid that, no?
WARNING:root:Could not lock cache folder /home/ronslos/.clearml/venvs-cache: [Errno 11] Resource temporarily unavailable
Hi @<1549927125220331520:profile|ZealousHare78>
Could it be you are also working on the same machine? Are you running the agent in docker mode or venv mode?
Hi DilapidatedDucks58 ,
Are you running in docker or venv mode?
Do the workers share a folder on the host machine?
It might be a syncing issue (not directly related to the trains-agent, but to the fact that you have 4 processes trying to simultaneously access the same resource)
BTW: the next trains-agent RC will have a flag (default off) for torch-nightly repository support 🙂
See the log:
Collecting keras-contrib==2.0.8
File was already downloaded c:\users\mateus.ca\.clearml\pip-download-cache\cu0\keras_contrib-2.0.8-py3-none-any.whl
So it did download it, but it failed to pass it correctly?!
Can you try with clearml-agent==1.5.3rc2?
ConvolutedChicken69
basically clearml-data needs to store an immutable copy of the delta changes per version; if the files are already uploaded, there is a good chance they could be modified...
So in order to make sure you have a clean immutable copy, it will always upload the data (notice it also packages everything into a single zip file, so it is easy to manage).
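For reference, a minimal sketch of that flow with the Dataset SDK (project and dataset names are illustrative):
```python
from clearml import Dataset

ds = Dataset.create(dataset_name="my_dataset", dataset_project="examples")
ds.add_files(path="data/")   # record the delta changes for this version
ds.upload()                  # always uploads an immutable copy, packaged as a single zip
ds.finalize()
```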
@<1564422644407734272:profile|DistressedCoyote60> could you open a GitHub issue on it in clearml-agent, just so we know of the problem and fix it for next version ?
Basically it gives it direct access to the host, this is why it is considered less safe (access on other levels as well, like network)
GentleSwallow91 how come it does not already find the correct PyTorch version inside the docker? What's the clearml-agent version you are using?
and you have clearml v0.17.2 installed on the "system" packages level, and 0.17.5rc6 installed inside the pyenv venv?
Hi JitteryCoyote63
I would like to switch to using a single auth token.
What is the rationale behind that?
So it seems decorator is simply the superior option?
Kind of yes 😊
In which case would we use the add_task() option?
When you have existing Tasks, and the piping is very straightforward (i.e. the input/output in the code basically references other Tasks/artifacts, and there is no real need to do any magic for serializing/deserializing data between steps).
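For reference, a rough sketch of that add-step approach with existing Tasks (project/task names and the parameter_override key are illustrative):
```python
from clearml.automation.controller import PipelineController

pipe = PipelineController(name="piping example", project="examples", version="0.1")
pipe.add_step(
    name="prepare",
    base_task_project="examples",
    base_task_name="prepare data",
)
pipe.add_step(
    name="train",
    parents=["prepare"],
    base_task_project="examples",
    base_task_name="train model",
    # pass the previous step's Task ID so the training Task can fetch its artifacts
    parameter_override={"General/dataset_task_id": "${prepare.id}"},
)
pipe.start()
```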
Should work with report_surface. Notice that this is not triangles; the assumption is a fixed sampling of the surface: the sample grid is the shape of the numpy matrix and the sample value (i.e. Z) is the value in the matrix. This means that if you have a set of mesh triangles, you have to project and sample them.
I think this is what you are after https://trimsh.org/trimesh.voxel.base.html?highlight=matrix#trimesh.voxel.base.VoxelGrid.matrix
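For example, a minimal sketch of reporting such a fixed-grid sampling (project/task names are illustrative):
```python
import numpy as np
from clearml import Task

task = Task.init(project_name="examples", task_name="surface demo")
# fixed-grid sampling: Z[i, j] is the surface value at grid cell (i, j)
Z = np.random.rand(50, 50)
task.get_logger().report_surface(
    title="surface",
    series="sampled mesh",
    iteration=0,
    matrix=Z,
    xaxis="x",
    yaxis="y",
    zaxis="z",
)
```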
ConfusedPig65 could you send the full log (console) of this execution?