Oh I think that I understand what's going on, @<1523701260895653888:profile|QuaintJellyfish58> let me check how to update the cron scheduler while it is running (I really like this idea, so if this is not already supported I'd like us to add this capability)
WickedGoat98 nice!!
Can you also get past the login screen (i.e. can you access the API server)?
I see the problem now: conda is failing to install the package from git, then it falls back to pip install, and pip just fails... " //github.com/ajliu/pytorch_baselines "
This is why we recommend using pip and not conda ...
PunySquid88 after removing the "//github" package, is it working?
with conda ?!
Hi @<1709015393701466112:profile|ScatteredPeacock14>
I get 3 tasks created in total. Any ideas?
Could it be an old instance of the same Task?
Notice the for loop starts from 1 so it does include the master node:
None
Check the log to see exactly where it downloaded torch from. Just making sure it used the right repository and did not default to pip, where it might have gotten a CPU version...
strange ...
See if this helps
PunySquid88 RC1 is out with a fix:
pip install trains-agent==0.14.2rc1
That is correct.
Obviously once it is in the system, you can just clone/edit/enqueue it.
Running it once is a means to populate the trains-server.
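For example, a minimal sketch of that clone/edit/enqueue flow (the task ID, parameter name and queue name below are just placeholders):

from trains import Task  # `from clearml import Task` on newer versions

# clone the template Task that the first run populated
template = Task.get_task(task_id="<template_task_id>")
cloned = Task.clone(source_task=template, name="cloned run")

# edit whatever you need on the clone, then enqueue it for an agent to pick up
cloned.set_parameters({"lr": 0.01})
Task.enqueue(task=cloned, queue_name="default")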
Make sense ?
PompousBeetle71 the code is executed without arguments; at run-time trains / trains-agent will pass the arguments (as defined on the Task) to the argparser. This means that you get the ability to change them and you also get type checking
PompousBeetle71 if you are not using argparse, how do you parse the arguments from sys.argv? Manually?
If that's the case, post parsing, you can connect a dictionary to the Task and you will have the desired behavior
task.connect(dict_with_arguments)
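For example, a rough sketch (the argument names and defaults are made up):

import sys
from trains import Task  # `from clearml import Task` on newer versions

task = Task.init(project_name="examples", task_name="manual argv parsing")

# manually parsed arguments, expecting "--key=value" style items on sys.argv
args = {"lr": 0.001, "batch_size": 32}
for arg in sys.argv[1:]:
    key, _, value = arg.partition("=")
    if key.lstrip("-") in args:
        args[key.lstrip("-")] = value

# connect the dict to the Task: when the agent re-executes the Task,
# the values edited in the UI are injected back into this dict
task.connect(args)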
Should work with report_surface. Notice that this is not triangles; the assumption is that this is a fixed sampling of the surface: the sample size is the shape of the numpy matrix, and the sample value (i.e. Z) is the value in the matrix. This means that if you have a set of mesh triangles, you have to project and sample them.
I think this is what you are after https://trimsh.org/trimesh.voxel.base.html?highlight=matrix#trimesh.voxel.base.VoxelGrid.matrix
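Something along these lines, assuming you already have the sampled Z matrix (the random matrix below is just a stand-in):

import numpy as np
from clearml import Task, Logger

task = Task.init(project_name="examples", task_name="surface report")

# Z values of the surface sampled on a fixed (rows x cols) grid
surface_matrix = np.random.rand(50, 50)

Logger.current_logger().report_surface(
    title="sampled surface",
    series="Z",
    iteration=0,
    matrix=surface_matrix,
    xaxis="x",
    yaxis="y",
    zaxis="Z",
)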
Hi @<1707565838988480512:profile|MeltedLizard16>
Maybe I'm missing something, but just add to your YOLO code:
from clearml import Dataset
my_files_folder = Dataset.get("dataset_id_here").get_local_copy()
what am I missing?
Hi @<1695969549783928832:profile|ObedientTurkey46>
Use --services-mode in the agent; it will run many Tasks on the same machine. This is usually associated with the services queue, but it can run on any queue. This way the same machine can easily run those multiple "control" tasks.
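For example, something along the lines of clearml-agent daemon --queue my_controllers --services-mode --docker (the queue name here is just an example).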
wdyt?
Please send the full log, I just tested it here, and it seems to be working
Try adding this environment variable:
export TRAINS_CUDA_VERSION=0
ElegantCoyote26 can you browse to http://localhost:8080 on the machine that ran trains-init?
Hi Team, can I clone an experiment shared by someone via a link?
You mean someone who is not in your workspace? (I'm assuming app.clear.ml?)
if I want to run the experiment the first time without creating the template?
You mean without manually executing it once ?
but I still need the load balancer ...
No, you are good to go. As long as someone registers the pods' IPs automatically on a DNS service (local/public), you can use the registered address instead of the IP itself (obviously with the port suffix).
Thanks for your support
With pleasure!
Hi @<1523703472304689152:profile|UpsetTurkey67>
You mean https://github.com/Lightning-AI/torchmetrics ?
Where are those stored?
I'm all for adding an interface, but I was not able to locate a simple integration option with basically anything. Wdyt?
But PyTorch has no specific backend for this, it uses TensorBoard.
No?! Can you point me to an example? What I mostly find is how to calculate metrics, not a standard way to then store them...
Hi @<1557899668485050368:profile|FantasticSquid9>
There is some backwards compatibility issue with 1.2 (I think).
Basically what you need is to spin up a new one on a new session ID and re-register the endpoints
When you have a bit of experience with it, please suggest a path forward; it would be great to integrate
Hi @<1663354518726774784:profile|CrookedSeal85>
I am trying to optimize storage on my ClearML file server when doing a lot of experiments.
This is not straightforward; you will need to get a list of all the events via
None
filter on the image events,
and then delete the URLs you get via the StorageManager.
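A very rough sketch of what I mean (the event type string and the delete call are assumptions, double check them against your server / SDK version):

from clearml.backend_api.session.client import APIClient
from clearml.storage.helper import StorageHelper

client = APIClient()

# get the task's events and keep only the debug image ones
res = client.events.get_task_events(task="<task_id>", event_type="training_debug_image")
urls = [e.get("url") for e in res.events if e.get("url")]

for url in urls:
    # deleting via StorageHelper is an assumption here; you can also remove
    # the files directly on the storage backend itself
    StorageHelper.get(url).delete(url)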
But to be honest, why not just direct it to S3 or something like that ?
Hi SpotlessFish46 ,
Is the artifact already in S3 ?
Is S3 configured as the default files_server in trains.conf?
You can always use the StorageManager to upload to wherever you want and register the URL on the artifacts.
You can also programmatically change the artifact destination server to S3, then upload the artifact as usual.
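For example, a small sketch of both options (the bucket, file and task names are placeholders):

from clearml import Task, StorageManager  # same idea with `trains` on older versions

# option 2: make S3 the default destination for this task's artifacts
task = Task.init(
    project_name="examples",
    task_name="artifacts to S3",
    output_uri="s3://my-bucket/artifacts",
)

# option 1: upload the file yourself; the returned URL is what you would register
url = StorageManager.upload_file(
    local_file="model.pkl",
    remote_url="s3://my-bucket/artifacts/model.pkl",
)

# with output_uri set above, a regular upload_artifact call will also land in S3
task.upload_artifact(name="model", artifact_object="model.pkl")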
What would be the best match for you?
Hi @<1639799308809146368:profile|TritePigeon86>
Sounds awesome, how can we help?