Does adding external files not upload them to the dataset output_uri?
CooperativeOtter46 If you are adding the links with add_external_files, these files are not re-uploaded
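For example, a minimal sketch (the project/dataset names and the S3 URL are placeholders):

```python
from clearml import Dataset

# Create a new dataset version (project/name here are illustrative)
dataset = Dataset.create(
    dataset_project="examples", dataset_name="external-links")

# Register a link to the remote file instead of uploading its content;
# only the link and metadata are stored, the file itself is not re-uploaded
dataset.add_external_files(source_url="s3://my-bucket/data/train.csv")

dataset.upload()    # uploads the dataset state/metadata, not the external files
dataset.finalize()
```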
OutrageousGrasshopper93 tensorflow-gpu is not needed; the agent will convert tensorflow to tensorflow-gpu based on the detected CUDA version (you can see it in the summary configuration when the experiment spins inside the docker)
How can I set the base Python version for the newly created conda env?
You mean inside the docker?
StorageManager is what you need if you want to download/upload files to any server (it is a utility class that takes care of the download/upload and adds caching); StorageHelper is used internally.
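A minimal sketch (the URLs are placeholders):

```python
from clearml import StorageManager

# Download a remote object into the local cache and get its local path
local_copy = StorageManager.get_local_copy(
    remote_url="s3://my-bucket/models/model.pkl")

# Upload a local file to a remote destination
StorageManager.upload_file(
    local_file="/tmp/results.json",
    remote_url="s3://my-bucket/results/results.json")
```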
I was thinking mainly about AWS.
Meaning S3?
Hey, that worked! What library is being used that reads that configuration?
It's passed to boto3, but the Python interface and the AWS CLI use different configurations, I guess, because otherwise it should have worked...
You can try a direct API call for all the Tasks together: `Task._query_tasks(task_ids=[IDS here], only_fields=['last_metrics'])`
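Something along these lines (note this is an internal API, so the exact return structure is an assumption and may change between versions):

```python
from clearml import Task

# Hypothetical list of task IDs to query in a single backend call
task_ids = ["<task-id-1>", "<task-id-2>"]

# Returns only the requested field (plus id) for each task
results = Task._query_tasks(task_ids=task_ids, only_fields=['last_metrics'])
for t in results:
    print(t.id, t.last_metrics)
```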
CharmingBeetle38 try adding "General/" before the arguments, i.e. batch_size becomes General/batch_size. This is only needed because we are accessing the parameters externally; when the task is executed, it is resolved automatically.
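For example, when modifying a cloned task from the outside (the task ID and queue name are placeholders):

```python
from clearml import Task

# Clone an existing experiment and override an argument externally.
# Note the "General/" prefix, needed when setting parameters from outside;
# inside the running task the argument is resolved without it.
cloned = Task.clone(source_task="<source-task-id>", name="clone, larger batch")
cloned.set_parameter("General/batch_size", 64)
Task.enqueue(cloned, queue_name="default")
```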
(as I see it, the services worker is only in the services queue, and not in my default queue, where my other workers are)
So basically the service-mode is just a flag passed to the agent, and the services queue is the name of the queue it will pull from.
If I want a normal worker also?
You can just add another section to the docker-compose, or run it manually after you spin up the docker-compose.
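For example (assuming the standard agent CLI), a regular worker can be started manually with something like `trains-agent daemon --queue default`.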
LazyFox65 wdyt?
Could you disable the Windows anti-virus/firewall and test?
It knows it's a notebook and automatically adds the notebook as an artifact, right?
correct
And the uncommitted changes become the notebook converted to a script?
correct
In one case I am seeing the actual git diff coming in instead of the notebook.
It might be that there are both a git repository and a notebook, and the git diff shows before the notebook is detected and shown instead? (There is a watchdog refreshing the notebook every 30 sec or so.)
Hi OutrageousGrasshopper93
When the Task is executed on a worker, the presence of spaces breaks the URLs, and from the UI I cannot access the resources on the bucket
You are saying the URLs generated in a remote execution are "broken", while in a local execution they work, even though it is the same project/task name?
Hi JitteryCoyote63
Could it be a Python mismatch? Can you send the full log?
BTW: when I do `pip3.8 install pytorch3d==`
I get the following versions: `pytorch3d== (from versions: 0.0.1, 0.1.1, 0.2.0, 0.2.5, 0.3.0)`
I think you cannot change it for a running process; do you want me to check whether this can be done?
Thanks VivaciousPenguin66 !
BTW: if you are running the local code with conda, you can set the agent to use conda as well (notice that if you are running locally with pip, the agent's conda env will use pip to install the packages to avoid version mismatch)
Okay great, so we do have the Args section there.
What do you have in the "Execution" tab?
What's the trains-server version?
Notice you have to configure the shared drive for the docker, as the volume mount doesn't work without it. https://stackoverflow.com/a/61850413
SoreDragonfly16 could you test with Task.init using reuse_last_task_id=False?
For example: `task = Task.init('project', 'experiment', reuse_last_task_id=False)`
The only thing I can think of is two experiments running with the same project/name on the same machine; this flag ensures that every time you run the code, you create a new experiment.
This is odd, because the screen grab points to CUDA 10.2 ...
sdk.storage.cache.size.cleanup_margin_percent
Hi ReassuredTiger98
This is actually future-proofing the cache mechanism, allowing it to be "smarter", i.e. clean based on cache folder size instead of cache folder entries. This is currently not available.
sdk.storage.cache parameters for the agent?
For both local execution and with an agent
When are datasets deleted if I run local execution?
When you hit the cache entry limit (100 if I recall). This can a...
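For reference, the relevant section of the configuration file would look something like this (the key placement follows the sdk.storage.cache.size.cleanup_margin_percent path mentioned above; the value is illustrative):

```
sdk {
    storage {
        cache {
            size {
                # percentage margin to clean when cache cleanup is triggered
                cleanup_margin_percent: 10
            }
        }
    }
}
```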
Hmm reading this: None
How are you checking the health of the serving pod?
Hmm, maybe we should add a check once the download is done, comparing the expected file size with the actual file size, and if they are different, redownload?
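A rough sketch of that idea (where the expected size would have to come from the server/storage metadata, which is an assumption here):

```python
import os

def download_is_complete(local_path: str, expected_size: int) -> bool:
    """Compare the on-disk file size with the size reported by the server."""
    actual_size = os.path.getsize(local_path)
    # A mismatch suggests a truncated/corrupted download -> trigger a redownload
    return actual_size == expected_size
```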
Copy-paste the trains.conf from any machine; it just needs the definition of the trains-server address.
Specifically, if you run in offline mode, there is no need for the trains.conf, and you can just copy the one on GitHub.
No, I think it might be a glitch in the way they calculate the upload speed; nothing we can do.